Python - How To Speed Up Cosine Similarity With Counting Arrays

January 26, 2024

I need to compute cosine similarity across a very large set of users, where each user is represented as an array of object IDs. For example:

user_1 = [1,4,6,100,3,1]

Solution 1:

Here's how you can build short count vectors for cosine similarity, covering only the IDs that appear in either user, without looping over a million entries:

from collections import Counter

user_1 = [1,4,6,100,3,1]
user_2 = [4,7,8,3,3,2,200,9,100]

# Create a sorted list of unique elements so the vector order is deterministic
uniq = sorted(set(user_1 + user_2))

# Map every unique entry in user_1 and user_2 to a zero count
duniq = {k: 0 for k in uniq}

def create_vector(duniq, l):
    dx = duniq.copy()
    dx.update(Counter(l)) # Overwrite the zeros with the actual counts
    return list(dx.values()) # Counts in the same key order as duniq

u1 = create_vector(duniq, user_1)
u2 = create_vector(duniq, user_2)

# u1, u2 (keys in order 1, 2, 3, 4, 6, 7, 8, 9, 100, 200):
# u1 == [2, 0, 1, 1, 1, 0, 0, 0, 1, 0]
# u2 == [0, 1, 2, 1, 0, 1, 1, 1, 1, 1]

You can then feed these two vectors into scipy.spatial.distance.cosine. Note that it returns the cosine distance, so the similarity is 1 minus the result.
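
As a rough sketch of what that final step computes, the similarity can also be worked out with the standard library alone. The function names below are hypothetical and not part of the original answer, and returning 0.0 for an empty user is an assumed convention. The second function is a swapped-in variant that skips the dense vectors and works on the Counter objects directly, which may help when users share few IDs.

```python
import math
from collections import Counter

def cosine_similarity_dense(a, b):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty user is similar to nobody
    return dot / (norm_a * norm_b)

def cosine_similarity_counts(ids_a, ids_b):
    """Same result, computed from ID counts without building dense vectors."""
    ca, cb = Counter(ids_a), Counter(ids_b)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # iterate over the smaller counter
    # Only IDs present in both users contribute to the dot product;
    # Counter returns 0 for missing keys.
    dot = sum(n * cb[k] for k, n in ca.items())
    norm_a = math.sqrt(sum(n * n for n in ca.values()))
    norm_b = math.sqrt(sum(n * n for n in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # same assumed convention as above
    return dot / (norm_a * norm_b)

user_1 = [1, 4, 6, 100, 3, 1]
user_2 = [4, 7, 8, 3, 3, 2, 200, 9, 100]

# Build the dense vectors over a shared, sorted key order
keys = sorted(set(user_1 + user_2))
c1, c2 = Counter(user_1), Counter(user_2)
u1 = [c1[k] for k in keys]
u2 = [c2[k] for k in keys]

dense = cosine_similarity_dense(u1, u2)
sparse = cosine_similarity_counts(user_1, user_2)
print(dense, sparse)
```

Both calls should agree up to floating-point rounding, since IDs that appear in only one user add nothing to the dot product.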
