
Python - How To Speed Up Cosine Similarity With Counting Arrays

I need to compute cosine similarity across a very large set of users, where each user is represented as an array of object IDs. For example: user_1 = [1,4,6,100,3,1]

Solution 1:

Here's how you can build compact count vectors for cosine similarity without looping over a million entries:

from collections import Counter

user_1 = [1,4,6,100,3,1]
user_2 = [4,7,8,3,3,2,200,9,100]

# Create a list of the unique elements across both users
uniq = list(set(user_1 + user_2))

# Map every unique entry in user_1 and user_2 to a zero count
duniq = {k:0 for k in uniq}

def create_vector(duniq, l):
    dx = duniq.copy() # Start from zeros, keeping the shared key order
    dx.update(Counter(l)) # Overwrite with the counts found in l
    return list(dx.values()) # Return the counts as a list

u1 = create_vector(duniq, user_1)
u2 = create_vector(duniq, user_2)

# u1, u2 (positions follow the set's iteration order, which is not guaranteed):

u1 = [2, 0, 1, 1, 1, 0, 0, 0, 0, 1]
u2 = [0, 1, 2, 1, 0, 1, 1, 1, 1, 1]

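You can then feed these two vectors into scipy.spatial.distance.cosine. Note that it returns the cosine distance, so the similarity is 1 minus the returned value.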
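If you would rather skip the dense vectors entirely, the similarity can also be computed straight from the counts. This is only a minimal sketch using the standard library, not part of the answer above: the function name cosine_similarity and the choice to return 0.0 for an empty list are my own assumptions. Only IDs that appear in both lists contribute to the dot product, so nothing the size of the full ID space is ever built.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity of two lists of object IDs, treated as count vectors.

    Hypothetical helper for illustration; not from the original answer.
    """
    ca, cb = Counter(a), Counter(b)
    # Only IDs present in both lists contribute to the dot product.
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Assumed convention for an empty list
    return dot / (norm_a * norm_b)

user_1 = [1, 4, 6, 100, 3, 1]
user_2 = [4, 7, 8, 3, 3, 2, 200, 9, 100]
similarity = cosine_similarity(user_1, user_2)
```

For the two users above, the shared IDs are 3, 4 and 100, which gives a dot product of 4 and norms of sqrt(8) and sqrt(11), so the similarity is 4 / sqrt(88). The same number should come out of 1 minus the SciPy distance on u1 and u2.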
