Python - How To Speed Up Cosine Similarity With Counting Arrays
I need to compute the cosine similarity function across a very big set. This set represents users, and each user has an array of object ids. An example below: user_1 = [1,4,6,100,3,1]
Solution 1:
Here's how you can build short count vectors for cosine similarity, covering only the ids that the two users actually have, instead of looping over a million entries:
from collections import Counter

user_1 = [1,4,6,100,3,1]
user_2 = [4,7,8,3,3,2,200,9,100]
# Create a list of the unique ids across both users
uniq = list(set(user_1 + user_2))
# Map every unique id in user_1 and user_2 to a zero count
duniq = {k: 0 for k in uniq}
def create_vector(duniq, l):
    dx = duniq.copy()  # fresh zero-filled dict, keys in uniq order
    dx.update(Counter(l))  # overwrite with this user's counts; key order is kept
    return list(dx.values())  # counts aligned position-by-position with uniq
u1 = create_vector(duniq, user_1)
u2 = create_vector(duniq, user_2)
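# u1, u2 (positions follow the iteration order of set(), which Python does not guarantee,
# but both vectors share the same order):
# u1 -> [2, 0, 1, 1, 1, 0, 0, 0, 0, 1]
# u2 -> [0, 1, 2, 1, 0, 1, 1, 1, 1, 1]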
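As a stdlib-only cross-check of the vectors above, here is a minimal sketch of the cosine similarity formula itself: the dot product divided by the product of the two vector norms. The function name `cosine_similarity` is hypothetical, and returning 0.0 when one vector is all zeros is an assumption, not something the original answer specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length count vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty user is treated as dissimilar to everyone
    return dot / (norm_a * norm_b)


u1 = [2, 0, 1, 1, 1, 0, 0, 0, 0, 1]
u2 = [0, 1, 2, 1, 0, 1, 1, 1, 1, 1]
print(round(cosine_similarity(u1, u2), 4))  # 0.4264
```

Because only the dot product and the norms matter, the result is the same whatever order `set()` gave the ids, as long as both vectors use that same order.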
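You can then pass these two vectors to scipy.spatial.distance.cosine. Note that it returns the cosine distance, i.e. 1 minus the cosine similarity, so use 1 - cosine(u1, u2) if you want the similarity itself.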
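For a very large set of users, building a dense vector for every pair may still do more work than needed. A different technique (not part of the original answer) is to keep each user as a `Counter` and compute the dot product only over the ids the two users share, with each user's norm computed once. The sketch below assumes users are stored in a dict mapping a user name to a list of object ids; `dot`, `norm` and `all_pairs_similarity` are hypothetical helper names.

```python
import math
from collections import Counter
from itertools import combinations


def norm(c):
    """Euclidean norm of a Counter of id -> count."""
    return math.sqrt(sum(n * n for n in c.values()))


def dot(c1, c2):
    """Dot product of two Counters, looping over the smaller one only."""
    if len(c1) > len(c2):
        c1, c2 = c2, c1
    # Counter returns 0 for a missing key (without inserting it), so ids
    # that the other user lacks contribute nothing.
    return sum(n * c2[k] for k, n in c1.items())


def all_pairs_similarity(users):
    """Map each (user_a, user_b) pair to its cosine similarity.

    `users` is assumed to be a dict of user name -> list of object ids.
    """
    counts = {name: Counter(ids) for name, ids in users.items()}
    norms = {name: norm(c) for name, c in counts.items()}  # computed once per user
    result = {}
    for a, b in combinations(counts, 2):
        denom = norms[a] * norms[b]
        # Assumption: a user with no ids gets similarity 0.0 with everyone.
        result[(a, b)] = dot(counts[a], counts[b]) / denom if denom else 0.0
    return result


users = {
    "user_1": [1, 4, 6, 100, 3, 1],
    "user_2": [4, 7, 8, 3, 3, 2, 200, 9, 100],
}
print({pair: round(s, 4) for pair, s in all_pairs_similarity(users).items()})
# {('user_1', 'user_2'): 0.4264}
```

This still scores every pair, which grows quadratically with the number of users. For millions of users you would likely want an inverted index from object id to users, or a sparse matrix (for example from `scipy.sparse`), so that only pairs sharing at least one id are compared.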