
How To Compute The Similarity Between Lists Of Features?

I have users and resources. Each resource is described by a set of features and each user is related to a different set of resources. In my particular case, the resources are web p

Solution 1:

If each user is represented as a set of document-interaction vectors, you can define the similarity of a pair of users as the similarity of the two sets of document-interaction vectors that represent them.

You say you can get a similarity matrix of the documents. Then assume that user U1 visited documents D1, D2, D3, and user U2 visited documents D1, D3, D4. You would have two sets of vectors: S1 = {U1(D1), U1(D2), U1(D3)} for user 1 and S2 = {U2(D1), U2(D3), U2(D4)} for user 2. Note that the same document is written differently for each user (U1(D1) versus U2(D1)) because each user's interaction with it is different. If I understand correctly, the elements of these sets should correspond to the respective rows in each user's matrix.

The similarity between these two sets can be computed in many different ways. One option is the average pairwise similarity: you iterate over all pairings of one element from each set, compute the document similarity of each pair, and average over all pairs.
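As a rough sketch of the average pairwise similarity, assuming you already have a precomputed document similarity matrix and that each user can be reduced to the row/column indices of the documents they visited (the matrix values, the index lists, and the function name below are made up for illustration):

```python
import numpy as np

def average_pairwise_similarity(doc_sim, docs_a, docs_b):
    """Mean of doc_sim[i, j] over every pairing (i in docs_a, j in docs_b)."""
    # np.ix_ selects the sub-matrix holding all cross pairs between the two sets.
    block = doc_sim[np.ix_(docs_a, docs_b)]
    return float(block.mean())

# Hypothetical 4x4 document similarity matrix for D1..D4 (indices 0..3).
doc_sim = np.array([
    [1.0, 0.2, 0.5, 0.1],
    [0.2, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.6],
    [0.1, 0.4, 0.6, 1.0],
])

u1_docs = [0, 1, 2]  # U1 visited D1, D2, D3
u2_docs = [0, 2, 3]  # U2 visited D1, D3, D4

score = average_pairwise_similarity(doc_sim, u1_docs, u2_docs)
print(score)
```

If the per-user interaction vectors matter (U1(D1) differing from U2(D1)), you would replace the matrix lookup with a similarity function applied to each pair of vectors; the averaging step stays the same.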

Solution 2:

Using the mean of the features in each user's set of resources seems a natural way to summarize a user. numpy.mean with an appropriate axis argument should get you the mean; then compute the Euclidean distance between the resulting "user vectors" (of length n_features), as you did before between document vectors.
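A minimal sketch of that idea, assuming each user's resources are stored as a 2-D array of shape (n_resources, n_features); the feature values and the helper name user_vector are invented for the example:

```python
import numpy as np

def user_vector(resource_features):
    """Summarize a user as the mean feature vector of their resources."""
    # axis=0 averages over resources, leaving one value per feature.
    return np.asarray(resource_features, dtype=float).mean(axis=0)

# Hypothetical data: rows are resources, columns are features.
u1_resources = np.array([[1.0, 0.0, 2.0],
                         [3.0, 2.0, 0.0]])
u2_resources = np.array([[0.0, 1.0, 1.0],
                         [2.0, 1.0, 1.0],
                         [1.0, 1.0, 1.0]])

v1 = user_vector(u1_resources)
v2 = user_vector(u2_resources)

# Euclidean distance between the two user vectors (smaller means more similar).
distance = float(np.linalg.norm(v1 - v2))
print(v1, v2, distance)
```

Note that users may have different numbers of resources; the mean collapses each to a vector of the same length, which is what makes the distance well defined.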

Solution 3:

I would look at creating multiple dimensions for the documents, for example the time of day at which they are visited: divide the visits into morning and night, and then plot the users to see which are night owls and which are early birds.

With any number of such dimensions you can create a matrix of users (one row per user, one column per dimension) and use the distance between users in that matrix to compare them.
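One possible way to sketch this, assuming you have the hour of day (0-23) of each visit per user and that "morning" simply means before noon (the split point, the user names, and the visit hours are all assumptions made for illustration):

```python
import numpy as np

def time_of_day_profile(visit_hours, split_hour=12):
    """Return [fraction of morning visits, fraction of night visits]."""
    hours = np.asarray(visit_hours)
    morning = float(np.mean(hours < split_hour))
    return np.array([morning, 1.0 - morning])

# Hypothetical visit hours for three users.
visits = {
    "early_bird": [6, 7, 8, 9],
    "night_owl": [22, 23, 21, 20],
    "mixed": [8, 22],
}

names = list(visits)
# Matrix of users: one row per user, one column per dimension.
user_matrix = np.vstack([time_of_day_profile(visits[n]) for n in names])

# Pairwise Euclidean distances between all users via broadcasting.
diff = user_matrix[:, None, :] - user_matrix[None, :, :]
distances = np.linalg.norm(diff, axis=-1)
print(names)
print(distances)
```

More dimensions (for example day of week, or document topic) can be added as extra columns of user_matrix without changing the distance computation.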
