Mayhaps you have used the Facebook app Likeness. It’s a fluff app, but has wide appeal since it does two things most people like: easy quizzes and comparisons with our friends. The graphic design that went into the app is a bit low-scale, but it gets the job done. If you haven’t used it, the concept is simple. You are presented with a quiz topic, like “What’s your addiction?” You are then presented with ten items that you must rank in the order specified by the question page (usually most to least favorite, or whatever). Once you have ranked the ten items, you are shown a screen that easily allows you to goof up and spam all your friends. But after that, it produces some sort of similarity score between you and all your friends who have taken it. I’ve never had a similarity below 46% and never one above 98%.
But it got me thinking, how exactly are they measuring this similarity? The most naive thing you could do would be simple accuracy: count the number of items placed in the same rank by both users. I suspect this would result in accuracies that average in the low end of the spectrum. It also completely ignores the fact that we might rank 8 items in exactly the same order, except that you put my bottom item in second place, pushing 7 of the items down the list.
A slightly less naive thing might be to find all ordering pairs and compute accuracy across those. Given items {A, B, C, … }, construct all pairs of orderings in a user’s ranking. Let’s use three items as an example: {apples, oranges, bananas}. User 1 ranks them: {oranges, bananas, apples}. User 2 ranks them: {apples, oranges, bananas}.
The ordered pairs for user 1 are:
- (oranges, bananas)
- (oranges, apples)
- (bananas, apples)
The ordered pairs for user 2 are:
- (apples, oranges)
- (apples, bananas)
- (oranges, bananas)
The two users have only one match: (oranges, bananas). We then compute the similarity as the number of matches divided by the total number of unique pairs, for a similarity score of 1/5 = 20%. That seems fairly reasonable. The complexity of this technique is also very manageable at O(n2) unless you get into truly massive ranking lists. For Likeness, this would be ((n-1)2 + n - 1)/2 = 45 comparisons per user, which should be fine even for users with the maximum number of friends (currently 5000, I think). But how good of a measure is it?
The problem is, not very. Consider the case where we have ten examples. If you get the top ranked item correct, automatically all nine ordered pairs for that item are counted towards your similarity score. But if the two users match on everything except that user 2 put user 1’s top item at the bottom, the accuracy will drop instantly to 67%! So this method heavily weights the items at the top of the list.
- User 1: { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }
- User 2: { 10, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
You might think that Spearman’s rank correlation coefficient is perhaps a better solution to this mess. It basically looks at the correlation between two lists once they have been converted into rankings (rather than looking directly at the correlation between the actual values). Spearman’s is also heavily affected by items from the top of the ranking being re-ordered. So instead of having a 67% similarity based on the method above, Spearman’s would give a correlation of 0.4545. Since the Spearman correlation can range between -1 and +1, we would need to divide the correlation by 2 and add 0.5 to get a similarity score. If we do that, we’re up to 73%. Spearman has the additional advantage of being O(n) complexity.
Should we stop there? Is that a reasonable enough similarity score for this case? It heavily weights the front of the list, but that is probably good behavior. People place more emphasis on what they pick first, and the bottom of the list might just be a jumbled mess of things that don’t apply or matter.






5 comments
Comments feed for this article
12 May 2008 at 21:57:37
jhumphries
I would imagine that their algorithm could be less intelligent and simpler: for instance, “likeness” could be inversely related to the magnitude of the sum of distances. If someone ranks an item in the exact same place as another then the distance is zero. If there are ten items, you rank item A as #1, the other ranks item A as #10, then the distance is 9. The maximum sum of variances (I think, no proof yet) would be (n*n)/2 where n is the number of items that are being ranked.
We’ll define the score as the total sum of variances between the two ordered lists. A score of zero (no difference) yields 100% likeness. A score of (n*n)/2 (i.e. maximum variance) is 0%. All scores in between can be easily mapped to values between 0 and 100%. This also means there are only (n*n)/4 possible discrete scores.
This would normally be O(n*n), but a hash table could be used to lookup distances which yields o(n). This isn’t remotely as sophisticated as what you describe, and in the case you mention (lists are identical except one person’s favorite is the other’s least favorite) yields a score of 64%. I don’t think that is particularly unreasonable, however. You subjectively decide that the two lists’ scores should be very similar, but the fact that this person’s favorite is the other’s least favorite deserves to bring the score down considerably I would think.
This method, too, heavily weights items towards the top of the list - or more accurately, heavily weights items as they approach one end of the list or the other. I stated above, I think that makes sense.
There is probably a bit of subjectivity involved in deciding which algorithm is “best”. I wouldn’t be surprised, however, if the developers of the app put decidedly less thought into the algorithm than you did though :)
12 May 2008 at 23:02:23
Jason Adams
You’re certainly right that my rejection of lower scores for the top/bottom mismatch are subjective. Of course, that’s the name of the game in rankings, so it’s all a matter of what exactly you want to capture in your likeness metric. The one you propose is probably just as good as any.
My interest in this area stems from other things, as you might imagine. Namely: machine translation evaluation and recommender systems. I’ll probably talk about those more in another post, though.
30 May 2008 at 12:40:59
Robin Dawes
Another technique I have seen suggested is to find the longest common subsequence.
Yet another is to count the number of swaps of adjacent elements you have to make in order to transform one permutation into the other.
7 July 2008 at 10:09:14
Pierre Bru
maybe the only way to guess the method used by the likeness app dev is to perform a lot of testings against a quiz made by a friend for which the order of the answers is known.
BTW, I found some articles on the net telling that a perfect match only gives 99%…
7 July 2008 at 21:27:16
Jason Adams
Yeah doing that would at least narrow down the possibilities. Anyone bored enough to do this? I’m not at the moment.. hehe