Mayhaps you have used the Facebook app Likeness. It’s a fluff app, but has wide appeal since it does two things most people like: easy quizzes and comparisons with our friends. The graphic design that went into the app is a bit low-scale, but it gets the job done. If you haven’t used it, the concept is simple. You are presented with a quiz topic, like “What’s your addiction?” You are then presented with ten items that you must rank in the order specified by the question page (usually most to least favorite, or whatever). Once you have ranked the ten items, you are shown a screen that easily allows you to goof up and spam all your friends. But after that, it produces some sort of similarity score between you and all your friends who have taken it. I’ve never had a similarity below 46% and never one above 98%.

But it got me thinking, how exactly are they measuring this similarity? The most naive thing you could do would be simple accuracy: count the number of items placed in the same rank by both users. I suspect this would result in accuracies that average in the low end of the spectrum. It also completely ignores the fact that we might rank 8 items in exactly the same order, except that you put my bottom item in second place, pushing 7 of the items down the list.

A slightly less naive thing might be to find all ordering pairs and compute accuracy across those. Given items {A, B, C, … }, construct all pairs of orderings in a user’s ranking. Let’s use three items as an example: {apples, oranges, bananas}. User 1 ranks them: {oranges, bananas, apples}. User 2 ranks them: {apples, oranges, bananas}.

The ordered pairs for user 1 are:

  • (oranges, bananas)
  • (oranges, apples)
  • (bananas, apples)

The ordered pairs for user 2 are:

  • (apples, oranges)
  • (apples, bananas)
  • (oranges, bananas)

The two users have only one match: (oranges, bananas).  We then compute the similarity as the number of matches divided by the total number of unique pairs, for a similarity score of 1/5 = 20%. That seems fairly reasonable. The complexity of this technique is also very manageable at O(n2) unless you get into truly massive ranking lists. For Likeness, this would be ((n-1)2 + n - 1)/2 = 45 comparisons per user, which should be fine even for users with the maximum number of friends (currently 5000, I think).  But how good of a measure is it?

The problem is, not very.  Consider the case where we have ten examples.  If you get the top ranked item correct, automatically all nine ordered pairs for that item are counted towards your similarity score.  But if the two users match on everything except that user 2 put user 1’s top item at the bottom, the accuracy will drop instantly to 67%!  So this method heavily weights the items at the top of the list.

  • User 1:  { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }
  • User 2:  { 10, 1, 2, 3, 4, 5, 6, 7, 8, 9 }

You might think that Spearman’s rank correlation coefficient is perhaps a better solution to this mess.  It basically looks at the correlation between two lists once they have been converted into rankings (rather than looking directly at the correlation between the actual values).  Spearman’s is also heavily affected by items from the top of the ranking being re-ordered.  So instead of having a 67% similarity based on the method above, Spearman’s would give a correlation of 0.4545.  Since the Spearman correlation can range between -1 and +1, we would need to divide the correlation by 2 and add 0.5 to get a similarity score.  If we do that, we’re up to 73%.  Spearman has the additional advantage of being O(n) complexity.

Should we stop there?  Is that a reasonable enough similarity score for this case?  It heavily weights the front of the list, but that is probably good behavior.  People place more emphasis on what they pick first, and the bottom of the list might just be a jumbled mess of things that don’t apply or matter.