## The limits of collaborative filtering?

Posted: 25 June 2008 in Uncategorized
Tags: , , , , , , , ,

Peter Turney posted recently on the logic of attributional and relational similarity. Attributes are features or characteristics of a single entity. Relations describe some connection between two entities, such as a comparison. We’ll denote a relation between two entities A and B as A:B. A relational similarity between two groups A, B and C,D will be denoted as A:B::C:D. This is the standard SAT-style proportional analogy: A is to B as C is to D. An attributional similarity indicates that two entities share the same attribute (this could be to varying degrees, but in the boolean case, it’s either shared or it isn’t). An attributional similarity between A and B will be denoted as A~B. This is like saying $\forall$ Z, A:Z::B:Z. I’m just giving a brief introduction here, but this is all in Peter’s post to greater detail, so I recommend reading that for more information.

This got me thinking about collaborative filtering (because, well, I’ve been thinking about it all the time for the past two years). Collaborative filtering exploits similarities between users to predict preferences for items the user has not seen. In the case of movie recommendations, like with Netflix, this means that users can recommend movies they have seen to similar users who have not seen those movies. There are many ways of doing this. At the heart of it, however, is this notion of relational and attributional similarity.

A: base user
B: movies rated by A
C: some set of other users
D: movies rated by C

We can’t just say that A:B::C:D, since A and C may be nothing like each other. If we constrain it to users with attributional similarity, then we arrive at the definition of collaborative filtering: A~C & A:B::C:D. Logically, it follows that B~D also holds. See Peter’s post for some properties of proportional analogies that make this more clear.

In the non-binary case, we can choose C to be a set of users whose similarity varies with A. Also, our measure of what exactly constitutes similarity can be any number of different metrics. From here, it seems pretty clear that the limit of collaborative filtering is bounded by the attributional similarity A~C. If (A~C) & (A = C) (complete similarity) then it follows that B = D or else A $\neq$ C. If A $\neq$ C then does it logically follow that B $\neq$ D? I guess it depends on the similarity metric and how we are defining the differences in the sets of movies and the differences in the sets of users.

I wonder if there has been any work done in this area? I wasn’t able to find anything, but maybe I’m just not searching for the right thing. Is it even worth pursuing?