It looks like some of the top players in the Netflix Prize competition have teamed up and finally broken the 10% improvement barrier. I know I'm a few days late on this, though not because I didn't see it happen. I've been battling an ear infection all week, and it has left me dizzy, in pain, and with no energy when I get home from work. I hesitated before even posting anything about this, since there is little I can add at this point that hasn't already been said. I'll just share a few thoughts and experiences for posterity and leave it at that. I'm also going to eventually make the point that recommender systems are operating under a false assumption, if you read this all the way through. :)
I competed for the prize for a bit, trying out a few ideas with support vector machines and maximum margin matrix factorization [pdf] that never panned out. We got about a 4% improvement over Cinematch, which put us way down the list. Going further would have meant investing a lot of effort in implementing other algorithms, working out the ensemble, and so on, unless we came up with some novel algorithm that bridged the gap. That didn't seem likely, so I stopped working on it just after leaving school. I learned a lot about machine learning, matrix factorization, and scaling thanks to the competition, so it was hardly a net loss for me.
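For the curious, here's the general shape of what we were playing with: factor the ratings matrix into low-rank user and movie factors, and predict a rating as their dot product. This is a toy SGD sketch with made-up data and hyperparameters, not our actual MMMF code (MMMF swaps the squared loss for a hinge loss with learned thresholds):

```python
import numpy as np

# Toy sketch of the factorization idea: approximate the ratings matrix
# R as U @ V.T and fit U, V by stochastic gradient descent.
# All data and hyperparameters below are illustrative, nothing more.

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]  # (user, movie, stars)
n_users, n_movies, k = 3, 2, 4

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))    # user latent factors
V = rng.normal(scale=0.1, size=(n_movies, k))   # movie latent factors
lr, reg = 0.01, 0.05                            # learning rate, L2 penalty

for epoch in range(200):
    for u, m, r in ratings:
        err = r - U[u] @ V[m]                   # prediction error on this rating
        U[u] += lr * (err * V[m] - reg * U[u])  # gradient step for the user
        V[m] += lr * (err * U[u] - reg * V[m])  # gradient step for the movie

print(U @ V.T)  # reconstructed rating matrix, including unseen cells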
The one thing I regret is that the prize encouraged me and my advisor to spend more effort on the competition than we should have, which in turn meant we didn't spend that time on something tangibly productive for research. Bluntly put, if we hadn't sunk so much time into the competition, we could have worked on a different research problem more likely to produce a paper. The lack of published research on my CV was the main reason I didn't move on to get my PhD at CMU (at least, that's what I was told by those close to the decision). Hindsight is 20/20, and at the time, the shining glory of winning a million bucks and fame was delicious. It also seemed like we had ideas that "maybe kinda sorta" were going somewhere. That turned out not to be the case, but when admissions committees look at research experience, negative results = no results.
Many people have lauded the competition for encouraging research in collaborative filtering and bringing public attention to the field. I was one of those people. Others have criticized it for not focusing on what people actually care about when using recommender systems: getting something useful and having a good experience! And yes, Daniel Lemire, I'm thinking of you. :) But now I'm convinced that Daniel is right. I remember reading in the literature that a 10% improvement is roughly what it takes for users to actually notice a difference in a recommender system. So maybe people will notice a slight improvement in Netflix's recommendations if these ideas are ever implemented. Which points to another problem: most of what it took to win the prize is so computationally expensive that it isn't really feasible in production. Netflix recently released some improvements, and I didn't notice a damned thing. They still recommended Daft Punk's Electroma to me, which was a mind-numbing screen-turd. And apparently I must have seen every good sci-fi movie ever made, because there are no more recommendations for me in that category. I have trouble believing that.
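For anyone fuzzy on the scoring: submissions were judged by root mean squared error on a held-out quiz set, and "X% improvement" just means X% lower RMSE than Cinematch. A quick illustration, with a hypothetical submission score pegged to the ~4% our entry managed:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error: the contest's yardstick."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# On the real quiz set you'd feed rmse() about 1.4 million predictions;
# here we just plug in scores directly. 0.9514 was Cinematch's published
# quiz-set RMSE; 0.9133 is a hypothetical score about 4% better, like ours.
cinematch, ours = 0.9514, 0.9133

improvement = 100 * (cinematch - ours) / cinematch
print(f"{improvement:.2f}% improvement over Cinematch")  # the prize needed 10%
```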
The point of a recommender system really shouldn't be just to guess what rating I might happen to give something at a given time. The fact that introducing time makes such a big difference in the competition seems like a ginormous red flag to me. Sure, I can look back in time and say "on day X, people liked movies about killing terrorists." The qualifying set in the competition asked you to predict the rating a user gave a movie on a given date in the past. Remember what I said about hindsight being 20/20? How about you predict what I will rate a movie this coming weekend. See the problem?
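To make the hindsight point concrete: as I understand it, a good chunk of the temporal gains came from modeling drifting biases, like a per-day offset in the ratings, which you can only estimate because you've already seen ratings from that very day. A made-up sketch:

```python
from collections import defaultdict

# Sketch of a per-date bias term, the kind of temporal effect that helped
# in the contest. Data and values here are invented for illustration.
train = [("2005-07-16", 4.0), ("2005-07-16", 5.0), ("2005-09-02", 2.0)]

global_mean = sum(r for _, r in train) / len(train)
by_date = defaultdict(list)
for date, r in train:
    by_date[date].append(r)
date_bias = {d: sum(rs) / len(rs) - global_mean for d, rs in by_date.items()}

def predict(date):
    # Easy for a date we've already observed; for next weekend there are
    # no ratings yet, so the "temporal" signal evaporates.
    return global_mean + date_bias.get(date, 0.0)

print(predict("2005-07-16"))  # hindsight: a per-day correction is available
print(predict("2009-10-03"))  # the future: falls back to the global mean
```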
I will sound the HCIR trumpets and say that what recommender systems should really be looking at is improving exploration. When I go looking for a movie to watch, or a pair of shoes to buy, I already know what I like in general. Let me pick a starting point and then show me useful ways of narrowing down my search to the cool thing I really want (a toy sketch of what I mean is below). Clerkdogs is a good first step on this path, though I think we're going to have to move away from curated knowledge before this catches fire.
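Here's that toy sketch (hypothetical data and facet names): start from something you know you like and filter by facets, faceted-search style, instead of being handed a ranked list of predicted ratings:

```python
# Illustrative only: "narrowing down" as facet filtering over a catalog,
# rather than predicting what I'd rate each item.
movies = [
    {"title": "Alien",        "genre": "sci-fi", "tone": "dark",  "era": "70s"},
    {"title": "Sunshine",     "genre": "sci-fi", "tone": "dark",  "era": "00s"},
    {"title": "Galaxy Quest", "genre": "sci-fi", "tone": "funny", "era": "90s"},
]

def narrow(items, **facets):
    """Keep only the items matching every chosen facet value."""
    return [m for m in items if all(m.get(k) == v for k, v in facets.items())]

# "Show me dark sci-fi" -> two candidates; add an era to narrow further.
print([m["title"] for m in narrow(movies, genre="sci-fi", tone="dark")])
print([m["title"] for m in narrow(movies, genre="sci-fi", tone="dark", era="00s")])
```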
Maybe I have this all wrong. Maybe we need to discard the notion of recommender systems altogether, since they operate under the wrong premise. We don't need a machine to recommend something it thinks we'll like. We need a machine that will help us discover something we'll like. We need to be making discovery engines. (Replace "recommender system" with "search engine" in most of what I just said and you'll see that I really have been sounding the HCIR trumpets.)