This is research I did a while ago and presented Monday to fulfill the requirements of my Master's degree.  The presentation only needed to be about 20 minutes, so it was a very short intro.  We have moved on since then, so when I say future work, I really mean future work.  The post is rather lengthy, so I have moved the main content below the jump.

Recommender System Types

Recommender systems come in two main flavors: content-based and collaborative filtering.  Content-based recommenders use information about the items themselves to recommend new items to a user.  The intuition behind this approach is that a user tends to like or need things similar to what they liked before.  Of course, this breaks down when I buy Great Aunt Edna a puppy-embroidered sweater for Christmas and Amazon starts recommending me ugly sweaters left and right.  The alternative is collaborative filtering, which uses similarity between users to recommend new items.  Whereas content-based recommenders are limited to finding items similar to those you “enjoyed” in the past, collaborative filtering can present novel things that users like you also liked.

Collaborative filtering suffers from the problem that you have to provide the system enough data to find other users similar enough to you that it can produce reasonable predictions.  This is the user cold start problem.  Likewise, if a new item is added to the system and no one has rated it, collaborative filtering can’t recommend it either.  This is the item cold start problem.  Content-based recommenders are more robust to the item cold start, since a new item's features are available before anyone has rated it.  However, the user must rate at least one item for a content-based recommender to get started (otherwise, you just recommend popular items or base recommendations on user demographics).  If there are no similar items, content-based fails as well.  That is less of a problem for collaborative filtering.

In collaborative filtering, the users and items can be thought of as a matrix.  The rows represent users and the columns represent items.  The value in each cell is either zero, if no rating has been given, or the value of the rating assigned by the user.  It is typically asserted that ratings in the matrix are not missing at random.  This means there is some function that maps a user to the items they will rate.  There will be items they will never rate because those items lie outside the user's areas of interest.  This is fairly intuitive.  I will never read a Harlequin romance, and so I would never rate a book of that type in a book recommender system.  Also, a system can never present every item a user could rate to the user.  Users get bored entering ratings and will stop after a while.  If these two facts weren’t true, we wouldn’t need a recommender system in the first place.  This means that there are items a user could still rate if given the opportunity.
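As a minimal sketch of the matrix described above, the snippet below builds one from a handful of made-up ratings.  The user names, item names, and the `build_matrix` helper are all hypothetical and only there for illustration.

```python
# Hypothetical ratings as (user, item, rating) triples; every name is made up.
ratings = [
    ("alice", "book_a", 5),
    ("alice", "book_c", 2),
    ("bob", "book_b", 4),
    ("carol", "book_a", 3),
    ("carol", "book_d", 1),
]


def build_matrix(ratings):
    """Return (users, items, matrix) with users as rows and items as columns."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    row = {u: r for r, u in enumerate(users)}
    col = {i: c for c, i in enumerate(items)}
    # Zero marks "no rating given".
    matrix = [[0] * len(items) for _ in users]
    for u, i, value in ratings:
        matrix[row[u]][col[i]] = value
    return users, items, matrix


users, items, matrix = build_matrix(ratings)
for user, row_values in zip(users, matrix):
    print(user, row_values)
```

Note that a zero here cannot tell "would never rate" apart from "has not rated yet", which is exactly the distinction the missing-at-random discussion is about.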

The ratio of the total items in the system to the items a user will rate is usually very large.  For the average user, there are many times more missing values than observed ones.  So the collaborative filtering matrix is very sparse.  This has the advantage of allowing us to use sparse matrix techniques to speed up some operations on the matrix, but it has the disadvantage normally associated with lack of data:  it is more difficult to build reliable models with less information.  Generally, the more data you have, the better off your machine learning algorithm is.
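To make the sparsity point concrete, here is a small sketch that measures how empty a toy matrix is and stores only the observed ratings.  The dictionary-of-keys layout is just one simple sparse representation, chosen here as an assumption for illustration; the helper names are hypothetical.

```python
def sparsity(matrix):
    """Fraction of cells that hold no rating (zero)."""
    total = sum(len(row) for row in matrix)
    missing = sum(1 for row in matrix for value in row if value == 0)
    return missing / total


def to_sparse(matrix):
    """Keep only observed ratings, keyed by (row, column)."""
    return {
        (r, c): value
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value != 0
    }


# Toy matrix: 3 users by 4 items, zero means "not rated".
dense = [
    [5, 0, 2, 0],
    [0, 4, 0, 0],
    [3, 0, 0, 1],
]
sparse = to_sparse(dense)
print(f"sparsity: {sparsity(dense):.2f}, stored cells: {len(sparse)}")
```

Even in this tiny example more than half the cells are empty; in a real system the fraction is typically far higher, which is why storing only the observed cells pays off.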

The Stacked Agents Model

[Figure: The Stacked Agents Model (Adams et al., 2007)]

Our idea was to supplement the user-provided information in the collaborative filtering matrix with machine-generated predictions.  In this way, we can turn a sparse matrix into a full one.  To generate the predictions, we construct a content-based model for each user.  Each item the user has rated is assigned a feature vector \vec{x}.  The features depend on the domain, but if we’re recommending songs they might be artist, genre, record sales, year released, etc.  The label y is the rating the user gave that item.   The feature weights can be chosen in a number of ways.  We used tf-idf to weight the features we collected (see the paper for details). A content-based agent for each user constructs a model and learns the user’s preferences.  It then predicts a rating for each and every item the user did not rate.
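The following is a rough sketch of what a per-user content-based agent could look like.  The tf-idf weighting follows the description above, but the predictor (a cosine-similarity-weighted average of the user's own ratings) is a stand-in I chose for brevity, not the learner from the paper.  The songs, feature tokens, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical items described as bags of feature tokens.
item_features = {
    "song_a": ["rock", "1990s", "artist_x"],
    "song_b": ["rock", "1960s", "artist_y"],
    "song_c": ["jazz", "1960s", "artist_z"],
    "song_d": ["rock", "1990s", "artist_y"],
}


def tfidf_vectors(item_features):
    """Weight each token by term frequency times inverse document frequency."""
    n = len(item_features)
    df = Counter(tok for toks in item_features.values() for tok in set(toks))
    vectors = {}
    for item, toks in item_features.items():
        tf = Counter(toks)
        vectors[item] = {
            t: (count / len(toks)) * math.log(n / df[t]) for t, count in tf.items()
        }
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_unrated(user_ratings, vectors):
    """Predict a rating for every item the user has not rated."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    predictions = {}
    for item, x in vectors.items():
        if item in user_ratings:
            continue
        sims = {rated: cosine(x, vectors[rated]) for rated in user_ratings}
        total = sum(sims.values())
        if total > 0:
            predictions[item] = sum(sims[r] * user_ratings[r] for r in sims) / total
        else:
            # Nothing the user rated is similar: fall back to their mean rating.
            predictions[item] = mean
    return predictions


vectors = tfidf_vectors(item_features)
user_ratings = {"song_a": 5, "song_c": 1}  # labels y for the rated items
predictions = predict_unrated(user_ratings, vectors)
print(predictions)
```

Here `song_d` shares features only with the highly rated `song_a`, so its prediction sits at the top of the scale, while `song_b` is pulled toward the low rating of `song_c` because the rarer shared token carries more tf-idf weight.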

Returning to the idea of missing at random, we also would like an idea of how confident the system is that the user would rate the item at all.  For this we construct a confidence model for the user.  Again, we can use the content-based feature vector \vec{x}, but instead of the label y being the rating, it is positive or negative (binary) to indicate whether the user has rated it or not.  The confidence model was the weakest part of our work and improving that is important moving forward.  The problem with this approach is that we tell the learner that an unrated item is a negative example.  This is true of some items, but not for the cases where the user would rate the item but hasn’t taken the time or had the chance.  We automatically bias our learner against the very examples we are seeking to learn.  So like I said, this part needs work.  There was a track at last year’s SIGKDD on predicting whether a user would rate an item, if you’re interested in more about this topic.
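The labeling problem is easy to see in a toy example.  The sketch below trains a tiny logistic regression on "rated or not" labels; logistic regression is my assumption for illustration (the post does not say which learner was used), and the items and three binary features are made up.

```python
import math

# Hypothetical binary features per item: (rock, jazz, 1990s).
item_vectors = {
    "song_a": [1.0, 0.0, 1.0],
    "song_b": [1.0, 0.0, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [1.0, 0.0, 1.0],  # same features as song_a, but not rated yet
    "song_e": [0.0, 1.0, 1.0],
}
rated_by_user = {"song_a", "song_b"}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_confidence(item_vectors, rated, epochs=500, lr=0.5):
    """Batch gradient descent on logistic loss; label 1 = rated, 0 = unrated."""
    dim = len(next(iter(item_vectors.values())))
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0 if item in rated else 0.0) for item, x in item_vectors.items()]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def confidence(x, w, b):
    """Estimated probability, in [0, 1], that the user would rate this item."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


w, b = train_confidence(item_vectors, rated_by_user)
q = {item: confidence(x, w, b) for item, x in item_vectors.items()}
for item, score in sorted(q.items()):
    print(item, round(score, 3))
```

`song_d` looks exactly like `song_a`, which the user did rate, yet it enters training as a negative example.  The model therefore cannot score it any higher than `song_a`, and both are dragged down by the conflicting labels, which is the bias described above.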

We constructed a confidence agent for each user to predict how confident the system is that the user would rate a given item.  These confidence predictions are real values in the interval [0,1], where 0 means not confident and 1 means confident.  Once we have generated a prediction and confidence value for an unrated item, we can combine those into a new rating score:  \hat{r}_{u,i} = p_{u,i} q_{u,i}, where p_{u,i} is the predicted rating for user u and item i and q_{u,i} is the confidence score.  These scores fill in the unrated cells of the collaborative filtering matrix in the stacked agents model.  Stacking is the process of using predictions from a previous machine learning method as the training input for another.  The combined observed data (user ratings) and machine-predicted data are used to train each user’s stacked agent.
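A short sketch of the combination step: observed ratings are kept as they are, and each unrated cell receives the predicted rating multiplied by its confidence.  The function name and the numbers are hypothetical; only the product of prediction and confidence comes from the text above.

```python
def stacked_matrix(observed, predicted, confidence):
    """Fill every unrated cell (zero) with predicted rating times confidence."""
    filled = []
    for u, row in enumerate(observed):
        filled.append([
            float(r) if r != 0 else predicted[u][i] * confidence[u][i]
            for i, r in enumerate(row)
        ])
    return filled


# Two users, three items.  Zero in `observed` means "not rated".
observed = [[5, 0, 2], [0, 4, 0]]
# Entries for already-rated cells are placeholders and are never read.
predicted = [[0.0, 4.0, 0.0], [3.0, 0.0, 5.0]]
confidence = [[0.0, 0.5, 0.0], [0.25, 0.0, 0.5]]

filled = stacked_matrix(observed, predicted, confidence)
print(filled)
```

The resulting full matrix is what each user's stacked agent would then train on.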

For details about the implementation, please see the paper.

Conclusions

While the results weren’t state of the art overall, we did manage to show that this method could improve plain content-based recommendations.  I see two main paths for future research on this method.  First of all, the confidence model was very weak and needs to be improved.  Secondly, because we are combining collaborative filtering with content-based recommenders to form a hybrid recommender system, we should in theory overcome some of the problems with cold start recommendations.  Further experiments would be needed to verify that.

Reference

Jason M. Adams, Paul N. Bennett, Anthony Tomasic. Combining Personalized Agents to Improve Content-Based Recommendations. Technical Report CMU-LTI-07-015, Carnegie Mellon University, December 2007.