## Stacked Agents Model

Posted: 3 July 2008 in Uncategorized

This is research I did a while ago and presented Monday to fulfill the requirements of my Master's degree.  The presentation only needed to be about 20 minutes, so it was a very short introduction.  We have moved on since then, so when I say future work, I really mean future work.  The post is rather lengthy, so I have moved the main content below the jump.

### Recommender System Types

Recommender systems come in two main flavors: content-based and collaborative filtering. Content-based recommenders use information about the items themselves to recommend new items to a user.  The intuition behind this approach is that users tend to like or have a need for similar things.  Of course, this breaks down when I buy Great Aunt Edna a puppy-embroidered sweater for Christmas and Amazon starts recommending me ugly sweaters left and right.

The alternative is collaborative filtering, which uses similarity between users to recommend new items.  Whereas content-based recommenders are limited to finding items similar to those you “enjoyed” in the past, collaborative filtering can present novel things that users like you also liked.  Collaborative filtering suffers from the problem that you have to provide the system enough data to find other users similar enough to you that it can produce reasonable predictions.  This is the user cold start problem.  Likewise, if a new item is added to the system and no one has rated it, collaborative filtering can’t recommend it either.  This is the item cold start problem.  Content-based recommenders are more robust to both of these.  However, the user must rate at least one item for a content-based recommender to get started (otherwise, you just recommend popular items or base recommendations on user demographics).  And if there are no similar items, content-based recommendation fails as well; that is less of a problem for collaborative filtering.

In collaborative filtering, the users and items can be thought of as a matrix.  The rows represent users and the columns represent items.  The value in each cell is either zero, if no rating has been given, or the rating assigned by the user.  It is typically asserted that ratings in the matrix are not missing at random: there is some function that maps a user to the items they will rate.  There will be items a user will never rate because those items lie outside their interests.  This is fairly intuitive.  I will never read a Harlequin romance, so I would never rate a book of that type in a book recommender system.  Also, a system can never present every item a user could rate to the user; users get bored entering ratings and will stop after a while.  If these two facts weren’t true, we wouldn’t need a recommender system in the first place.  This means that there are items a user could still rate if given the opportunity.
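As a toy illustration (the numbers here are made up, not data from our experiments), the matrix might look like this in NumPy, with zeros standing in for missing ratings:

```python
import numpy as np

# Toy user-item ratings matrix: rows are users, columns are items.
# A zero means "no rating given", not a rating of zero.
ratings = np.array([
    [5, 0, 3, 0, 0],   # user 0 rated items 0 and 2
    [0, 4, 0, 0, 1],   # user 1 rated items 1 and 4
    [2, 0, 0, 5, 0],   # user 2 rated items 0 and 3
])

# The items a given user could still rate are the zero entries of their row.
unrated_items_user0 = np.flatnonzero(ratings[0] == 0)
print(unrated_items_user0)  # items 1, 3, 4
```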

The ratio of the total items in the system to the items a user will rate is usually very large.  There are many times more missing values than present for the average user.  So the collaborative filtering matrix is very sparse.  This has the advantage of allowing us to use sparse matrix techniques to speed up some operations on the matrix, but it has the disadvantage normally associated with lack of data:  it is more difficult to build reliable models with less information.  The more data you have, the better off your machine learning algorithm is.
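A quick sketch of this point using scipy's sparse matrix support and invented numbers: if each of 1,000 simulated users rates 50 of 5,000 items, 99% of the matrix is empty.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
n_users, n_items = 1000, 5000

# Simulate a ratings matrix where each user rates only 50 of 5000 items.
dense = np.zeros((n_users, n_items))
for u in range(n_users):
    rated = rng.choice(n_items, size=50, replace=False)
    dense[u, rated] = rng.integers(1, 6, size=50)  # ratings 1..5

# Compressed sparse row storage keeps only the nonzero entries.
sparse = csr_matrix(dense)
sparsity = 1.0 - sparse.nnz / (n_users * n_items)
print(f"sparsity: {sparsity:.2%}")
```

Operations such as row slicing and dot products on the `csr_matrix` then cost time proportional to the number of stored ratings rather than the full matrix size.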

### The Stacked Agents Model

Our idea was to supplement the user-provided information in the collaborative filtering matrix with machine-generated predictions.  In this way, we can turn a sparse matrix into a full one.  To generate the predictions, we construct a content-based model for each user.  Each item the user has rated is assigned a feature vector $\vec{x}$.  The features depend on the domain, but if we’re recommending songs they might be artist, genre, record sales, year released, etc.  The label $y$ is the rating the user gave that item.  The feature weights can be chosen in a number of ways; we used tf-idf to weight the features we collected (see the paper for details).  A content-based agent for each user constructs a model and learns the user’s preferences.  It then predicts a rating for every item the user did not rate.
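A minimal sketch of one user's content-based agent, assuming scikit-learn, hypothetical song descriptions in place of the features we actually collected, and ridge regression as a stand-in learner (this illustrates the idea, not our exact implementation; see the paper for details):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

# Hypothetical item metadata: one text description per song.
items = [
    "indie rock guitar band 1990s",
    "indie rock alternative 1990s",
    "classical piano sonata 1800s",
    "classical orchestra symphony 1800s",
    "electronic dance synth 2000s",
]

# tf-idf feature vector x for each item.
X = TfidfVectorizer().fit_transform(items)

# This user rated items 0, 1, and 2; the labels y are their ratings.
rated_idx = [0, 1, 2]
y = [5, 4, 1]

# The user's content-based agent: fit only on the items they rated.
agent = Ridge(alpha=1.0)
agent.fit(X[rated_idx], y)

# Predict a rating for every item the user did not rate.
unrated_idx = [3, 4]
predictions = agent.predict(X[unrated_idx])
```

Item 3 shares "classical" and "1800s" with the low-rated item 2, so its predicted rating comes out lower than item 4's, whose features never appeared in the training items.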

Returning to the idea of missing at random, we also would like an idea of how confident the system is that the user would rate the item at all.  For this we construct a confidence model for the user.  Again, we can use the content-based feature vector $\vec{x}$, but instead of the label $y$ being the rating, it is positive or negative (binary) to indicate whether the user has rated it or not.  The confidence model was the weakest part of our work and improving that is important moving forward.  The problem with this approach is that we tell the learner that an unrated item is a negative example.  This is true of some items, but not for the cases where the user would rate the item but hasn’t taken the time or had the chance.  We automatically bias our learner against the very examples we are seeking to learn.  So like I said, this part needs work.  There was a track at last year’s SIGKDD on predicting whether a user would rate an item, if you’re interested in more about this topic.
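A sketch of the confidence model under the same caveats (toy dense feature vectors, logistic regression as a stand-in binary learner, not necessarily the one we used):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature vectors x for 6 items (dense here for brevity).
X = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
])

# Binary labels: 1 if the user rated the item, 0 otherwise.
# Note the bias discussed above: some of the 0s are items the user
# simply never saw, not items they would refuse to rate.
rated = np.array([1, 1, 0, 1, 0, 0])

conf_model = LogisticRegression()
conf_model.fit(X, rated)

# Confidence in [0, 1] that the user would rate each item.
q = conf_model.predict_proba(X)[:, 1]
```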

We constructed confidence agents for each user to predict the confidence the system has that the user would rate the item.  These confidence predictions are real values in the interval [0,1], from not confident to confident.  Once we have generated a prediction and a confidence value for an unrated item, we can combine them into a new rating score:  $\hat{r}_{u,i} = p_{u,i} q_{u,i}$, where $p_{u,i}$ is the predicted rating for user $u$ and item $i$ and $q_{u,i}$ is the confidence score.  These form the basis of the collaborative filtering matrix in the stacked agents model.  Stacking is the process of using the predictions of one machine learning method as the training input for another.  The combined observed data (user ratings) and machine-predicted data are used to train each user’s stacked agent.
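Putting the pieces together, with made-up numbers standing in for the two agents' outputs: the stacked matrix keeps the observed ratings where they exist and uses $p_{u,i} q_{u,i}$ everywhere else.

```python
import numpy as np

# Observed ratings (0 = missing), plus predicted ratings p and
# confidence scores q for every cell, as produced by the two agents.
ratings = np.array([
    [5.0, 0.0, 3.0],
    [0.0, 4.0, 0.0],
])
p = np.array([             # predicted rating for each (user, item)
    [4.8, 2.1, 3.2],
    [3.0, 3.9, 4.5],
])
q = np.array([             # confidence the user would rate the item
    [0.9, 0.3, 0.8],
    [0.5, 0.7, 0.9],
])

# r_hat_{u,i} = p_{u,i} * q_{u,i} for missing cells only;
# observed ratings are kept as-is.
filled = np.where(ratings > 0, ratings, p * q)
```

The resulting full matrix is what each user's stacked agent trains on.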

### Conclusions

While the results weren’t state of the art overall, we did manage to show that this method can improve plain content-based recommendations.  I see two main paths for future research on this method.  First, the confidence model was very weak and needs to be improved.  Second, we are combining collaborative filtering with content-based recommenders to form a hybrid recommender system; in theory, this should overcome some of the cold start problems, but further experiments are needed to verify that.

### Reference

Jason M. Adams, Paul N. Bennett, Anthony Tomasic. Combining Personalized Agents to Improve Content-Based Recommendations. Technical Report CMU-LTI-07-015, Carnegie Mellon University, December 2007. [pdf]

1. Ray Uzwyshyn says:

I found your research intriguing here, but simply by reading about it, it is hard to envision without more practical implementation examples. Do implementations exist online yet? I should take some more time with the paper here, but would rather see a practical implementation. I also scanned through the ppt but would have liked to see a working implementation. Perhaps that’s the Ph.D.!

I’ve been intrigued with Netflix’s recommendation system lately, as we’ve been watching some of these videos online and the system is ‘recommending’ choices – not overly well, I might add: lots of room for improvement. Earlier I had read about the Netflix Prize contest through a Wired article http://www.wired.com/techbiz/media/magazine/16-03/mf_netflix – more focused on a psychologist applying this by synthesizing behavioral economics ‘decision making’ heuristics (Tversky, Kahneman). This seems like a worthy line, with these ‘heuristic’ decision methods to be quantified through algorithms. I also think there’s a lot of room to move with these recommender systems by combining Luis von Ahn’s ‘games with a purpose’ ideas with more old-guard computer science recommender system challenges – say, slyly reconfiguring these by appropriating the Netflix platform to build these databases, as von Ahn is currently achieving with the Google Image Labeler.

A friend of mine sent me his recent Communications of the ACM (Volume 51, Number 8 (2008), Pages 57-67) games-with-a-purpose structural article, which isn’t overly great (I would have liked more of a step-by-step algorithmic recipe rather than justificatory methodology – I suppose this is more of a hard sell with the doyenne of the ACM crowd, but perhaps this will arrive later with the book!). As you were on one of his design teams, why not implement some of these methodologies as an online ‘film recommendation’ game system that also takes advantage of other users? Apparently the Netflix database is open through the prize. (On another note, there seems to be a whole level of paratextual data – other users blogging on movies they ‘liked’, mentioning others – that could also be incorporated into this level of the database, and this data doesn’t seem to be ‘in’ the recommender system.)

Here’s what I presently don’t like about the online Netflix recommendation system (suggestions for future fixes and improvements):

1) It doesn’t take into account that there are different users in my house making use of the subscription: (a) baby daughter, (b) wife, (c) myself. I like science fiction and documentary, my wife likes romantic comedy, and my baby likes Scooby Doo and Word World – this diversity really throws a wrench into the recommendations, and the system is not intelligent enough to see the ‘large probabilistic’ differences between ‘genres’.

2) The star rating system seems too facile: the challenge is making this more defined while not alienating the user.

3) The film and genre taxonomies seem too rigid and top-down – perhaps this taxonomy could be organically generated and built somehow as ‘part’ of the recommendation system, more of a ‘folksonomy’ than a taxonomy: this could be part of the game.

4) There is no hierarchical distinction in the current ratings system between movies watched through the mail and movies watched instantly – we are more likely to watch the instant ones, as we watch everything through the computer and can’t wait for the mail; we limit our mail choices to those we really want. This should be taken into account in the system weightings.

5) There is a book, I believe called “Distinction,” about discriminating taste by Pierre Bourdieu, the French poststructuralist – some of this ‘class’-related information could usefully be worked into algorithms and quantified, similar to the behavioral quantification that the Wired psychologist is doing.

This is also related to, and combines, a couple of your other recent ‘jog search’ posts, but I also believe perhaps Netflix would be a good place to work on these kinds of problems, as this is a great challenge and I do believe their ‘watch movies instantly’ service will soon take over any ‘physical’ mail-based media transfer. You also mentioned concept maps in a recent entry, and I believe this place http://www.ihmc.us/openings.php has a history of working in some areas involving computational linguistics – it’s located in the city where I currently am, and they always have an interesting lecture series. We also brought in their “concept maps” tools people http://cmap.ihmc.us/ for a couple of lectures here at the university for our digital learning and technology series http://www.uwf.edu/ruzwyshyn/2007Workshops/DigitalForumSeries.html. Concept maps have been around for a while, and while I would have liked to use them for our ‘subject guides’ here, this hasn’t overly taken root with the appropriate sub-departments – there does seem to be room for synergies between, say, recommender systems/games/concept maps, and taking this to the next level.

Finally, though we don’t have the money to hire someone like you currently, I’m intrigued by what a computational linguist would do with some of our university library catalog tracking data http://library.uwf.edu/uwf_2007_08_report.htm, which currently has the ability to track every keystroke into our system in interesting ways. We started on a statistical project regarding this here http://library.uwf.edu/endecastatistics.htm, but this really does need a higher level of, say, applied mathematics with computational linguistics to think more synthetically about what the system is producing. Perhaps a meta-recommender system could be built connecting academics in disparate parts of the world working on the same topics ‘from different angles’ at the same moment, bringing them together by the knowledge they are searching for and the research they are doing (kind of like Netflix for academic research rather than movies). At any rate, if you ever want to use any of this data, feel free – but please combine it with von Ahn’s game ideas. In his article, he does stress ‘the enjoyability’ factor, and I do think he is onto something here, to say the least.