Posts Tagged ‘sentiment analysis’

Twitrratr

Posted: 27 October 2008 in Uncategorized

Twitrratr is a new service that attempts to do sentiment analysis on Twitter (follow me while you’re at it).  According to their about page, they started off by tracking opinions on Obama but have since expanded to any term.  Enter a keyword and it searches Twitter for occurrences.  It then assigns a sentiment to each post and returns the percentages of positive, neutral, and negative tweets for that word.  You can also track your own sentiment by searching for @your-username.  I come up neutral, but there’s not a lot of data to go on there.

Their method appears to be fairly simple.  They have a collection of adjectives with sentiment values (positive or negative), and they classify each tweet based on which of those adjectives appear in it.  Of course, this approach probably has low recall (meaning it misses a lot of tweets that do express sentiment), since sentiment can be expressed without using adjectives.  I’m not sure whether it tries to do anything with negation, but from my scans of the results so far, it looks like it ignores it.
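As a rough illustration of the kind of lexicon lookup described above, here is a minimal Python sketch. The word lists are hypothetical stand-ins (Twitrratr's actual lexicon is not known), and the rule that the sign of (positive hits minus negative hits) decides the label is an assumption. Like the behavior observed above, it makes no attempt to handle negation.

```python
import re

# Hypothetical toy lexicon of adjectives; Twitrratr's real word lists are not known.
POSITIVE = {"good", "great", "awesome", "happy", "excellent"}
NEGATIVE = {"bad", "awful", "boring", "sad", "terrible"}

def classify(tweet):
    """Label a tweet by counting lexicon hits. Negation is not handled at all."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    # Assumed scoring rule: positive hits minus negative hits.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def sentiment_percentages(tweets):
    """Percentage of tweets labelled positive, neutral, and negative."""
    labels = [classify(t) for t in tweets]
    n = len(labels) or 1  # avoid dividing by zero on an empty search
    return {label: 100.0 * labels.count(label) / n
            for label in ("positive", "neutral", "negative")}

# Negation is ignored, so "not a good speech" is labelled positive.
print(classify("not a good speech"))
print(sentiment_percentages(["great debate", "awful debate", "watching the debate"]))
```

Tweets with no lexicon hits fall into "neutral", which is one way low recall shows up: a tweet expressing sentiment without any listed adjective is simply counted as neutral.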

So even though it’s pretty ghetto, it’s a nice toy.  If they care to extend the algorithm, they have some pretty cool data to work with.  I think it would be cool to get some (possibly donated, probably not paid) human effort together to tag some of their data to release as a research dataset.

In my previous post about sentiment polarity, I talked about results from Pang et al (2002).  One of the conclusions in that paper was that using the presence of words (unigrams) as features led to better classification results than using their frequency.  In my experiment in that post, I used tf-idf, a frequency-based measure.  A few days ago, when I woke up way too early, I ran some additional experiments using presence (binary) weights.  The result was a slight improvement over tf-idf:  86.1% versus 85.7%.  When I ignored document frequency and used just term frequency, the results were terrible:  about 76%.  So presence is much better than raw term frequency, but not much better than tf-idf.

Or is it?  Even more experiments with tf-idf produced an accuracy of 86.8%.  Just so we’re clear, all of this is based on 10-fold cross-validation using the Pang and Lee (2004) data set.  This seems to contradict their results.  Of course, I wasn’t able to reproduce their results identically, even though I am using the folds exactly as they described.  This may be due to a pre-processing step I am skipping (or adding).  They mention length-normalizing the vectors, which I don’t usually bother with.  It’s an oft-suggested step with SVMs, but I have yet to see it actually help me.

So I tried normalizing.  It hurt the tf-idf results, dropping accuracy to 86.6%.  It made no difference for presence, which stayed at 86.1%.  No surprises there.
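For concreteness, the weighting schemes compared above (presence, raw term frequency, tf-idf) plus length normalization can be sketched in a few lines of Python. This is a minimal illustration, not the Ruby code behind the numbers above: the function names are hypothetical, log(N / df) is assumed as the idf variant (it is one common choice, and the post does not say which was used), and "length-normalizing" is assumed to mean scaling each vector to unit Euclidean (L2) length.

```python
import math
from collections import Counter

def presence(tokens):
    """Binary weights: 1.0 for every distinct term in the review."""
    return {t: 1.0 for t in set(tokens)}

def term_frequency(tokens):
    """Raw count of each term in the review."""
    return {t: float(c) for t, c in Counter(tokens).items()}

def document_frequencies(docs):
    """Number of reviews each term appears in (counted once per review)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def tf_idf(tokens, df, n_docs):
    """Term count times log(N / df); assumes every term in tokens is in df."""
    return {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}

def l2_normalize(vec):
    """Scale a sparse vector to unit Euclidean length (zero vectors unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {t: w / norm for t, w in vec.items()}

docs = [["good", "good", "plot"], ["bad", "plot"]]
df = document_frequencies(docs)
# "plot" occurs in every review, so its idf is log(1) = 0 and its weight is 0.0.
print(tf_idf(docs[0], df, len(docs)))
print(l2_normalize(presence(docs[0])))
```

One property worth noting: L2-normalizing a presence vector gives every term in a review the same weight, 1/sqrt(k) for a review with k distinct terms, so normalization only rescales longer reviews relative to shorter ones.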

My results contradict Pang et al (2002) in that tf-idf (frequency-based) outperforms presence.  If I made a mistake, where was it?  I wish their source code were available.  I guess I could always ask.  There is usually some voodoo involved that isn’t obvious (to me) in the paper.  This is a-whole-nother topic, one discussed with far more eloquence (pdf warning) by Ted Pedersen in the latest issue of Computational Linguistics.

References

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. “Thumbs Up? Sentiment Classification Using Machine Learning Techniques.” In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, July 2002.

Bo Pang and Lillian Lee. “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts.” In Proceedings of the ACL, 2004.

I’ve begun learning Ruby for my new job, a language that doesn’t seem to have gained much traction in the NLP community (at least not that I’ve heard).  I had been using Python for my NLP stuff (homework and projects) and Java for my recommender system stuff.  In retrospect, I could have used Python for the recommender stuff, but I wasn’t aware of some speed-ups, so I resorted to Java.  Of course, the recommender stuff isn’t strictly NLP.  Ruby seems just as well suited as Python, and a lot better than Java, for many tasks (though Java certainly has its place).  At the very least, a scripting language like Ruby or Python is great for prototyping:  it’s easy to test new ideas quickly.

I was reading through Pang et al (2002), which deals with classifying movie reviews as positive or negative.  They look at three machine learning approaches:  Naive Bayes, Maximum Entropy classification, and Support Vector Machines (SVMs).  This seemed like a good opportunity to try out my nascent Ruby skills, since it’s the kind of crap I can roll together in Python in short order (and do all the time).  So I downloaded the data for the paper (actually, I downloaded the later data set from the 2004 paper).  There are 1000 positive and 1000 negative movie reviews.  The task is to train a classifier to determine whether a review expresses a positive opinion (the author liked the movie) or a negative opinion (the author did not like the movie).  I chose to use only SVMs, since they do best on this task according to the paper, they do really well for text categorization in general, and implementations are easy to download and use.

The results were quite nice.  Ruby turned out to be just as handy as Python for manipulating text and handling cross-validation, the two main “challenges” in implementing this paper.  I used tf-idf to weight the features and thresholded document frequency to discard words that didn’t appear in at least three reviews.  The result was about 85.7% accuracy using the same cross-validation setup described in their follow-up work (Pang and Lee, 2004).  In other words, the classifier correctly guessed whether a review was positive or negative nearly 86% of the time.
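The two fiddly bits mentioned above, the document-frequency cutoff and the cross-validation loop, might look like this as a minimal Python sketch. The function names are hypothetical, and the fold assignment is passed in as a list of fold ids rather than hard-coded, since it should come from whatever split Pang and Lee (2004) describe; the SVM training and evaluation step is left out.

```python
from collections import Counter

def build_vocabulary(docs, min_df=3):
    """Keep only terms that appear in at least min_df documents."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t for t, c in df.items() if c >= min_df}

def cross_validation_splits(fold_ids):
    """Yield (fold, train_indices, test_indices), holding out one fold at a time."""
    for fold in sorted(set(fold_ids)):
        test = [i for i, f in enumerate(fold_ids) if f == fold]
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        yield fold, train, test

docs = [["a", "b"], ["a", "a"], ["a", "c"], ["b"]]
print(build_vocabulary(docs, min_df=3))
for fold, train, test in cross_validation_splits([0, 1, 0, 1]):
    print(fold, train, test)
```

One design choice worth making explicit: building the vocabulary from the training reviews of each fold only (rather than from all 2000 reviews) keeps the held-out reviews from influencing feature selection. Whether the experiments above did so isn't stated.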

Pang et al (2002) discussed some of their errors and hypothesized that discourse analysis might improve results, since reviewers often use sarcasm.  There’s also the case where authors use a “thwarted expectations” narrative.  This offered me one of the few chuckles I’ve ever had while reading a research paper:

“I hate the Spice Girls. … [3 things the author hates about them] …  Why I saw this movie is a really, really, really long story, but I did and one would think I’d despise every minute of it.  But… Okay, I’m really ashamed of it, but I enjoyed it.  I mean, I admit it’s a really awful movie …the ninth floor of hell… The plot is such a mess that it’s terrible.  But I loved it.”

References

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. “Thumbs Up? Sentiment Classification Using Machine Learning Techniques.” In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, July 2002.

Bo Pang and Lillian Lee. “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts.” In Proceedings of the ACL, 2004.