I’ve begun learning Ruby for my new job. The language doesn’t seem to have gotten much traction in the NLP community (at least not that I’ve heard). I had been using Python for my NLP stuff (homework and projects) and Java for my recommender system stuff. In retrospect, I could have used Python for the recommender system too, but I wasn’t aware of some speed-ups at the time, so I resorted to Java. Of course, the recommender stuff isn’t strictly NLP. Ruby seems just as well suited to this kind of work as Python, and a lot better than Java for many tasks (though Java certainly has its place). At the very least, a scripting language like Ruby or Python is great for prototyping: it’s easy to test new ideas quickly.
I was reading through Pang et al. (2002), which deals with classifying movie reviews as positive or negative. They look at three machine learning approaches: Naive Bayes, maximum entropy classification, and support vector machines (SVMs). This seemed like a good opportunity to try out my nascent Ruby skills, since it’s the kind of crap I can roll together in Python in short order (and do all the time). So I downloaded the data for the paper (actually, the later data set from the 2004 paper), which contains 1000 positive and 1000 negative movie reviews. The task is to train a classifier to determine whether a review expresses a positive opinion (the author liked the movie) or a negative opinion (the author did not like the movie). I chose to use only SVMs because, according to the paper, they do best on this task. They also do really well at text categorization in general, and implementations are easy to download and use.
The results were quite nice. Ruby turned out to be just as handy as Python at manipulating text and dealing with cross-validation, the two main “challenges” in implementing this paper. I used tf-idf to weight the features and thresholded on document frequency, discarding words that didn’t appear in at least three reviews. I achieved about 85.7% accuracy using the same cross-validation setup described in their follow-up work (Pang and Lee, 2004). In other words, the classifier correctly guessed whether a review was positive or negative nearly 86% of the time.
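For the curious, here is a rough Ruby sketch of the feature pipeline described above: a document-frequency cutoff, tf-idf weighting, and splitting the reviews into folds. It is not my original script. The method names (`tokenize`, `document_frequencies`, `tfidf_vectors`, `folds`), the crude regex tokenizer, the raw-count times log(N/df) form of tf-idf, and the round-robin fold assignment are all assumptions made for illustration.

```ruby
# Illustrative sketch only: all names and choices here are assumptions,
# not taken from the papers or from any particular library.

# Crude tokenizer: lowercase, keep runs of letters and apostrophes.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Count how many documents each term appears in (document frequency).
def document_frequencies(tokenized_docs)
  df = Hash.new(0)
  tokenized_docs.each do |tokens|
    tokens.uniq.each { |term| df[term] += 1 }
  end
  df
end

# Build one sparse vector (term => weight) per document.
# Terms seen in fewer than min_df documents are dropped.
# Weight = raw term count * log(N / df), one common tf-idf variant.
def tfidf_vectors(tokenized_docs, min_df: 3)
  df = document_frequencies(tokenized_docs)
  n = tokenized_docs.size.to_f
  tokenized_docs.map do |tokens|
    counts = tokens.each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
    counts.each_with_object({}) do |(term, tf), vec|
      next if df[term] < min_df
      vec[term] = tf * Math.log(n / df[term])
    end
  end
end

# Assign document indices 0...n_docs to k folds, round-robin.
# Each fold serves once as the test set; the rest are training data.
def folds(n_docs, k)
  (0...n_docs).group_by { |i| i % k }.values
end

# Tiny made-up example (not the real review data).
reviews = ["good good movie", "good movie", "good film", "bad movie"]
vectors = tfidf_vectors(reviews.map { |r| tokenize(r) }, min_df: 3)
folds(reviews.size, 2).each do |test_ids|
  train_ids = (0...reviews.size).to_a - test_ids
  # An SVM would be trained on vectors at train_ids and scored on test_ids.
end
```

One simplification to be aware of: this computes document frequencies over all the reviews at once, including the ones that later land in the test fold. A stricter setup would compute them from the training folds only.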
Pang et al. (2002) discussed some of their errors and hypothesized that discourse analysis might improve results, since reviewers often use sarcasm. There’s also the case where authors use a “thwarted expectations” narrative. Their example of this gave me one of the few chuckles I’ve ever had while reading a research paper:
“I hate the Spice Girls. … [3 things the author hates about them] … Why I saw this movie is a really, really, really long story, but I did and one would think I’d despise every minute of it. But… Okay, I’m really ashamed of it, but I enjoyed it. I mean, I admit it’s a really awful movie …the ninth floor of hell… The plot is such a mess that it’s terrible. But I loved it.”
References
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. “Thumbs Up? Sentiment Classification Using Machine Learning Techniques.” In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, July 2002.
Bo Pang and Lillian Lee. “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts.” In Proceedings of the ACL, 2004.