Posts Tagged ‘machine learning’
Updates to lda-ruby gem
Posted: 30 July 2009 in Uncategorized
Tags: c, computational linguistics, latent dirichlet allocation, lda, machine learning, nlp, ruby, rubygems, topic modeling
A while back I ported David Blei’s lda-c code for performing Latent Dirichlet Allocation to Ruby. Basically I just wrapped the C methods in a Ruby class, turned it into a gem, and called it a day. The result was a bit ugly and unwieldy, like most research code. A few months later, Todd Fisher [...]
Netflix Prize just about wrapped up
Posted: 2 July 2009 in Uncategorized
Tags: clerk dogs, cmu, collaborative filtering, discovery engines, graduate school, hcir, human computer information retrieval, machine learning, movies, netflix, netflix prize, recommender systems, research
It looks like some of the top players in the Netflix Prize competition have teamed up and finally broken the 10% improvement barrier. I know I’m a few days late on this, though not because I didn’t see when it happened. I’ve been battling an ear infection all week and it has [...]
LDA in Ruby
Posted: 17 November 2008 in Uncategorized
Tags: c, code, computational linguistics, git, github, latent dirichlet allocation, machine learning, programming, ruby, ruby gems, rubyforge, topic modeling
Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions. I came across David Blei’s C code for Latent Dirichlet Allocation and it looked simple enough to convert into a Ruby module. Ruby makes it very easy to wrap some C functions (which is good [...]
Latent Dirichlet Allocation
Posted: 16 November 2008 in Uncategorized
Tags: computational linguistics, computer science, em algorithm, latent dirichlet allocation, machine learning, statistics, topic modeling, topics, unsupervised learning
Latent Dirichlet Allocation (LDA) is an unsupervised method of finding topics in a collection of documents. It posits a set of possible topics from which a subset is selected for each document. This selected mixture of topics represents the topics discussed in the document, and each word in the document is generated by this mixture. [...]
Is presence really better than frequency?
Posted: 30 September 2008 in Uncategorized
Tags: computational linguistics, machine learning, opinion mining, reproducibility, sentiment analysis, svms
In my previous post about sentiment polarity, I talked about results from Pang et al. (2002). One of the conclusions in that paper was that the presence of sentiment words led to better classification results than the frequency of words. In my experiment in that post, I used tf-idf, a frequency-based measure. I ran some [...]
Sentiment Polarity
Posted: 16 September 2008 in Uncategorized
Tags: classification, computational linguistics, java, machine learning, opinion mining, python, ruby, sentiment analysis, support vector machines, svms
I’ve begun learning Ruby for my new job, a language that doesn’t seem to have really gotten any traction in the NLP community (at least not that I’ve heard). I had been using Python for my NLP stuff (homework and projects) and Java for my recommender system stuff. In retrospect, I could have used Python [...]
Opinion Mining
Posted: 14 August 2008 in Uncategorized
Tags: computational linguistics, data mining, machine learning, opinion mining, opinions, sentiment
Digging through customer review information appears to be a hot topic these days. There are a multitude of tasks that fall under the umbrella of opinion mining, a few of which are: Feature identification – identifying features belonging to products in unstructured data; Opinion word identification – identifying which words actually indicate a statement of opinion [...]
Stacked Agents Model
Posted: 3 July 2008 in Uncategorized
Tags: cmu, collaborative filtering, computational linguistics, information retrieval, machine learning, presentations, recommender systems, research
This is research I did a while ago and presented Monday to fulfill the requirements of my Master’s degree. The presentation only needed to be about 20 minutes, so it was a very short intro. We have moved on since then, so when I say future work, I really mean future work. The post is [...]
The limits of collaborative filtering?
Posted: 25 June 2008 in Uncategorized
Tags: attributes, collaborative filtering, logic, machine learning, netflix prize, proportional analogies, recommender systems, relations, similarity
Peter Turney posted recently on the logic of attributional and relational similarity. Attributes are features or characteristics of a single entity. Relations describe some connection between two entities, such as a comparison. We’ll denote a relation between two entities A and B as A:B. A relational similarity between two groups A, B and C, D will [...]


