Archive for July, 2009

GitHub just announced their own version of the Netflix Prize.  Instead of predicting movie ratings, GitHub wants you to suggest repositories for users to watch.  This is different from the Netflix Prize in a number of ways: a user watching a repo is similar to a user visiting a page from a search engine – [...]

A while back I ported David Blei’s lda-c code for performing Latent Dirichlet Allocation to Ruby.  Basically I just wrapped the C methods in a Ruby class, turned it into a gem, and called it a day.  The result was a bit ugly and unwieldy, like most research code.  A few months later, Todd Fisher [...]

A Twitter friend (@communicating) tipped me off to the UEA-Lite Stemmer by Marie-Claire Jenkins and Dan J. Smith.  Stemmers are NLP tools that strip inflectional and derivational affixes from words.  In English, that usually means getting rid of the plural -s, progressive -ing, and preterite -ed.  Depending on the type of stemmer, that [...]
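
To make the idea of suffix stripping concrete, here is a toy sketch in Python.  It is not the UEA-Lite algorithm (which is not reproduced here); it only strips the three suffixes mentioned above, and the function name `naive_stem` and the minimum stem length of three characters are assumptions made up for illustration.

```python
def naive_stem(word):
    """Strip one common English inflectional suffix (toy illustration only).

    This is NOT UEA-Lite or any published stemmer; the suffix list and the
    minimum stem length are assumptions chosen for the example.
    """
    lowered = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of stem so that short words
        # such as "is" or "red" are left untouched.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


for w in ["watching", "ported", "ratings", "red"]:
    print(w, "->", naive_stem(w))
```

A rule list this crude over-stems plenty of words ("class" becomes "clas"), which is one reason real stemmers carry many more rules and exceptions.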

Mendicant Bug Podcast

Posted: 12 July 2009 in Uncategorized

Thanks to Odiogo.com (via @johndcook), this blog now has a podcast powered by speech synthesis.  Not having heard any decent speech synthesis for open domain text (maybe I’m behind the times here), I was pretty impressed with it.  John had a post with a quote from The Agony and the Ecstasy and Odiogo got it [...]

Learning Scala

Posted: 11 July 2009 in Uncategorized

Two weeks ago, I picked up my copy of Programming in Scala, which had been languishing on my shelf for months.  I pre-purchased it since I went to high school with one of the authors (Lex Spoon).  His mother, incidentally, was also my favorite math teacher.  When I started my new job back in September [...]

There is no longer any reason to bother researching new ways of predicting the ratings users will give to movies.  It’s time to move on to more interesting things.  But seriously, given the fact that the last few miles of the Netflix competition were hard-fought by combining hundreds of different algorithms, is there much value [...]

It looks like some of the top players in the Netflix Prize competition have teamed up and finally broken the 10% improvement barrier.  I know I’m a few days late on this, though not because I didn’t see when it happened.  I’ve been battling an ear infection all week and it has [...]