A while back I ported David Blei’s lda-c code for performing Latent Dirichlet Allocation to Ruby. Basically I just wrapped the C methods in a Ruby class, turned it into a gem, and called it a day. The result was a bit ugly and unwieldy, like most research code. A few months later, Todd Fisher came along and discovered a couple of bugs and memory leaks in the C code, for which I am very grateful. I had been toying with the idea of improving the Ruby code, so I embarked on a mission to do so. The result is, I hope, a much cleaner gem that can be used right out of the box with little screwing around.
Unfortunately, I did something I’m ashamed of. Ruby gems are notorious for breaking backwards compatibility, and I have done just that. The good news is that your existing code should mostly still work, assuming you didn’t dive too heavily into the Document and Corpus classes. If you did, you will probably experience a lot of breakage. I hope the result is a more sensible implementation, though, so maybe you won’t hate me. Of course, I could be wrong and my implementation could still be crap. If that’s the case, please let me know what needs to be improved.
To install the gem:
gem sources -a http://gems.github.com
sudo gem install ealdent-lda-ruby