Updates to lda-ruby gem

Posted: 30 July 2009 in Uncategorized

A while back I ported David Blei’s lda-c code for performing Latent Dirichlet Allocation to Ruby. Basically I just wrapped the C methods in a Ruby class, turned it into a gem, and called it a day. The result was a bit ugly and unwieldy, like most research code. A few months later, Todd Fisher came along and discovered a couple of bugs and memory leaks in the C code, for which I am very grateful. I had been toying with the idea of improving the Ruby code, and embarked on a mission to do so. The result is, I hope, a much cleaner gem that can be used right out of the box with little screwing around.

Unfortunately, I did something I’m ashamed of. Ruby gems are notorious for breaking backwards compatibility, and I have done just that. The good news is that your code will mostly still work, assuming you didn’t dive into the Document and Corpus classes too heavily. If you did, then you will probably experience a lot of breakage. The result, I hope, is a more sensible implementation, so maybe you won’t hate me. Of course, I could be wrong and my implementation could still be crap. If that’s the case, please let me know what needs to be improved.

To install the gem:

gem sources -a http://gems.github.com
sudo gem install ealdent-lda-ruby
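
A quick way to confirm the install worked is to check that the library loads. This is only a minimal sketch: `gem_available?` is a helper name made up for illustration (it is not part of lda-ruby), and `lda-ruby` is assumed to be the name you require.

```ruby
require 'rubygems' # only needed on Ruby 1.8; harmless on later versions

# Hypothetical helper (not part of lda-ruby): returns true if the
# named library can be required, false if Ruby raises LoadError.
def gem_available?(name)
  require name
  true
rescue LoadError
  false
end

if gem_available?('lda-ruby') # assumed require name
  puts 'lda-ruby loaded'
else
  puts 'lda-ruby not found; check the gem source and install steps above'
end
```

If it reports that the library was not found, the usual suspects are a missing gem source or an install into a different Ruby than the one running the script.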

Enjoy!


Comments
  1. Plotti says:

    Hi Jason,

    I’ve just installed your gem and played around with it:

    require 'rubygems'
    require 'lda-ruby'

    corpus = Lda::TextCorpus.new("wiki.yml") # your wiki.yml sample from github
    lda = Lda::Lda.new(corpus)
    lda.em("random")
    lda.print_topics(20)

    I am still getting mem-errors:
    **** em iteration 18 ****
    document 0
    alpha maximization : -4287345.78872 -43327.81103
    alpha maximization : -1548702.32296 -43323.21111
    alpha maximization : -541382.27277 -43310.57142
    alpha maximization : -170984.67685 -43275.21738
    alpha maximization : -34908.82190 -43172.08169
    alpha maximization : 14938.21277 -42849.82881
    alpha maximization : 33017.35118 -41815.29136
    alpha maximization : 39371.99323 -38767.30725
    alpha maximization : 41421.31191 -31264.63627
    alpha maximization : 41950.37711 -17795.42690
    alpha maximization : 42029.01551 -4632.65941
    alpha maximization : 42032.30393 -256.10453
    alpha maximization : 42032.31270 -0.73255
    alpha maximization : 42032.31270 -0.00001
    new alpha = 0.01187
    *** glibc detected *** ruby: double free or corruption (!prev): 0x085f9938 ***
    ======= Backtrace: =========
    /lib/i686/cmov/libc.so.6[0xb7e2b764]
    /lib/i686/cmov/libc.so.6(cfree+0x96)[0xb7e2d966]
    /usr/local/lib/ruby/gems/1.8/gems/ealdent-lda-ruby-0.3.1/lib/lda-ruby/lda.so(free_lda_suffstats+0x4a)[0xb793b55f]
    /usr/local/lib/ruby/gems/1.8/gems/ealdent-lda-ruby-0.3.1/lib/lda-ruby/lda.so(run_quiet_em+0x60b)[0xb793dfe1]
    /usr/local/lib/ruby/gems/1.8/gems/ealdent-lda-ruby-0.3.1/lib/lda-ruby/lda.so[0xb793e4c8]
    ruby[0x805fbd2]
    ruby[0x8060891]
    ruby[0x805e024]
    ruby[0x806b24e]
    ruby(ruby_exec+0x16)[0x806b286]
    ruby(ruby_run+0x21)[0x806b2b1]
    ruby[0x805363f]
    /lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7dd3455]
    ruby[0x8053571]
    ======= Memory map: ========
    08048000-080fe000 r-xp 00000000 08:03 16942342 /usr/local/bin/ruby
    080fe000-080ff000 rw-p 000b6000 08:03 16942342 /usr/local/bin/ruby
    080ff000-08862000 rw-p 080ff000 00:00 0 [heap]
    b7800000-b7821000 rw-p b7800000 00:00 0
    b7821000-b7900000 ---p b7821000 00:00 0
    b791f000-b792b000 r-xp 00000000 08:03 5729301 /lib/libgcc_s.so.1
    b792b000-b792c000 rw-p 0000b000 08:03 5729301 /lib/libgcc_s.so.1
    b7939000-b7941000 r-xp 00000000 08:03 17928557 /usr/local/lib/ruby/gems/1.8/gems/ealdent-lda-ruby-0.3.1/lib/lda-ruby/lda.so
    b7941000-b7942000 rw-p 00007000 08:03 17928557 /usr/local/lib/ruby/gems/1.8/gems/ealdent-lda-ruby-0.3.1/lib/lda-ruby/lda.so
    b7942000-b7cff000 rw-p b7942000 00:00 0
    b7cff000-b7d18000 r-xp 00000000 08:03 16977996 /usr/local/lib/ruby/1.8/i686-linux/syck.so
    b7d18000-b7d19000 rw-p 00019000 08:03 16977996 /usr/local/lib/ruby/1.8/i686-linux/syck.so
    b7d19000-b7da4000 rw-p b7d19000 00:00 0
    b7da4000-b7db9000 r-xp 00000000 08:03 5755525 /lib/i686/cmov/libpthread-2.7.so
    b7db9000-b7dbb000 rw-p 00014000 08:03 5755525 /lib/i686/cmov/libpthread-2.7.so
    b7dbb000-b7dbd000 rw-p b7dbb000 00:00 0
    b7dbd000-b7f12000 r-xp 00000000 08:03 5755533 /lib/i686/cmov/libc-2.7.so
    b7f12000-b7f13000 r--p 00155000 08:03 5755533 /lib/i686/cmov/libc-2.7.so
    b7f13000-b7f15000 rw-p 00156000 08:03 5755533 /lib/i686/cmov/libc-2.7.so
    b7f15000-b7f18000 rw-p b7f15000 00:00 0
    b7f18000-b7f3c000 r-xp 00000000 08:03 5755523 /lib/i686/cmov/libm-2.7.so
    b7f3c000-b7f3e000 rw-p 00023000 08:03 5755523 /lib/i686/cmov/libm-2.7.so
    b7f3e000-b7f47000 r-xp 00000000 08:03 5755524 /lib/i686/cmov/libcrypt-2.7.so
    b7f47000-b7f49000 rw-p 00008000 08:03 5755524 /lib/i686/cmov/libcrypt-2.7.so
    b7f49000-b7f71000 rw-p b7f49000 00:00 0
    b7f71000-b7f73000 r-xp 00000000 08:03 5755527 /lib/i686/cmov/libdl-2.7.so
    b7f73000-b7f75000 rw-p 00001000 08:03 5755527 /lib/i686/cmov/libdl-2.7.so
    b7f75000-b7f7c000 r-xp 00000000 08:03 5755538 /lib/i686/cmov/librt-2.7.so
    b7f7c000-b7f7e000 rw-p 00006000 08:03 5755538 /lib/i686/cmov/librt-2.7.so
    b7f7e000-b7f7f000 rw-p b7f7e000 00:00 0
    b7f7f000-b7f83000 r-xp 00000000 08:03 16977998 /usr/local/lib/ruby/1.8/i686-linux/stringio.so
    b7f83000-b7f84000 rw-p 00003000 08:03 16977998 /usr/local/lib/ruby/1.8/i686-linux/stringio.so
    b7f84000-b7f86000 r-xp 00000000 08:03 16978006 /usr/local/lib/ruby/1.8/i686-linux/etc.so
    b7f86000-b7f87000 rw-p 00001000 08:03 16978006 /usr/local/lib/ruby/1.8/i686-linux/etc.so
    b7f87000-b7f8a000 r-xp 00000000 08:03 16978002 /usr/local/lib/ruby/1.8/i686-linux/thread.so
    b7f8a000-b7f8b000 rw-p 00002000 08:03 16978002 /usr/local/lib/ruby/1.8/i686-linux/thread.so
    b7f8b000-b7f8d000 rw-p b7f8b000 00:00 0
    b7f8d000-b7f8e000 r-xp b7f8d000 00:00 0 [vdso]
    b7f8e000-b7fa8000 r-xp 00000000 08:03 5732794 /lib/ld-2.7.so
    b7fa8000-b7faa000 rw-p 0001a000 08:03 5732794 /lib/ld-2.7.so
    bfbd1000-bfbe6000 rw-p bfbd1000 00:00 0 [stack]

    • Jason Adams says:

      If you could post the issue on GitHub with the code, I’d appreciate it. That way I can keep track of it better when I have a chance to look at it. When you do, please also post your OS info. Thanks.
