NLP Resources for Ruby

Posted: 13 September 2009 in Uncategorized

There are quite a few well-known libraries for doing various NLP tasks in Java and Python, such as the Stanford Parser (Java) and the Natural Language Toolkit (Python).  For Ruby, there are a few resources out there, but they are usually derivative or not as mature.  By derivative, I mean they are ports from other languages or extensions using code from another language.  And I’m responsible for two of them! :)

  • Treat – Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby
    • Text extractors for various document formats
    • Chunkers, segmenters, tokenizers
    • LDA
    • much more – the list is big
  • Ruby Linguistics – this is one of the more ambitious projects, but is not as mature as NLTK
    • interface for WordNet
    • Link grammar parser
    • some inflection stuff
  • Stanford Core NLP – if you’ve gotten a headache trying to use the Java bridge, this is your answer
  • Stanford Parser interface – uses a Java bridge to access the Stanford Parser library
  • Mark Watson has a part of speech tagger [zip], a text categorizer [zip], and some text extraction utilities [zip], but I haven’t tried to use them yet
  • LDA Ruby Gem – Ruby port of David Blei’s lda-c library by yours truly
    • Uses Blei’s C code for the actual LDA, but I include some wrappers to make using it a bit easier
  • UEA Stemmer – Ruby port (again by yours truly) of a conservative stemmer based on Jenkins and Smith’s UEA Stemmer
  • Stemmer gem – Porter stemmer
  • Lingua Stemmer – another stemming library, Porter stemmer
  • Ruby WordNet – basically what’s included in Ruby Linguistics
  • Raspell – Ruby interface to Aspell spell checker

There are also a number of fledgling or orphaned projects out there purporting to be ports or interfaces for various other libraries like the Stanford POS Tagger and Named Entity Recognizer.  Ruby (straight Ruby, not just JRuby) can interface with just about any Java library using the Ruby Java Bridge (RJB).  RJB can be a pain: I could only initialize it once per run (a second attempt never succeeds), so there are some limitations.  But using it, I was able to easily interface with the Stanford POS tagger.
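
Here is a minimal sketch of one way to live with the initialize-once limitation: put every jar on the classpath up front and guard the load so it only happens once per process. It assumes the rjb gem's Rjb.load(classpath, jvm_args) and Rjb.import calls. The JavaBridge module, the loader: keyword, and the jar name in the usage comment are hypothetical names made up for this illustration, not part of RJB.

```ruby
# Sketch only: start the JVM through RJB at most once per process.
# JavaBridge and the loader: keyword are illustrative, not part of RJB.
module JavaBridge
  @loaded = false

  class << self
    # jars:     every jar needed during the run, since there is no second chance
    # jvm_args: options passed to the JVM, e.g. ['-Xmx512m']
    # loader:   optional callable taking (classpath, jvm_args); defaults to Rjb.load
    # Returns true if this call started the JVM, false if it was already started.
    def load_once(jars, jvm_args = [], loader: nil)
      return false if @loaded

      loader ||= default_loader
      loader.call(jars.join(File::PATH_SEPARATOR), jvm_args)
      @loaded = true
    end

    def loaded?
      @loaded
    end

    private

    def default_loader
      require 'rjb' # third-party gem; needs a JDK and JAVA_HOME set
      ->(classpath, args) { Rjb.load(classpath, args) }
    end
  end
end

# Usage, assuming a (hypothetical) local jar path:
#   JavaBridge.load_once(['stanford-postagger.jar'], ['-Xmx512m'])
#   tagger_class = Rjb.import('edu.stanford.nlp.tagger.maxent.MaxentTagger')
```

The optional loader is only there so the guard can be exercised without a JVM installed. In normal use you would leave it out and let the default call Rjb.load.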

So while there aren’t terribly many libraries for NLP tasks in Ruby, the ability to interface with Java directly widens the scope quite a bit.  You can also incorporate a C library using extensions.

Naturally, if I missed anything, no matter how small, please let me know.

Update: Here is a great list of AI-related Ruby libraries from Dustin Smith.

Comments
  1. [...] NLP Resources for Ruby « The Mendicant Bug mendicantbug.com/2009/09/13/nlp-resources-for-ruby – view page – cached Posted by Jason Adams in computational linguistics, java, natural language processing, nlp, parsers, python, ruby, stemmers, wordnet. Leave a Comment — From the page [...]

  2. Thank you so much for compiling this list. I just discovered your blog and will spend a lot of time reading it in the next few months :)

    – Thibaut

  3. Александр says:

    The site is very high quality. You should be given an award for it, or simply an honorary medal. =)

  4. Ryan Stout says:

    Jason,
    Do any of these libs support breaking a word up into syllables?

    Thanks,
    Ryan

    • Jason Adams says:

      Not that I’m aware of, unfortunately. You might be able to derive them from the CMU pronouncing dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict), but it’s not going to be easy since English orthography has diverged so much from pronunciation. There might be other dictionary resources out there that already have this. If you find it I’d appreciate it if you could leave another comment with a link.

  5. Mark Essel says:

    Heyo, just came across your post and I’m very interested in running a local named entity recognizer/extractor. I have some code that relies on third-party APIs but it’s far too slow. I was hoping for a solid Ruby gem that handled NER fairly well (80%+ identification) but haven’t come across one yet.

    As a fallback I can use connector words and URLs as delimiters, as well as look for capitalized words to identify named entities.

  6. [...] about ruby specifically… eek.. I have to think on that one. There are some other resources on the web: http://mendicantbug.com/2009/09/… But this is very sparse. [...]

  7. [...] Jason Adams, who does “opinion mining for a startup in Atlanta”, has a list of NLP Resources for Ruby. [...]

  8. [...] Treat is an NLP Ruby library that I wanted to try out [...]
