
While watching the 2000 version of Henry James’ The Golden Bowl, I heard the once-common phrase “The deuce only knows…”  I’m always looking for vintage profanity, and this appealed to me strongly.  I’ve heard it hundreds or thousands of times before, of course, but here it was brought to the fore of my attention.  After some brief research, I found ties to 16th Century Northern German, Family Guy, and playing dice.  The word deuce seems most strongly tied in meaning to “the devil,” and is used interchangeably in old-fashioned profanity (cf. What the devil and What the deuce).

There are attested uses of the phrase “Was der Daus!” in German from the 16th century, which has my money for being the real origin of the phrase.  Daus meant “devil,” though the modern German is “Teufel.”  Deuce also means “two” and comes from the French deux.  Supposedly, the combination of the German phrase and the playing of dice led to the phrase entering English usage.  Rolling two (the Devil’s eyes) inspired the curse, since that was the lowest score and therefore a loss.  I’m not sold on this particular coincidence.  It seems too much like folk etymology of the sort you hear in email forwards.  Lastly, while I enjoy Family Guy enormously when I see it, I very seldom get the opportunity to watch an episode, so the tie to Stewie was lost on me until Google unearthed it.

And when OpenEphyra is given the question What is the origin of the word deuce? the answer is “Watkins.”  It offers as evidence this page.  That page poses the question What does the word deuce mean? but the answer has nothing to do with my information need.  Also, the word Watkins never even appears on that page, so I have no idea where it came from.

OpenEphyra is a question answering (QA) system developed here at the Language Technologies Institute by Nico Schlaefer. He began his work at the University of Karlsruhe in Germany, but has since continued it at CMU and is currently a PhD student here. Since it is a home-grown language technologies package, I decided to check it out and play around. This is the first QA system I have used that wasn’t integrated in a search engine, so this isn’t exactly an expert review.

Getting started in Windows (or Linux or whatever) is pretty easy if you already have Apache Ant and Java installed. Ant isn’t strictly necessary, but I recommend getting it if you don’t have it already. It’s just handy. First, download the OpenEphyra package from SourceForge. The download is about 59 MB; once it’s done, unpack it in whatever directory you want. Assuming you have Ant installed, all you have to do is type ant to build it, though you may want to issue ant clean first. I had to make one slight change to the build.xml file to get it to run, on line 55: <jvmarg line="-server&#13;-Xms512m&#13;-Xmx1024m"/> had to be changed to <jvmarg line="-server -Xms512m -Xmx1024m"/>. In other words, the carriage-return character references (&#13;) between the JVM arguments are replaced with plain spaces. Easy enough. Then to run it, all you have to do is type ant OpenEphyra.

After taking a short while to load, it lets you enter questions on the command line. Based on what I can tell from the output, it begins by normalizing the question (reducing words to their base forms, getting rid of punctuation). Then it determines the type of answer it is looking for, like a person’s name or a place, and assigns certain properties to what it expects to find. Next it automatically creates a list of queries to send to the search engine(s). The documentation indicates that the AQUAINT, AQUAINT-2 and BLOG06 corpora are included (at least preprocessing is supported), but there are also searchers for Google, Wikipedia, Yahoo and several others. Indri is a search engine which supports structured queries, and from what I saw playing around, OpenEphyra auto-generates some structured queries for it. Once the queries are generated, they are sent to the various searchers, and the results that come back are scored. Finally, if you’re lucky, you get an answer to your question.
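To make those stages a bit more concrete, here is a toy sketch of the same flow in Java. To be clear, this is not OpenEphyra’s code or API: the class QaPipelineSketch, the Searcher interface, the stopword list and the canned search results are all made up for illustration, and the “scoring” is nothing more than counting how often each candidate comes back. It also only guesses the answer type for display; it does not use it to filter candidates the way a real system would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy illustration only; every name here is hypothetical, not OpenEphyra's API.
public class QaPipelineSketch {

    /** Hypothetical searcher: maps a query string to candidate answer strings. */
    interface Searcher {
        List<String> search(String query);
    }

    // Assumed stopword list, just large enough for the demo.
    private static final Set<String> STOPWORDS =
            Set.of("who", "what", "where", "is", "the", "of", "a", "how", "many", "much");

    /** Stage 1: lowercase and strip punctuation (ASCII only, no real morphology). */
    static String normalize(String question) {
        return question.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Stage 2: guess the expected answer type from the leading question words. */
    static String answerType(String normalized) {
        if (normalized.startsWith("who ")) return "PERSON";
        if (normalized.startsWith("where ")) return "PLACE";
        if (normalized.startsWith("how many ") || normalized.startsWith("how much ")) return "NUMBER";
        return "UNKNOWN";
    }

    /** Stage 3: generate query variants: the full question plus a keyword-only query. */
    static List<String> generateQueries(String normalized) {
        List<String> queries = new ArrayList<>();
        queries.add(normalized);
        List<String> keywords = new ArrayList<>();
        for (String token : normalized.split(" ")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) keywords.add(token);
        }
        String keywordQuery = String.join(" ", keywords);
        if (!keywordQuery.isEmpty() && !keywordQuery.equals(normalized)) queries.add(keywordQuery);
        return queries;
    }

    /** Stages 4 and 5: send every query to every searcher, then pick the most frequent candidate. */
    static Optional<String> answer(String question, List<Searcher> searchers) {
        Map<String, Integer> votes = new HashMap<>();
        for (String query : generateQueries(normalize(question))) {
            for (Searcher searcher : searchers) {
                for (String candidate : searcher.search(query)) {
                    votes.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    /** A fake searcher with canned results, standing in for a web or corpus search. */
    static Searcher cannedSearcher() {
        Map<String, List<String>> canned = Map.of(
                "who invented the cotton gin", List.of("Eli Whitney"),
                "invented cotton gin", List.of("Eli Whitney", "Whitney"));
        return query -> canned.getOrDefault(query, List.of());
    }

    public static void main(String[] args) {
        String question = "Who invented the cotton gin?";
        System.out.println(answerType(normalize(question)));
        System.out.println(answer(question, List.of(cannedSearcher())).orElse("no answer"));
    }
}
```

The Optional return value mirrors the “if you’re lucky” part: when no searcher returns any candidate, there is simply no answer.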

Here are the results of screwing around with it for a few minutes:

  • Who created OpenEphyra?
    • no answer (sorry, Nico)
  • Who invented the cotton gin?
    • Eli Whitney
  • Who created man?
    • God
  • What is the capital of Mongolia?
    • Ulaanbaatar
  • Who invented the flux capacitor?
    • Doc Brown (awesome!)
  • Who is the author of the Mendicant Bug?
    • Zuckerberg — damn you, Facebook! :(
  • How much wood can a woodchuck chuck?
    • no answer (correct)
  • What is the atomic number of Curium?
    • 96 (also correct)
  • Who killed Lord Voldemort?
    • Harry (correct, but partial)
  • How many rings for elven kings?
    • 3021 (so, so very wrong)

Fun stuff! It’s not anywhere near perfect, but there are definite uses and the thing is ridiculously easy to install and use. Also, it’s in Java, so you can integrate it with your own system with very little effort. The quality of the results depends on the sort of question you ask. Factual questions about geography and people tend to do better than questions about numbers and fiction, as you might expect. Also, why-questions are not supported.

Another bonus is that the project is open source, so if you’re into QA, you can help develop it.