You are currently browsing the monthly archive for February, 2008.

Rant warning.

Craig Venter is a geneticist who has been working on engineering new organisms and recently spoke at TED.  He made news (as every news story you see about him over the past couple days is happy to point out) in 2001 for sequencing his own genome.  His current project is in creating a single-celled organism that eats CO2 as fuel.  This notion of creating an artificial life form is very hot these days.  I’ve seen a number of estimates that say within 3-5 years we will have our first artificial life form.  Craig says 1-2 years.

He makes a claim that is fairly Earth-shattering:

“We have modest goals of replacing the whole petrochemical industry and becoming a major source of energy.  We think we will have fourth-generation fuels in about 18 months, with CO2 as the fuel stock.”

If he is right, this could mean the end of the peak oil problem.  So what about ethical concerns?  Like all researchers in this area, he takes the good-human worldview:

“Fortunately, there’s not that many people on this planet wanting to do harm with these tools. Very few biological agents that we work with … could be weaponized. But it is an important issue. Every new technology has the ability to be abused.” [source]

I am, admittedly, cynical.  I personally believe that if a technology can be weaponized, not only will it be, but the government is probably already funding it.  And also, let’s be honest, it only takes one person wanting to do harm with this technology to be successful for it to be a serious problem.  He also points out that only two countries had programs for creating designer viruses and those are supposedly discontinued (the US and the former Soviet Union).  Discontinued?  Riiight.

Venter also said he performed a large bioethical study involving many religious groups and no one found anything in their “law books” to prohibit the creation of artificial life forms.  I take a very dim view of so-called bioethicists and anyone referring to bioethics with authority.  They are often-times just so much smoke in the wind, and who are they to say something is ethical or not?  I deny their authority.  I have never heard the output of any bioethical unit (the drones calling themselves bioethicists) that has struck me as particularly useful or unbiased.  They are always reported in the media as “So-and-so, an expert in bioethics, says it’s ok to do X.”  Umm, no.  I think what I find lacking is the attention by bioethicists to the catastrophic cost of abuse.  If you think I’m over-reacting, I have two words for you:  Hiroshima and Nagasaki.  As cynical as I am, I still can’t believe humans have stockpiled as many nuclear weapons as we have.  It is madness.

I also won’t deny that this stuff is seriously cool.  One way or another, it will change the world.

Systran is one of the oldest companies around that provide machine translation software.  They power some language-pairs of Microsoft’s translation service, Altavista’s Babelfish, and quite a few others (including, until recently, Google).  In the past, their software has been rule-based, so translation is done with a bilingual dictionary and a set of rules for changing text from one language into another.  Based on a recent bevy of job postings on Linguist List, it appears they are going statistical.  Maybe they have been for a while, I don’t know, since I don’t actually follow what they do, but this piqued my interest.

If your interest is piqued too, the listings are for:

  1. Research Scientist in computational linguistics
  2. Program manager
  3. Software Engineer

And, of course, salary ranges are not provided.

Whenever I hear the word enormity used to describe how gi-freakin-normous something is, I always willfully misinterpret it to mean an act of extreme evil or extreme wickedness.  Now before you start screaming prescriptivist and throwing Kleenexes drenched in the snot of sociolinguistics at me — I’m not being a prescriptivist.  Of course people have the right to use enormity that way.  It is certainly the trend for that word and it probably will be within my generation that almost everyone forgets its original meaning.  I just so like the meaning of extreme wickedness that I want to be able to use it to mean that without being misinterpreted.  And a lot of people only know that word to mean gigantic.

So I was listening to a promo video (below) by Richard Branson of Virgin Galactic.  Branson opens up with this line:

“Astronauts of the past 45 years have all returned to Earth struggling to convey the enormity of what they have discovered and with their perceptions clearly changed.”

And quite frankly, the sinister music blends with my interpretation of enormity far better.  Astronauts have all returned overwhelmed by the vast wickedness they encountered in space.  Awesome!  I totally wanna go now.  Actually, I’ve always wanted to go and probably would go even if I was told I had a 50/50 chance of making it back alive, so enormity just ups the thrill level.

Go (围棋, 碁, 바둑) is one of my obsessions. I’ve been playing for a year, mostly as ealdent on Online Go Server (OGS) and am currently about 12.5 kyu, though I shift around a bit. At the moment, I’m in a bit of a downswing, mostly because stress and a lack of concentration are leading me to make foolish moves, plus I don’t have a lot of time to devote to analyzing what I’m doing wrong. One of the coolest things about Go to me is that it is accepted in the Go world that your health and mental state contribute to your ability. It makes sense: when you sit down to a game that requires hours of concentration, if your health isn’t good, you will be distracted.

Two snapback symmetries in a game of Go.

So in one of my games against a lower-strength player (about 7 kyu lower), I just noticed the emergence of a really cool symmetry. I have a double snapback (I am the white stones) set up right now. If he plays at E12, I can kill the three stones at F11, E11 and E12 by playing again at F12. If he kills my stone at G13 by playing at H13, I can kill those three stones. Two identical snapbacks back to back. Cool huh? Plus, if he plays at F14, he will put my stones at E16 and F16 in the exact same snapback position by playing again at E15. Go is a beautiful game.

I recorded what this would look like via my cell phone, so sorry for the crappy video. I need to look into some sort of desktop recording software.

Unfortunately, no release date yet on this book. I am dying, dying waiting…


Cover art for A Dance with Dragons by George R R Martin

Green logo

The best criticism Obama can level against him is that he criticizes people for not living up to his standards. When it comes to choosing a president, I want someone with high standards.

Stepping back in time in MT Eval from my last post, Liu and Gildea (2005) were among the first to really bring syntactic information to evaluating machine translation output. They proposed three metrics for evaluating machine hypotheses: the subtree metric (STM), the tree kernel metric (TKM), and the headword chain metric (HWCM). STM and TKM also had variants for dependency trees, which HWCM relies on. Owczarzak et al. (2007) extended HWCM from dependency parses to LFG parses. HWCM has attracted more attention since it showed better correlation at the sentence level than either STM or TKM (both versions) and outperformed BLEU on longer n-grams. It’s interesting to note, though, that the dependency-based tree kernel metric performed best of all at the corpus level. Sentence level granularity is typically more important for helping you tune your MT system.

The subtree metric is a fairly straightforward idea. You begin by parsing both the hypothesis and the reference sentences using a parser like Charniak or Collins to get a Penn TreeBank style phrase structure tree. You then compare all subtrees in the hypothesis to the reference trees, thresholding the number of matches by the best match in the reference trees. The formula is given below:

subtree metric formula
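Since the formula image did not survive here, this is my reconstruction of the STM as I recall it from Liu and Gildea (2005), so treat it as a sketch and check it against the paper. D is the maximum subtree depth considered, count(t) is how often subtree t occurs in the hypothesis tree, and count_clip(t) is that count clipped to the most times t occurs in any single reference tree:

```latex
\mathrm{STM} = \frac{1}{D} \sum_{n=1}^{D}
  \frac{\sum_{t \in \mathrm{subtrees}_n(\mathrm{hyp})} \mathrm{count}_{\mathrm{clip}}(t)}
       {\sum_{t \in \mathrm{subtrees}_n(\mathrm{hyp})} \mathrm{count}(t)}
```

In words: for each depth n, take the fraction of hypothesis subtrees that also show up in a reference (with clipping), then average those fractions over the depths.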

The tree kernel metric uses convolution kernels discussed by Collins and Duffy (2001). For the specifics of this method, I refer you to the respective papers (and I may post on it at a later date), but the general idea is that you can transform structured data (a tree) into a feature vector by using the kernel trick. Finding all subtrees of a tree can be exponential in the size of the sentence, which would make computation infeasible for large sentences. The kernel trick lets us operate in this exponentially-high-dimensional space with a polynomial time algorithm. Once we have constructed the feature vectors for the hypothesis and reference trees, we can score them with their cosine similarity:

tree kernel metric

H(T1) and H(T2) are vectors with non-zero values for subtrees (dimensions) that appear in each tree, so the dot product of the two is the number of subtrees in common. The score is computed as the maximum cosine similarity between the hypothesis and the references.
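To make the kernel trick a bit more concrete, here is a minimal Python sketch of the subtree-counting recursion in the style of Collins and Duffy, followed by the cosine score. The tree encoding (nested tuples with bare strings as words) and the function names are my own assumptions for illustration, not anything taken from the papers' code, and I leave out the decay factor that the kernel is often used with.

```python
import math

def production(node):
    # A node's label plus the labels (or words) of its immediate children.
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def nodes(tree):
    # Yield every internal node; bare strings are words, not nodes.
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def common(n1, n2):
    # Number of subtree fragments rooted at both n1 and n2.
    if production(n1) != production(n2):
        return 0
    result = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            result *= 1 + common(c1, c2)
    return result

def kernel(t1, t2):
    # Dot product H(T1) . H(T2) without ever building the vectors.
    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))

def tkm(hyp, refs):
    # Maximum cosine similarity between the hypothesis and any reference.
    return max(
        kernel(hyp, r) / math.sqrt(kernel(hyp, hyp) * kernel(r, r))
        for r in refs
    )

# Toy trees (hypothetical example sentences).
john = ("S", ("NP", ("N", "John")), ("VP", ("V", "resigned")))
mary = ("S", ("NP", ("N", "Mary")), ("VP", ("V", "resigned")))
print(tkm(john, [mary]))
```

The double loop over node pairs is what keeps this polynomial: each pair is scored by a short recursion instead of enumerating every subtree explicitly.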

Finally, the headword chain metric (HWCM) relies on dependency parses, which I touched on in my previous post.

In dependency grammars, a tree is built by linking a word to its head. So a determiner would be linked to the noun it modifies, the direct object would be linked to the verb, etc. Each link of this sort is a headword chain of length 2. As you build up the tree, you can construct longer and longer headword chains.

The HWCM score is calculated just like the STM except by comparing headword chains. The difference between the HWCM and the dependency version of the STM is that STM considers all subtrees whereas HWCM only looks at direct mother-daughter relations (no cousins or sisters).
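Here is a rough Python sketch of how headword chains could be extracted and scored. A dependency parse is assumed to be a pair (words, heads), where heads[i] is the index of word i's head and -1 marks the root; that encoding, the function names, and the choice to average only over chain lengths that actually occur in the hypothesis are my assumptions, not details from the paper.

```python
from collections import Counter

def chains(words, heads, n):
    # Headword chains of exactly n words, written from head down to dependent.
    out = Counter()
    for i in range(len(words)):
        chain, j = [words[i]], i
        while len(chain) < n and heads[j] >= 0:
            j = heads[j]
            chain.append(words[j])
        if len(chain) == n:
            out[tuple(reversed(chain))] += 1
    return out

def hwcm(hyp, refs, max_len=4):
    # Average clipped precision of hypothesis chains over chain lengths 1..max_len.
    total, used = 0.0, 0
    for n in range(1, max_len + 1):
        h = chains(*hyp, n)
        if not h:
            continue  # the hypothesis has no chains this long
        best = Counter()
        for r in refs:
            rc = chains(*r, n)
            for g in h:
                best[g] = max(best[g], rc[g])
        clipped = sum(min(c, best[g]) for g, c in h.items())
        total += clipped / sum(h.values())
        used += 1
    return total / used if used else 0.0

# "the" depends on the noun, the noun depends on the verb (the root).
ref = (["the", "dog", "barks"], [1, 2, -1])
hyp = (["the", "cat", "barks"], [1, 2, -1])
print(hwcm(hyp, [ref]))
```

Each word contributes at most one chain per length (the path from its n-1th ancestor down to itself), which is why only direct mother-daughter lines are counted and sisters or cousins never appear in the same chain.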

References

Michael Collins and Nigel Duffy. 2001. Convolution kernels for natural language. In Advances in Neural Information Processing Systems.

Ding Liu and Daniel Gildea. 2005. Syntactic Features for Evaluation of Machine Translation. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization at the Association for Computational Linguistics Conference 2005, Ann Arbor, Michigan.

Karolina Owczarzak, Josef van Genabith, and Andy Way. 2007. Labelled Dependencies in Machine Translation Evaluation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 104-111, Prague, June 2007.

We’ve all had these days.  But if we were in France, the outcome might have been different.

It seems to me that the first question on any free online IQ test is whether to take the test. You automatically fail to achieve high marks if your answer is yes (take the test).

How do you best visualize the struggle between two competing standards?  Light sabre battles between children, of course!

HD DVD vs. Blu-ray

Last night was the last total lunar eclipse for two years and it was quite good. Pittsburgh weather cleared long enough for me to snap a couple shots of the unobstructed moon with Regulus (the brightest star in the constellation Leo) bright above it and Saturn even brighter to the bottom left. There was still a light haze that I think made it difficult for me to get the focus right. I was able to capture the rich, red color while the moon was still exposing a sliver of sun-drenched rock. Then the clouds came in earnest and I was getting tired, so I went to bed, missing the full umbra. But at least I got to see some of it this time. Last time there was a lunar eclipse, I was completely out of luck.

Total lunar eclipse from February 20, 2008.  A sliver is still exposed to the sun.


Since Papineni et al. (2002) introduced the BLEU metric for machine translation evaluation, string matching functions have dominated the field. These metrics work well enough, but there are cases where they break down and more and more research is revealing their biases. Also, BLEU does not correlate especially well with human judgments, so the quality of MT would benefit from a metric that better captures what makes a good translation.

A recent trend in this direction has been to introduce linguistic information in MT eval. Liu and Gildea (2005) used unlabeled dependency trees to extract headword chains from machine and reference translations to evaluate MT output. To define a few terms, reference translations are human translations that machine translations are compared to during evaluation. In dependency grammars, a tree is built by linking a word to its head. So a determiner would be linked to the noun it modifies, the direct object would be linked to the verb, etc. Each link of this sort is a headword chain of length 2. As you build up the tree, you can construct longer and longer headword chains. Liu and Gildea compared the headword chains constructed for both machine and reference translations and produced a metric based on comparing the two sets of headword chains. These chains were not annotated with any sort of grammatical relation (subject, object, etc), so they are unlabeled dependencies.

Owczarzak et al. (2007) have extended the work by Liu and Gildea (2005) using labeled dependencies. They parsed the pairs of sentences with a Lexical Functional Grammar (LFG) parser by Cahill et al. (2004). In LFG, there are two components of every parse: a c-structure (i.e. a parse tree) and an f-structure, which describes the features of the lexical items. An example of an LFG parse from their paper is given below. F-structures are recursive structures with a head containing all of its constituents. From the f-structure it is easy to construct dependency trees. The bonus is that the f-structure provides the grammatical relations between items in the dependency trees. In the example below, the dependency subj(resign, john) has the grammatical relation of subject. That is, John is the subject of the sentence headed by the verb resigned.

c structure and f structure of two sentences with the same meaning from Owczarzak et al 2007

Their metric is then simply a comparison of these labeled dependency headword chains using precision and recall to compute the f-score (harmonic mean). One of the coolest things in the paper is how they handle parser noise. Statistical parsers are not perfect. They estimate probabilities for rules from labeled data. In natural language, variation is pretty much unlimited, so no matter how big the training corpus, there will always be things the parser has never seen before. Also, we are dealing with imperfect input (by the MT systems or humans) so the problem of noise could be even greater. They address this by running 100 sentences through the various MT metrics they are comparing (including their own) as both the reference and the machine translation. This produces the “perfect score” for each metric since they are identical. Next, adjuncts are rearranged in the sentence so that the resulting meaning has not been changed, but the structure has. Each MT metric now evaluates the new sentence compared to the original and computes a score. For the LFG parse, the f-structure should remain the same in both cases, so any divergence can be attributed to parser noise. In order to reduce this noise, they used the n-best parses and were able to increase the f-score, bringing it closer to the baseline (ideal). So instead of just comparing the best parse for the reference and machine translation, they combine the n-best parses to compute the f-score.
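As a rough illustration of the precision/recall comparison, here is a small Python sketch that treats each parse as a bag of labeled dependency triples (relation, head, dependent). The triple encoding, the function name, and every label other than subj(resign, john) are hypothetical; this is only the single-parse comparison and does not attempt the n-best combination from the paper.

```python
from collections import Counter

def dependency_f_score(hyp_triples, ref_triples):
    # Multiset overlap between hypothesis and reference dependencies.
    hyp, ref = Counter(hyp_triples), Counter(ref_triples)
    matched = sum((hyp & ref).values())
    if matched == 0:
        return 0.0, 0.0, 0.0
    precision = matched / sum(hyp.values())
    recall = matched / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

reference = [("subj", "resign", "john"), ("adj", "resign", "yesterday")]
hypothesis = [("subj", "resign", "john"), ("obj", "resign", "yesterday")]
print(dependency_f_score(hypothesis, reference))
```

Because the relation label is part of the triple, a parse that attaches the right words with the wrong grammatical relation gets no credit, which is exactly what separates this from the unlabeled version.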

The result is that they get correlations with human judgments competitive with the best system they compare themselves to (METEOR, Banerjee and Lavie, 2005), beating it for fluency and coming in a close second overall. As far as future work goes, there are quite a few extensions they mention in the paper. The LFG parser produces 32 different types of grammatical relations. In the current setup, they are all weighted the same, but they would like to try tuning the weights to see how that affects the score. Another extension they propose is using paraphrases derived from a parallel corpus. There has been other work done on paraphrasing for MT evaluation (notably Russo-Lassner et al., 2005). One thing I am curious about is whether changing the weight on the harmonic mean would have an impact on correlation. METEOR uses the F9-score while the typical thing to do is F1. It’s not clear that weighting precision and recall equally is the best thing to do.
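For reference, the weighted harmonic mean I have in mind can be sketched in a few lines of Python. The function name is my own; the beta = 3 case reflects my reading of the METEOR paper's Fmean, 10PR / (R + 9P), which weights recall nine times as heavily as precision.

```python
def f_beta(precision, recall, beta=1.0):
    # Weighted harmonic mean; beta > 1 favors recall, beta < 1 favors precision.
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.8, 0.4
print(f_beta(p, r))          # balanced F1
print(f_beta(p, r, beta=3))  # recall-heavy, same form as 10PR / (R + 9P)
```

With high precision and low recall, the recall-heavy score comes out lower than F1, so the two weightings can rank the same pair of systems differently.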

Interesting stuff, though. I hope they continue the work and maybe we’ll see something in this year’s ACL.

Update

Karolina Owczarzak has confirmed they were using the F1 score and that different F-scores did not lead to significant improvements. I also added the image I forgot to include in the original post.

References

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization at the Association for Computational Linguistics Conference 2005, pages 65-73, Ann Arbor, Michigan.

Aoife Cahill, Michael Burke, Ruth O’Donovan, Josef van Genabith, and Andy Way. 2004. Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), July 21-26, pages 320-327, Barcelona, Spain.

Ding Liu and Daniel Gildea. 2005. Syntactic Features for Evaluation of Machine Translation. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization at the Association for Computational Linguistics Conference 2005, Ann Arbor, Michigan.

Karolina Owczarzak, Josef van Genabith, and Andy Way. 2007. Labelled Dependencies in Machine Translation Evaluation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 104-111, Prague, June 2007.

Grazia Russo-Lassner, Jimmy Lin and Philip Resnik. 2005. A Paraphrase-based Approach to Machine Translation Evaluation. Technical Report LAMP-TR-125/CS-TR-4754/UMIACS-TR-2005-57, University of Maryland, College Park, Maryland.

Rumors are brewing that Microsoft is going to announce the release of a new product called Worldwide Telescope later this month. WT should allow users to zoom in on parts of the sky for which data exists. Data will be drawn from a number of ground-based telescopes as well as Hubble. Google Sky does this already in a nauseatingly ugly way. It’s bad. Epic fail there. Stellarium, on the other hand, is an open source star charting program that blows Google Sky away. I’ve been using it for a few years now and have been very happy with it. From the sound of the TechCrunch article, though, Worldwide Telescope could blow Stellarium away. I really hope so. And if it’s free, I’ll be forced to give Microsoft props for doing something right for a change.

My Australian shepherd Willow with ice in her whiskers

Willow in her natural environment. She has taken to eating all the snow she can find. Right now it’s melting, and there is an ice path leading along the side of my apartment that is covered in water. (Water on top, ice on bottom.) No picture of that, though. This is her on Valentine’s Day.

While watching the 2000 version of Henry James’ The Golden Bowl, I heard the once-common phrase “The deuce only knows…”  I’m always looking for vintage profanity, and this appealed to me strongly.  I’ve heard it hundreds or thousands of times before, of course, but here it was brought to the fore of my attention.  After some brief research, I found ties to 16th Century Northern German, Family Guy, and playing dice.  The word deuce seems most strongly tied in meaning to “the devil,” and is used interchangeably in old-fashioned profanity (cf. What the devil and What the deuce).

There are attested uses of the phrase “Was der Daus!” in German from the 16th Century, which has my money for being the real origin of the phrase.  Daus meant “devil” though the modern German is “Teufel.”  Deuce also means “two” and comes from the French deux.  Supposedly, the combination of the German phrase and the playing of dice led to the phrase entering English usage.  Rolling two (the Devil’s eyes) inspired the curse, since that was the lowest score and therefore, a loss.  I’m not sold on this particular coincidence.  It seems too much like folk etymology of the sort you hear in email forwards.  Lastly, while I enjoy Family Guy enormously when I hear it, I very seldom get the opportunity to watch an episode, so the tie to Stewie was lost on me until Google unearthed it.

And when OpenEphyra is given the question What is the origin of the word deuce? the answer is “Watkins.”  It offers as evidence this page.  That page poses the question What does the word deuce mean? but the answer has nothing to do with my information need.  Also, the word Watkins never even appears on that page, so no idea where it came from.

OpenEphyra is a question answering (QA) system developed here at the Language Technologies Institute by Nico Schlaefer. He began his work at the University of Karlsruhe in Germany, but has since continued it at CMU and is currently a PhD student here. Since it is a home-grown language technologies package, I decided to check it out and play around. This is the first QA system I have used that wasn’t integrated in a search engine, so this isn’t exactly an expert review.

Getting started in Windows (or Linux or whatever) is pretty easy if you already have Apache ant and Java installed. Ant isn’t necessary, but I recommend getting it if you don’t have it already. It’s just handy. First, download the OpenEphyra package from sourceforge. The download is about 59 MB and once it’s done unpack it in whatever directory you want. Assuming you have ant installed, all you have to do is type ant to build it, though you may want to issue ant clean first. I had to make one slight change to the build.xml file to get it to run, which was on line 55: <jvmarg line="-server& #13;-Xms512m& #13;-Xmx1024m"/>, which had to be changed to <jvmarg line="-server -Xms512m -Xmx1024m"/>. Easy enough. Then to run it, all you have to do is type ant OpenEphyra.

After taking a short bit to load up, you can enter questions on the command line. Based on what I can tell from the output, it begins by normalizing the question (removing morphology, getting rid of punctuation). Then it determines the type of answer it is looking for, like a person’s name or a place and assigns certain properties to what it expects to find. Next it automatically creates a list of queries that are sent to the search engine(s). The documentation indicates that the AQUAINT, AQUAINT-2 and BLOG06 corpora are included (at least preprocessing is supported), but there are searchers for Google, Wikipedia, Yahoo and several others. Indri is a search engine which supports structured queries and OpenEphyra auto-generates some structured queries from what I saw playing around. After generating the queries, they are sent to the various searchers and results are obtained and scored. Finally, if you’re lucky, you get an answer to your question.

Here are the results of screwing around with it for a few minutes:

  • Who created OpenEphyra?
    • no answer (sorry, Nico)
  • Who invented the cotton gin?
    • Eli Whitney
  • Who created man?
    • God
  • What is the capital of Mongolia?
    • Ulaanbaatar
  • Who invented the flux capacitor?
    • Doc Brown (awesome!)
  • Who is the author of the Mendicant Bug?
    • Zuckerberg — damn you, Facebook! :(
  • How much wood can a woodchuck chuck?
    • no answer (correct)
  • What is the atomic number of Curium?
    • 96 (also correct)
  • Who killed Lord Voldemort?
    • Harry (correct, but partial)
  • How many rings for elven kings?
    • 3021 (so, so very wrong)

Fun stuff! It’s not anywhere near perfect, but there are definite uses and the thing is ridiculously easy to install and use. Also, it’s in Java, so you can integrate it with your own system with very little effort. Depending on what sort of question you are looking for answers to, you get various levels of results. Factual questions about geography and people tend to do better than questions about numbers and fiction, as you might expect. Also, why-questions are not supported.

Another bonus is the project is open source, so if you’re into QA, you can help develop it.

This conference looks like it might be fun if you’re a student working on some area of AI/machine learning and going to a university in the Northeastern US. NESCAI is the North East Student Colloquium on Artificial Intelligence and will be held at Cornell May 2-4, 2008. The deadline for papers is March 7, 2008, so the date is fast approaching. The full CFP is below the fold.


Happy Valentine’s Day (from Wondermark) - You can ride my turtle anytime.

Thank you for your constant support and encouragement, unfailing love, and all the hard work you do to help make my dreams come true. You are my dream come true!

At ACL this year, the Third Workshop on Statistical Machine Translation will be held and they are featuring a shared task on MT evaluation. The shared task will involve evaluating output from the shared translation task, which will be released on March 24th, with short papers and rankings due on April 4th. I created an MT evaluation system (pdf) last year for a class (on MT, no less), though I doubt it would do particularly well. I outperformed BLEU, but fell short of METEOR. In any case, it might be interesting to play with the data and certainly will be interesting to read the papers. My system does perform sentence-level ranking as one of its primary goals, which is also a goal stated by the shared task.

It doesn’t inspire confidence in a job posting site when you get results for job salaries like this:

Comparison of salaries for rapists, serial killers and republicans from Indeed.com

I must admit, I am surprised. I thought for sure being a Republican paid better than raping people, or at least paid the same. Time travelers have an unfortunately low salary. Obviously, they are too stupid to realize they could be hawking artifacts from the past for millions. Oh well.

Dan Reed has posted an interesting article both on his blog and on the Computing Research Policy Blog about the many problems in computing education. Ever since the dotcom bubble burst, computer science enrollment at universities has declined and even more so for women. So many ideas have been tossed around out there, trying to figure out just where we’re going wrong. Recently I wrote about Robert Dewar’s views on where CS education has failed. He made the case that graduates of most CS programs are incompetent and that employers have to go through a period of re-education. Whereas Dewar sees the problem more in the fact that core principles are not being taught to students, Dan Reed makes the case that core principles are really not necessary for everyone.

Both viewpoints are nuanced and so lumping them into polar categories like that results in major inaccuracies. Reed is not making the point that students shouldn’t be taught about operating systems and Dewar is not making the point that students must be taught assembly language. While many CS graduates are incompetent, learning about operating systems and compiler design is totally worthless to most programmers. Sure, there are certain skills that could be applied to other areas and learning stuff like that will give you an appreciation for the various aspects of the field, but most programmers are never going to build a compiler or an operating system. As computer science is increasingly being applied to other fields (biology, chemistry, physics, astronomy, etc), it is crucial for new software engineers to have specific skillsets that aren’t being taught (and I mean CS skills). Reed makes the point very clearly:

First, as researchers and technologists we seek to reproduce students in our technical image, failing to acknowledge that most of our students will not develop compilers, write operating systems or design computer chips. Rather, they benefit from training in logical problem solving, knowledge of computing tools and their applicability to new domains.

Like any entrenched system (bureaucracy), it is easy for computer science educators to fall prey to the lament that “CS grads these days are not like they used to be.” I’m going to go out on an anthropological limb and say that’s a human universal. The day will come (and I think it already has) when there is just too much core CS information to feed into our brains and to continue to try to cram that into young learners is going to result in spillover and disillusionment. There will always be people capable of soaking all of it up (though they will become rarer as the volume increases), but we must be aware of the futility of over-educating. Let me be clear, in a four year program, I believe it is more of a disservice to students to give them a shallow but broad understanding of the computing field (thereby making them incompetent) than it is to give them a deeper understanding of a subfield where they will be competent but lacking in other so-called core areas.

So I have a couple off-the-cuff ideas that need to be refined but which I want to put out there. All of these core principles can be boiled down into the true essentials, the things programmers actually need to know to do their jobs. Instead of having classes on computer architecture, operating systems, compilers, etc., combine those concepts into one or two classes with a name like “Core Computing Principles.” As Reed points out, the focus should be on teaching algorithmic problem solving skills and logic. From there, students can pursue different directions like theoretical CS, natural language processing, or large-scale systems. An undergraduate education that puts a stronger focus on statistical methods would have been hugely helpful for me. Having a broad range of options that are mapped out for students who really don’t have a clue how to get there, but know basically where they want to go, would be great.

In any case, there are many views and some will side with Dewar, some with Reed. Ultimately I think the field will settle closer to Reed’s side. I’m looking forward to hearing some of the ideas the CRA-E committee that Reed mentioned (pdf) will come up with.

When you go to a search engine, you have an information need. There is something you are searching for that you can only articulate imprecisely and you do so in a few words. People are good at determining if something satisfies their information need, but not so great at stating it clearly. Librarians are trained to elicit this information need from you, by force if necessary. (Just kidding, librarian mafia, don’t hurt me!) Their method is a dialogue where they probe the various aspects of what you are searching for, what you are not searching for, what you already know about it, etc.

A search engine can’t engage in this dialogue, yet, but think about how you interact with a search engine. You start off with this information need (at whatever degree of vagueness) in mind and probably compose a short 2-3 word query. How often do you do one word queries? We’ve been trained by search engines that this rarely succeeds unless it’s a low-frequency word (or a brand name or jargon). Our first query brings up some useful stuff perhaps, but usually we see that we weren’t thinking clearly about our information need and we begin honing it over the next couple queries until we find what we need. Some people are better at forming this mental picture and stating clear queries from the beginning [citation needed], but most people need to narrow it down.

These queries we use for Google are often purely keyword queries, though sometimes we use slightly more sophisticated queries with link: or site: (etc) operators. You can make sure terms are included with the + operator and excluded with the - operator. You can even use wildcard operators (*) which can be nice (but also touchy). What you can’t do are structured queries. You can’t search for things like (nice or sweet) and (man or guy). You can’t search for words that co-occur in certain spans of documents (like 50-word windows). These things can be very helpful to an experienced researcher and having this ability over a web corpus the size of Google’s would be enormously helpful. Unfortunately, the computational and storage costs of such a thing are much higher.
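To show what I mean by structured queries, here is a toy Python sketch over an in-memory positional index. The documents, the function names, and the definition of a window (two positions less than `window` words apart) are all assumptions for illustration; a real engine at web scale would look nothing like this, which is rather the point about cost.

```python
from collections import defaultdict

def build_index(docs):
    # word -> doc_id -> list of positions where the word occurs
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def any_of(index, words):
    # Documents containing at least one of the words (an OR group).
    return set().union(*(index[w].keys() for w in words if w in index))

def within_window(index, w1, w2, window):
    # Documents where w1 and w2 occur less than `window` positions apart.
    hits = set()
    if w1 not in index or w2 not in index:
        return hits
    for doc_id in index[w1].keys() & index[w2].keys():
        if any(abs(p - q) < window
               for p in index[w1][doc_id] for q in index[w2][doc_id]):
            hits.add(doc_id)
    return hits

docs = {
    1: "a nice guy walked in",
    2: "the sweet man left",
    3: "a nice day",
    4: "that man is not nice at all",
}
index = build_index(docs)

# (nice or sweet) and (man or guy)
print(any_of(index, ["nice", "sweet"]) & any_of(index, ["man", "guy"]))
# "nice" and "man" within a 4-word window
print(within_window(index, "nice", "man", 4))
```

The window query is the expensive part: it needs a position for every occurrence of every word, not just a list of documents, which is where the larger index comes from.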

So my question for you, reader, is would you even use this?  Would this be used by very many people or just the odd few researchers, paralegals, etc?  Computationally, I think Google could handle this.  The problem would come from the larger index needed to support such queries.  Even this would probably not be unreasonable for Google at this point.  So… why not?  My guess is that the cost of doing such a thing (moderate to high) outweighs the customer demand (low to nil).

Am I wrong?

It is about 11 degrees Fahrenheit (-11.7 Celsius) with a wind chill of -1 (-18.3) degrees here in Pittsburgh at the moment. As such, the dogs should get sweaters, right? Well, Willow doesn’t really need it. She loves the cold, but she’s cute in it anyway.

My Australian shepherd Willow in her winter sweater.

“polyteny”

Definition: the state where a cell contains all polytene chromosomes. [source]

One common tactic in spamming is to use words whose frequencies in English are really low. Frequent spamming words are easily detected and so if an email contains the words “viagra” and “pharmacy,” the chances of it being spam are really high. Words like polyteny, if they appear in any email, are probably in conversations between specialists. The hope spammers have is that low-frequency technojargon will not have been weighted as spam in whatever spam filter the message encounters. In fact, this particular spam was one of the first false negatives Gmail has let slip through in days.
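Here is a rough Python sketch of why a rare word carries no spam weight. It is a generic naive-Bayes-style toy, not a description of how Gmail or any real filter works, and the counts, names (`SPAM_COUNTS`, `HAM_COUNTS`, `word_log_odds`, `spam_score`) and numbers are all invented for illustration.

```python
import math

# Made-up training counts: how many spam / non-spam ("ham") messages
# contained each word. These numbers are purely illustrative.
SPAM_COUNTS = {"viagra": 90, "pharmacy": 80, "meeting": 5}
HAM_COUNTS = {"viagra": 1, "pharmacy": 4, "meeting": 70}
N_SPAM, N_HAM = 100, 100

def word_log_odds(word):
    """Log odds that a message containing `word` is spam (add-one smoothing).

    A word never seen in training gets the same smoothed probability on
    both sides, so its log odds are 0: it says nothing either way.
    """
    p_spam = (SPAM_COUNTS.get(word, 0) + 1) / (N_SPAM + 2)
    p_ham = (HAM_COUNTS.get(word, 0) + 1) / (N_HAM + 2)
    return math.log(p_spam / p_ham)

def spam_score(words):
    """Sum of per-word log odds; positive leans spam, negative leans ham."""
    return sum(word_log_odds(w) for w in words)

print(spam_score(["viagra", "pharmacy"]))  # strongly positive
print(spam_score(["polyteny"]))            # unseen word: no evidence
print(spam_score(["meeting"]))             # negative: looks like normal mail
```

In this toy the unseen word is simply neutral, so a subject line made only of obscure jargon gives the filter nothing to go on.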

Also, the SOTD idea is becoming a little boring to me, so I may discontinue it and only mention things that are particularly amusing.

“Prestigious Christmas gifts for only the dearest people!”

Hey, I want a prestigious Christmas gift! I’m stuck with lousy, disreputable gifts. I guess I’m not one of the elite “dearest.”

I was just reading a Wired article about the deaths of two AI researchers:  Chris McKinstry and Pushpinder Singh.  Both were working on strong AI (or at least, had the hope of it).  Both committed suicide and did it within a month of each other.  McKinstry claimed that his system would be aware in a short time.  If GAC ever became aware, it has vanished into the cloud.  So all very interesting and I recommend the article.  Not if you want a serious read about the topics they researched, but it presents an interesting narrative of two lives with eerie parallels.

What inspired this post is a minor quibble about a word that many English speakers have surely heard:  Wunderkind.  In German, it literally means “wonder child” and is often applied in English to a child prodigy or a young person whose star is on the rise.  Here is an excerpt from the Wired article:

Push, as everyone called him, had also taught himself to code — first on a VIC-20, then by making computer games for an Amiga and an Apple IIe. His father, Mahender, a topographer and mapmaker who had studied advanced mathematics, encouraged the wüenderkind. Singh was brilliant, ambitious, and strong-willed. In ninth grade, he had created his own sound digitizer and taught it to play a song he was supposed to be practicing for his piano lessons. “I don’t want to learn piano anymore, I want to learn this,” he said. [emphasis mine]

When you have a German vowel with an umlaut, it is rendered in English orthography as the vowel + e.  So ü would be written in English as ue.  Wunderkind has no umlaut in German, so this would not be necessary.  Plus, you wouldn’t have to add the e anyway since they already included the umlaut.  Shoddy editorial work, but it made me lol.

I go through my spam every day to make sure that false positives don’t get deleted. For whatever reason, stuff coming from the Help Desk at CMU gets labeled as spam a lot. I’m not saying it sounds like word salad (*cough*), but it trips Gmail’s spam sensors. The good thing about Gmail is a low false negative rate; the bad thing is a fairly high false positive rate.  And if you weren’t already aware, word salad is the name given to the jumble of unrelated, often obscure words that appear in a spam email to throw off spam filters.

The various spam messages I get never fail to amuse me in some way, so why not share them with you, my innocent reader who would rather never see another spam title again? Ages ago, I was especially amused by two bits of spam that actually had lines from Robert Jordan’s Wheel of Time series as subject lines. I captured an image of the second one, but the first is lost forever and I haven’t noticed one since (click on it if it’s too small to read).

Twice the Dragon, for the price he must pay.

So the inaugural Spam of the Day (SOTD, rhymes with sotted):

“Try the new manpower candy!”

This is the question I will have to answer over the next few weeks.

One of my classes this semester is the Advanced Machine Translation Seminar (and I hope that link works outside of CMU). Each of us who has registered for the class will present a certain topic in MT and then do a literature review about it by the end of the semester. Originally I had wanted to cover how word sense disambiguation (WSD) has been applied to statistical machine translation, but that overlapped with another topic on bringing in context to MT. In simple terms, WSD is just the task of figuring out which of the many definitions a word has applies in the given circumstances. WSD systems use the context around the word to determine its sense. Thus, it is just another way of bringing context into MT. We determined there was no clear way of separating the topics so that I could still do that, so since mine was the more specific it seemed reasonable to me that I should change topics. No one else is presenting on machine translation evaluation (MT Eval), so I opted for that.

MT Eval is actually a pretty vibrant topic at the moment. For some quick background, machine translation systems produce woefully inadequate translations much of the time. If you have any doubt of this, try to translate a random web page using any of the many free online services. You will get many disfluencies, untranslated words, downright gibberish, and much worse. Not all of it will be bad, of course, but much of it will be. It is a hard problem, and many MT researchers believe it to be AI-complete (the Wikipedia article mentions MT explicitly). In order to improve machine translation, you need some way to automatically evaluate how well you are doing. Currently this is done using automatic metrics that compare machine output to (usually multiple) human translations (aka reference translations). The most commonly used metric is BLEU (pdf), but a rising star is METEOR, developed in part by one of my professors. I won’t go into these metrics any further here at the moment, and I recommend interested parties check out the papers. What these metrics aim to do is gauge how similar the machine output is to the reference translation(s).
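To give a feel for what "similar to the reference translations" can mean, here is a small Python sketch of clipped n-gram precision, the overlap count that BLEU is built on. It is deliberately incomplete: full BLEU also combines several n-gram orders and applies a brevity penalty, both of which are left out here. The function names (`ngrams`, `clipped_precision`) and the example sentences are my own choices for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences in `tokens`, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    """Fraction of candidate n-grams that also appear in a reference.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent ("clipping"), so
    repeating a common word over and over does not earn extra credit.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
good = "the cat sat on the mat".split()
degenerate = "the the the the the the the".split()

print(clipped_precision(good, references, 1))
print(clipped_precision(degenerate, references, 1))
print(clipped_precision(degenerate, references, 2))
```

The degenerate candidate shows why the clipping matters: every one of its words appears in a reference, yet it is obviously a terrible translation, and clipping keeps its score low.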

The problem with MT Eval is that in order to be able to automatically tell whether something is a good translation, we would have to know exactly what goes into making a good translation (and by good I mean human-level). If we could do that, we would have solved MT!

More to come.

DARPA (the Defense Advanced Research Projects Agency, see disclosure note below) is known for going out on a limb with some of its ideas.  I am intrigued by many of the ideas and research projects they propose (and fund), but also torn by the fact that they are an integral part of the military-industrial complex.  Moral dilemmas!

Anyhow, as regular readers of this blog might know, I am a fan of airships.  There is just something about a floating behemoth and the idea of living in the sky that stirs something deep within me.  The particular airborne behemoth that has sparked this post is a spy blimp envisioned by DARPA that appears to be getting the go-ahead.  This blimp would be the size of a 15-story hotel, float about 17 miles above the ground, and would serve as a comm relay, radar, and scout for the military.  The robotic monster could spot enemies on the ground 180 miles away.  At the moment the technology appears to be here, but they need funding from one of the armed services.

How long would it stay in the air at a time?

10 years!

Full disclosure:  My research is funded by a grant from DARPA under the RADAR project.

My longtime friend over at the Wrathful Dove has an excellent post today on the lack of superness in this so-called Super Tuesday, and I wanted to give it a plug.  Here is a brief excerpt that I thought sheds light on the charade that we call “elections” in America:

I was reading the “issues” section of the Atlanta Journal Constitution on Sunday where there was an entire article devoted to comparing the musical selections of the candidates to see what exciting insights this exercise might provide. The same article also subtly observed the importance of selecting a candidate who seems likely to win in November, effectively reducing elections down to the horse race terms in which it is often framed in the corporate media.

These elections are a sham and an obscene circus.

Every four years the American public gets to select its master-in-chief from a narrow field of candidates who fiercely compete and debate within a very narrow range so as to give the illusion of choice and dialog while keeping the true options fixed to those acceptable and profitable to corporate America.

Check out his blog for the rest of the post.

Ed Clarke, a professor of Computer Science at CMU, just won the 2007 ACM Turing Award.  The ACM is the Association for Computing Machinery and is the oldest professional group for the computing industry.  I first became a member in 2005 and have maintained that membership since.  The Turing Award is given in honor of Alan Turing, the father of computer science (most would agree).  This award is basically the Nobel prize of computer science (since they don’t give Nobels for CS) and is meant to recognize individuals who have made a lasting and significant contribution to the computing field.

Ed’s work was in conjunction with two other people:  E. Allen Emerson and Joseph Sifakis.  Their work was on model checking, which is a way of determining whether a hardware or software structure is a model of a logical formula.  So if a structure satisfies a formula (typically one written in temporal logic rather than plain propositional logic), it checks.
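For a flavor of what "checking" a structure means, here is a toy Python sketch. It is a drastic simplification and not the award winners' algorithm: it only handles a single invariant ("nothing bad is ever reachable") by exploring every reachable state of a small hand-written transition system. The function names and the traffic-light models are hypothetical, made up for this example.

```python
from collections import deque

def reachable_states(transitions, initial):
    """Breadth-first search over the transition graph from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def check_invariant(transitions, initial, holds):
    """Return (True, None) if `holds` is true in every reachable state,
    otherwise (False, offending_state) as a counterexample."""
    for state in sorted(reachable_states(transitions, initial)):
        if not holds(state):
            return False, state
    return True, None

# Hypothetical intersection: each state is (north-south light, east-west light).
good_model = {
    ("green", "red"): [("yellow", "red")],
    ("yellow", "red"): [("red", "green")],
    ("red", "green"): [("red", "yellow")],
    ("red", "yellow"): [("green", "red")],
}
# Same model with one extra, buggy transition.
buggy_model = dict(good_model)
buggy_model[("yellow", "red")] = [("red", "green"), ("green", "green")]

def never_both_green(state):
    return state != ("green", "green")

print(check_invariant(good_model, ("green", "red"), never_both_green))
print(check_invariant(buggy_model, ("green", "red"), never_both_green))
```

The useful part is the counterexample: when the property fails, the checker can point at the exact state that breaks it.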

Clarke joins three other professors at CMU who are Turing recipients.  Raj Reddy was co-awarded it in 1994 for large scale AI systems.  Manuel Blum won it in 1995 for his work on computational complexity theory.  Dana Scott won it in 1976 for non-deterministic finite state machines, something that has a major role in natural language processing (and computational linguistics).

Daedalus loves getting up on the window sill, the better to sniff at his treat jar. He can also look out of our third story window to see what there is to be seen. And bark at it.

My lemon beagle Daedalus on the window sill — caught trying to get into the treat jar

President Omar al-Bashir of the Sudan is one of those people who really are scumbags through and through.  Speaking of scumbags, he has appointed Musa Hilal as an advisor to the minister of internal affairs (ethnic matters).  Musa Hilal’s beliefs about ethnicity tend towards the clean side.  That is, ethnic cleansing.  So you can imagine the outrage such an appointment would provoke.  It would be like President Bush appointing David Duke to be, well, anything.  The US government doesn’t have much sway in Khartoum, but China does, since they are Sudan’s biggest ally (anyone surprised?).

So if you’re an American, you can influence your representative to ask China to stop protecting the Sudan.  The email that will be sent is reproduced below the jump.


If there ever is a robot uprising, I fear I may be at ground zero. In a case where reality mirrors art (kinda sorta), Carnegie Mellon researchers (including Seth Goldstein) are working on a swarm of small robots held together by magnetic fields. This will allow them to take on just about any shape. Of course, this is still a long ways off. What Seth et al are currently working on is a control strategy for said microbots. This touches on one of the most fascinating aspects of computer science to me: emergent behavior. Imagine designing an algorithm that will allow a swarm of small robots to do (collectively) a complex task with each robot only obeying simple rules. Good times!

But I would be remiss in my duties if I failed to point out the amusing end-of-the-world aspects of this particular bit o’ research. Seth says:

“I’ll be done when we produce something that can pass a Turing test for appearance. You won’t know if you’re shaking hands with me or a claytronics copy of me.”

Seth, I think we’ll all be done when that day comes. Build a thousand of these claytronic cylons and they will overthrow the world’s most powerful military government (aka USA) in a few short hours. Once the danger has been identified, the following dialogue might ensue at the White House:

The Chairman of the Joint Chiefs says, out of breath, “Madame President, what are your orders?”

“Declare immunity to the Homo claytronae and stand down all forces.”

“Wha-?”

The Secretary of State steps forward, face rippling, “You heard her. Now on your knees, meatsack.”

Ahh. A boy can dream.

CNN is reporting that Microsoft is making eyes at Yahoo! to the tune of $31 per share, or about $44.6 billion. If such a deal ever materialized, it would definitely make things interesting for Google. Personally I consider both Microsoft Live search and Yahoo to be inferior products to the Google, but two wrongs make a right, wrong? [hat tip] There has been talk the SEC might try to block such a move due to monopoly worries. I’m not convinced there is anything to worry about, but what do I know.

What I am interested in knowing, though, is how this will affect both Microsoft and Yahoo’s research arms. Will they become bigger and better than ever or will there be some cuts? I certainly hope the former is true.

Update

Check out the comments on the Google Blogoscoped article regarding the monopoly worries. I just read them after posting and they pretty much shoot down the idea of an SEC action on those grounds.

About Me

Jason M. Adams

My name is Jason M. Adams and I recently graduated with my masters from the Language Technologies Institute at Carnegie Mellon University. My main areas of research were with recommender systems and word sense disambiguation. Now I am on the job market. And I am obsessed with my two dogs.

Site Information

Contact me: jaso...@gmail.com

Creative Commons License

This work by Jason M. Adams is licensed under a Creative Commons Attribution 3.0 License.

Header image credit seakwenby.
