You are currently browsing the monthly archive for May, 2008.
So I decided to finally fart around with OpenCalais a little. There’s a nice video on the site that gives you an impression of what it is capable of, but it’s also like all videos about software: propaganda. Calais is basically Named Entity Recognition (NER) software that can be accessed via a web API. Whereas a regular NER system might recognize named entities like people, organizations, and places, Calais also recognizes relationships like corporate acquisitions. To be a little more clear if you aren’t familiar with NER, it is basically the task of identifying the proper nouns in a body of text. Named entities aren’t always proper nouns, but that is one starting point. Examples would be: John Hancock (Person), New York (Place), and Apple (Organization). Calais recognizes relationships, which means we get an extra layer of information: Acquisition(Microsoft, Yahoo!).
Calais is put out by Reuters, which has a long history of helping out the NLP and IR research communities with data sets. Being Reuters, the data sets are all newswire stuff, and Calais is produced in that spirit. Currently the relationships and named entities available reflect that bias, but the list is expanding and it is probably flexible enough for most domains. Their claim is that with each new release, there will be additional entities and relationships available. Also, the software is completely open source and free for commercial and private use. For this, I give Reuters props.
OpenCalais accepts requests over SOAP or HTTP POST, and you can take a look at their tutorials for exactly how to use it. After some very shallow digging on the googles, I found an open source project called python-calais, which is basically just a script that wraps some text and sends it to the Calais service, then processes the output. The output is in RDF (Resource Description Framework), serialized here as an XML document that is not very friendly to the human eye but is nice and powerful otherwise. The python-calais script uses an RDF library for Python, so you’ll need to download that if you don’t already have it.
Running it on my most popular post, you get the following output:
93B6642D-0D7C-37Ab-A92F-66Ebfef13C8D :: Recommender Systems (Industryterm)
0Dccb106-442A-3848-Bd0B-A388E73F4C8C :: Chris Sternal-Johnson (Person)
Aab0D16A-Ad5A-348A-A8Dc-58Cf59A1Bc15 :: Kristina Tikhova (Person)
42F476A0-2Fae-3F36-808D-803E4F620Ab0 :: Java (Technology)
6C4Cd5D9-5866-35B5-81Ab-B8A5C1751A44 :: Pre-Processing Phase (Industryterm)
4003D863-C7A6-3E6F-8E3C-0913Bf2F8242 :: National Aeronautics And Space Administration (Organization)
77D1Ceb3-9900-3Dd7-8351-F29408B21412 :: Carnegie Mellon University (Organization)
Ee58Ef4B-1C98-3F8B-Aff8-3Fd6E3D76A9E :: Wonderful Site (Industryterm)
8F12E551-A8F1-3705-866C-D44D1A6A54F4 :: Richard M. Hogg (Person)
Adee23De-B1B0-37Ad-9E20-1Fa8094F6D39 :: Steel (Industryterm)
0Ace00C6-2B9F-32C2-8949-82A0F6C6B444 :: Xml (Technology)
2Ed2F085-1C63-324E-B518-60332388E273 :: Norman French (Person)
136157D8-D62E-3C55-Ae67-3Ec182C2C703 :: Phil Barthram (Person)
B6A8Dbfa-Fd35-32Bb-9E05-A2811C480000 :: Mike Tan (Person)
Ed8B5Fe4-616A-36Ea-8C47-3Eea7C71Aee0 :: Ben Eastaugh (Person)
D3Bcba58-00Fc-33C5-9346-Dbf6A2441867 :: Machine Learning (Technology)
F17C3779-3810-3Ff9-A42D-75C3137F0F7F :: Modern English (Person)
38116E8D-F8B4-3D03-B0Ad-C9A24B888E61 :: Jason M. Adams (Person)
4386B07C-F6B8-3991-Af74-Ab11A951F0Ee :: David Petar Novakovic (Person)
Aa14303F-F9F0-31B8-Adff-3B9C68E0A9F1 :: Language Technologies Institute (Organization)
Ca1E4Eb7-7820-3862-8443-26E37B33E13F :: Machine Translation (Technology)
As it picks up everything on the page, there is a lot included there that isn’t related to the post about Old English translation. Also, it picks up some weird so-called industry terms like “steel.” If you filter out just the text (manually), the output is a little more sensible:
6C4Cd5D9-5866-35B5-81Ab-B8A5C1751A44 :: Pre-Processing Phase (Industryterm)
Ca1E4Eb7-7820-3862-8443-26E37B33E13F :: Machine Translation (Technology)
0Ace00C6-2B9F-32C2-8949-82A0F6C6B444 :: Xml (Technology)
2Ed2F085-1C63-324E-B518-60332388E273 :: Norman French (Person)
136157D8-D62E-3C55-Ae67-3Ec182C2C703 :: Phil Barthram (Person)
F17C3779-3810-3Ff9-A42D-75C3137F0F7F :: Modern English (Person)
(The codes are unique identifiers.) Unfortunately, some important terms are still missed, like Old English. So it appears Calais has some growing to do, but it’s off to a good start. Part of the problem might be that that blog post is out of domain. I imagine with time, it will continue to improve. We’ll see.
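If you want to do the filtering with a script instead of by hand, here is a minimal sketch in Python. It assumes only the "ID :: Name (Type)" line format shown in the output above; the function names and the choice to drop Industryterm entries are mine, not part of python-calais.

```python
import re
from collections import defaultdict

# Matches lines shaped like "<id> :: <name> (<type>)", the format shown above.
LINE_RE = re.compile(r"^(?P<id>[0-9A-Fa-f-]+) :: (?P<name>.+) \((?P<type>\w+)\)$")


def parse_entities(text):
    """Return a list of (id, name, type) tuples, skipping lines that don't match."""
    entities = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entities.append((match["id"], match["name"], match["type"]))
    return entities


def group_by_type(entities, skip_types=()):
    """Group entity names by type, dropping any types listed in skip_types."""
    grouped = defaultdict(list)
    for _id, name, etype in entities:
        if etype not in skip_types:
            grouped[etype].append(name)
    return dict(grouped)


sample = """\
6C4Cd5D9-5866-35B5-81Ab-B8A5C1751A44 :: Pre-Processing Phase (Industryterm)
Ca1E4Eb7-7820-3862-8443-26E37B33E13F :: Machine Translation (Technology)
0Ace00C6-2B9F-32C2-8949-82A0F6C6B444 :: Xml (Technology)
2Ed2F085-1C63-324E-B518-60332388E273 :: Norman French (Person)
"""

# Hypothetical filter: ignore the noisy "industry terms".
grouped = group_by_type(parse_entities(sample), skip_types={"Industryterm"})
print(grouped)
```

Grouping by type also makes the odd results easier to spot, like Norman French being tagged as a person.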
While at Oak Island, we visited the North Carolina Aquarium. It was decent, nothing on the Baltimore Aquarium, but it had some cool stuff. In the southeastern swampland exhibit, there was a very specific sign (below) about what not to do on the plants. When I trundled on the venus flytrap three feet away, I had the perfect excuse, but they didn’t buy it…
There was also an outdoor pond with a bale of baby turtles and some carp.
This past weekend at Oak Island, NC was extremely fun. My sister got married, I got to catch up with family and friends, and I got to bask in the beauty of the ocean and great weather. It was very depressing to leave. The beach house we stayed in was surprisingly nice. The game room had a large poker table and there was an enormous hot tub. If you think sitting in a hot tub looking out on the ocean is cool, you’d be right. The wedding itself took place right on the beach. The background was punctuated by the odd pelican plunging into the surf, shooting up large sprays of water.
I think that Memories of the Sun would be a fitting name for a Pittsburgh blog. When I was younger, I loved dark, cloudy weather. I’ve never been a fan of the steel gray overcast that doesn’t change, but it didn’t bother me much. After experiencing it nearly nonstop for the past few months, it gets to you a bit. I didn’t fully realize that until I was standing outside today, in the park, with the sun out, and the sky deep blue beside swiftly moving cumulus clouds. I missed the sun.
My favorite Pittsburgh weather comes at the beginning of autumn, when the temperature begins to drop and the leaves are changing. It doesn’t seem to rain quite as much then and it can be quite beautiful.
I’ll be travelling south this week, to Oak Island, NC for my sister’s wedding. The North Kakalakee beach in early summer (which begins in April in the South) should be nice and refreshing. And just as summer begins there early, spring gets started late here. It has been less than 60 degrees out for most of the past week.
Suffice it to say, probably no blogging for the next week.
My taste in music is definitely in flux. Five years ago I would have found this intolerable, but now I can’t stop listening to it. I blame Pandora. The musical journeys it takes you on can be transformational.
Unfortunately the video stops before the song is over, but YouTube offers several full length suggestions immediately after. The videos themselves are all insane, so I didn’t want to endorse any. I just listen to the sound track in another tab and don’t watch them.
This question was a central theme in the movie The Nines, which I recommend. It also came up in Revolver, which I just watched tonight, though it wasn’t asked explicitly. Instead, the question is who is your worst enemy? The movie’s position is that it is not external, but internal. I think I can say that without spoiling anything. The trick is to avoid the lie that your perception is infallible. Pulling that off is a different matter altogether, though it is a helpful trait for a good scientist.
Figured I’d post this promo video the GWAP group did. Unfortunately, I wasn’t able to participate in the filming of it since I was visiting my dad and family in Ohio for the first time after many years. So unfortunate in that I missed the filming, but the alternative was worth it. Johnny Lee had a not insignificant role in the making of the video, I believe. Check out his stuff if you haven’t; he’s doing some pretty amazing things with Wii remotes.
I attended a Matlab training seminar yesterday with the dual topics of “Advanced Matlab Programming” and “Distributed and Parallel Computing.” Of the two, the Advanced section was more interesting, though my original motivation for going was the parallel computing part. In the morning, I felt like it was going to be a waste because my Matlab programming skills are weak, and if my advisor had not strongly suggested I attend, I might’ve skipped it. I’m glad he did, because it was surprisingly enjoyable and I felt like it was right on my level. This might be because programming in Matlab isn’t especially hard or different from other programming languages and I know enough to get by already. Or it might be because Matlab is becoming a little more like Python.
Today is the official opening day of GWAP: Games with a Purpose. This is one of two research projects I have been working on for the past few months, though my involvement with GWAP so far has only been in the form of attending meetings, minor testing, and offering my sage gaming advice (and by sage, I mean the herb). GWAP is the next phase in Luis von Ahn’s human computation project. If you visit and play some games, not only will you be rewarded with a good time, but you’ll be helping science! Science needs you. To play games. Now.
The Idea
Artificial intelligence has come a long way, but humans are still far better than computers at simple, everyday tasks. We can quickly pick out the key points in a photo, we know what words mean and how they are related, we can identify various elements in a piece of music, etc. All of these things are still very difficult for computers. So why not funnel some of the gazillion hours we waste on solitaire into something useful? Luis has already launched a couple websites that let people play games while solving these problems. Perhaps you’ve noticed the link to Google Image Labeler on Google Image Search? That idea came from his ESP game (which is now on GWAP).
The Motivation
What researchers need to help them develop better algorithms for computers to do these tasks is data. The more data the better. Statistical machine translation has improved quite a bit over the past few years, in large part due to an increased amount of data. This is the reason why languages that are spoken by few people (even those spoken by as few as several million) still don’t have machine translation tools: there is just not enough data. More data means more food for these algorithms which means better results. And if results don’t improve, then we have learned something else.
The Solution
Multiple billions of hours are spent each year on computer games. If even a small fraction of that time were spent performing some task that computers aren’t yet able to do, we could increase the size of the data sets available to researchers enormously. Luis puts this all a lot better than I can, and fortunately, you can watch him on YouTube (below).
So, check it out already.
The standard way of doing human evaluations of machine translation (MT) quality for the past few years has been to have human judges grade each sentence of MT output against a reference translation on measures of adequacy and fluency. Adequacy is the level at which the translation conveys the information contained in the original (source language) sentence. Fluency is the level at which the translation conforms to the standards of the target language (in most cases, English). The judges give each sentence a score for both in the range of 1-5, similar to a movie rating. It became apparent early on that not even humans correlate well with each other. One judge may be sparing with the number of 5’s he gives out, while another may give them freely. The same problem crops up in recommender systems, which I have talked about in the past.
It matters how well judges can score MT output, because that is the evaluation standard by which automatic metrics for MT evaluation are judged. The better an MT metric correlates with how human judges would rate sentences, the better. This not only helps properly gauge the quality of one MT system over another, it drives improvements in MT systems. If judges don’t correlate well with each other, how can we expect automatic methods to correlate well with them? The standard practice now is to normalize the judges’ scores in order to help remove some of the bias in the way each judge uses the rating scale.
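To make the normalization idea concrete, here is a rough sketch in Python. I am assuming per-judge z-scores (subtract the judge's mean, divide by the judge's standard deviation), which is one common way to do it; evaluation campaigns may normalize differently, and the judge names and ratings below are made up.

```python
from statistics import mean, pstdev


def normalize_by_judge(scores):
    """scores: dict mapping judge -> list of raw 1-5 ratings.

    Returns the same shape with each judge's ratings converted to z-scores.
    """
    normalized = {}
    for judge, ratings in scores.items():
        mu = mean(ratings)
        sigma = pstdev(ratings)
        if sigma == 0:
            # A judge who gives every sentence the same score carries no signal.
            normalized[judge] = [0.0 for _ in ratings]
        else:
            normalized[judge] = [(r - mu) / sigma for r in ratings]
    return normalized


# Made-up ratings: both judges rank the sentences the same way,
# but one is stingy with high scores and the other hands them out freely.
raw = {
    "strict_judge": [1, 2, 3, 2, 1],
    "generous_judge": [3, 4, 5, 4, 3],
}
z = normalize_by_judge(raw)
print(z)
```

After normalization the two judges' scores line up, because the only difference between them was where they sat on the 1-5 scale.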
Vilar et al. (2007) propose a new way of handling human assessments of MT quality: binary system comparisons. Instead of giving a rating on a scale of 1-5, they propose that judges compare the output from two MT systems and simply state which is better. The definition of what constitutes “better” is left vague, but judges are instructed not to specifically look for adequacy or fluency. By mixing up the sentences so that one judge is not judging the output of the same system (which could introduce additional bias), this method should simplify the task of evaluating MT quality while leading to better intercoder agreement.
The results were favorable and the advantages of this method seem to outweigh the fact that it requires more comparisons than the previous method required ratings. The total number of ratings for the previous method was two per sentence: O(n), where n is the number of systems (the number of sentences is constant). Binary system comparisons require more ratings because the systems have to be ordered: O(log n!), which grows as O(n log n). In most MT comparison campaigns the difference is negligible, but it becomes increasingly pronounced as n increases.
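Here is a quick Python sketch of how the two counts grow per sentence. It assumes two ratings (adequacy and fluency) per system for the old method, and uses ceil(log2(n!)), the standard lower bound for comparison-based sorting, for the binary method. The function names are mine, and the paper's actual procedure may need a different number of comparisons.

```python
import math


def ratings_needed(n_systems):
    """Absolute ratings per sentence: adequacy and fluency for each system."""
    return 2 * n_systems


def comparisons_needed(n_systems):
    """Lower bound on pairwise comparisons needed to fully order n systems."""
    return math.ceil(math.log2(math.factorial(n_systems)))


for n in (2, 5, 10, 50):
    print(n, ratings_needed(n), comparisons_needed(n))
```

The first count is linear in n while the second grows like n log n, so the gap only opens up once there are a lot of systems in the comparison.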
What would be interesting to me is a movie recommendation system that asks you a similar question: which do you like better? Of course, this means more work for you. The standard approaches for collaborative filtering would have to change. For example, doing singular value decomposition on a matrix of ratings would no longer be possible when all you have are comparisons between movies. Also, people will still disagree with themselves (in theory). You might say National Treasure was better than Star Trek VI, which was better than Indiana Jones and the Last Crusade, which was better than National Treasure. You’d have to find some way to deal with cycles like this (ignoring it is one way).
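For what it's worth, spotting a cycle like that is easy if you treat each "A is better than B" answer as an edge in a directed graph. Below is a small Python sketch using depth-first search; the function name and data layout are my own, and what to do once a cycle is found is a separate question.

```python
def find_cycle(preferences):
    """preferences: iterable of (better, worse) pairs.

    Returns a list of items forming a preference cycle, or None if there is none.
    """
    graph = {}
    for better, worse in preferences:
        graph.setdefault(better, []).append(worse)
        graph.setdefault(worse, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: the cycle is the part of the path from nxt onward.
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


prefs = [
    ("National Treasure", "Star Trek VI"),
    ("Star Trek VI", "Indiana Jones and the Last Crusade"),
    ("Indiana Jones and the Last Crusade", "National Treasure"),
]
cycle = find_cycle(prefs)
print(cycle)
```

A recommender could drop the weakest edge in each cycle it finds, or just ignore cycles, as mentioned above.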
References
Vilar, D., G. Leusch, H. Ney, and R. E. Banchs. 2007. Human Evaluation of Machine Translation Through Binary System Comparisons. In Proceedings of the Second Workshop on Statistical Machine Translation. 96-103. [pdf]
I attended some of the final presentations of an undergrad class on Game Programming today with a friend. We went in expecting something more like a poster session, where people are arrayed around a room showing their work off to a few people who managed to crowd around them. The poster session is ideal for brief browsing, because you can skip anything you’re not interested in. Instead, it was a series of PowerPoint presentations followed by an on-screen demo.
Mayhaps you have used the Facebook app Likeness. It’s a fluff app, but has wide appeal since it does two things most people like: easy quizzes and comparisons with our friends. The graphic design that went into the app is a bit low-scale, but it gets the job done. If you haven’t used it, the concept is simple. You are presented with a quiz topic, like “What’s your addiction?” You are then presented with ten items that you must rank in the order specified by the question page (usually most to least favorite, or whatever). Once you have ranked the ten items, you are shown a screen that easily allows you to goof up and spam all your friends. But after that, it produces some sort of similarity score between you and all your friends who have taken it. I’ve never had a similarity below 46% and never one above 98%.
But it got me thinking, how exactly are they measuring this similarity?
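I don't know what Likeness actually does, but one plausible candidate is a pairwise-agreement score in the spirit of Kendall's tau: look at every pair of items and count how often two people put them in the same order. The Python sketch below is only a guess at that kind of measure, and the quiz items in it are made up.

```python
from itertools import combinations


def ranking_similarity(rank_a, rank_b):
    """rank_a, rank_b: lists of the same items, ordered from first to last.

    Returns the fraction of item pairs that both rankings order the same way.
    """
    if len(rank_a) != len(rank_b) or set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same items")
    if len(set(rank_a)) != len(rank_a):
        raise ValueError("rankings must not repeat items")
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    if not pairs:
        return 1.0
    agree = sum(
        1 for x, y in pairs
        if (pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y])
    )
    return agree / len(pairs)


# Made-up answers to a "What's your addiction?" style quiz.
mine = ["coffee", "chocolate", "tv", "games"]
friend = ["chocolate", "coffee", "tv", "games"]
score = ranking_similarity(mine, friend)
print(f"{score:.0%}")
```

With ten items there are 45 pairs, so a score like this can only take 46 distinct values between 0% and 100%.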
I wonder how many blog posts have this title? It’s just the catchy thing we bloggers love. I originally started with “Rowling Howling” and Google was saying twelve results (from blogs) but only three recently. I updated it to yowling when I saw there were no results from Google Blog Search.
Anyhow, Orson Scott Card, author of my beloved Ender’s Game, has a nice diatribe (oxymoron?) against J. K. Rowling’s latest misdeed (and I’m just hearing about this). The word diatribe often has an unsavory connotation against the issuer of said diatribe, but I want to be clear from the start that I think Card is perfectly in the right.
Apparently, some poor schmuck published a book that acts as a reference for the Harry Potter series called Harry Potter Lexicon. Said schmuck, according to Rowling, simply rearranged her work and so it represents a violation of copyright. The terrible yowling that JKR has committed during the course of this utter debacle is truly shameful. I loved the Harry Potter series, but since its completion, she is going downhill. I’m going to have to agree with Card here: I think she wants to be taken seriously. Why can’t people be content with mad cheddar? Would people be happier if they had the respect of millions of people rather than millions of dollars? People always say money can’t buy happiness, and it would seem to be correct, since she is nickel-and-diming this poor fool who raised his head an inch above the crowd-line. Just for once, I’d like someone to prove it to me (by giving me millions of dollars). The deal is, if I stay happy, I get to keep the money.
And completely unrelated, I decided I liked the word “crowd-line” and checked to see if it’s available. Unfortunately, the .com variation is taken, though .org is free. Estibot guesses the .com is worth $140 (compare that to mendicantbug.com, which is worth a whopping $340). Crowd-line with a dash dot com is available, though. Interestingly, Go Daddy is selling .info domains for $0.99 a year. Is that because they are trash and the refuge of spammers and online biz marketers? The only domain extension more reprehensible is .biz itself, which they are selling for even more. The day I type a .biz address into an address bar is the day I leave the interwebs for good.
I think end-of-semester stress is making me grumpy.









