You are currently browsing the category archive for the 'computer science' category.
Figured I’d post this promo video the GWAP group did. Unfortunately, I wasn’t able to participate in the filming of it since I was visiting my dad and family in Ohio for the first time after many years. So unfortunate in that I missed the filming, but the alternative was worth it. Johnny Lee had a not insignificant role in the making of the video, I believe. Check out his stuff if you haven’t, he’s doing some pretty amazing things with Wii remotes.
I attended a Matlab training seminar yesterday with the dual topics of “Advanced Matlab Programming” and “Distributed and Parallel Computing.” Of the two, the Advanced section was more interesting, though my original motivation for going was the parallel computing part. In the morning, I felt like it was going to be a waste because my Matlab programming skills are weak, and if my advisor had not strongly suggested I attend, I might’ve skipped it. I’m glad he did, because it was surprisingly enjoyable and I felt like it was right on my level. This might be because programming in Matlab isn’t especially hard or different from other programming languages and I know enough to get by already. Or it might be because Matlab is becoming a little more like Python.
Today is the official opening day of GWAP: Games with a Purpose. This is one of two research projects I have been working on for the past few months, though my involvement with GWAP so far has only been in the form of attending meetings, minor testing, and offering my sage gaming advice (and by sage, I mean the herb). GWAP is the next phase in Luis von Ahn’s human computation project. If you visit and play some games, not only will you be rewarded with a good time, but you’ll be helping science! Science needs you. To play games. Now.
The Idea
Artificial intelligence has come a long way, but humans are still far better at computers at simple, everyday tasks. We can quickly pick out the key points in a photo, we know what words mean and how they are related, we can identify various elements in a piece of music, etc. All of these things are still very difficult for computers. So why not funnel some of the gazillion hours we waste on solitaire into something useful? Luis has already launched a couple websites that let people play games while solving these problems. Perhaps you’ve noticed the link to Google Image Labeler on Google Image Search? That idea came from his ESP game (which is now on GWAP).
The Motivation
What researchers need to help them develop better algorithms for computers to do these tasks is data. The more data the better. Statistical machine translation has improved quite a bit over the past few years, in large part due to an increased amount of data. This is the reason why languages that are spoken by few people (even those spoken by as few as several million) still don’t have machine translation tools: there is just not enough data. More data means more food for these algorithms which means better results. And if results don’t improve, then we have learned something else.
The Solution
Multiple billions of hours are spent each year on computer games. If even a small fraction of that time were spent performing some task that computers aren’t yet able to do, we could increase the size of the data sets available to researchers enormously. Luis puts this all a lot better than I can, and fortunately, you can watch him on YouTube (below).
So, check it out already.
I attended some of the final presentations of an undergrad class on Game Programming today with a friend. We went in expecting something more like a poster session, where people are arrayed around a room showing their work off to a few people who managed to crowd around them. The poster session is ideal for brief browsing, because you can skip anything you’re not interested in. Instead, it was a series of power point presentations followed by an on-screen demo.
A French-built supercomputer beat a 5 dan Go master in France a couple weeks ago. Go is a game I became very interested in in January 2007. I played several thousand games between then and a month ago, when I deleted my account on an online turn-based Go server. My reason for quitting was that it was taking too much time I should be using for studying, and I was letting it frustrate me too much. Go is a game that requires mental peace. You know how when you became a Jedi, you had to let go of your anger? Same helps for Go. I’ll take it back up again at some point, because it is a great mental exercise, but my obsession was just becoming too great.
The reason I picked up Go in the first place was that it remained outside the reach of computers. Of course, it was only a matter of time before it too fell. And actually, it hasn’t yet. Just because it beat a 5 dan French master, doesn’t mean it can beat a 9 dan master from China. So we’ll see.
The method this system used to beat said master was a Monte Carlo method. These are brilliantly simple in theory. You basically generate a multitude of random games for a set of moves and score each resulting game state. The next move with the best scoring set of random game states is chosen. This can also be thought of as voting. A set of random models each vote for a move. The most (or strongest) votes win. And when 10,000 monkeys agree…
My blogarrhea has tapered off over the past few weeks due to a number of time constraints, stress, and anxiety. The wikipedia article for graphomania actually suggests blogging may be a form of graphomania. Currently, I’m visiting the University of Maryland tomorrow to take a look at the (cs) grad school and the CLIP (Computational Linguistics and Information Processing) laboratory. So maybe I won’t blog anything for a couple days or maybe not.
Anyhow, here are two views out of my hotel room of illustrious Silver Springs, Maryland. The direct view out is the roof of a Whole Foods.


So I was recently asked (and gave a very bad answer to) a question that has been haunting me ever since. What is the subfield of computer science where I am the strongest? First of all, in my undergraduate training, I was never really introduced to these ideas of subfields of CS explicitly. I knew intuitively there was a difference between people working on databases or on operating systems, programming languages or algorithms, but it wasn’t emphasized as a choice I would ever need to make. This is perhaps because I went to a relatively weak school in CS for my undergrad. But now that I’m in a rather strong CS school and pursuing a CS-related masters, the question should probably have entered my mind before now.
So when asked, I floundered about for an idea and spluttered out “algorithms” just because it seemed like it was hard to go wrong there. Well, I’ll leave the details out of this little memoire, but suffice it to say, I was wrong. A better answer would have been “none.” Where does natural language processing / computational linguistics fall in the list of subfields? Is it its own? Or is it part artificial intelligence, part algorithms, part whatever? I’ve seen it lumped with AI more closely in the past, but unfortunately AI escaped me as a possible choice when called upon in this high-stress scenario. Moreover, I haven’t really compartmentalized techniques as belonging to “AI” or “databases.” Is it useful to do that? I guess I do sometimes, but when people ask me to make big picture assessments of things I haven’t thought about much, it takes me a while to process it.
I hate interviews.
Dan Reed has posted an interesting article both on his blog and on the Computing Research Policy Blog about the many problems in computing education. Ever since the dotcom bubble burst, computer science enrollment at universities has declined and even more so for women. So many ideas have been tossed around out there, trying to figure out just where we’re going wrong. Recently I wrote about Robert Dewar’s views on where CS education has failed. He made the case that graduates of most CS programs are incompetent and that employers have to go through a period of re-education. Whereas Dewar sees the problem more in the fact that core principles are not being taught to students, Dan Reed makes the case that core principles are really not necessary for everyone.
Both viewpoints are nuanced and so lumping them into polar categories like that results in major inaccuracies. Reed is not making the point that students shouldn’t be taught about operating systems and Dewar is not making the point that students must be taught assembly language. While many CS graduates are incompetent, learning about operating systems and compiler design is totally worthless to most programmers. Sure, there are certain skills that could applied to other areas and learning stuff like that will give you an appreciation for the various aspects of the field, but most programmers are never going to build a compiler or an operating system. As computer science is increasingly being applied to other fields (biology, chemistry, physics, astronomy, etc), it is crucial for new software engineers to have specific skillsets that aren’t being taught (and I mean CS skills). Reed makes the point very clearly:
First, as researchers and technologists we seek to reproduce students in our technical image, failing to acknowledge that most of our students will not develop compilers, write operating systems or design computer chips. Rather, they benefit from training in logical problem solving, knowledge of computing tools and their applicability to new domains.
Like any entrenched system (bureaucracy), it is easy for computer science educators to fall prey to the lament that “CS grads these days are not like they used to be.” I’m going to go out on an anthropological limb and say that’s a human universal. The day will come (and I think it already has) when there is just too much core CS information to feed into our brains and to continue to try to cram that into young learners is going to result in spillover and disillusionment. There will always be people capable of soaking all of it up (though they will become rarer as the volume increases), but we must be aware of the futility of over-educating. Let me be clear, in a four year program, I believe it is more of a disservice to students to give them a shallow but broad understanding of the computing field (thereby making them incompetent) than it is to give them a deeper understanding of a subfield where they will be competent but lacking in other so-called core areas.
So I have a couple off-the-cuff ideas that need to be refined but which I want to put out there. All of these core principles can be boiled down into the true essentials, the things programmers actually need to know to do their jobs. Instead of having classes on computer architecture, operating systems, compilers, etc., combine those concepts into one or two classes with a name like “Core Computing Principles.” As Reed points out, the focus should be on teaching algorithmic problem solving skills and logic. From there, students can pursue different directions like theoretical CS, natural language processing, or large-scale systems. An undergraduate education that puts a stronger focus on statistical methods would have been hugely helpful for me. Having a broad range of options that are mapped out for students who really don’t have a clue how to get there, but know basically where they want to go, would be great.
In any case, there are many views and some will side with Dewar, some with Reed. Ultimately I think the field will settle closer to Reed’s side. I’m looking forward to hearing some of the ideas the CRA-E committee that Reed mentioned (pdf) will come up with.
Ed Clarke, a professor of Computer Science at CMU, just won the 2007 ACM Turing Award. The ACM is the Association for Computing Machinery and is the oldest professional group for the computing industry. I first became a member in 2005 and have maintained that membership since. The Turing Award is given in honor of Alan Turing, the father of computer science (most would agree). This award is basically the Nobel prize of computer science (since they don’t give Nobels for CS) and is meant to recognize individuals who have made a lasting and significant contribution to the computing field.
Ed’s work was in conjunction with two other people: E. Allen Emerson and Joseph Sifakis. Their work was on model checking, which is a way of determining whether a hardware or software structure is a model of a logical formula. So if a structure matches a formula in propositional logic, it checks.
Clarke joins three other professors at CMU who are Turing recipients. Raj Reddy was co-awarded it in 1994 for large scale AI systems. Manuel Blum won it in 1995 for his work on computational complexity theory. Dana Scott won it in 1976 for non-deterministic finite state machines, something that has a major role in natural language processing (and computational linguistics).
Just what value is there in getting a degree in Computer Science (CS)? Are new graduates competent programmers? Is that the purpose of a CS degree? Should companies be spending money to train new hires out of college in the programming languages and practices that they use?
Robert Dewar is a professor emeritus at NYU in computer science, and he believes that the status of software engineers in America is in danger due to general incompetence of new graduates. The long and the short of it is that after the dot-com bubble burst, and computer science enrollment at universities plummeted, schools restructured their programs to be more fun. Essentially, they were dumbed down. Specifically, the focus has shifted away from math and the theory of computation. Students are not taught a wide range of programming practices, but instead are trained to rely on large software libraries in a sort of “cookbook” approach. That is, students can assemble a solution to a known problem (in Java), but they are woefully undertrained for solving actual problems in the wild with “more practical” programming skills.
Today is Donald Knuth’s 70th birthday. If you haven’t at least heard of him, then you probably are not a programmer. I’ve heard several bloggers refer to him as a modern-day Alan Turing (who is widely considered the father of computer science). Knuth is sometimes referred to as the father of algorithmic analysis, so at the very least, his contributions to the field should definitely earn him a place of high regard.
While I’ve never read any of his books, I have used one of the tools he created quite extensively in the past two years: TeX. For those who’ve never had the pleasure of using TeX and seeing documents come out beautifully and professionally formatted with relatively little effort, you’re missing out. Some might argue that you’re missing out on hours of headaches for something you could do in Microsoft Word in 1 minute. I would argue back that while getting TeX to do exactly what you want can sometimes be hard, there are things you can do in TeX very easily that you will never, ever be able to do in Word. Try producing a lower case delta with a hat in Word. Unless you are lucky enough to have a font on your computer with it (and please send me a copy of that font if you do), you will be searching a long time.
There are many Knuth tributes out there from people with far more interesting stories than me. There was an even a call to post, issued by Jeff Shallit. Here are a few:
-
Recursivity - biographical notes and discussion of Knuth’s impact on his life (Jeffrey Shallit)
-
Computational Complexity - some observations about his achievements, his books, and TeX
-
Good Math, Bad Math - a lot about TeX if you’re interested
-
Geomblog - a discussion of something from the second volume of his book The Art of Computer Programming
-
Shtetl-Optimized - more in-depth observations of Knuth’s many contributions
-
in theory - more biographical info and background
-
0xDE - a pretty remarkable Knuth tribute with some very interesting CS stuff, complete with exercises!
Greg Linden and Daniel Lemire have both written a little about the Netflix Prize and whether the systems that are doing the best are really worth anything. The KorBell system that recently won the Progress Prize consists of 107 different parts in an ensemble system (Note: the team of Bob Bell and Yehuda Koren at AT&T goes by BellKor and KorBell on the Netflix leaderboard). The paper is interesting for two reasons: the ensemble method being used and the fact that only about 3 or 4 of those components are doing the heavy lifting. Actually, I have no idea whether the actual ensemble algorithm they use would be especially interesting to anyone else, but as I have no experience with ensembles in this context, it was interesting to me.
I mentioned the esoteric programming language brainfuck a little while back. It consists of 8 operations and was created in order to make the smallest compiler in the world (I think the current best is 174 bytes). I was reading a post over on Good Math, Bad Math that defines arithmetic in terms of sets. Pretty basic if you’ve done anything with set theory, but Mark has a clear way of explaining things so I usually try to read all of his posts. I’ve been playing catch-up today. It struck me immediately how closely the set form that Mark describes matches the syntax/logical structure of brainfuck. So I decided to play around a little. Read on for more.
Either the coolest or the stupidest programming language in the world, brainfuck was designed by Urban Müller in order to create the world’s smallest compiler of a Turing-complete programming language. Originally his compiler was 240 bytes in size, but he reportedly got it down to about 200 bytes. Others have gotten it below that. The language consists of only 8 operations, which I will go into after the jump.
Japanese electronics use is perhaps a faulty bellwether for the American market. Whereas new gadgets are often available in Japan long before they make their appearance (if ever) in the US, there are also interesting cultural differences that don’t always translate popularity. There does seem to be a trend in the area of PC sales, however. An AP article today points out that PCs are taking a less important role in Japanese households with the emergence of smart phones, consoles that can reproduce many PC functions (web browsing, gaming, playing DVDs & music), and flat screen TVs (versus flat screen monitors, say). If you can check your email on your phone, listen to music on your iPod, download music on your Wii, and play games on your 52″ LCD, why would you want a computer in your home? Note: throughout this post I will use the term PC in the general sense of computer, rather than specifically as an IBM-compatible PC.
So this got me thinking about what a PC is good for and why I liked it back in the day (well, I still like it).
So I’ve been reading A New Kind of Science by Stephen Wolfram, the creator of Mathematica. It was hyped up big time back when he first wrote it, since he had gone silent for a number of years, hinting that he was about to do something big. So my middle little sister got me the book for Christmas (cuz she rocks) and I cracked it open a few times. It’s about 846 pages of text (yipes!) and then another 351 pages of notes. Quite daunting. So I put it down and have meant to pick it back up a thousand times. Today I was needing a diversion because a particular C++ issue was giving me fits.
In Chapter 2, Wolfram introduces a fairly simple 2-dimensional cellular automata (one spatial dimension, one temporal dimension). The temporal dimension can be plotted as another spatial dimension producing a nice little spreadsheet style graph. Each cell of the graph can be considered a bit. Depending on whether the bit is set, the cell is either shaded or not. So the single line in the spatial dimension contains some initial setting. Let’s say there is one single bit set in the middle of the line, so it might look like this:
000000000010000000000
Update and note: Be sure to check out the comments. It turns out that while the MacArthur site does not list Paul Rothemund as a computer scientist, he actually is. So there is indeed at least one CS genius for 2007.
Well the 2007 MacArthur fellows were announced and it looks like no computer scientists made it onto the roles this year. The last time there were no computer scientists on the list was 2001. This is probably just a blip, but if the trend continues, it might be a warning sign that computer science is producing fewer visionaries. In a time when the intertubes are booming with social networking activity, this seems odd. Perhaps the problem is that there are too many players and no one stands out, even though as a whole the products are changing the world. At least two of the geniuses are doing some work that appears to draw a lot on computer science but deals more with computational applications of DNA. That is, the DNA is the computer. Maybe this isn’t such a bad sign for CS, but a sign that CS by itself is diminishing and interdisciplinary applications of CS are on the rise. I know I’m not the first person to say that, so if you have a quote or a reference, please leave a comment.
My picks for top geniuses are:
- Paul Rothemund - Caltech - DNA computation and nanotechnology. Awesome stuff. He has used
algorithmic self assembly methods with DNAscaffolded DNA origami to construct images like a map of the western hemisphere (see picture). - Saul Griffith - Squid Labs - One of his inventions could bring cheap eyeglasses to poor communities throughout the world. Another was a handheld human powered generator. He’s also into nanotech, so we could see some cool stuff there.
- Michael Elowitz - Caltech - More algorithmic coolness with DNA and DNA circuits.
I came across a story on NPR today about why women read more than men. They quote from Louann Brizendine who wrote the book The Female Brain. The issue of gender differences and the brain always starts fights. Men have larger brains and more gray matter, which handles information processing. Women have more white matter and thus more interconnectivity between parts of the brain. The prefrontal lobe in women is more densely packed with neurons, and that is the area responsible for judgment, planning and language. Here is a quote from the article:
“Girls have an easier time with reading or written work, and it’s not a stretch to extrapolate [that] to adult life,” Brizendine says. Indeed, adult women talk more in social settings and use more words than men, she says.
Woah nelly! Brizendine hasn’t been doing her reading, because tons of contrary evidence to this crap has been out for a while. And trying to find that link, I discovered that Language Log has already done a pretty extensive commentary on this article. When it comes to matters of language, it’s hard to scoop them. The long and the short of the Language Log commentary is that the so-called gap in fiction sales could be accounted for entirely by sales of romance books.
They have compiled a great list of nutty captchas caught in the wild. The best is certainly the Russian one, as many others have agreed. Captchas are particularly interesting to me since it looks like I’m going to be working with Dr. Luis von Ahn next semester on the GWAP project (games with a purpose).





