Netflix Prize just about wrapped up


It looks like some of the top players in the Netflix Prize competition have teamed up and finally broken the 10% improvement barrier.  I know I’m a few days late on this, though not because I didn’t see when it happened.  I’ve been battling an ear infection all week and it has left me dizzy, in pain, and with no energy when I get home from work.  I hesitated before even posting anything about this, since there is little I can add at this point that hasn’t already been said. I’ll just share a few thoughts and experiences for posterity and leave it at that.  I’m also going to eventually make the point that recommender systems are operating under a false assumption, if you read this all the way through. :)

I competed for the prize for a bit, trying out a few ideas with support vector machines and maximum margin matrix factorization [pdf] that never panned out.  We were getting about a 4% improvement over Cinematch, which put us way down the list.  Going further would mean investing a lot of effort into implementing other algorithms, working out the ensemble, etc., unless we came up with some novel algorithm that bridged the gap.  That didn’t seem likely, so I stopped working on it just after leaving school.  I learned a lot about machine learning, matrix factorization, and scaling thanks to the competition, so it was hardly a net loss for me.
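For anyone unfamiliar with how those percentages work: the contest scored entries by root mean squared error (RMSE) on held-out ratings, and "improvement" meant the relative drop in RMSE compared to Cinematch.  Here is a minimal sketch of that calculation.  The ratings and predictions are made up for illustration, and the function names are my own, not anything from the contest tooling.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))

def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to a baseline (0.10 means 10%)."""
    return 1 - new_rmse / baseline_rmse

# Toy data, invented for this example (not Netflix data).
actual_ratings = [4, 3, 5, 2]
baseline_predictions = [3, 3, 3, 3]      # stand-in for a baseline system
candidate_predictions = [3.5, 3, 4, 2.5]  # stand-in for a new model

baseline = rmse(baseline_predictions, actual_ratings)
candidate = rmse(candidate_predictions, actual_ratings)
gain = relative_improvement(baseline, candidate)
print(f"baseline={baseline:.4f} candidate={candidate:.4f} improvement={gain:.1%}")
```

The toy numbers are far more forgiving than the real thing, where squeezing out each additional percent took teams months.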

The one thing I regret is that the prize encouraged me and my advisor to spend more effort on the competition than we should have, which in turn meant we didn’t spend more time working on something tangibly productive for research.  Bluntly put, I think if we hadn’t wasted so much time on the competition, we could have worked on a different research problem more likely to produce a paper.  The lack of published research on my CV was the main reason I didn’t move on to get my PhD at CMU (at least, that’s what I was told by those close to the decision).  Hindsight is 20/20, and at the time, the shining glory of winning a million bucks and fame was delicious.  It also seemed like we had ideas that “maybe kinda sorta” were going somewhere.  That turned out to not be the case, but when admissions committees look at research experience, negative results = no results.

Many people have lauded the competition by saying that it has encouraged research in collaborative filtering and brought public attention to the field.  I was one of those people.  Others have criticized it for not focusing more on what people actually care about when using recommender systems — getting something useful and having a good experience!  And yes, Daniel Lemire, I’m thinking of you. :)  But I’m convinced that Daniel is right.  I remember reading in the literature that a 10% improvement is about what’s needed for someone to actually be able to notice a difference in recommender systems.  So maybe people will notice a slight improvement in the Netflix recommendations if these ideas are ever implemented.  Which is another problem — most of the stuff that led to winning the prize is so computationally expensive, it’s not really feasible for production.  Netflix recently released some improvements, and I didn’t notice a damned thing.  They still recommended me Daft Punk’s Electroma, which was a mind-numbing screen-turd.  And I must have seen every good sci-fi movie ever made, because there are no more recommendations for me in that category.  I have trouble believing that.

The point of a recommender system really shouldn’t be just to guess what I might happen to rate something at a given time.  The fact that introducing time makes such a big difference in improving performance in the competition seems like a ginormous red flag to me.  Sure, I can look back in time and say “on day X, people liked movies about killing terrorists.”  The qualifying set in the competition asked you to predict the rating for a movie by a user on a given date in the past.  Remember what I said about hindsight being 20/20?  How about you predict what I will rate a movie this coming weekend?  See the problem?

I will sound the HCIR trumpets and say that what recommender systems should really be looking at is improving exploration.  When I go looking for a movie to watch, or a pair of shoes to buy, I already know what I like in general.  Let me pick a starting point and then show me useful ways of narrowing down my search to the cool thing I really want.  Clerk dogs is a good first step on this path, though I think we’re going to have to move away from curated knowledge before this is going to catch fire.

Maybe I have this all wrong.  Maybe we need to discard the notion of recommender systems, since they are operating under the wrong premise.  We don’t need a machine to recommend something it thinks we’ll like.  We need a machine that will help us discover something we’ll like.  We need to be making discovery engines.  (Replace recommender system with search engine in most of what I just said and you’ll find that I have really been sounding the HCIR trumpets.)


Spice-addicted child stars in commercial

What the hell is up with fake blue eyes in commercials?  Some lousy Father’s Day commercial came on today and I was immediately struck by the little boy with spice addiction.  His eyes were glowing blue and when I paused it to get a closer look, it turns out he has no pupils.  This crappy photoshopping is ridiculous.  Do blue eyes really test so well you have to erase a child’s real eyes and replace them with electric blue ones? WTF people!

Young boy with spice addiction.


Wolfram|Alpha is making kids dumber?

While that statement is a bit premature, I guarantee you will hear a chorus of teachers belching that from their 1970s-era podiums in the coming months.  One of the coolest features of Wolfram|Alpha, being able to do annoying calculus problems at the drop of a hat, is going to mean cheating galore.  This point occurred to me immediately upon using W|A, and I tweeted that its usefulness would be much greater for high school students, but this was just brought to my attention again by an article in the Washington Examiner.  Computational physics teacher John Dell makes the following point:

“It no longer makes a lot of sense to spend lots of time teaching students to perform calculations that machines can do better.”

If you ask me, it never has made much sense.  Physics tests where I couldn’t use my graphing calculator were the dumbest things to me.  I imagine very few physicists these days do any serious calculations without the aid of a computer.  Why spend three hours chugging through something that a computer can give an answer to in 3 seconds?  It’s just dumb.

On the other hand, learning calculus is more than just coming up with an answer.  It’s easy to do simple calculus; any kid with pre-algebra knowledge can do it as long as they know what variables, fractions, and exponents are.  But that will only get you so far.

At the very simplest level, you might say the point of learning something is being able to apply it to a future problem.  Calculus does you no good if you never learn what sort of problems you can apply it to.  Stuff that you learn that has no application is trivia, and while that may be useful for winning Jeopardy, it’s not much help in real life outside of cocktail parties.  Most kids feel that the math they learn after ‘rithmetic is trivia.

What if, instead of a class of children who give up learning higher math because of brain-bending pages of complex equations, you had a class of children who grew up knowing how to ask the right questions of software to solve real-world problems?  I think we need to step back and ask ourselves whether we teach children what we do just because it’s somebody’s notion of “what kids ought to know” or because it is applicable to real problems.  Teaching trivia has a non-trivial effect.

Alternative Grad School


Seth Godin suggests that all these unemployed college grads put themselves to good purpose this year and spend some time really enriching themselves (ht to @johndcook for the link).  The laundry list of to-do items includes:

  • Spend twenty hours a week running a project for a non-profit.
  • Teach yourself Java, HTML, Flash, PHP and SQL.
  • Volunteer to coach or assistant coach a kids sports team.
  • Start, run and grow an online community.

And some other stuff you should visit his site to read for yourselves.  I like his picture.  It makes me think of posthumans.  John’s tweet about Seth’s post inspired the following exchange (reverse chronological order):


Twitter exchange between @johndcook and @gappy3000.

Regardless of whether Seth Godin can be taken seriously in this case, though I see nothing wrong with the spirit of his post, John made a very good point in reply and reminded me of what good music means to me.  When I find a song I really like, I listen to it again and again until I can no longer hear the words.  Instead the music makes me daydream and I find a lot of inspiration there.  Good times.

Having just come out of grad school, I encourage recent grads to consider Seth’s suggestion, assuming you have someone who can float you for a year.  Those student loans start nagging you around the six month marker, too.


To slay the sun


Sunsets are so beautiful.  But their beauty is transient.  I sit in front of my window and feel the peace of the moment.  I rarely feel peace anymore, so even a moment is a treasure.  But it is a peace haunted by the knowledge that soon the gold and peach streaks will flee from the sky, replaced by the sickly orange reflection of city lights off smog.  Oh, that I could slay the sun and stake its corpse forever to the sky.  Even then, the beauty would fade like the light from a squashed lightning bug.  Nothing lasts forever.

Image by Per Ola Wiberg.

Two new toys: G^2 and Bing

This week has given me two new toys to play with, and you could probably say both were bought at the dollar store.  The first was Microsoft’s release of Rebranded Live, aka Bing.  Bing’s search results have been poor (for me), but not much poorer than Google’s.  Just enough poorer for me to see no reason to really switch, which is very bad for Microsoft.  There are neat little features, like pop up feed links for blog posts and previews.  I like it, but it’s not much.  Where they shine is in image search, which incorporates similar image search already (Google still has theirs in Labs).  Google Similar Images knocked my socks off at first, but then it just seemed like it should be renamed Google Identical Images.  Not much diversity.  Bing got this part right.  The images are similar, not identical.  There is a diverse collection and the navigation is great.  Kudos, Live Labs, for that one.  Is it perfect?  Nope, but it’s better than what I was using.

The next toy was Google Squared, which inspired this tweet right after I tried it:

Google Squared.  You had me at hello.


Further playing around with it convinced me that this would have been a nice tool to have when I was doing ridiculous term papers in high school.  Term papers about crap I didn’t care about.  Basically random stuff.  G^2 is great for that, but really not very helpful otherwise.  It was pretty awesome finding out the number of victims of 30 different serial killers all at once, though.  As quality improves (assuming it does), this could be pretty useful.  Quality has to get there though.  90% of the time spent using it is trial and error, trying to find something that works.  I was able to add some sorting algorithms to a square, but couldn’t find a single column to add that actually had something in it (that wasn’t absurd).  Wolfram|Alpha is still the winner in the knowledge engine department, methinks.

Some Google Squared Results



It’s official – girls think CS sucks

New Image for Computing (NIC) is a project put together by WGBH and the ACM to spice up the image of computing professions among teens, especially girls and minorities.  They released a study showing that at least among boys, the mission has pretty much been achieved for minorities.  Black and Hispanic male teens have a more favorable image of computing as a profession than white males do.  Girls, on the other hand, think it really sucks.  45% of teen males think computing would make a very good profession, whereas only 10% of girls think so.  35% of girls think it’s a bad choice, as opposed to 10% of males.  Ouch!


The One Millionth English Word is ‘Rubbish’

Paul Payack of the Global Language Monitor is claiming the 1 millionth English word is coming soon.  He says a new English word is coined every 98 minutes, so the 1 million marker will arrive about 15 days hence.  The CBS article that tipped me off to this is pretty amusing in the quotes it selected from linguists, which resoundingly cried “bullshit.”  But the best quote came from Payack himself:

We believe words can be counted if you define them in the right way. You can count them like anything else in science. You can count how many atoms there are in the ocean.

Let’s think about counting the atoms in the ocean for a moment. What about where rivers flow into the ocean? Where is the boundary line? Salt and fresh water are mingling quite a bit and finding the exact boundary is pretty much impossible. If we draw an arbitrary line, surely we will get too much in one place and too little in another. Also, what about rain and evaporation? Counting the atoms would require an instantaneous snapshot of the entire ocean at the atomic level. It can’t be done.

You run into similar problems counting words.  Compound words blend into single words and words leave the language as well as enter it.  How do you detect this?  You’d need a snapshot of the entire English language as it is spoken, typed, and read all around the world.  What is a word in one dialect isn’t necessarily a word in another dialect.  Where do you draw the line?
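For what it’s worth, the timing claim itself is just arithmetic.  Here is a quick sketch that takes the two quoted figures (one new word every 98 minutes, about 15 days to go) at face value and works out what they imply.  The variable names are mine, and nothing here says anything about whether the count means anything.

```python
MINUTES_PER_DAY = 24 * 60

minutes_per_new_word = 98  # coinage rate quoted above
days_remaining = 15        # "about 15 days hence"

# Words coined per day at the claimed rate.
words_per_day = MINUTES_PER_DAY / minutes_per_new_word

# Words still needed to hit one million, if both quoted figures hold.
words_remaining = days_remaining * words_per_day

print(f"{words_per_day:.1f} words per day, roughly {words_remaining:.0f} words to go")
```

So the claim amounts to saying the count sat a couple of hundred words short of a million, which is a lot of precision for something with no agreed boundary.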

First Impressions of Wolfram|Alpha

Perhaps you’ve heard of the latest brainchild of the Wunderkind Stephen Wolfram, Wolfram|Alpha.  Matthew Hurst nicknamed it Alphram today and I agree that’s a much better name.  Wolfram|Alpha (W|A henceforth) is not a search engine, it’s a knowledge engine.  It will compete with Google on a slice of traffic that Google really isn’t all that hot in for now, comparative question answering.  When you ask Google something like “How does the GDP of South Africa compare to China?” you hope you get back something relevant in the first few results (spoiler alert:  you don’t).  When you ask that of W|A, you get exactly what you’re looking for.  Beautiful.  W|A’s so-called natural language interface isn’t perfect, though.  You get a lot of flakiness from it until you start to recognize what works and what doesn’t.

Now let’s be honest.  How often do we search for that kind of thing?  Not very often.  I think that’s partly because Google is notoriously bad at it.  Once we start to get a handle on what W|A is capable of, I think people will start expecting more of their friendly neighborhood search giant.  Google claims to have a few tricks up its sleeve, but everything I’ve seen out of Google lately has been such a disappointment I am deeply skeptical.  The new trick is called Google Squared and it returns search results in a spreadsheet format, breaking down the various facets of the things you are searching for.  In the demo, it shows stuff like rollercoaster drop speeds, heights, etc. when you search for roller coasters.  You can add to the square and do some pretty nifty stuff.  TechCrunch claims this will kill W|A.  I think the two could be complementary.  Based on the demo, I expect W|A will return results of a higher calibre, but will miss out on a lot of queries because the knowledge is just missing.  Google Squared appears to be doing something fuzzier and will return results that might be really bad.  So while W|A just says it doesn’t know, Google Squared will let you pick through the junk to find the gem.  Google Squared is expected to launch later this month in Google Labs.

Many have said that where W|A will really compete is against Wikipedia and I am inclined to agree.  There are plenty of things I go to Wikipedia for now that I probably will switch over to W|A for, like populations of countries, size of Neptune’s moons, and so on.  Wikipedia still wins for more in-depth knowledge on a topic.  W|A also does some pretty cool stuff when you search for the definition of a word (use a query like “word kitten”).  You learn that kitten comes from Classical Latin, and entered English about 700 years ago.  You can find out a similar thing (and go further in depth for the etymology at least) using the American Heritage dictionary on dictionary.com, but W|A requires less digging.

And this brings me around to a key point with W|A.  It’s an awesome factoid answering service.  It does it well and it does it in a pretty way.  Stuff you can find in more depth elsewhere you can get quickly and easily, but only superficially via W|A.  There are links to more information, though, so you don’t lose much by relying on W|A — assuming it has knowledge about what you’re looking for.  You’re still going to be more likely to hit a brick wall with W|A.

And of course, since Wolfram developed Mathematica, W|A is backed by it.  Enter an equation and you get some really handy math info back.  Need to quickly know the derivative of a fairly complicated equation?  Presto.  Probably the most satisfying feeling I got today was from a query similar to “what is the area under x^4+3x^2+4 from 1 to 8?”  Let’s see you answer that, Google Squared.
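If you want to check that area yourself, here is a minimal sketch using the antiderivative x^5/5 + x^3 + 4x, worked out by hand.  It uses exact fractions so there is no rounding, and it is only a sanity check on the example query, not a claim about how W|A computes anything.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative of x^4 + 3x^2 + 4, i.e. x^5/5 + x^3 + 4x, evaluated exactly."""
    x = Fraction(x)
    return x**5 / 5 + x**3 + 4 * x

# Area under the curve from 1 to 8, by the fundamental theorem of calculus.
area = antiderivative(8) - antiderivative(1)
print(area, float(area))  # 35462/5 7092.4
```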

Wolfram|Alpha sample results


iPhone MMS and High Blood Pressure


Warning:  this is a rant as old as the iPhone.  I just have to vent into the interetherwebs or else explode.

Complaining to a friend, I’ve managed to work myself up into quite a state.  He just got an iPhone and so we’ve been discussing how frickin sweet they are.  But then I remembered a pix message I got a couple days ago and my blood began to rise in temperature rapidly.  For those without iPhones, when you get a multimedia message (picture, video), you are given a link that is clickable and a big ass username and password that you must either write down or remember and then manually enter into the site you’re taken to.

Ok, I understand they didn’t want to support multimedia messaging on the iPhone.  That was widely known when I bought it.  No big deal.  But here is my problem.  You can slap the username and password into the url and automatically log in from the link to view the message.  It is so easy to do, not doing it is ridiculous.  Security?  Bah!  The damn text message has the password in plain text.  I can only conclude that AT&T does not do so out of pure malice.
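To show how little work this is, here is a minimal sketch of building such a link.  Everything specific in it is an assumption for illustration: the host `mms.example.com`, the `/view` path, and the `username` and `password` parameter names are hypothetical, since I don’t know what the real viewer site expects.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def build_login_link(base_url, username, password):
    """Return base_url with the credentials attached as query parameters.

    The parameter names are hypothetical placeholders, not a real carrier API.
    """
    parts = urlsplit(base_url)
    query = urlencode({"username": username, "password": password})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

# Made-up values standing in for what arrives in the text message.
link = build_login_link("https://mms.example.com/view", "user1234", "s3cret")
print(link)  # https://mms.example.com/view?username=user1234&password=s3cret
```

Since the same credentials already arrive in the same plain-text message, a link like this exposes nothing the message doesn’t already expose.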

In summary, AT&T, please hire a Bangladeshi programmer and pay them the $2 for the fifteen minutes it would take to implement this.
