You are currently browsing the monthly archive for June, 2008.
I just completed the final requirements of my Master's degree today (the details of which I will save for a future post). It has been a difficult road, and I’m glad it’s done. I didn’t attend any sort of graduation ceremonies, because I don’t go for that sort of thing. At all. Until today, it didn’t feel like the weight was off my shoulders. Now I actually feel like celebrating! But I won’t, because I’m a nerd. I’m currently celebrating by working on a programming puzzle. And surfing the blagoblag.
I still have a couple months of servitude to complete the requirements of my fellowship, but the degree is mine.
I’ve taken to calling her kitten. I was messing around with obscuring the flash. This shot had the flash obscured by an opaque object. I liked the look on her face, but the lighting was too low and led to noise in the capture. So I hit it with the sepia and film grain. The pictures are rather large, so I’m hiding them below the jump.
Peter Turney posted recently on the logic of attributional and relational similarity. Attributes are features or characteristics of a single entity. Relations describe some connection between two entities, such as a comparison. We’ll denote a relation between two entities A and B as A:B. A relational similarity between two pairs, A:B and C:D, will be denoted as A:B::C:D. This is the standard SAT-style proportional analogy: A is to B as C is to D. An attributional similarity indicates that two entities share the same attribute (this could be to varying degrees, but in the boolean case, it’s either shared or it isn’t). An attributional similarity between A and B will be denoted as A~B. This is like saying that there exists some attribute Z such that A:Z::B:Z. I’m just giving a brief introduction here, but Peter’s post covers all of this in greater detail, so I recommend reading it for more information.
This got me thinking about collaborative filtering (because, well, I’ve been thinking about it all the time for the past two years). Collaborative filtering exploits similarities between users to predict preferences for items the user has not seen. In the case of movie recommendations, like with Netflix, this means that users can recommend movies they have seen to similar users who have not seen those movies. There are many ways of doing this. At the heart of it, however, is this notion of relational and attributional similarity.
- A: base user
- B: movies rated by A
- C: some set of other users
- D: movies rated by C
We can’t just say that A:B::C:D, since A and C may be nothing like each other. If we constrain it to users with attributional similarity, then we arrive at the definition of collaborative filtering: A~C & A:B::C:D. Logically, it follows that B~D also holds. See Peter’s post for some properties of proportional analogies that make this clearer.
In the non-binary case, we can choose C to be a set of users whose similarity to A varies. Also, our measure of what exactly constitutes similarity can be any number of different metrics. From here, it seems pretty clear that collaborative filtering is limited by the attributional similarity A~C. If (A~C) & (A = C) (complete similarity), then it follows that B = D, or else A ≠ C. If A ≠ C, then does it logically follow that B ≠ D? I guess it depends on the similarity metric and how we are defining the differences in the sets of movies and the differences in the sets of users.
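To make the A~C & A:B::C:D framing concrete, here is a minimal user-based collaborative filtering sketch in Python. It is one possible reading, not an established formalization: it assumes Pearson correlation over co-rated movies as the similarity metric, counts only positively correlated users as satisfying A~C, and uses a hypothetical three-user ratings table. Each movie in D that is missing from B is scored by a similarity-weighted average of the ratings it received.

```python
import math

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"Primer": 5, "Juno": 4, "12 Monkeys": 5},
    "bob": {"Primer": 5, "Juno": 4, "Time Bandits": 4},
    "carol": {"Juno": 2, "12 Monkeys": 1, "Time Bandits": 1},
}


def similarity(a, c):
    """A~C: Pearson correlation over the movies both users have rated."""
    shared = [m for m in a if m in c]
    if len(shared) < 2:
        return 0.0  # too little overlap to say anything
    mean_a = sum(a[m] for m in shared) / len(shared)
    mean_c = sum(c[m] for m in shared) / len(shared)
    dot = sum((a[m] - mean_a) * (c[m] - mean_c) for m in shared)
    var_a = sum((a[m] - mean_a) ** 2 for m in shared)
    var_c = sum((c[m] - mean_c) ** 2 for m in shared)
    if var_a == 0 or var_c == 0:
        return 0.0  # a flat rating profile carries no preference signal
    return dot / math.sqrt(var_a * var_c)


def recommend(ratings, user):
    """Rank movies in D (rated by others) that are not in B (rated by user),
    using only other users with positive similarity to the user."""
    seen = ratings[user]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(seen, theirs)
        if sim <= 0:  # this user does not satisfy A~C
            continue
        for movie, rating in theirs.items():
            if movie in seen:
                continue
            scores[movie] = scores.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    ranked = [(m, scores[m] / weights[m]) for m in scores]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


print(recommend(RATINGS, "alice"))  # [('Time Bandits', 4.0)]
```

Here bob correlates perfectly with alice and carol correlates negatively, so only bob’s opinion of Time Bandits counts. Swapping in a different metric (cosine, Jaccard on the sets of movies seen, and so on) changes which users count as similar, which is exactly the dependence on the similarity metric noted above.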
I wonder if there has been any work done in this area? I wasn’t able to find anything, but maybe I’m just not searching for the right thing. Is it even worth pursuing?
While browsing my various photo albums (all of them digital) for a proper head shot of myself, I was struck by my complete absence. I make appearances so rarely, it’s like I don’t exist in the photo history of my life. I’m always the cameraman. Really, I have so few pictures of myself throughout my entire life, I am basically a ghost. There was a period of time in the late 90’s when I had long hair. I don’t have a single picture of that (this post reminded me of that time). But I do have pictures of my babies. I found this while hunting.
Willow gets nervous when Donna leaves. She watches her like a hawk through the window until she disappears from sight. Today I decided to take a picture of this, which meant opening the screen as well as the window. However, Willow started getting a little too “jumpy” and I had to tug her away from the window before she tried to learn how to fly.
The berries on the ground are mulberries. We have several trees in our yard, so they are falling everywhere right now. Daedalus is addicted to them, the little blighter. It’s a constant tug-of-war when we’re outside.
I need these all in one place:
- Distractions make you dumb
- The Myth of Multitasking
- Research stamina
- Genius, Sustained Effort, and Passion
- How to become smarter
- How to be a genius
- Does one have to be a genius to do maths?
- Google makes us stupid [Added 2008-06-24]
It’s no coincidence a lot of that originated from Daniel Lemire. Posts like the ones above are a great reason to subscribe to his blog, even if you’re not into computer science.
If you know of any links that belong here, please let me know in the comments.
The North American Computational Linguistics Olympiad is an annual competition open to US high school students that introduces kids to computational linguistics at a much younger age than people normally hear about it. I didn’t hear about CL until I was three years into my undergrad program. The instant I did hear about it, I knew I wanted to do it. Most people I talk to about it look like I’ve just uttered a phrase of Klingon. I suspect most people don’t hear about it at all, or if they do, it’s sometime during their undergrad program and not at the beginning, when they might be better able to plan their educational career path. Also, CL is pretty much a graduate program and rarely taught before then. Granted, a lot of the maths involved are beyond what’s taught to high school students and early undergrads, but the linguistics is not. And thinking about linguistics computationally is not. So NACLO is doing an extremely valuable service which I support completely. And not just because one of my professors is one of the General Chairs of the organizing committee for it. She can no longer affect my grade and I have no need to suck up, so this is genuine. How’s that for full disclosure?
One of my Google Alerts popped up a post on a spam blog, which I tracked down to this original post about a lot of young kids doing some great things in science. In the post is an interview with last year’s winner, Adam Hesterberg. He said, “I’d never studied linguistics, and ‘computation’ sounded like boring calculation.” That reminded me that computation might mean something different to most people than it does to scientists. I’m no corpus linguist, so I’m not gonna try to find out right here. What I suspect is that computation has a more “hard work” connotation for people outside of science: it’s the “plugging and chugging” meaning. Inside science, it’s tacked onto the beginning of some other field to mean anything in that field that can be computed. Computational linguistics deals with the computable aspects of linguistic theories. A very quick search on Wikipedia finds at least a dozen other computational fields:
- Computational biology
- Computational chemistry
- Computational economics
- Computational electromagnetics
- Computational engineering
- Computational finance
- Computational fluid dynamics
- Computational mathematics
- Computational mechanics
- Computational particle physics
- Computational physics
- Computational statistics
Is it a good idea to use this name when approaching high school students? What about language technologies? Well, the competition isn’t about language technologies, it’s about critical problem solving in a linguistics setting. And trying to fit that into a competition name isn’t going to work, either. North American Critical Problem Solving about Linguistics Olympiad (NACPSLO)? It makes me think of narcolepsy.
So my proposal is North American Logic and Language Olympiad (NALLO). It’s easy to say (rhymes with hallow) and accurately describes the subject matter. Plus, I think it has broader appeal. A lot of kids are interested in logic, language, or both. It shakes free of the negative connotation of computation and draws kids where they can be introduced to it a little more easily. The downside is that it doesn’t mention linguistics directly, so that might trouble some people who are a little more traditional about their outreach.
What do you think?
If you follow news on the semantic web or new search engines, you may have heard of hakia. TechCrunch has done a small write up about their new semantic search API. TechCrunch is brutally hard on startups who aren’t fully operational, so there is a lot of criticism in that article that I take with a grain of salt. I like seeing startups open their services with APIs and I think they deserve some benefit of the doubt. Maybe I’m looking at it the wrong way, though, and the fact that TechCrunch does make such a stink ensures the startup will correct the problem asap, rather than farting around for a while.
This post contains no spoilers.
I rewatched Primer this week. I had seen it a couple years ago as one of the first movies I got from Netflix the first time I signed up. It was a successful recommendation. Since I was a kid, I have been totally intrigued with time travel and time travel movies. Time travel movies rank among my favorite films, like 12 Monkeys, Time Bandits, the Butterfly Effect, etc. Time travel books are great too, like The Time Traveler’s Wife. Thinking about the implications of being able to change things, and what happens when you do, filled many teenage hours. An important part of my fascination then was resolving the conflicts inherent in time travel. What happens if you change something in the past? What are the rules in the movie or book? Does the movie/book adhere to its own rules or does it screw up?
Primer is a time travel movie in a league of its own. I think it’s pretty much impossible to fully grasp the first time through. It is probably the most confusing movie I have ever seen (that is not “absurd” anyway). It’s been bumping around in my head for the past couple years, driving me to see it again. Mike D’Angelo in Esquire said it’s like “following the path of one blade on a high-speed ceiling fan.” That’s a fairly accurate description.
This post is spoiler free.
I finally got to see Juno tonight. It’s been sitting at the top of my Netflix queue for nearly two months with a long wait. What a great movie! One of my favorite parts was the soundtrack. There were several great songs by Kimya Dawson (of the Moldy Peaches) and then a performance by the two leads of the Moldy Peaches song “Anyone Else But You.” The version sung in the movie is missing a few stanzas. My favorite of the missing ones is below (sung by Kimya):
“Up up down down left right left right B A start
Just because we use cheats
Doesn’t mean we’re not smart
I don’t see what anyone can see in anyone else
But you…”
Go geek references (and Thundercats)! And speaking of cheats, try using that cheat code in Google Reader (minus the start button at the end, of course).
And returning to Netflix: they are removing individual profiles from accounts as of September 1st. What a boneheaded, retardafreakin’ idea. Supposedly it will help them make the website better. I hope it’s a lot better since this change has me pissed.
Spore is probably the most anticipated game of the year. Indeed, it has been anticipated for quite a while. It’s by the same dude who did SimCity and the Sims, yada yada; if you want to know all that, you can check out the myriad gaming articles out there that care a lot more about the particulars than I do. The main thing of interest to me at this point is the creature creator, since Maxis just released a demo version of it. You can also buy the full version for $10 (digitally, starting at noon CST today). The demo version limits the variety of parts you can add pretty significantly. What it does let you see is how well it animates and interprets the morphology of the creatures you make. And it’s pretty frickin’ cool.
Below is one of my creations, Otzertzen.
Thanks to TwitPic, I can post these pics directly to twitter from my cell phone. Good times.
I just noticed that the first page of Google Image results for the query “cutest dog ever” includes none other than the Daedalpuppy (reposted below). The first picture on that list is pretty cruel. :P I think Daedal has those other dogs beat.

RedOrbit named me one of their blogs of the day today. Go me! I had come across them a time or two before. They are a space/tech news site. Not bad for that sort of thing and certainly less spammy and clunky than Space.com.
A couple of days ago, I wrote a script that would tweet anything you plurked. Thanks to some code from Neville Newey (based on PHP code by Charl van Niekerk), the plurk.py script I wrote has been updated to both plurk your tweets and tweet your plurks. This should work on both Windows and Linux machines. If you have access to a Linux machine, I suggest setting up a cron job to take care of this. As I mentioned in the previous post, if you set up a cron job, be sure to change the path to plurkdb.dat to an absolute path. I have done most of my testing with Python 2.4 on Linux.
This code is open source under the Creative Commons BSD license (not the Creative Commons 3.0 Attribution license that this blog uses). Neville’s code appears to be under CC:Attribution 2.5 for South Africa, from what I could glean from his site. I have considered making this an open source project on Google Code but have yet to take it all the way. Google sets a lifetime limit of 10 projects, so I will continue to hoard those against future need. If you make modifications to the code, please let me know and I will probably post them here and in the code for future releases, so we all win.
Note that the command line parameters have changed:
plurk.py <twitter username> <twitter password> <plurk username> <plurk password>
And of course, as with all software, use at your own risk.

My friend Israel clued me in on Dapper a few weeks ago. I have played around with them a very small bit, but that was all it took to recognize their potential. The idea is simple; the implementation, not so much. When you browse videos on YouTube, the layout of the search results is always the same. So why can’t something recognize this and treat any search result as an RSS feed, checking it periodically for changes? Enter Dapper. One thing that has bothered me for the past couple years is the fact that the ACM Technews does not have an RSS feed. WTF, ACM? Thanks to Dapper, now it does.
Unfortunately, Dapper is not perfect. It took me a few tries to get my first dapp working (what they call a single instance of the service). Granted, it was on fairly complicated output (not ACM Technews). If the service you are trying to create a dapp of uses sessions, your attempt will probably fail (and if it doesn’t, let me know how you did it). They are still improving the service, though, so perhaps that will change.
If you are into information trapping, though, Dapper is a must have in your arsenal of traps.
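Dapper’s internals aren’t described here, but the general idea of turning a page with a consistent layout into a feed can be sketched roughly as below. This is an assumed approach, not Dapper’s: the scraping step is left out, the items are hypothetical (title, link) pairs with placeholder example.com URLs, and the names new_items and build_rss are made up for illustration. It compares two snapshots of the page by link and wraps the results in a minimal RSS 2.0 document using Python’s standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def new_items(previous, current):
    """Return items in the current snapshot whose links were not in the
    previous one, i.e. what a feed reader would show as new."""
    old_links = {link for _, link in previous}
    return [(title, link) for title, link in current if link not in old_links]


def build_rss(feed_title, feed_link, items):
    """Wrap (title, link) pairs scraped from a results page in minimal RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Feed generated from a page with no RSS"
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link  # the link doubles as a stable ID
    return ET.tostring(rss, encoding="unicode")


# Hypothetical snapshots of the same results page taken at two different times.
yesterday = [("Story A", "http://example.com/a")]
today = [("Story B", "http://example.com/b"), ("Story A", "http://example.com/a")]
print(new_items(yesterday, today))  # [('Story B', 'http://example.com/b')]
print(build_rss("Example feed", "http://example.com/", today))
```

Keying on the link is the simplest choice for spotting changes; pages that shuffle or rewrite their URLs (for example, ones that embed session IDs) would need a different key.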
If you want to use Plurk, but aren’t ready to leave Twitter, I wrote a little python script you can use to automatically mirror your plurks on Twitter. This will not work for response plurks, but your main plurks will be extracted and posted to your Twitter account with the prefix “plurking:” followed by your plurk.
The resulting tweet looks like this:

Download the script and set it up as a cron job (or you could execute it manually). It should work with python 2.4 and later. It stores a plurkdb.dat file (which you should probably assign an absolute path to, depending on the behavior of cron on your system). This file is checked every time it is run to make sure that duplicate plurks aren’t being tweeted. You should pass the following parameters on the command line (or modify the script so they are hardcoded, if you want): <twitter username> <twitter password> <plurk username> <plurk password>. Update: see later post on updated plurk script. And like with all software, use at your own risk.
Please let me know if you have any problems with it or see room for improvement. I hacked this out in a hurry, so …
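The duplicate check described above can be sketched roughly as follows. This is not the actual plurk.py, which isn’t reproduced here: it assumes plurks arrive as hypothetical (plurk_id, text) pairs, stores one ID per line in the database file (the real plurkdb.dat format may differ), uses “plurking: ” with a trailing space as the prefix, and leaves out the Plurk and Twitter network calls entirely. The helper names load_seen, save_seen, and tweets_to_post are made up for illustration.

```python
import os
import tempfile

PREFIX = "plurking: "  # prefix described in the post; the trailing space is assumed


def load_seen(db_path):
    """Return the set of plurk IDs already mirrored (one ID per line)."""
    if not os.path.exists(db_path):
        return set()
    with open(db_path) as f:
        return {line.strip() for line in f if line.strip()}


def save_seen(db_path, seen):
    """Write every mirrored ID back so the next run can skip them."""
    with open(db_path, "w") as f:
        for plurk_id in sorted(seen):
            f.write(plurk_id + "\n")


def tweets_to_post(plurks, db_path):
    """Given (plurk_id, text) pairs, return tweets for plurks that have not
    been mirrored yet and record their IDs in the database file."""
    seen = load_seen(db_path)
    tweets = []
    for plurk_id, text in plurks:
        if plurk_id in seen:
            continue  # already tweeted on an earlier run
        tweets.append(PREFIX + text)
        seen.add(plurk_id)
    save_seen(db_path, seen)
    return tweets


# An absolute path matters under cron, whose working directory may differ.
db = os.path.join(tempfile.mkdtemp(), "plurkdb.dat")
print(tweets_to_post([("1", "hello"), ("2", "world")], db))  # ['plurking: hello', 'plurking: world']
print(tweets_to_post([("2", "world"), ("3", "again")], db))  # ['plurking: again']
```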