You are currently browsing the monthly archive for June, 2008.

I just completed the final requirements of my Master’s degree today (the details of which I will save for a future post).  It has been a difficult road, and I’m glad it’s done.  I didn’t attend any sort of graduation ceremony, because I don’t go for that sort of thing at all.  Until today, it didn’t feel like the weight was off my shoulders.  Now I actually feel like celebrating!  But I won’t, because I’m a nerd.  I’m currently celebrating by working on a programming puzzle.  And surfing the blagoblag.

I still have a couple months of servitude to complete the requirements of my fellowship, but the degree is mine.

I’ve taken to calling her kitten. I was messing around with obscuring the flash. This shot had the flash obscured by an opaque object. I liked the look on her face, but the lighting was too low and led to noise in the capture. So I hit it with the sepia and film grain. The pictures are rather large, so I’m hiding them below the jump.

Wordle is a totally awesome tag cloud creator.  You can create a tag cloud from text or some del.icio.us username.  My rendition of my del.icio.us tags is below.

del.icio.us tag cloud for user ealdent created by wordle

[via SMRB]

Peter Turney posted recently on the logic of attributional and relational similarity. Attributes are features or characteristics of a single entity. Relations describe some connection between two entities, such as a comparison. We’ll denote a relation between two entities A and B as A:B. A relational similarity between two pairs A, B and C, D will be denoted as A:B::C:D. This is the standard SAT-style proportional analogy: A is to B as C is to D. An attributional similarity indicates that two entities share the same attribute (this could be to varying degrees, but in the boolean case, it’s either shared or it isn’t). An attributional similarity between A and B will be denoted as A~B. This is like saying \forall Z, A:Z::B:Z. I’m just giving a brief introduction here, but this is all in Peter’s post in greater detail, so I recommend reading that for more information.

This got me thinking about collaborative filtering (because, well, I’ve been thinking about it all the time for the past two years). Collaborative filtering exploits similarities between users to predict preferences for items the user has not seen. In the case of movie recommendations, like with Netflix, this means that users can recommend movies they have seen to similar users who have not seen those movies. There are many ways of doing this. At the heart of it, however, is this notion of relational and attributional similarity.

A: base user
B: movies rated by A
C: some set of other users
D: movies rated by C

We can’t just say that A:B::C:D, since A and C may be nothing like each other. If we constrain it to users with attributional similarity, then we arrive at the definition of collaborative filtering: A~C & A:B::C:D. Logically, it follows that B~D also holds. See Peter’s post for some properties of proportional analogies that make this more clear.

In the non-binary case, we can choose C to be a set of users whose similarity to A varies. Also, our measure of what exactly constitutes similarity can be any number of different metrics. From here, it seems pretty clear that the limit of collaborative filtering is bounded by the attributional similarity A~C. If (A~C) & (A = C) (complete similarity), then it follows that B = D, or else A \neq C. If A \neq C, then does it logically follow that B \neq D? I guess it depends on the similarity metric and how we are defining the differences in the sets of movies and the differences in the sets of users.
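To make the non-binary case concrete, here is a minimal sketch of user-based collaborative filtering in Python. It assumes cosine similarity as the measure for A~C and a similarity-weighted average as the prediction rule; both are just one choice among the many metrics mentioned above, and the user names and ratings are made up for illustration.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_c):
    """Attributional similarity A~C between two users' rating dicts."""
    shared = set(ratings_a) & set(ratings_c)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_c[m] for m in shared)
    norm_a = sqrt(sum(r * r for r in ratings_a.values()))
    norm_c = sqrt(sum(r * r for r in ratings_c.values()))
    return dot / (norm_a * norm_c)


def predict(base, others, movie):
    """Similarity-weighted average of the other users' ratings for `movie`.

    Returns None when no other user has rated the movie.
    """
    weighted = 0.0
    total = 0.0
    for ratings_c in others.values():
        if movie not in ratings_c:
            continue
        sim = cosine_similarity(base, ratings_c)
        weighted += sim * ratings_c[movie]
        total += sim
    return weighted / total if total else None


# Hypothetical data: A is "alice", C is everyone else.
alice = {"Primer": 5, "Juno": 4}
others = {
    "bob": {"Primer": 5, "Juno": 4, "12 Monkeys": 5},
    "carol": {"Primer": 1, "12 Monkeys": 2},
}

prediction = predict(alice, others, "12 Monkeys")
print(prediction)
```

Because bob's ratings resemble alice's far more than carol's do, his rating dominates the prediction. The more similar C is to A, the more the predicted ratings D line up with what A would rate.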

I wonder if there has been any work done in this area? I wasn’t able to find anything, but maybe I’m just not searching for the right thing. Is it even worth pursuing?

While browsing my various photo albums (all of them digital) for a proper head shot of myself, I was struck by my complete absence. I make appearances so rarely, it’s like I don’t exist in the photo history of my life. I’m always the cameraman. Really, I have so few pictures of myself throughout my entire life, I am basically a ghost. There was a period of time in the late 90’s when I had long hair. I don’t have a single picture of that (this post reminded me of that time). But I do have pictures of my babies. I found this while hunting.

My Australian shepherd Willow as a puppy.  Isn’t she as cute as a baby elk?

Willow gets nervous when Donna leaves.  She watches her like a hawk through the window until she disappears from sight.  Today I decided to take a picture of this, which meant opening the screen as well as the window.  However, Willow started getting a little too “jumpy” and I had to tug her away from the window before she tried to learn how to fly.

My Australian shepherd Willow fretting over my wife Donna being separated from the pack

The berries on the ground are mulberries.  We have several trees in our yard, so they are falling everywhere right now.  Daedalus is addicted to them, the little blighter.  It’s a constant tug-of-war when we’re outside.

Mulberry tree in our front yard

I need these all in one place:

It’s no coincidence a lot of that originated from Daniel Lemire. Posts like the ones above are a great reason to subscribe to his blog, even if you’re not into computer science.

If you know of any links that belong here, please let me know in the comments.

The North American Computational Linguistics Olympiad is an annual competition open to US high school students that introduces kids to computational linguistics at a much younger age than people normally hear about it. I didn’t hear about CL until I was three years into my undergrad program. The instant I did hear about it, I knew I wanted to do it. Most people I talk to about it look at me like I’ve just uttered a phrase of Klingon. I suspect most people don’t hear about it at all, or if they do, it’s sometime during their undergrad program and not at the beginning, when they might be better able to plan their educational career path. Also, CL is pretty much a graduate program and rarely taught before then. Granted, a lot of the maths involved is beyond what’s taught to high school students and early undergrads, but the linguistics is not. And thinking about linguistics computationally is not. So NACLO is doing an extremely valuable service which I support completely. And not just because one of my professors is one of the General Chairs of the organizing committee for it. She can no longer affect my grade and I have no need to suck up, so this is genuine. How’s that for full disclosure?

One of my google alerts popped up a post on a spam blog I tracked down to this original post, which talks about a lot of young kids doing some great things in science. In the post is an interview with last year’s winner, Adam Hesterberg. He said, “I’d never studied linguistics, and ‘computation’ sounded like boring calculation.” That reminded me of the fact that computation might mean a different thing for most people than it does for scientists. I’m no corpus linguist, so I’m not gonna try to find out right here. What I suspect is that computation has a more “hard work” connotation for people outside of science: it’s the “plugging and chugging” meaning. Inside science, it’s tacked onto the beginning of some other field to mean anything in that field that can be computed. Computational linguistics deals with the computable aspects of linguistic theories. A very quick search on wikipedia finds at least a dozen other computational fields:

Is it a good idea to use this name when approaching high school students? What about language technologies? Well, the competition isn’t about language technologies, it’s about critical problem solving in a linguistics setting. And trying to fit that into a competition name isn’t going to work, either. North American Critical Problem Solving about Linguistics Olympiad (NACPSLO)? It makes me think of narcolepsy.

So my proposal is North American Logic and Language Olympiad (NALLO). It’s easy to say (rhymes with hallow) and accurately describes the subject matter. Plus, I think it has broader appeal. A lot of kids are interested in logic, language, or both. It shakes free of the negative connotation of computation and draws kids where they can be introduced to it a little more easily. The downside is that it doesn’t mention linguistics directly, so that might trouble some people who are a little more traditional about their outreach.

What do you think?

If you follow news on the semantic web or new search engines, you may have heard of hakia. TechCrunch has done a small write up about their new semantic search API. TechCrunch is brutally hard on startups who aren’t fully operational, so there is a lot of criticism in that article that I take with a grain of salt. I like seeing startups open their services with APIs and I think they deserve some benefit of the doubt. Maybe I’m looking at it the wrong way, though, and the fact that TechCrunch does make such a stink ensures the startup will correct the problem asap, rather than farting around for a while.

This post contains no spoilers.

I rewatched Primer this week. I had seen it a couple years ago as one of the first movies I got from Netflix the first time I signed up. It was a successful recommendation. Since I was a kid, I have been totally intrigued with time travel and time travel movies. Time travel movies rank among my favorite films, like 12 Monkeys, Time Bandits, the Butterfly Effect, etc. Time travel books are great too, like The Time Traveler’s Wife. Thinking about the implications of being able to change things — and what happens when you do — filled many teenage hours.  An important part of my fascination then is resolving the conflicts inherent in time travel.  What happens if you change something in the past?  What are the rules in the movie or book?  Does the movie/book adhere to its own rules or do they screw up?

Primer is a time travel movie in a league of its own.  I think it’s pretty much impossible to fully grasp the first time through.  It is probably the most confusing movie I have ever seen (that is not “absurd” anyway).  It’s been bumping around in my head for the past couple years, driving me to see it again.  Mike D’Angelo in Esquire said it’s like “following the path of one blade on a high-speed ceiling fan.”  That’s a fairly accurate description.

This post is spoiler free.

I finally got to see Juno tonight. It’s been sitting at the top of my Netflix queue for nearly two months with a long wait. What a great movie! One of my favorite parts was the soundtrack. There were several great songs by Kimya Dawson (of the Moldy Peaches) and then a performance by the two leads of the Moldy Peaches song “Anyone Else But You.” The version sung in the movie is missing a few stanzas. My favorite of the missing ones is below (sung by Kimya):

“Up up down down left right left right B A start
Just because we use cheats
Doesn’t mean we’re not smart
I don’t see what anyone can see in anyone else
But you…”

Go geek references (and Thundercats)! And speaking of cheats, try using that cheat code in Google Reader (minus the start button at the end, of course).

And returning to Netflix: they are removing individual profiles from accounts as of September 1st. What a boneheaded, retardafreakin’ idea. Supposedly it will help them make the website better. I hope it’s a lot better since this change has me pissed.

Spore is probably the most anticipated game of the year.  Indeed, it has been anticipated for quite a while.  It’s by the same dude who did SimCity and the Sims, yada yada, if you want to know all that you can check out the myriad gaming articles out there who care a lot more about the particulars than I do.  The main thing of interest to me is the creature creator at this point, since Maxis just released a demo version of it.  You can also buy a non-disabled version for $10 (digitally starting at noon CST today).  The demo version limits the variety of parts you can add pretty significantly.  What it does let you see is how well it animates and interprets the morphology of the creatures you make.  And it’s pretty frickin’ cool.

Below is one of my creations, Otzertzen.

After hearing about it for weeks, I caved and decided to check out friendfeed last night [and again, ht @dpn]. In previous posts I mentioned something I like to call the information diaspora. This is the phenomenon created by posting all sorts of personal information about your likes, dislikes, thoughts, opinions, etc all over the web and your subsequent loss of that information because it can’t be managed. I can see friendfeed coming in handy for removing some of this problem. You can attach a number of different social networking sites, flickr, youtube, etc all to your friendfeed account. Whenever you post something new in one of these sites, that information will be updated on friendfeed for all of your friends (and yourself) to be able to view. It’s not the perfect solution, but it is a very big step in the right direction.

Check it out. As usual, my username there is ealdent and feel free to friend me.

Thanks to TwitPic, I can post these pics directly to twitter from my cell phone. Good times.

Daedalus in the tubes

I just noticed that on the first page of Google Image results for the query cutest dog ever is none other than the Daedalpuppy (reposted below).  The first picture on that list is pretty cruel.  :P  I think Daedal has those other dogs beat.

Daedalus as a puppy - the cutest dog ever

RedOrbit Blog of the Day 2008-06-06

RedOrbit named me one of their blogs of the day today. Go me! I had come across them a time or two before. They are a space/tech news site. Not bad for that sort of thing and certainly less spammy and clunky than Space.com.

A couple of days ago, I wrote a script that would tweet anything you plurked. Thanks to some code from Neville Newey (based on PHP code by Charl van Niekerk), the plurk.py script I wrote has been updated to both plurk your tweets and tweet your plurks. This should work on both Windows and Linux machines. If you have access to a Linux machine, I suggest setting up a cron job to take care of this. As I mentioned in the previous post, if you set up a cron job, be sure to change the path to plurkdb.dat to an absolute path. I have done the most testing on this with Python 2.4 on Linux.

This code is open source under the Creative Commons BSD license (changed from the Creative Commons 3.0 Attribution license that this blog uses). Neville’s code appears to be under CC:Attribution 2.5 for South Africa, by what I could glean from his site. I have considered making this an open source project under Google Code but have yet to take it all the way. Google sets a lifetime limit of 10 projects, so I will continue to hoard those against future need. If you make modifications to the code, please let me know and I will probably post them here and in the code for future releases, so we all win.

Note that the command line parameters have changed:

plurk.py <twitter username> <twitter password> <plurk username> <plurk password>

And of course, as with all software, use at your own risk.

Dapper

My friend Israel clued me in on Dapper a few weeks ago. I have played around with them a very small bit, but that was all it took to recognize their potential. The idea is simple, the implementation not so much. When you browse videos on YouTube, the layout of the search results is always the same. So why can’t something recognize this and treat any search result as an RSS feed, checking it periodically for changes? Enter Dapper. One thing that has bothered me for the past couple years is the fact that the ACM Technews does not have an RSS feed. WTF, ACM? Thanks to Dapper, now it does.
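The general idea of turning a repetitively structured page into a feed can be sketched in a few lines of Python. This is not how Dapper works internally (I have no idea what they do under the hood). It is a hand-rolled illustration that assumes every result on the page is a link with a known CSS class. The class name "result", the sample page and the example.com URLs are all made up.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ResultLinkParser(HTMLParser):
    """Collect (title, href) pairs for every <a> carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.items = []
        self._href = None  # set while we are inside a matching <a>
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def to_rss(title, link, items):
    """Render the scraped items as a bare-bones RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Scraped feed for " + title
    for item_title, href in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# Hypothetical page with two results and one navigation link.
page = """
<ul>
  <li><a class="result" href="http://example.com/1">First story</a></li>
  <li><a class="nav" href="/next">Next page</a></li>
  <li><a class="result" href="http://example.com/2">Second story</a></li>
</ul>
"""

parser = ResultLinkParser("result")
parser.feed(page)
feed_xml = to_rss("Example news", "http://example.com/", parser.items)
print(feed_xml)
```

Run something like this on a schedule against the real page and you have a crude feed. The hard part, which a service like Dapper takes off your hands, is working out the repeating structure without someone hardcoding it.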

Unfortunately, Dapper is not perfect. It took me a few tries to get my first dapp working (what they call a single instance of the service). Granted, it was on fairly complicated output (not ACM Technews). If the service you are trying to create a dapp of uses sessions, your attempt will probably fail (and if it doesn’t, let me know how you did it). They are still improving the service, though, so perhaps that will change.

If you are into information trapping, though, Dapper is a must have in your arsenal of traps.

If you want to use Plurk, but aren’t ready to leave Twitter, I wrote a little Python script you can use to automatically mirror your plurks on Twitter. This will not work for response plurks, but your main plurks will be extracted and posted to your Twitter account with the prefix “plurking:” followed by your plurk.

The resulting tweet looks like this:

sample of what the script outputs in twitter

Download the script and set it up as a cron job (or you could execute it manually). It should work with python 2.4 and later. It stores a plurkdb.dat file (which you should probably assign an absolute path to, depending on the behavior of cron on your system). This file is checked every time it is run to make sure that duplicate plurks aren’t being tweeted. You should pass the following parameters on the command line (or modify the script so they are hardcoded, if you want): <twitter username> <twitter password> <plurk username> <plurk password>. Update: see later post on updated plurk script.  And like with all software, use at your own risk.
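For the curious, the duplicate check boils down to something like the sketch below. This is not the code from the script itself. It assumes a modern Python 3, stores the seen plurk ids as JSON (the real plurkdb.dat format may differ), and uses made-up ids and function names. The database path is passed in explicitly, which is where you would supply an absolute path when running under cron.

```python
import json
import os
import tempfile


def load_seen(db_path):
    """Read the set of plurk ids that have already been tweeted."""
    if not os.path.exists(db_path):
        return set()
    with open(db_path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_seen(db_path, seen):
    """Write the set of tweeted plurk ids back to disk."""
    with open(db_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)


def new_tweets(plurks, db_path):
    """Return tweet texts for plurks not seen before, and record their ids.

    `plurks` is a list of (plurk_id, text) pairs; fetching them from Plurk
    and posting the results to Twitter are left out of this sketch.
    """
    seen = load_seen(db_path)
    tweets = []
    for plurk_id, text in plurks:
        if plurk_id in seen:
            continue
        tweets.append("plurking: " + text)
        seen.add(plurk_id)
    save_seen(db_path, seen)
    return tweets


# Two simulated runs against a throwaway database file.
db_path = os.path.join(tempfile.mkdtemp(), "plurkdb.dat")
first_run = new_tweets([("1", "hello"), ("2", "testing")], db_path)
second_run = new_tweets([("2", "testing"), ("3", "again")], db_path)
print(first_run)
print(second_run)
```

The second run skips plurk "2" because its id is already in the file, which is all the duplicate protection amounts to.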

Please let me know if you have any problems with it or see room for improvement. I hacked this out in a hurry, so …

Is it Hallowe’en already? A fellow NLP blogger (and twitterer) pointed me to Plurk just a few minutes ago. I have been messing with Twitter’s API over the past couple days, which hasn’t been as easy as you’d think since they are suffering from massive growing pains. Fetching the public timeline takes between 5 and 30 seconds. However, they just got like $15 million in funding, so maybe they’ll be able to address the issue. The even bigger question is whether they can turn this free advertising service (which is what it is partially becoming) into a revenue stream.

Plurk is basically Twitter with a makeover and some extra social features thrown in. It still has the 140 character status update style interface, but includes a function selection for each plurk (what they call qualifiers): you can say, think, ask, wish, etc. You can also add smileys. Rather than appearing as a series of boxes scrolling down the screen, your plurks appear as floating boxes on a side-scrolling timeline. Plurks of friends also appear on this timeline and the result is a more graphical and pleasing (to me) interface. You can reply directly to other plurks in the boxes and conversations are tracked very nicely. This is far superior to twitter, which requires you to visit the other person’s timeline and wade through their tweets to find previous tweets in a thread. With Twitter being slower than a drunken monkey with three broken legs, that’s even harder.

Preview of Plurk

As my esteemed colleague pointed out, however, scaling is an issue for any service like this. Ultimately, you are bound by how fast you can access the database. If Plurk becomes as popular as Twitter (and I have every reason to believe it won’t), it will also become bogged down. Also, Plurk is just getting started and has no discernible API (unless I’m just missing it). Twitter already has quite a few third party apps.

I must say, though, I am sorely tempted to abandon Twitter in favor of Plurk just for the fact that Plurk is accessible. The massive lag of Twitter is getting to me. Of course, if no one is there to listen to my ramblings, what’s the point?

About Me

Jason M. Adams

My name is Jason Adams and I work on opinion mining for a growing startup in Atlanta, GA.

Site Information

Contact me: jaso...@gmail.com

Creative Commons License

This work by Jason M. Adams is licensed under a Creative Commons Attribution 3.0 License.
