Now that TunkRank has gone live, I am faced with some interesting choices.
First of all, I want to make the code open source. The only barrier is that I have passwords saved in version control that really can’t be shared with the outside world. I didn’t even think about that being an issue until I was actually deploying the site and saw someone mention it in a blog tutorial. There are ways around this problem, and I’ll have to look into them before I can release the code.
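One common way around this is to keep secrets out of the repository entirely: read them from an environment variable, or from a config file that version control is told to ignore (for example with a .gitignore entry, if the repository is in Git), and document the file's expected layout in the README. The sketch below is in Python rather than the site's actual Ruby/Merb code, and the file path config/secrets.json, the variable TUNKRANK_DB_PASSWORD and the db_password key are hypothetical names chosen for illustration. JSON is used only because it ships with Python's standard library; a YAML file would work the same way.

```python
import json
import os
from pathlib import Path

# Hypothetical names for illustration only; nothing here comes from TunkRank itself.
DEFAULT_CONFIG_PATH = "config/secrets.json"
DEFAULT_ENV_VAR = "TUNKRANK_DB_PASSWORD"


def load_db_password(config_path: str = DEFAULT_CONFIG_PATH,
                     env_var: str = DEFAULT_ENV_VAR) -> str:
    """Return the database password without it ever living in version control."""
    # 1. An environment variable wins, so a deploy script can inject the secret.
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env

    # 2. Otherwise fall back to an untracked config file that exists only on the server.
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)["db_password"]

    # 3. Fail loudly rather than silently connecting without a password.
    raise RuntimeError(f"No password found: set {env_var} or create {config_path}")
```

With this arrangement the repository, and its entire history from that point on, can be published, since the only thing checked in is the code that looks for the secret.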
Next is the issue of expanding the explored portion of the social graph. Right now I have found about 2 million users and know the followers of 500 thousand of them. When I was doing everything in memory, it was very fast to expand the graph and relatively easy to merge the results. Now that I am using a database (MySQL), operations on the social graph are just not fast. So I need something better.
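As a rough illustration of why the in-memory approach was pleasant to work with, here is a Python sketch (not TunkRank's actual code) that stores the graph as a dictionary mapping each user to the set of their followers. Merging two partial crawls is then just a set union per user, and the two counts above fall out naturally: users discovered anywhere in the graph versus users whose follower lists have actually been fetched. The function names merge_crawls and graph_stats are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A crawl maps a user to the followers we fetched for that user (hypothetical layout).
Graph = Dict[str, Set[str]]


def merge_crawls(*crawls: Graph) -> Graph:
    """Union several partial crawls into one graph without mutating the inputs."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for crawl in crawls:
        for user, followers in crawl.items():
            merged[user] |= set(followers)  # copy, then union per user
    return dict(merged)


def graph_stats(graph: Graph) -> Tuple[int, int]:
    """Return (users discovered anywhere, users whose followers are known)."""
    discovered: Set[str] = set(graph)
    for followers in graph.values():
        discovered |= followers
    return len(discovered), len(graph)
```

Once the graph no longer fits in memory, each of these unions becomes a join or a batch of inserts in the database, which is where the slowdown comes from.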
Also, I want to make the TunkRank scores available as a data set, or at least via an API, so I need to look into ways of doing that. Merb makes it pretty easy to deliver results as XML or JSON; I just have to get around to doing it. Right now you can find user jimbob’s TunkRank score either by entering “jimbob” in the search box on the main page or by going to the URL http://tunkrank.com/score/jimbob. Getting it as JSON or XML will then just be a matter of going to http://tunkrank.com/score/jimbob.format, where format is json or xml.
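To make the URL convention concrete, here is a framework-agnostic Python sketch. It does not use Merb's API, and the field names user, score and percentile in the output are assumptions rather than the site's real response format. It splits an optional .json or .xml suffix off the /score/ path (a plain URL falls back to the HTML page) and serializes the same record either way.

```python
import json
import xml.etree.ElementTree as ET


def parse_score_path(path: str) -> tuple:
    """Split '/score/jimbob.json' into ('jimbob', 'json'); no suffix means 'html'."""
    prefix = "/score/"
    if not path.startswith(prefix):
        raise ValueError(f"not a score URL: {path}")
    name = path[len(prefix):]
    user, dot, ext = name.rpartition(".")
    if not dot:  # no extension: keep serving the normal HTML page
        return name, "html"
    return user, ext


def render_score(username: str, score: float, percentile: float, fmt: str) -> str:
    """Serialize one score record; the field names here are hypothetical."""
    record = {"user": username, "score": score, "percentile": percentile}
    if fmt == "json":
        return json.dumps(record)
    if fmt == "xml":
        root = ET.Element("tunkrank")
        for key, value in record.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

Splitting on the last dot is safe here because Twitter usernames cannot contain dots, so the suffix is never confused with part of the name.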
I need to provide scoring systems other than TunkRank so that TunkRank can be compared against them. I’m not sure whether that is better served by just providing the data set and letting people play around with it in their own databases, or by providing alternate views myself. The former is more versatile; the latter will probably reach a larger audience.
Currently I show Google ads on TunkRank, mainly because I have spent a small amount of money on it and wouldn’t mind getting that back. If it starts making any real money, that probably means traffic has increased significantly and I will need to look at hosting it on EC2 or somewhere similar. I have no illusions that TunkRank will make me rich. I expect it will make me literally tens of dollars … poorer. :)
Finally, there is the issue of the score I display. I chose to show the percentile ranking because it makes it easy to see where you stand in comparison to other Twitter users. If I just showed you the raw TunkRank score, you would have no frame of reference. My solution was to show you both. The downside is that the percentile lumps all the “interesting” users into the top three percentiles. @NealRichter has put some thought into this, and I urge you to check out his post, leave some comments and help come up with a scoring mechanism that offers better granularity yet still lets you easily compare yourself to the rest of the world. That concludes today’s desperate plea for crowdsourcing.
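For reference, one common way to compute a percentile rank is the mid-rank definition sketched below in Python: count the users scoring strictly lower, add half of any ties, and divide by the total. This is an assumed definition for illustration, not necessarily the exact formula TunkRank uses.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def percentile_rank(sorted_scores: Sequence[float], score: float) -> float:
    """Mid-rank percentile of `score` within an ascending sequence of scores."""
    if not sorted_scores:
        raise ValueError("need at least one score")
    below = bisect_left(sorted_scores, score)          # users with a strictly lower score
    ties = bisect_right(sorted_scores, score) - below   # users with exactly this score
    return 100.0 * (below + 0.5 * ties) / len(sorted_scores)
```

Because the rank depends only on position, it throws away how far apart the top users are. With 97 users scoring 1 and three users scoring 10, 100 and 1000, those three come out at 97.5, 98.5 and 99.5: a hundredfold difference in raw score moves the percentile by a single point, which is the bunching at the top described above.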



Open sourcing would be nice, though obviously you’d have to address the password concern. But I think the more interesting question is how to go about comparing TunkRank and other measures, e.g., follower counts or twitter.grader.com.
Also, how much did you spend on this in total?
I didn’t spend much, which is why I was hoping Google ads might actually pay me back, though I kinda doubt they will. I think I spent around $15 on Amazon EC2 (for building the initial social graph) and $11 on the domain name.
I definitely agree comparing it to other measures is more interesting. Open sourcing is more of an engineering problem, which means I’ll probably put it off as long as I can.
Just speaking aloud about your “problem”. In regard to the passwords, could you not just drop them in a config file which is ignored by your version control system – and simply put in the README the preferred layout (YAML or whatever) for said config file?
Also, in relation to MySQL, it can be a bit of a resource hog, I know – but have you ever had a look at CouchDB? In my experience it runs a lot leaner on memory, and the trade-off is merely disk space (which is cheap).
Just thinking aloud here – I think this project was pretty cool. Keep up the good work.
Yeah, part of my problem with the config stuff is that the system I use for deploying depends on the passwords being in version control. There is a way around it, I’ve seen; I just have to do it. Plus I can’t make the project’s version history up to now public, since it contains the files with the passwords. Anyone could just rewind to a point in the history when the passwords were there. So I’ll probably need to set up a new repository, import the code from scratch without those files, and then set up the Capistrano deploy without the passwords in version control.
I have looked at CouchDB, though I haven’t used it for anything more than a quick tutorial at a Merb conference. It looks cool, though, and I’ll definitely keep it in mind if I need to scale out.