Posts Tagged ‘twitter’

The Road Ahead for TunkRank

Posted: 6 March 2009 in Uncategorized

Now that TunkRank has gone live, I am faced with some interesting choices.

First of all, I want to make the code open source. The only barrier to my doing that is that I have passwords saved in version control that really can’t be shared with the outside world. I didn’t even think about that being an issue until I was actually deploying it and saw someone mention it in a blog tutorial. There are ways of getting around this problem, and I’ll have to look into them before I can do it.
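One common way around the problem is to keep secrets out of the repository entirely and read them from the environment when the app starts. The sketch below is in Python rather than the Ruby the site runs on, and the variable names (TUNKRANK_DB_USER and friends) are made up for illustration; it only shows the shape of the idea, not how TunkRank actually does it.

```python
import os


def load_db_settings(env=None):
    """Build database settings from environment variables instead of
    values committed to version control.

    The variable names are hypothetical, chosen only for this sketch.
    """
    env = os.environ if env is None else env
    required = ("TUNKRANK_DB_USER", "TUNKRANK_DB_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail loudly at startup rather than connecting with blank credentials.
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {
        "user": env["TUNKRANK_DB_USER"],
        "password": env["TUNKRANK_DB_PASSWORD"],
        "host": env.get("TUNKRANK_DB_HOST", "localhost"),
    }
```

With something like this in place, the repository holds only the names of the settings, and each machine supplies its own values.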

Next is the issue of expanding the size of the explored social graph. Right now I have found about 2 million users and know the followers of 500 thousand of those. When I was doing everything in memory, it was very fast for me to expand the graph and relatively easy to merge new pieces into it. Now I am using a database (MySQL), and operations on the social graph are just not fast. So I need something better.

Also, I want to make the TunkRank scores available as a data set or at least via an API, so I need to look into ways of doing that. Merb makes it pretty easy to deliver results as xml or json — I just have to get around to doing it. Right now you can find user jimbob’s TunkRank score either by entering “jimbob” in the search box on the main page or by going to the URL http://tunkrank.com/score/jimbob. Extracting it via json or xml will be just a matter of going to http://tunkrank.com/score/jimbob.format.
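For anyone planning a client around that URL scheme, here is a small Python sketch of how the addresses would be built. The json and xml endpoints are not live yet, so treat the format suffix as an assumption based on the plan above; the function name is my own invention.

```python
from urllib.parse import quote

BASE_URL = "http://tunkrank.com/score/"


def score_url(username, fmt=None):
    """Return the score URL for a user, optionally with a format suffix.

    fmt may be None (the HTML page), "json", or "xml". The suffixed
    forms follow the planned scheme and are assumed, not confirmed.
    """
    path = quote(username, safe="")
    if fmt is not None:
        if fmt not in ("json", "xml"):
            raise ValueError("unsupported format: %r" % (fmt,))
        path += "." + fmt
    return BASE_URL + path
```

So score_url("jimbob") gives the page that exists today, and score_url("jimbob", "json") gives the address the JSON version is expected to live at.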

I need to provide additional scoring systems other than TunkRank, so that TunkRank can be compared against them. I’m not sure whether this is better served by just providing the data set and letting people play around in their own database, or whether I should provide alternate views. The former is more versatile; the latter will probably reach a larger audience.

Currently I show Google ads on TunkRank, mainly because I have spent a small amount of money on it and wouldn’t mind getting that back. If it starts making any real kind of money, that probably means the traffic has increased significantly and I will need to look at hosting it on EC2 or somewhere. I have no illusions that TunkRank will make me rich. I expect it will make me literally tens of dollars … poorer. :)

Finally, there is the issue of the score I display. I chose to show the percentile ranking because it’s easy to see where you are in comparison to other twitter users. If I just showed you the raw TunkRank score, you would have no frame of reference. My solution was to show you both. The downside is it groups all the “interesting” users into the top three percentiles. @NealRichter has put some thought into this, and I urge you to check out his post, leave some comments and help come up with a scoring mechanism that offers better granularity and yet lets you easily compare yourself to the rest of the world. Thus completes today’s desperate plea for crowdsourcing.

A couple months ago, Daniel Tunkelang posted an algorithm on his blog that attempts to emulate PageRank for Twitter. I implemented a toy version I dubbed TunkRank, and then suggested that name on his blog. It got some traction, so I figured what the heck and decided to implement it on TunkRank.com.

Now, there appeared to be a little debate on Daniel’s blog about whether it is actually emulating PageRank or something else, but I leave it to you to read the comments on his post if you’re interested. There are also plenty of ideas there on the best way to establish a measure of influence. I’ll limit the discussion in this post to the basics.

  1. The amount of attention you can give is spread out among all those you follow. The more you follow, the less attention you can give each one.
  2. Your influence depends on the amount of attention your followers can give you.

As a twitterer, your influence does not depend on how many people you follow. However, your usefulness as a follower does. Having higher influence depends on having many followers who follow relatively few people but are followed by many. Followers like that are more likely to pick up on your tweets, act on them, retweet them, whatever. You gain influence through the social graph thanks to their influence.

Therefore, your TunkRank score is a reflection of how much attention your followers can give you, both directly and indirectly through their own followers.
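To make the two rules concrete, here is a minimal in-memory sketch in Python (not the Ruby the site runs on). It assumes the formula as I understand it from Daniel's post: a user's influence is the sum, over each follower Y, of (1 + p × Influence(Y)) divided by the number of people Y follows, where p is the probability that a follower passes a tweet along. The default p and the fixed iteration count are arbitrary choices for illustration.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iteratively estimate influence scores.

    following maps each user to the set of users they follow.
    p and iterations are illustrative defaults, not tuned values.
    """
    users = set(following)
    for followed in following.values():
        users.update(followed)

    # Invert the graph: followers[x] is the set of users who follow x.
    followers = {u: set() for u in users}
    for u, followed in following.items():
        for v in followed:
            followers[v].add(u)

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        # Each follower y splits their attention evenly among everyone
        # they follow, and passes along a fraction p of their own influence.
        influence = {
            x: sum(
                (1.0 + p * influence[y]) / len(following[y])
                for y in followers[x]
            )
            for x in users
        }
    return influence
```

For example, with following = {"a": {"c"}, "b": {"c", "d"}}, user c gets all of a's attention and half of b's, for a score of 1.5, while d gets only the other half of b's, for 0.5. Nobody follows a or b, so they score 0.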

I implemented this algorithm in Ruby using Merb, MySQL, Capistrano, nginx, and ActiveRecord (and, of course, Git for version control). While my job involves working on a web app, my role has mostly been on back-end NLP stuff. I’m still quite new to the whole Rails-level-web-app-world. For those who don’t know, Merb is a framework similar to Ruby on Rails. So similar, in fact, that the two are merging and will become Rails 3. ActiveRecord is the object-relational mapper (ORM) that Rails uses. The standard ORM for Merb is DataMapper, but I stuck with something I’m more familiar with to limit the variables in my little project.

There are many aspects of getting a web app up and running that I had only heard about in passing — and many more I’m still lost on. But I figured implementing TunkRank would be an interesting place to start.

Phase I – Data Collection

As I said, I implemented TunkRank as a toy the same night that Daniel posted his algorithm. Things seemed to work out quite nicely and I liked it on theoretical grounds as a measure. When I decided to implement the real version, the task of hammering Twitter millions of times suddenly loomed. I suppose I thought there were maybe about 1 million active accounts on Twitter. I have harvested over 2 million before slowing my harvesting down in favor of other development. I have also collected about 40 million edges in the social graph (user A follows user B is one edge). Of the 2 million users I have encountered, those 40 million edges are for only 25% of them. I still haven’t gotten the followers for the remaining 1.5 million. When I do so, I’m sure I’ll discover another million or three users I haven’t seen yet.

I stopped where I did because I was using Ruby’s marshal functionality to dump the social graph to disk. Each dump was weighing in around 250 MB and it was exceeding Marshal’s ability to function. At this point I threw everything into a MySQL database. Bleh! I can’t even describe the pain in the ass that was. If I were to do that again, I would certainly use PostgreSQL, and may still do so. Better yet, I would use some sort of column store database.  But it’s in the MySQL db now and running ok (just ok, not great or even well). MySQL dies quietly and annoyingly at times.  I hate it.

Doing in ActiveRecord the operations I used to do in memory is mind-bogglingly slow by comparison, as you’d expect. Twitter just released the ability to pull all follower ids in one request, which would have made my life easier had it come sooner, but I can still benefit from it going forward. Also, I should have been storing more information about users than just the twitter username. Having to go back and collect that was slow and annoying, but it’s done.

Phase II – Implementing the Algorithm

The algorithm is simple to compute. Check out this gist for a version that calculates it using ActiveRecord. I’d post it here, but WordPress.com sucks and I’m stuck with it. The code uses ActiveRecord more than I’d like, so I rewrote it in SQL using twitter ids.  The gist for that is here.  The #{p} and #{self.twitter_id} are Ruby variables.
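Since the gists don't travel with this post, here is a rough idea of what one step of the calculation can look like when pushed into SQL. This is not the code from the gist: it is a Python sketch using SQLite as a stand-in for MySQL, and the table and column names (users, edges, following_count and so on) are assumptions made up for the example.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical two-table layout for the social graph."""
    conn.execute(
        "CREATE TABLE users ("
        " twitter_id INTEGER PRIMARY KEY,"
        " influence REAL NOT NULL DEFAULT 0.0,"
        " following_count INTEGER NOT NULL DEFAULT 0)"
    )
    # One row per edge: follower_id follows followed_id.
    conn.execute(
        "CREATE TABLE edges ("
        " follower_id INTEGER NOT NULL,"
        " followed_id INTEGER NOT NULL)"
    )


def tunkrank_step(conn, twitter_id, p):
    """Compute one user's next score from their followers' current scores."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM((1.0 + ? * u.influence) / u.following_count), 0.0)
        FROM edges e
        JOIN users u ON u.twitter_id = e.follower_id
        WHERE e.followed_id = ?
        """,
        (p, twitter_id),
    ).fetchone()
    return row[0]
```

The point of doing it this way is that the database does the join and the sum in a single query instead of loading one object per follower. Repeating the step for every user and writing the results back gives one full iteration.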

Phase III – Doing the Web App

The web app itself is both the most important step and the least fun for me. I very much enjoyed putting together the code to collect the Twitter social graph and then computing the TunkRank scores, but all the nuts and bolts of getting a web app up and running are tedious. Some of it is interesting. Merb isn’t so bad, though I feel like the documentation is shitty. There is an open source Merb book that is missing stuff in all the sections I needed the most. The API documentation isn’t bad, but isn’t easy to search for high level things that you would normally find in a tutorial. Nor should it be — it’s API documentation not a tutorial.

Fortunately, most things were easy enough that I could find a solution eventually. The whole deploying step is foreign to me, and I’m an apache noob so when it comes to balancing mongrel instances I’m like wtf?  Fortunately, I found a few tutorials I was able to piece together.

So the final product is hosted on my 1.8 GHz dual core Dell laptop with 2 GB RAM running Ubuntu 8.10. If you check it out, hopefully it won’t overtax my pathetic server and bring the site down. My data is becoming a little stale so if your username isn’t found, please be patient. When a new person is encountered, I queue them for processing.

Final Thoughts

You can also follow @tunkrank on Twitter. I originally had that account acting as a bot that tweets scores when it encounters influential users. Also,  I was having it auto-follow anyone it grades, but upon reflection, it occurred to me these two things were just plain spammy. I chalk it up to a bad decision in the dead of night. Instead I will just have it follow anyone who follows it.  See my twitter philosophy for how the account will be managed.  I will post updates there on changes, fixes, and up/downtime.

The TunkRank score itself can grow quite large, especially for users with a high number of followers. I present percentiles as the measure, so everything falls in the interval [0,100]. That does not properly reflect that someone in the 100th percentile can be almost 1000 times more influential than someone in the 99th. I’m open to suggestions about how better to show this information. Neal Richter had a few good ideas; perhaps I’ll try one of those. Still, though, I’m left feeling a little dissatisfied by all of the scoring mechanisms (my own included). As Neal pointed out, his ideas are starting points, and I’d like to hear what other people would like to see before proceeding with a different scoring method.
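To show the granularity problem in miniature, here is a Python sketch that puts a percentile next to a log-scaled score. The log scale is just one possible alternative I picked for illustration; it is not necessarily what Neal proposed, and the function names are my own.

```python
import math
from bisect import bisect_right


def percentile(score, sorted_scores):
    """Percentage of known scores that are less than or equal to score.

    sorted_scores must be sorted in ascending order.
    """
    return 100.0 * bisect_right(sorted_scores, score) / len(sorted_scores)


def log_score(score):
    """Compress raw scores so that each factor of ten adds about one point."""
    return math.log10(1.0 + score)
```

With the raw scores [1, 10, 100, 100000], the last two users land at the 75th and 100th percentiles, which hides the fact that one is a thousand times bigger than the other. Their log scores are roughly 2 and 5, which keeps that gap visible but is harder to read as "where do I stand against everyone else".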

Let me know what you think.

There are tons of posts by people who have various views on how Twitter should be used.  Here is mine.  Perhaps it is typical, perhaps not.

  1. I follow back anyone who follows me.  Unfollow me, I unfollow you, with very few exceptions.  For this I use friendorfollow.
  2. I heavily filter the tweets I normally read using twalala.  If you haven’t used twalala, it lets you filter out phrases you don’t want to hear about (like sports stuff) and mute people who are contributing more noise than value to your twitterstream.  You could just unfollow them, but see point #1.
  3. I block repeat follow/unfollow spammers.  These are people who follow you to get the follow-back and then unfollow to improve their follow/follower ratio.  They annoy me, I can’t help it.
  4. Even though I heavily filter the main stream I look at, I do try to occasionally dip into my larger stream and interact with others.

I find that I have a hard time following all of the conversations among the 50 or so people in my filtered list.  It would be a full time job to follow all conversations of all 394 people I currently follow, and even then there would be tons that slip through the cracks.  So to me, the twitterstream is just that.  A stream.  You step in and experience some of the water as it passes, and then step out.  You can’t lament the water that passed untouched.

So what do you think?  Is this an acceptable way of using twitter?

P.S.  Follow me on twitter.  I promise I’ll follow you back.  Also, thanks to Daniel for unwittingly encouraging me to write this post. :)

I hereby declare that the word literally has not lost its meaning, despite a rash of rumors to the contrary.

What would it even mean for a word to lose its meaning? A word can change from one meaning to another, certainly.  Maybe you could argue that a word that has dropped out of usage has lost its meaning.

You hear complaints of that sort all the time, but what is being missed is the fact that language is fluid. Meanings evolve as the need arises (and there are many kinds of needs). Speakers each carry a somewhat different representation of the language in their heads, and once like-minded speakers agree on a novel usage and adopt it into their own representations, language evolves.

The debate over literally is literally nothing new. Turning to old faithful, the American Heritage dictionary:

Usage Note: For more than a hundred years, critics have remarked on the incoherency of using literally in a way that suggests the exact opposite of its primary sense of “in a manner that accords with the literal sense of the words.” In 1926, for example, H.W. Fowler cited the example “The 300,000 Unionists … will be literally thrown to the wolves.” The practice does not stem from a change in the meaning of literally itself—if it did, the word would long since have come to mean “virtually” or “figuratively”—but from a natural tendency to use the word as a general intensive, as in They had literally no help from the government on the project, where no contrast with the figurative sense of the words is intended.

So literally has been known to be a general intensive for quite some time. Why the fuss now?

Twitter is my new linguistic data collection engine, btw.  Just some of the multitude of great results:

References

Dictionary.com, “literally,” in The American Heritage® Dictionary of the English Language, Fourth Edition. Source location: Houghton Mifflin Company, 2004. http://dictionary.reference.com/browse/literally. Available: http://dictionary.reference.com. Accessed: January 27, 2009.

Twitter Wordle

Posted: 10 January 2009 in Uncategorized

I was recently pointed to @miljoshi’s blog and a post on twitter word clouds (using Wordle, of course!).  My twitter background was made using Wordle from a sampling of text from my blog.  Tweetstats offers the ability to create a Wordle cloud automatically from your tweets, which is fairly cool.  Mine is below.  It’s dominated by twitpic, since I frequently use it for posting pictures.

Word cloud for my twitter stream.

Update: Here’s my wordle after removing some words that don’t reflect the content of my tweets as much (e.g. good, great, new, old, etc.). Good idea, Melinda!

Updated word cloud for my twitter stream.

Mars Phoenix gets a lame-ass epitaph

Posted: 5 November 2008 in Uncategorized

Well the Wired contest to come up with an epitaph for the Mars Phoenix lander has ended and the final choice blows, in my opinion.

Veni, vidi, fodi. (I came, I saw, I dug) 

The number three choice wasn’t so bad:

It is enough for me. But for you, I plead: go farther, still. 

My choice, as I mentioned before, was ranked at #4, so not too bad.  I scrolled down to the very end of the list and looked for the most hated epitaphs.  There were some real stinkers, to be sure, but also some funny ones.  Here are several of the turdiest:

  • this weather gives new meaning to the old saying, ‘blue balls in a nor’easter’
  • May he rest in peace ~~~Lance was here ’69~~~
  • Go to the light. Like great men and myths, (Elvis, Tupac, BigFoot, Nessie) your legend will live on after your tweetstream goes flatline.
  • Better Dead on Red. The First of what will be many efforts to raise us from the mire of our own making.

@MarsPhoenix is a twitter success story.  It’s also a NASA success story.  Oh and also a scientific success for all it has done on Mars.  As six months of night approach, the Phoenix probe has been slowly shutting down systems to finish analyses.  A couple of days ago, a dust storm diminished the daytime charging cycle enough that it caused the lander to go into hibernation.  NASA is going to try to revive it this weekend, but the prospects are grim.  Even more grim are the chances that the probe will awake come spring.  Temperatures at the Martian poles go so low in the winter that they fall below the minimum tolerance for electrical circuits.

But back to the Twitter success story.  As of right now, @MarsPhoenix has 37,284 followers.  That makes it one of the most followed users on Twitter.  For the past few months, NASA has been posting updates posing as the probe.  The updates take the form of first-person snippets of information and answers to questions from users.  Overall, it has been great PR, keeping people up-to-date on space exploration in a completely new way.  We can’t exactly have a live feed from Mars, but by personifying the probe and getting people involved, NASA has really done a lot for improving public involvement in the mission.

NASA has expanded their twittering to a whole host of other missions.  Most notable (to me) amongst them are the Cassini probe (which is orbiting Saturn),  the Lunar Reconnaissance Orbiter, and the Spirit and Opportunity rovers.  So if you twitter, they might be worth some of your time.

@MarsPhoenix posted the following earlier today:

I should stay well-preserved in this cold. I’ll be humankind’s monument here for centuries, eons, until future explorers come for me ;-)

In honor of its imminent passing, Wired is running a contest to find the best epitaph for Phoenix.  My current favorite is:  “Every robotic lander dies. Not every robotic lander truly lives.”  I’m getting a little choked up...

Twitrratr

Posted: 27 October 2008 in Uncategorized

Twitrratr is a new service that attempts to do sentiment analysis on Twitter (follow me while you’re at it).  According to their about page, they started off by tracking opinions on Obama but have since expanded to any term.  Enter a keyword and it searches twitter for occurrences.  It then assigns a sentiment to each post and returns percentages of positive, neutral, and negative tweets for that word.  You can also track your own sentiment by searching for @your-username.  I come up neutral, but there’s not a lot of data to go on there.

Their method appears to be fairly simple.  They have a collection of adjectives with sentiment values (negative, positive) and based on what appears in a given tweet, they can classify a sentence.  Of course, this is probably low recall (meaning it misses a lot of tweets that do express sentiment) since sentiment can be expressed without using adjectives.  I’m not sure if it tries to do anything with negation, but so far my scans of results look like it ignores it.
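Here is roughly what I imagine such a method looks like, as a Python sketch. This is my guess at the general approach, not Twitrratr's actual code, and the tiny word lists are invented for the example.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"good", "great", "awesome", "cool"}
NEGATIVE = {"bad", "awful", "terrible", "lame"}


def classify(tweet):
    """Label a tweet by counting lexicon hits; negation is ignored."""
    words = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def summarize(tweets):
    """Return the percentage of tweets carrying each label."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for tweet in tweets:
        counts[classify(tweet)] += 1
    total = len(tweets) or 1  # avoid dividing by zero on an empty search
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Both weaknesses show up right away. classify("not good") comes back positive because the negation is never looked at, and a tweet that expresses an opinion without using a word from the lists comes back neutral.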

So even though it’s pretty ghetto, it’s a nice toy.  If they care to extend the algorithm, they have some pretty cool data to work with.  I think it would be cool to get some (possibly donated, probably not paid) human effort together to tag some of their data to release as a research dataset.

Daedal in the tubes

Posted: 8 June 2008 in Uncategorized

Thanks to TwitPic, I can post these pics directly to twitter from my cell phone. Good times.

Daedalus in the tubes

A couple of days ago, I wrote a script that would tweet anything you plurked. Thanks to some code from Neville Newey (based on PHP code by Charl van Niekerk), the plurk.py script I wrote has been updated to both plurk your tweets and tweet your plurks. This should work on both Windows and Linux machines. If you have access to a Linux machine, I suggest setting up a cron job to take care of this. As I mentioned in the previous post, if you set up a cron job, be sure to change the path to plurkdb.dat to an absolute path. I have done the most testing on this with Python 2.4 on Linux.

This code is open source under the Creative Commons BSD license (not, as I originally wrote, the Creative Commons 3.0 Attribution license that this blog uses). Neville’s code appears to be under CC:Attribution 2.5 for South Africa, by what I could glean from his site. I have considered making this an open source project under Google Code but have yet to take it all the way. Google sets a lifetime limit of 10 projects, so I will continue to hoard those against future need. If you make modifications to the code, please let me know and I will probably post them here and in the code for future releases, so we all win.

Note that the command line parameters have changed:

plurk.py <twitter username> <twitter password> <plurk username> <plurk password>

And of course, as with all software, use at your own risk.