Posts Tagged ‘code’

I just published the simple-random Ruby gem, which is ported from C# code by John D. Cook. You can view the source on GitHub or install the gem via RubyGems:

gem install simple-random

The gem allows you to sample from the following distributions:

  • Beta
  • Cauchy
  • Chi Square
  • Exponential
  • Gamma
  • Inverse Gamma
  • Laplace (double exponential)
  • Normal
  • Student t
  • Uniform
  • Weibull

Simple examples:

require 'rubygems'
require 'simple-random'

r = SimpleRandom.new
r.uniform # => 0.127064087195322
r.normal(5, 1) # => 5.71972152940515

Java maps and sorting

Posted: 1 August 2009 in Uncategorized

I’m always a little annoyed that I have to implement sorting Map keys by their values myself in Java. It seems like it should be part of the standard Collections library or something. Maybe it is and I just haven’t seen it? My solution (gist) is based on feedback from Josh in the comments to a previous post. How does that look to you?
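For readers who just want the general shape of the technique, here is a minimal sketch. It is not the code from the gist: the class name MapKeySorter and the method keysSortedByValue are made up for illustration, and it assumes Java 8 or later (for List.sort and Map.Entry.comparingByValue).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the code from the gist.
final class MapKeySorter {
    private MapKeySorter() {}

    // Returns the map's keys ordered by ascending value.
    // The sort is stable, so keys with equal values keep the map's iteration order.
    static <K, V extends Comparable<? super V>> List<K> keysSortedByValue(Map<K, V> map) {
        // Copy the entries so the original map is left untouched.
        List<Map.Entry<K, V>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.comparingByValue());

        List<K> keys = new ArrayList<>(entries.size());
        for (Map.Entry<K, V> entry : entries) {
            keys.add(entry.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 42);
        counts.put("of", 7);
        counts.put("and", 19);
        System.out.println(keysSortedByValue(counts)); // prints [of, and, the]
    }
}
```

Sorting a copy of the entry set avoids a second lookup per key during comparison. For descending order, the comparator could be wrapped with java.util.Collections.reverseOrder.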

Jekyll and Code

Posted: 8 January 2009 in Uncategorized

Tom Preston-Werner, aka mojombo, rocks.  When GitHub announced GitHub Pages recently, they pointed to a new blog engine, Jekyll.  Jekyll generates the blog as a set of static pages — no database reads, no PHP, just fast HTML.  I was instantly drawn to it, and since I’ve been itching to switch blog engines, I damn near moved this blog.  It would be hosted on GitHub, for free.  And it would be backed up using my favorite version control system.  I would have complete access to all of my content.  If WordPress went belly up, I would lose all of my content.  That bothers me.

Jekyll is still in its infancy.  But for two things, I would switch right now.  First, support for tags is incomplete, so pages on my blog such as http://mendicantbug.com/category/computational-linguistics/ would no longer be supported under Jekyll.  That would play hell with my Google traffic.  I’m willing to make that sacrifice since most of that traffic is from people who don’t care about the main topics I’m interested in.  Second, and this is the killer, Jekyll does not support comments.  Yet.  The good news is, it can be forked and someone may implement comments.  I hope so, but the static nature of Jekyll means handling comments is not very straightforward.  I can imagine how it might be done, so we’ll see.  I suppose I could do it myself, but my plate is so full right now I’m having a hard time getting what I need to get done done.

So what I’m doing instead, for now, is hosting my code there.  Jekyll has code highlighting built-in using Liquid.  Handy!  I put up the source for my post on Bandwidth simulation.  I’ll be adding more soon, which I’ll make note of, if for some reason you’re actually interested.

Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions. I came across David Blei’s C code for Latent Dirichlet Allocation and it looked simple enough to convert into a Ruby module. Ruby makes it very easy to wrap some C functions (which is good to know if you need a really fast implementation of something that gets called a lot). Wrapping a C library is slightly harder, but not horribly so. Probably most of my challenge was the fact that it’s been so long since I wrote anything in C.

Since the code is open source, I decided to release the Ruby wrapper as a gem on GitHub.  I chose GitHub over RubyForge, because it uses Git and freakin’ rocks.  But GitHub is a story for another day.  Feel free to contribute and extend the project if you’re so inclined.

A basic usage example:

require 'lda'
# create an Lda object for training
lda = Lda::Lda.new
corpus = Lda::Corpus.new("data/data_file.dat")
lda.corpus = corpus
# run EM algorithm using random starting points
lda.em("random")
lda.load_vocabulary("data/vocab.txt")
# print the top 20 words per topic
lda.print_topics(20)

You can also download the gem from GitHub directly:

gem sources -a http://gems.github.com
sudo gem install ealdent-lda-ruby

You only need the first line if you haven’t added GitHub to your sources before.

A couple of days ago, I wrote a script that would tweet anything you plurked. Thanks to some code from Neville Newey (based on PHP code by Charl van Niekerk), the plurk.py script I wrote has been updated to both plurk your tweets and tweet your plurks. This should work on both Windows and Linux machines. If you have access to a Linux machine, I suggest setting up a cron job to take care of this. As I mentioned in the previous post, if you set up a cron job, be sure to change the path to plurkdb.dat to an absolute path. I have done the most testing on this with Python 2.4 on Linux.

This code is open source under the Creative Commons BSD license (not the Creative Commons 3.0 Attribution license that this blog uses). Neville’s code appears to be under CC:Attribution 2.5 for South Africa, by what I could glean from his site. I have considered making this an open source project on Google Code but have yet to take it all the way. Google sets a lifetime limit of 10 projects, so I will continue to hoard those against future need. If you make modifications to the code, please let me know and I will probably post them here and fold them into future releases, so we all win.

Note that the command line parameters have changed:

plurk.py <twitter username> <twitter password> <plurk username> <plurk password>

And of course, as with all software, use at your own risk.

Tweet your plurks

Posted: 2 June 2008 in Uncategorized

If you want to use Plurk, but aren’t ready to leave Twitter, I wrote a little Python script you can use to automatically mirror your plurks on Twitter. This will not work for response plurks, but your main plurks will be extracted and posted to your Twitter account with the prefix “plurking:” followed by your plurk.

The resulting tweet looks like this:

[Image: sample of what the script outputs in Twitter]

Download the script and set it up as a cron job (or you could execute it manually). It should work with Python 2.4 and later. It stores a plurkdb.dat file (which you should probably assign an absolute path to, depending on the behavior of cron on your system). The script checks this file every time it runs to make sure that duplicate plurks aren’t being tweeted. You should pass the following parameters on the command line (or modify the script so they are hardcoded, if you want): <twitter username> <twitter password> <plurk username> <plurk password>. Update: see later post on updated plurk script. And as with all software, use at your own risk.

Please let me know if you have any problems with it or see room for improvement. I hacked this out in a hurry, so …

Java Properties

Posted: 5 December 2007 in Uncategorized

I discovered the java.util.Properties class a couple of weeks ago in the ginormous Java API docs. If you’ve ever created a software project where you have a lot of different settings that change frequently, this is the class for you. In my research, I implement all these different algorithms for various things, find out they don’t work, implement something else, rinse, repeat. Being able to look back at my results from two months ago, load the exact same configuration, and run the experiment all over again is a must. Enter the Properties class. (more…)
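As a rough illustration of that save-and-reload workflow, here is a minimal sketch using only java.util.Properties calls that exist in the standard library (setProperty, getProperty, store, load). The class name ExperimentConfig and the setting names are invented for the example, and it round-trips through a String so it is self-contained; a real experiment would presumably use a FileWriter and FileReader instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical example class; the setting names below are made up.
final class ExperimentConfig {
    private ExperimentConfig() {}

    // Serializes the settings in the standard key=value .properties text format.
    static String save(Properties settings, String comment) {
        StringWriter out = new StringWriter();
        try {
            settings.store(out, comment);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parses .properties text back into a Properties object.
    static Properties load(String text) {
        Properties settings = new Properties();
        try {
            settings.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("algorithm", "naive-bayes");
        settings.setProperty("iterations", "500");

        // Save next to the results, then reload later to rerun the same experiment.
        String saved = save(settings, "experiment settings");
        Properties restored = load(saved);

        // Values are always strings; getProperty takes an optional default.
        int iterations = Integer.parseInt(restored.getProperty("iterations", "100"));
        System.out.println(restored.getProperty("algorithm") + " x " + iterations);
    }
}
```

Everything is stored as a string, so numeric settings have to be parsed on the way out, and the two-argument getProperty gives a fallback when an older saved configuration lacks a newer setting.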

brainfscking set theory

Posted: 3 December 2007 in Uncategorized

I mentioned the esoteric programming language brainfuck a little while back. It consists of 8 operations and was created in order to make the smallest compiler in the world (I think the current best is 174 bytes). I was reading a post over on Good Math, Bad Math that defines arithmetic in terms of sets. Pretty basic if you’ve done anything with set theory, but Mark has a clear way of explaining things so I usually try to read all of his posts. I’ve been playing catch-up today.  It struck me immediately how closely the set form that Mark describes matches the syntax/logical structure of brainfuck.  So I decided to play around a little.  Read on for more. (more…)

brainfsck

Posted: 28 November 2007 in Uncategorized

Either the coolest or the stupidest programming language in the world, brainfuck was designed by Urban Müller in order to create the world’s smallest compiler of a Turing-complete programming language. Originally his compiler was 240 bytes in size, but he reportedly got it down to about 200 bytes. Others have gotten it below that. The language consists of only 8 operations, which I will go into after the jump.

(more…)
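To give a feel for how little there is to the language, here is a minimal interpreter sketch in Java. The eight operations are > and < (move the data pointer), + and - (increment and decrement the current cell), . and , (output and input one byte), and [ and ] (loop while the current cell is nonzero). The class name TinyBf, the 30,000-cell tape, wrapping 8-bit cells, and treating end of input as 0 are assumptions of this sketch; they are common conventions, not part of any official definition.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an interpreter; tape size and EOF behavior are assumptions.
final class TinyBf {
    private TinyBf() {}

    // Runs a program and returns everything it printed. ',' reads from 'input'.
    static String run(String program, String input) {
        byte[] tape = new byte[30000]; // cells wrap around as 8-bit values
        int ptr = 0;                   // data pointer
        int next = 0;                  // index of the next unread input character
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        for (int pc = 0; pc < program.length(); pc++) {
            switch (program.charAt(pc)) {
                case '>': ptr++; break;
                case '<': ptr--; break;
                case '+': tape[ptr]++; break;
                case '-': tape[ptr]--; break;
                case '.': out.write(tape[ptr]); break;
                case ',':
                    tape[ptr] = next < input.length() ? (byte) input.charAt(next++) : 0;
                    break;
                case '[': // skip the loop body if the current cell is zero
                    if (tape[ptr] == 0) pc = matching(program, pc, 1);
                    break;
                case ']': // jump back to the loop start if the current cell is nonzero
                    if (tape[ptr] != 0) pc = matching(program, pc, -1);
                    break;
                default:  // every other character is a comment
                    break;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Scans from the bracket at 'pos' in direction 'step' to its partner bracket.
    private static int matching(String program, int pos, int step) {
        int depth = 0;
        for (int i = pos; i >= 0 && i < program.length(); i += step) {
            char c = program.charAt(i);
            if (c == '[') depth++;
            else if (c == ']') depth--;
            if (depth == 0) return i;
        }
        throw new IllegalArgumentException("unbalanced bracket at " + pos);
    }

    public static void main(String[] args) {
        // 8 * 8 = 64, plus 1 is 65, the character code for 'A'.
        System.out.println(run("++++++++[>++++++++<-]>+.", "")); // prints A
    }
}
```

This version rescans for the matching bracket on every jump, which is slow but keeps the sketch short. Precomputing a jump table would be the usual next step.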

Morning Madness

Posted: 2 November 2007 in Uncategorized

Ever get pissed twice before you’ve really even opened your eyes? This is why I shouldn’t read my RSS feeds so early in the morning. At the top of the list is Bush equating Democrats who oppose the war (as if it could be called opposition, anyway) to those who ignored Hitler and Lenin and then Hillary firing back. Am I mad at Bush for making this analogy? No, and I think he’s correct, but not in the way he thinks. I’m more angry at Hillary for firing back and not recognizing her own culpability. The Sheepocrats sat back and did nothing four years ago when this war began and passed the Patriot Act before that. They have endorsed the war at every stage since and even their current so-called opposition is lukewarm and putrid with its weaseliness. So yeah, they are like people who ignored the rise of Hitler and Lenin. If she had recognized that and said it publicly, it would have done her credit.

Next up, I was reading a few bit twiddling hacks and came across a nice one for branchless absolute value [hat tip]. The hacks are all in the public domain, too, so that’s good. He does list the occasional variation that is patented, an enormously helpful fact if you’re producing commercial software. So here is the patented version of the branchless absolute value:


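#include <limits.h> // for CHAR_BIT

int v; // we want to find the absolute value of v
int r; // the result goes here
// mask is 0 when v >= 0 and -1 (all ones) when v < 0, assuming an arithmetic right shift
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
r = (v ^ mask) - mask;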

The XOR-then-subtract combination in the last line is what the patent covers. What works just as well?

r = (v + mask) ^ mask;

As Sean points out, though, the patent probably could be contested if the holder (none other than Sun Microsystems) ever tried to enforce it. So what ticked me off is that such a thing could be patented. I raise my hands in impotent fury at the ludicrousness of software patents. I don’t blame the inventors for them; it’s something you pretty much have to do these days. I blame the system that makes that true.

Update

I ran some benchmarks on the two versions of absolute value given above. On a 3.06GHz processor, 4 billion absolute values took 18.916 +/- 0.021 seconds for the patented version and 18.906 +/- 0.026 seconds for the free version. The difference is within the measurement error, so it looks like there’s no need to even bother with the patented version.
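For anyone who wants to convince themselves that the two forms really compute the same thing, here is a small sketch that ports both to Java and compares them against Math.abs. This is a translation for illustration, not the benchmark code; the class and method names are made up. One difference from the C version: Java defines >> on int as an arithmetic (sign-extending) shift, whereas in C the result of right-shifting a negative value is implementation-defined.

```java
// Hypothetical Java port of the two branchless forms, for comparison only.
final class BranchlessAbs {
    private BranchlessAbs() {}

    // XOR then subtract (the form described above as patented).
    static int absXorSub(int v) {
        int mask = v >> (Integer.SIZE - 1); // 0 if v >= 0, -1 (all ones) if v < 0
        return (v ^ mask) - mask;
    }

    // Add then XOR (the free form).
    static int absAddXor(int v) {
        int mask = v >> (Integer.SIZE - 1);
        return (v + mask) ^ mask;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            boolean agree = absXorSub(v) == Math.abs(v) && absAddXor(v) == Math.abs(v);
            System.out.println(v + " -> " + absXorSub(v) + " (agrees with Math.abs: " + agree + ")");
        }
    }
}
```

For a negative v, the mask is all ones, so XOR flips every bit and the add or subtract of -1 supplies the "plus one" of two’s-complement negation. Both forms, like Math.abs, return Integer.MIN_VALUE unchanged, since its positive counterpart does not fit in an int.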