You are currently browsing the daily archive for October 4th, 2007.

For one of my homework assignments, I have to solve words encrypted via a substitution cipher. These ciphers were insecure before computers came around, but they are still fun. If you’re unfamiliar with them, you’d probably recognize them as the cryptograms (”Cryptoquote”) in your local newspaper. In the simplest form, each letter is mapped to a different letter of the alphabet. A lot of people do these for fun and I know at least one person reading this does. The result is a run of text that might look like:

ov umy rfgs f nmg
MY DOG EATS A LOT

There are many ways of going about solving substitution ciphers, but a common way is by counting frequencies of characters. As most people know, e tends to occur more than other letters in most written English. The rest of the letters typically follow a pattern, as well, but that pattern degenerates once you leave the most common letters. The domain of the text you are examining is fundamentally important here. By domain I mean whether this text is from a newspaper, an IM, transcribed speech, etc. You can also look at bigrams, two character sequences, to find the most commonly appearing sequences. In English, th appears much more ty, but ty still occurs. When trying to solve substitution ciphers this way, you are essentially matching the frequency distribution of the cipher text to the distribution of English and building a mapping from there. To put that a different way, you are matching up the most common letters or sequences in the garbled text with the most common real English letters or sequences.

Once the frequency counts have revealed the most common letters, many people proceed to deduction to eliminate the rest. Of course, this requires knowledge of English words directly, which has an impact on computational approaches to solving substitution ciphers automatically. I’m curious what approaches people have taken (if any) other than using a dictionary of English words and trying to find matches from there.

Read the rest of this entry »

About Me

Jason M. Adams

My name is Jason Adams and I work on opinion mining for a growing startup in Atlanta, GA.

Calendar

October 2007
S M T W T F S
« Sep   Nov »
 123456
78910111213
14151617181920
21222324252627
28293031  

Archives

Site Statistics

  • 99,883 reads

Site Information

Contact me: jaso...@gmail.com

Creative Commons License

This work by Jason M. Adams is licensed under a Creative Commons Attribution 3.0 License.

Header image credit seakwenby.

Random Crap