In a recent press release, kannuu is claiming to have revolutionized text entry. They claim that you can now perform text entry with just your thumb at the same speed of a regular keyboard. Too good to be true? Here is their method, complete with Hype™.
“Advancing text entry exponentially, kannuu’s powerful and precise Partial Word Completion® technology enables users with a fail-safe text entry solution. The kannuu application appears on device, as a four-point diamond shape, comprised of the most popular letters in the database it is indexing, with the center kannuu logo leading to the next set of choices.”
They registered a trademark on the phrase “partial word completion”?? Blerg. Not only do they have an über lame web 2.0 name in lowercase, they gotta stop people from marketing a similar technology under their oh-so-not-original name. Why does this make me so angry? Anyhow, I’m running off sideways on a rant that is pretty insignificant.
The real point here is the potential for coolness. So here is the technology: you enter a letter, it presents you with a “diamond” shape and the most common letters or group of letters that follow the letter(s) you just entered. In this way, most of your everyday phrases will be right up at the top of the list of things you’re presented so you could potentially be entering words with fewer keystrokes and all with very little thumb movement. This could really revolutionize key input and maybe bring pocket computers to reality [source].
So here is what I think the technology is based on. A very common technique in language technologies is the use of n-grams. So they use a character-based n-gram model to predict the most common letter or letters that you would type next based on some corpus. This isn’t anything new. Cell phones already have a T9 input method that guesses the most common word based on the single letters you choose. This isn’t all that different. If they have done the interface well, that could be a serious improvement.
If you’re interested in character-based n-gram models, I go into them in more depth after the jump.
Think about this. In English, the most common letter is e. If you don’t believe me, count the letters in the paragraph above that starts with “the real point”.
| Letter | Count | Letter | Count | Letter | Count |
|---|---|---|---|---|---|
| e | 60 | h | 21 | c | 8 |
| t | 53 | a | 19 | a | 7 |
| o | 41 | y | 17 | g | 6 |
| r | 30 | u | 16 | b | 5 |
| s | 27 | p | 13 | v | 4 |
| l | 25 | d | 11 | k | 4 |
| i | 24 | m | 10 | z | 1 |
| n | 22 | w | 8 | j | 1 |
You can take these counts and get probabilities for the occurrences of each letter. These probabilities are useful because they give you a general idea of how often each letter occurs in English. Of course, the corpus I’m pulling them from is way too small to be representative of all of English, but it’s a start. Since we are looking at the probability of only one character, we call this a unigram model (1-gram). We can generate English text with this unigram model by sampling from the distribution. Some python code for that is below:
# code for generating from a toy unigram character-based language model for English import random letter_counts = [(1, 'j'), (1, 'z'), (4, 'k'), (4, 'v'), (5, 'b'), (6, 'g'), (7, 'f'), (8, 'c'), (8, 'w'), (10, 'm'), (11, 'd'), (13, 'p'), (16, 'u'), (17, 'y'), (19, 'a'), (21, 'h'), (22, 'n'), (24, 'i'), (25, 'l'), (27, 's'), (30, 'r'), (41, 'o'), (53, 't'), (60, 'e'), (93, ' ')] d = [letter_counts[0]] # initialize the sampling list to the first item in the list of counts tmp = "" for i in xrange(1, len(letter_counts), 1): d.append((d[i-1][0] + letter_counts[i][0], letter_counts[i][1])) for i in xrange(526): r = random.randint(0,525) for letter_pair in d: if r < letter_pair[0]: tmp += letter_pair[1] break print tmp
So we take the list containing the counts of the letters (and the space character) and then create a new list where each element consists of a tuple. The first element of the tuple is the sum of all counts of items in the list before it plus the count for the character, which is the second element in the tuple. This is just a sum of all the previous letter counts. Then we choose a random number and if the random number is less than the current sum, we add that letter to the output string. The idea is that if we have chosen our probabilities correctly, we should get something that looks like English:
h spiyoweytnt berkctpllittdulwa nhesh oensnuvuwgyysyeamcgt fir v oe obnstotitres ttelsdmt e hr liruorteevunaet uwyooe trewstpaayssers onyewiasne r oae yceu ruiz lorpse hg otderdretuk otlm odulloetnssaio wtygre ro gh eietvyrryeteetoat au y r vpte peascotog reiol cth enchl rpsn rtge saldi tngai iuentbdysbte eagrog nuee wetemsr srultnomisojhteolrtu t neyerhul io oe eeror dalts lr ib somunerno situlcihi e a j ec eotutnerrnu dpl io prntb t yli htsttiwmyop oydstwihn h loeebleeh ovebnrtoyldothdgo yto l bi
And of course, it looks nothing like English. The problem is that it doesn’t take the context of the letters into account. If you see a word that starts with t, what do you think the next letter will be? Probably h, right? And what’s the next letter after that? E. A lot of other letters co-occur in English, like ch and ou, to name a couple. So we can build a bigram model and this gets a bit closer to generating English-like words. The problem then becomes sparsity. We need a bigger corpus to accurately compute the probabilities for the bigram (2-gram) character based language model. But here is what a bigram model would generate using this data:
bey temm onowid eryobethigzee led hiorury tethp s nil sotr t iokd ngayle tt ndu s shtthefey ord tus co ere rsbemotemo cetntpoioouayinpuybeskeenlievngrsen yp e assfoan lhepud keleolerme ketlehuh or mene pndy ng ds keoc eerryt ttris hea thea tjug s anhee li h l ye sthillked ou aiz iheer mu aomheurt se be ct s ompulun urhrasdserrs t cth the rttrireese s has dtlo lurd t hnu luesetnptellamofery rstye hleoug hendisluupntwore sbethumouulf d r leitd ketib e zes io wu amerf rooua ketyomd al phesteaeyryntofe ioayalougyiti
So, it looks a little better, but it’s still clogged with junk. You’ll never get a model that produces all English words this way, but when you analyze enough English, you can reasonably predict the most common set of next letters you’ll use. There are plenty of other methods for doing this and improving the results.
One of the things the SciAm blog mentioned was having to enter scientific text. If they train the n-gram model using regular conversational English (or some corpus of email or something like that), scientific jargon is very unlikely to appear. The result is a much slower entry process. What they need to allow you to do is “create your own corpus.” You add the documents that are best representative of the kind of stuff you’ll be writing and it builds a model from that. Of course, it will need a lot of text to be really good, but it should start to work decently after a few hundred thousand words (or maybe even less). I know you can detect which language a text is written in decently well with less than a hundred thousand words of training text per language. If it allowed custom corpora for training, all it would need is the ability to switch language modes and you’re set. Also, it could learn which choices you are more likely to make over time and start suggesting things that you choose more often.



Posted by Jason Adams on 28 September 2007 at 09:18:01
An annoying thing about this code plugin for wordpress is the fact that it converts into spaces but converts < into <. So to get this code to work you will have to change the < in the code above to <.
Posted by Jason Adams on 28 September 2007 at 09:19:47
Also, to clarify, the bigram generation part looks a little better in that some of the letter combinations look slightly more like English. I don’t mean to say that it actually looks decent. The data I used to generate the distribution was much too small to build even a basic bigram model.
Posted by Alexandre Rafalovitch on 16 November 2007 at 16:09:53
That does sounds pretty lame, especially to assume that they are the first.
For example of no-hands entering method (has to be even cooler, right?), see Dasher: http://www.inference.phy.cam.ac.uk/dasher/