Posts Tagged ‘english’

Paul Payack of the Global Language Monitor is claiming the 1 millionth English word is coming soon.  He says a new English word is coined every 98 minutes, so the 1 million marker will arrive about 15 days hence.  The CBS article that tipped me off to this is pretty amusing in the quotes it selected from linguists, which resoundingly cried “bullshit.”  But the best quote came from Payack himself:

We believe words can be counted if you define them in the right way. You can count them like anything else in science. You can count how many atoms there are in the ocean.

Let’s think about counting the atoms in the ocean for a moment. What about where rivers flow into the ocean? Where is the boundary line? Salt and fresh water are mingling quite a bit and finding the exact boundary is pretty much impossible. If we draw an arbitrary line, surely we will get too much in one place and too little in another. Also, what about rain and evaporation? Counting the atoms would require an instantaneous snapshot of the entire ocean at the atomic level. It can’t be done.

You run into similar problems counting words.  Compound words blend into single words and words leave the language as well as enter it.  How do you detect this?  You’d need a snapshot of the entire English language as it is spoken, typed, and read all around the world.  What is a word in one dialect isn’t necessarily a word in another dialect.  Where do you draw the line?
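For what it's worth, the 98-minute claim is at least easy to sanity-check. Here is a quick back-of-the-envelope sketch in Python; the variable names are mine, and the only inputs are the two figures quoted in the claim.

```python
# Figures quoted in the claim; everything else is derived from them.
MINUTES_PER_NEW_WORD = 98
DAYS_UNTIL_MILLIONTH = 15

MINUTES_PER_DAY = 24 * 60

# How many coinages per day does one word every 98 minutes amount to?
words_per_day = MINUTES_PER_DAY / MINUTES_PER_NEW_WORD

# How many words short of a million does a 15-day countdown imply?
words_remaining = DAYS_UNTIL_MILLIONTH * words_per_day

print(f"{words_per_day:.1f} new words per day")
print(f"about {round(words_remaining)} words still to go")
```

So the claim amounts to a running tally sitting roughly 220 words short of a million, which is a lot of precision to hang on a definition of "word."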

This is a subject much larger than the treatment I am about to give it.  Linguistic homogenization occurs in modern states where regional dialects are marginalized and a standard dialect is advanced as the primary method for acceptable public communication.  The powerful favoring a single dialect is nothing new, but now more than ever, states are able to impose this on the wider populace.  European countries encourage one or two primary languages to be taught in school and used in public.  America does something similar with Standard American English.  Speaking a non-standard dialect is often seen as a barrier to employment and movement in higher social circles.  Basically, the snobs keep you down if you don’t talk like they do.

I was reading on Language Log earlier about the Uniformitarian Principle.  Uniformitarianism is simply the idea that things are now as they have always been, so we can learn how things were by learning how they are now.  Language Log describes how modern Europe no longer holds the key to language in prehistoric Europe thanks to the ability of modern states to impose linguistic homogenization.  Think about that for a second.  Modern states, presumably democratic, are so powerful they even tell you how to talk.  Perhaps even how you think.  Is that a paranoid leap?  Am I overstating it?  Even absolute dictators of past centuries didn’t have that kind of power.

But it’s not like one single person is doing this.  Instead they are doing it.  The ineffable they.  But if they are telling us how to think, why do we listen?  We can’t help it, we’re too young when it happens, and then we become them.

Absolute dictators of the past could not do this for many reasons.  They didn’t have the infrastructure to educate the masses, nor did they have popular media to transmit one dialect into every home on a daily basis.  A population too large for all of its parts to remain in constant contact will begin to diverge dialectally.  But educating the masses would have been looked down upon anyway since giving people too many ideas tends to make them question things like a single all-powerful leader calling all the shots.  So now that we are educated enough to know all-powerful dictators are bad news, we have replaced them with power structures more complicated and inscrutable.

A recent post by Daniel Lemire posing a simple mathematical puzzle threw into stark relief the bars of my mental prison.  So what are the bars like of this bigger prison we cannot see?  Philip K Dick called it the Black Iron Prison.  I’ve always found that concept appealing.

I was asked recently about the motivation for Abney’s DP (determiner phrase) hypothesis. That is, that determiners are not part of English noun phrases but head up their own phrases of which NPs are complements. I couldn’t remember the justification I was given in my Syntax I class, so I went back to the textbook (Syntax: A Generative Introduction by Andrew Carnie). I found the following interesting excerpt:

“… for lack of a better place to put them, we put determiners … in the specifiers of NPs. This, however, violates one of the basic principles underlying X-bar theory: All non-head material must be phrasal. Notice that this principle is a theoretical rather than an empirical requirement (i.e., it is motivated by the elegance of the theory and not by any data), but it is a nice idea from a mathematical point of view, and it would be good if we could show that it has some empirical basis.”

This clashes a bit with my empirical sensibilities. It represents very much the rationalist point of view in linguistics, that we can probe our own understanding of language by judging what we perceive to be grammatical or ungrammatical. The empiricist view would look at it from another angle: does it appear in data? So the theoretical view might be “nice” but if it is not supported by the data, it is crap.

Treebanks don’t use DPs (at least none that I’ve seen), so automatic parsers typically have no concept of them. I wonder if they would add any value?  I’m guessing they would just run into sparsity issues since another set of tags has to be estimated.  But who knows, the extra structure might be helpful in complex situations.

Overgermanification

Posted: 8 February 2008 in Uncategorized

I was just reading a Wired article about the deaths of two AI researchers:  Chris McKinstry and Pushpinder Singh.  Both were working on strong AI (or at least, had the hope of it).  Both committed suicide and did it within a month of each other.  McKinstry claimed that his system, GAC, would be aware in a short time.  If GAC ever became aware, it has vanished into the cloud.  So all very interesting and I recommend the article.  Not if you want a serious read about the topics they researched, but it presents an interesting narrative of two lives with eerie parallels.

What inspired this post is a minor quibble about a word that many English speakers have surely heard:  Wunderkind.  In German, it literally means “wonder child” and is often applied in English to a child prodigy or a young person whose star is on the rise.  Here is an excerpt from the Wired article:

Push, as everyone called him, had also taught himself to code — first on a VIC-20, then by making computer games for an Amiga and an Apple IIe. His father, Mahender, a topographer and mapmaker who had studied advanced mathematics, encouraged the wüenderkind. Singh was brilliant, ambitious, and strong-willed. In ninth grade, he had created his own sound digitizer and taught it to play a song he was supposed to be practicing for his piano lessons. “I don’t want to learn piano anymore, I want to learn this,” he said. [emphasis mine]

When you have a German vowel with an umlaut, it is rendered in English orthography as the vowel + e.  So ü would be written in English as ue.  Wunderkind has no umlaut in German, so this would not be necessary.  Plus, you wouldn’t have to add the e anyway since they already included the umlaut.  Shoddy editorial work, but it made me lol.
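The vowel + e convention is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as I described it; the names UMLAUT_MAP and transliterate are made up for the example, and it only handles the three umlauted vowels.

```python
import unicodedata

# Umlauted vowel -> vowel + e, per the convention described above.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def transliterate(word):
    """Write a German word without umlauts by using vowel + e instead."""
    # Normalize first so that u + combining diaeresis becomes a single ü.
    composed = unicodedata.normalize("NFC", word)
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in composed)


print(transliterate("über"))         # a word that really has an umlaut
print(transliterate("Wunderkind"))   # no umlaut, so nothing changes
print(transliterate("wüenderkind"))  # Wired's spelling marks the umlaut twice
```

Run on Wired's spelling, the sketch produces a doubled e, which is exactly the redundancy I was laughing at.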

It’s a morning of fun new words! First I hear greenwashing on the Today Show, which Donna likes to watch while she eats brekkie. Then, Language Log delights me with nanoblahblah, henchgoon, and celebufreak. Erin McKean, the Dictionary Evangelist, twitters words of the day so I also got a nice infusion when I examined her twitter feed for the past week or so. A few selections I particularly like that she found: paracosm, yodelumpet, and anthroponymy. And now for the definitions!

  • anthroponymy – the study of the names of human beings [emckean@twitter]
  • celebufreak – a freak with fame (e.g. Kim Kardashian) [Wordlustitude]
  • greenwashing – marketing a product as green when it’s really not [Today show]
  • henchgoon – alternate term for administrative assistant or “assistant of doom” [Wordlustitude]
  • nanoblahblah – very, very tiny nonsense (nanotechnobabble) [Wordlustitude]
  • paracosm – a private imaginary world, esp. made by children to escape harsh circumstances (think Pan’s Labyrinth) [emckean@twitter]
  • yodelumpet – a singing style that combines yodeling and Louis-Armstrong-style trumpet-like sounds [emckean@twitter]

Please note that the twitter links are stable in terms of link permanence, but are unstable in twitter’s ability to serve up the page. So if at first you get a bizarre message with birds, try again. This has also led to the re-discovery of the most excellent Wordlustitude site. I had seen it a while ago but for whatever reason didn’t subscribe to it. This has been remedied, and if you like neologisms, I recommend you do the same.

There is nothing unusual about verbing nouns in English.  Despite the fact that your English teacher may have told you not to do this, it is common practice, especially on the intarwebs.  Verbing brand names to mean the primary action performed by the chief product of that brand is less common, but we all know about “googling.”  Just sitting here, trying to drink my morning coffee, I couldn’t come up with another example.

But what got me thinking about this is another example used in today’s User Friendly.  One character says,

“You’re gonna ebay it to goths, aren’t you.” [emphasis mine]

I had never heard the brand name ebay used in verb form, meaning to sell something on ebay (the primary function of their chief product).   It is not uncommon, though.  Searching the Google for +”to ebay it”, I found that at least 10% of the top few pages of results were just this construction (versus “to ebay.  It …”).  I estimate from that there are about 19,000 uses of ebay as a verb in this context, and no doubt many others in variations (e.g. “I ebayed my watch”).
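The estimate is just a sampled proportion scaled up to the total hit count. Here is a rough Python sketch of that arithmetic. The function name is made up, and the inputs are hypothetical: I didn't record the sample size or the total hit count, so 5 out of 50 stands in for the 10% share and 190,000 is simply the total that 19,000 uses at 10% would imply.

```python
def estimate_verb_uses(total_hits, sample_size, sample_matches):
    """Scale the share of verb uses seen in a hand-checked sample up to all hits."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return total_hits * sample_matches // sample_size


# Hypothetical inputs: 5 verb uses in 50 hand-checked results is a 10% share,
# and 190,000 total hits is the figure implied by 19,000 uses at 10%.
print(estimate_verb_uses(190_000, 50, 5))
```

Since the 10% was a floor ("at least"), the result is best read as a lower bound, and it inherits all the usual unreliability of search engine hit counts.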

Another example that just occurred to me, but which is pretty artificial, is to twitter, meaning to post something on Twitter.  I say this is artificial because Twitter openly encourages and suggests this terminology.  It was not an emergent construct, but an imposed one.  It has been adopted by the overwhelming majority of users, though.

So here is my question:  does this only work for Internet companies?  I’m probably forgetting some obvious brick-and-mortar company for which we have verbed their brand, so please tell me if I have.  Or is it that Internet companies are especially conducive to this construction because so many Internet companies start off with only one service and become known by that service?  Google is search, ebay is selling crap through auctions, twitter is … twittering.  If this only works for Internet companies, why did we start doing it in the first place?

And I just came up with a brick-and-mortar example:  hoover.  You can hoover down a plate of food, meaning to suck something up like a champ.  But my classification still holds:  that is the primary function of their chief product (or at least the main product that people know them by).  Marketing people have already taken this to heart, I’m sure.  You need an easy name that sounds like English.  Just like with scientific terminology, no one wants to Dinklefwat their dishes.

A couple months ago, I wrote about Richard Hogg dying. He was a professor at the University of Manchester who edited the Cambridge History of the English Language and did a lot of work on Old English morphology. I had corresponded with him briefly a few months before he died about a lab project on computational morphology. I was making a morphological analyzer for Old English verbs. I’m actually still working on it and generalizing it to the rest of the language. Anyhow, as I said before, he was a nice and helpful guy and it was a shame to see him go.

Now, the International Society for the Linguistics of English (ISLE) has set up a scholarship in his honor. Early career scholars who are members of ISLE (membership can be applied for at the time of submission) are eligible. Early career means you either haven’t gotten your PhD yet or got it within the past two years. Master’s and undergraduate applicants are acceptable, but the expected entrant is a PhD candidate/recent recipient. The paper may be on any research-related topic in English or English linguistics and will be judged on originality and the contribution of its results. The prize is £500 and the submission deadline is March 31, 2008.

In a recent press release, kannuu is claiming to have revolutionized text entry. They claim that you can now perform text entry with just your thumb at the same speed as on a regular keyboard. Too good to be true? Here is their method, complete with Hype™.

“Advancing text entry exponentially, kannuu’s powerful and precise Partial Word Completion® technology enables users with a fail-safe text entry solution. The kannuu application appears on device, as a four-point diamond shape, comprised of the most popular letters in the database it is indexing, with the center kannuu logo leading to the next set of choices.”

They registered a trademark on the phrase “partial word completion”?? Blerg. Not only do they have an über lame web 2.0 name in lowercase, they gotta stop people from marketing a similar technology under their oh-so-not-original name. Why does this make me so angry? Anyhow, I’m running off sideways on a rant that is pretty insignificant.

The real point here is the potential for coolness. So here is the technology: you enter a letter, it presents you with a “diamond” shape and the most common letters or group of letters that follow the letter(s) you just entered. In this way, most of your everyday phrases will be right up at the top of the list of things you’re presented so you could potentially be entering words with fewer keystrokes and all with very little thumb movement. This could really revolutionize key input and maybe bring pocket computers to reality [source].

So here is what I think the technology is based on. A very common technique in language technologies is the use of n-grams. So they use a character-based n-gram model to predict the most common letter or letters that you would type next based on some corpus. This isn’t anything new. Cell phones already have a T9 input method that guesses the most common word based on the single letters you choose. This isn’t all that different. If they have done the interface well, that could be a serious improvement.
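To make the guess concrete, here is a minimal Python sketch of a character-based n-gram predictor. To be clear, this is my assumption about the general technique, not kannuu's actual implementation; the function names, the toy corpus, and the choice of trigrams (n=3) with four suggestions (to match the four points of their diamond) are all mine.

```python
from collections import Counter, defaultdict


def train_char_ngrams(corpus, n=3):
    """Count which character follows each (n-1)-character context. Needs n >= 2."""
    counts = defaultdict(Counter)
    for word in corpus.lower().split():
        # '^' pads the start of the word and '$' marks its end.
        padded = "^" * (n - 1) + word + "$"
        for i in range(len(padded) - n + 1):
            context = padded[i:i + n - 1]
            next_char = padded[i + n - 1]
            counts[context][next_char] += 1
    return counts


def suggest(counts, typed, n=3, k=4):
    """Return up to k of the most frequent next characters after what was typed."""
    context = ("^" * (n - 1) + typed.lower())[-(n - 1):]
    return [ch for ch, _ in counts.get(context, Counter()).most_common(k)]


# Toy corpus standing in for whatever database the device is indexing.
model = train_char_ngrams("the then them there this that")
print(suggest(model, "th"))  # 'e' is the top suggestion in this corpus
print(suggest(model, ""))    # nothing typed yet: most common first letters
```

A real system would train on a much larger corpus (or on the user's own contact list or song titles) and would probably offer groups of letters as well as single ones, but the counting idea would be the same.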

If you’re interested in character-based n-gram models, I go into them in more depth after the jump.


Another think coming

Posted: 28 September 2007 in Uncategorized

Language Log brought up the usage of the phrase another thing coming today.  This is the only way I’ve ever heard it or seen it used.  But it turns out, the original is another think coming.  The thing version is winning out on the interwebs, but the post on Language Log indicates that the two phrases may have been warring since their (mutual?) inceptions.  It’s no surprise to me that thing would replace think in this case, for simple phonological reasons.  The [k] in think is preceded by a voiced nasal sound (the vocal cords are vibrating) and then followed by an unvoiced velar stop (aka plosive, but essentially another [k] sound).  The phenomenon of assimilation occurs when a phoneme changes to reflect the surrounding phoneme(s).  In this case, the [k] probably originally became voiced, which would make it a [g] sound.  The [k] and [g] sounds are essentially the same; the only difference is whether your vocal cords are vibrating.  So, assimilation generated thing instead of think in regular speech, and since that is a well-known word, people interpreted it as thing instead of think when they were first exposed to it.  From there it has been gaining steam.

Another interesting example of a similar nature is hone in on versus the original home in on.

I came across a story on NPR today about why women read more than men. They quote from Louann Brizendine who wrote the book The Female Brain. The issue of gender differences and the brain always starts fights. Men have larger brains and more gray matter, which handles information processing. Women have more white matter and thus more interconnectivity between parts of the brain. The prefrontal lobe in women is more densely packed with neurons, and that is the area responsible for judgment, planning and language. Here is a quote from the article:

“Girls have an easier time with reading or written work, and it’s not a stretch to extrapolate [that] to adult life,” Brizendine says. Indeed, adult women talk more in social settings and use more words than men, she says.

Woah nelly! Brizendine hasn’t been doing her reading, because tons of contrary evidence to this crap has been out for a while. And trying to find that link, I discovered that Language Log has already done a pretty extensive commentary on this article. When it comes to matters of language, it’s hard to scoop them. The long and the short of the Language Log commentary is that the so-called gap in fiction sales could be accounted for entirely by sales of romance books.
