
While watching the 2000 version of Henry James’ The Golden Bowl, I heard the once-common phrase “The deuce only knows…” I’m always looking for vintage profanity, and this appealed to me strongly. I’ve heard it hundreds or thousands of times before, of course, but here it was brought to the fore of my attention. After some brief research, I found ties to 16th-century Northern German, Family Guy, and playing dice. The word deuce seems most strongly tied in meaning to “the devil,” and the two are used interchangeably in old-fashioned profanity (cf. What the devil and What the deuce).

There are attested uses of the phrase “Was der Daus!” in German from the 16th century, which gets my vote as the real origin of the phrase. Daus meant “devil,” though the modern German word is “Teufel.” Deuce also means “two” and comes from the French deux. Supposedly, the combination of the German phrase and the playing of dice led to the phrase entering English usage: rolling a two (the Devil’s eyes) was the lowest score and therefore a loss, and that inspired the curse. I’m not sold on this particular coincidence. It seems too much like the folk etymology you find in email forwards. Lastly, while I enjoy Family Guy enormously when I catch it, I very seldom get the opportunity to watch an episode, so the tie to Stewie’s “What the deuce?” was lost on me until Google unearthed it.

When OpenEphyra, an open-source question-answering system, is given the question What is the origin of the word deuce?, it answers “Watkins,” offering this page as evidence. That page poses the question What does the word deuce mean?, but its answer has nothing to do with my information need. Also, the word Watkins never even appears on that page, so I have no idea where it came from.

When you go to a search engine, you have an information need. There is something you are searching for that you can only articulate imprecisely, and you try to do so in a few words. People are good at determining whether something satisfies their information need, but not so great at stating it clearly. Librarians are trained to elicit this information need from you, by force if necessary. (Just kidding, librarian mafia, don’t hurt me!) Their method is a dialogue in which they probe the various aspects of what you are searching for, what you are not searching for, what you already know about it, etc.

A search engine can’t engage in this dialogue (yet), but think about how you interact with one. You start off with an information need (at whatever degree of vagueness) in mind and probably compose a short two- to three-word query. How often do you run one-word queries? Search engines have trained us that these rarely succeed unless the word is low-frequency (or a brand name or jargon). Our first query may bring up some useful results, but usually we realize we weren’t thinking clearly about our information need, and we hone it over the next couple of queries until we find what we need. Some people are better at forming this mental picture and stating clear queries from the beginning [citation needed], but most people need to narrow it down.

The queries we type into Google are often pure keyword queries, though sometimes we use slightly more sophisticated ones with link: or site: (etc.) operators. You can require a term with the + operator and exclude one with the - operator. You can even use the wildcard operator (*), which can be nice but also touchy. What you can’t do is issue structured queries. You can’t search for things like (nice or sweet) and (man or guy). You can’t search for words that co-occur within certain spans of a document (like 50-word windows). These capabilities can be very helpful to an experienced researcher, and having them over a web corpus the size of Google’s would be enormously helpful. Unfortunately, the computational and storage costs of supporting them are much higher.
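To make those two missing query types concrete, here is a minimal Python sketch of how they could be evaluated over a toy positional inverted index, where each term maps to the documents it appears in and the token positions within each one. This is an illustration under simplifying assumptions, not how Google or any real engine works: the tokenizer is naive, “within a window” is assumed to mean token distance, and every function name here (`and_of_ors`, `near`, `postings_sizes`, etc.) is hypothetical. `postings_sizes` compares doc-level entries with positional entries as a rough proxy for the extra storage cost.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Deliberately naive tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_positional_index(docs):
    """Map term -> {doc_id: [token positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def docs_with_any(index, terms):
    """OR over a group of terms: docs containing at least one of them."""
    result = set()
    for t in terms:
        result |= set(index.get(t, {}))
    return result


def and_of_ors(index, *groups):
    """(a OR b) AND (c OR d) ...: intersect the OR-set of each group."""
    sets = [docs_with_any(index, g) for g in groups]
    return set.intersection(*sets) if sets else set()


def positions(index, terms, doc_id):
    """All positions in doc_id where any of `terms` occurs."""
    return [p for t in terms for p in index.get(t, {}).get(doc_id, [])]


def near(index, group_a, group_b, window):
    """Docs where a term from group_a and a term from group_b occur
    within `window` tokens of each other (brute-force pairwise check)."""
    hits = set()
    for doc_id in and_of_ors(index, group_a, group_b):
        pa = positions(index, group_a, doc_id)
        pb = positions(index, group_b, doc_id)
        if any(abs(a - b) <= window for a in pa for b in pb):
            hits.add(doc_id)
    return hits


def postings_sizes(index):
    """(doc-level postings, positional postings): a rough proxy for index size."""
    doc_level = sum(len(by_doc) for by_doc in index.values())
    positional = sum(len(ps) for by_doc in index.values() for ps in by_doc.values())
    return doc_level, positional


docs = {
    1: "He is a nice guy and a sweet man",
    2: "A sweet dessert; later, far away in the text, a man appeared",
    3: "Nice weather today",
}
index = build_positional_index(docs)
print(and_of_ors(index, ["nice", "sweet"], ["man", "guy"]))      # {1, 2}
print(near(index, ["nice", "sweet"], ["man", "guy"], window=2))  # {1}
print(postings_sizes(index))                                     # (22, 24)
```

The pairwise check in `near` is quadratic in the number of positions; a real engine would merge sorted position lists instead. A positional index also stores an entry for every token occurrence rather than one per distinct term per document. On this toy corpus the gap is small (22 vs. 24 entries), but on real web pages common words repeat many times per document, which is where the storage cost grows.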

So my question for you, reader, is: would you even use this? Would it be used by many people, or just the odd researcher, paralegal, etc.? Computationally, I think Google could handle this. The problem would be the larger index needed to support such queries, and even that would probably not be unreasonable for Google at this point. So… why not? My guess is that it comes down to cost (moderate to high) versus customer demand (low to nil).

Am I wrong?