I think there must be literally thousands of guys with the name Jason Adams in the English-speaking world.  There are many who share my middle initial (Jason M. Adams) and I’ve encountered several with the same middle name:  Michael.  Really, my parents couldn’t have given me a more common name, and Jason was a very popular boy’s name at the time.

I think I need to change my name to stand out.  I want a googlewhackblatt.  This needs to be a unique identifier, after all.  Any suggestions?

Most people have at least a passing familiarity with information trapping, if not the term itself. Most early adopters of new technology, technogeeks, etc., that is. In a nutshell, it is the practice of collecting information from the web as it happens. Subscribing to RSS feeds, setting up Google alerts, and using FreeAlert to find free stuff on Craigslist are all examples of information trapping. If entering a query in a search engine is fishing for information, using one of these (and many other services) is setting a trap for information.
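
To make the fishing-versus-trapping distinction concrete, here is a minimal sketch of a trap: a standing set of keywords that gets checked against each new batch of items as it arrives. The item format, the `check_traps` function, and the sample listings are all made up for illustration; this is not how FreeAlert or Google alerts actually work.

```python
def check_traps(items, traps, seen):
    """Return (trap name, item title) pairs for unseen items that match a trap.

    items: list of dicts with hypothetical "id" and "title" keys
    traps: dict mapping a trap name to the keywords that must all appear
    seen:  set of item ids already processed (updated in place)
    """
    alerts = []
    for item in items:
        if item["id"] in seen:
            continue  # already trapped (or ignored) on an earlier pass
        seen.add(item["id"])
        text = item["title"].lower()
        for name, keywords in traps.items():
            if all(keyword in text for keyword in keywords):
                alerts.append((name, item["title"]))
    return alerts

# Made-up listings standing in for a feed that is polled repeatedly.
traps = {"free couch": ["free", "couch"]}
seen = set()
batch = [
    {"id": 1, "title": "Free couch, must pick up"},
    {"id": 2, "title": "Bike for sale"},
]
first_pass = check_traps(batch, traps, seen)
second_pass = check_traps(batch, traps, seen)  # same items again: nothing new
```

The query is written once and the matches come to you, which is the whole difference from typing it into a search box every morning.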

I think this is an area that is going to take off in the next few years for people in various industries who are expected to keep up with the latest trends.

So what sort of damage would you see if that low-yield nuclear device in your basement “accidentally” went off? What if some government launched a very high yield device on another country? Wonder no more!

The following pics show what would happen if “Little Boy” (the bomb dropped on Hiroshima) went off in my basement, followed by a 50-megaton bomb, also centered on my place. Lastly, what if the asteroid that theoretically killed the dinosaurs landed in my backyard? Good stuff! (The inner circle is the fireball, the purple is third-degree burns, then second- and first-degree burns.)

Little Boy - 15 kiloton nuclear bomb - going off in Pittsburgh
50 megaton bomb going off in Pittsburgh
Dinosaur killer asteroid hitting Pittsburgh

It looks like the sunburn (first degree burns) ring is inside the second-degree radius in the asteroid impact one. Still buggy?

When you go to a search engine, you have an information need. There is something you are searching for that you can only articulate imprecisely and you do so in a few words. People are good at determining if something satisfies their information need, but not so great at stating it clearly. Librarians are trained to elicit this information need from you, by force if necessary. (Just kidding, librarian mafia, don’t hurt me!) Their method is a dialogue where they probe the various aspects of what you are searching for, what you are not searching for, what you already know about it, etc.

A search engine can’t engage in this dialogue, yet, but think about how you interact with a search engine. You start off with this information need (at whatever degree of vagueness) in mind and probably compose a short 2-3 word query. How often do you do one-word queries? We’ve been trained by search engines that this rarely succeeds unless it’s a low-frequency word (or a brand name or jargon). Our first query brings up some useful stuff perhaps, but usually we see that we weren’t thinking clearly about our information need and we begin honing it over the next couple of queries until we find what we need. Some people are better at forming this mental picture and stating clear queries from the beginning [citation needed], but most people need to narrow it down.

These queries we use for Google are often purely keyword queries, though sometimes we use slightly more sophisticated queries with link: or site: (etc.) operators. You can make sure terms are included with the + operator and excluded with the - operator. You can even use wildcard operators (*) which can be nice (but also touchy). What you can’t do is run structured queries. You can’t search for things like (nice or sweet) and (man or guy). You can’t search for words that co-occur in certain spans of documents (like 50-word windows). These things can be very helpful to an experienced researcher and having this ability over a web corpus the size of Google’s would be enormously helpful. Unfortunately, the computational and storage costs of such a thing are much higher.
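
To show what I mean by a structured query, here is a rough sketch over a handful of toy documents. It scans every document, which is fine for three sentences and hopeless at web scale; a real engine would need a positional index, which is where the extra storage cost comes from. The function names and sample sentences are mine, invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_boolean(tokens, groups):
    """True if every group (a tuple of OR'd words) has at least one word present."""
    vocab = set(tokens)
    return all(any(word in vocab for word in group) for group in groups)

def within_window(tokens, a, b, window=50):
    """True if words a and b occur fewer than `window` token positions apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) < window for i in pos_a for j in pos_b)

docs = [
    "He was a nice guy who fixed my bike.",
    "What a sweet deal on that laptop.",
    "The man next door is rarely nice to anyone.",
]

# (nice or sweet) and (man or guy)
query = [("nice", "sweet"), ("man", "guy")]
hits = [d for d in docs if matches_boolean(tokenize(d), query)]

# Do "man" and "nice" co-occur within a given span of the third document?
tokens = tokenize(docs[2])
near_50 = within_window(tokens, "man", "nice", window=50)
near_3 = within_window(tokens, "man", "nice", window=3)
```

The boolean part only needs to know which words a document contains, but the window check needs every position of every word, which is the part that bloats the index.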

So my question for you, reader, is would you even use this?  Would this be used by very many people or just the odd few researchers, paralegals, etc?  Computationally, I think Google could handle this.  The problem would come from the larger index to handle supporting such queries.  Even this would probably not be unreasonable for Google at this point.  So… why not?  My guess is the cost of doing such a thing (moderate to high) versus the customer demand (low to nil).

Am I wrong?

CNN is reporting that Microsoft is making eyes at Yahoo! to the tune of $31 per share, or about $44.6 billion. If such a deal ever materialized, it would definitely make things interesting for Google. Personally I consider both Microsoft Live search and Yahoo to be inferior products to the Google, but two wrongs make a right, wrong? [hat tip] There has been talk the SEC might try to block such a move due to monopoly worries. I’m not convinced there is anything to worry about, but what do I know.

What I am interested in knowing, though, is how this will affect both Microsoft and Yahoo’s research arms. Will they become bigger and better than ever or will there be some cuts? I certainly hope the former is true.

Update

Check out the comments on the Google Blogoscoped article regarding the monopoly worries. I just read them after posting and they pretty much shoot down the idea of an SEC action on those grounds.

There is nothing unusual about verbing nouns in English.  Despite the fact that your English teacher may have told you not to do this, it is common practice, especially on the intarwebs.  Verbing brand names to mean the primary action performed by the chief product of that brand is less common, but we all know about “googling.”  Just sitting here, trying to drink my morning coffee, I couldn’t come up with another example.

But what got me thinking about this is another example used in today’s User Friendly.  One character says,

“You’re gonna ebay it to goths, aren’t you.” [emphasis mine]

I had never heard the brand name ebay used in verb form, meaning to sell something on ebay (the primary function of their chief product).   It is not uncommon, though.  Searching the Google for +“to ebay it”, I found that at least 10% of the top few pages of results were just this construction (versus “to ebay.  It …”).  From that, I estimate there are about 19,000 uses of ebay as a verb in this context, and no doubt many others in variations (e.g. “I ebayed my watch”).

Another example that just occurred to me, but which is pretty artificial, is to twitter, meaning to post something on Twitter.  I say this is artificial because Twitter openly encourages and suggests this terminology.  It was not an emergent construct, but an imposed one.  It has been adopted by the overwhelming majority of users, though.  [follow me on twitter]

So here is my question:  does this only work for Internet companies?  I’m probably forgetting some obvious brick-and-mortar company for which we have verbed their brand, so please tell me if I have.  Or is it that Internet companies are especially conducive to this construction because so many Internet companies start off with only one service and become known by that service?  Google is search, ebay is selling crap through auctions, twitter is … twittering.   If this only works for Internet companies, why did we start doing it in the first place?

And I just came up with a brick-and-mortar example:  hoover.  You can hoover down a plate of food, meaning to suck something up like a champ.  But my classification still holds:  that is the primary function of their chief product (or at least the main product that people know them by).  Marketing people have already taken this to heart, I’m sure.  You need an easy name that sounds like English.  Just like with scientific terminology, no one wants to Dinklefwat their dishes.

Google Reader recently added some social networking features. You can now add your friends’ shared items to your feeds. Up till now I haven’t used the shared items feature since it didn’t really make sense to send people a link to my shared items and expect them to give a crap. Now it’s easier to subscribe to it and the decision to give a crap is left up to them when they read their feeds.

As Robert Scoble pointed out, there is one major flaw with the new feature: it clutters up the rest of your feeds. This is, of course, assuming you read your feeds in the “All feeds” folder. I usually don’t, since I have a number of Tech News feeds that I’m not always interested in and that often have duplicated information. I only read them when I have time. You can do the same with your friends’ shared items, so no big deal to me.

So why not check out my shared items?  As it happens, I have none at the moment.  But if you use Google Reader, feel free to add/invite me so we can view each other’s.  I’ll accept any invitations.

Since I work with recommender systems, I’d hardly be doing my job if I didn’t notice things like Google Reader’s new feed recommendations. From the description of how the recommender works on the Google help page (which is unfortunately not very specific):

Your recommendations list is automatically generated. It takes into account the feeds you’re already subscribed to, as well as information from your Web History, including your location. Aggregated across many users, this information can indicate which feeds are popular among people with similar interests. For instance, if a lot of people subscribe to feeds about both peanut butter and jelly, and you only subscribe to feeds about peanut butter, Reader will recommend that you try some jelly.

This sounds like they are using a hybrid recommender system. When you are recommending items (in this case feeds) to users, you can either consider the qualities of the items themselves (content-based) or the behavior of people similar to you (collaborative filtering). The Netflix Prize is a collaborative filtering case for the most part, though it is possible to add in some amount of content.
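
Here is a minimal sketch of the collaborative filtering half, using the peanut butter and jelly example from the help page. It scores each feed you don’t have by how much the people who subscribe to it overlap with you. The user names, the feeds, and the overlap-count scoring are my own assumptions for illustration; Google doesn’t say how Reader actually computes this.

```python
from collections import Counter

def recommend(subscriptions, user, top_n=3):
    """Rank feeds the user lacks, weighted by overlap with each other user."""
    mine = subscriptions[user]
    scores = Counter()
    for other, feeds in subscriptions.items():
        if other == user:
            continue
        overlap = len(mine & feeds)  # how similar this person is to the user
        if overlap == 0:
            continue  # no shared interests, so their feeds carry no weight
        for feed in feeds - mine:
            scores[feed] += overlap
    return [feed for feed, _ in scores.most_common(top_n)]

# Made-up subscription data.
subscriptions = {
    "ann": {"peanut butter", "jelly"},
    "bob": {"peanut butter", "jelly", "bread"},
    "cat": {"knitting"},
    "you": {"peanut butter"},
}
print(recommend(subscriptions, "you"))  # ['jelly', 'bread']
```

A content-based recommender would instead look at what the feeds themselves are about (their text, topics, and so on) and ignore who else reads them; a hybrid mixes the two signals.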

Google announced that it has abandoned Systran as its translation system for the 22 languages it services besides Arabic, Chinese and Russian.  Systran is one of the oldest machine translation companies around.  When Microsoft launched its service recently, it announced that it would be supplementing its translations with Systran.  Systran uses rule-based systems that have been massively tweaked to produce results that most would agree are still pretty crappy.  They get some basic stuff right, but once you start venturing off into uncommon word usages and complex constructions, all bets are off.  Some translation sites use Systran and others like freetranslation.com use their own system.  Babel Fish is perhaps the best-known site still using Systran.

So Google is switching over to its own statistical machine translation system for all 25 language pairs.  Statistical machine translation systems typically look at two different kinds of text:  aligned text in two languages (bitext) and monolingual text.  The monolingual text is used to build a statistical model of the language so that output will conform to the target language rather than the original.  For example, in German, the auxiliary verb comes in second position as in English, but the main verb often comes in final position.  Reordering properly isn’t easy and this model helps make the output more natural.  Bitexts are texts that have been translated from one language to another and then aligned word-by-word.  The actual alignment may be done by hand at the sentence level but the vast amount of human effort involved means that at the word level it is usually done automatically.  Getting good alignments is an ongoing area of research that is quite far from perfect.
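
To give a feel for what the monolingual model buys you, here is a toy sketch: a bigram language model with add-one smoothing, trained on three made-up English sentences, choosing between a word-for-word rendering of German “Ich habe das Buch gelesen” and the reordered version. The corpus, the candidates, and the smoothing are my assumptions for illustration; real systems train on vastly more text, and I have no idea what Google’s model looks like inside.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count context words and adjacent word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words[:-1])            # every word that precedes another
        bigrams.update(zip(words, words[1:]))  # adjacent pairs
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed log probability of a sentence under the bigram model."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Tiny made-up monolingual corpus in the target language.
corpus = ["i have read the book", "i have seen the film", "she has read the letter"]
unigrams, bigrams = train_bigram(corpus)
vocab_size = len({w for s in corpus for w in s.split()}) + 1  # +1 for </s>

# Word-for-word German order versus English order.
candidates = ["i have the book read", "i have read the book"]
best = max(candidates, key=lambda c: log_prob(c, unigrams, bigrams, vocab_size))
print(best)  # i have read the book
```

The model never sees any German. It just knows that “have read” and “read the” are things English speakers write and “book read” mostly isn’t, which is enough to prefer the natural ordering.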

The thing that Google has going for it is that with statistical machine translation, the more data the better.  And Google is overflowing with it.  It’ll be interesting to see how their systems progress.

Currently the features I want Google to add:

  1. add Google Scholar to search history
  2. add links for citations included in the paper for each result
  3. allow the two functions: cites and citedby for searching

Note: This is a very incomplete list, just what’s pressing at the moment.

Regarding (2), currently when you are presented with the papers in the search result, there is a link that looks like

 

Google Scholar example of number of citations

I’d like a link added that shows you results for everything this paper cites, so I don’t have to open the paper and manually search for everything in there. The link would basically say “Cites 13 articles” (or something similar). That’s not so hard, is it?

And for (3), I want to be able to search the papers that cite a particular paper or author and the papers that are cited by a particular paper or author. There are definitely more issues that need to be worked out for this, since it would be a many-to-many explosion in the case of ill-formed queries. Maybe just an option to narrow it down so we’re searching based on a paper that was already turned up in a search.
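
For what it’s worth, the two functions in (3) are just forward and reverse lookups over the same citation data. A minimal sketch, with made-up paper IDs and no claim about how Google Scholar stores any of this:

```python
from collections import defaultdict

def build_indexes(references):
    """Build forward (cites) and reverse (citedby) lookups.

    references: dict mapping a paper to the list of papers it cites
    """
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            cites[paper].add(ref)
            cited_by[ref].add(paper)
    return cites, cited_by

# Hypothetical paper IDs, for illustration only.
references = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_d": ["paper_b"],
}
cites, cited_by = build_indexes(references)
print(sorted(cites["paper_a"]))     # everything paper_a cites
print(sorted(cited_by["paper_b"]))  # everything that cites paper_b
```

Starting from one paper that already turned up in a search keeps each lookup small, which is the narrowing option I mentioned; starting from a vague author query is where the many-to-many explosion would come from.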

Update

Added (3) after the original post.

Rumors are running around about a possible upgrade to gmail. Garett Rogers at ZDNet started the rumor mill a few days ago when it was noticed that the gmail translation page wanted translations for the phrase “Newer Version.” Not exactly conclusive, but it’s long overdue so these rumors could actually pan out. Google Operating System (an unofficial google-watching blog) continued the rumors today, speculating about some of the possible new features.

Particularly scary to me is the potential of moving more towards an “outlook-style” interface. That is exactly the wrong thing for gmail to do. My favorite part of gmail is how it isn’t Outlook-like. Folders are so 2003. Labels let you classify email into multiple logical “folders” without having to duplicate the message and work pretty much the same. They also fit better into the current paradigm of classification using tags. Tags are something that gmail needs. Suggested tags for emails and a quick tag adding cloud would be very nice. A lot of times, searching text just isn’t enough. If I remember what the email was about, but can’t remember any verbatim phrases from it, I’m in for a difficult search. If I had tagged it, though…

Another thing I’d really like to see gmail get is more sophisticated filters. Maybe I’m missing some how-to somewhere or something, but when I want to add multiple contacts into a single filter, I can’t get it to work. I’d like filters that I can just add a group of contacts (or ones I hand-select) to.

When Yahoo! released their new mail client, I gave it a try. Certainly it was well done and quite sophisticated for a web-based client. It was also slower than a wounded three-toed sloth. Its Outlook-clone-like interface also turned me off immediately. Plus I get so much freakin spam in my inbox with Yahoo! mail anyway, I just can’t use it. I know no one’s spam filter is perfect (I read my gmail spam for false-positives, which I do find). But anyhow, Google, I beg you, do not go the way of the Outlook-dodo interface.

Buzz has been building over the past few days about what will be the next X Prize. If you don’t know what the first X Prize was all about, skip down a bit. The new Google Lunar X Prize was announced today. The prize purse is $20 million for the grand prize winner, $5 million for a second-place winner and $5 million split amongst several bonus prizes. The goal is a soft landing on the moon with a robotic craft which then must signal back to Earth. The rover must roam around for at least 500 m before sending the “Mooncast”.

The Computer and Communications Industry Association is a nonprofit organization with members including Google, Microsoft, Red Hat, Sun, and the Linux Foundation. To boil it down: they’re a lobbying group for the computing industry. I’m not saying they are therefore bad: it’s the unfortunate state of Washington that everything and everyone has to have a lobbyist in order to get anything done. For the moment I consider this group to be one of the “ok” guys (I’m not sure I’ll call them the “good” guys yet).

So yesterday, they released a study that reports that fair use exceptions in US copyright law account for $4.5 trillion in revenue each year: 18% of US economic growth. I’m not sure what economic growth is referring to here exactly. It’s not GDP because GDP is $13.13 trillion per year, which would make that percentage about 34%. This $4.5 trillion compares to the $1.3 trillion estimated to be the value that copyright industries contribute [source]. The fair use exception value is growing at a fast rate too, 31% since 2002.

So if fair use is that much better for business, why not expand it? Would it only eat into that $1.3 trillion or would it expand the economy even further?

There are a disturbing number of Jason Michael Adamses in the world. Two years ago, I tried googling myself and gave up after 20 pages. Using searches that included schools I went to yielded two math competitions from high school in 1992 and 1994 where I ranked in one and my team ranked in the other. Now I am proud to announce that I have made it to page 2 for the search “Jason Adams”. Woot! And “Jason M. Adams” puts me on page 1. I don’t know why I care about this since I’m the only one liable to search for myself. I’m very happily married so I’m not out dating, which is probably the main purpose of googling. I guess people might also google me when I make stupid comments on other blogs. Maybe I care because it’s like being on TV. You might not want to admit that you want to see yourself inanely answering a question on the 6 o’clock news about Mother’s Day cards just after you were ambushed at a local Wal-Mart, but you rush home and program your DVR.

On a side note, my friend Melinda clued me in on the term Googlewhackblatt. A Googlewhackblatt is a single word that has only one search result on Google. There is perhaps some debate whether additional search results that were omitted (because they come from the same site and the same link on that site) might nullify Googlewhackblatt status. My contention is no. If the primary search returns 1 of 1 results, it’s a GWB (I’m getting tired of typing it out). So anyhow, Mendicant Bug is not a Googlewhack (the term for when it’s two words instead of one). Obviously it’s not anymore since I’ve been blogging under that title, but even when I started there were search results like “… mendicant. [BUG] 2006-04-31” etc. I couldn’t find any uses of the term that were in the same sentence, though, so I decided to go with it.

Another thing that has gotten me excited is finding search terms that put me very close to the top on Google. I’ve compiled a short list.

So anyhow, if you’re bored out of your mind and find more, let me know.

 Update

TechCrunch just posted a story about a hidden feature of Google Earth: a flight simulator. To get there on a PC, just hit Ctrl-Alt-A and on a Mac, it’s Command-Option-A (capital A). You can choose between flying an F-16 or an SR22 prop plane. The F-16 is fast but the controls are a little pickier, while the SR22 is slow and more stable.

Some key commands to get you going (if you’re impatient, like me):

  • left/right-arrow — ailerons left and right (this makes you roll if you’re unfamiliar with aircraft terms)
  • up/down arrow — elevator up/down (this makes you go up or down)
  • shift + left/right arrow — rudder left or right (this turns you right/left)
  • page up/page down — increase or reduce thrust

Game play is a bit sucky, but as TechCrunch points out, the nifty feature here is the fact that you are flying over real images. The downside, of course, is that those images aren’t 3-D. In the screenshot below, I am flying up Broadway in Manhattan approaching Central Park.

Google Flight Simulator - Manhattan approaching Central Park

About Me

Jason M. Adams

My name is Jason M. Adams and I recently graduated with my master’s from the Language Technologies Institute at Carnegie Mellon University. My main areas of research were recommender systems and word sense disambiguation. Now I am on the job market. And I am obsessed with my two dogs.

Site Information

Contact me: jaso...@gmail.com

Creative Commons License

This work by Jason M. Adams is licensed under a Creative Commons Attribution 3.0 License.

Header image credit seakwenby.