Update on Google and the deep web

February 5, 2009 by Ravi Shanker

I recently took a much-needed break. I spent a couple of days with a very dear friend in Colorado. On the drive back to Santa Fe, I called Abe to check in. In our discussion Abe told me that there has been a fair amount of buzz in the blogosphere about Google “surfacing” deep Web content. Last April I first wrote about Google’s efforts to crawl the deep Web. A couple of months later I followed up with “Why is Google interested in the deep web.” Today there’s more to write about.

Yahoo! Tech News published an article on January 30: Google Researcher Targets Web’s Structured Data (PC World). The article’s first paragraph is ominous, unless you believe that Google is regurgitating old news:

There is new news. Check out “Google’s Deep-Web Crawl.” Google is indeed stepping up its efforts to mine the deep Web. Google uses the term “surfacing.” Google is increasingly submitting queries to HTML forms and adding the results it finds to its index. From Google’s perspective this makes sense. Its model is to build a comprehensive index. Google isn’t interested in building federated search applications. But it would love to index all the good content behind search forms and blend those documents in with the documents and web pages it finds by crawling. Here is a paragraph from the Google article’s abstract:

More info: searchblog
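
For readers who want a concrete picture of “surfacing” as described in this post (submitting queries to an HTML form and collecting the result pages for indexing), here is a minimal Python sketch. It is an illustration only, not Google’s actual method: the function name surfacing_urls, the example.com form URL, and the input names and values are all hypothetical, and it only builds the query URLs a crawler might fetch rather than fetching anything.

```python
from itertools import product
from urllib.parse import urlencode


def surfacing_urls(action_url, candidate_values, limit=100):
    """Build GET URLs for a search form, one per combination of input values.

    action_url: the form's action URL (hypothetical in this example).
    candidate_values: dict mapping each form input name to the values to try.
    limit: cap on the number of URLs, since combinations grow quickly.
    """
    names = sorted(candidate_values)  # fixed order keeps the output stable
    urls = []
    for combo in product(*(candidate_values[name] for name in names)):
        query = urlencode(dict(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
        if len(urls) >= limit:
            break
    return urls


# Hypothetical form with two inputs; a real crawler would fetch each URL
# and add the result page to its index alongside ordinary crawled pages.
urls = surfacing_urls(
    "https://example.com/search",
    {"make": ["honda", "ford"], "zip": ["87501"]},
)
for url in urls:
    print(url)
```

The limit parameter is there because the number of value combinations multiplies with every form input, so any approach of this kind has to decide which submissions are worth making.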


© 2010 Everything Under The Globe. All Rights Reserved.