Search Engine Watch
04-12-2008   #1
Kevin Heisler
Join Date: Sep 2007
Posts: 78
Google: The Deep Web Is No Longer Invisible to Us

Google has started digging into the Deep Web, using forms as the gateway to site data not crawlable by Googlebot.

What are your thoughts?

Is it a challenge to corporate data privacy? Will your company and clients start using robots.txt to protect files?

As an SEO, how will the change affect your job? What are you telling your clients?
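
For anyone weighing the robots.txt question, here is a minimal sketch of how a Disallow rule would cover the URLs a search form generates. The robots.txt content, the /search path and the example.com URLs are all made up for illustration; only Python's standard urllib.robotparser is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. The /search path is an
# assumption; use whatever path your own form's action points at.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """Return True if the parsed robots.txt lets `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


# A URL a GET search form might generate, and an ordinary static page.
form_result = "http://www.example.com/search?q=widgets"
static_page = "http://www.example.com/products/widgets.html"

print(is_crawlable(form_result))
print(is_crawlable(static_page))
```

The point of the sketch is that a single rule on the form's action path blocks every URL the form can produce, while static pages elsewhere on the site stay crawlable.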
04-13-2008   #2
Chris_D
Join Date: Jun 2004
Location: Sydney Australia
Posts: 1,099
Re: Google: The Deep Web Is No Longer Invisible to Us

This is an uncharacteristically poorly thought-through move.

Ok - so Google fills in some 'get' forms, and then indexes the resulting content pages.

Why is this a bad idea from an indexing perspective?

1. For a start, many dynamically generated pages use cookie data so users can get more out of the site, go back to earlier searches, etc. And for cookie rejecters (like Googlebot), many of these sites stuff the cookie data into the URL. So Google is now going to index the dynamically generated URL (complete with cookie data), which means it will never actually get the same URL twice...

2. The vast majority of these 'get' form generated 'results' pages have no links to them. So even if they are indexed - they will only ever be returned in the SERP rankings for the very very very long long tail.

No links = no rankings on any kind of moderately competitive phrase.

3. Most sites using 'get' forms should/would have already built a static crawl path, Google Sitemap, etc. to the content they want indexed.

So Google being able to index 'random' content generated by the application would generally be counterproductive (as potentially a static URL & crawl path to the 'selected' content already exists) - and that just means duplicate content....

A whole lot of grief & work for what will provide a pretty poor search result for Google users; and grief & worry and duplicate content issues for siteowners....
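
To make point 1 concrete, here is a rough sketch of why session data in the URL produces "different" URLs for the same page, and how an indexer could collapse them. The parameter names in SESSION_PARAMS and the example URLs are assumptions for illustration; nothing here describes how Google actually handles it.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of session-tracking parameters; real sites vary widely.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonicalize(url):
    """Drop session parameters and sort the rest so equal pages compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Two visits to the same results page, each stamped with a fresh session id.
visit_1 = "http://www.example.com/results?q=widgets&sid=abc123"
visit_2 = "http://www.example.com/results?sid=zzz999&q=widgets"

print(visit_1 == visit_2)
print(canonicalize(visit_1) == canonicalize(visit_2))
```

Without something like this, every crawl of the form yields a new URL for the same content, which is exactly the duplicate content problem described above.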
08-01-2008   #3
ipodfansmail
Join Date: Jul 2008
Posts: 1
reply2

Cool post, I like your information.
08-11-2008   #4
Misscj
Join Date: Jul 2008
Location: UK
Posts: 31
Re: Google: The Deep Web Is No Longer Invisible to Us

It is only an experiment, remember.

"Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc. We are also mindful of the impact we can have on web sites and limit ourselves to a very small number of fetches for a given site." (Google Webmaster Central Blog)
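
The rules in that quote can be read as a simple filter over a page's forms. What follows is only a guess at the shape of such a filter, not Google's code: the PERSONAL_TERMS list is assumed from the three examples in the quote, and the sample HTML is invented.

```python
from html.parser import HTMLParser

# Assumed term list: the quote mentions "logins, userids, contacts" but
# does not publish the full set of terms it looks for.
PERSONAL_TERMS = ("login", "userid", "contact")


class FormScanner(HTMLParser):
    """Collect each form's action, method and (type, name) of its inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            method = (attrs.get("method") or "get").lower()
            self._current = {
                "action": attrs.get("action") or "",
                "method": method,
                "inputs": [],
            }
        elif tag == "input" and self._current is not None:
            input_type = (attrs.get("type") or "text").lower()
            name = (attrs.get("name") or "").lower()
            self._current["inputs"].append((input_type, name))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def is_candidate(form):
    """GET only, no password input, no personal-information field names."""
    if form["method"] != "get":
        return False
    for input_type, name in form["inputs"]:
        if input_type == "password":
            return False
        if any(term in name for term in PERSONAL_TERMS):
            return False
    return True


def candidate_actions(html):
    scanner = FormScanner()
    scanner.feed(html)
    return [form["action"] for form in scanner.forms if is_candidate(form)]


# Invented sample page: one search form, one login form, one members form.
SAMPLE = """
<form action="/search" method="get"><input type="text" name="q"></form>
<form action="/login" method="post"><input name="userid"><input type="password" name="pw"></form>
<form action="/members" method="get"><input name="userid"></form>
"""

print(candidate_actions(SAMPLE))
```

Only the plain search form survives the filter here; the POST login form and the GET form asking for a userid are both skipped.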

Some sites ask for information before granting access to a load of content, for example, and they won't be averse to that content being indexed, as the user will still come face to face with the form anyway. The deep web has a lot of interesting stuff to offer.

It's worth giving it a go.