#1
Google: The Deep Web Is No Longer Invisible to Us
Google has started digging into the Deep Web, using forms as the gateway to site data not crawlable by Googlebot.
What are your thoughts? Is it a challenge to corporate data privacy? Will your company and clients start using robots.txt to protect files? As an SEO, how will the change affect your job? What are you telling your clients?
#2
Re: Google: The Deep Web Is No Longer Invisible to Us
This is an uncharacteristically poorly thought-through move.

Ok - so Google fills in some 'get' forms, and then indexes the resulting content pages. Why is this a bad idea from an indexing perspective?

1. For a start, many dynamically generated pages use cookie data so users can get more out of the site, go back to earlier searches, etc. For cookie rejecters (like Googlebot), many of these sites stuff the cookie data into the URL. So Google is now going to index the dynamically generated URL (complete with cookie data), and it will never actually get the same URL twice.
2. The vast majority of these 'get' form generated 'results' pages have no links to them. So even if they are indexed, they will only ever be returned in the SERPs for the very, very long tail. No links = no rankings on any kind of moderately competitive phrase.
3. Most sites using 'get' forms should (or would) have already built a static crawl path, Google sitemap, etc. to the content they want indexed. So Google being able to index 'random' content generated by the application would generally be counterproductive: a static URL and crawl path to the 'selected' content potentially already exists, and that just means duplicate content.

A whole lot of grief and work for what will provide a pretty poor search result for Google users, plus grief, worry and duplicate content issues for site owners.
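To make point 1 concrete, here is a minimal sketch of how a site owner (or a crawler) might collapse session-stuffed URLs back into one canonical form. The parameter names in `SESSION_PARAMS` and the `canonicalize` helper are assumptions made up for illustration; real sites use all sorts of names, and nothing here is claimed to be what Googlebot actually does.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session-style parameter names; adjust for the site in question.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonicalize(url: str) -> str:
    """Drop session-style query parameters and sort the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # stable ordering so equivalent URLs compare equal
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Two fetches of the same results page with different session IDs.
first = canonicalize("http://Example.com/search?q=shoes&sid=abc123")
second = canonicalize("http://example.com/search?sid=zzz999&q=shoes")
print(first == second)
```

Without a step like this, each crawl of the same results page looks like a brand new URL, which is exactly the duplicate content worry raised above.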
#3
Cool post, I like your information.
#4
Re: Google: The Deep Web Is No Longer Invisible to Us
It is only an experiment, remember.

"Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc. We are also mindful of the impact we can have on web sites and limit ourselves to a very small number of fetches for a given site." (Google Webmaster Central Blog)

Some sites, for example, ask for information before giving access to a load of content, which they won't be averse to having indexed, as the user will still come face to face with the form anyway. The deep web has a lot of interesting stuff to offer. It's worth giving it a go.
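For anyone wanting to check what a robots.txt rule would actually block, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/search` path and the example.com URLs are made up for illustration, and the `allowed` helper is just a convenience wrapper, not anything from Google.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that forbids crawling the site's search form results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL a GET search form would generate, and an ordinary static page.
print(allowed(ROBOTS_TXT, "Googlebot", "http://example.com/search?q=shoes"))
print(allowed(ROBOTS_TXT, "Googlebot", "http://example.com/about.html"))
```

Under these assumptions the form-generated URL is blocked while the static page stays crawlable, which matches the behaviour the quote describes.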