View Full Version : Supplemental index


I, Brian
01-26-2005, 04:13 PM
Just what is the supplemental index anyway?

You know:
"In order to show you the most relevant results, we have omitted some entries very similar to the 2 already displayed.
If you like, you can repeat the search with the omitted results included."

I had *presumed* before that the supplemental index was effectively composed of:

1. URLs found but not crawled
2. Effective duplication of query


However, I'm not really certain what the supplemental index is, what it is for, or how pages (even sites) actually end up in it.

Comments welcome. :)

telNform
01-26-2005, 05:32 PM
From what I understand, this is content that Google considers to be duplicate (same sources, similar content, etc.).

Chris_D
01-26-2005, 05:39 PM
Hi Brian,

I think there are 2 separate issues here.

It is my understanding that Google generally limits the number of results from any one domain. E.g. in the normal SERPs, for a normal keyword search, you get a maximum of 2 from one domain (one indented, directly underneath).

If you watch the address line in your browser, when you hit the "repeat the search with the omitted results included" link, it appends "&filter=0".

This is effectively the Google filter which allows more results from a domain to be displayed. So 'more results' using "&filter=0" could bring up 'supplemental results' pages, or 'normal pages'.

For example, try it by searching your own site for a single word, e.g. site:www.domain.com blah (where blah is a common word on every page of the site), and then hit "repeat the search with the omitted results included". It may bring up both 'normal' and 'supplemental results' pages.

In my testing, I've seen pages listed as 'supplemental results' that were previously indexed but are no longer 'live' pages. E.g. if you delete an already indexed page, for a little while it will appear as a supplemental result, as Google continues to try to request it (look in the logs). It appears as a 'supplemental result' until Google finally decides that it's gone for good, at which point it is removed.
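For anyone wanting to "look in the logs" as described above, here is a minimal Python sketch. It assumes an Apache-style "combined" access log and that the crawler identifies itself with "Googlebot" in the user-agent string. The function name googlebot_misses, the sample IP address and the sample paths are hypothetical, and the log pattern will need adjusting for other formats.

```python
import re
from collections import Counter

# Assumption: Apache/Nginx "combined" log format, e.g.
# IP - - [date] "GET /path HTTP/1.1" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_misses(lines):
    """Count, per path, Googlebot requests that were answered with 404 or 410."""
    misses = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed format
        is_googlebot = "Googlebot" in match.group("agent")
        is_gone = match.group("status") in ("404", "410")
        if is_googlebot and is_gone:
            misses[match.group("path")] += 1
    return misses


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, iterate over your access log file.
    sample = [
        '192.0.2.1 - - [26/Jan/2005:16:13:00 +0000] "GET /old-page.html HTTP/1.1" '
        '404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    ]
    for path, count in googlebot_misses(sample).most_common():
        print(count, path)
```

Paths that keep showing up here are deleted pages the crawler is still requesting. The user-agent string can be spoofed, so treat this as a rough indication only.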

That's my understanding, but I'm sure the 'supplemental results' could be used by Google for other reasons. I've even seen it where a DDOS attack on a webserver has been so prolonged that googlebot couldn't access and refresh pages (i.e. any request got a 404). Many pages went into 'supplemental results' for a period of a week or so, and they all came back as 'normal' results after the DDOS attack finished. But the principle is the same: it's a page Google knows about but can't access.

Best

Chris_D

JohnW
01-26-2005, 07:30 PM
Sometimes it's easier to just manually add &filter=0 to the URL than to find your way to the end of the results and repeat the search.
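As a rough illustration of adding the parameter by hand, here is a minimal Python sketch that sets filter=0 on a search URL. Only the filter=0 parameter comes from this thread. The function name with_omitted_results and the example search URL are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_omitted_results(search_url: str) -> str:
    """Return search_url with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(search_url)
    # Keep every existing parameter except a previous "filter" setting.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical example URL using the site: query form mentioned earlier.
    print(with_omitted_results("http://www.google.com/search?q=site:www.domain.com+blah"))
```

Rebuilding the query string rather than concatenating text avoids ending up with two filter parameters if one is already present.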

One thing is for sure about the supplemental index: it's not where you want to be. As far as I can tell, it's just one small step from not being in the index at all. I have always suspected that it is where pages go when they are considered totally unimportant, as in the case of penalization for duplicate content, as was suggested. I have also seen hijacked pages stop there for a while on their way out of the index, which, depending on perspective, may be the same thing as duplicate content. I would like to know for sure what it means, but we may never find out the real answer, even if they do tell us. Has anyone had any luck in recovering a page from this status?

Chris_D
01-26-2005, 07:57 PM
John asked:

"Has anyone had any luck in recovering a page from this status?"

This was my experience:

"I've even seen it where a DDOS attack on a webserver has been so prolonged that googlebot couldn't access and refresh pages (i.e. any request got a 404). Many pages went into 'supplemental results' for a period of a week or so, and they all came back as 'normal' results after the DDOS attack finished."

I think, as you said "it's just one small step from not being in the index at all."

In a way, it's 'Google purgatory'! You are in a place somewhere in between the main Google index and Google oblivion...

I think this relates to the 'supplemental result' topic:

Your site may not have been reachable when we tried to crawl it because of network or hosting problems. When this happens, we retry multiple times, but if the site cannot be crawled, it will not be listed in our current index. If it was a transient problem, your site will likely show up in the next index, which will be completed in a few weeks.
http://www.google.com/intl/en/webmasters/2.html#B3

So there are 'innocent' ways your pages can get there, e.g. if site-related issues prevent Googlebot from spidering known indexed URLs (which is recoverable). Your pages can also get there when you delete them from your site (which won't be recoverable, as you deleted the pages). And there are possibly other reasons relating to Google removing the pages...

Mel
01-26-2005, 10:50 PM
Google says that:

Google augments results for difficult queries by searching a supplemental collection of more web pages. Results from this index are marked in green as "Supplemental".

I do not think this is necessarily associated with the "repeat the search with the omitted results included" message; rather, it is a separate index which is used only when Google cannot find enough results in its main index to fully satisfy the query. I do not think it has anything to do with server outages, but rather with pages that Google, for one reason or another, sees as being less relevant than pages in the main index.

What the criteria for inclusion in the supplemental index are is not clear to me, but I do agree that it's not the index of preference for webmasters.

MHS
01-27-2005, 01:55 PM
Would this put your CACHED Results in 1969 LAND ALSO?

siteseo
02-01-2005, 03:40 PM
Here's an email I received from Google's User Support group when I inquired (via our AdWords rep) whether dupe content relegated a page to the supplemental results index (I'm 99% convinced it does, by the way):

"...supplemental sites are part of Google's auxiliary index. We're able to place fewer restraints on sites that we crawl for this auxiliary or supplemental index than sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.
The index in which a site is included is completely automated; there's no way you can select or change the index in which your site appears. Please be assured that the index in which a site is included does not affect its PageRank."

So being in the supp index doesn't affect your PR (whoopie) but it definitely impacts your rankings in the serps. Supp pages never rank on competitive terms.

Ben Anderson
02-02-2005, 05:52 AM
I had some of my old-site pages in the supp index. When I did a major refresh of this site, most of the pages came back to the main index. So, you can also try that...