Specific domain search results, a question.


Paul858
11-08-2007, 12:22 AM
Hi,

I have a few questions about using the site:xxx.com option in Google. If you just enter, say, site:ibm.com, with no specific search term in front of it, are the results you get back simply all pages indexed for that site? So you could reasonably assume that the number that appears in the top right is the number of pages in the domain and its subdomains?

Then, looking at the search site:ibm.com: if you enter just that into Google, you get back 618,000 results. However, if you run that same search through the advanced options and restrict it to English pages, you get back about 2.5 million results, which doesn't make sense, since a restricted search should not return more results than the unrestricted one.

Any ideas?

Thanks.

beu
11-12-2007, 03:05 AM
If you just enter say site:ibm.com, with no specific search term in front of it, are the results you get back simply all pages indexed for that site?
Not always. In most cases there are more pages indexed than the results show; in some cases (larger sites) there are many more.


So you could reasonably assume that the number that appears in the top right is the number of pages in the domain and its subdomains?
No, this number tends to be low, especially for larger sites. There are any number of reasons why it may be low.

Paul858
11-12-2007, 08:35 AM
Thanks for the reply.

Ok, so if the number at the top of the search results is not indicative of the number of pages indexed or the number of pages in the domain, what is the point of showing it? And what does it mean?

And secondly, is there any way to find out, even approximately, the number of HTML pages in a given domain?

P

JohnW
11-12-2007, 08:57 AM
No, not really, but you can get close to a complete list if you repeatedly do the site: search with a wide variety of modifiers, saving the URLs to a database and then removing the duplicates. For example:
keyword1 site:xyz.com
keyword2 site:xyz.com
site:xyz.com -keyword
Etc.
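
The approach above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `build_site_queries`, `normalize_url` and `merge_unique_urls` are made up for this example, the result URLs are assumed to have been saved separately for each query by whatever means you have, and the normalization rules (lowercase host, no fragment, no trailing slash) are just one reasonable idea of what counts as a duplicate.

```python
from urllib.parse import urlsplit, urlunsplit


def build_site_queries(domain, keywords):
    """Build query variations: each keyword plus site:, then site: minus each keyword."""
    queries = [f"{kw} site:{domain}" for kw in keywords]
    queries += [f"site:{domain} -{kw}" for kw in keywords]
    return queries


def normalize_url(url):
    """Reduce a URL to a canonical form so trivial variants compare equal (assumed rules)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the fragment, keep the query string.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_unique_urls(result_lists):
    """Merge the URL lists saved from each query, dropping duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for urls in result_lists:
        for url in urls:
            key = normalize_url(url)
            if key not in seen:
                seen.add(key)
                unique.append(key)
    return unique
```

The length of the list returned by `merge_unique_urls` is then a lower bound on the number of indexed pages, since any page that none of the queries surfaced is still missing.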

beu
11-12-2007, 11:25 AM
Thanks for the reply.

Ok, so if the number at the top of the search results is not indicative of the number of pages indexed or the number of pages in the domain, what is the point of showing it? And what does it mean?

Like JohnW said, the site operator helps find query results within a specific domain.

Here is how Google "defines" the site-operator:

If you include [site:] in your query, Google will restrict the results to those websites in the given domain. For instance, [help site:www.google.com] will find pages about help within www.google.com. [help site:com] will find pages about help within .com urls...

http://www.google.com/help/operators.html

For example:
"site:ibm.com" shows 594,000 results
"ibm site:ibm.com" shows 3,800,000

It seems to me that the results above would be impossible if the site: operator alone reflected all pages indexed.
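
The reasoning is that adding a keyword can only narrow a site: search, so if both counts were exact, the narrower one could never be the larger. A minimal Python check using the two figures quoted above (the function name `is_consistent` is made up for this example):

```python
def is_consistent(unrestricted_count, restricted_count):
    """An exact count for a narrower query can never exceed the count for the broader one."""
    return restricted_count <= unrestricted_count


site_only = 594_000          # "site:ibm.com"
site_plus_term = 3_800_000   # "ibm site:ibm.com"

print(is_consistent(site_only, site_plus_term))  # prints False
```

So at least one of the two numbers has to be an estimate rather than an exact count.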

And secondly, is there any way to find out, even approximately, the number of HTML pages in a given domain?
You could log in via FTP and count the pages in each folder, or crawl the site, collect the links, and filter out duplicates. There are ways; they just may not be easy! :)
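
As a rough sketch of the "add up pages in folders" idea, assuming you already have a local copy of the site's files (for instance one downloaded over FTP), something like this Python would do the counting. The function name `count_html_pages` and the choice of `.html` and `.htm` as the extensions that mark a page are assumptions for this example.

```python
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm"}  # assumed definition of an "HTML page"


def count_html_pages(root):
    """Count files under root (recursively) whose extension marks them as HTML."""
    return sum(
        1
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in HTML_SUFFIXES
    )
```

This only counts static files on disk. Pages that the server generates dynamically would not show up, so for such sites a crawl is the closer match.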

Dan01
11-12-2007, 01:25 PM
Google needs a sort by date option checkbox.

Paul858
11-12-2007, 03:37 PM
Thanks for the replies.

So basically there is no 'easy' way to find the actual number of pages indexed by Google in a given domain? And no 'easy' way to find the approximate number of pages, indexed or not?

But it still bugs me that, as you point out, searches for "site:ibm.com" and "ibm site:ibm.com" yield vastly different numbers. What is the search "site:ibm.com" actually searching for?

P

beu
11-12-2007, 05:28 PM
Thanks for the replies.

So basically there is no 'easy' way to find the actual number of pages indexed by Google in a given domain? And no 'easy' way to find the approximate number of pages, indexed or not?
Not a way that I'm willing to "bet the farm on", so to speak! :)

What is the search "site:ibm.com" actually searching for?
You get all kinds of answers to this question! I tend to believe it represents an approximate number of supplemental and non-supplemental pages indexed. In other words, the figures are "results estimates" rather than actual "web results", and they include pages that are a little off the path of most users and maybe not as fresh. Does anyone else have any ideas?

beu
11-12-2007, 09:45 PM
The IBM example may have a "bug" but here is another definition from Google:

Indexed pages in your site - uses the site: operator to return a sample list of your indexed pages.

http://www.google.com/support/webmasters/bin/answer.py?answer=35256&query=site+operator&topic=&type=

Also, check these out for more information:
http://www.mattcutts.com/blog/smaller-issues/
http://video.google.com/videoplay?docid=-3494613828170903728