Site Map Depth - Indexing Tens of Thousands of Pages


wschroter
06-08-2006, 07:29 PM
If Google only spiders X number of links (let's say 100) on a page, how do you create a site map that effectively points to thousands of pages of content to be spidered?

We have tens of thousands of pages. If we go hierarchically ("category > category > category") won't Google stop indexing after a few levels?

I know many of you have long since figured this out, so I look forward to your suggestions.

Mikkel deMib Svendsen
06-08-2006, 07:48 PM
We have tens of thousands of pages. If we go hierarchically ("category > category > category") won't Google stop indexing after a few levels?
No, that will be fine, but expect it to take some time, depending on how many backlinks you have. Also, it helps if you build new backlinks to deeper-level entry pages.

wschroter
06-08-2006, 07:53 PM
Does anyone have an example of a well-done site map that covers a really deep site?

GDMInteractive
06-08-2006, 11:43 PM
I've never actually done this, but here is an idea that may be worth exploring. Create multiple site maps as well as two levels of site maps as follows:

1) Your main site maps (let's say about 10, if possible)

2) Each main site map links to related category site maps (up to 100 per site map, but can be fewer)

3) Each category site map links to associated pages.

4) If possible, link to all 10 main site maps from at least your home page and preferably every page. If that is not possible, spread links to your main site maps throughout your site and make sure that a page Google regularly crawls links to those site maps.


This would work as follows: if each of your 10 main site maps contained 30 categories (a total of 300 pages) and each category linked to 100 pages then your site maps would link to 30,000 pages overall. You can, of course, play with the numbers for the main site map page and the category site map page, say 5 main site maps and 60 categories. This would also result in 30,000 pages being covered by the site maps (and make it easier to include all the main site map links on your home page).
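The arithmetic above can be checked with a small Python sketch. It is only an illustration under stated assumptions: the URLs are made up, and pages are grouped into "categories" by simple sequential chunking, whereas a real site would group them by topic. The function names `chunk` and `build_two_level_sitemap` are hypothetical.

```python
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_two_level_sitemap(page_urls, pages_per_category=100,
                            categories_per_main=30):
    """Group pages into category site maps, then categories into main site maps.

    Returns a list of main site maps; each main site map is a list of
    category site maps; each category site map is a list of page URLs.
    """
    category_maps = chunk(page_urls, pages_per_category)
    return chunk(category_maps, categories_per_main)


# Hypothetical site with 30,000 pages.
urls = [f"/page-{n}.html" for n in range(30_000)]

main_maps = build_two_level_sitemap(urls)
print(len(main_maps))                  # number of main site maps
print(sum(len(m) for m in main_maps))  # number of category site maps

# The alternative split mentioned above: 60 categories per main site map.
alt_maps = build_two_level_sitemap(urls, categories_per_main=60)
print(len(alt_maps))
```

With the default numbers this gives 10 main site maps and 300 category site maps; with 60 categories per main site map it gives 5 main site maps, and no page in either layout carries more than 100 links.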

Hope this helps.

Moshe

shor
06-09-2006, 12:34 AM
Here are some examples of different types of sitemaps:

Google likes the 2-clicks-deep approach, with deep content directly linked: http://www.google.com/sitemap.html Very usability-centric.

At the other end of the spectrum, you could always try the about.com approach, a spider-driven sitemap:
http://spiderbites.about.com/sitemap.htm

Then there is the popular mix-and-match sitemap + directory approach:
Answers Directory (http://www.answers.com/main/what_content.jsp) + Answers Sitemap (http://www.answers.com/main/sitemap.jsp). This allows for a clean and user-friendly high-level sitemap, while the directory provides good hierarchical drill-down.

Also consider how often the spider comes to crawl (and deep-crawl) your website when designing your initial sitemap. If you are a brand-new, low-trust website, you might find that spiders won't deep-crawl you for quite a while (and even if they do, you are not assured of having all your pages indexed and ranked).

wschroter
06-09-2006, 12:20 PM
This is great feedback and much appreciated.

However, I still would be interested to know how many links Google spiders into a page. I would imagine there would be some sort of relative cut-off point.

GDMInteractive
06-09-2006, 12:31 PM
Planet Ocean's Unfair Advantage SEO book says that 100 links is generally considered the maximum number of links that the SE spider will crawl. SEO Book, on the other hand, says that Google abandoned this limit a while ago, that the number depends on the link popularity of the page, and that on pages with high link popularity Google will scour thousands of links. While SEO Book does contradict what Planet Ocean said, it does not necessarily mean that a sitemap can consist of thousands of links. After all, he only said that Google is known to follow thousands of links on sites with high link popularity. He also says that the Google spider will not want to follow a lot of links on pages with no link popularity. Based on this, unless your sitemap has high link popularity, I would recommend limiting the number of links on the page.

Elisabeth
06-09-2006, 01:26 PM
Some very good suggestions, including the about.com spiderbites example.

The more obvious answer to me is to use the Google Sitemaps program and follow their tips for uploading 50,000 or more URLs (http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#sitemapFileRequirements).
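For a site with more than 50,000 URLs, the usual pattern is to split the list across several sitemap files and list those files in a sitemap index. Below is a minimal Python sketch of that idea. Assumptions: it uses the sitemaps.org 0.9 namespace (the later, shared version of the protocol, not necessarily the schema the linked page described at the time), the file names and `base_url` are hypothetical, and it returns XML strings without an XML declaration instead of writing files. Per-file size limits are not checked.

```python
import xml.etree.ElementTree as ET

# Assumption: sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemap protocol


def build_sitemaps(urls, base_url):
    """Return {file_name: xml_string} for the sitemap files plus an index."""
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    starts = range(0, len(urls), MAX_URLS_PER_FILE)
    for n, start in enumerate(starts, start=1):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + MAX_URLS_PER_FILE]:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
        name = f"sitemap-{n}.xml"  # hypothetical naming scheme
        files[name] = ET.tostring(urlset, encoding="unicode")
        # Each sitemap file gets one <sitemap><loc> entry in the index.
        ref = ET.SubElement(index, "sitemap")
        ET.SubElement(ref, "loc").text = f"{base_url}/{name}"
    files["sitemap-index.xml"] = ET.tostring(index, encoding="unicode")
    return files


# Hypothetical site with 120,000 pages.
urls = [f"http://www.example.com/page-{n}.html" for n in range(120_000)]
files = build_sitemaps(urls, "http://www.example.com")
print(sorted(files))
```

Here 120,000 URLs end up in three sitemap files (50,000 + 50,000 + 20,000) plus one index file that points at them.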

wschroter
06-09-2006, 01:29 PM
We've had mixed results on Google Sitemaps. For as straightforward as the process should be, we're always surprised by how poor the results seem to be.

Has anyone else had any problems with Google Sitemaps?

GDMInteractive
06-09-2006, 01:32 PM
Google sitemaps only works for Google. Of course, reaching over 40% of the search traffic a day isn't so bad, but there is still the other 60%.

Elisabeth
06-09-2006, 01:42 PM
Google sitemaps only works for Google. Of course, reaching over 40% of the search traffic a day isn't so bad, but there is still the other 60%.

Well, since this thread is in the Google forum, and wschroter is asking specifically about Google, we'll assume he's most interested in Google ;)

wschroter - so yes, you've had mixed results, so going back to Mikkel's comments, it may be more a matter of inbound linking.

wschroter
06-09-2006, 01:48 PM
Yes, it was centered around Google.

My initial concern was that after a couple of levels Google would stop going deeper in the crawl and that the balance of our pages would not be indexed.

What I'm gathering from the responses I've heard is that Google will go as deep as our PR indicates it should. Over time as our PR increases, the crawls will go deeper, picking up more pages.

Mikkel deMib Svendsen
06-09-2006, 02:05 PM
If you keep your sitemaps to a maximum of 100 links (total) from each page, you are fine. Google may crawl a few more links on some pages, but to be safe keep it at 100.
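To see what a 100-links-per-page cap means for depth, here is a rough Python sketch. It assumes a single root sitemap page and that every sitemap page uses its full link allowance, which real sites rarely do; the function name `sitemap_levels_needed` is made up for this example.

```python
def sitemap_levels_needed(total_pages, max_links=100):
    """Levels of sitemap pages needed so one root page reaches every page.

    Level 1 is a single sitemap page linking straight to content pages;
    each extra level multiplies the reach by `max_links`.
    """
    if max_links < 2:
        raise ValueError("max_links must be at least 2")
    levels = 1
    reachable = max_links
    while reachable < total_pages:
        reachable *= max_links
        levels += 1
    return levels


for pages in (100, 10_000, 30_000, 1_000_000):
    print(pages, sitemap_levels_needed(pages))
```

Under these assumptions, up to 10,000 pages fit in two levels, and anything from there up to 1,000,000 pages needs three, so a site with tens of thousands of pages sits three clicks below a single root sitemap.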

I will NOT recommend using the Google XML Sitemaps! I have not yet seen one single example where that was better than an "old fashion" HTML sitemap but I HAVE seen examples where it actually hurt indexing. Also, a normal sitemap will work in all engines :)

JEC
06-09-2006, 02:48 PM
I will NOT recommend using the Google XML Sitemaps! I have not yet seen one single example where that was better than an "old fashion" HTML sitemap but I HAVE seen examples where it actually hurt indexing. Also, a normal sitemap will work in all engines :)
Mikkel, can you shed some more light on your experience? I have used an XML sitemap file with the Google Sitemap utility and it helped index many more of my site's pages, very quickly. Not to mention it helped gather some interesting statistics from Google about my site.

I would also add that after having it up for approximately 4 months, I have not seen any negative effects on the other search engines. And of course, I still maintain an HTML site map on the site as well. That just makes sense for the other search engines to be able to crawl the site.

And, on a site I just created a month ago I also added an XML site map and it helped get my home page indexed in Google right off the bat. The rest of the site is still not indexed, by Google anyway, but that is probably due to poor IBLs and the sandbox syndrome. The other search engines have completely indexed that site. So I don't see any negative effects.

I'm just curious about your own experience.

JEC

Mikkel deMib Svendsen
06-09-2006, 03:07 PM
If you did not get fully crawled by Google with your existing HTML sitemap, then that is to me a sign of either a bad sitemap or too few backlinks. If Google can't find any links to pages and therefore does not naturally index them, then what's the point of having them included with the XML sitemap? They won't rank anyway :)

JEC
06-09-2006, 03:24 PM
If you did not get fully crawled by Google with your existing HTML sitemap, then that is to me a sign of either a bad sitemap or too few backlinks. If Google can't find any links to pages and therefore does not naturally index them, then what's the point of having them included with the XML sitemap? They won't rank anyway :)
Well, there were other changes as well. The menu structure we had before was JavaScript-based and, as a result, Google never followed any of the main menu links since they were embedded in JavaScript code. I updated the menu structure to a CSS drop-down menu after attending SES in San Jose last year and getting that advice, and that helped get even more pages indexed. Also, we had gone through a site update and renamed some pages, tossing more into the mix.

But that's a different discussion. You still didn't answer my question. ;) What has been your negative experience?

JEC

Mikkel deMib Svendsen
06-09-2006, 03:30 PM
The negative experience has been a couple of sites that were apparently getting LESS indexed because of the XML sitemap. Google downloaded it up to 25 times per hour (!) and reported no errors, but still only included a fraction of the site. We kept it there for a few months with no improvements. After removing it we got back to "normal" indexing - much better than what we had with the XML sitemap.

JEC
06-09-2006, 03:42 PM
The negative experience has been a couple of sites that were apparently getting LESS indexed because of the XML sitemap. Google downloaded it up to 25 times per hour (!) and reported no errors, but still only included a fraction of the site. We kept it there for a few months with no improvements. After removing it we got back to "normal" indexing - much better than what we had with the XML sitemap.
Okay. That's interesting. Just grasping at straws here, but could that have anything to do with your "changefreq" tag's value? As I mentioned, our experience so far has been good. Our site is pretty small, only about 150 pages. But since you bring up an issue that you've experienced, it'll be something for us to keep an eye on. You can never get enough advice. :cool:

Thank you.

JEC

Marketing Man
06-09-2006, 04:11 PM
I have included brief excerpts and some great documentation regarding how much of a page is indexed by the big 3 and how many pages are indexed by the big 3.

1. Number of pages crawled.
"In the previous edition - Binary Search Tree 2 - a large scale experiment on search engine behaviour was staged with more than two billion different web pages. This experiment lasted exactly one year, until April 13th. In this period the three major search engines requested more than one million pages of the tree, from more than hundred thousand different URLs. The home page of drunkmenworkhere.org grew from 1.6 kB to over 4 MB due to the visit log and the comment spam displayed there.

This edition presents the results of the experiment." (http://drunkmenworkhere.org/)

2. How much of a page is crawled.
"The SEO community boasts a multitude of different opinions as to the volume of text indexed by the search engines on a single Web page."

The question is, how large should the optimized page be? (http://www.sitepoint.com/article/indexing-limits-where-bots-stop)