Search Engine Watch
Old 08-25-2004   #1
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
Getting millions of dynamic pages indexed

I have worked on a couple of sites that have hundreds of thousands or millions of good content pages, and I would like to share some experience in getting that many pages indexed, ranked well and, not least, benchmarked.

In most cases I have been fortunate in that these large sites already had good or decent link popularity/PageRank. They were prominent sites, but with a site architecture so messed up that they hardly got any pages indexed and ranked.

Removing indexing barriers on such sites is often not a trivial task. Not least, the "human element" can be a huge limitation on the radical changes it often takes. However, it is indeed possible. But be prepared: It takes time! And I have found that the kind of companies that operate million-page sites often require extensive documentation before they change as much as a comma in the web code.

Once the indexing barriers are removed, I have found it very important to implement a solid robots.txt file to block all the agents that you don't want in and that respect this file. And that's actually quite a few. Hundreds. Imagine you have just "opened up" your website to spiders (removed the indexing barriers) and all these hundreds of agents start to spider your millions of pages. Not good. Not good at all. This can in fact take down your servers or force you to invest in heavier servers and bandwidth upgrades. So get protected.

I have often used a slightly modified version of Brett Tabke's robots.txt file, which he publishes under the GNU license. This one is very strict - I usually take a few of the bots off the list, but that's up to you.

www.webmasterworld.com/robots.txt
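
As a rough illustration of the blocking idea (this is not Brett's file, and the agent names and URLs below are made-up placeholders), here is a small Python sketch that checks a robots.txt draft with the standard library's urllib.robotparser before it goes live. It only shows how a well-behaved agent would read the rules; agents that ignore robots.txt are not affected by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the agent names are placeholders, not a vetted blocklist.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: AnotherUnwantedAgent
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A blocked agent is refused everywhere on the site.
print(parser.can_fetch("ExampleScraperBot", "http://www.example.com/products/42"))  # False
# Any other agent falls through to the "*" rules and may crawl content pages.
print(parser.can_fetch("Googlebot", "http://www.example.com/products/42"))  # True
```

Running a check like this against a list of the agents you do want in is a cheap way to make sure a strict file does not lock out the major engines by accident.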


After the indexing barriers are removed the pages should get indexed, right? Well, they don't always. At least not all of them - the millions you have. I have found that it takes a lot of work on your linking structure and on how you update the content (freshness). And this is a tricky part ...

It is not so difficult to create site maps on a site with a few hundred or thousand pages in a hierarchy that makes sense and with a reasonable number of levels. With millions of pages that gets tricky. Often I find that some site maps, and the links to pages, end up so deep that spiders don't prioritise them highly enough when crawling.

There are a couple of things I have found that help: theming areas and getting external links to the entry points of those areas. This way you can create a number of internal hubs that you link to from the main site map and also point external links at. From those hubs you can start the nested site map for each section. This will give you fewer pages to deal with in each site map hierarchy.
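
To put rough numbers on the depth problem, here is a back-of-the-envelope Python sketch. It assumes every site map page carries the same number of links (the figure of 100 is only an assumption for illustration) and counts how many levels of site map pages are needed before every content page is linked.

```python
def sitemap_levels(total_pages: int, links_per_page: int) -> int:
    """Number of site map levels needed so that every content page is
    linked from a bottom-level site map page."""
    if total_pages < 1 or links_per_page < 2:
        raise ValueError("need at least 1 page and 2 links per site map page")
    levels, reachable = 0, 1
    # Each extra level multiplies the number of reachable pages.
    while reachable < total_pages:
        reachable *= links_per_page
        levels += 1
    return levels


# One flat hierarchy for the whole site.
print(sitemap_levels(1_000_000, 100))  # 3
# One themed hub holding 10,000 of those pages.
print(sitemap_levels(10_000, 100))  # 2
```

Under these assumptions, a spider that enters at a themed hub through an external link is one level closer to the content than a spider that has to start at the main site map.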

Before this post gets too long, I want to hear others' experience in getting that many pages indexed. Next we can go into how we rank them well too.
Old 08-25-2004   #2
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
>>I want to hear others experience in getting that many pages indexed

Pagination: See pages: 1 2 3 - 50

next page (2)
See pages: 3 4 5 - 51

etc...

Works well for me..

Nick
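
One possible reading of the scheme above, sketched in Python: every page links to the next few pages plus one page far ahead, so a spider can reach deep pages in a handful of hops. The window of 3 and the jump of 49 are guesses taken from the example numbers, not a confirmed description of the actual setup.

```python
from typing import List


def pagination_links(current: int, total: int, window: int = 3, jump: int = 49) -> List[int]:
    """Page numbers to link to from page `current` of `total` pages."""
    # The next few pages, clipped to the last page.
    links = [p for p in range(current + 1, current + window + 1) if p <= total]
    # One long jump ahead so deep pages are only a few clicks away.
    far = current + jump
    if far <= total and far not in links:
        links.append(far)
    return links


print(pagination_links(2, 1000))  # [3, 4, 5, 51]
```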
Old 08-25-2004   #3
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
Yes, pagination works too. However, in some cases I have had problems making sure the pages do not become too identical. In any case, I always adjust titles and META tags for paged pages, so as a minimum titles get a "... page 2" added. I just don't want hundreds of pages with near-identical content and identical headers indexed. It's not healthy.
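
A minimal Python sketch of the title adjustment, assuming the page number is known when the page is rendered. The function name and the exact " - page N" format are only illustrations.

```python
def paged_title(base_title: str, page: int) -> str:
    """Return a distinct title for each page of a paginated listing."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 keeps the plain title; later pages get the page number appended.
    return base_title if page == 1 else f"{base_title} - page {page}"


print(paged_title("Red widgets", 1))  # Red widgets
print(paged_title("Red widgets", 2))  # Red widgets - page 2
```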
Old 08-25-2004   #4
rustybrick
 
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Rotating featured articles/products on the homepage and landing pages on a daily basis works well.

Having links to related articles/products from articles and products also works well.
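
For what it's worth, a daily rotation can be sketched in a few lines of Python. This version is only one way to do it: it picks a different slice of a fixed item list each day, derived from the date, so the selection stays stable within a day and needs no stored state. The item list and the number of slots are placeholders.

```python
from datetime import date
from typing import List, Sequence


def featured_for_day(items: Sequence[str], day: date, slots: int = 3) -> List[str]:
    """Pick `slots` consecutive items, starting at an offset that moves each day."""
    if not items:
        return []
    count = min(slots, len(items))
    # Advance the starting offset by one full window per day, wrapping around.
    start = (day.toordinal() * count) % len(items)
    return [items[(start + i) % len(items)] for i in range(count)]


articles = [f"article-{n}" for n in range(10)]  # placeholder content IDs
print(featured_for_day(articles, date(2004, 8, 25)))
print(featured_for_day(articles, date(2004, 8, 26)))
```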

Last edited by rustybrick : 08-25-2004 at 11:19 AM. Reason: added more detail
Old 08-26-2004   #5
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
How much do you think freshness impacts indexing? Personally I do think it helps.
Old 08-26-2004   #6
rustybrick
 
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
It's hard to say. Of course you want the search bots revisiting your pages as frequently as possible (if you have the server resources), so that they can pick up new pages more frequently. In that case, I feel freshness is very important. So if you have a subcategory landing page that rotates articles/products on a daily basis, then you can be sure that when you add a new article/product to that page, the search bots will pick it up more rapidly than from a page that does not have a dynamic portion to it.

But in regard to ranking, it's really hard to say. I have some static pages (built dynamically, but really static in nature) that rank very well too. I am unsure how much freshness plays into ranking for existing pages...
Old 08-26-2004   #7
seomike
Md_Rewrite Guru
 
Join Date: Jun 2004
Location: Dallas, Texas but forever a Floridian!
Posts: 627
I hear ya when it comes to server load. I know there is a meta tag that I've seen that tells spiders to revisit every 10, 15, 20... days. Has anyone ever used that, or can someone tell me if it works? LOL.
Old 08-26-2004   #8
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
I do not know of any search engines that support the revisit-after META tag. I think it's just as effective as the pagerank=10 tag.
Old 08-26-2004   #9
seobook
I'm blogging this
 
Join Date: Jun 2004
Location: we are Penn State!
Posts: 1,943
Quote:
Originally Posted by Mikkel deMib Svendsen
I do not know of any search engines that support the revisit-after META tag. I think it's just as effective as the pagerank=10 tag.
exactly, since everyone is using the pagerank=10 tag you now need to use pagerank=11 to get the same effect out of it
__________________
The SEO Book
Old 08-29-2004   #10
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
Yes, and if you want to be evil you can even use this trick:

<a href="YourCompetitor.com" PageRank="-1">Negative link</a>

Old 09-22-2004   #11
Nicky
Member
 
Join Date: Jul 2004
Location: England
Posts: 14
Ohhh....That sounds naughty
Old 11-16-2004   #12
Nacho
 
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Mikkel, this is such a great post! Gotta give it a bump <<

Here's another thread here at SEW related to yours: Let's discuss ROBOTS.TXT.