Let's discuss ROBOTS.TXT


Nacho
10-08-2004, 05:58 AM
I've noticed that some popular sites have robots.txt and some don't, for example:

DO HAVE
http://www.microsoft.com/robots.txt
http://www.google.com/robots.txt
http://www.cnn.com/robots.txt
http://www.adobe.com/robots.txt
http://www.apple.com/robots.txt
http://www.w3.org/robots.txt
http://www.dmoz.org/robots.txt
http://online.wsj.com/robots.txt
http://www.whitehouse.gov/robots.txt
etc.

DO NOT HAVE
http://www.yahoo.com/robots.txt
http://www.msn.com/robots.txt
... and I'm sure there are more.

There are two great sources about the robots.txt file that I know:

http://www.robotstxt.org/wc/robots.html
http://www.searchengineworld.com/robots/robots_tutorial.htm

One of our SEW Forums Moderators, Mikkel deMib Svendsen, gives great insight (http://forums.searchenginewatch.com/showthread.php?t=1289) into how Webmaster World refuses visits to its robots.txt (www.webmasterworld.com/robots.txt) so as not to lose bandwidth by letting every bot through, thereby filtering out those that are not really important.

So, how important is robots.txt, or better said, how useful do you think it is to the search engines? Are there any crawlers that do not respect robots.txt? Do you think it's not useful at all? Is it vital for gaining rankings on the search engines, or does it have no impact at all on the SERPs? What are your thoughts on robots.txt?

fathom
10-08-2004, 10:30 AM
Most prime engines obey it, and most of these are the ones you truly want spidering your site for indexing. Many BS bots ignore it, so for those a ban list in .htaccess is the better approach.
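
To make the ban-list idea concrete, here is a minimal sketch of the same logic in Python rather than .htaccess: refuse a request when its User-Agent header matches a deny list. The bot names and the function names `is_banned` and `status_for` are hypothetical placeholders, and this only illustrates the idea; it is not an actual Apache configuration.

```python
# Hypothetical deny list: these substrings are placeholders, not real bot names.
BANNED_AGENT_SUBSTRINGS = ("badbot", "emailharvester")


def is_banned(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a banned substring."""
    ua = (user_agent or "").lower()
    return any(fragment in ua for fragment in BANNED_AGENT_SUBSTRINGS)


def status_for(user_agent: str) -> int:
    """Return HTTP 403 for banned agents and 200 for everyone else."""
    return 403 if is_banned(user_agent) else 200
```

Unlike robots.txt, a check like this is enforced on the server, so it does not rely on the bot's cooperation. A bot that fakes its User-Agent will still get through, which is why ban lists often include IP ranges as well.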

Robots.txt is best used to keep all good bots out of sensitive areas and/or things like dup pages for different engines, PPC/PPI segregated landing pages, etc.
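
As a rough illustration of that use, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` and asks what a well-behaved bot would be allowed to fetch. The paths, the host `www.example.com`, and the bot name `ExampleBot` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all compliant bots out of an admin area
# and out of segregated PPC landing pages.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /landing/ppc/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A compliant crawler would ask these questions before requesting each URL.
print(parser.can_fetch("ExampleBot", "http://www.example.com/admin/login"))
print(parser.can_fetch("ExampleBot", "http://www.example.com/landing/ppc/offer.html"))
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))
```

Keep in mind that this check runs on the crawler's side. The file only states your wishes, and nothing stops a bot from skipping the check entirely.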