Nacho
10-08-2004, 05:58 AM
I've noticed that some popular sites have robots.txt and some don't, for example:
DO HAVE
http://www.microsoft.com/robots.txt
http://www.google.com/robots.txt
http://www.cnn.com/robots.txt
http://www.adobe.com/robots.txt
http://www.apple.com/robots.txt
http://www.w3.org/robots.txt
http://www.dmoz.org/robots.txt
http://online.wsj.com/robots.txt
http://www.whitehouse.gov/robots.txt
etc.
DO NOT HAVE
http://www.yahoo.com/robots.txt
http://www.msn.com/robots.txt
... and I'm sure there are more.
There are two great sources about the robots.txt file that I know:
http://www.robotstxt.org/wc/robots.html
http://www.searchengineworld.com/robots/robots_tutorial.htm
One of our SEW Forums Moderators, Mikkel deMib Svendsen, gives great insight (http://forums.searchenginewatch.com/showthread.php?t=1289) into how Webmaster World uses its robots.txt (www.webmasterworld.com/robots.txt) to refuse visits from most bots, so that it doesn't lose bandwidth by letting every bot through. That way it filters out the bots that aren't really important.
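To illustrate the idea of letting only some bots through: a well-behaved crawler downloads robots.txt and checks it itself before requesting pages. The sketch below uses Python's standard-library urllib.robotparser on a made-up robots.txt that admits one named crawler and disallows everyone else. The file contents, the example.com URL and the "SomeOtherBot" name are hypothetical. This is not Webmaster World's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an empty Disallow lets Googlebot fetch everything,
# while the catch-all "*" record blocks every other user agent from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def make_parser(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    url = "http://www.example.com/forum/thread.htm"  # hypothetical page
    for agent in ("Googlebot", "SomeOtherBot"):
        # A polite crawler would skip the request whenever can_fetch() is False.
        print(agent, rp.can_fetch(agent, url))
```

Note that the check runs on the crawler's side. Nothing on the server enforces it, so a bot that never performs a check like this simply ignores the file.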
So, how important is robots.txt, or better said, how useful do you think it is to the search engines? Are there any crawlers that don't respect robots.txt? Do you think it's not useful at all? Is it vital for gaining rankings on the search engines, or does it have no impact at all on the SERPs? What are your thoughts on robots.txt?