more clarification on robots.txt versus meta tag


mphung
01-05-2007, 02:50 PM
I've got a site with a lot of dynamic URLs that result in duplicate content being indexed. If I exclude these pages from spidering in my robots.txt file, but other people continue to link to the various versions, will they still be indexed, or will the spiders consult robots.txt first and know not to index them?

In other words, do I need to put noindex,nofollow on each duplicate page in addition to the robots.txt file, or will robots.txt alone do both (both = excluding these links based on a site crawl and also excluding them even though other sites are linking to them)?

Thanks.

jimbeetle
01-05-2007, 03:25 PM
If a page is disallowed in robots.txt and a bot finds links to it, there's a good chance that it will find its way into the index as a URL-only entry, that is, with no title or description.

If you use the meta robots noindex and disallow a page in robots.txt, then the page will not be fetched, the bot will not see the noindex entry, and the same situation as above might occur.
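To make that interaction concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt rules, the /catalog/ path, the example.com URLs, and the ExampleBot user agent are all made up for illustration; real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /catalog/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/
"""


def may_fetch(url, user_agent="ExampleBot"):
    """Return True if the hypothetical robots.txt lets this agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def bot_sees_noindex(url, page_has_noindex):
    """A well-behaved bot can only read a meta noindex on a page it may fetch."""
    return may_fetch(url) and page_has_noindex


if __name__ == "__main__":
    blocked = "http://www.example.com/catalog/item?id=1"
    print(may_fetch(blocked))               # fetch is disallowed
    print(bot_sees_noindex(blocked, True))  # so the noindex is never read
```

In this sketch the disallowed URL is never fetched, so any noindex on it goes unread, which is the situation described above.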

The only way to (almost) be positive that a page will not show is to use the meta robots element and not disallow it in robots.txt (so that the bot can see -- and hopefully obey -- the noindex).
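As a rough way to check that a page really carries the element, the sketch below scans HTML for a meta robots noindex using Python's standard html.parser. The class and function names and the sample markup are hypothetical, and this only tests for the tag's presence; it says nothing about whether a given search engine will honor it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex,nofollow".
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html):
    """Return True if the HTML contains a meta robots tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
    print(has_noindex(sample))
```

For this to matter, the page itself must stay fetchable, i.e. not be disallowed in robots.txt.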

(Be aware that a while back there was a hint from someone at Y! that even if there was a noindex on a page, Y! might still include it if there were so many links to it that it made it appear to be very important. Not sure if this is still operative.)