need help with robots.txt for dynamic site


mphung
12-12-2006, 04:35 PM
I'm not well-versed in robots.txt so I'm hoping someone with more expertise can answer what might be a simple question.

I want to disallow crawling of pages with dynamic variables. One of our very large sites currently appends a lot of variables to the end of nearly all of its pages.

Example 1: domain.com/help/?variable1=x&variable2=y&variable3=&zipcode=&

where everything after and including the "?" is not necessary to view the page (i.e., those URLs are duplicates of it). The file domain.com/help/index.asp doesn't exist. The version of the page that needs to be indexed, and the only one that should be indexed, is domain.com/help/

So, will Disallow: /help/ still make the domain.com/help/ page itself available?

Example 2: domain.com/content.asp?pageid=123&variable1=&...

where pageid=123 is unique to the page, but everything after isn't. So I want domain.com/content.asp?pageid=123, ?pageid=234, ?pageid=345 to be indexed, but not anything beyond that. Is there a way to include the first variable+value but disallow subsequent variable strings with robots.txt?

Any and all input is greatly appreciated.

Thanks.

mphung
12-26-2006, 05:16 PM
Sorry for re-posting this. The question didn't really get any views in the subforum, so I'm hoping someone here who hadn't seen it yet might be able to help:

<edit>Please try not to duplicate post, it's usually best to simply re-post to your existing thread and ask again</edit>

evilgreenmonkey
12-27-2006, 07:11 AM
Example 1
The following should allow indexing of /help/ but not of /help/index.asp or any help URLs with variables:

User-agent: *
Allow: /help/
Disallow: /help/index.asp
Disallow: /help/*?

I would then use a 301 redirect sending /help/index.asp over to /help/. You can code the redirect to attach any query strings sent to the old URL onto the new.
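As a rough illustration of the redirect logic only (the site in question runs ASP, so this would need porting), here is a minimal Python sketch that works out the 301 target while carrying the query string across. The function name `redirect_target` and its default paths are hypothetical, chosen just for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url, old_path="/help/index.asp", new_path="/help/"):
    """Return the URL a 301 should point to, or None if no redirect applies.

    Hypothetical helper: old_path and new_path are assumptions taken from
    the example in this thread, not from any real configuration.
    """
    parts = urlsplit(url)
    # Compare case-insensitively, since IIS/ASP paths usually are.
    if parts.path.lower() != old_path:
        return None
    # Swap the path and keep scheme, host, query string and fragment as sent.
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )
```

For instance, `redirect_target("http://domain.com/help/index.asp?variable1=x")` yields `http://domain.com/help/?variable1=x`, which the `Disallow: /help/*?` rule would still keep out of the index.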

Example 2
This should allow URLs in the format of domain.com/content.asp?pageid=123, but not those with any other variables:

User-agent: *
Allow: /content.asp
Disallow: /content.asp*&

These examples have not been tested and should be checked using Google Webmaster Tools (http://www.google.com/webmasters/tools/siteoverview) for validation. Further information on how a robots.txt can be used with dynamic sites can be found here (http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling).
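Since the rules above are untested, a quick offline check can help before uploading them. The sketch below is an assumption-laden approximation, not Google's actual parser: it treats `*` as "any run of characters", a trailing `$` as an end anchor, and picks the longest matching pattern (with Allow winning ties), which is the precedence Google documents. A crawler that applies the first matching rule instead would treat `Allow: /help/` as permitting everything under /help/, and crawlers without wildcard support would ignore the `*` rules. The names `is_allowed`, `HELP_RULES` and `CONTENT_RULES` are made up for this example.

```python
import re


def _pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the URL end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules: list of (directive, pattern); path: URL path plus query string."""
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        # re.match anchors at the start, so patterns act as prefixes.
        if not _pattern_to_regex(pattern).match(path):
            continue
        is_allow = directive.lower() == "allow"
        # Longest pattern wins; on equal length, Allow beats Disallow.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


# The two rule sets suggested in this thread.
HELP_RULES = [
    ("Allow", "/help/"),
    ("Disallow", "/help/index.asp"),
    ("Disallow", "/help/*?"),
]
CONTENT_RULES = [
    ("Allow", "/content.asp"),
    ("Disallow", "/content.asp*&"),
]

if __name__ == "__main__":
    for rules, path in [
        (HELP_RULES, "/help/"),
        (HELP_RULES, "/help/?variable1=x&variable2=y"),
        (CONTENT_RULES, "/content.asp?pageid=123"),
        (CONTENT_RULES, "/content.asp?pageid=123&variable1="),
    ]:
        print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

Under those assumptions, /help/ and /content.asp?pageid=123 come out as allowed, while the versions with extra variables are blocked. The result is only as good as the assumed precedence, so the Webmaster Tools check is still worth doing.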

mphung
12-27-2006, 10:54 AM
Great - that's helpful. Thank you!