Robots: Allow: /*=site
Dj Morri
03-05-2009, 07:11 PM
Hi,
I would like to know whether the following lines in robots.txt will allow Googlebot to read any URL that includes the string =site, no matter where that string occurs in the URL.
User-agent: Googlebot
Allow: /*=site
What about Yahoo and MSN?
User-agent: *
Allow: /*=site
JohnW
03-05-2009, 09:29 PM
The default is to allow everything that is not explicitly disallowed. I don't believe the Allow: statement is relevant here, so I doubt it matters how you write it. The statement you cite basically says:
Allow: /*=site
Allow: /
Dj Morri
03-06-2009, 11:23 AM
Hi John,
We are blocking everything under our "buscar" folder (search results), but I want only our own internal search results to be indexed.
The only thing that distinguishes them from the other search results on our site is the word "site", so how can I specify that all URLs containing the word "site" under the "buscar" folder get indexed?
Thank you
JohnW
03-06-2009, 11:40 AM
DJ, the robots.txt you originally provided, by itself, is not going to accomplish anything at all.
User-agent: Googlebot
Allow: /*=site
If you want it to matter, it would need to look like this:
User-agent: Googlebot
Disallow: /
Allow: /*=site
The above now says to disallow everything except what is explicitly allowed.
If you do the same thing but with User-agent: * instead of Googlebot, don't expect it to work for you at Yahoo, MSN, etc. Last time I looked, Allow: was NOT really part of the robots.txt protocol and was supported only by Google.
So if I understand you correctly it could look something like this:
User-agent: Googlebot
Disallow: /
Allow: /*=site
As far as dealing with this for ALL search engines, perhaps using a dynamically inserted noindex meta tag on certain pages would be better. But you could go ahead and check whether other SEs have decided to support Allow: and, if so, how they use it.
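To illustrate the dynamically inserted noindex idea, here is a minimal Python sketch. It assumes the search pages live under /buscar/ and that internal searches carry "=site" somewhere in the query string, as described earlier in the thread. The function name robots_meta_for, the example.com URLs, and the "tipo" parameter are made up for illustration; adapt them to however your pages are actually generated.

```python
from urllib.parse import urlsplit

# Tag to emit in <head> for pages that should stay out of the index.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for(url: str) -> str:
    """Return a noindex meta tag for search pages that are not internal.

    Assumptions (hypothetical, taken from the thread's description):
    search results live under /buscar/ and internal searches contain
    "=site" in the query string.
    """
    parts = urlsplit(url)
    in_search_folder = parts.path.startswith("/buscar/")
    is_internal_search = "=site" in parts.query
    if in_search_folder and not is_internal_search:
        return NOINDEX_TAG
    return ""  # nothing to add; the page may be indexed


if __name__ == "__main__":
    for example in (
        "http://example.com/buscar/?q=zapatos&tipo=web",
        "http://example.com/buscar/?q=zapatos&tipo=site",
    ):
        print(example, "->", robots_meta_for(example) or "(no tag)")
```

Keep in mind that a crawler has to be able to fetch a page in order to see its meta tag, so this approach only works for pages that robots.txt does not block.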
Dj Morri
03-06-2009, 11:45 AM
Thanks John, I will try that string.
Thanks
Dj Morri
03-06-2009, 11:55 AM
This is the string that I am adding now:
User-agent: Googlebot
Disallow: /buscar/
Allow: /buscar/*=site
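For anyone who wants to sanity-check rules like these before deploying them, below is a rough Python sketch of wildcard matching as Google documents it: * matches any run of characters, a trailing $ anchors the end, the longest matching pattern wins, and Allow wins a tie. This is only an approximation written for this thread, not Googlebot's actual code, and other crawlers may resolve conflicts differently. The sample paths and the "tipo" parameter are hypothetical.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list) -> bool:
    """rules is a list of ("allow" | "disallow", pattern) pairs.

    Longest matching pattern wins; on equal length, allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if rule_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# The rules from the post above.
RULES = [("disallow", "/buscar/"), ("allow", "/buscar/*=site")]

if __name__ == "__main__":
    for path in ("/buscar/?q=foo&tipo=site", "/buscar/?q=foo&tipo=web", "/productos/1"):
        print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

Under these assumptions the Allow line wins for URLs containing =site because its pattern is longer than /buscar/, which is why the order of the two lines should not matter for a crawler that follows the longest-match rule.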