Robots: Allow: /*=site


Dj Morri
03-05-2009, 07:11 PM
Hi,

I would like to know whether the following lines in robots.txt will allow Googlebot to read any URL that includes the string =site, no matter where that string occurs in the URL?

User-agent: Googlebot
Allow: /*=site

What about Yahoo and MSN?

User-agent: *
Allow: /*=site

JohnW
03-05-2009, 09:29 PM
The default is to allow everything that is not explicitly disallowed. I don't believe the Allow: statement is relevant here, so I doubt it matters how you write it. The statement you cite basically says:

Allow: /*=site
Allow: /

Dj Morri
03-06-2009, 11:23 AM
Hi John,

We are blocking everything under our "buscar" folder, which holds search results, but I want to allow only our own internal search results to be indexed.

The only thing that distinguishes them from the other search results on our site is the word "site", so how can I specify that all URLs under the "buscar" folder that contain the word "site" get indexed?

Thank you

JohnW
03-06-2009, 11:40 AM
DJ, the robots.txt you originally provided, by itself, is not going to accomplish anything at all.

User-agent: Googlebot
Allow: /*=site

If you want it to matter, it would need to look like this:

User-agent: Googlebot
Disallow: /
Allow: /*=site

The above is now saying to disallow everything except for what is allowed.

If you do the same thing with User-agent: * instead of Googlebot, don't expect it to work for you at Yahoo, MSN, etc. Last time I looked, the Allow: statement was NOT really part of the robots.txt protocol and was something only Google supported.

So if I understand you correctly it could look something like this:

User-agent: Googlebot
Disallow: /
Allow: /*=site

As far as dealing with this for ALL search engines, perhaps using a dynamically inserted noindex meta tag on certain pages would be better. But you could go ahead and check whether other SEs have decided to support Allow: and, if so, how they use it.
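
The noindex route could be sketched roughly as below, in Python. The function name robots_meta_tag, the /buscar/ prefix check, the test for =site in the query string, and the example.com URLs are all assumptions for illustration; the real test depends on how your search URLs are built. Keep in mind that a crawler has to be able to fetch a page to see its meta tag, so pages handled this way must not also be blocked in robots.txt.

```python
from urllib.parse import urlsplit

def robots_meta_tag(url):
    """Return a noindex meta tag for search pages that should stay out of the index, else ''."""
    parts = urlsplit(url)
    # Assumption: every search-results page lives under /buscar/.
    is_search_page = parts.path.startswith("/buscar/")
    # Assumption: internal search results carry "=site" in the query string.
    is_internal_search = "=site" in parts.query
    if is_search_page and not is_internal_search:
        return '<meta name="robots" content="noindex">'
    return ""

# Hypothetical URLs, for illustration only.
print(robots_meta_tag("http://www.example.com/buscar/?q=hotel"))
print(robots_meta_tag("http://www.example.com/buscar/?q=hotel&tipo=site"))
```

The first call prints the meta tag and the second prints an empty line; the returned string would be written into the page's head section by whatever template generates the page.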

Dj Morri
03-06-2009, 11:45 AM
Thanks John, I will try that string.

Thanks

Dj Morri
03-06-2009, 11:55 AM
This is the string that I am adding now:

User-agent: Googlebot
Disallow: /buscar/
Allow: /buscar/*=site
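
For anyone wanting to sanity-check rules like these before deploying them, here is a minimal Python sketch of Google-style matching as Google documents it: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. The names rule_to_regex and is_allowed and the sample URLs are made up for illustration, and this is not Googlebot's actual code. As far as I know, Python's standard urllib.robotparser does not handle * wildcards, hence the hand-rolled matcher.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """rules: list of (directive, pattern) pairs, e.g. ("Disallow", "/buscar/")."""
    best = None  # (pattern length, is_allow) of the strongest matching rule so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start of the path, like robots.txt prefix matching.
        if rule_to_regex(pattern).match(url_path):
            candidate = (len(pattern), directive.lower() == "allow")
            # Longer pattern wins; on equal length, Allow (True) beats Disallow (False).
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the URL is allowed by default.
    return True if best is None else best[1]

rules = [("Disallow", "/buscar/"), ("Allow", "/buscar/*=site")]
print(is_allowed(rules, "/buscar/?q=hotel&tipo=site"))  # True
print(is_allowed(rules, "/buscar/?q=hotel"))            # False
print(is_allowed(rules, "/productos/lista"))            # True
```

Under these matching rules the order of the Disallow and Allow lines does not matter; other crawlers may resolve conflicts differently, so this only models the behaviour described for Googlebot.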