mod_rewrite and robots.txt


Jazajay
09-05-2007, 09:30 AM
Hi
I was wondering whether I should change all the URLs in my robots.txt to the new rewritten URLs, which I have set up through mod_rewrite.

So if my file, and robots.txt, say example.co.uk/cat1.php?sel=1

but I have rewritten it to

example.co.uk/cat/sel/1

should I put

example.co.uk/cat1.php?sel=1

or

example.co.uk/cat/sel/1

in my robots.txt? Or doesn't it matter?

Cheers

Jaza

beu
09-05-2007, 03:00 PM
You should redirect the old URLs to the new URLs via a 301 redirect. Disallowing URLs via robots.txt is not a good idea. Either way, there should be no problem with "duplicate content" if it's store items linked via multiple URLs.
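The mapping behind such a redirect can be sketched in Python. This is only an illustration of the logic, not a replacement for a mod_rewrite rule: the function name redirect_for is made up, and it assumes that /cat1.php?sel=N is the only old pattern and that N is always numeric, as in the example URLs above.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_for(url):
    """Return (301, new_path) for an old-style URL, or None if no redirect applies.

    Assumption: only /cat1.php?sel=<digits> is treated as an old URL.
    """
    parts = urlsplit(url)
    if parts.path != "/cat1.php":
        return None
    sel = parse_qs(parts.query).get("sel")
    if not sel or not sel[0].isdigit():
        return None
    # A 301 tells crawlers the move is permanent.
    return 301, f"/cat/sel/{sel[0]}"


print(redirect_for("http://example.co.uk/cat1.php?sel=1"))  # (301, '/cat/sel/1')
print(redirect_for("http://example.co.uk/cat/sel/1"))       # None
```

The new-style URL returns None so that a redirect never points back at itself.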

Jazajay
09-05-2007, 04:05 PM
Hi Beu
Cheers for the advice.

It's a new site with no backlinks at the moment, so a 301 will be unnecessary, right? Or am I wrong?

Secondly, I need to use robots.txt as I have a few products in different categories and I need to use it on those to avoid duplicate-content issues.

So I take it I need to use the new rewritten URL in the robots.txt?

beu
09-06-2007, 01:56 AM
No problem, Jazajay, happy to help! :)

In most cases a robots.txt is used to stop search engines from crawling (accessing) pages. If you want search engines to find your pages, you DO NOT want to disallow them via robots.txt.

At the same time you also DO NOT want to disallow URLs with 301 redirects, since a crawler that is blocked from a URL never sees the redirect on it.

If you have no backlinks you may not need a 301 redirect from the old URL to the new URL, but it might still be a good idea to consider. In your case it seems like the best idea if you are concerned about duplicate content.

I don't see duplicate content causing problems, because there is no really good way around it if you have product IDs and category IDs in your product URLs. (Even if you have one product with a category ID in the URL, you could remove the "&cat=1111" portion of the URL and still get the product. Search engines understand this issue and don't penalize "store items shown or linked via multiple distinct URLs.")

Google says (see second point bolded by me):
"Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:

* Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
* Store items shown or linked via multiple distinct URLs
* Printer-only versions of web pages"
http://www.google.com/support/webmasters/bin/answer.py?answer=66359

Hope that helps!

Jazajay
09-06-2007, 07:05 AM
Hi
Yeah, I know. The reason I want to block files is that, for example, an item in category 1 would be:

example.com/cat1/item1.php

However, due to the size of the catalogue, I need a few items to appear in more than one category, as this makes sense from a usability point of view, such as

example.com/cat8/item1.php

But the URL has been rewritten. In my robots.txt it still appears in the old form, as I wrote the file before the rewrite:

example.com/cat8/product.php?name=item1

So in my robots.txt, would I need to change it to the new rewritten URL, as in

Disallow: /cat8/item1.php

or keep it how it is? I don't want this one to be indexed, for the reason you gave.

Hope I have made it clearer what I'm trying to accomplish. Sorry, my fault.

Cheers again buddy.

beu
09-06-2007, 10:53 AM
Assuming that you will be using your robots.txt to disallow URLs, you would only want to list URLs therein that you don't want accessed by search engines.
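For the case discussed here, where one item is listed under several categories, the list of URLs to disallow could be generated instead of written by hand. The sketch below is hypothetical: the listings dictionary and the build_robots function are made up for illustration, and it assumes the first path of each item is the one that should stay crawlable.

```python
def build_robots(listings):
    """Build robots.txt text that disallows every path of an item except the first.

    Assumption: listings maps an item name to its URL paths, with the
    preferred (crawlable) path first.
    """
    lines = ["User-agent: *"]
    for item in sorted(listings):
        for path in listings[item][1:]:
            lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical data based on the example URLs in this thread.
listings = {"item1": ["/cat1/item1.php", "/cat8/item1.php"]}
print(build_robots(listings), end="")
# User-agent: *
# Disallow: /cat8/item1.php
```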

Jazajay
09-06-2007, 02:32 PM
Ok
But which one do I need to put in the robots.txt, as in:

example.co.uk/toys/toyname.php

which is the rewritten URL, or

example.co.uk/product.php?name=toyname

I have the second one in the robots.txt at the moment, but all my URLs appear as the first one in the address bar, since I have rewritten them. Do I need to change them to the new rewritten URL?

So instead of putting

example.co.uk/product.php?name=toyname (this is the one that is in my robots.txt at the moment)

do I need to change it in the robots.txt to

example.co.uk/toys/toyname.php

or doesn't it matter?

beu
09-07-2007, 12:28 AM
It matters. If you are sure you want to go this route, you should disallow the URLs that you DO NOT want crawled in your robots.txt.
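A Disallow rule is compared against the path of the URL a crawler actually requests, so a rule written for one form of the URL does not cover the other. A minimal Python sketch with the standard library's urllib.robotparser shows this. The helper name allowed is made up, and real crawlers may differ in matching details, so treat it as an illustration only.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, url):
    """Return True if a generic crawler may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", url)


# Rule written against the rewritten URL from this thread.
REWRITTEN_RULES = """\
User-agent: *
Disallow: /toys/toyname.php
"""

public_url = "http://example.co.uk/toys/toyname.php"
internal_url = "http://example.co.uk/product.php?name=toyname"

print(allowed(REWRITTEN_RULES, public_url))    # False: the rewritten URL is blocked
print(allowed(REWRITTEN_RULES, internal_url))  # True: the old form is not covered
```

If the old product.php URL can still be reached directly, it would need its own rule or a redirect, since the rule for the rewritten path does not match it.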

Jazajay
09-15-2007, 01:06 PM
Right, so just to clarify, I should block the rewritten ones?
As in

disallow: /toys/toyname.php

and not

disallow: /product.php?name=toyname