mod_rewrite and robots.txt
Jazajay
09-05-2007, 09:30 AM
Hi
I was wondering whether I should change all the URLs in my robots.txt to the new URLs I have rewritten through mod_rewrite.
So if my file, and my robots.txt, say example.co.uk/cat1.php?sel=1
but I have rewritten it to
example.co.uk/cat/sel/1
should I put
example.co.uk/cat1.php?sel=1
or
example.co.uk/cat/sel/1
in my robots.txt? Or doesn't it matter?
Cheers
Jaza
You should redirect the old URLs to the new URLs via a 301 redirect. Disallowing URLs via robots.txt is not a good idea. Either way, there should be no problem with "duplicate content" if it's store items linked via multiple URLs.
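A sketch of what a 301 looks like to a crawler, assuming the single URL pair from the thread. On Apache this would normally be a mod_rewrite rule with the R=301 flag (plus a RewriteCond on %{QUERY_STRING}, since a RewriteRule pattern does not see the query string). The Python WSGI app below only illustrates the response the crawler receives; legacy_redirect_app is a hypothetical name.

```python
from urllib.parse import parse_qs, quote


def legacy_redirect_app(environ, start_response):
    """Minimal WSGI app that 301-redirects one hypothetical legacy URL.

    Maps /cat1.php?sel=N to /cat/sel/N (the example pair from the thread).
    Everything else gets a plain 404 in this sketch.
    """
    path = environ.get("PATH_INFO", "")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if path == "/cat1.php" and "sel" in query:
        # Permanent redirect: clients should switch to the new URL.
        location = "/cat/sel/" + quote(query["sel"][0], safe="")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


if __name__ == "__main__":
    captured = {}

    def fake_start_response(status, headers):
        # Record what the app would send back to the client.
        captured["status"] = status
        captured["headers"] = dict(headers)

    legacy_redirect_app(
        {"PATH_INFO": "/cat1.php", "QUERY_STRING": "sel=1"}, fake_start_response
    )
    print(captured["status"], captured["headers"]["Location"])
```

The crawler requests the old URL, gets the 301 plus a Location header, and follows it to the new URL. That is also why a redirected URL should not be disallowed: the crawler has to fetch it to see the redirect.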
Jazajay
09-05-2007, 04:05 PM
Hi Beu
Cheers for the advice.
It's a new site with no backlinks at the mo, so a 301 would be unnecessary, right? Or am I wrong?
Secondly, I need to use robots.txt because I have a few products in different categories, and I need to use it on those to avoid duplicate content issues.
So I take it I need to use the new rewritten URL in robots.txt?
No problem, Jazajay, happy to help! :)
In most cases robots.txt is used to stop search engines from crawling (accessing) pages. If you want search engines to find your pages, you DO NOT want to disallow them in robots.txt.
At the same time, you DO NOT want to disallow URLs that have 301 redirects, because a crawler has to request the old URL to see the redirect.
If you have no backlinks, you may not need a 301 redirect from the old URL to the new URL, but it is still worth considering. In your case it seems like the best idea if you are concerned about duplicate content.
I don't see duplicate content causing problems, because there is no really good way around it if you have product IDs and category IDs in your product URLs. (Even if you have one product with a category ID in the URL, you could remove the "&cat=1111" portion of the URL and still get the product. Search engines understand this and don't penalize "store items shown or linked via multiple distinct URLs.")
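To illustrate the "&cat=1111" point: two URLs that differ only in a category parameter reduce to the same product URL once that parameter is dropped. This is a minimal Python sketch; drop_category and the "id" parameter are hypothetical and not taken from any real store.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_category(url: str) -> str:
    """Return the URL with any 'cat' query parameter removed.

    'cat' mirrors the "&cat=1111" example; 'id' below is a hypothetical
    product parameter.
    """
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key != "cat"]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    with_cat = "http://example.com/product.php?id=42&cat=1111"
    without_cat = "http://example.com/product.php?id=42"
    print(drop_category(with_cat) == drop_category(without_cat))
```

Both URLs serve the same item, which is the "store items shown or linked via multiple distinct URLs" case.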
Google says (see the second point in particular):
"Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:
* Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
* Store items shown or linked via multiple distinct URLs
* Printer-only versions of web pages"
http://www.google.com/support/webmasters/bin/answer.py?answer=66359
Hope that helps!
Jazajay
09-06-2007, 07:05 AM
Hi
Yeah, I know. The reason I want to block some files is that, for example, an item in category 1 would be
example.com/cat1/item1.php
However, due to the size of the catalog I need a few items in more than one category, as this makes sense from a usability point of view, so the same item also appears as
example.com/cat8/item1.php
That URL has been rewritten, but in my robots.txt it still appears as follows, because I wrote the file before the rewrite:
example.com/cat8/product.php?name=item1
So in my robots.txt, do I need to change it to the new rewritten URL, as in
Disallow: /cat8/item1.php
or keep it as it is? I don't want this one to be indexed, for the reason you gave.
Hope I have made it clearer what I'm trying to accomplish; sorry, my fault.
Cheers again buddy.
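Roughly what an internal (non-redirecting) mod_rewrite rule does with the cat8 example above, sketched in Python under the assumption that public URLs look like /catN/name.php and map to product.php?name=name in the same folder. The real .htaccess rules aren't shown in the thread, so the pattern and internal_target are hypothetical. The substitution happens inside the server after the request arrives, so a crawler only ever sees the public path.

```python
import re

# Hypothetical pattern for public URLs such as /cat8/item1.php.
# product.php itself is excluded so the script is not rewritten again.
PUBLIC_URL = re.compile(r"^/(cat\d+)/(?!product\.php$)([A-Za-z0-9_-]+)\.php$")


def internal_target(public_path: str) -> str:
    """Map a public path to the script the server actually runs.

    The crawler never sees the returned value; it requests public_path.
    """
    match = PUBLIC_URL.match(public_path)
    if match is None:
        return public_path  # not a rewritten URL; served as-is
    category, name = match.groups()
    return f"/{category}/product.php?name={name}"


if __name__ == "__main__":
    print(internal_target("/cat8/item1.php"))
```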
Assuming that you will be using your robots.txt to disallow URLs, you would only want to list URLs therein that you don't want accessed by search engines.
Jazajay
09-06-2007, 02:32 PM
OK.
But which one do I need to put in robots.txt:
example.co.uk/toys/toyname.php
which is the rewritten URL, or
example.co.uk/product.php?name=toyname
I have the second one in robots.txt at the mo, but all my URLs appear as the first one in the address bar because I have rewritten them. Do I need to change them to the new rewritten URLs?
So instead of putting
example.co.uk/product.php?name=toyname (this is the one that is in my robots.txt at the mo)
do I need to change it in robots.txt to
example.co.uk/toys/toyname.php
or doesn't it matter?
It matters. If you are sure you want to go this route, you should disallow in your robots.txt the URLs that you DO NOT want crawled.
Jazajay
09-15-2007, 01:06 PM
Right, so just to clarify, I should block the rewritten ones?
As in
Disallow: /toys/toyname.php
and not
Disallow: /product.php?name=toyname
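A compliant crawler checks robots.txt against the URL it is about to request, which is the rewritten URL that appears in your links; the internal mod_rewrite target is only requested if that old-style URL is itself linked somewhere. A minimal sketch using Python's standard urllib.robotparser shows this; the bot name "ExampleBot" is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules are matched against the path of the URL the crawler requests.
# Only the rewritten (public) URL is listed here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /toys/toyname.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://example.co.uk/toys/toyname.php",          # public, rewritten URL
    "http://example.co.uk/product.php?name=toyname",  # internal rewrite target
):
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(url, "->", verdict)
```

The rewritten URL is blocked and the old-style URL is not. The old one is only "allowed" in the sense that no rule matches it; whether it is ever crawled depends on whether anything still links to it.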