View Full Version : In a Muddle


Gooner151078
09-21-2007, 03:32 PM
Site Redesign Question I'm afraid:

The new site will have promotional pages that will all be built for and focussed on SEO. This bit is fine. The difficulty comes with the transactional pages, which for various reasons must be split across:

ww1.example.com
ww2.example.com

These pages will have identical content. It is not necessary to have the transactional pages indexed by search engines, so the plan is to disallow them using robots.txt:

User-agent: *
Disallow: /transaction
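
For anyone wanting to sanity-check which URLs a rule like this covers, here is a minimal sketch using Python's standard urllib.robotparser. The URL paths below are made up for illustration and are not from the actual site. It only shows how a well-behaved crawler would read the rule, not what a search engine then does with links to the blocked pages.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above, held in a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /transaction
"""


def is_crawl_allowed(url, user_agent="*"):
    """Return True if the rules above let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rule.
    for url in (
        "http://ww1.example.com/transaction/step1",
        "http://ww2.example.com/promo/offers",
    ):
        print(url, is_crawl_allowed(url))
```

Note that Disallow is a prefix match, so any path that starts with /transaction is covered on whichever host serves this file.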

The simple questions are:

Are there any drawbacks to this solution?
Does anyone have any better ideas?

and more tricky,

If a customer creates a link to a page within the transaction, will this be credited as an inbound link? Will the duplicate content be exposed even though the spiders are disallowed on these pages via robots.txt?

Will the loss of internal links from the transactional pages be worth it for the sake of protecting against duplicate content issues? i.e. should I not have the robots.txt at all?

I think I have the idea correct, but wanted to gain feedback from the community.

Thanks

jimbeetle
09-21-2007, 05:40 PM
If you don't think the pages have any search value, the best bet might be slapping a meta robots noindex on them.
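
As a rough illustration of what that tag looks like and how to confirm a page carries it, here is a small sketch using Python's standard html.parser. The sample page is invented, and the class and function names are hypothetical helpers, not part of any SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Invented sample page standing in for a transactional page.
PAGE = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Checkout</body></html>"
)

if __name__ == "__main__":
    print(robots_directives(PAGE))
```

A crawler has to be able to fetch the page to see this tag, so it only has an effect on pages that are not also blocked in robots.txt.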

Gooner151078
09-25-2007, 07:19 AM
So if a user creates an inbound link to a page that has been disallowed (through robots.txt or meta robots) Google will not read the page, but will it credit you with an inbound link?

Jazajay
09-28-2007, 01:31 PM
No. If you block the page from being accessed by Google, it won't be in its index. If it's not in its index, it can't have any PR/equity, and any link pointing to it is useless in Google's eyes.

You will not get its equity.

Make this transition, then simply 301 both sites to one when you put them back together. That way any links to the blocked site will be passed over eventually.
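
A minimal sketch of that consolidation step, assuming the two hosts from the first post are eventually folded into a single host. The name www.example.com is a placeholder I have picked for illustration. In practice this mapping would normally live in the web server configuration; the function below just shows the logic.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the single host everything ends up on (placeholder name).
CANONICAL_HOST = "www.example.com"
# The two duplicate hosts from the first post.
DUPLICATE_HOSTS = {"ww1.example.com", "ww2.example.com"}


def permanent_redirect(url):
    """Return (301, target) for a URL on a duplicate host, else None."""
    parts = urlsplit(url)
    if parts.hostname not in DUPLICATE_HOSTS:
        return None
    # Keep scheme, path, and query; change only the host.
    target = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, target


if __name__ == "__main__":
    print(permanent_redirect("http://ww2.example.com/transaction/step1?id=7"))
    print(permanent_redirect("http://www.example.com/"))
```

Keeping the path and query intact means each old URL maps to its exact counterpart rather than to the home page.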

Gooner151078
10-11-2007, 03:50 AM
Ok. I have now figured out a solution and thought I would share it with the group.

Rather than using the robots.txt file, I plan to use a noindex meta robots tag on www.example2.com. Using this method, the page will not show in the SERPs and I do not believe I can be penalised for duplicate content.

However, if anyone creates a link to www.example2.com then all link equity will still be passed through this page despite the noindex tag. I also plan to use nofollow where appropriate.

Jazajay
10-16-2007, 07:07 AM
That should work.
There is one more solution that I can think of.

You could write some code to detect the SE and then 301 it to the correct category. The site visitor will get the page while the SE gets redirected to the other page.

However, this does have a few downsides:

1. It's cloaking, so you can be penalized if you are caught, although if you do it right you may not be. If your site is an authority you probably won't be penalized. The New York Times is the best example of this: all their content appears in the results but you have to sign in to view it. They don't get penalized, and I doubt wiki would either if they did it.

2. 301s take a while to pass equity.

3. You need to know what you are doing.

I would probably go with noindex though; it's a lot easier.

jimbeetle
10-16-2007, 12:45 PM
"No. If you block the page from being accessed by Google, it won't be in its index. If it's not in its index, it can't have any PR/equity, and any link pointing to it is useless in Google's eyes."

Not quite. From a recent Matt Cutts interview (http://www.stonetemple.com/articles/interview-matt-cutts.shtml):

"Now, robots.txt says you are not allowed to crawl a page, and Google therefore does not crawl pages that are forbidden in robots.txt. However, they can accrue PageRank, and they can be returned in our search results."

and...

"A NoIndex page can accumulate PageRank, because the links are still followed outwards from a NoIndex page."

Jazajay
10-16-2007, 10:23 PM
Interesting answer from Matt. I'll definitely be changing to noindex in the future after reading that.