hfire
06-30-2004, 02:41 PM
BACKGROUND
I work for a company with a large DB of content that is featured in two websites (one for Canada and Germany, the other for the UK) with two different domain names (one .NET and one .COM). The .COM site has a good page rank and is the older of the two.
Currently, we are only allowing Google to index the .COM site and not the .NET. This is due to our fear that if Google indexes both, we will be punished for duplicating content. (Please note that although the .COM and .NET sites are in different languages, the content in the DB is not translated, so it appears the same way on both websites.)
QUESTIONS
-Are we correct in assuming that Google will punish us for duplicated content?
-How do we avoid being punished while at the same time making sure that both websites are indexed?
-Would using different sub-domains instead of different domain names solve our problem?
Thanks for the help,
Chris
David Wallace
06-30-2004, 03:00 PM
Google does not punish for duplicate content in my experience. They will just not display duplicate content.
I have, however, seen Inktomi penalize a client's web site for near-duplicate content, and that penalty still stands today under Yahoo's new search engine. They are the ones I'd watch out for.
donut
07-01-2004, 11:34 PM
There are tons and tons of duplicate content on the web. Usually, the engines just pick the "most important" version of that content and drop the rest.
Content in different languages is not duplicate at all, so you should go ahead and allow indexing of the other site.
On the pages that are the same, just be aware that only one or the other will probably show up, not both. And it may not be the one you want! You might put the pages that are completely duplicate (same language and everything) in a separate folder and just exclude that folder using robots.txt.
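For what it's worth, here is a minimal sketch of that robots.txt exclusion, checked with Python's standard-library robots parser. The folder name /duplicates/ and the example.net URLs are assumptions made up for illustration; they are not from the sites being discussed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the second site: every crawler is asked
# to stay out of the folder holding the fully duplicated pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /duplicates/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Pages inside the excluded folder are blocked; everything else stays open.
blocked = is_crawlable(ROBOTS_TXT, "http://www.example.net/duplicates/page.html")
allowed = is_crawlable(ROBOTS_TXT, "http://www.example.net/de/page.html")
```

Running a check like this before uploading the file is a cheap way to confirm that only the duplicate folder is excluded and the rest of the site remains open to indexing.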
Marcia
07-02-2004, 07:40 AM
It's not entirely impossible that I'm misunderstanding the original question, but if there are different languages involved you may want to go with separate country-specific domain names anyway and handle linking between and among them prudently.
Also, with an emphasis on local search expected in the not-so-distant future, it sure can't hurt to examine contributing factors as part of the strategy for sites with an element of language and/or geographical diversity.
Robert_Charlton
07-04-2004, 02:59 AM
...two websites (one for Canada and Germany, the other for the UK) with two different domain names (one .NET and one .COM). The .COM site has a good page rank and is the older of the two.
Chris - The grouping is curious and suggests that you are not separating by language... ie, that all or most of the content is in English. Thus, Google will see at least much of this as duplicate content.
-Are we correct in assuming that Google will punish us for duplicated content?
As David Wallace mentioned, Google, at least in the past, hasn't punished duplicated content, but it generally drops the pages with lower PageRank.
A couple of qualifications to this, though...
a) I've seen from time to time that, say, an authoritative article will rank well on pages on several different domains. When these pages do well independently in the serps, I assume it's because each has sufficient inbounds to establish it as an authority page on the subject, and there's enough different about the page title and introduction to establish some kind of difference.
b) I've also been seeing that Google has been dropping related sites that apparently target similar words or phrases. When this happens, the "dropped" site will not even rank for its own very specific company name.
In these cases, the content is not even duplicate content... it is somewhat similar content, with some overlapping language, that appears on both related sites.
In general, the sites I've observed with this problem are on the same class C IPs, though I've also seen them on consecutive IPs... and they have inbound links from shared pages (ie, the same page will link to several of the sites).
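A rough way to check whether two sites sit on the same "class C" in the sense used here (same first three octets, i.e. the same /24 block) is sketched below. The function name and the sample addresses are assumptions for illustration only; the addresses come from reserved documentation ranges, not from any real site.

```python
import ipaddress


def same_class_c(ip_a: str, ip_b: str) -> bool:
    """Return True if two IPv4 addresses share their first three octets (/24)."""
    # strict=False lets us build the /24 network from a host address.
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


# Hypothetical hosting addresses for two related sites.
shared = same_class_c("192.0.2.10", "192.0.2.200")     # same /24 block
separate = same_class_c("192.0.2.10", "198.51.100.7")  # different blocks
```

This only tells you whether the hosting looks related by IP; it says nothing about how any engine actually weighs that signal.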
Some of them have a fair number of inbound links from independent sources as well, but it's not clear how many independent inbounds are necessary to establish the domains as independent. Sometimes the sites are crosslinked, say from a company links page that might appear on each site... but sometimes there may be only one link between the sites.
In the cases where I've experienced problems with such related sites, they've been subsidiary brick and mortar companies managed by an umbrella company... they've been in existence longer than Google... and they have separate off-web identities.
This is a long answer about duplicate content... but these considerations may well affect the sites you're discussing. I think, under the circumstances I describe, the similarity threshold has gotten lower.
-How do we avoid being punished while at the same time making sure that both websites are indexed?
I'm still trying to sort out the above myself. If you must put up dupe content, which I would not do, I'd suggest that for now you keep the sites absolutely independent... ie, different hosts, different inbound links, and either no cross-linking or javascript cross-linking (and I dislike depending on js for navigation).
I say "for now" because I'm hoping that Google will sort this out. I think they're penalizing some innocent sites while they're zapping some interlinked spam networks.
Where you want both of your websites indexed for the same content, though, you may well be asking too much of an engine.
-Would using different sub-domains instead of different domain names solve our problem?
I don't know why it should if the content is duplicated, but I'm not sure. It might work if you had enough independent inbounds to each sub-domain.