View Full Version : Duplicate Content...no worries?
Carlos Chacón
08-20-2005, 02:33 PM
All I know about duplicate content is this: the penalty is pervasive and applies to many sites, and it is applied when a large portion of the text on one page is replicated on another page. Result: the page with the lower PageRank gets the penalty.
I’ve seen some sites with different URLs "sharing" exactly the same content and ranking one above another… on the same results page!? :confused:
How can we explain this?
Any help will be appreciated.
Mikkel deMib Svendsen
08-20-2005, 04:12 PM
All the major engines are trying to get rid of duplicate content (more or less aggressively) but it is just not as easy as a lot of people think.
With billions of documents - some of them updated on a daily or weekly basis - it takes an incredible amount of resources just to begin checking every single document for possible duplicates. With 9 billion documents (and Yahoo now claiming 19.2 billion) it would basically take the same number of "searches" to find any duplicate pages in the index. That's 9 (or 19.2) billion searches! That's a lot even for Google or Yahoo. And that's just for exact duplicate page checks.
What about pages that are in fact duplicates but have some dynamic elements, such as a time stamp, RSS feeds (inclusion) or advertising, so that two versions of the page would actually not be 100% identical? Writing an efficient algorithm for proper duplicate identification that takes this (and a lot of other similar issues) into account is definitely not easy - and once you have it you need to apply it to the billions of searches mentioned before to find the duplicates.
And to be perfect, all this has to be done at the same speed as the index changes ...!
The search engines are not perfect. They don't want duplicate content, in general, but fail to remove it all. They don't want spam either but some will always slip through. That just seems to be the nature of the game.
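To make the "not 100% identical" case concrete, here is a minimal sketch of one common textbook approach to near-duplicate comparison: split each page into overlapping word "shingles" and measure how much the two sets overlap (Jaccard similarity). This is only an illustration. Nothing in this thread says any engine works this way, and the function names, the shingle size of 4 and the sample text are all assumptions made for the example.

```python
import re


def shingles(text, k=4):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def resemblance(text_a, text_b, k=4):
    """Similarity between 0.0 and 1.0 based on shared k-word shingles."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


# Hypothetical example: the same body text with two different time stamps.
body = ("All the major engines are trying to get rid of duplicate content "
        "but it is just not as easy as a lot of people think")
page_a = body + " Updated 08-20-2005 02:33 PM"
page_b = body + " Updated 08-21-2005 07:15 AM"
score = resemblance(page_a, page_b)
```

Two pages that differ only in a time stamp score well below 1.0 but far above unrelated pages, so any filter built on this idea needs a threshold, and it still has to compare every page against every other candidate, which is the resource problem described above.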
Carlos Chacón
08-20-2005, 08:48 PM
All the major engines are trying to get rid of duplicate content (more or less aggressively) but it is just not as easy as a lot of people think...
...The search engines are not perfect. They don't want duplicate content, in general, but fail to remove it all. They don't want spam either but some will always slip through. That just seems to be the nature of the game.
Yes, I agree with this, no doubt!
But it is a shame how the search engine experience sometimes turns out to be boring and frustrating.
Most of the people who participate in these forums probably know more about this and many other SE-related things, but some people don’t. I believe search engine users will learn, little by little, how the SE algorithms work and how to choose the best options online.
glengara
08-21-2005, 02:18 AM
It may also depend on the type of dup content, and on whether there's any "affiliate" connection. For an example, take a look at "SEO school" on G.
ohcho
08-21-2005, 06:58 AM
Search engines are extremely stupid fellows that have no capability to differentiate two pages other than by literal comparison. I spend significant time making SEs pick the correct pages! Unless you make file copies or copy and paste, duplicate is not much to worry about!
Mikkel deMib Svendsen
08-21-2005, 07:15 AM
I don't think this is a question of stupidity on the engines' side but rather one of balancing the amount of resources spent on one thing against the real quality improvements it brings. It's the same game with many other search-related issues.
Unless you make file copies or copy and paste, duplicate is not much to worry about!
That is definitely not my experience. Most of the large dynamic websites and large organisations I deal with have significant duplicate problems and very often it gives them serious headaches with search engines - and it has nothing to do with making copies of the content. The problem is multiple domains, crazy navigation schemes, session IDs and many other technical/architecture related issues.
The fact is that you can often get away with many forms of close or identical duplicate content because of the things I mentioned above (and others) but you just never know when and how you'll get hit by a duplicate filter, and when you do, it hurts.
So, I don't think it is in general a very good recommendation to just say that you should not worry too much about duplicate content - deal with it! Don't let the engines fix your problems because they most likely won't fix them the way you would. But then again, that's just me; I like to be in control of things :)
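As an illustration of "dealing with it" on the technical side, here is a minimal sketch of URL normalisation for two of the causes named above, session IDs and multiple domains. It assumes the session ID travels in a query parameter and that one host is chosen as the preferred one. The parameter names in SESSION_PARAMS, the function name and the example.com hosts are hypothetical, not taken from any particular site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names; a real site would list its own.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid", "jsessionid"}


def canonical_url(url, canonical_host=None):
    """Return url without session parameters and, optionally, on one preferred host."""
    parts = urlsplit(url)
    host = (canonical_host or parts.netloc).lower()
    # Keep every query parameter except the session identifiers.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in SESSION_PARAMS]
    query.sort()  # the same parameters in a different order give the same URL
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


# Hypothetical example: two addresses that serve the same page.
first = canonical_url("http://Example.com/page.php?id=7&PHPSESSID=abc123")
second = canonical_url("http://www.example.net/page.php?PHPSESSID=zzz&id=7",
                       canonical_host="example.com")
```

Both calls yield the same address, so a site could use such a function to decide which single URL to link to internally or to redirect the variants to, rather than leaving that choice to the engines.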
ohcho
08-21-2005, 08:29 AM
Actually, we are discussing a somewhat different aspect of duplicate problems from the original poster's. I might have a different interpretation of the meaning of "penalty" and "duplicates".
If you are worried about being "penalized" by SEs, then you can forget about it. It's because SEs are too dumb to identify duplicates.
If you want to optimize multiple pages, then this is a big problem. For example, say you have one page optimized for "A" and another for "B". It is quite common for a document to contain the other keywords as well, so when you search you often get the wrong page! In that case, you have to avoid using the other keywords in each document. One way to do this is to break the keywords with <WBR> into two or more pieces that mean nothing. I do quite a lot of this, due to the stupidity of search engines.
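The <WBR> trick described above can be sketched as a small text substitution. This is only an illustration of the mechanics: whether search engines really treat the two halves as separate, meaningless tokens is the poster's claim and is not verified here. The function name and the sample sentence are hypothetical, and the naive replacement would also touch matches inside tag attributes, so it suits plain body text only.

```python
import re


def break_keyword(html_text, keyword):
    """Insert <wbr> into the middle of every occurrence of keyword (case-insensitive)."""
    if len(keyword) < 2:
        return html_text  # nothing sensible to split
    middle = len(keyword) // 2
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)

    def split_match(match):
        word = match.group(0)  # keeps the original capitalisation
        return word[:middle] + "<wbr>" + word[middle:]

    return pattern.sub(split_match, html_text)


# Hypothetical example: hide the keyword "hotels" on a page optimized for something else.
broken = break_keyword("Cheap hotels and Hotels nearby", "hotels")
```

A browser shows the word unchanged, since <wbr> only marks an optional line-break point.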
Mikkel deMib Svendsen
08-21-2005, 10:04 AM
If you are worried about being "penalized" by SEs, then you can forget about it. It's because SEs are too dumb to identify duplicates.
If you want to believe that everything has to do with the engines just being stupid or dumb then go right ahead - I am just more interested in finding the real reasons for this, and other issues, so I can deal with them. Yelling "dumb" at the engines won't do you much good :)
Carlos Chacón
08-21-2005, 12:40 PM
Actually, we are discussing a somewhat different aspect of duplicate problems from the original poster's. I might have a different interpretation of the meaning of "penalty" and "duplicates".
My original comment was about the fact that some SEs show different pages with the same content ranking one above the other, or vice versa.
I know it is probably getting harder every day to “penalize” those pages because of the quantity coming up.
That’s why I think there are 2 parts here:
a) Those who write the content (Original)
b) Those who just copy & paste it (From a)
I consider myself part of the “A” team. The question is how much to worry about something that will probably never stop. I trust the people behind the SEs to optimize them and improve their speed… some day.
Andy AtkinsKruger
08-21-2005, 01:55 PM
Also hailing from the 'search engines are not stupid' school - duplication is a problem for the search engines and also for sites that have different pages targeting different countries.
However, in my experience, when the search engine is not fooled by something else going on (see Mikkel's earlier post), the search engines will plump for the result they think is best - not reject the content altogether.
So where duplication is most tricky is actually where your content is also on someone else's site - either because it was sold or some other reason.
The answer - don't use content which is duplicated on someone else's site or don't let them take your content if it's that valuable. Odds are, you'll lose.
ohcho
08-21-2005, 05:53 PM
I consider myself part of the “A” team. The question is how much to worry about something that will probably never stop. I trust the people behind the SEs to optimize them and improve their speed… some day.
The use of "penalize" and "duplicates" is confusing. Generally, you don't get "penalized" by SEs, but you will get "undesirable" SERPs. Once you settle on page topics (hence keywords), write each page around its topic. If some keywords confuse the SEs, break them into unrecognizable pieces. I use <WBR>. I found it's not pretty in the SERPs, but it helps SEs identify your more relevant pages.
Unless you do extreme SEO without real content, you don't get "penalized". Don't worry about being penalized. Just focus on writing good quality pages that people will like and bookmark.
Carlos Chacón
08-21-2005, 08:54 PM
... Just focus on writing good quality pages that people will like and bookmark.
Exactly! That’s right!
Quality pages will make our site more relevant than others. People who just copy & paste the info from our site will get penalized some day soon... I hope.
Carlos Chacón
08-21-2005, 08:57 PM
...So where duplication is most tricky is actually where your content is also on someone else's site - either because it was sold or some other reason.
The answer - don't use content which is duplicated on someone else's site or don't let them take your content if it's that valuable. Odds are, you'll lose.
Yeah, no doubt!
But how do you think we can keep our content safe from the "others"? Is there any "All Rights Reserved" tool? :confused:
ohcho
08-21-2005, 09:57 PM
People who just copy & paste the info from our site will get penalized some day soon... I hope.
Why would anyone copy your pages? Note that SEs do not "penalize" for copying others! So you have to write good quality content and optimize it so that it appears ahead of the others. Look, I am very curious about the content you are writing.
Carlos Chacón
08-21-2005, 11:07 PM
Why would anyone copy your pages?
Well, maybe because they do not have good skills or knowledge about the topic they promote on their web sites. I’m not an excellent writer but I’ve seen some sites with a copy of the content that I wrote for some portals ... ranking above or below my web site on the same results page! :mad:
The SE will have to do something sooner or later... no doubt!
ohcho
08-21-2005, 11:12 PM
Then this is more to do with copyright infringement. You should see a lawyer then.
Mikkel deMib Svendsen
08-22-2005, 02:16 AM
I have no idea why you keep stating that duplicate content is not a problem - it is, and that's a fact! If you don't want to deal with it, fine, but it is just not right to say that it is not a problem.
ohcho
08-22-2005, 02:40 AM
What Carlos is facing is not a problem of duplicates. It's a problem of plagiarism. Someone copied his intellectual property, which is his writing, and is using it to steal his visitors. If this happens between major businesses, lawyers will be battling each other. For small businesses, it's not practical to register copyright for everything, which leaves loopholes like this. What I can suggest is to develop new content that is better and better ranked and, more importantly, to get it copyrighted. Then you will have legal protection.
I am aware that many wrong-headed people do copy (or clone) others' web sites. You can see some cloning projects on the following web site:
www.getafreelancer.com
Carlos Chacón
08-22-2005, 07:43 PM
What Carlos is facing is not a problem of duplicates. It's a problem of plagiarism.
Honestly, I think duplicate content = plagiarized content.
It doesn’t matter where the content comes from; it is still a simple copy/paste from another website. And this is a problem especially for people who write articles and edit content for one web site… and later find it on another one! :eek:
ohcho
08-22-2005, 10:10 PM
Don't you have copyright statements in your web-pages?
Add a note to your original content saying that the other site copied your writing. If SEs find that your site is a duplicate of the other, they may think both are related and penalize both sites!
esoos
08-22-2005, 10:23 PM
Send the sites that are duplicating your content an email. Give them two choices.
1 - Link back to your site with the anchor text of your choosing.
2 - Take down the content.
Let them know they have 1 week to comply with your request before you file a DMCA complaint with each of the major search engines. They'll comply right quick.
That's the hard-ass approach. In reality, you'll probably want to be a bit more diplomatic, as there's no point in creating unnecessary enemies. You'll catch more flies with honey than vinegar. But copyright law puts a huge amount of power in your hands, if you know how to use it.
Andy AtkinsKruger
08-23-2005, 06:56 AM
Just for the record - NOT ALL duplicate content is plagiarism. It's often owned by the same company, which is promoting a presence in several countries. There are also organisations that give away or sell content for others to use (for example hotels), which then appears - legally - on many sites.
And then there are RSS feeds where you deliberately give away your content.
In my experience, duplicate content is often an internal problem - not a legal one. And often clients hadn't even realised they'd duplicated it in...
In order to protect your content - make sure you have a copyright statement of some sort on your site - and take action quickly when copying occurs. Mostly it'll get fixed without the need for lawyers - but if all else fails....
Mikkel deMib Svendsen
08-23-2005, 09:29 AM
make sure you have a copyright statement of some sort on your site
Actually, there is no longer any legal requirement for having a copyright statement but I guess it doesn't exactly hurt to have it - and you'll still have the copyright either way :)
Andy AtkinsKruger
08-23-2005, 09:45 AM
Absolutely right Mikkel - but it makes people aware that you're concerned about copyright - and you can say in there that you take action to protect your copyright. That will, of itself, prevent some of the abuse :)
Carlos Chacón
08-23-2005, 04:52 PM
Send the sites that are duplicating your content an email. Give them two choices...
Thanks esoos,
I’ll send an email to these people and hope they understand the problem with duplicate content & the SEs.
I know it is hard to enforce a copyright statement legally on the Internet, but at least we can fight for our rights.
Working in a small country without people knowledgeable about SEO & SEM is harder than working with professional marketers. :(
Carlos Chacón
08-23-2005, 04:56 PM
...In order to protect your content - make sure you have a copyright statement of some sort on your site - and take action quickly when copying occurs. Mostly it'll get fixed without the need for lawyers - but if all else fails....
Hi Andy, thanks for your comments. I have a question:
What would be the right text for a copyright statement?
Maybe something like: "© 2005 My site. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed."
:confused:
esoos
08-23-2005, 06:00 PM
We're really talking about two separate issues here. I'll just discuss the copyright side.
That copyright statement looks fine to me, but the protection it gives you is somewhat limited. Registering your copyright with the U.S. Copyright Office will give you much more comprehensive protection, should you have to go to court, including the ability to sue for attorney's fees plus statutory damages of up to $150,000 per infringement.
Protection of content under U.S. copyright law is not limited to U.S. citizens. From the U.S. Copyright office web site:
Copyright protection is available for all unpublished works, regardless of the nationality or domicile of the author.
Published works are eligible for copyright protection in the United States if any one of the following conditions is met:
On the date of first publication, one or more of the authors is a national or domiciliary of the United States, or is a national, domiciliary, or sovereign authority of a treaty party,* or is a stateless person wherever that person may be domiciled;
[...]
A treaty party is defined as "a country or intergovernmental organization other than the United States that is a party to an international agreement"; in this case the Berne Convention (http://en.wikipedia.org/wiki/Berne_Convention_for_the_Protection_of_Literary_and_Artistic_Works) should apply.
You can register your whole site for $30 USD. It's well worth it to have that extra firepower, should you need it, though the DMCA is usually enough to get most infringers quaking in their boots.
Of course, I'm not a lawyer, so don't consider this qualified legal advice. You may want to check out the GigaLaw Guide to Internet Law for details.
ohcho
08-23-2005, 06:25 PM
In any country, copyrighting costs very little. So if you want to protect your content, copyrighting is well worth the money. That's why I recommend having it copyrighted. When someone infringes on it, see your lawyer to get compensation for the theft of your intellectual property and business. This is what I recommended in my previous posting.
Carlos Chacón
08-23-2005, 09:36 PM
Thanks to ohcho, esoos, Andy & Mikkel for your comments and suggestions.
I really appreciate that.
As soon as I find a lawyer with some knowledge about this here (in my country), I’ll start working on it.
:)
Marcia
08-23-2005, 09:53 PM
A personal issue with plagiarism and copyright infringement is up to the individual to deal with - Google doesn't get into that beyond the individual filing a DMCA.
Dealing with the duplicate content issue is a different matter, and it's one that's generally handled by the algo or by running filters.
Papadoc
08-25-2005, 10:20 AM
In my experience, Google's approach is somewhat different from what has been stated. If they've already indexed the article, it doesn't matter who has the higher PR. Seriously, would that make any sense? Why would they give credit for the article to whoever was most popular? If that were the case, then anyone with a new site who submitted their articles for inclusion in the free content sites would automatically be giving away their search engine rankings.
SEs also have another issue to deal with when assigning duplicate content penalties. In many cases, journalists will copy a section of an article they are referring to under the "fair use" clause of copyright in order to comment on it. This is totally legal so long as it is either a summary or part of a much larger article. Passing out dup penalties for something like this doesn't make sense. If a politician makes a statement and the NYT reprints part of it and comments on it, that's valid content.