Duplicate content issues
CaseyC
11-03-2005, 06:03 PM
I am a little confused about the issue of duplicate content. Our company makes a consistent practice of writing unique articles about our business and industry and then submitting them to places like PRweb and other article syndication websites. However, we also post the same stories on our own websites to try to build out our content and capture different keywords. By doing this, are we risking a duplicate content penalty if other websites post our stories on their websites? Is there a smarter way to do this?
Thanks
bhartzer
11-03-2005, 06:15 PM
are we risking getting penalized for duplicate content
No, you're not. Just make sure that you post the article on your website first--and get it crawled faster than the other sites.
Google keeps the first one it crawls and then considers all the others to be duplicates. The most important factor here is getting the content crawled first.
bsaric
11-03-2005, 06:42 PM
My advice is to write the best possible articles and publish them on your site, and then submit shorter, attractive versions of these articles to other sites.
bhartzer
11-03-2005, 06:47 PM
There's really no need to change the articles--just make sure they're crawled on your site first.
orion
11-03-2005, 08:13 PM
Bhartzer is right. There is no need to overreact to duplicate content stories, as not all dupe content is viewed as bad content or as spam. Syndication of news and articles is one such case.
There is a difference between duplicates, near duplicates and similar content. This was discussed in a Google patent I reviewed at SES San Jose. For users, the notion of duplicated content depends on what type of information they are searching for. This is why the Google dupe filter is called a query-dependent filter.
Unlike dupe filters from other search engines, their filter defines dupes according to the relevancy of specific portions of linearized text in the document. This means that the entire content of a document is not required to assess whether or not two docs are dupes.
There is also the case of content-in-content. This is identical content placed in several other documents. The mere fact that this content is replicated does not work as a penalty for the host documents. In the case of Google, whether it triggers the shingles method used for detecting dupes again depends on the query.
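To give a feel for what a shingles comparison does, here is a minimal Python sketch. It is the generic textbook version (sets of overlapping word windows compared by Jaccard similarity), not Google's query-dependent filter; the function names and the window size of 4 words are assumptions chosen for illustration.

```python
def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < w:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

Two identical articles score 1.0, and unrelated articles score close to 0.0. A score near 1.0 only says that the text overlaps heavily; what a search engine does with that score is a separate question.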
Orion
CaseyC
11-04-2005, 11:34 AM
This is very helpful. So I guess moving forward we will put our articles on our website a week or so in advance to make sure they get crawled first, and then distribute them.
How does dupe content work when two sites are on the same server and have some of the same articles? We operate a couple of websites that are hosted on the same server. They are not the same content through and through, but some of the articles we write are relevant to each site, so we post them to both websites. Is this a bad idea?
orion
11-04-2005, 11:40 PM
In recent patents, Google's approach to filtering results is driven by user queries. Although a published patent is not necessarily implemented, it is a good idea to understand such patents.
In this regard, the patents describe query-dependent filters. A query first produces an initial answer set of relevant, ranked results. Filters are then applied to this set to obtain a final answer set intended to be viewed by the end user.
One such filter is the query-sensitive filter for dupes. Another can be found in the so-called "local rank". Here the initial answer set for a query is checked to see whether docs belonging to the same IP address are present in that set. If they are, the least relevant ones are removed from the final answer set. At least that is how the patent goes.
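As a rough illustration of the same-IP filter just described, here is a small Python sketch. It only mirrors the description in this post, not the patent text or any real search engine code; the `(url, ip, score)` tuple layout and the function name are hypothetical.

```python
def filter_same_ip(results):
    """Keep only the highest-scoring result for each IP address.

    `results` is a list of (url, ip, score) tuples (a hypothetical shape).
    Returns the surviving results ordered from most to least relevant.
    """
    best = {}
    for url, ip, score in results:
        # Keep a result if it is the first seen for its IP, or beats the current best.
        if ip not in best or score > best[ip][2]:
            best[ip] = (url, ip, score)
    return sorted(best.values(), key=lambda r: r[2], reverse=True)
```

Under this reading, if two sites on one server both rank for the same query with the same article, only the more relevant one would be shown for that query. Neither site is penalized in general.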
There are other embodiments associated with the re-ranking of results patent, based on link citations from within docs in that set.
More information on this re-ranking algorithm is here: Ranking search results by reranking the results based on local inter-connectivity (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=6,725,259.WKU.&OS=PN/6,725,259&RS=PN/6,725,259)
Orion
CaseyC
11-07-2005, 02:21 PM
I am sure you answered my question in that post, but that went way over my head and consequently I have no idea what that was about.