Search Engine Watch
Old 01-16-2006   #1
Jenstar
 
Join Date: Jun 2004
Location: Starbucks!
Posts: 345
Tracking Down Stolen / Duplicate Content Through Google

Last year I wrote an article here, "What to do when someone steals your original content." I am often asked how to tell whether someone has swiped your content and is hurting your rankings. Here are some ways to figure that out.

First, take an article you have written, and look at the paragraphs that are in the second half of the article. Then take a sentence from the second half of one of those paragraphs. Then, plug it into Google, being sure to put quotes around it. You need to put the quotes around it because then Google will search for the exact phrase.
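For anyone who would rather script this step, here is a minimal Python sketch of it. It assumes paragraphs are separated by blank lines and that sentences end in ., ! or ?; the function names and the google.com/search?q= URL form are illustrative choices, not part of the original post.

```python
import re
from urllib.parse import quote_plus


def pick_check_sentence(article: str) -> str:
    """Pick a sentence from the second half of a paragraph
    that sits in the second half of the article."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    if not paragraphs:
        raise ValueError("article has no text")
    # Keep only the paragraphs from the midpoint onward.
    late_paragraphs = paragraphs[len(paragraphs) // 2:]
    paragraph = late_paragraphs[0]
    # Naive sentence split (assumption): break after ., ! or ? plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    # Keep only the sentences from the midpoint of that paragraph onward.
    late_sentences = sentences[len(sentences) // 2:]
    return late_sentences[0]


def exact_phrase_url(sentence: str) -> str:
    """Wrap the sentence in quotes so the search is for the exact phrase."""
    return "https://www.google.com/search?q=" + quote_plus('"' + sentence + '"')
```

Calling exact_phrase_url(pick_check_sentence(text)) gives a URL you can paste into a browser. If the results turn out to be unrelated articles that merely share the phrase, pick a different late sentence by hand and try again.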

I have often been asked why not take something near the top, and why not a sentence from the beginning of the paragraph. One reason is sometimes people (or paid content writers who are trying to get away with stealing content while not making it too obvious) will rewrite the first paragraph or two, while leaving the rest. Or they will rewrite the first sentence of every paragraph, while leaving the rest of each paragraph untouched. And this will also help weed out some of the scraper results (which generally take from the beginning of the page rather than the end) which usually fall under fair use anyway.

Now, once you have plugged your sentence into Google, with any luck, you will see your own site there, and no others. But that unfortunately won't always happen. First, double check that the others you see are in fact copies. If the results are completely different articles that by some fluke happen to share the phrase you searched for, pick another selection of text and try again. Next, you will want to see if the following shows up:

Quote:
In order to show you the most relevant results, we have omitted some entries very similar to the X already displayed.
If you like, you can repeat the search with the omitted results included.

If it does, you will want to click it and see what additional results you see. If you see any duplicates of your articles where the content was republished without your permission, save the page to your hard drive as well as any pertinent information you will need for sending out a C&D or DMCA.

There are some other tools you can use for checking for duplicate content. Copyscape is one of the more popular ones: you enter your URL and it checks for copies. However, it will return a lot of copies on scraped pages, and it may also return pages that are utilizing the same quotes as you are (if you are quoting excerpts from an article on a blog, for example). They also have a paid version that will monitor and email you automatically if it finds any duplication.

Google Alerts is another handy tool. You can enter sentences or phrases with quotes around them, and Google will alert you (as it happens, once a day, or once a week) with any new results where that specific phrase or sentence appears. You are limited to ten queries if you do not use a Google account, though, so if you have many articles you want to keep an eye on, it is worth signing up for a Google account if you do not already have one. Many have had great success using this for finding stolen content in Google.

There are some third-party tools, but Google Alerts does a good job of keeping you on top of potentially stolen content in Google.

Now, if you have found duplicate content and want it gone, all the information you need is here.

Last edited by Jenstar : 01-16-2006 at 05:19 PM.
Old 01-17-2006   #2
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
You can also enter your URL into blogsearch.google.com, then sort by date, then subscribe to a feed of that page. Sit back and sadly watch how many people assume that because you put out a feed, it's fine to reprint entire articles. I admit, if you put out a full text feed, that might be confusing to some. But if you put out a summary feed?
Old 01-17-2006   #3
vayapues
10 kinds of people in the world. Those who know binary numbers, and those who don't
 
Join Date: Jan 2006
Location: Salt Lake City
Posts: 322
Wow!!! That is actually a little scary. I checked my biggest sites and am blown away by the hundreds of copies. Of course they are educational sites, and the copies are mainly on teachers' and students' home pages.

Well, Watcha gonna do : )

Fortunately, all that I checked send a link back my way, helping my PR. Just goes to show you that the habit of using others' content begins in elementary school. Teachers should probably do a better job of educating their students about the difference between using a source and copying a source.
Old 01-17-2006   #4
Nacho
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Quote:
Originally Posted by vayapues
Well, Watcha gonna do : )
http://forums.searchenginewatch.com/...ead.php?t=6560
Old 01-17-2006   #5
Jenstar
 
Join Date: Jun 2004
Location: Starbucks!
Posts: 345
There are so many sites republishing RSS feeds right now that I'd be spending all my time on it if I were sending out C&Ds or DMCAs for them all. Right now, when it comes to my blog, I only get cranky if whoever republishes it does not make it clear where it is from. It is amazing how many people are republishing it and making it appear that they wrote it instead of me. Those are the ones I take a whip to.
Old 01-17-2006   #6
Scottie
In search of stuff...
 
Join Date: Jun 2004
Location: Columbia, SC
Posts: 45
Selling them on eBay

There are also people out there scraping articles, removing all traces of the author and original publishing website, and selling them as "Top Revenue SEO Sites" for $100 on eBay.

I've recently had 2 shut down by their hosts and I'm working on getting eBay to shut down the seller of this garbage.

It's not only SEO sites they are selling; they are scraping any number of article sites for content and selling themed sites for AdSense.

I don't know what the answer is, other than as Jen says, stay on top of it with Google Alerts and follow up where you aren't getting a link credit. It does take way too much time, but if you don't defend your copyright you will get to the point where the problem is too big and widespread to get under control.

Last edited by Scottie : 01-17-2006 at 06:24 PM. Reason: spelling!
Old 01-18-2006   #7
Chris_D
 
Join Date: Jun 2004
Location: Sydney Australia
Posts: 1,099
I had a chunk of text, about 100 words, taken from a hobby site of mine, and used in a 3rd party document which was published.

My unique content was 'quoted' in the document - but no source was attributed - despite there being nearly 30 other attributed references in the 20 page document.....

So what - happens every day - doesn't it?

The irony was it was done by a UNIVERSITY PROFESSOR and was published on a TRADEMARK LAWYER ASSOCIATION website!!

Plagiarism - alive and well.....

Old 01-18-2006   #8
Dianna
 
Posts: n/a
Plagiarism not limited to Web content

People steal content from PowerPoint presentations, printed articles, and even whole business plans. I attended a workshop at a conference and listened to the speaker spout content I had recently written for my client. Another person I know attended a conference and saw a speaker using slides from one of her PowerPoint presentations. One woman copied one of my articles word for word. When I asked her to take it off her site because it was my copyrighted work, she told me where to get off. Unbelievable.
Old 01-18-2006   #9
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
While there are always slow processes, such as filing a DMCA, the simplest way to deal with it is to track down the webhosting company.

I know Jen mentions this in the other thread, but just to try and be helpful on tracking down hosts:

There are a few tools available to get WHOIS info. My personal favourite today is http://www.dnsstuff.com/, as its "WHOIS lookup" tool is more comprehensive for obscure domains than others I've used.

Simply check the nameserver data, then run another WHOIS check on that if the domain doesn't resolve to a specific company.

Then send the webhost a *polite* e-mail pointing out the problem, and requesting that the host addresses the issue.
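The nameserver check above can also be done from a script. The following is a rough Python sketch, not a replacement for the tool linked above: whois_query speaks the plain WHOIS protocol on TCP port 43 to whichever WHOIS server you pass in, and name_servers pulls the nameserver hosts out of the reply. The function names are made up for illustration, and the "Name Server" / "nserver" field labels are an assumption, since WHOIS output varies from registry to registry.

```python
import socket


def whois_query(domain: str, server: str, timeout: float = 10.0) -> str:
    """Send a plain WHOIS query (TCP port 43) and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def name_servers(whois_text: str) -> list[str]:
    """Collect nameserver hosts from WHOIS text.

    Assumes lines shaped like 'Name Server: ns1.example.net' or
    'nserver: ns1.example.net'; other registries may label them differently.
    """
    servers = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("name server", "nserver"):
            parts = value.split()
            if parts:
                host = parts[0].lower().rstrip(".")
                if host not in servers:
                    servers.append(host)
    return servers
```

The domain of the nameservers it returns is often the webhost's own, which is the name to run through WHOIS next if the first lookup did not point to a specific company.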
Old 01-18-2006   #10
vayapues
10 kinds of people in the world. Those who know binary numbers, and those who don't
 
Join Date: Jan 2006
Location: Salt Lake City
Posts: 322
In my case, the articles on my sites are for the express purpose of education, and are on topics relating to elementary school subjects. It would certainly be interesting to send little Johnny or Susie a C&D. : )

As long as they give me credit, I don't care so much.

If a spammer were doing it, on the other hand, that would really tick me off. Unfortunately, they probably are, but with thousands of returned results, it is impossible for me to dig into it.