What to do when someone steals your original content
Jenstar
06-29-2005, 10:13 AM
There has been a lot of buzz about the DMCA the last couple of days, and a few are misunderstanding exactly how it is used. I have done a lot of research on duplicate content - and taking care of the problem when it happens - so I thought I would clear up getting infringed content off of websites and out of the search engine results :)
If you are a writer of original content, you are probably familiar with people stealing your content, and then facing the prospect of getting "dup'd out" of the Google search results as a result. And no, just because you have the higher PR, better rankings, older site, older copy of the article, etc, does not mean you are safe. I have had a high PR article that has been online for 3+ years get dup'd out of Google by a PR0 domain registered less than a month earlier. So it can happen to you.
My first method of choice when I find one of my articles stolen is sending out a Cease & Desist (commonly referred to as a C&D). You can find many C&D templates on the web, and just make the changes before sending. I usually state the content must be removed within 48 to 72 hours. And I send it to every single email address I can ferret up (whois, on site, extracted from forms, by doing a web search).
If that doesn't work, you can try and get the host to remove it next. It is against nearly all hosting companies' terms to publish copyrighted content without permission, so you just need to find out how you can get the host to take action. Some will require just a C&D, while others will ask you to fill out a DMCA they provide. However, not all hosting companies are created equal, and not all do anything, particularly those located outside of the US. If it is a reseller hosting company, you can go up the reseller's food chain and try to take action from one of the parent hosting companies instead.
If neither of those methods works, it is then time to contact the search engines to get those listings removed. Some will do this as a first step, but I personally find it easier to go the C&D route first because I usually see the pages removed from the site within a day, and it is less hassle than doing the DMCA and mailing it in. The three big search engines all have their own pages about doing a DMCA:
Google (http://www.google.com/dmca.html)
Yahoo! (http://docs.yahoo.com/info/copyright/copyright.html)
MSN Search (http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_CONC_AboutDMCA.htm)
Make sure you follow the instructions to the letter, or else you might need to resend it.
Google takes about 10 days to remove the results after receiving a DMCA, and they also send a copy of the complaint (with your personal information removed) to ChillingEffects.org.
Some people have asked about the "send a friendly email to the webmaster nicely asking him/her to remove the pages" method that some do use. Personally, I only had about 10% success rate doing this, while the C&D method works nearly 100% of the time. So I just go straight to the C&D. Legalese with threat of monetary damages seems to scare them into action much quicker than a nice rosy "please remove my article" will :)
People have also asked about using this for scraper sites. Generally, they fall under "fair use" since they only use snippets, so it is not true copyright infringement. And one person tried using a DMCA to oust a scraper site, and it was denied. On the positive side though, they don't generally trip the duplicate content filter.
Others have asked if I have a lawyer do this, or if I do it myself... if I hired a lawyer every time I found someone had stolen an article, I would be completely broke from legal fees by now!
Now, go hit Google and see if you are being dup'd out for any of the articles you have written, by doing a quick snippet search. I'll save the post about all my methods on "how to find copyright infringers" for another time ;)
And for the record, I am not a lawyer, this is just how I personally handle cases when I find someone has stolen an article (or ten!)
grnidone
06-29-2005, 02:06 PM
How do you prove that the content is actually yours? It seems like I could go through the steps just to maliciously remove someone's article.
Jenstar
06-29-2005, 03:44 PM
In my experience, it is very rare that someone has stolen an article I have just written, so I always back it up with a link to archive.org.
The DMCA does spell out the consequences if you file one and are lying. Google's, for example, has "I swear, under penalty of perjury, that the information in the notification is accurate and that I am the copyright owner or am authorized to act on behalf of the owner of an exclusive right that is allegedly infringed" in it.
You also have the option to counter the DMCA as well.
rustybrick
06-29-2005, 04:06 PM
My biggest problem is that I do not have the time or resources to track these puppies down. I know all the tricks to finding stolen content (I heard your presentations on this Jenstar).
Just, it still takes lots of time.
Someone out there want to build a program to automate this?
- Locate Content
- Email Offenders
- Track Emails
- Reminders
- Work flow and so on.
I'd pay...
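For anyone tempted to build this, here is a minimal Python sketch of the tracking and reminder part only (locating content and sending email are left out). It assumes the escalation order from Jenstar's first post (C&D, then the host, then the search engines) and a 72-hour grace period. The `Case` class, the `STEPS` list, and all field names are made up for illustration and do not come from any existing tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Escalation order as described in the first post of this thread.
STEPS = ["cease_and_desist", "host_complaint", "search_engine_dmca"]


@dataclass
class Case:
    """One stolen article and where its complaint currently stands (hypothetical model)."""
    infringing_url: str
    original_url: str
    step_index: int = 0
    sent_at: Optional[datetime] = None
    # Assumed default; the first post suggests a 48 to 72 hour deadline.
    grace: timedelta = timedelta(hours=72)

    @property
    def step(self) -> str:
        return STEPS[self.step_index]

    def mark_sent(self, when: datetime) -> None:
        """Record when the notice for the current step went out."""
        self.sent_at = when

    def reminder_due(self, now: datetime) -> bool:
        """True once the grace period for the current step has run out."""
        return self.sent_at is not None and now >= self.sent_at + self.grace

    def escalate(self) -> str:
        """Move to the next step (staying on the last one) and reset the clock."""
        if self.step_index < len(STEPS) - 1:
            self.step_index += 1
        self.sent_at = None
        return self.step
```

A daily job could loop over the open cases, call `reminder_due(datetime.now())`, and flag the ones that need a follow-up or an `escalate()`.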
grnidone
06-29-2005, 04:20 PM
You also have the option to counter the DMCA as well.
Apparently not with Yahoo!. (http://www.threadwatch.org/node/2983)
Daria_Goetsch
06-29-2005, 04:46 PM
How do you prove that the content is actually yours? It seems like I could go through the steps just to maliciously remove someone's article.
This may be part of the method that could be used to build a case against someone using your original material.
Use Copyscape search engine to find duplicate text used from your web pages:
http://www.copyscape.com/
Use the Wayback archive to show dates of original content as opposed to the other website:
http://www.archive.org/
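As a rough illustration of the Wayback comparison, the Python sketch below builds the capture-listing URL for a page and reads the capture time out of a snapshot URL. It assumes snapshot URLs of the form `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`; the function names and example domains are made up. Keep in mind that a capture date only shows when the archive crawled a page, not when it was written.

```python
from datetime import datetime


def wayback_captures_url(page_url: str) -> str:
    """URL of the Wayback Machine page that lists all captures of page_url."""
    return "https://web.archive.org/web/*/" + page_url


def capture_time(snapshot_url: str) -> datetime:
    """Parse the 14-digit timestamp embedded in a Wayback snapshot URL."""
    marker = "/web/"
    start = snapshot_url.index(marker) + len(marker)
    stamp = snapshot_url[start:start + 14]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def earlier_capture(mine: str, theirs: str) -> str:
    """Say which of two snapshot URLs was captured first."""
    return "mine" if capture_time(mine) < capture_time(theirs) else "theirs"
```

Paste the earliest snapshot URL for your page and for the copy into `earlier_capture` to see which one the archive picked up first, then include both links in your C&D or DMCA.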
john22
07-07-2005, 12:57 PM
Yea.. but these content stealers are starting to block copyscape.com. So that will be a problem until they come up with some way to get around it.
Chris Boggs
07-08-2005, 04:41 PM
Thanks jenstar for the very informative post. I often run Copyscape when initially reviewing a prospective client's site, but as john points out, copyscape can be blocked from accessing a site or page.
The best way I have found to tell if your content is duplicated (or if it is a duplication of someone else's, as is surprisingly often the case when we start working with clients) is to cut and paste at least two sentences' worth of each content-rich page and place it into the Google search box surrounded by quotation marks, so that Google looks for the exact phrase.
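The snippet search can be sketched in a few lines of Python. This takes the first two sentences of a page's text, wraps them in double quotation marks (the operator Google uses for exact-phrase matching), and builds a search URL. The sentence split is deliberately naive, the `max_words` cap is an assumption to keep queries short, and all function names are made up for illustration.

```python
import re
from urllib.parse import quote_plus


def first_sentences(text: str, count: int = 2) -> str:
    """Return the first `count` sentences, splitting naively after ., ! or ?."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:count])


def exact_phrase_query(snippet: str, max_words: int = 32) -> str:
    """Wrap the snippet in double quotes, trimmed to an assumed word cap."""
    words = snippet.split()[:max_words]
    return '"' + " ".join(words) + '"'


def google_search_url(query: str) -> str:
    """Build a Google search URL for the given query string."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

Calling `google_search_url(exact_phrase_query(first_sentences(page_text)))` gives a link you can open by hand for each content-rich page.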
Jenstar: what to do if you have given permission to some sites to post your article, and it clearly links to your site, but then outranks you for that content? I guess this is a "too bad, so sad" scenario?
Jenstar
07-08-2005, 04:48 PM
Jenstar: what to do if you have given permission to some sites to post your article, and it clearly links to your site, but then outranks you for that content? I guess this is a "too bad, so sad" scenario?
Pretty much the "too bad, so sad" scenario. If someone emailed me and asked me to remove an article they asked me to place on my site, because it was outranking their own site, I'd probably remove it but think "what a jerk." And you could bet I would probably never print one of his or her articles again, and I likely would never link to that person again. And I would probably remove all other instances of links to that person's site across my own network of sites as well. I did that person a favor by promoting him/her and giving a nice PR link, and then they want me to go out of my way to remove it because it performed too well?
Personally, I always rewrite articles I want "reprinted" on other sites, so I don't have to worry about being dup'd out. That way you get the link but you don't have to worry about the duplicate content issue at all.
Chris Boggs
07-08-2005, 05:08 PM
thanks jenstar that's a good thought. Are you going to divulge some of your dupe-content-finding secrets soon? I always like to cover that issue and would appreciate your insight on other techniques than the obvious cut/paste snippet.
gidgreen
07-09-2005, 02:02 AM
I'm writing here from Indigo Stream Technologies, who develop Copyscape.
A couple of people have mentioned in this thread that plagiarists are starting to block our service, but I believe this is an incorrect assertion. Copyscape's results are based on Google's web index, so unless a site has actually blocked Google from indexing their site, they can't block Copyscape from finding them. I suspect that the last thing a content thief wants is to stay out of Google's index.
What they _could_ do is block the full comparison page that Copyscape shows when you click through on a result, but that will not prevent the result coming up in the Copyscape search results page. So a webmaster can still go straight to the URL of the offending site from that page and see for themselves. Furthermore, Copyscape lets you retrieve the cached versions of pages, so that's another easy way to find the offending text.
Gideon Greenspan
Indigo Stream Technologies
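To illustrate the point about blocking Google: one way a site keeps Google out of its pages is a robots.txt rule for Googlebot. The Python sketch below uses the standard library's `urllib.robotparser` to check a robots.txt body that you have already downloaded. The function name and example URLs are made up, and this covers robots.txt only, not meta tags or servers that show different content to crawlers.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, page_url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch page_url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", page_url)
```

If this returns False for the offending page, a tool built on Google's index is unlikely to find it, but neither are searchers.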
Chris_D
07-09-2005, 03:16 AM
Hi gidgreen, welcome to the SEW Forums!
Thanks for the clarification on blocking Copyscape.
And keep up the good work - Copyscape alerted me to a plagiarised content issue back in February - which was resolved in a matter of days.
Chris Boggs
07-09-2005, 09:41 AM
I double that welcome, gidgreen. :)
Thanks for providing a great service. I feel that your FAQ (http://copyscape.com/faqs.php) answers most questions people may have about your service. The Copysentry service does sound valuable for those that are very interested in protecting their information. I apologize if I was misleading in my statement regarding the ability to block Copyscape, so thanks for clarifying that issue.
I have a slightly pointed question, however. In a recent discussion, a member discussed a site that they felt was using a Copyscape banner on their site, yet was simultaneously using "stolen" content. Although I must clarify that this was an unproven allegation, I wonder if you have any method of preventing content-thieves from presenting your banner. It says clearly on your banner page "Please only use these banners if your content is your own." (Now I guess SEW will come up in a Copyscape search of your own site :p ) But it seems that you cannot verify that everyone who is using your banner is actually providing only unique content. Is this the case? I am not trying to knock you. I know this would require even more use of the high-capacity API Google has provided you (I am jealous of that). Is anything happening or in the works to ensure that those who use your banner are doing so "ethically?"
Thanks again!
gidgreen
07-09-2005, 04:43 PM
Thanks for your question Chris. The answer is that of course we cannot prevent someone from placing our banner on a page containing stolen content. Although in theory we could prevent requests to the banner images from certain referring domains, all a content thief would have to do is serve the banner locally, and they would achieve the same effect. Content thieves by definition are going to have no scruples about doing that.
Gideon
orion
08-24-2005, 09:54 PM
From time to time I find grad students copying my material. This is something that has worked for me and may work when the thief works for someone else. Send a kind email to the person and his/her superiors and ask for either the removal or proper acknowledgment of authorship, including a link referencing the original source.
Two months ago some grad students from the University of San Francisco did this to me and I complained very softly to their PhD advisor and the Dean. We immediately came to terms.
I just found out today that it happened again, this time with the University of Waterloo, Canada. Someone at UofW copied my material for a presentation/project for a professor (http://www.cs.uwaterloo.ca/~obaysal/2005/cs886/Presentation/VSM_presentation.pdf) from this page on Term Vector: http://www.miislita.com/term-vector/term-vector-3.html
I complained to everyone at UofW (the person, faculty and the Dean) and let's see what they will do. I take this as a real-life test. I take their actions in good faith.
The point is that copying is not unique to spammers or to the SEO industry; even some grad students and some in other "reputable" professions may be tempted to do some copying. This is a systemic problem on the Web and we are all at risk.
Orion
orion
08-25-2005, 05:41 AM
WOW, that was fast.
I want to thank the University of Waterloo, their academic staff and the grad student involved for promptly and graciously contacting me, apologizing, correcting the paper and giving proper credit in their research project. They even agreed to put a link pointing back to Mi Islita and asked if they can do anything else for me.
You see? It only took a couple of hours to resolve this matter, and without lawyers. The lesson here is that, done properly, we are all happy and nobody gets hurt.
One good score for accountability.
Orion
Chris Boggs
08-25-2005, 11:01 AM
wow leave it to you Orion to find the perfect way to get inbound .edu's (that's almost a scheme! :p)
orion
08-25-2005, 01:04 PM
A problem on the Web is that some don't bother to visit and read the Terms of Service (TOS) or copyright pages of web properties. In addition to these, an explicit snippet about copy guidelines is sometimes necessary on each page. I think I'm going to add a copy guideline snippet on each page using document.write and pointing to an encrypted external .js
I know, Chris, sounds like a plan :) but as you know I never ask for links in the reciprocal link sense, nor do I care for link building programs/strategic link alliances. Those that link to me do so because they like my content. The reverse is also true. I link to sites only if I find them relevant or important to my visitors. I think I will place important .edu and authority links pointing to me in a single Educational Links page.
Orion
orion
08-25-2005, 01:48 PM
Thanks for your question Chris. The answer is that of course we cannot prevent someone from placing our banner on a page containing stolen content.
Gideon
A partial solution I used to use was to document.write the banner lines and link to this as an external and encrypted file with a domain block key and a surprise at the host level. For more sensitive data you can use DES or MD encryption, both supported by JavaScript, though in this case you would need to add a few lines of extra coding for the browser to render the code.
While this stops most users, there are some really brilliant code breakers who could take the time and effort to decrypt it. Fortunately most of them don't seem to care about banners.
If you use asp, you can flush the temp files and cache with some asp lines so they will not use cached copies. You can also include disabling the copy/paste and control command but that creates usability issues.
I guess thieves will always find their way.
Orion
I, Brian
01-17-2006, 04:10 PM
The concern I have with filing a DMCA with search engines is that with Google, for example, you're supposed to enter keyword searches that lead to the unauthorised copy.
However, even if you can pull a few in, the fact of the long tail means that you're never going to be able to have a full list of keyword searches for Google to deal with - hence the unauthorised content remains.
Or have I misunderstood the procedure?
Either way, you *may* find the following links particularly useful when filing a copyright infringement under the DMCA:
http://www.chillingeffects.org/sending.cgi
http://www.harvard.edu/copyright.html
Panthere Noir
01-23-2006, 05:31 AM
Thanks for a very informative piece, and in particular for the DMCA links.
One question I have left is what to do when the stealer has no contact information on his website, tracing him through the domain name only leads to a name without any further contact info, and the ISP is based in a country like Russia and is neither cooperatively inclined nor subject to American or even EU law?
I'm afraid it's pretty much a case of having to bear the existence of the site and, with that, the likelihood of it eventually turning up again on search engines, even if they did first remove it on complaint, but if you have any ideas what could be done I'd much appreciate it if you could share them.
monkeywp
06-22-2006, 02:55 PM
If you use asp, you can flush the temp files and cache with some asp lines so they will not use cached copies. You can also include disabling the copy/paste and control command but that creates usability issues.
I was thinking about disabling copy/paste/right clicks for my sites, but your mentioning that it creates usability issues has me thinking twice. What kind of issues can come from disabling those functions?
STuart
meganm1524
07-31-2006, 06:24 AM
Someone recently asked me to write him 30 articles a week for 10 weeks, and I stupidly started in and did the first week with no deposit. He never paid (no surprise), and now he won't respond to me. So I know his name is Peter Crump and I can find his articles in many places online. I haven't seen him using any of my articles yet, but am worried he may. I'd like to sell the articles to others now so it's not a waste, but am worried I'll end up being the one to get in trouble. I have my own site, contentresourcecenter.com, and am hoping that nothing will happen. Any suggestions or advice? Thanks
MoneyElite
08-22-2006, 02:33 PM
This feature does not work!
I was thinking about disabling copy/paste/right clicks for my sites, but your mentioning that it creates usability issues has me thinking twice. What kind of issues can come from disabling those functions?
STuart
MacCallow
09-09-2006, 02:11 AM
Perhaps if the general 'responsible' SEO 'public' started to simply link to external pages using relevant keywords as opposed to copying them it would help... but the whole duplicate content issue is still a mind boggle - even to the movers-and-shakers at the San Jose SES to some degree... The algos are simply computer driven, and to verify anything is not so easy...not that this little non-gem helps mind you, but reality can be a hard pill.
I've been hit... so I wrote new copy - pain in the ass, but if you know your topic and your stuff, then do it. There are times to fight and times to let go... right now fighting the google dance as regards DC is just not really worth it - for some anyway... but for those with seriously legitimate issues, I'd be surprised if Google didn't rectify - you're probably big players and have some weight - so use it. Google is after all a service provider - and a lot more approachable in some instances than people may think.