View Full Version : Search Engine scrapers
11-12-2004, 06:23 AM
Google makes a point in their terms of service (http://www.google.com/terms_of_service.html) of stating:
You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site.
However, when I search on my business names, I find a lot of sites where results from search engines have been "scraped" and reformatted for the site.
If it means I get extra links and possible extra exposure to surfers then I have no real complaint (the ones that provide a modified link I find a little insulting).
However, how do search engines regard such pages? Are there any automated filters working against them? Are domains using scraping penalised?
How about legal action by Google where their own results are seen to be re-used as forbidden in their Terms of Service?
Judging by what I see, Brian, there are literally hundreds of sites out there that think they can get good rankings by republishing SERPs, and they continue to be indexed and listed by search engines.
IMO this is a misguided endeavor, as seldom do such pages which republish these SERPs rank well but in many cases they do help me to rank well. :)
11-17-2004, 02:16 AM
In addition to the thousands of pages, there are hundreds of SEO tools that scrape Google, Yahoo!, etc. - hopefully they won't draw GG's ire too...
11-17-2004, 02:55 AM
You might also be seeing legitimate use of the G API - rather than illegit scraping...
Though I'm sure at least half of it is suspect...
11-20-2004, 04:34 AM
And of course if you see some crufty/scraped/machine-generated page, you can always drop us an email to tell us about it.
11-20-2004, 04:46 AM
In one thread on webmasterworld.com I saw a poster mention Google detecting and penalizing these software-generated scraped result pages. I can't remember the URL of the thread, and I don't know whether it is a manual penalty or an automated penalty.
I think Google should have written an algo to detect these scraped directories. GoogleGuy, what do you say?
11-20-2004, 11:07 AM
GoogleGuy, what about all the webmaster tools that scrape results from your pages (rather than using the API)?
Also, does anyone know how Yahoo! & MSN feel about this?
11-20-2004, 12:01 PM
"And of course if you see some crufty/scraped/machine-generated page, you can always drop us an email to tell us about it."
That's a tactful way of putting it. :)
Yep, and there's loads of scraper crap - and crappy pseudo-directories too, out there running AdSense. They are without a shadow of a doubt totally worthless, strictly made_for_AdSense sites, with no mistaking that there isn't a shred of original content or original thought put into it - not one paragraph's worth.
They're just scraped listings, and some have long lists of every related keyword imaginable that they're targeting running down a column on all the pages, and gibberish text with headings and keywords pulled out of a database thrown in for density at the page top.
"Also, does anyone know how Yahoo! & MSN feel about this?"
I think Yahoo takes about one or two business days to pull sites, from what I've seen.
Like I found some stuff a few weeks ago on a Saturday morning - not a scraped site; it was a pretty complicated setup but was running PPC affiliate feeds, not AdSense. I spent a couple of hours checking them out, tracking down their backlinks, their other sites, etc., where the PPC adverts tracked to (surprisingly, to a well-known, well-reputed company) and hunting down other sites involved in the same.
I went back to take another look around Monday evening, and the page I'd found plus the entire site were gone from Yahoo's index by then.
"IMO this is a misguided endeavor, as seldom do such pages which republish these SERPs rank well but in many cases they do help me to rank well."
Mel, in some cases you can tell they're amateurs looking for a quick buck, and they more than likely got sold a bill of goods by some outfit that turned some cash selling them the software, telling them it would make them some easy money.
It isn't a clean, pretty business out there.
11-21-2004, 11:33 AM
I was actually asking, because I'm interested in perhaps keeping archive records online of about a dozen key search areas for a resource site.
The idea was to produce a reference for how the SERPs in different search engines have changed on perhaps a monthly basis, with the change in rankings for individual sites producing potentially useful statistical data for basic research.
However, I'd rather not fall foul of any known penalties for scrapers, so I thought I'd best ask on the issue, and see whether there are known automated penalties involved.
Possibly will have to e-mail all search engines involved for permission, though.
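For what it's worth, the month-to-month comparison described above could be sketched roughly like this in Python. It assumes each monthly record is simply an ordered list of result URLs obtained by some permitted means; the names `rank_changes` and `dropped_sites` and the example domains are made up for illustration.

```python
def rank_changes(previous, current):
    """Return {url: positions gained} for every URL in the current list.

    A positive number means the URL moved up, a negative number means it
    moved down, and None means it was not in the previous list at all.
    """
    previous_position = {url: i for i, url in enumerate(previous, start=1)}
    changes = {}
    for position, url in enumerate(current, start=1):
        old = previous_position.get(url)
        changes[url] = None if old is None else old - position
    return changes


def dropped_sites(previous, current):
    """Return URLs that were listed last time but are missing now."""
    still_listed = set(current)
    return [url for url in previous if url not in still_listed]


# Hypothetical snapshots for one search phrase, one month apart.
october = ["a.example", "b.example", "c.example"]
november = ["b.example", "a.example", "d.example"]

changes = rank_changes(october, november)
dropped = dropped_sites(october, november)
```

Keeping only positions and URLs, rather than the titles and descriptions, would also keep the archive further away from republishing the results themselves.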
11-21-2004, 11:58 AM
Don't know that I'd take a chance on that unless it's robots.txt excluded and won't show up in anyone's backlinks with their content on it, even if it's just their description and title.
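As a rough sketch of the robots.txt exclusion, here is one way to check that an archive directory really is blocked, using Python's standard `urllib.robotparser`. The `/serp-archive/` path, the `www.example.com` host, and the `is_crawlable` helper are assumptions for illustration only, not anything from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the archive directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /serp-archive/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


archive_allowed = is_crawlable(
    ROBOTS_TXT, "http://www.example.com/serp-archive/2004-11.html"
)
home_allowed = is_crawlable(ROBOTS_TXT, "http://www.example.com/index.html")
```

This only confirms what a well-behaved crawler would be told; it says nothing about how any particular search engine treats pages that are excluded.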
It's mostly crash and burn domains using the scraped pages, but for an established site interested in longevity - unless it's 100% squeaky clean, beyond reproach, and can stand up under thorough scrutiny, it could be risky.
Some people get very irate when they find themselves on scraped pages, and if they finally get wound up enough about it, it's possible they'll write and whine to search quality about it - it has been known to happen. Sometimes I wonder if that's one of the reasons Google changed what they show and now lets us see those pages, since they're not at all unfamiliar with webmaster-type mentality.