search engine DEoptimization?


tomjohnson1492
11-15-2004, 11:42 PM
How can I make someone else's web page disappear? I'm trying to cover up some black p.r. and want to know whether the following technique would work:

1. Copy the article on the ugly webpage in its entirety.
2. Buy a separate domain, unconnected to my real site.
3. Post the duplicate content on the new domain. When search engines crawl and find the same content on both domains, perhaps they'll recognize it as an attempt to trick them and ban both.

I know this would violate copyright law, but would it work to cover up black p.r.? Any other suggestions?

Tom

Jeff Martin
11-16-2004, 01:39 AM
Well, IMO trying to hijack content is not what I would call a "best practice". You would need to have a higher PR than where the original content currently sits.

There are other alternatives: you could diversify your site(s) content, apply a good round of SEO to them, and then use them to enhance favorable 3rd-party media as well.

I've created a product for this, rightly called, Search Engine Reputation Management. Look in my profile for the site for more information.

seomike
11-16-2004, 05:55 PM
The technique you mentioned can work "in theory". Instead of copying the actual article, just copy the entire source code, cloak it via IP delivery on a page that has a higher PR, then redirect users to a page that favors you :)

But I'd listen to Jeff's advice. My solution is more of a last-ditch effort. :D

tomjohnson1492
11-16-2004, 06:22 PM
I thought I posted a reply, but I didn't see it, so here it goes again.

You're right -- hijacking should never be something I try. Not only is it unethical, if it backfired, things could become worse.

I actually stumbled across the idea by accident.

We are trying to create new web pages and press releases to increase the positive PR. In fact, we have been sending a few press releases through www.prweb.com, an electronic press release delivery service.

When we sent over a release last week, it worked great. We got first position in the Google rankings when someone searched for the name.

The next week we did another release, with different content but with the name still optimized. This time, however, our web designer made a duplicate copy of the press release on our site, so readers could find the release on prweb.com as well as on our site.

From what I understand, this is called "duplicating content across domains," and it is not the best thing to do.

Now when you search for the name, you don't get the release we just sent; you get the release as we put it on our site. The newly released prweb version disappeared, or dropped so far in ranks that you can't really find it. The former, first prweb release still remains, but not the second.

I think we should not duplicate content. Perhaps the search engine saw the twin, and chose the one it felt was more relevant?

However, my co-workers really want to post both. Am I correct in my suspicions about why the second prweb release disappeared? Is there a way to block robots from indexing a specific page? Is there something else awry here? Thanks,

Tom

Jeff Martin
11-16-2004, 06:33 PM
Am I correct in my suspicions about why the second prweb release disappeared? Is there a way to block robots from indexing a specific page? Is there something else awry here?

Yes, yes and yes.

The press release featured on your site more than likely had more PR than the press release on the service so the service version got penalized.

Use the "noindex" robots meta tag on your press release web page to keep the bots away. Or better yet, make an entry in your robots.txt file that disallows the whole directory your releases reside in.
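A minimal Python sketch of the two options above, using only the standard library: `urllib.robotparser` checks how a `Disallow` rule for a whole directory applies to individual URLs, and `html.parser` detects a robots `noindex` meta tag. The `/press-releases/` directory, the example.com URLs and the `Googlebot` user-agent string are assumptions for illustration. Note the difference in effect: a `Disallow` rule stops compliant crawlers from fetching the pages at all, while `noindex` lets them fetch the page but asks them not to index it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all crawlers out of a /press-releases/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /press-releases/
"""

# Hypothetical release page carrying the robots "noindex" meta tag.
RELEASE_HTML = """<html><head>
<meta name="robots" content="noindex">
<title>Our latest release</title>
</head><body>...</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def allowed_by_robots_txt(robots_txt, url, agent="Googlebot"):
    """True if a crawler obeying this robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_noindex(html):
    """True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


base = "http://www.example.com"
print(allowed_by_robots_txt(ROBOTS_TXT, base + "/press-releases/2004-11.html"))  # False
print(allowed_by_robots_txt(ROBOTS_TXT, base + "/about.html"))  # True
print(has_noindex(RELEASE_HTML))  # True
```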

No matter how much you optimize your site, you are more than likely going to be limited to two spots in the top 10-30. You must diversify your content to give yourself more control over the SERPs: the more property you control, the more control you have over searches for your company or product name.
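The "two spots" figure can be turned into a quick calculation. This Python sketch assumes, as the post suggests, a cap of two listings per domain on a ten-result page (both numbers are assumptions, not documented search engine behaviour) and shows why filling page one takes pages on several separate domains. The URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumptions for illustration: a ten-result page and at most two listings
# per domain, following the "two spots" figure in the post above.
PAGE_SIZE = 10
PER_DOMAIN_CAP = 2


def apply_host_crowding(ranked_urls, cap=PER_DOMAIN_CAP, page_size=PAGE_SIZE):
    """Walk the ranking in order, skipping URLs whose domain already has `cap` listings."""
    per_host = Counter()
    page = []
    for url in ranked_urls:
        host = urlparse(url).netloc
        if per_host[host] < cap:
            per_host[host] += 1
            page.append(url)
            if len(page) == page_size:
                break
    return page


def domains_needed(page_size=PAGE_SIZE, cap=PER_DOMAIN_CAP):
    """Fewest distinct domains that could fill one results page under the cap."""
    return -(-page_size // cap)  # ceiling division


# Hypothetical ranking: eight pages from one company site, then one press-service page.
ranked = [f"http://www.our-company.example/page{i}.html" for i in range(8)]
ranked.append("http://www.press-service.example/release.html")

print(apply_host_crowding(ranked))  # two company pages, then the press-service page
print(domains_needed())  # 5
```

In this model, eight strong pages on one domain still yield only two page-one listings, and occupying all ten slots needs at least five domains, which is the arithmetic behind diversifying across several properties.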

Jill Whalen
11-16-2004, 10:27 PM
The press release featured on your site more than likely had more PR than the press release on the service so the service version got penalized.

Sorry, but that's just not true.

You absolutely do NOT get penalized for that sort of duplicate content. Just because the engine shows one copy of it, or one copy before the other does not make it a penalty.

There are tons and tons of duplicate pages in the search engines for legitimate reasons such as this one. The engines don't penalize these. They only penalize stuff that is there to deceive them.

People need to get the whole "I'm penalized" thing out of their heads!

projectphp
11-16-2004, 10:44 PM
Not a penalty, but see Google's Snippet Patent (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=6615209.WKU.&OS=PN/6615209&RS=PN/6615209) for why some content never shows.

strategicrankings
11-17-2004, 12:29 AM
Agree with you Jill.

I can't understand how people constantly speak about a penalty for duplicate content without clearly explaining what duplicate content is.

As you said, there are thousands of duplicate articles/press releases on the web that are never banned or penalised.

Duplicate content, as seen by the engines, is IMHO an exact copy of a page's source code. If you simply change the description and title tags and leave the rest intact, is it still duplicate? IMHO, no. The "duplicate content penalty" is IMHO simply a basic optimization to weed out true duplicates, just as anyone would avoid storing the same employee record twice in an employee database.

Otherwise, IMHO, we wouldn't see Google now indexing 8 billion+ pages.

just my 2 cents
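To make the "employee record" analogy concrete, here is a hypothetical Python sketch of exact-duplicate detection by hashing a page's full source. It illustrates that view only; it is not a description of what any search engine actually does, and it shows that changing just the title tag is enough to break an exact match. The URLs and page contents are made up.

```python
import hashlib


def fingerprint(source: str) -> str:
    """SHA-256 of the full page source: changing any character changes the hash."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages):
    """Group URLs whose source is identical, much like deduplicating database records."""
    groups = {}
    for url, source in pages.items():
        groups.setdefault(fingerprint(source), []).append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: A and B are exact copies; C differs only in its title tag.
body = "<body><p>Acme names a new CEO.</p></body>"
pages = {
    "http://site-a.example/release": "<title>Acme news</title>" + body,
    "http://site-b.example/release": "<title>Acme news</title>" + body,
    "http://site-c.example/release": "<title>Acme press release</title>" + body,
}

print(find_exact_duplicates(pages))
# [['http://site-a.example/release', 'http://site-b.example/release']]
```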

Jeff Martin
11-17-2004, 01:31 AM
People need to get the whole "I'm penalized" thing out of their heads!

With all due respect, Jill, that’s your opinion. Several colleagues and I have seen examples of this happen to sites under our control, in both controlled and uncontrolled environments. I base my opinion on testing and factual data.

Calm down Jill. I’m here to pass on what I have learned and share information. I have no agenda.


I can't understand how people constantly speak about a penalty for duplicate content without clearly explaining what duplicate content is.

Tom didn’t need an explanation of what happened, he already knew. I put a label on it based upon observations made by several colleagues and myself over a period of time. Now he can test accordingly on his own to see what works best for him, which is sound advice before taking any opinions on message boards and putting them into action.

Peace! (And I do mean Peace)

mcanerin
11-17-2004, 01:53 AM
IMO, there IS a duplication penalty AND a duplication filter - and usually only one will be in effect.

The duplication filter is by far the most common effect. If a particular article is the best possible result, it would simply not make sense to fill the SERPs with the same article from many different sites, so typically the one with the highest PR gets shown and the others are dumped, though timestamps apparently are looked at as well.
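A rough Python sketch of the filter described above: among results carrying the same article, keep only the one with the highest PageRank. The `content_id` field (standing in for however the pages were recognised as the same article), the PageRank values and the use of an earlier crawl timestamp as a tie-breaker are all hypothetical; the post only says timestamps "apparently" play a role.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    content_id: str  # hypothetical: which article this page carries
    pagerank: int
    first_seen: int  # hypothetical crawl timestamp; smaller = seen earlier


def duplicate_filter(results):
    """Keep one result per article (highest PageRank, earlier timestamp on a tie),
    preserving the original ranking order of the survivors."""
    best = {}
    for r in results:
        cur = best.get(r.content_id)
        if cur is None or (r.pagerank, -r.first_seen) > (cur.pagerank, -cur.first_seen):
            best[r.content_id] = r
    keep = {id(r) for r in best.values()}
    return [r for r in results if id(r) in keep]


results = [
    Result("http://syndicator.example/story", "story-1", pagerank=4, first_seen=200),
    Result("http://original.example/story", "story-1", pagerank=6, first_seen=100),
    Result("http://other.example/page", "page-2", pagerank=3, first_seen=150),
]
print([r.url for r in duplicate_filter(results)])
# ['http://original.example/story', 'http://other.example/page']
```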

But having your article, or parts of it, (or news story, song lyrics, or historical quote) reprinted does not make you or the publisher a spammer - if anything, it can be a sign of accurate research and proper citation. This is not an occasion for a "penalty", and IMO and experience Google and the other majors don't look at it this way, either.

The duplication penalty is a whole other ball of wax. It's almost never applied and when it is it's typically applied with a fair amount of care, and usually towards multiple sites with duplicate content. You can find yourself banned or removed over this.

It also usually takes a while for it to happen - many people get away with it for several months, and even longer. I suspect this is because it is manually checked, but I don't know for sure.

In my experience, it's been pretty accurate - every client who has come to me after being banned really did have multiple sites, and other clients were never banned even though it took several months for Google to figure out that they had 2 domains parked on the same website. Yahoo still hasn't figured it out, in some cases.

And duplication is not detected at the source code level (too easy to change) - it's detected based on text snippets and other content based measurement techniques.
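Since the post says detection works on text rather than markup, here is a sketch using word shingling with Jaccard resemblance, a textbook near-duplicate technique swapped in for illustration. Nothing in the thread says Google uses exactly this; the shingle size `k = 4`, the crude tag-stripping regex and the sample pages are assumptions.

```python
import re


def visible_text(html: str) -> str:
    """Crude tag stripper: replaces anything between < and > with a space."""
    return re.sub(r"<[^>]+>", " ", html)


def shingles(text: str, k: int = 4) -> set:
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two shingle sets; 1.0 means identical shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical pages: same sentence in different markup, then a one-word change.
page_a = "<div class='story'><p>Acme Corp today announced a new line of widgets for home use.</p></div>"
page_b = "<table><tr><td>Acme Corp today announced a new line of widgets for home use.</td></tr></table>"
page_c = "<p>Acme Corp today announced a new line of gadgets for home use.</p>"

print(resemblance(visible_text(page_a), visible_text(page_b)))  # 1.0
print(resemblance(visible_text(page_a), visible_text(page_c)))  # 5/13, about 0.38
```

Completely different markup leaves the score untouched, while a single changed word alters every shingle that contains it, which is why this kind of measure is usually compared against a threshold rather than tested for equality.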

For example, Google's duplication filter uses the search criteria to detect duplicates, so it's possible for two sites with nearly identical content except for, say, a city name, to show up like this:

EXAMPLE

Page A and Page B - duplicate real estate content except for city name. Page A has higher PR.

Results (Google only):

Search for "real estate" will show only Page A
- content related to and surrounding "real estate" is basically identical > filter applied > Page A has higher PR so Page B disappears.

Search for "city A real estate" will show only Page A
- City B is not on the page and therefore is not compared for filtering

Search for "city B real estate" will show only Page B
- City A is not on the page and therefore is not compared for filtering
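The city-name example can be simulated in a few lines of Python. This is a toy model under stated assumptions: a page is a candidate only if it contains every query term, its "snippet" is the set of sentences containing a query term, and results with identical snippets collapse to the higher-PageRank page. The cities (Austin standing in for City A, Denver for City B), the page text and the PageRank values are hypothetical.

```python
import re

# Hypothetical pages: identical except for the city sentence. Austin stands in
# for "City A", Denver for "City B"; Page A gets the higher (made-up) PageRank.
PAGES = {
    "Page A": ("Homes in Austin. Find real estate listings, agents and open houses.", 5),
    "Page B": ("Homes in Denver. Find real estate listings, agents and open houses.", 3),
}


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def snippet(text, terms):
    """Toy snippet: the sentences that contain at least one query term."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return " ".join(s for s in sentences if words(s) & terms)


def search(query):
    terms = words(query)
    best = {}  # snippet -> (page name, pagerank)
    for name, (text, pr) in PAGES.items():
        if not terms <= words(text):
            continue  # page lacks a query term, so it is never compared
        snip = snippet(text, terms)
        if snip not in best or pr > best[snip][1]:
            best[snip] = (name, pr)  # identical snippets collapse to the higher PR
    return sorted(name for name, _ in best.values())


for q in ["real estate", "Austin real estate", "Denver real estate"]:
    print(q, "->", search(q))
# real estate -> ['Page A']
# Austin real estate -> ['Page A']
# Denver real estate -> ['Page B']
```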

As a practical matter and rule of thumb, I've noticed that a duplicate page (or two) typically results in a filter, and a duplicate site (or substantial part of it) typically results in a penalty, with a heavy weighting towards the home page content.
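The rule of thumb above (a duplicate page gets filtered, a duplicated site gets penalized, with the home page weighted heavily) could be sketched like this. The 50% threshold and the 5x home-page weight are invented for illustration; the post gives no numbers.

```python
# Invented numbers: the post gives only a qualitative rule of thumb.
SITE_THRESHOLD = 0.5  # weighted share of duplicated pages that reads as a copied site
HOME_WEIGHT = 5.0     # the home page counts five times as much as any other page


def classify(pages):
    """pages: list of (path, is_duplicate). Returns 'none', 'filter' or 'penalty'."""
    total = duplicated = 0.0
    for path, is_dup in pages:
        weight = HOME_WEIGHT if path == "/" else 1.0
        total += weight
        if is_dup:
            duplicated += weight
    if duplicated == 0:
        return "none"
    return "penalty" if duplicated / total >= SITE_THRESHOLD else "filter"


# One copied article on a 20-page site versus a site whose home page and
# three other pages (out of 10) are copies.
one_page = [("/", False)] + [(f"/p{i}", i == 0) for i in range(19)]
copied_site = [("/", True)] + [(f"/p{i}", i < 3) for i in range(9)]
print(classify(one_page))     # filter  (1 / 24 weighted)
print(classify(copied_site))  # penalty (8 / 14 weighted)
```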

Ian

Robert_Charlton
11-17-2004, 03:20 AM
I agree that there is a duplicate content filter. I've seen syndicated versions of an article drop out of the index until only one was left... generally the one with the highest PageRank... even though all versions stayed online.

But... in one area I monitor, an authoritative article comes up in Google's top ten three times for a one word search (roughly 900K results), on three different sites. The sites are in no way mirrors... they've all just published the same article.

Titles are identical on all three. The article if printed runs roughly a dozen pages, single spaced, and there's not much of a page template to speak of on any of the sites... they're all article text... so, in terms of text percentage, the pages are very similar. Two are almost identical, even in their formatting.

My guess is that these pages survive because they all have a lot of inbounds. Each page has received link validation from other good sources, and Google has kept them all.

I'm surmising that there's probably some sort of threshold, and that when a page has enough good inbounds and reaches that threshold, Google won't drop it as a dupe.
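The threshold hypothesis above, sketched in Python: keep the top-PageRank copy of a duplicated article as usual, but also keep any copy whose inbound link count reaches some threshold. The threshold value and the link counts are hypothetical; this models the poster's guess, not a known Google rule.

```python
def filter_with_link_threshold(copies, min_inbound):
    """copies: (url, pagerank, inbound_links) tuples for one article, in rank order.
    Keep the highest-PageRank copy, plus any copy with at least `min_inbound` links."""
    if not copies:
        return []
    top = max(copies, key=lambda c: c[1])
    return [c for c in copies if c is top or c[2] >= min_inbound]


# Hypothetical copies of one article: three well-linked publishers, one thin syndicator.
copies = [
    ("http://site1.example/article", 6, 120),
    ("http://site2.example/article", 5, 90),
    ("http://site3.example/article", 5, 75),
    ("http://syndicator.example/article", 3, 4),
]
print([url for url, _, _ in filter_with_link_threshold(copies, min_inbound=50)])
# ['http://site1.example/article', 'http://site2.example/article', 'http://site3.example/article']
```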

Back to the question about making the page disappear, you might try to get 10 different pages to rank above it, assuming we're talking about a specific, non-competitive term, like a person's or company's name.

randfish
11-17-2004, 03:23 AM
I have to agree - I have fallen victim to inadvertent content duplication that has harshly penalized my site. The results are so obvious that as soon as I change my page or delete the offending copy, my site rockets 10-20 positions up at Google.

My experience was at a directory that copied my homepage source code and re-published it on their site as a weird kind of re-direct so they don't link directly to my site... It was very odd, but as soon as I caught it, I changed my homepage significantly and have climbed back up the rankings from around #40 to #20 almost as quickly as Google re-spidered my new page.

Duplicate content may not affect all sites, but it can, and we should be warning people about it.

Mel
11-17-2004, 05:46 AM
IMO there is a duplicate content filter that Google runs at ranking time, which only serves to remove the lower-ranked of any results whose SERP snippets duplicate one another. This filter does not run over the entire page; it is based on the snippets Google extracts from the page for the SERPs, plus possibly the page title. This would seem to be an implementation of Google's Detecting Query-Specific Duplicate Documents patent (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=6615209.WKU.&OS=PN/6615209&RS=PN/6615209)

I have also seen Google's duplicate content penalty drop whole sites from the index, but this seems to be the result of a separate program that is run periodically, probably across the entire index on Google's servers.

Jill, if you want more details on how duplicate or partially duplicate content is determined, you might want to look at Google's Detecting duplicate and near duplicate files patent (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=6,658,423.WKU.&OS=PN/6,658,423&RS=PN/6,658,423), which goes into the methodology in too much detail to post here.