Old 03-01-2006   #1
CaliforniaGirl
Member
 
Join Date: Sep 2005
Location: Sydney, Australia
Posts: 42
CaliforniaGirl is on a distinguished road
Verify my worry...

I have a client who has (had) a smallish site. To increase their content, they created thousands of city-specific pages. However, the content of all the pages is the same except the city names have been changed. This is true of the titles as well.

They have no meta description or keywords tags in use on these pages, so we are really just looking at the title and the body content (approx. 50 words).

Google has picked up all the pages and the saturation has gone from 200 to 10,300. The pages are cached. Yahoo and MSN on the other hand, have yet to pick up these pages.

I have a few concerns:

1. Will this be considered duplicate content in any way? It seems a possibility.
2. Might these pages be pushed into the supp index?
3. Is everything great and should I not be concerned?

Thanks for any advice,
CaliGirl
CaliforniaGirl is offline   Reply With Quote
Old 03-01-2006   #2
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
glengara has much to be proud of
I'd share your concerns, not just over the dupe content issue, but also from the artificial site inflation POV.
glengara is offline   Reply With Quote
Old 03-02-2006   #3
Black_Knight
Ancient SEO
 
Join Date: Jun 2004
Location: UK
Posts: 152
Black_Knight is just really nice
Hi CaliGirl, I think you are absolutely right to be very concerned about this. What you have described can only honestly be interpreted as search spam.
Quote:
Originally Posted by CaliforniaGirl
I have a client who has (had) a smallish site. To increase their content, they created thousands of city-specific pages. However, the content of all the pages is the same except the city names have been changed. This is true of the titles as well.

They have no meta description or keywords tags in use on these pages, so we are really just looking at the title and the body content (approx. 50 words).
Let's look at the important characteristics in this explanation that point to it being spam, and very likely to cause trouble.
1. Quality is zero. "the content of all the pages is the same except the city names have been changed".

They didn't have to create a page for each city for this. They simply needed to add a list of locations to the one page that carries the 50 words of content.

They chose to create more pages purely in the hope that this would have a benefit in search, without a thought for the costs in spidering. At the grand scale this really does have a huge impact on Google, and it shows very short-sighted thinking.
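
To make concrete just how identical such pages look to a duplicate filter, here is a minimal sketch (the template text is hypothetical, not the client's actual pages):

[code]
from difflib import SequenceMatcher

# Hypothetical template in the style of the client's city pages:
# roughly 50 words in which only the city name varies.
TEMPLATE = ("Looking for widgets in {city}? Our {city} widget service offers "
            "fast delivery, great prices and friendly local support across "
            "{city} and the surrounding area. Call us today for a free quote.")

sydney = TEMPLATE.format(city="Sydney")
perth = TEMPLATE.format(city="Perth")

# A duplicate-content filter needs only a rough similarity score to see
# that these are, in effect, the same document.
similarity = SequenceMatcher(None, sydney, perth).ratio()
print(f"Similarity: {similarity:.1%}")  # above 90% for these two pages
[/code]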

It has to be punished (eventually), because there are limits to spidering.

Divide 8 billion pages by the number of seconds in a month and calculate how many pages per second Googlebot must grab just to refresh every page in the index just once in a month. Now factor in the sites that need to be spidered far more often. Let's say we double the number of pages needing to be grabbed once per month to factor in the news sites that need to be spidered at least once each hour, every hour, every day of that month.

The answer is significantly above 6,000 pages per second.

Now bear in mind that calling more than 1 page a second from the same server is going to create problems where the server is already running anywhere close to capacity. Consider slower, processing-intensive dynamic sites, or sites where transmitting some documents actually takes longer than 1 second. To allow for that, how about we limit the spider to making only one request every 3 seconds from any one server?

Is the monumental headache of spidering starting to make sense? It is already a logistical nightmare. Now imagine that just 1% of those websites did what your client has foolishly decided to do and artificially inflated their spidering demand from 200 pages per month to over 10,200 pages per month (over 51 times as much spidering for no real new value): you've just added 50% to the total spidering cost of the entire web, for no extra value in real terms. No new info exists, just the same content duplicated unnecessarily.
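
To put rough numbers on all of that (a sketch using the round figures above; real crawl budgets are unknown):

[code]
# Back-of-envelope crawl arithmetic, using the round numbers above.
INDEX_SIZE = 8_000_000_000              # pages to refresh once per month
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # 2,592,000 seconds

base_rate = INDEX_SIZE / SECONDS_PER_MONTH
print(f"Refresh-once-a-month rate: {base_rate:,.0f} pages/sec")      # ~3,086

# Double it to allow for news sites re-spidered hourly.
print(f"With frequent re-crawls:   {2 * base_rate:,.0f} pages/sec")  # ~6,173

# The client's site: 200 real pages inflated to 10,300 near-duplicates.
inflation = 10_300 / 200                 # ~51.5x the spidering demand

# If 1% of all sites inflated themselves the same way, the total crawl
# cost of the web becomes 0.99 + 0.01 * 51.5 = ~1.5x, i.e. about +50%.
total_cost = 0.99 + 0.01 * inflation
print(f"Web-wide spidering cost: {total_cost:.2f}x the original")
[/code]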

If you were Google, what would you do about it?

Quote:
Originally Posted by CaliforniaGirl
the saturation has gone from 200 to 10,300.
More than 51 times the demand for spidering, yet not a single bit of extra value in doing so. If I were creating an algorithm to prioritise spidering to ensure that it was done efficiently, then this would have decreased the priority of this site by the same amount. The site's spidering value is now just 2% of what it was before.
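
As a quick check on that 2% figure (same hypothetical scheduler assumption, where a site's crawl value is spread across its URLs):

[code]
# 200 pages' worth of value now spread across 10,300 URLs.
per_page_value = 200 / 10_300
print(f"Each crawl now yields ~{per_page_value:.1%} of its former value")  # ~1.9%
[/code]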

Quote:
Originally Posted by CaliforniaGirl
I have a few concerns:

1. Will this be considered duplicate content in any way? It seems a possibility.
2. Might these pages be pushed into the supp index?
3. Is everything great and should I not be concerned?
1. Yes, or it might even be considered deliberate spam, causing a complete ban.
2. Yes, or it may even reduce the priority of spidering this site below the point where it is worth crawling at all.
3. No. Be very, very concerned.

Last edited by Black_Knight : 03-02-2006 at 04:37 AM.
Black_Knight is offline   Reply With Quote
Old 03-02-2006   #4
tant
Member
 
Join Date: Oct 2005
Posts: 9
tant is on a distinguished road
Nice post Black_Knight.

I never looked at it like that before; I just saw Google as a bottomless pit, with the possible exception of the docID issue.

Parts of the jigsaw certainly start falling into place regarding depth of crawling and the supplemental index, from that POV.

Thanks.
tant is offline   Reply With Quote
Old 03-02-2006   #5
CaliforniaGirl
Member
 
Join Date: Sep 2005
Location: Sydney, Australia
Posts: 42
CaliforniaGirl is on a distinguished road
Yes, thanks Black_Knight

That's a great response and it has solidified my findings.

Thanks again,
CaliGirl
CaliforniaGirl is offline   Reply With Quote
Old 03-03-2006   #6
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
glengara has much to be proud of
Interesting (and unusual) POV, BK. I wonder if there's a connection with all the supplementals some large sites are having...
glengara is offline   Reply With Quote
Old 03-03-2006   #7
Galway
Say it how it is
 
Join Date: Feb 2006
Location: Colchester - Essex - UK
Posts: 6
Galway is an unknown quantity at this point
The answer is simple: ALL the pages must go immediately. The real issue is not what to do about the pages that are there, it's what to do when they are gone.

It's long been known that websites spam the index in this way; the sort of technique used by Cali has been used since 2000 and was seen as spam then.

For what it's worth, the damage will have been done already; the question Cali needs to ask is how to undo the damage that has been done. In my opinion the whole website's integrity has been wrecked: the spam has been implemented bang in the middle of Google's most vociferous war on duplication.

It's about as big an own goal as you could score. The problem now is that it may take some time for the real extent of the damage to be seen. Websites don't always just plummet; I have watched many a gradual but terminal decline where this has been done.

The way forward is the immediate removal of the pages, but then the issue moves on to what you implement for the 10,000+ error pages. I think this is the question Cali needs answered most. From then on it is a question of giving Google some good reasons to keep your status in their index.
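
One conventional answer for those removed URLs is to serve "410 Gone", so spiders learn the removal was deliberate instead of hammering thousands of soft errors. A minimal sketch follows (a stand-in Python server, not the site's actual stack; the /city-pages/ pattern is hypothetical, since the thread never names the real paths):

[code]
from http.server import BaseHTTPRequestHandler, HTTPServer

class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical URL pattern for the deleted duplicate pages.
        if self.path.startswith("/city-pages/"):
            self.send_response(410)   # Gone: removed on purpose, stop asking
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Normal page")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), GoneHandler).serve_forever()
[/code]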
Galway is offline   Reply With Quote
Old 03-03-2006   #8
Black_Knight
Ancient SEO
 
Join Date: Jun 2004
Location: UK
Posts: 152
Black_Knight is just really nice
Quote:
Originally Posted by Galway
It's long been known that websites spam the index in this way; the sort of technique used by Cali has been used since 2000 and was seen as spam then.
I think you have misunderstood. CaliGirl is the one who is concerned about what a client has done, not the one who did, or advocated, this technique.


CaliGirl and tant, happy to have been able to help.


Glengara, I guess it may be an unusual POV - but it really shouldn't be. The SEO is at heart the mediator in finding an arrangement that satisfies the three parties involved in each search: the site that wants to rank for a certain query; the searcher who makes that query; and the search engine that wants to give the searcher a good enough experience to make them a loyal customer.

Every SEO should have an understanding of the issues that a search engine must wrestle with, the compromises and solutions they create for those issues, and how that affects future developments and the future SERPs you hope to position sites within. If something is an issue for the engines, it is something they must adapt to beat, or they stagnate. Either way, it is absolutely certain to affect the future of any SEO campaigns for that engine.
Black_Knight is offline   Reply With Quote
Old 03-03-2006   #9
AussieWebmaster
Forums Editor, SearchEngineWatch
 
AussieWebmaster's Avatar
 
Join Date: Jun 2004
Location: NYC
Posts: 8,154
AussieWebmaster has a brilliant future
Good to see you back, BK...
AussieWebmaster is offline   Reply With Quote
Old 03-03-2006   #10
Black_Knight
Ancient SEO
 
Join Date: Jun 2004
Location: UK
Posts: 152
Black_Knight is just really nice
Thanks Aussie. Always nice to be able to help folks with their worries.
Black_Knight is offline   Reply With Quote
Old 03-04-2006   #11
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
glengara has much to be proud of
Have to admit spidering isn't something I give much thought to; the last thing I can remember on it was GG suggesting we make use of the "last modified" thing... ;-)

So this present supplemental/Big Daddy thing apart, are you expecting to see an increased culling of pages into the supplemental index?
glengara is offline   Reply With Quote
Old 03-04-2006   #12
mick g
Member
 
Join Date: Sep 2004
Posts: 126
mick g is a jewel in the rough
Do you mean the "304 Not Modified" response, glengara?
mick g is offline   Reply With Quote
Old 03-04-2006   #13
Black_Knight
Ancient SEO
 
Join Date: Jun 2004
Location: UK
Posts: 152
Black_Knight is just really nice
Quote:
Originally Posted by glengara
So this present supplemental/Big Daddy thing apart, are you expecting to see an increased culling of pages into the supplemental index?
Yes and no. I think this may be the real reason for the Google SiteMaps initiative though - to reduce the need for spidering to find documents.

Just as an idea, how about if you made a way for webmasters to submit a whole batch of pages, and then just used random sampling rather than full spidering for quality control...

Sitemaps just might be one way of coping with the fact that spidering is a serious logistical problem, and of reducing the overall cost of scaling it up much further. Purest speculation, of course.
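
As a toy illustration of that sampling idea (purely hypothetical, in keeping with the speculation above; the URL list and the fetch_and_score helper are made up):

[code]
import random

# 10,300 submitted URLs, as in CaliGirl's example.
submitted = [f"http://example.com/widgets-{i}.html" for i in range(10_300)]

# Fetch only a small random sample instead of spidering the whole batch.
SAMPLE_SIZE = 50
sample = random.sample(submitted, SAMPLE_SIZE)

def fetch_and_score(url: str) -> float:
    """Hypothetical quality check: 0.0 for boilerplate duplicates,
    1.0 for genuinely unique content."""
    return 0.0  # every sampled city page is near-identical boilerplate

avg_quality = sum(fetch_and_score(u) for u in sample) / SAMPLE_SIZE
if avg_quality < 0.2:
    print("Deprioritise the whole batch: no need to spider all 10,300 pages")
[/code]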
Black_Knight is offline   Reply With Quote
Old 03-04-2006   #14
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
glengara has much to be proud of
At the time I just checked with my host, MickG. Here's the original thread:

http://www.webmasterworld.com/forum3/6005.htm
glengara is offline   Reply With Quote
Old 03-04-2006   #15
mick g
Member
 
Join Date: Sep 2004
Posts: 126
mick g is a jewel in the rough
Yeah, I meant to say "If-Modified-Since" instead of "not modified".

Do you feel that by implementing this, Googlebot has spidered deeper into your site and still visits as often? A 304 response means a not-modified page, whereas a 200 OK response delivers a fresh page. Seeing as Googlebot likes sites with fresh content, would a page be requested more often if it's served as a 200, but less often if it's a 304 Not Modified?

Just a thought?
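
For reference, this is what such a conditional request looks like (a minimal sketch; the URL and date are placeholders):

[code]
import urllib.request

req = urllib.request.Request(
    "http://example.com/page.html",   # placeholder URL
    headers={"If-Modified-Since": "Sat, 04 Mar 2006 00:00:00 GMT"},
)
try:
    with urllib.request.urlopen(req) as resp:
        # 200 OK: the page changed, so the full body was re-downloaded.
        print(resp.status, "fresh copy fetched")
except urllib.error.HTTPError as err:
    if err.code == 304:
        # 304 Not Modified: no body transferred, crawl bandwidth saved.
        print("304: unchanged since the given date")
    else:
        raise
[/code]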
mick g is offline   Reply With Quote
Old 03-04-2006   #16
Black_Knight
Ancient SEO
 
Join Date: Jun 2004
Location: UK
Posts: 152
Black_Knight is just really nice
I believe Google applies the 'fresh' bonus to new URLs because that mechanism was built for a simple reason: new pages won't have much PR yet, meaning older documents would otherwise outrank newer (updated) versions.

Google's liking for fresh content other than in new URLs seems to be connected to link spikes etc.

Nope. The last real benefits I fathomed for the 'modified' headers were both about reporting a page to be unmodified: either to reduce data transfer from wasted extra spidering or, for the blackhats, a possible way to make bait-and-switch pages last a bit longer.
Black_Knight is offline   Reply With Quote
Old 03-05-2006   #17
SEOBrains
SEOBrains aka Bob Rains
 
Join Date: Nov 2005
Location: Boston
Posts: 14
SEOBrains is on a distinguished road
SEO no-no...

Before I had a better understanding of SEO, an SEO firm I hired did the same thing to my product pages.

Sure, we improved rankings, but we lost conversions. There is so much more to providing a quality user experience which, when focused on, can help your rankings as well as your conversions.

I've since spent budget and energy writing quality content and optimizing the pages for conversions, not rankings, but guess what...

My rank is up higher from the new content and recurring updates than it was from the thousands of pages of duplicate spam content.
SEOBrains is offline   Reply With Quote
Old 03-05-2006   #18
CaliforniaGirl
Member
 
Join Date: Sep 2005
Location: Sydney, Australia
Posts: 42
CaliforniaGirl is on a distinguished road
Update

Thanks, Black_Knight, for clarifying... You are correct: it is a client, not me, doing this.

I thought I'd give an update on the situation. I have advised that this could be considered spam; that it is duplicated or near-duplicated content; that it is artificial site inflation; and that they have also now artificially inflated their link popularity due to the internal linking on the pages.

The response: they understand there is a risk (penalties, bans, filters). However, another site within the same company has done the same thing and so far so good, so maybe it will be alright. They are worried, but they are going to leave the pages up for now and keep an eye on it.

I will certainly post the results of their "experiment" in the coming days.

CaliGirl
CaliforniaGirl is offline   Reply With Quote
Old 03-05-2006   #19
mcanerin
 
mcanerin's Avatar
 
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
mcanerin has a reputation beyond repute
Quote:
However, another site within the same company has done the same thing and so far so good, so maybe it will be alright.
<sigh> I know this isn't your fault - short-sighted clients are the bane of good SEO.

Joke: A man jumps off a building. While plummeting to the ground, he passes a 2nd-floor window and is heard to exclaim, "So far, so good!"

The problem with monkey-see-monkey-do SEO is that many times a site is ranking well IN SPITE of some of the things they are doing, not because of them.

If your clients need an SEO like yourself, then I would respectfully suggest that they simply don't know enough about SEO to make that decision themselves, particularly when it means overriding the SEO they hired.

Since you can't save them from themselves, it's time to let Darwin do the rest of the job and for you to begin damage control.

GET THAT DECISION IN WRITING!!!

Make absolutely certain that you have, in writing, instructions from them that this is against your advice and that they personally accept the risk.

I'll tell you right now from experience that when things go wrong (and they will go wrong), suddenly all that "we understand and accept the risks" talk will be forgotten and the fingers will be pointing towards YOU. After all, it's your job, not theirs, and they were just joking anyway, and you probably misunderstood them, etc, etc.

Cover your butt. Trust me on this. If they are that confident, then they will have no problem taking all the credit for this brilliant and effective maneuver.

If they won't put it in writing (which is my guess - want to take a bet on it?), consider working somewhere else while you are still on good terms and before it hits the fan. Get your reference in writing too.

My personal opinion,

Ian
__________________
International SEO
mcanerin is offline   Reply With Quote
Old 03-05-2006   #20
CaliforniaGirl
Member
 
Join Date: Sep 2005
Location: Sydney, Australia
Posts: 42
CaliforniaGirl is on a distinguished road
Advice welcome

Thanks Ian, your advice is always good.

May I ask: if I go to them and convince them to take the pages down, what will the repercussions be in the engines, aside from the obvious like saturation and link popularity going down?

They have already stated that it is too much work to customise the content, title and description for all of the pages, so I know they won't do that.

CaliGirl

I am also putting an agreement together in case they do leave the pages up in the end.
CaliforniaGirl is offline   Reply With Quote