Old 07-17-2006   #1
Chris Boggs
 
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
Could Google Sitemaps Help with Load Balancing Issues?

As a follow-up to this interesting thread discussing the use of robots.txt files to deal with load balancing issues, I have a specific problem I wish to pose to the esteemed SEW community:

We are working with a fairly well-trafficked site that uses a load balancer to send traffic to ww1 and ww2 versions. The main site is actually a collection of organic landing points for people who search for the brand (and, hopefully, for non-branded terms as well). The goal is then to prompt for a zip code, at which point visitors never again see the main version unless they delete their cookies. The problem we are having is with duplicate content that has been indexed: both ww1 and ww2 pages are in the index, although mysteriously ww2 has more pages indexed (almost all of them).

We want to clearly show Google and the other search engines that we have only one version of the site, even though it is load balanced. Apart from creating a "Load Balancer Tag," which may be a good idea, what other ways are there to explain to the engines that we only want one site indexed? If we use robots.txt as described in the other thread, we fear that we will still have duplicate content in the index. If we 301 redirect all the pages from ww1 to ww2, the load will no longer be "balanced."

Is Sitemaps capable of fixing this, in your opinion? The idea is that we could set up a Sitemaps account for each version, including the "raw" www version. We would place all of the site's pages in the sitemap file for the www version and have separate sitemap files for the ww1 and ww2 versions. Could the system tell Google to "stay out/don't index" those? One fear we have is that if we then achieved rankings for the raw version, the indexed listing would still lead to a redirect, which could then be flagged as possible cloaking or other shadiness.
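
To make that concrete, here is a rough sketch of what we mean by one sitemap file per hostname. It is Python for illustration only; the paths and hostnames are made up, and note that a sitemap can only tell an engine which URLs exist, it has no obvious "stay out" switch:

Code:
# Sketch of the "one sitemap per hostname" idea. Paths and hostnames
# are hypothetical; each hostname would submit its own file.
PATHS = ["/", "/find-location/", "/contact/"]  # hypothetical pages

def sitemap_for(host: str) -> str:
    entries = "\n".join(
        f"  <url><loc>http://{host}{path}</loc></url>" for path in PATHS
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # urlset namespace as in the current protocol; adjust to the
        # version actually in use.
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries + "\n</urlset>"
    )

print(sitemap_for("www.example.com"))  # full list for the www version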

Help anyone?
Old 07-17-2006   #2
JohnW
 
Join Date: Jun 2004
Location: Virginia Beach, VA.
Posts: 976
Chris, I am not sure that Sitemaps would handle this. But in a properly configured load-balancing setup, these URLs should not be a problem anyway. If you are using DNS, shouldn't the URLs on all the backend servers be exactly the same? And if the load balancer uses a reverse proxy instead of DNS, you would just rewrite the URLs. Is there something else missing here?
Old 07-18-2006   #3
Chris Boggs
 
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
Hi John, here is a reply from a busy engineer on our end...

Quote:
John W is correct in his assessment that there should never have been two separate URLs in the first place. The client is handling the load balancing in a very amateur fashion. Helping them set up their environment to do as he says would ultimately be the best solution in the long run. However, as the Project Manager has mentioned before, this would entail a totally different kind of consulting that is out of scope (investigating DNS settings, proxy issues, etc.).

The main issue, in my opinion, that John's response doesn't address is that there is more than one version of many of their pages in the Google and Yahoo indexes. So we have to do something to signal the engines to drop one version of each page; if the client can implement a 301 from one version to the other, I think this would be a good start. What I noticed when examining page saturation for both hostnames (ww1 and ww2) was that Google had not found both versions of every page; where only one version had been found, the page in question retained its baseline ranking, versus those pages for which Google had found both versions.

I double-checked Google Sitemaps yesterday and couldn't find a way to use it to set a "noindex, nofollow" on a domain, though if there is a way to do this it would be very useful for us.

Let me know what you think.
Old 07-18-2006   #4
evilgreenmonkey
 
Join Date: Feb 2006
Location: London, UK
Posts: 703
A major airline uses this kind of "load balancing," with catastrophic effects on their (non-existent) natural search. You may find that the site can use basic DNS balancing such as round-robin for distributing load, or IP failover if the main issue is uptime.

I use this company, which can do both:
http://www.zoneedit.com



Rob
Old 07-18-2006   #5
JohnW
 
Join Date: Jun 2004
Location: Virginia Beach, VA.
Posts: 976
Chris, your engineer seems to get it. But have you considered that neither ww1 nor ww2 is optimal? I would probably recommend using 301s to redirect everything currently in use over to www as part of fixing this.
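
The 301 piece could look something like this. A minimal WSGI-style sketch in Python; the hostnames are hypothetical and the client's actual stack may differ:

Code:
# Any request arriving on ww1/ww2 is permanently redirected to the
# same path on www, consolidating indexed pages and link value on one
# hostname. Hostnames are hypothetical.
def force_www(app):
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0]
        if host in ("ww1.example.com", "ww2.example.com"):
            location = "http://www.example.com" + environ.get("PATH_INFO", "/")
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)  # www traffic passes through
    return middleware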

>use basic DNS balancing

We have seen problems with DNS, primarily around the way this information is cached across the web, and with TTL issues. I would recommend a reverse proxy with URL rewriting as the better solution.
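
As a bare-bones sketch of that reverse-proxy idea, using only the Python standard library (the backend addresses are hypothetical, and a production setup would use a dedicated proxy or an appliance):

Code:
# One public hostname spreads GET requests across internal backends;
# the crawler only ever sees one set of URLs. Addresses hypothetical.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BACKENDS = itertools.cycle(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])

class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Rotate across the farm and forward the original path.
        with urllib.request.urlopen(next(BACKENDS) + self.path) as upstream:
            body = upstream.read()
            self.send_response(upstream.status)
            for name, value in upstream.getheaders():
                if name.lower() not in ("connection", "transfer-encoding"):
                    self.send_header(name, value)
            self.end_headers()
            self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8000), Proxy).serve_forever()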
Old 07-19-2006   #6
g1smd
Member
 
Join Date: Jun 2006
Location: UK
Posts: 253
Is it a scripted site?

Modify the script so that it checks the requested hostname.

If the host is www, add nothing.

If the host is ww1 or ww2, add the <meta name="robots" content="noindex"> tag to the page.

Eventually you will only have one set of URLs indexed.
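
For a scripted site, the check could be as small as this. A rough sketch in Python; the hostnames are hypothetical, and the real site would wire this into whatever templating it already uses:

Code:
# Sketch of the conditional noindex tag: check which hostname the
# request arrived on and emit the robots meta tag only for the
# load-balanced copies. Hostnames are hypothetical.
NOINDEX_HOSTS = ("ww1.example.com", "ww2.example.com")

def robots_meta(request_host: str) -> str:
    host = request_host.split(":")[0].lower()  # drop any :port suffix
    if host in NOINDEX_HOSTS:
        return '<meta name="robots" content="noindex">'
    return ""  # www pages carry no robots tag and stay indexed

Drop the returned string into each page's <head>, and over time the engines should drop the ww1/ww2 copies while keeping www.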
Old 07-28-2006   #7
Chris Boggs
 
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
Here is our programmer's reply... as he mentioned, we will follow up and share the results of this test if it is implemented on the client's end.

Quote:
I agree with John W: neither ww1 nor ww2 is as optimal as www. Hopefully the client can handle 301 redirecting both ww1 and ww2 to www. We will explore a reverse proxy with URL rewriting.

One concern with the suggestion from g1smd is that both the ww1 and ww2 hostnames have attracted inbound links (Google reports 386 links to ww2). So we are hoping that, by setting up 301s from ww1 and ww2 to www, any value those links have will be passed along.

When they finally implement our recommendation, we will pass along the results.

Last edited by Chris Boggs : 08-01-2006 at 11:22 AM. Reason: typo
Old 08-01-2006   #8
Opie
Ethan Giffin
 
Join Date: Aug 2004
Location: Baltimore, Maryland
Posts: 18
This drives me crazy any time I see it, especially from major retailers who should know better (better yet, they should hire me).

We use a "big ip" load balancer in front of our server farm it it works wonders. We have 1 external ip address per domain (or sub domain) that map back to multiple internal IP's. We then use host header mapping at the server level to tie is all together.

This gives the appearance of one website per domain, even though multiple servers are serving the content. No ww1, ww2, or ww6.

Depending on the budget (and the scale of the site), there are several ways to approach this: some are hardware/appliance-based, like ours, and some are software-based, using a reverse proxy with mod_rewrite.
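
As a toy illustration of that mapping (every name and address here is hypothetical):

Code:
# One public IP per domain fans out to several internal servers; the
# Host header selects the same site on whichever box answers.
PUBLIC_TO_INTERNAL = {
    "203.0.113.10": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],  # www.example.com
}
HOST_TO_SITE = {
    "www.example.com": "/var/www/main",  # identical on every internal box
}

def pick_backend(public_ip: str, counter: int) -> str:
    pool = PUBLIC_TO_INTERNAL[public_ip]
    return pool[counter % len(pool)]  # simple rotation across the farm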

I would also 301 any of the current ww2 URLs in Google back to www.

Any specific landing-page content that does not belong on the main site, I would push out to a subdomain if needed.

Ethan
Old 08-01-2006   #9
SEOEgghead
 
Posts: n/a
I'm with Opie

Load balancing via round-robin DNS is fraught with problems. I'd use an appliance whenever possible to obscure the fact that you're dealing with multiple servers. The other alternative is to redirect a spider from a mirror to the primary site, but that would technically be cloaking. I documented that here.
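
Roughly like this, as a sketch (the crawler list is partial and purely illustrative, and this is a mock-up rather than the code from that write-up):

Code:
# On a mirror hostname, requests from known crawler user-agents get a
# 301 to the primary host; humans stay where the balancer sent them.
# As noted above, this borders on cloaking, so treat it as a last resort.
CRAWLERS = ("googlebot", "slurp", "msnbot")  # partial list

def crawler_redirect(host: str, user_agent: str, path: str):
    ua = user_agent.lower()
    on_mirror = host.split(":")[0] in ("ww1.example.com", "ww2.example.com")
    if on_mirror and any(bot in ua for bot in CRAWLERS):
        return ("301 Moved Permanently", "http://www.example.com" + path)
    return None  # serve the page normally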

But I admit the solution is not ideal, since users will inevitably still link to the mirrors. They'd get 301'd, though.