#1
As a follow-up to this interesting thread discussing the use of robots.txt files to deal with load balancing issues, I have a specific problem I wish to pose to the esteemed SEW community:

We are working with a fairly well-trafficked site that uses a load balancer to send traffic to ww1 and ww2 versions. The main site is actually a collection of organic landing points for people who search for the brand (and, hopefully, for non-branded terms). The goal is then to prompt the visitor for a zip code, after which they will never see the main version again unless they delete their cookies.

The problem we are having is with duplicate content that has been indexed. We have both ww1 and ww2 pages indexed, although mysteriously ww2 has more pages in the index (almost all of them). We want to show Google and other search engines clearly that we have only one version of the site, and that it is load balanced.

Apart from creating a "Load Balancer Tag," which may be a good idea, what other ways can we explain to the search engines that we want only one site indexed? If we use robots.txt as described in the other thread, we fear that we will still have duplicate content within the index. If we 301 redirect all the pages from ww1 to ww2, the load will no longer be "balanced."

Is Sitemaps capable of fixing this, in your opinions? The idea is that we could set up a Sitemaps account for each version, including the "raw" www version. We would place all pages on the site within the sitemaps file for the www version, and have separate sitemaps files for the ww1 and ww2 versions. Could the system then tell Google "stay out/don't index" these? One fear we have is that if we then achieved rankings for the raw version, the indexed listing would still lead to a redirect, which could then be flagged as possible cloaking or other "shadiness."

Help, anyone?
#2
Chris, I am not sure that Sitemaps would handle this. But I think that in a properly configured load-balancing setup these URLs should not be a problem anyway. If you are using DNS, shouldn't all the backend servers' URLs be exactly the same? And if the load balancer uses a reverse proxy instead of DNS, you would just rewrite the URLs. Is there something else that is missing here?
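To make "just rewrite the URLs" concrete, here is a minimal sketch of what a reverse proxy could do to a response before passing it on: swap any backend hostname for the public one. The hostnames and the function name are assumptions for illustration only (the thread never names the real domain), and a real proxy would normally do this through its own configuration rather than hand-written code.

```python
import re

# Hypothetical hostnames; the thread does not give the real domain.
PUBLIC_HOST = "www.example.com"
BACKEND_HOSTS = ("ww1.example.com", "ww2.example.com")

# Match a backend hostname only when it stands alone as a host,
# not as the tail of a longer name such as "notww1.example.com".
_BACKEND_RE = re.compile(
    r"(?<![\w.-])(?:"
    + "|".join(re.escape(host) for host in BACKEND_HOSTS)
    + r")(?![\w-])",
    re.IGNORECASE,
)


def rewrite_backend_urls(text: str) -> str:
    """Replace backend hostnames in a header value or HTML body
    with the public hostname."""
    return _BACKEND_RE.sub(PUBLIC_HOST, text)


if __name__ == "__main__":
    print(rewrite_backend_urls('<a href="http://ww2.example.com/plans">Plans</a>'))
```

Applied to both the Location header and the page body, this would keep ww1/ww2 links from ever reaching visitors or spiders, so there is nothing for them to crawl or link to.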
#3
Hi John, here is a reply from a busy engineer on our end...
Quote:
#4
A major airline uses this kind of "load balancing," with catastrophic effects on their (nonexistent) natural search. You may find that the site can use basic DNS balancing such as round robin for distributing load, or IP failover if the main issue is uptime.
I use this company, which can do both: http://www.zoneedit.com
Rob
#5
Chris, your engineer seems to get it. But have you considered that neither ww1 nor ww2 is optimal? I would probably recommend using 301s to redirect everything currently in use over to www as part of fixing this.
>use basic DNS balancing
We have seen problems with DNS, primarily with the way this information is cached across the web and with TTL issues. I would recommend a reverse proxy with URL rewriting as the better solution.
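The 301 part of that recommendation can be sketched as a small function that decides whether a requested URL should be redirected to the www host. The hostnames and the function name are assumptions, not anything from the site in question, and the sketch only makes sense once www itself is served through the balancer or proxy, since otherwise redirecting would undo the balancing.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames; substitute the real domain.
CANONICAL_HOST = "www.example.com"
MIRROR_HOSTS = {"ww1.example.com", "ww2.example.com"}


def canonical_redirect(url: str):
    """Return (301, location) if the URL is on a mirror host, else None.

    Path, query string and fragment are preserved so every indexed
    mirror page maps to its exact www counterpart. Any port in the
    original URL is dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in MIRROR_HOSTS:
        return None
    location = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, location


if __name__ == "__main__":
    print(canonical_redirect("http://ww2.example.com/plans?zip=90210"))
```

Redirecting page to page, rather than sending everything to the home page, is what lets the engines transfer each indexed ww1/ww2 URL to its www equivalent.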
#6
Is it a scripted site?
Modify the script so that it checks the requested URL. If the host is www, add nothing. If the host is ww1 or ww2, add the <meta name="robots" content="noindex"> tag to the page. Eventually you will have only one set of URLs indexed.
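A rough sketch of that check, assuming the page script can read the Host header of the request. The hostnames and the function name are made up for illustration; only the meta tag itself comes from the post above.

```python
# Hypothetical mirror hostnames; substitute the real domain.
MIRROR_HOSTS = {"ww1.example.com", "ww2.example.com"}

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for_host(host_header: str) -> str:
    """Return the noindex tag for mirror hosts and an empty string
    for the main www host (or any other host)."""
    # Drop an optional ":port" suffix and normalise case.
    host = host_header.split(":", 1)[0].strip().lower()
    return NOINDEX_TAG if host in MIRROR_HOSTS else ""


if __name__ == "__main__":
    for host in ("www.example.com", "ww1.example.com", "ww2.example.com:80"):
        print(host, "->", robots_meta_for_host(host) or "(no tag)")
```

The returned string would be written into the page's head section. Note that the spiders must still be allowed to crawl ww1 and ww2 for this to work: a page blocked in robots.txt is never fetched, so the tag on it is never seen.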
#7
Here is our programmer's reply...as he mentioned, we will follow up and share the results of this test if it is implemented on the client end.
Quote:
Last edited by Chris Boggs : 08-01-2006 at 11:22 AM. Reason: typo
#8
This drives me crazy any time I see it, especially from major retailers who should know better (better yet, they should hire me).
We use a "BIG-IP" load balancer in front of our server farm and it works wonders. We have one external IP address per domain (or subdomain) that maps back to multiple internal IPs. We then use host header mapping at the server level to tie it all together. This gives the appearance of one website per domain, even though there are multiple servers serving content. No ww1, ww2 or ww6.

Depending on the budget (and the scale of the site), there are several ways to approach this: some hardware/appliance based, like ours, and some software based, such as a reverse proxy with mod_rewrite.

I would also 301 any of the current ww2 URLs in Google back to www. Any specific landing page content that does not belong on the main site I would push out to a subdomain if needed.
Ethan
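For anyone unfamiliar with the idea, the mapping described above (one public hostname, several internal servers) can be illustrated with a toy model. This is not how any particular appliance is configured or implemented; the hostnames, the internal addresses and the class name are all assumptions made up for the sketch.

```python
from itertools import cycle

# Hypothetical public hostnames and internal addresses.
POOLS = {
    "www.example.com": ["10.0.0.11", "10.0.0.12"],
    "shop.example.com": ["10.0.0.21", "10.0.0.22", "10.0.0.23"],
}


class HostHeaderBalancer:
    """Pick an internal server for a request based on its Host header,
    rotating through that host's pool in round-robin order."""

    def __init__(self, pools):
        # One independent rotation per public hostname.
        self._cycles = {host.lower(): cycle(addrs) for host, addrs in pools.items()}

    def pick_backend(self, host_header: str):
        # Drop an optional ":port" suffix and normalise case.
        host = host_header.split(":", 1)[0].strip().lower()
        backend_cycle = self._cycles.get(host)
        if backend_cycle is None:
            return None  # unknown host: nothing is served for it
        return next(backend_cycle)


if __name__ == "__main__":
    balancer = HostHeaderBalancer(POOLS)
    for _ in range(3):
        print(balancer.pick_backend("www.example.com"))
```

The point for search purposes is that the visitor and the spider only ever see www.example.com. Which internal machine answered is invisible, so there is no second hostname to index.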
#9
I'm with Opie
Load-balancing via round-robin DNS is fraught with problems. I'd use an appliance whenever possible to obscure the fact that you're dealing with multiple servers. The other alternative is to redirect a spider from a mirror to the primary site, but that would technically be cloaking. I documented that here
But I admit the solution is not ideal, since users will inevitably still link to the mirrors. They'd get 301'd, though.