domain to content association being lost


Kate
02-03-2008, 10:12 PM
I have about 50 local domains set up on the same servers, each one tied to one of my countries' content (.co.jp = Japan, .fr = France, etc.). I'm having an occasional recurring problem where the domain and the content become incorrectly associated in Google. For example, just last week my .es domain was appearing with my Japan site content. Several particularities:

1. I can never reproduce the problem on my own site, that is to say, when I visit the .es domain, it is in Spanish every time.

2. This has never happened on any engine but Google. It's happened 4 times now on Google with random site pairs, different each time.

3. The problem goes away on the subsequent crawl.

4. The problem always appears when the last crawl date is Thursday = our live release day.

5. The problem manifests itself with an incorrect language link and snippet, but the cached page is usually the correct language. For example, last week, .es was appearing for Spanish search terms as usual, but with Japanese title and snippet. However, the cached page was the normal Spanish one. At the same time, .es was appearing on google.co.jp for searches on Japanese terms. Neither site had noticeable ranking changes though, and the issue went away about 1 week after it appeared.

6. There is never anything odd in the Google webmaster tools. Also, the keyword data never says that .es domain was appearing for Japanese terms (only for Spanish ones).

Any ideas or channels for investigation you can suggest would be greatly appreciated. I'm going mad!

Jazajay
02-04-2008, 12:52 AM
OK, my 2p:
1. How are your servers distributed?
Are they all in one place or in each country?

2. Another possibility may be DNS. I only know the basics, but are they all set up correctly?

3. Have you tried changing the live release date per server, i.e. doing them on different days, or is that not a possibility?

4. Could you provide more information on how they are used and updated? I.e. do you have dynamic includes coming from one tracking routing to another etc.? Are they updated at once or separately?

5. Have you checked that the headers being sent are properly set up? Bit of a stretch.

6. Is it just one particular page? 2 pages? 3? Or all of them? Basically, have you noticed a pattern?

AussieWebmaster
02-04-2008, 04:07 PM
Do you have all sites in Webmaster Central?

Kate
02-04-2008, 08:38 PM
Hi,
Thanks for reading the first post. I don't manage the servers myself, so I don't have all the details, but here are some answers to your questions from what I know:

1. All servers are in Boston in the same hosting facility. There are 3 of them with a load balancer to distribute between them. On the release day, one server is taken out of the LB, updated, then put back in and another taken out, etc. However, I think there is just 1 live DB, so updates to that happen all at once.

2. For DNS, all the domains are set up as CNAMEs of the primary .com domain. There is a DB that associates each domain to the correct local content. All content is actually available on the .com domain as well (all countries), but using IP geotargeting, which is disabled for search engine user agents and anyone not accepting cookies. In the past we've had no issues with this setup (i.e. the .com domain appears only in English on the search engines).

3. I don't think we can do releases on different days on the different servers. I've talked to the IT team about the problem of course and they "have no idea what could be happening" (of course). The association with live releases may be only circumstantial, since I only have a few examples of the problem happening.

4. I'm not sure about this question. The site is partially .NET and partially ASP. The problem has only manifested itself on the homepages of the various domains, that is to say on a .NET page. This also could be circumstantial, since the homepages are crawled quite a lot more often, and also we would be much more likely to notice the problem on a homepage than on a subpage. The problem is only there for a few days at a time and doesn't seem to affect rankings, so unless you're doing manual searches, you don't see it.

5. I'm also not sure exactly what you mean about headers (you mean the html head section of the page?) When I view the pages on my own site, everything looks fine on-page and in the source (content is correctly associated to domain). When I view the cached version of the page on Google, also all looks fine (both on-page and in the source). It's just the search results that are wrong. On version A of google you have site A domain (correct) but with site B's title and description text (incorrect). On version B of google you have site A domain (incorrect) but site B's title and description text (correct). So it's like the two sites were merged into one, taking the domain from one and the text from the other, and then that monster site appears for all results (language A and language B). The cached pages though don't match what's in the results.

6. Yes, all sites are in webmaster central (Google you mean?). When the problem is appearing though on a given site, there is nothing unusual in webmaster central on either of the affected sites. Anything in particular you were thinking I could look for there?
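
A minimal sketch of the domain-to-content lookup described in point 2, for anyone trying to picture where a wrong association could creep in. Everything here is an assumption for illustration: the table contents, the `mysite.*` host names, and the function names are hypothetical, and the real site keeps this mapping in a DB rather than in code.

```python
# Hypothetical stand-in for the DB table that ties each domain to a country.
DOMAIN_TO_COUNTRY = {
    "mysite.es": "ES",
    "mysite.fr": "FR",
    "mysite.co.jp": "JP",
    "mysite.com": None,  # .com has no fixed country; it is geotargeted instead
}


def normalize_host(host_header: str) -> str:
    """Lower-case the Host header and drop any port, trailing dot, or leading 'www.'."""
    host = host_header.strip().lower()
    host = host.split(":", 1)[0].rstrip(".")
    if host.startswith("www."):
        host = host[4:]
    return host


def country_for_host(host_header: str):
    """Return the country tied to the requested domain, or None for the .com site."""
    host = normalize_host(host_header)
    if host not in DOMAIN_TO_COUNTRY:
        # Fail loudly instead of silently falling back to some default country.
        raise KeyError(f"unknown domain: {host}")
    return DOMAIN_TO_COUNTRY[host]
```

Raising on an unknown host instead of falling back to a default is a design choice in this sketch only. A silent fallback is one place where a wrong country could be served without anyone noticing, so it may be worth asking the developers what the real lookup does in that case.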

thanks for your brainpower, if you have any ideas on what I can investigate further.

Jazajay
02-06-2008, 09:25 AM
To be honest I'm not too sure; I've never heard of this situation, but I can take a logical guess at things you could investigate.
OK:
1. It could be the hosting company not doing what they say, but if you trust them I would not count this as a possibility.

2. How is your geotargeting set up? How is it hidden from G? This could be the issue, to be honest. Unless you are doing it really, really cleverly, G will probably be getting caught in the "loops" to the different IPs/sites.

3. Fair enough.

4. Are the home pages the only recipients of the geotargeting? If so, you have your answer. I.e. do your other sites give links to those domain home pages only when geotargeting kicks in, or does the geotargeting give links to its corresponding pages?

5. Ah, sweet guess, but no. When the browser requests a web page, the server sends a load of HTTP headers (not HTML) to let the browser know certain things: whether the page can use compression, what encoding type, server version, that sort of thing.
Use web-sniffer dot net/ to get a rundown of the headers being sent from a page.
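
The same header check can also be scripted. Below is a minimal sketch, assuming Python's standard `urllib`; the function names, the custom User-Agent string, and the two checks chosen (`Content-Language` and `Vary`) are illustrative assumptions, not anything known about the site in question.

```python
import urllib.request


def fetch_headers(url: str, user_agent: str = "header-check/0.1") -> dict:
    """Send a HEAD request and return the response headers with lower-cased names."""
    req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return {name.lower(): value for name, value in resp.headers.items()}


def audit_headers(headers: dict, expected_language: str) -> list:
    """Return warnings about headers that might mislead caches or crawlers.

    `headers` must use lower-cased names, as returned by fetch_headers().
    """
    warnings = []
    lang = headers.get("content-language")
    if lang is None:
        warnings.append("no Content-Language header")
    elif not lang.lower().startswith(expected_language.lower()):
        warnings.append(f"Content-Language is {lang!r}, expected {expected_language!r}")

    # The page content depends on a country cookie, so an intermediate cache
    # should be told that via Vary; otherwise it may reuse the wrong variant.
    vary = {v.strip().lower() for v in headers.get("vary", "").split(",") if v.strip()}
    if "cookie" not in vary:
        warnings.append("Vary does not list Cookie")
    return warnings
```

Usage would be something like `audit_headers(fetch_headers("http://mysite.es/"), "es")`, where the URL is a placeholder. Running it against each local domain shortly after a Thursday release would show whether the headers change around the time the problem appears.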

My personal suggestion is that it is your geotargeting, though without seeing your code (which, to be honest, I don't want to do :D, no offense, as I know how mine is set up) I can't say for sure. G can still come through a proxy, thus hiding their user agent, and therefore they could be getting the page. Does the geotargeting automatically redirect to the right version, or does it provide a hard HTML link?

Kate
02-07-2008, 02:03 AM
Hi,
On the geotargeting, we don't geotarget all the domains, only the .com. So basically if you come in on .es or .jp, you'll get the Spain or Japan content accordingly, regardless of who you are or where you are physically located. That makes the geotargeting theory a bit less promising to me, since I've only seen it on the local domains, never on .com. However, I'm open to all suggestions of course.

One thing about the local domains is if you do somehow manage to get a country cookie set on the local domain to something other than the country associated with the domain, the cookie will win. That basically shouldn't ever happen, and Google doesn't do cookies anyway, right?

The .com geotargeting works without links or redirects, so basically it's just showing different content on the same URL. It works across the whole site. For instance, if you're sitting in Spain and you visit mysite.com you'll see the Spanish homepage and if you're in Japan, you'll see the Japanese homepage. No URL change, no redirect, no links. Same applies to any page of the site.

We rely on you to accept cookies though for that to work, so we set a country cookie on that initial .com load. If you don't accept cookies, we'll send you to a 'pick your country' page, from which all links go to the local domains (.es, .jp, etc.) except for one link back to the .com homepage with the country element in the URL. That allows us to associate a country with your visit without the cookies, since it's in the domain. Without some country association, we can't decide what content to show, so the site won't work. The same function should also be preventing GG from seeing the .com site in any language but English. It works ok for the most part: most pages in the cache are in English on .com, and all pages on the various country domains are in the correct local language.

Except of course for the problem that brought me here in the first place...

Does any of that give any further clues?
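
The selection rules described in this post can be summed up in a rough sketch. This is a reading of the description, not the site's actual code: the function and parameter names, the crawler token list, and the `"EN"` / `"PICK_COUNTRY_PAGE"` labels are all assumptions, and so is the exact ordering between a country in the URL and a country cookie.

```python
from typing import Optional

CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")  # illustrative list only
DEFAULT_ENGLISH = "EN"
PICK_COUNTRY_PAGE = "PICK_COUNTRY_PAGE"


def is_crawler(user_agent: str) -> bool:
    """Crude user-agent sniffing, standing in for whatever the real site does."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_content(
    domain_country: Optional[str],
    url_country: Optional[str] = None,
    cookie_country: Optional[str] = None,
    geo_country: Optional[str] = None,
    user_agent: str = "",
    accepts_cookies: bool = True,
) -> str:
    """Pick which country's content to serve for one request."""
    if url_country:       # country element in the URL (assumed to be checked first)
        return url_country
    if cookie_country:    # "the cookie will win", even on a local domain
        return cookie_country
    if domain_country:    # .es, .jp, etc. are tied to one country
        return domain_country
    # From here on the request is for the .com domain.
    if is_crawler(user_agent):
        return DEFAULT_ENGLISH        # geotargeting is disabled for search engines
    if not accepts_cookies:
        return PICK_COUNTRY_PAGE      # no cookie means no way to remember the country
    return geo_country or PICK_COUNTRY_PAGE
```

With these rules, `choose_content("ES", cookie_country="JP")` returns `"JP"`: a request for the .es domain that carries a Japan cookie gets Japan content. Whether a crawler could ever send such a request is unknown, but it is the one path in this sketch where a local domain serves another country's content.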

Jazajay
02-07-2008, 02:41 AM
"The same function should also be preventing GG from seeing the .com site in any language but English. It works ok for the most part: most pages in the cache are in English on .com,"

This interested me.
You are saying that .com should only be English, right? But what do you mean by

"It works ok for the most part: most pages in the cache are in English"

Does that mean some of the .com pages are not English? That shouldn't be happening, right?

By the way, I like the solution, it's very nice. I would still say it is the geotargeting, as that is what sends the different content (title, description), be it via a redirect or a link, if G goes through .com. Otherwise they would just get the normal content in the right language. The problem's at .com: it's the only variable; the rest are constant from what I can see. Maybe someone can give you a more definitive answer as, to be honest, I haven't heard of this issue before. But I would say that is your best bet.

Stupid question, but you have tested it from the other IPs, right? As in, view your site from those IPs and see what happens if cookies are off? Long shot, but it may be an issue that you have not noticed. I had an issue not so long ago where the logic was sound but I just could not get it to work in one rare condition. Just a thought. Sorry I couldn't be much help.

Jaza

Kate
02-07-2008, 02:56 AM
Thanks for the suggestions. It's gotten me thinking about other possibilities in any case.

About some .com showing not in English, sometimes people link to us using the country parameter hard coded into the URL. That screws things up for just that one page, since the engines can come in on the .com and see it in another language (query string parameter overrides everything else). However, it just lasts for that one page for the spiders, since they don't get cookied and when they follow any link to another page, they end up on the English content there. That's what I meant about "most" instead of "all".

thanks for the good discussion anyway.

Jazajay
02-07-2008, 03:00 AM
No worries, I'll think on it.
If you come up with a suggestion, let me know. I hate problems that beat me. Plus it doesn't happen often :D
Take care
Jaza

Jazajay
02-09-2008, 01:06 PM
Hi Kate.
Could you give me the exact name of the load balancer that is used?
Also why are they taken out and updated and not all updated at once from the main DB?
Sorry just hate problems that beat me.
Jaza
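
For context, the release process described earlier in the thread (three servers behind a load balancer, each taken out, updated, and put back in turn) can be sketched as a small simulation. The server names, version labels, and function names are made up for illustration; this only shows the shape of the process, not how the real release is run.

```python
def rolling_release(servers, old, new):
    """Simulate updating one server at a time while the others stay in rotation.

    Returns one snapshot per step: a dict of {server: version} for the servers
    that are serving traffic at that moment, plus a final snapshot once all
    servers are back in the load balancer.
    """
    versions = {server: old for server in servers}
    snapshots = []
    for server in servers:
        in_rotation = [s for s in servers if s != server]
        versions[server] = new  # updated while out of the load balancer
        snapshots.append({s: versions[s] for s in in_rotation})
    snapshots.append(dict(versions))
    return snapshots


def mixed_version_steps(snapshots):
    """Return the indexes of steps where live servers run different versions."""
    return [i for i, snap in enumerate(snapshots) if len(set(snap.values())) > 1]
```

With three servers, `mixed_version_steps(rolling_release(["web1", "web2", "web3"], "v1", "v2"))` gives `[1]`: while the second server is being updated, one live server runs the new code and another the old. Since the single live DB is reportedly updated all at once, old code may also run against new data for part of the release, which could be worth raising with the developers.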

Kate
02-13-2008, 09:54 PM
This is going beyond my technical knowledge unfortunately. I can find out which load balancer we use next week when I'm in London, but I don't think I'll be able to get a good answer on the release process and why it is the way it is. I'll give it a go though! More in a few days, once I've gotten to sit with some developers.

Jazajay
02-14-2008, 12:46 AM
No worries. If it is the one I think it is, I may have another lead for you to track down.
Let me know.
Jaza