Google indexing a multilanguage shop


xidos
06-15-2005, 05:05 AM
Hello everyone,

I'm almost ready to publish a new webstore and have come up with the following SEO setup. I would appreciate it if you could take a critical look at the plan and point out any flaws. In my experience these things are what Google penalizes most easily, so I'll focus on that. Any advice on other engines is very welcome as well.

The store is available in three languages: English, French and Dutch.
The actual multi-language PHP shop is 'preceded' by a lot of HTML content pages.

My plan is to put the Dutch index.htm at the root, the English one at root/en/index.htm and the French one at root/fr/index.htm.
From there I would like www.mydomain.nl to point to the root, www.mydomain.fr to root/fr/ and www.mydomain.com to root/en/.

Of course, search engines that spider from the root (i.e. the Dutch version) should not be allowed to index the en and fr directories, since that would result in severe duplicate content once those directories are also indexed through the .com and .fr URLs.

If I block the root/en/ and root/fr/ directories in my robots.txt file, only the Dutch version will be indexed at the .nl URL. So far so good. Can I adapt the robots.txt files for the English and French versions so that engines entering through the English and French URLs (i.e. entering at root/en/ and root/fr/) will not refer to the root (Dutch) robots.txt and will not index the Dutch version or each other? Or will Google revert to the root robots.txt when indexing from, let's say, the French URL (mydomain.fr = root/fr/)?

Another small question: if I place guestbook pages in the three language directories, all loading from the same database, this will probably be flagged as duplicate content, right? Any advice on that?

Many thanks in advance

Wail
06-15-2005, 06:12 AM
I'm afraid you can't do this.

There's only one robots.txt which sits in root/robots.txt. Spiders don't identify themselves as "An English spider" or "A French spider" so it's not possible to only allow some language spiders into different parts of your site.
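To illustrate, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content and the paths are assumptions based on the layout described in the question, not a real site. It shows that the rules are matched on user agent and path only, so there is nothing a "French" or "English" spider could be matched on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from the site root; it blocks the
# English and French directories for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en/
Disallow: /fr/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if the given crawler may fetch the path."""
    return parser.can_fetch(agent, path)


for path in ("/index.htm", "/en/index.htm", "/fr/index.htm"):
    print(path, allowed(path))
```

Whichever crawler name is passed in, the answer depends only on the path, which is why this file cannot separate the language versions per engine.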

In fact, spiders don't look at languages at all. Spiders just read what's on your site and the indexer then reads it.

I've seen a UK/US store which simply blocked all spiders from the US section of the site in order to avoid duplicate content.

Duplicate but different language content doesn't count as duplicate. Er. If that makes sense! You can have the same description for a product in English and French and that's not a duplicate.

It's also worth noting that you won't have any content listed in search engines for example.fr if your French domain simply redirects to example.nl/fr/. The search engines will simply treat that as example.nl.

Make sure you use the right type of redirect (a 301, not a 302) for these too.
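As a rough sketch of what that means in practice, the Python below maps a language domain to its target and returns a 301 status. The host names and the redirect_for helper are hypothetical, chosen to match the example.fr / example.nl/fr/ case above; a real shop would do this in its web server or application configuration.

```python
from http import HTTPStatus

# Hypothetical mapping of language domains to directories on the main domain.
LANGUAGE_HOSTS = {
    "www.example.fr": "http://www.example.nl/fr/",
    "www.example.com": "http://www.example.nl/en/",
}


def redirect_for(host):
    """Return (status code, Location header) for a language domain, or None."""
    target = LANGUAGE_HOSTS.get(host.lower())
    if target is None:
        return None  # not a redirected domain; serve the page normally
    # 301 means "moved permanently"; a 302 would signal a temporary move.
    return int(HTTPStatus.MOVED_PERMANENTLY), target
```

For example, redirect_for("www.example.fr") returns the 301 status together with the /fr/ address, while the main domain gets None and is served as usual.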

My suggested solution is to have as much unique text as possible on each language version of the site. You could have a set of 10 random English straplines/footers which only show on the English site, 10 random French straplines/footers which only show on the French site and 10 random Dutch straplines/footers which only show on the Dutch site. If your content is the kind that Google rates more highly when pages are kept up to date, then you'll benefit again.
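The strapline idea could look something like this minimal Python sketch. The strapline texts, the language codes and the pick_strapline helper are all placeholders made up for illustration; the real site would hold around ten lines per language.

```python
import random

# Placeholder straplines; only two per language are shown here.
STRAPLINES = {
    "en": ["Fast delivery across Europe", "New arrivals every week"],
    "fr": ["Livraison rapide en Europe", "Nouveautés chaque semaine"],
    "nl": ["Snelle levering in Europa", "Elke week nieuwe producten"],
}


def pick_strapline(language, rng=random):
    """Return a random strapline drawn only from the given language's set."""
    return rng.choice(STRAPLINES[language])


print(pick_strapline("fr"))
```

Because each language draws from its own list, the extra text never repeats across the three versions of a page.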

Mikkel deMib Svendsen
06-15-2005, 06:25 AM
Unless you have the time and resources to promote 3 domains, I would suggest that you stay with just one. Remember that if you have 3 domains you really need 3 times as many links to make each domain equally popular and likely to rank. If you take all those links and focus them on one domain you are much more likely to rank well, in any language.

Keeping the language versions in separate directories is a good idea, not least from a usability point of view. Just make sure that it is easy for users to jump to another language version in case they end up on the wrong one. Because, as Wail said, there is only one set of spiders and one index. The local filtering is a layer on top of that.

Wail
06-15-2005, 06:28 AM
By the way, it's worth saying that you've done the right thing by picking one version of the site and sticking it in the root's index. That gives spiders a content rich and link filled page to read first.

The wrong thing to do - which many people do - is have a "choose your language" splash/entrance page.

xidos
06-15-2005, 06:56 AM
Thanks for all your replies.

I will decide later whether to use dedicated language URLs or just one. If I choose the former, I believe it is theoretically possible to separate the three 'sites' by giving the language selection links (their only physical links with each other) a rel="nofollow" attribute. This depends, however, on all spiders acting upon this attribute. Does anyone have experience with this construction? Do spiders comply with it?

Greetings

Wail
06-15-2005, 07:01 AM
Hiya,

I'm afraid the rel="nofollow" is a bit misleading. The evidence so far is that Google, Yahoo and MSN do follow the links! It's just that these search engines do not credit the link with any significance when it comes to search rankings.

This is hard to actually test. I've not been satisfied with any of the attempts I've seen posted online. In truth it's very hard to stop a spider finding a URL if you're visiting it yourself, so even if the only (known) link to a page has a nofollow relationship attribute, I think the spider can still find the page another way (you might be running the Google toolbar, for example).

The best ways to stop spiders from indexing a page are meta tags and robots.txt. With meta tags, the spiders find the page, read the tags and then decide not to index it (noindex) or not to follow any more links (nofollow). With robots.txt, the spiders are prevented from accessing the excluded pages in the first place.
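To make the meta tag route concrete, here is a small Python sketch of how a crawler might read the robots meta tag after fetching a page. It is an illustration under assumptions, not how any particular search engine works; the RobotsMetaParser class, the robots_directives helper and the sample page are made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in a page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))
```

Note that the page has to be fetched before the tag can be read, which is the difference from robots.txt: there the spider is turned away before it requests the page at all.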

xidos
06-15-2005, 07:39 AM
Ok, thanks for the advice