Lance Housley
09-04-2004, 10:47 AM
OK, I give up. I'm stumped. :confused:
Some search engines appear to map accented characters to their unaccented equivalents, so I decided to see what Google does with them. For example, if you enter Bjork, does it also search for pages where the O has been given its proper diaeresis mark? Here are the results of my little test (and - yes - it was a very little test).
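To make "mapping accented characters to the non-accented equivalent" concrete, here is a minimal sketch in Python. It assumes the mapping is done by decomposing each character and dropping the combining marks; `fold_accents` is a made-up name for illustration, and nothing here is claimed to be what Google or any other engine actually does.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Return text with combining marks (accents, diaereses) removed."""
    # NFD splits a character such as "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("björk"))
print(fold_accents("place de la liberté"))
```

An engine that folded both the query and the indexed pages this way would treat bjork and björk as the same term.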
I searched for bjork and got 878000 results. I searched for björk (with the umlaut) and got 609000 results. I figured that searching for bjork OR björk should get between 878000 and 1487000 results, depending on the degree of overlap. I actually got 429000 - so where did the rest go?
I tried another.
I searched for "place de la liberte" and got 760 results. I searched for "place de la liberté" (with the accent) and got 11700 results. I figured that searching for "place de la liberte" OR "place de la liberté" should get between 11700 and 12460 results, again depending on the degree of overlap. I actually got 12800 - so where did the rest come from?
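The range I expected in both tests comes from simple set arithmetic: an OR search should return at least as many pages as the larger single search (complete overlap) and at most the sum of the two (no overlap). A small Python sketch of that check, using the counts quoted above; the function names are my own and purely illustrative.

```python
def or_bounds(count_a: int, count_b: int) -> tuple[int, int]:
    """Lowest and highest plausible result counts for 'A OR B'."""
    # Complete overlap gives the larger count; no overlap gives the sum.
    return max(count_a, count_b), count_a + count_b

def check_or(count_a: int, count_b: int, count_or: int) -> str:
    """Say whether a reported OR count is below, within, or above the range."""
    low, high = or_bounds(count_a, count_b)
    if count_or < low:
        return "below"
    if count_or > high:
        return "above"
    return "within"

# Counts from the two tests in this post.
print(or_bounds(878000, 609000), check_or(878000, 609000, 429000))
print(or_bounds(760, 11700), check_or(760, 11700, 12800))
```

The first test lands below the range and the second lands above it, so neither result is consistent with an exact count of the union.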
OK, so this could be a hiccup in Google's OR function - but it still leaves me wondering whether Google really does do anything about mapping non-English characters when used with an English interface. The Help pages haven't given me any clues - does anyone else know?