View Full Version : the character " | "
newreality
11-17-2004, 11:31 PM
Is it true that search engines don't especially like the character "|" when placed among hyperlinks or meta titles?
I'd like to use it on my pages in these ways, but I recall hearing some chatter to this effect before.
Is it true?
mcanerin
11-17-2004, 11:33 PM
Nope - not true :)
Ian
Marcia
11-18-2004, 08:27 AM
I just recently tested it in a page title - no problem at all.
newreality
11-18-2004, 09:15 AM
I just came across some notes I had on the ASCII hierarchy (dated '02).
I noticed that "|" sits next to last on the list.
So, to rephrase: would it be better to go by the ASCII hierarchy and replace it with, perhaps, "-"?
Is ASCII still relevant?
orion
11-18-2004, 11:42 AM
Hi, all.
The use of pipes and other delimiters in connection with rankings, relevancy, and search engines is already discussed in detail in these threads:
http://forums.searchenginewatch.com/showthread.php?t=2163&highlight=pipes
http://forums.searchenginewatch.com/showthread.php?t=48&highlight=pipes
http://forums.searchenginewatch.com/showthread.php?t=700&highlight=pipes
Orion
mcanerin
11-18-2004, 11:44 AM
You mean something like this article?
http://www.xponsewebs.com/topic_7_246.html
The ASCII Hierarchy:
1. space  13. ,  25. B  37. N  49. Z  61. f  73. r  85. ~
2. !  14. -  26. C  38. O  50. [  62. g  74. s
3. "  15. .  27. D  39. P  51. \  63. h  75. t
4. #  16. /  28. E  40. Q  52. ]  64. i  76. u
5. $  17. :  29. F  41. R  53. ^  65. j  77. v
6. %  18. ;  30. G  42. S  54. _  66. k  78. w
7. &  19. <  31. H  43. T  55. `  67. l  79. x
8. '  20. =  32. I  44. U  56. a  68. m  80. y
9. (  21. >  33. J  45. V  57. b  69. n  81. z
10. )  22. ?  34. K  46. W  58. c  70. o  82. {
11. *  23. @  35. L  47. X  59. d  71. p  83. |
12. +  24. A  36. M  48. Y  60. e  72. q  84. }
Nope. First and most obviously, search engines don't produce results based on alphabetical rankings, so it's directly wrong there. The author you are reading is apparently confusing a directory with a search engine. To be fair, the author I'm quoting above is not confused and the post is in regard to Yahoo directory listings only.
Second, some directories do sort alphabetically, but usually filter out leading ASCII characters that are not A-Z.
I suppose if you had 2 competing titles that were completely identical in all other aspects and were only interested in directory lists, then this might be helpful, but in that case why not just start all your titles with AAAAAA1 Towing Ltd, or whatever? Better yet, use a space instead of any character at all, since it's the first character in the hierarchy.
If you and a competitor both were using "buy viagra" as your titles, a directory that uses alphabetizing would list them in this order:
#1 buy viagra
1buy viagra
AAA buy viagra
buy viagra
buy-viagra
buy|viagra
As I said, it makes no difference to search engines though. Many people do name their companies AArdvark Websites, A1 Towing or $ales Associates due to this effect on directories.
Personally, I consider it kind of cheesy, but there is certainly nothing wrong with it.
Ian
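The directory ordering listed in the post above can be reproduced with a plain code-point sort. This is only a minimal sketch, assuming a directory that compares titles character by character on their ASCII values and does not filter leading symbols (which, as the post notes, many directories do). The `titles` list simply reuses the example titles from the post.

```python
# Sort titles by raw ASCII code point, as a naive alphabetizing directory might.
titles = [
    "buy|viagra",
    "buy-viagra",
    "buy viagra",
    "AAA buy viagra",
    "1buy viagra",
    "#1 buy viagra",
]

# Python compares str values code point by code point, which matches ASCII
# order for these characters.
ordered = sorted(titles)

for title in ordered:
    print(title)

# Code points of the three separators compared in the thread.
separator_codes = {ch: ord(ch) for ch in (" ", "-", "|")}
print(separator_codes)
```

The space sorts ahead of the hyphen, and the hyphen ahead of the pipe, which is the whole of the "ASCII hierarchy" effect on a directory listing.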
GoogleGuy
11-20-2004, 04:32 AM
I got no problem with | in hyperlinks or meta titles. Quick tip: on Google, you can use "|" to mean "OR" in a search. Programmers will understand why. So you can do a search like
license "windows xp"|windowsxp
and it will search for pages that mention license and have either of the windows words, for example.
newreality
11-20-2004, 10:08 AM
mcanerin - those article values look very similar or identical to my source.
I am concerned about directories. But the '|' pipe is mainly to be placed within the body of the index page, to separate main hyperlinks. Not used in the title or meta.
Didn't know if Yahoo had "written out" these rules in their formula.
orion
11-20-2004, 11:05 AM
Happy to have you at this thread, GoogleGuy.
Indeed, this test confirms that pipes in queries append results as in the OR mode.
car OR insurance = 47,200,000
car|insurance = 47,200,000
car | insurance = 47,200,000
“car OR insurance” = 47,200,000
“car|insurance” = 47,200,000
“car | insurance” = 47,200,000
We still can observe minor inaccuracies with the OR mode.
In Google Inconsistencies (http://www.searchengineshowdown.com/features/google/inconsistent.shtml), Greg Notess reports
“Google introduced the OR operator in Oct. 2000, a welcome addition when searching terms with synonyms. Unfortunately, the OR does not always work properly. There have been problems since at least Nov. 2002. One example of the problem reported to me was a search for kabul OR kaboul OR kaboel found less results than just kabul OR kaboul even though it should have found more. Status: Ongoing”
End of the quote.
We confirmed these findings. A test today gives
kabul OR kaboul OR kaboel = 1,690,000
kabul OR kaboul = 1,710,000
Same results are obtained with pipes.
Orion
PS. In this case we obtain a difference of 20,000 results. In the following example we obtain a difference of 700,000 results
car|insurance|auto = 46,500,000
car|insurance = 47,200,000
compare with
car|insurance|auto|quotes = 45,800,000
orion
11-20-2004, 12:03 PM
More Google inconsistencies with pipes
cats = 25,100,000
dogs = 35,900,000
cats|dogs = 16,300,000
dogs|cats = 16,500,000
Orion
Marcia
11-20-2004, 12:18 PM
With Google, you can do the same search consecutively, refreshing the browser in between, and come up with different results for the identical search. That's probably attributable to hitting different data centers - which has been seen to happen with two computers in the same room doing the same search and getting different results.
I think the trick is to try the different ways to search making sure it's all at the same data center, by IP number.
Joseph Morin
11-20-2004, 01:05 PM
I use it frequently, but for different reasons. I think it looks cleaner for separating subjects within a title tag, and it yields a higher click conversion because the differentiation draws the end user's eye to the search result, causing better click-through.
Marcia
11-20-2004, 02:05 PM
I've always used it to separate links in site bottom navigation because it's user friendly by differentiating between them clearly. I also use a double colon as a separator there where I want a more delicate look.
I think the only question is whether it would interfere with picking up words for making phrases from both sides of the pipe, but it hasn't interfered with the recent page title.
orion
11-20-2004, 02:32 PM
Marcia wrote: "With Google, you can do the same search consecutively, refreshing the browser in between, and come up with different results for the identical search. That's probably attributable to hitting different data centers - which has been seen to happen with two computers in the same room doing the same search and getting different results.
I think the trick is to try the different ways to search making sure it's all at the same data center, by IP number."
Hi, Marcia
True, refreshing or even re-querying the same term(s) could produce minor differences. Based on testing, I’m inclined to say that there is more to their OR implementation:
1. Inconsistencies with Google’s OR have been observed for several years now. See
http://searchengineshowdown.com/features/google/googleboolean.html
http://searchengineshowdown.com/features/google/review.html
2. OR is also known as the ANY mode http://searchenginewatch.com/facts/article.php/2155991
The system should return documents containing any of the specified query terms. Thus, transposing terms should not change the total number of results. At least that’s what the theory says, unless the implementation has flaws or the db is being upgraded at the time of testing.
Teoma
cats = 8,377,000
dogs = 12,270,000
cats OR dogs = 20,640,000
dogs OR cats = 20,640,000
3. Similar patterns of inconsistencies can be observed in other systems.
Yahoo
cats = 17,800,000
dogs = 28,100,000
cats OR dogs = 40,100,000
dogs OR cats = 39,500,000
Results
1. unlike Google, Teoma and Yahoo do not recognize pipes as OR
2. unlike Google, Teoma and Yahoo produce more results when more terms are used in OR mode. This is what you should expect from OR queries.
Clearly Google’s OR implementation
1. is different from other OR's
2. is not a "true" Boolean OR (but others can debate this point).
I’m inclined to think the issue here relates more to the OR implementation, possibly connected with the regexps and parsing loops for the query strings.
Orion
GoogleGuy
11-20-2004, 04:53 PM
Good observations, orion. A few things to bear in mind:
- if you hit different data centers with different queries, you can get (very slightly) different results, because not every data center is 100% identical in terms of number of machines, configuration, etc.
- Results estimates are only estimates. That's why we say things like "Results 1 - 10 of about 26,500,000" and only give three significant digits of precision when estimating results. Especially when you get off the beaten path with things like OR, site:, or the tilde operator, the code paths are a little different, so it shouldn't be a surprise that results estimates vary a little. Even with [cats|dogs] vs. [dogs|cats], because position matters in a query (i.e. [britney spears] is considered a different query than [spears britney]), you can end up with slightly different estimates. It's one of those things that we could probably revisit, but on the list of things to do, there always seem to be more important things above it.
GoogleGuy, who is a big fan of | because it's so compact. :)
orion
11-20-2004, 05:30 PM
GoogleGuy wrote: "if you hit different data centers with different queries, you can get (very slightly) different results ..."
Indeed, “very slightly” different results are to be expected.
GoogleGuy wrote: "Even with [cats|dogs] vs. [dogs|cats], because position matters in a query (i.e. [britney spears] is considered a different query than [spears britney]), you can end up with slightly different estimates."
I buy this for FINDALL (AND) or EXACT modes, but not for OR (ANY).
GoogleGuy wrote: "It's one of those things that we could probably revisit, but on the list of things to do, there always seem to be more important things above it."
Sorry to hear this.
Could you shed some light on why appending terms in OR tends to return fewer results? What is the reasoning involved?
Orion
GoogleGuy
11-21-2004, 04:35 AM
orion wrote: "Could you shed some light on why appending terms in OR tends to return fewer results? What is the reasoning involved?"
Again, it's not that fewer results are returned; it's an issue in how the results estimates are computed.
Let's take a concrete example. [stove] gives an estimate of 5.8M results when I do the query. [~stove] does a search for stove plus stove-related words such as stoves; in theory, [~word] will always give more results than [word], yet [~stove] gives an estimate of 4.71M results.
However, let's drill down and make sure that the tilde operator can return more docs by adding a few specific words, so that we can double-check. If you do the query [stove old-fashioned franklin venting], you'll get about 40 possible results. And the query [~stove old-fashioned franklin venting] will return about 75 results, including several where you can see "stoves" highlighted in the snippet. So the tilde operator is acting like stove|stoves and does in fact return more results.
Why haven't we gone back to rework the results estimates code for some of these esoteric corner cases? Frankly, if you include all the other people who have asked me about this, the list would probably be:
- Gary Price
- orion
Very few people have shown an interest, so it hasn't been the highest priority. But I'm glad that you asked, and I hope this explains things a little. Just remember that the results estimates from any search engine are just estimates unless you narrow a search down enough to actually verify the results for yourself. :)
orion
11-21-2004, 08:43 AM
Thanks for your prompt response, GoogleGuy. I'm not sure that was the question. The examples provided are not OR operations.
What I’m interested in is implementation and query validation; in this case, the OR implementation used by Google. Your implementation seems to work as expected in some cases, but not in others.
To illustrate, consider the following examples
In this case
Google Results
termita = 6,830
comejen = 1,530
termita|comejen = 8,250
comejen|termita = 8,250
Here, appending terms in OR returns more results, which is what one would expect from OR queries. Also, transposition does not seem to matter.
However, in the following case, appending terms in OR returns fewer results, and transposition does matter.
Google Results
car = 436,000,000
auto = 244,000,000
car|auto = 48,400,000
auto|car = 47,800,000
Now compare the following results (taken from previous posts):
Google Results
cats = 25,100,000
dogs = 35,900,000
cats|dogs = 16,300,000
dogs|cats = 16,500,000
Teoma Results
cats = 8,377,000
dogs = 12,270,000
cats OR dogs = 20,640,000
dogs OR cats = 20,640,000
Yahoo Results
cats = 17,800,000
dogs = 28,100,000
cats OR dogs = 40,100,000
dogs OR cats = 39,500,000
Note that appending terms in OR returns fewer results in Google. Results are almost reduced by half. Should we call this an OR response?
With OR queries, one would expect to get more results, not fewer (as shown in the case of Teoma and Yahoo, and in the first case above with Google). In that first example (termita), Google returns more results when one appends terms in OR mode, and transposition does not seem to matter.
Clearly there are inconsistencies in Google's OR implementation. However, if we go back to the idea of hitting different databases then it would make more sense thinking that OR is partially being implemented depending on which database is being commanded.
Orion
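The expectation argued in the post above can be written as a simple bounds check: for a true Boolean OR, the count for A OR B can be no smaller than the larger of the two single-term counts and no larger than their sum. The sketch below applies that check to the counts reported in this thread. The 1% `tolerance` is an assumption added here to allow for rounded estimates, and the function names are illustrative only.

```python
def or_bounds(count_a, count_b):
    """Return (lower, upper) bounds for |A OR B| given |A| and |B|."""
    return max(count_a, count_b), count_a + count_b


def is_consistent(count_a, count_b, count_or, tolerance=0.01):
    """True if count_or lies within the bounds, allowing a small relative
    tolerance because the engines report rounded estimates."""
    lower, upper = or_bounds(count_a, count_b)
    return lower * (1 - tolerance) <= count_or <= upper * (1 + tolerance)


# (label, count for A, count for B, count for "A OR B") as reported in the thread.
reported = [
    ("Google termita|comejen", 6_830, 1_530, 8_250),
    ("Google car|auto", 436_000_000, 244_000_000, 48_400_000),
    ("Google cats|dogs", 25_100_000, 35_900_000, 16_300_000),
    ("Teoma cats OR dogs", 8_377_000, 12_270_000, 20_640_000),
    ("Yahoo cats OR dogs", 17_800_000, 28_100_000, 40_100_000),
]

verdicts = {label: is_consistent(a, b, both) for label, a, b, both in reported}
for label, ok in verdicts.items():
    print(f"{label}: {'within' if ok else 'outside'} Boolean OR bounds")
```

The check says nothing about why a count falls outside the bounds; it only flags which reported figures cannot be exact Boolean OR counts.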
GoogleGuy
11-21-2004, 04:52 PM
Orion, the total number of available results for an OR query doesn't change; it's just that the order of the words in the query determines how the posting lists get traversed and sampled, which can lead to slight differences in the results estimates. But if you were to find two words [A OR B] such that the number of results was checkable, I believe you'd find that [B OR A] should return the same results. The only differences might be if you were hitting a different data center, but that should be pretty minimal/non-existent.
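To illustrate how an estimate can depend on word order while the true result set does not, here is a toy model. It is not Google's code and makes no claim about how Google samples; it assumes a hypothetical estimator that samples only the first term's posting list and extrapolates the overlap from that sample. The document IDs, sample size, and seed are all made up.

```python
import random


def exact_union_size(postings_a, postings_b):
    """Exact number of documents matching A OR B."""
    return len(set(postings_a) | set(postings_b))


def estimated_union_size(first, second, sample_size, rng):
    """Estimate |first OR second| by sampling only the first posting list.

    The overlap is extrapolated from how many sampled documents of `first`
    also appear in `second`, so the estimate depends on which list is first.
    """
    second_set = set(second)
    sample = rng.sample(first, min(sample_size, len(first)))
    overlap_fraction = sum(doc in second_set for doc in sample) / len(sample)
    estimated_overlap = round(overlap_fraction * len(first))
    return len(first) + len(second) - estimated_overlap


rng = random.Random(2004)  # fixed seed so the run is repeatable
cats = list(range(0, 60_000))        # hypothetical doc IDs containing "cats"
dogs = list(range(40_000, 130_000))  # hypothetical doc IDs containing "dogs"

exact_ab = exact_union_size(cats, dogs)
exact_ba = exact_union_size(dogs, cats)
est_ab = estimated_union_size(cats, dogs, 200, rng)
est_ba = estimated_union_size(dogs, cats, 200, rng)

print("exact:", exact_ab, exact_ba)
print("estimates:", est_ab, est_ba)
```

In this model the exact counts are identical for both orders, while the two estimates generally are not, because each one extrapolates from a different sample.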
orion
11-21-2004, 07:30 PM
Thanks, GoogleGuy. For those like me that have been puzzled from day one by the OR implementation used by Google and other search engine systems this is an interesting query validation subject.
GoogleGuy wrote: "Orion, the total number of available results for an OR query doesn't change; it's just that the order of the words in the query determines how the posting lists get traversed and sampled, which can lead to slight differences in the results estimates. But if you were to find two words [A OR B] such that the number of results was checkable, I believe you'd find that [B OR A] should return the same results. The only differences might be if you were hitting a different data center, but that should be pretty minimal/non-existent."
In this, I agree. As a matter of fact, transposition should not matter when querying in OR for A or B or for B or A, unless we pull results from different databases or if at the time of testing these have been upgraded.
1. Still, it is unclear to me why appending terms in OR returns fewer results in Google when in fact it should return more.
2. On the other hand, even within specific Google sections inconsistencies with the OR implementation arise.
For Google Groups, I get
Google Groups Results
rats = 890,000
pests = 103,000
Insects = 319,000
rats OR pests = 789,000
pests OR rats = 789,000
rats OR insects = 1,080,000
insects OR rats = 1,080,000
Now compare with this other example in Google Groups
kerry = 2,350,000
bush = 11,000,000
kerry OR bush = 3,660,000
bush OR kerry = 3,660,000
In these OR queries, pair-wise transposition does not seem to matter. Still one can see some inconsistencies from the expected results when one appends additional terms.
1. In the first case the results increase
2. in the latter case the results decrease
How does the first case compare with its Spanish counterpart in Google Groups?
ratas = 17,600
plagas = 4,130
insectos = 7,030
ratas OR plagas = 19,600
plagas OR ratas = 19,600
ratas OR insectos = 22,300
insectos OR ratas = 22,300
Appending terms tends to increase the total number of hits. This is what one would expect from appending terms in OR mode when querying a given database. Note that pair-wise transposition does not seem to matter either. Overall, a true Boolean OR query of A or B should return documents containing A or B regardless of the query sequence.
Now appending a third term gives
pests OR rats OR insects = 913,000
plagas OR ratas OR insectos = 24,600
Compared with previous results, appending a third term to the two-term OR queries
1. decreases the number of results in the first case
2. increases the number of results in the latter
Orion
orion
11-22-2004, 09:11 AM
COMPARISON TESTS OF OR IMPLEMENTATIONS BETWEEN SEARCH ENGINES
Teoma Results
cats = 9,863,000
cats OR dogs = 24,720,000
cats OR dogs OR birds = 35,670,000
cats OR dogs OR birds OR rabbits = 37,290,000
Yahoo Results
cats = 17,900,000
cats OR dogs = 39,200,000
cats OR dogs OR birds = 54,400,000
cats OR dogs OR birds OR rabbits = 55,500,000
MSN Results
cats = 3,740,196
cats OR dogs = 8,106,370
cats OR dogs OR birds = 11,129,762
cats OR dogs OR birds OR rabbits = 11,445,538
MSN (BETA)
cats = 30,921,985
cats OR dogs = 31,020,412
cats OR dogs OR birds = 45,556,384
cats OR dogs OR birds OR rabbits = 46,963,360
Google Results
cats = 24,900,000
cats OR dogs = 16,400,000
cats OR dogs OR birds = 15,800,000
cats OR dogs OR birds OR rabbits = 12,800,000
Google Groups Results
cats = 3,440,000
cats OR dogs = 2,870,000
cats OR dogs OR birds = 2,930,000
cats OR dogs OR birds OR rabbits = 2,750,000
Users draw conclusions from what they get. From the user side, the message is that Google's OR implementation is contrary to what one would expect (more results, not fewer, are expected). I'm sure there must be a reason, which I don't have.
This test is aimed at understanding Google's query mechanisms a bit better; OR in this case. GoogleGuy Team, could you shed some light? It will be very much appreciated.
Orion
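The comparison in the post above comes down to one question per engine: does the reported count ever drop when another term is OR-ed in? A minimal sketch of that check follows, using the counts exactly as reported. The helper name `never_decreases` is illustrative.

```python
# "cats", then OR-ing in "dogs", "birds", "rabbits", as reported in the post.
or_series = {
    "Teoma": [9_863_000, 24_720_000, 35_670_000, 37_290_000],
    "Yahoo": [17_900_000, 39_200_000, 54_400_000, 55_500_000],
    "MSN": [3_740_196, 8_106_370, 11_129_762, 11_445_538],
    "MSN (BETA)": [30_921_985, 31_020_412, 45_556_384, 46_963_360],
    "Google": [24_900_000, 16_400_000, 15_800_000, 12_800_000],
    "Google Groups": [3_440_000, 2_870_000, 2_930_000, 2_750_000],
}


def never_decreases(counts):
    """True if each count is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(counts, counts[1:]))


monotonic = {engine: never_decreases(counts) for engine, counts in or_series.items()}
for engine, ok in monotonic.items():
    print(f"{engine}: {'grows' if ok else 'shrinks at some step'}")
```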
orion
11-22-2004, 01:24 PM
VISUALIZING BOOLEAN SEARCHES
For those interested in visualizing Boolean searches through Venn diagrams, I invite you to read Laura Cohen’s Boolean Searching on the Internet (http://library.albany.edu/internet/boolean.html). Her excellent work should clarify any doubts on
1. what is/is not a true Boolean OR implementation
2. the expected effect of OR on the total number of retrieved results
Pipes in MSN
Now, a bit more in line with the original post of this thread. In MSN BETA, the use of pipes is interpreted differently when bounded by spaces. Check this out.
MSN (BETA)
cats OR dogs = 30,921,401
cats | dogs = 30,921,401
cats|dogs = 925,719
“cats dogs” = 925,719 (EXACT mode)
cats dogs = 30,755,988 (FINDALL mode)
WOW! Now this is interesting to point out to new users of MSN BETA.
In MSN BETA we get EXACT search results when we don’t delimit the pipe with spaces. Now let’s compare this with the current OR implementation in MSN SEARCH
MSN SEARCH
cats OR dogs = 8,106,096
cats | dogs = 1,344,349
cats|dogs = 1,348,957
“cats dogs” = 170,749 (EXACT mode)
cats dogs = 1,348,522 (FINDALL mode)
With minor differences, queries with pipes with/without spaces seem to be implemented as FINDALL, also known as AND.
To verify these results, try queries with car insurance.
Orion
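The reasoning in the post above, matching each pipe query to whichever known mode returns the nearest count, can be sketched as follows. This is only an inference heuristic under the assumption that a similar result count implies the same query mode; it cannot prove how either engine parses the pipe. The function name `closest_mode` is illustrative.

```python
def closest_mode(pipe_count, reference_counts):
    """Return the mode whose count has the smallest relative gap to pipe_count."""
    return min(
        reference_counts,
        key=lambda mode: abs(reference_counts[mode] - pipe_count) / reference_counts[mode],
    )


# Reference counts for "cats"/"dogs" as reported in the post.
msn_beta = {"OR": 30_921_401, "EXACT": 925_719, "FINDALL": 30_755_988}
msn_search = {"OR": 8_106_096, "EXACT": 170_749, "FINDALL": 1_348_522}

inferred = {
    "MSN BETA, cats | dogs": closest_mode(30_921_401, msn_beta),
    "MSN BETA, cats|dogs": closest_mode(925_719, msn_beta),
    "MSN SEARCH, cats | dogs": closest_mode(1_344_349, msn_search),
    "MSN SEARCH, cats|dogs": closest_mode(1_348_957, msn_search),
}
for query, mode in inferred.items():
    print(f"{query}: behaves like {mode}")
```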