Change To Link Bomb Sign Of New Link Analysis Shift?


dannysullivan
07-22-2004, 11:10 AM
If you aren't aware, Daniel Brandt of Google Watch fame started (http://searchenginewatch.com/sereport/article.php/3335091#googlebomb) a link bomb to make the page (http://www.google.com/corporate/execs.html) about Google's management come up tops by linking to it with the words "out-of-touch executives." It ranked tops until apparently this week.

Brandt emailed me today claiming that Google must have deliberately changed things like this:

If your search term includes the adjacent two words "touch" and "executives," regardless of which order they are in, then the links from my pages with the "out-of-touch executives" are discounted and do not provide any "juice" for the results you see. However, each of these words by itself provides juice, as long as the two words are not adjacent.
I found two exceptions to this. For example, executives out of touch (http://www.google.com/search?q=executives+out+of+touch) does NOT have the terms adjacent, yet the page is still buried. That runs counter to Brandt's theory.

Meanwhile, google executives touch (http://www.google.com/search?&q=google+executives+touch) does have the words adjacent, yet the page comes up number one. Again, counter to the claim.

This all comes at a time when people are reporting (http://forums.searchenginewatch.com/forum/forumdisplay.php?f=34) a lot of backlink changes at Google. I also noticed yesterday that the Kerry campaign web site is no longer tops for "waffles" on Google, as he has been courtesy (http://searchenginewatch.com/searchday/article.php/3345071) of another link bomb campaign.

That makes me think that the change in Brandt's case didn't happen specifically because Google itself didn't like the publicity but instead because Google's trying some new link analysis changes in general. In fact, it seems like they might be leaning toward doing something that Brandt's wanted (http://searchenginewatch.com/sereport/article.php/3335091#googlebomb), to not let link text count (or count as much) unless the words are actually on the page.

Of course, the miserable failure link bomb (http://searchenginewatch.com/sereport/article.php/3296101) still comes up tops with the Bush bio -- and those words don't appear on that page. That might make you think Google is indeed fixing things selectively, given that major Democrats like Gore have visited Google in the past. But Michael Moore is hanging in there in the top results as well, and he doesn't use those terms on his page.

I think it's most interesting to focus on the word "executives" in Brandt's case. Consider the fact that out of touch management (http://www.google.com/search?q=out+of+touch+management) has also brought that Google page to the top (and still does) courtesy of Brandt's campaign. In fact, the New York Times embarrassingly reported (http://www.nytimes.com/2004/06/22/business/22google.html?ex=1403236800&en=b82d780141be8558&ei=5007&partner=USERLAND) that this was something upset people inside Google had caused, something they later corrected.

So, why does out of touch management still work but out of touch executives doesn't? One complication is that Brandt in the past few days has changed his links to "out of touch management."

Another reason may be that the word "executives" doesn't appear on the Google page in question -- while the word management does, and quite prominently.

Consider also that other searches for prominent words on the page combined with "out of touch" still bring the page up:

out of touch larry (1.2 million matches)*
out of touch sergey (14,200)*
out of touch brin (11,800)
out of touch google (605,000)*
out of touch cindy (619,000)

While these don't:

out of touch president (1.7 million)
out of touch products (5.2 million)
out of touch page (6.5 million)
out of touch technology (2.7 million)
out of touch employees (958,000)*
out of touch engineering (822,000)

It may be that a new system is working where the more popular a word is (in terms of appearing on pages across the web), the more required it is to appear on a particular page for link text to also work with it. In other words, "out of touch sergey" would work because there aren't that many pages with all those words on them, so the link text is allowed more influence. But for "out of touch page," there are lots of pages with those words on them, so link text analysis might be suppressed more.
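
To make that guess concrete, here is a minimal sketch of such a rule. Everything in it is an assumption for illustration only: the function name `anchor_text_counts`, the `common_threshold` cutoff, and the example inputs are hypothetical, and nothing here reflects how Google actually works.

```python
def anchor_text_counts(word_on_page: bool,
                       matching_pages: int,
                       common_threshold: int = 1_000_000) -> bool:
    """Hypothetical rule: decide whether link text may be credited to a page.

    matching_pages   -- how many pages across the web match the query words
    common_threshold -- made-up cutoff above which a query counts as "popular"
    """
    if matching_pages < common_threshold:
        # Rare combination of words: let link text have its full influence.
        return True
    # Popular combination: only credit link text if the page uses the word.
    return word_on_page


# Illustrative inputs only; the on-page flags are not claims about any real page.
rare_query = anchor_text_counts(word_on_page=False, matching_pages=14_200)
common_absent = anchor_text_counts(word_on_page=False, matching_pages=6_500_000)
common_present = anchor_text_counts(word_on_page=True, matching_pages=6_500_000)
print(rare_query, common_absent, common_present)
```

A hard cutoff is the simplest possible version. A real system would more likely shade the influence gradually rather than switch it on and off.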

I doubt it's this simple -- nothing ever is. But I've talked with Google before about the entire link bombing issue, and they have said they're looking at new ways to avoid some of the problems that can happen. This might be the start of that. Or maybe Daniel's right, and they just don't like that management page coming up :)

By the way, those searches I noted with * above? Those all bring up this thread (http://www.webmasterworld.com/forum86/305.htm) from WebmasterWorld where this issue has also been brought up in the past. I thought it was interesting that this page shows up at times when the Google management page doesn't. Key reason that I can see? When it shows up, it has all the words in the query.

I asked Daniel to drop by to share his own thoughts -- and more speculation or observations are welcomed.

rustybrick
07-22-2004, 11:59 AM
Very interesting indeed.

I did some of my own tests on the keyword computers (http://www.google.com/search?q=computers&ie=UTF-8&oe=UTF-8).

Dell ranks #1. The keyword "computer" is not found anywhere within the on page copy, only within the meta information. Look at the Google cache (http://216.239.39.104/search?q=cache:W8wjTkP4L48J:www.dell.com/+computers&hl=en) for Dell on that search, "These terms only appear in links pointing to this page: computers." The allinanchor:computers (http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=allinanchor%3Acomputers&btnG=Search) brings up dell as #1 as well.

So, if I understand what you are saying, Dell should either (1) not rank for computers, or (2) Google looks at the meta information when determining rankings.

dannysullivan
07-22-2004, 12:41 PM
So, if I understand what you are saying, Dell should either (1) not rank for computers, or (2) Google looks at the meta information when determining rankings.
In this case, Google's definitely indexing the text in the meta description tag. Go back to the search on computers, and you'll see the word being highlighted.

That skews things. To explain, let's assume there was absolutely no use of the word "computers" on the Dell page and it was still ranking well, purely on the basis of link text. That has been the case for some pages in the past on Google. It's also why link bombs have worked.

My thought is that Google might have made a change so that for high frequency words, such as computers, link text might only be allowed to be credited to a page if the page itself also made use of that word.

So in this situation, if Dell didn't have the word computers on it, it might not rank well despite all the links pointing at it.

I don't think the case could be that simple. Google might also be trying to group links together, discount them in various ways, look at the frequency of words and so on. Nothing will ever be so simple. But it may be that they are introducing some new ways to counter link bombing.

Here's a different example: failure (http://www.google.com/search?q=failure). The Bush page still ranks tops yet it doesn't have that word on the page at all, according (http://66.102.9.104/search?q=cache:GPN6xA7xUV8J:www.whitehouse.gov/president/gwbbio.html+failure&hl=en) to the cache. And failure on its own is popular -- Google finding over 22 million matches for it. Not all of those matches are pages with the word on them -- but plenty are.

That's completely counter to what I've suggested. Here's a popular word and the link text is still being entirely credited for it for this page despite the fact that the page doesn't use it at all.

So, more may be going on. Here's something else to consider. I get plenty of matches for bush failure or moore failure (moore ranks tops for just failure, as well) -- both over a million. Perhaps part of the link analysis system is to look for the co-occurrence of certain words.

In other words, Google sees lots of links saying "miserable failure" pointing at the Bush home page. Should I trust this link text for either one of the words or both of them? Let me look at ordinary text on pages across the web. Do I see many pages where some of the words on the Bush bio page (like the word "Bush" itself) also appear near the word "failure"? Yep, over 1 million of them. OK, I'll trust the link text for this.

How about Kerry? Well, look for kerry waffles (http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=kerry+waffles), and you get only 19,700 matches. Far fewer pages out there apparently have the words kerry and waffles in close proximity. So, you might trust less the link text pointing at the Kerry campaign site and saying waffles.

SPECULATION WARNING! The above is purely speculation, may be completely wrong and almost certainly would overlook many issues with how links are actually being used. The point is, there are a variety of things you could try to reduce the impact of link bombing but still maintain the advantages of link analysis for finding popular sites on popular topics.
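
In the same speculative spirit, the co-occurrence idea could be sketched as a simple trust weight. The function `anchor_trust` and the `full_trust_at` level are made up for illustration; the only real inputs are the two match counts quoted above (roughly 1 million for bush failure, 19,700 for kerry waffles).

```python
def anchor_trust(cooccurrence_matches: int, full_trust_at: int = 1_000_000) -> float:
    """Hypothetical 0..1 weight for link text.

    The weight grows with how many pages across the web contain the
    link-text word near the target page's own words.
    full_trust_at is an invented level at which link text is fully trusted.
    """
    if full_trust_at <= 0:
        raise ValueError("full_trust_at must be positive")
    return min(1.0, max(0, cooccurrence_matches) / full_trust_at)


bush_failure = anchor_trust(1_000_000)  # "over 1 million" matches
kerry_waffles = anchor_trust(19_700)    # only 19,700 matches
print(bush_failure, kerry_waffles)
```

Under these assumptions the "failure" links keep their full weight while the "waffles" links are heavily discounted, which is the pattern described above.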

Everyman
07-22-2004, 01:32 PM
Meanwhile, google executives touch does have the words adjacent, yet the page comes up number one. Again, counter to the claim.
I see this too. But to confuse things further, google touch executives buries it again (leave in "google" but switch the order of the other two).

Here's another strange one: touch google executives is at number 1, but executives google touch is buried.

Okay, so the adjacency isn't it, and the order does matter. Two strikes against me. But I still insist that it was designed to zap my bomb. The reason "management" works is for the same reason that "miserable president" works. The word "miserable" alone is sufficient to bring up Bush, and the word "failure" alone is sufficient to bring up Bush. Now all you have to do is find a strong word on Bush's page to go with one or the other.

That's the same reason "management" continues to work. Yes, the fact that I changed my links 48 hours ago should strengthen "out of touch management" considerably. But it's been number one anyway ever since the New York Times wrote about it on June 22. I researched the issue then, and only one or two bloggers had accidentally stumbled onto the "out of touch management" one week before the NYT. That's not enough juice. The juice for "management" came from Google's title and headline, and the juice for "touch" came from "out of touch executives."

The way I would phrase it is that while anchor text in links has, for two or more years now, given juice to the target page, suddenly we also have a situation where "negative juice" is also possible. When Google is inclined to rank a page based on external anchor text, perhaps it now has the option of consulting a list of "anti-keywords" found in anchor text for that page. And while I'm forced to agree that it's not as simple as I thought before Danny found his exceptions, I still think Google went after my bomb.

They probably did it before June 22 and weren't even aware that "management" would turn out to be an issue. I wasn't even aware of this until I saw the NYT piece. It's possible that the lists of anchor text and anti-anchor text associated with a page are set fairly early in the update cycle, and it was just too late to throw "management" into the list for Google's page at www.google.com/corporate/execs.html.

I'm wrong in thinking that this is a simple case of adjacency. But the words "touch" and "executives" are so hyper-sensitive for the ranking on Google's page (you're either at number one or you probably aren't even in the top 1000), that I think my claim of being the first Google bomber to get bombed by Google still stands up.

seobook
07-22-2004, 01:45 PM
while it may not be related...

there have been a decent number of people reporting home pages not showing any pagerank while inner pages do

they have been showing a different backlink profile for most sites than what they usually showed in the past

some people have recently been complaining about their sites being dropped for something (this happens all the time so it might mean nothing).

i think showing the different link profile might be done to confuse us while bigger changes are occurring. hopefully we are in for a really fun next couple weeks :)

Everyman
07-22-2004, 04:51 PM
I am reconsidering the possibility that the behavior with respect to "out of touch executives" is related to an extra algorithm layer in Google that was designed to help control spam. I considered this two days ago, but rejected it because there was no corroborating evidence from other webmasters observing their own sites. Since then I've seen a report on another forum from someone who got nailed for a single keyword that he started emphasizing in anchor text just two months ago. In every other respect his rankings are normal, including keywords he's used for months, but he's suddenly having a lot of trouble with this one keyword. It drops his page out of Google almost entirely.

Bear with me, as I take you from the obvious to the speculative.

1. It's obvious that Google maintains a list of keywords for many pages. For the most part, this list finds its way into the inverted index, along with the position of each word on the page. Let's call this the on-page word list.

2. It's also obvious that Google somehow adds to this list from inbound anchor text, regardless of whether the anchor text keywords are on the page. Let's call this the anchor-text list.

3. The third obvious point is that search terms related to either or both the on-page list and anchor-text list cause the page to be considered further for ranking on those words, word pairs, or phrases.

4. Here's where it gets speculative. Let's assume that Google installed an additional list, called the anti-anchor-text list. Only a small fraction of web pages have this list, so it doesn't really slow things down. The algorithm makes a pass at the SERPs page, before displaying it, to see if any of the links about to be displayed to the searcher are flagged as having this anti-anchor-text list.

5. If any of the links are flagged, then the anti-anchor-text list is pulled up for that link. Additional rules are applied to determine whether that link is maintained in the display, or rejected. The rules essentially represent some level of interaction between four things: a) the search terms entered by the user, b) the on-page word list, c) the anchor-text list, and d) the anti-anchor-text list. The rule set could be optimized so that you end up with conditions that appear almost random to us. For example, if you apply the rule set to each word in the search term separately, you then arrive at a yes/no decision for each. If you accept it based on the first word, you go on to look at the second word. This could explain what we're seeing with my Google bomb.

6. Now let's assume that Google developed a way to populate the anti-anchor-text list automatically, by using some sort of link analysis designed to snare spam. This would almost certainly be an off-line crunching sort of thing, like the old-time PageRank calculation that took days to do. Maybe my Google bomb got caught in such an algorithm?
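
Steps 1 through 5 can be sketched as a toy filter. This is only an illustration of the speculation, not a description of Google: the `PageRecord` class, the word lists, and the specific rule (an anti-listed word is excused when the query word just before it appears on the page) are all invented so that the sketch has an order-sensitive rule of the kind step 5 allows.

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """Hypothetical per-page data: the lists from steps 1, 2 and 4."""
    url: str
    on_page: set = field(default_factory=set)      # step 1: on-page word list
    anchor_text: set = field(default_factory=set)  # step 2: anchor-text list
    anti_anchor: set = field(default_factory=set)  # step 4: empty for most pages

    @property
    def flagged(self) -> bool:
        # Step 4: only pages that have an anti-anchor-text list are flagged.
        return bool(self.anti_anchor)


def keep_in_results(query: str, page: PageRecord) -> bool:
    """Step 5: walk the query word by word and make a yes/no decision."""
    if not page.flagged:
        return True
    previous_on_page = False
    for word in query.lower().split():
        credited_by_links_only = word in page.anchor_text and word not in page.on_page
        if credited_by_links_only and word in page.anti_anchor and not previous_on_page:
            return False  # reject the page for this query
        previous_on_page = word in page.on_page
    return True


# Toy data, not the real contents of any page.
execs = PageRecord(
    url="www.google.com/corporate/execs.html",
    on_page={"google", "management", "larry", "sergey"},
    anchor_text={"out", "of", "touch", "executives"},
    anti_anchor={"executives"},
)

for q in ["google executives touch", "google touch executives",
          "out of touch management", "out of touch executives"]:
    print(q, "->", "kept" if keep_in_results(q, execs) else "dropped")
```

Because the decision is made one word at a time, the same words in a different order can give a different answer, which is the kind of near-random-looking behavior step 5 describes.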

The difference between what's happening now and what happened last November, is that this time the crunching is precomputed instead of being done on the fly. This is not a real-time filter (Scroogle is no help at all), but a new level of analysis for anchor text. The flag for whether the anti-anchor-text list exists is checked on the fly, but the anti-anchor-text list itself is precomputed. Essentially, it's a done deal. They could turn off the flag detection, but that would make it all-or-nothing. There's no knob they can turn back slightly.

How does this relate to the fact that so many more backlinks are showing now? Perhaps Google wants to place a layer of fog over this new algo. Perhaps Google feels that they have sufficient potential control now over backlink spam that they can afford to show more information. Perhaps it's unrelated.

How do we confirm this? Webmasters who see a page drop out for their keywords, and who have optimized for these words in inbound anchor text, should play with the placement of each keyword in the search box and see if this changes the results dramatically for the ranking of their page.
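
For anyone who wants to run that test systematically, here is a small sketch that lists every ordering of a set of keywords and builds the matching search URL. The helper names `query_variants` and `search_url` are just illustrative; the URL form is the plain `http://www.google.com/search?q=...` pattern used in links earlier in this thread. Checking where a page ranks for each ordering is left as a manual step.

```python
from itertools import permutations
from urllib.parse import urlencode


def query_variants(words):
    """Return every ordering of the given keywords as a query string."""
    return [" ".join(order) for order in permutations(words)]


def search_url(query):
    """Build a plain Google search URL for the query."""
    return "http://www.google.com/search?" + urlencode({"q": query})


for variant in query_variants(["google", "executives", "touch"]):
    print(variant, "->", search_url(variant))
```

Three keywords give six orderings, so the number of manual checks stays small. It grows quickly with more words (24 orderings for four keywords).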

orion
07-23-2004, 04:54 PM
Since we are dealing with terms co-occurrence and term combinations, a co-occurrence and sequencing analysis may shed some light, at least partially.


Results are only valid for the Google database and may change over time.

QUERY CONDITIONS

TARGET: GOOGLE
DATE/TIME: 07-23-2004 AT 11:00 AM
CASE: INSENSITIVE
MODE: FINDALL (for co-occurrence analysis, only)

Using the semantic tools described in the http://forums.searchenginewatch.com/showthread.php?t=48 thread we obtained the following (see http://www.miislita.com/semantics/c-index-8.html)

CASE I

Query 1
k1=touch n1=33,700,000
k2=executives n2=6,300,000
k12=touch executives n12=369,000
c-index = 9.31 ppt
EF ratio = (292/369,000)*100 = 0.0791%

Query 2
k1=executives n1=6,300,000
k2=touch n2=33,700,000
k12=executives touch n12=370,000
c-index = 9.34 ppt
EF ratio = (17/370,000)*100 = 0.0046%

Query 1 vs. Query 2: CO-OCCURRENCE AND SEQUENCING ANALYSIS
1. Queries show similar degree of co-occurrence (c-indices: 9.31 ppt vs 9.34 ppt).
2. Query 1 shows more degree of sequencing (EF ratios: 0.0791% vs 0.0046%)
3. Query 1 shows more co-occurrence results in EXACT mode (292 vs 17).
4. Query 1 shows more results related to the bomb in the top 10 positions.
5. Query 1 shows documents about the bomb in EXACT and FINDALL modes in the top 10 positions.
6. Query 2 does not show documents about the bomb in EXACT mode in the top 10 positions.
7. Query 2 shows documents about the bomb in FINDALL mode in the top 10 positions.


As a "bomb", Query 1 appears to be more "loaded" than Query 2. In Query 1, some documents retrieved in EXACT mode find their way to the top positions in FINDALL mode.
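
The figures in CASE I can be reproduced with a few lines of code. The EF ratio formula is the one written out above (EXACT count divided by FINDALL count, times 100). The c-index formula is not spelled out in this post, so the version below, n12 / (n1 + n2 - n12) expressed in parts per thousand, is an assumption. It does reproduce the posted 9.31 and 9.34 ppt values. The function names are illustrative.

```python
def c_index_ppt(n1: int, n2: int, n12: int) -> float:
    """Co-occurrence index in parts per thousand (assumed formula).

    n1, n2 -- result counts for each term alone
    n12    -- result count for both terms together (FINDALL mode)
    """
    return 1000 * n12 / (n1 + n2 - n12)


def ef_ratio_percent(exact_count: int, findall_count: int) -> float:
    """Share of FINDALL results that also match in EXACT mode, in percent."""
    return 100 * exact_count / findall_count


# Query 1: touch executives
print(round(c_index_ppt(33_700_000, 6_300_000, 369_000), 2))  # 9.31
print(round(ef_ratio_percent(292, 369_000), 4))               # 0.0791

# Query 2: executives touch
print(round(c_index_ppt(6_300_000, 33_700_000, 370_000), 2))  # 9.34
print(round(ef_ratio_percent(17, 370_000), 4))                # 0.0046
```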


CASE II

Query 3
k1=out-of-touch n1=424,000
k2=executives n2=6,300,000
k12=out-of-touch executives n12=17,600
c-index = 2.62 ppt
EF ratio = (279/17,600)*100 = 1.5852%

Query 4
k1=executives n1=6,300,000
k2=out-of-touch n2=424,000
k12=executives out-of-touch n12=17,600
c-index = 2.62 ppt
EF ratio = (36/17,600)*100 = 0.2045%

Query 3 vs. Query 4: CO-OCCURRENCE AND SEQUENCING ANALYSIS
1. Queries show identical degree of co-occurrence (c-indices: 2.62 ppt).
2. Query 3 shows more degree of sequencing (EF ratios: 1.5852% vs 0.2045%)
3. Query 3 shows more co-occurrence results in EXACT mode (279 vs 36).
4. Query 3 shows more results related to the bomb in the top 10 positions.
5. Query 3 shows documents about the bomb in EXACT and FINDALL modes in the top 10 positions.
6. Query 4 does not show documents about the bomb in EXACT mode in the top 10 positions.
7. Query 4 shows documents about the bomb in FINDALL mode in the top 10 positions.

As a "bomb", Query 3 appears to be more "loaded" than Query 4. In Query 3, some documents retrieved in EXACT mode find their way to the top positions in FINDALL mode.


CASE III

Query 5
k1=out of touch n1=11,900,000
k2=executives n2=6,300,000
k12=out of touch executives n12=312,000
c-index = 17.44 ppt
EF ratio = (279/312,000)*100 = 0.0894%

Query 6
k1=executives n1=6,300,000
k2=out of touch n2=11,900,000
k12=executives out of touch n12=314,000
c-index = 17.56 ppt
EF ratio = (36/314,000)*100 = 0.0115%

Query 5 vs. Query 6: CO-OCCURRENCE AND SEQUENCING ANALYSIS
1. Queries show similar degree of co-occurrence (c-indices: 17.44 ppt vs 17.56 ppt).
2. Query 5 shows more degree of sequencing (EF ratios: 0.0894% vs 0.0115%)
3. Query 5 shows more co-occurrence results in EXACT mode (279 vs 36).
4. Query 5 shows more results related to the bomb in the top 10 positions.
5. Query 5 shows documents about the bomb in EXACT and FINDALL modes in the top 10 positions.
6. Query 6 does not show documents about the bomb in EXACT mode in the top 10 positions.
7. Query 6 shows documents about the bomb in FINDALL mode in the top 10 positions.


As a "bomb", Query 5 appears to be more "loaded" than Query 6. In Query 5, some documents retrieved in EXACT mode find their way to the top positions in FINDALL mode.


CASE II AND CASE III ANALYSIS

1. out-of-touch returns 424,000 results.
2. out of touch returns 11,900,000 results.
3. out-of-touch executives returns 17,600 results.
4. out of touch executives returns 312,000 results.
5. CASE III shows more degree of co-occurrence than CASE II (about 17 ppt vs about 2.6 ppt)
6. CASE II shows more degree of ordering (sequencing) than CASE III (by more than 1 order of magnitude)

It is clear that the above use of hyphens introduces a degree of selectivity in the queries, affecting the end results (fewer documents retrieved). These results also suggest that CASE II and CASE III are different scenarios. In a more general sense, one should expect that the use of delimiters could affect query results, especially queries formulated in EXACT mode. The situation is more complex than it looks, especially when overly loose or generic terms are used ("out", "of", etc.).

EF ratios measure the degree of ordering present in searches conducted in FINDALL mode. In this particular experiment, the EF ratios of Queries 1 - 6 are too small to claim that search engine positioning was the result of ordering. Furthermore, note that in FINDALL mode Query 2, Query 4, and Query 6 produce top 10 results about the bomb, but in EXACT mode no top 10 results about the bomb are obtained. This confirms that the sequences utilized do not play a significant role.

Why then do Queries 2, 4, and 6 produce results about the bomb in FINDALL mode? It is possible that the returned documents may (or may not) be using other types of sequences, optimization strategies or linking techniques, including but not limited to "link bombs".


Orion

jseamless
07-23-2004, 06:27 PM
If google is being bombed for words which do not even exist on page that must cease. I really hope they are implementing this one.

Everyman
07-23-2004, 08:44 PM
If google is being bombed for words which do not even exist on page that must cease. I really hope they are implementing this one.
I agree, but I'm not hopeful. Why start on my puny little bomb? The miserable failure bomb goes to the White House, and John Kerry is even keying an AdWord off of it. (Oops, I answered my own question. Google gets money from it and besides, Eric Schmidt supports Kerry for President.)

I don't understand your point, Orion. Surely the fact that we're observing one particular page (www.google.com/corporate/execs.html) appear variously at number 1 to number 3, or hardly appear at all (number 380 to 1000+), depending on how, where, and when the term "executives" is entered into the search box, suggests that this situation is rather binary in nature. It's very rare to get anything between 3 and 380.

I now think that "executives" is the word that Google went after, and "touch" may not have been targeted. If "executives" is preceded by "google" in the search box then they let it through on its merits, otherwise they zap their /corporate/execs.html page from the results.

For me the binary question is, "Did Google aim a shotgun at spam and my Google bomb got hit by some stray buckshot, or did Google go after "executives" with a sniper's rifle?" I still don't see enough reports of missing pages from other webmasters to convince me that this was a general attack on spam. It's been almost a week, and you'd think that some webmasters would be screaming by now.

I still claim to be the first bomber to get bombed by Google. Danny is eager to find contrary examples (he doth protest too much?), but I still think the weight of the evidence is with me.

Marcia
07-23-2004, 09:49 PM
The thought that came up when first reading the article that prompted the original forum post was that there's a potential difference in effectiveness between a bomb for a phrase that isn't competitive and isn't being targeted at all, and one for which at least some people are making an effort to rank. That difference is useful to see for people who optimize sites.

dannysullivan
By the way, those searches I noted with * above? Those all bring up this thread from WebmasterWorld where this issue has also been brought up in the past. I thought it was interesting that this page shows up at times when the Google management page doesn't. Key reason that I can see? When it shows up, it has all the words in the query.

While it was posted because of reading the article, the first post was deliberately "conservatively" optimized with just a few simple basics to see how it would fare against the bombed page, as stated in message # 7

How much does it actually take to Googlebomb for an obscure phrase, if there's even just a tiny bit of "optimization" and the natural use of language, just in the normal course of conversation?

How many links would it take to move Google out of the #1 position there? My guess is that one, maybe two or three links would do it and they wouldn't all necessarily have to be the exact phrase.

There was very little done in the first post, as also indicated in message #7,

OK, now let's see how hard this is to do.

You'll please notice the title of this thread and the fact that the exact phrase was used in outbound anchor text - and deliberately put in bold font for my own personal amusement.

The rest would be determined by random occurrences in the rest of the thread, as in any normal course of natural conversation, with random density.

As we can see in the Google SERP, where that thread now sits at #1 & #2 indented, the original title, which is in bold except for the stop word, read

Google: Out of Touch Management

which if we look at the current thread we can see was edited to read

Google Management Article

We can also see that there were originally 16 messages, which more than likely gave a good random number of occurrences and density, and spanning two pages explains the indented result - which in people's efforts to get indented results can tell us something about the use of keywords in page titles.

This 16 message thread spans 2 pages: ( [1] 2 ) > >. Google: Out of Touch
Management FUD from the Pittsburgh Post-Gazette. ... out of touch management. ...

There are now only 10 messages left in the thread, so with the change in number of posts, and consequently in the number of words, keyword occurrences and density, combined with the altered title not including the exact phrase, it will be interesting to see how it fares in the near future compared with the original Google page in question, which is now at #5.

What's also interesting to note, regardless of the fact that Google's link: command is now worth less than nothing, is that there is only one rather strange backlink showing up for that thread

Link to Google Management Thread (http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=link:http%3A%2F%2Fwww%2Ewebmasterworld%2Ecom%2Fforum86%2F305%2Ehtm)

It's a given that scoring is based on a combination of off-page and on-page factors, but for all practical purposes what remains as a constant for effort and study in site construction is the many factors that go into on-page and site-wide optimization considerations, since in the long run those are generally more easily controllable by the webmaster.

Everyman:
I still don't see enough reports of missing pages from other webmasters to convince me that this was a general attack on spam. It's been almost a week, and you'd think that some webmasters would be screaming by now.

Daniel, people are screaming about missing or PR0 pages, but not necessarily those who have spam issues. Unfortunately, when the net is put out it manages to catch a lot of squid along with the yellowtail.

dannysullivan
07-23-2004, 09:54 PM
Why start on my puny little bomb?
Actually, you wouldn't be the first targeted even if you were indeed targeted.

After I did the write up on the miserable failure search, I did a follow-up interview with Google on the issue of how they deal with link bombs and the impact on relevancy. This was a couple of months ago and for a piece on relevancy I'm planning for the future.

One thing that was mentioned is that some time ago, Google blocked certain profane words. If I understand it right, you couldn't point at a page with some profane text to force that page to appear.

That's a key reason that the dumb motherf--- search doesn't bring up the Bush campaign store as was the case (http://searchenginewatch.com/sereport/article.php/2163381) ages ago. So if anything was targeted specifically, ironically, it was a link bomb that hurt Bush.

Danny is eager to find contrary examples (he doth protest too much?)
No, I'm not. You emailed me that your link bomb was specifically targeted by Google and offered an explanation as proof. So I checked it out, because it's a serious allegation -- and it was unusual for that listing to have changed. It was rather easy to find exceptions to what you said. I didn't have to be eager to find it.

Also this same week -- and before you contacted me -- I also noticed that the Kerry waffles search had changed. It could indeed be that you and the Kerry link bomb were both targeted because Google doesn't like you dissing their management and they all love Kerry -- and I said that at the very beginning:

Or maybe Daniel's right, and they just don't like that management page coming up :)
OK, adding the smiley face perhaps took away from the statement. But I didn't mean for it to come across like I wasn't taking you seriously. It could very well be that behind the scenes, they're doing something to influence these specific link bombs. I don't know.

It could also be something else -- and the suggestions of this are also worth exploring. That's what I do, Daniel -- try to look around at all the explanations I can find and lay them out for people, so people can make up their own minds. And that's why I'm glad you're also posting any further findings you come up with. The more information, the better.

Everyman
07-23-2004, 11:01 PM
Okay, Danny, I apologize for "doth protest too much." You've been fairly straightforward and objective with me for a year now, and less inclined to accept Google's spin uncritically, and I should give you the benefit of the doubt.

I agree that the implications of Google playing games with the algo, which is what we're basically suggesting by entertaining even the possibility of zapping my bomb, demoting the waffles bomb, and letting the miserable failure bomb fly high with Kerry's AdWord next to it, are quite serious. In fact, if it were possible to prove this with a smoking-gun internal memo or something, it would be a story. A couple of heads would have to roll at Google, assuming it's just some algo engineers doing it for fun. I've already told one reporter that I don't have enough for a story.

By the way, congrats on noticing the waffles thing. That is suspicious, falling to number 16 like that. But if my bomb was demoted to 16 from 1, instead of falling to somewhere between 380 and 1000+, I wouldn't have raised the issue.

Yes, Marcia, webmasters are always complaining about pages dropping out. And yes, bombs that have little competition for the keywords are many orders of magnitude easier to do. But the reason I'm suspicious is that my bomb was going along perfectly since late March, when it rose to number one, up through the last time I checked it, which was -- to the best of my memory -- about two weeks ago.

When I say it was number one from late March, that means over three solid months of number one, without a hint of anything even coming close to challenging its position. Then all of a sudden, it's crushed.

Perhaps it got caught in an anti-spam effort. But it must be something new that Google is doing with spam, otherwise the bomb wouldn't have been doing so well for three months. And why doesn't "miserable failure" get caught in the same anti-spam effort? I don't see anything really different being reported by webmasters over the last few weeks (yes, there's the backlink upheaval, but I mean pages disappearing for keyword searches). If what my bomb experienced was happening across the entire Google index, you'd have threads at WmW like they had last November. I just don't see it, and I've looked on most of the forums for evidence. I've found two or three posts that might apply, but that's not much.

orion
07-24-2004, 12:34 AM
OK, folks, back to the discussion of results...

Everyman (Daniel), the above results I provided were given in a general sense and are valid for the tested cases only. I haven't checked combinations with the "google" or "management" terms (not yet). Including either term may (or may not) change the picture.

On other matters,...

The default search mode in Google is FINDALL (also known as AND) anywhere in the document and without regard for sequence or proximity. Thus, including all terms in a query doesn't mean that the retrieved pages are relevant to the sequence entered in the query box.

On the other hand, contrary to popular opinion, queries in EXACT mode are not searches for phrases. (See Keywords Co-Occurrence thread). These are searches with regard for sequencing and proximity. How proximity is defined varies between search engines and depends on many things, among others on the library of stopwords and delimiters to be ignored by the target system.
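A minimal sketch of the difference between the two modes, in Python. This is a toy model and not how Google actually works: it assumes every non-alphanumeric character is an ignored delimiter and that no stopwords are dropped. The function names are made up for illustration.

```python
import re

def tokenize(text):
    # Assumption: lowercase and split on any run of non-alphanumeric
    # characters, so hyphens and other delimiters are ignored.
    return re.findall(r"[a-z0-9]+", text.lower())

def findall_match(query, doc):
    # FINDALL (AND): every query term appears somewhere in the document,
    # without regard for sequence or proximity.
    doc_terms = set(tokenize(doc))
    return all(term in doc_terms for term in tokenize(query))

def exact_match(query, doc):
    # EXACT: the query terms appear consecutively and in the same order.
    q = tokenize(query)
    d = tokenize(doc)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))

doc = "Executives who are out of touch"
print(findall_match("out of touch executives", doc))  # all terms present
print(exact_match("out of touch executives", doc))    # sequence differs
```

A document can match in FINDALL and still fail in EXACT, which is why the two result counts can be so far apart.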

If one suspects that an exact sequence plays a role, then one must determine the fraction of documents containing the sequence when one searches in the default mode (FINDALL). This is done with EF ratios.

Finally, for the combinations without hyphens and marked with an asterisk by Danny (original post of this thread), these queries should return the www.webmasterworld.com/forum86/305-2-10.htm page or related pages in the top 10 positions. This is what we found.

QUERY CONDITIONS

TARGET: GOOGLE
DATE/TIME: 07-23-2004 AT 9:30 PM
CASE: INSENSITIVE

EF RESULTS

1. out of touch larry *, EF ratio = (9/993,000)*100 = 0.0009%
2. out of touch sergey *, EF ratio = (6/14,400)*100 = 0.0417%
3. out of touch google *, EF ratio = (28/630,000)*100 = 0.0044%
4. out of touch employees *, EF ratio = (11/966,000)*100 = 0.0011%
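For anyone who wants to recompute these, here is a small Python sketch of the EF ratio as used above: the EXACT result count divided by the FINDALL result count, times 100. The counts are the ones listed in this post; the function and variable names are only illustrative.

```python
def ef_ratio(exact_count, findall_count):
    # EF ratio as a percentage: share of FINDALL results that also
    # contain the exact sequence.
    return exact_count / findall_count * 100

# (EXACT count, FINDALL count) as reported on 07-23-2004.
counts = {
    "out of touch larry": (9, 993_000),
    "out of touch sergey": (6, 14_400),
    "out of touch google": (28, 630_000),
    "out of touch employees": (11, 966_000),
}

for query, (exact, findall) in counts.items():
    print(f"{query}: EF ratio = {ef_ratio(exact, findall):.4f}%")
```

Result counts reported by a search engine are estimates, so the ratios are only a rough guide.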

As expected, Results 1, 2, and 3 in FINDALL return the www.webmasterworld.com/forum86/305-2-10.htm #1 or close.

Result 4 in FINDALL shows the www.webmasterworld.com/forum86/305.htm page in position #6 with the ChannelMinds.com site occupying positions #1 and #2.

How could this be possible, considering the few results obtained in EXACT mode? Let's see.

Result 1 in EXACT mode produced 9 results. Using "repeat the search with the omitted results included" reveals many secondary results coming from webmasterworld posts.

Result 2 in EXACT mode produced 6 results. Using "repeat the search with the omitted results included" reveals many secondary results coming from webmasterworld posts.

Result 3 in EXACT produced 28 results but displays only 3. Using "repeat the search with the omitted results included" reveals many secondary results coming from blog posts, especially the TechnologyReview.com site.

Result 4 in EXACT produced 11 results but displays only 7. The webmasterworld.com page in question does not appear in the top 10. In this mode, the ChannelMinds.com site is #1. Using "repeat the search with the omitted results included" reveals many secondary results coming from the channelminds.com site and the rest coming from dissimilar sites.

Finally,...

In Results 1 and 2, the weight assigned to the page ranked #1 in FINDALL mode appears to be supported by "the suppressed" secondary documents found through the "repeat the search with the omitted results included" and containing the exact sequence. In Result 3 TechnologyReview.com site finds its way to the top 10.

It appears that mere word sequencing is not what is playing a role here. However, it does appear to play a role when links/documents are coming from the same domain pointing to a page from the same domain. This may explain why the ChannelMinds.com site is #1 for out of touch employees and why the TechnologyReview.com site finds its way to the top 10 (Result 3).

These results may change over time.

Orion

orion
07-24-2004, 01:26 PM
Not sure if this adds some real-life perspective to the "out of touch management" thing.

According to this news
http://business.bostonherald.com/technologyNews/view.bg?articleid=36942

Dr. Brian Reid is suing Google. In his complaint, Dr. Reid, 54, and I quote, "alleges that the Mountain View, Calif., company fired him as its director of operations last February because he didn't fit in a culture emphasizing ``youth and energy.'' He also claims he was discriminated against because he's a diabetic."

The article mentions that "Page made the final decision to fire Reid, the lawsuit states. Reid said Shona Brown, vice president of business operations, told him he was incompatible with Google's youthful atmosphere. After he left, Reid said he learned he was replaced by someone in their 30s."

Gary has noticed this news in this SEW thread

http://forums.searchenginewatch.com/showthread.php?t=742

along with a bio of Dr Reid (http://justus.anglican.org/reid.html)


Orion

lots0
07-24-2004, 03:42 PM
However, it does appear to play a role when links/documents are coming from the same domain pointing to a page from the same domain.
Might be interesting to take some of these pages that have dropped like a rock and look just at internal links for a moment (for my purposes I am going to define internal links as links originating from the same class C IP).

I wonder if the pages were penalized because of what percentage of the internal links have duplicate anchor text...
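A rough Python sketch of that "internal link" definition, assuming "same class C" simply means the linking host and the target share the first three octets of their IP (a /24 block). The addresses are hypothetical ones from the reserved documentation ranges, and the helper names are made up for illustration.

```python
import ipaddress
from collections import defaultdict

def class_c(ip):
    # Treat the first three octets (a /24 network) as the "Class C" block.
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def group_by_class_c(link_ips):
    # Bucket linking IPs by their /24 block.
    groups = defaultdict(list)
    for ip in link_ips:
        groups[class_c(ip)].append(ip)
    return dict(groups)

# Hypothetical IPs of pages linking to the target page.
link_ips = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]
target_ip = "192.0.2.1"

groups = group_by_class_c(link_ips)
internal = groups.get(class_c(target_ip), [])
print(f"{len(internal)} of {len(link_ips)} links are internal by this definition")
```

Whether any search engine groups links this way is an open question; this only makes the definition concrete.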

Marcia
07-24-2004, 04:53 PM
>>penalized because of what percentage of the internal links have duplicate anchor text...

lots0, I've seen it happen - a problem with a site exacerbated by an excessive percentage of identical anchor text within the site itself.

Everyman
07-24-2004, 05:37 PM
At the peak I used ten links on eight different domains. One of the eight was a dot-com, the other seven were dot-orgs. Each domain had a static IP. Four of them were from one Class C, and four from a different Class C. The anchor text was identical: "out-of-touch executives". Seven of the link pages were PageRank 5 or 6; the other three were lower.

I'm not exactly in the same league as some spammers out there!

And why did it happen suddenly after more than three months? People have been talking about identical anchor text and same Class C for many months now; it's nothing new.

seobook
07-24-2004, 06:38 PM
>>penalized because of what percentage of the internal links have duplicate anchor text...

lots0, I've seen it happen - a problem with a site exacerbated by an excessive percentage of identical anchor text within the site itself.

many sites use the same phrase link using breadcrumb navigation. did the sites that got penalized usually not have many external links pointing into them? what happened when they were penalized? was it just a term penalty or like an out of the index penalty?

orion
07-24-2004, 09:33 PM
I must agree that there is "nothing new under the sun". The method of appending false relevancy to documents with keywords is well documented in this blog

"Attaching Keywords to Any Site"
http://blog.outer-court.com/archive/2004_05_06_index.html#108385851918769310
The original bomb was using http://www.cnn.com/?-gmail-account

Garrett French also discusses the method and tried it here

http://www.webpronews.com/insiderreports/searchinsider/wpn-49-20040514NewGoogleBombingMethodFound.html

French tried with the following delimiters

http://www.google.com/?-teoma-rules
http://www.google.com/#-Teoma-Rules

The technique involves manipulation of delimiters in links and queries and also appears to work with Yahoo.

It is interesting to point out that querying Google with certain delimiters and Daniel's "out-of-touch management" expression brings up Google's www.google.com/corporate/execs.html page. Some delimiters bring nothing and others bring surprising results, as follows (query mode is FINDALL)

1. The following returns nothing

out-of-touch_management
out-of-touch&management

2. The following returns Google's page with many pages in the top 20 positions being quite critical about George Bush's administration. I count this one as a collateral bomb to the Bush administration, since few documents mention Kerry, too.

out-of-touch|management (multiple pipes -eg. ||, |||, |||- produce same results)

This particular query does not return the webmasterworld pages or blogger pages in the top 10 positions.

3. The following returns Google's page and webmasterworld pages in the top results

out-of-touch^management
out-of-touch?management
out-of-touch#management
out-of-touch;management
out-of-touch:management
out-of-touch::management (multiple colons produce same results)

4. The following returns the webmasterworld pages and other blog pages

out-of-touch-management
out-of-touch*management
out-of-touch & management
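For anyone wanting to repeat the test, the variants above can be generated rather than typed by hand. A minimal Python sketch: the delimiter list is copied from the queries in this post, the URL pattern is the plain google.com/search?q= form, and the function names are made up for illustration.

```python
from urllib.parse import quote_plus

# Delimiters taken from the four groups of queries listed above.
DELIMITERS = ["_", "&", "|", "^", "?", "#", ";", ":", "::", "-", "*", " & "]

def variants(left, right, delimiters=DELIMITERS):
    # Join the two halves of the expression with each delimiter in turn.
    return [f"{left}{d}{right}" for d in delimiters]

def search_url(query):
    # URL-encode the query so characters like | and # survive in the URL.
    return "http://www.google.com/search?q=" + quote_plus(query)

for q in variants("out-of-touch", "management"):
    print(q, "->", search_url(q))
```

The results themselves still have to be checked by hand, and they will change over time.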

That bloggers are part of the mix is evident. The following urls appear in top positions in many of the above queries
http://members.cox.net/katheesue/2004/07/out-of-touch-management-google-search.html
www.rugles.com/weblog/archives/000177.html
and many more

So, it appears that SE Bombs are here to stay. Use the above strategy of appending keywords to links pointing to an external site or page, mix well with carefully pre-tested delimiters, link secondary pages with similar tricks, let bloggers link to the primary and secondary pages, and voila. Soon others will be jumping on the bandwagon of the link scheme. Really, not many initial seed links are required to bring a page all the way up to the top and an algorithm all the way down.

Orion

I, Brian
07-25-2004, 08:09 AM
In reply to the first post - the simplest explanation is just that Google devalues links to a page where the majority of the links use the same text. This sort of pattern is indicative of link manipulation - link bombing.

It's been reported for quite some time now that a filter may be in place to devalue links overwhelmingly sharing the same link text (anchor text).

We've been advising our own link-building clients for some time to use 3-5 different variants in link text. We've also been recommending "naturalising" the links by adding non-keyword elements, in case such a filter is expanded to apply not simply to exact text matching, but also to overly overt repetition of matching key words across link variants.

The surprising thing is that anyone should be surprised if and when such practices may be evidenced - especially when we are possibly looking at nothing more complicated than a lowering of the devaluation threshold.

lots0
07-25-2004, 03:08 PM
At the peak I used ten links on eight different domains. One of the eight was a dot-com, the other seven were dot-orgs. Each domain had a static IP. Four of them were from one Class C, and four from a different Class C. The anchor text was identical: "out-of-touch executives".
Seven of the link pages were PageRank 5 or 6; the other three were lower.

I'm not exactly in the same league as some spammers out there!

And why did it happen suddenly after more than three months? People have been talking about identical anchor text and same Class C for many months now; it's nothing new.
Everyman, please note, I said the "percentage" of internal links with duplicate anchor text. You seem to be thinking that the total number of links is what triggers the flag, I contend that it is the percentage of links that have duplicate anchor text that is the trigger for the flag.

From what you described the percentage of identical anchor text from your internal links is 100%. If my theory is correct (yes it is just a theory), your page would have been flagged at about 65% (+/- 6%)
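To make the percentage idea concrete, here is a small Python sketch. It only measures what share of a page's inbound links use the single most common anchor text. The 65% cutoff is the unproven figure from the theory above and not a known Google value, and the function and constant names are made up for illustration.

```python
from collections import Counter

# Assumption: the cutoff from the theory above, not a confirmed value.
THEORY_THRESHOLD = 65.0

def dominant_anchor_share(anchors):
    # Return the most common anchor text and the percentage of links using it.
    counts = Counter(a.strip().lower() for a in anchors)
    text, n = counts.most_common(1)[0]
    return text, n * 100 / len(anchors)

# Ten links with identical anchor text, as described for the bomb.
bomb_links = ["out-of-touch executives"] * 10
text, share = dominant_anchor_share(bomb_links)
print(f"{share:.0f}% of links use '{text}', flagged: {share >= THEORY_THRESHOLD}")

# A mixed profile for comparison (hypothetical anchor texts).
mixed_links = ["blue widgets", "blue widgets", "blue widgets",
               "Acme Widget Co", "click here"]
text, share = dominant_anchor_share(mixed_links)
print(f"{share:.0f}% of links use '{text}', flagged: {share >= THEORY_THRESHOLD}")
```

Note that this counts exact matches only; a filter that also looked at repeated keywords across variants would need a different measure.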

As to why this happened when it did, well all I can say is that after months and months and months of some folks (like me) preaching over and over again about the power of anchor text, google was bound to take some action.

Although people close to the SEO industry have been talking about duplicate or identical anchor text for some time, up until very recently, as far as I know, google had not taken any obvious actions to change the way they calculated the weight of anchor text.

seobook
07-25-2004, 03:45 PM
From what you described the percentage of identical anchor text from your internal links is 100%. If my theory is correct (yes it is just a theory), your page would have been flagged at about 65% (+/- 6%)

what makes you estimate that percentage?

many occurrences on the web likely would naturally fall into that range

I would bet it would be much higher...like closer to 80 - 90%, but then again I am much newer to the web and SEO than you and many other people here are.

lots0
07-25-2004, 05:43 PM
what makes you estimate that percentage?
I did some very quick research on some pages that I control and a few I don't; some dropped like a rock, some did not. The data sample I used was very limited.

I am not fixed on these numbers. I just lost a lot of software and data so what I posted is not backed up by any real research or data, that is why I called it a theory. I guess for the sake of clarity I should have called it a minimally tested theory. ;)

rustybrick
07-25-2004, 06:26 PM
So you're saying that, based on your data, if ~65% of all the anchor text pointing to a page is the EXACT same text, then it will raise a red flag?

seobook
07-25-2004, 06:40 PM
So you're saying that, based on your data, if ~65% of all the anchor text pointing to a page is the EXACT same text, then it will raise a red flag?

he said it was a theory on a limited set of test data...

bethabernathy
07-25-2004, 08:35 PM
I have a website that dropped from 6/10 to 0/10 during this episode and I looked at the back links prior to the drop and this was the deal:

1) 50% were class C backlinks all residing on the same IP address;

2) 25% backlinks were all using the same anchor text;

3) 25% were all using duplicate content surrounding the anchor text.

HOWZ that for a PENALTY???? :cool:

seobook
07-25-2004, 08:43 PM
I have a website that dropped from 6/10 to 0/10 during this episode and I looked at the back links prior to the drop and this was the deal:

1) 50% were class C backlinks all residing on the same IP address;

2) 25% backlinks were all using the same anchor text;

3) 25% were all using duplicate content surrounding the anchor text.

HOWZ that for a PENALTY???? :cool:

were you linking out to any bad stuff?

bethabernathy
07-25-2004, 09:02 PM
No. Nothing like that. :p

Flagstuff
07-26-2004, 04:23 PM
Two observations:
1) Our site recently lost quite a few backlinks with Google - all with one thing in common. We link to those sites. Of all our remaining backlinks, only one is a site we link to. I don't know if it means anything, but it does seem odd.

2) If you search the term, "american flags" on Google, the first two results are a site where the search term does not appear in the text. Google states that the search term only appears in links pointing to the site. The third and fourth results for "american flags" are a site where the search term appears just twice in the text, the title tag is "flags" and there are no meta description or keyword tags. In addition, there are only 61 backlinks.

The first site is a ".org" and the second is a ".edu". Maybe I'm reading too much into this, but because of Google's seeming bias in favor of informational sites over commercial sites, is it possible that part of the equation is that all things being equal, a ".org" or ".edu" site will rank better than a ".com" site?

Golgotha
07-26-2004, 05:02 PM
So you're saying that, based on your data, if ~65% of all the anchor text pointing to a page is the EXACT same text, then it will raise a red flag?

I have been suspicious of this for months, I called it an 'over optimization penalty' and some people thought I was nuts. Perhaps people don't like the term 'over optimization', but call it what you will.

I have a site that ranks in the top 5, if not 1st, in Yahoo, MSN, Inktomi, Teoma, Wisenut and Zapmeta for 4 of my keywords. That's great, but for Google I am nowhere to be found! Here's the real kicker - I was ranked in the top 3 in Google for all 4 keywords for 2 years and then after an update 3 to 4 months ago I was nowhere to be found. It's as if Google flipped a filter switch that says, he is too optimized for this keyword, pull him from the SERPS. They leave your PR alone, they just yank you from their SERPS on the over optimized keyword.

I don't know if it's 65% or what the percentage is, but I am of the belief that there is some sort of 'filter' in place.

seobook
07-26-2004, 05:40 PM
I don't know if it's 65% or what the percentage is, but I am of the belief that there is some sort of 'filter' in place.

I know acronyms are usually bad, but this is one spot where my domain name is great. I can get all sorts of "seo" and "search engine optimization" links to offset each other and do it totally naturally :)

bethabernathy
07-26-2004, 07:22 PM
Even though the site I mentioned above dropped from a PR 6/10 to 0/10 the SERPs remain good. This seems to make me think that while some mechanism is in place to catch some criteria related to anchor text in links the filter also must add in some keyword elimination, where our site doesn't contain those keywords and resultingly the SERP's haven't dropped. The only change is that frightening:

http://www.integratedresourcemgmt.com/images/nopr.gif

:eek:

Marcia
07-26-2004, 11:50 PM
Everyman:

>>>I'm not exactly in the same league as

<===== some spammers out there!

Daniel, that's meeeee! ;)


At the peak I used ten links on eight different domains. One of the eight was a dot-com, the other seven were dot-orgs. Each domain had a static IP. Four of them were from one Class C, and four from a different Class C. The anchor text was identical: "out-of-touch executives". Seven of the link pages were PageRank 5 or 6; the other three were lower.

I've got a site that was riding high at the very beginning for all the targeted keyphrases - still is at Yahoo, but gone for many months at Google. Not so one I fixed, which came back stronger than ever. It had far too few links from independent sites outside of the same C-class (which may or may not have been the issue, the jury's not in on that one yet): only about 4 or 5 total, with the main two word phrase as part of the domain name - not hyphenated. Yet, the site came up in the top 5 for an allinanchor: search for the main keyword phrase, and also for what's actually the most important phrase in the site, a section in a subdirectory. Out of about 1500 pages. How could that happen?


And why did it happen suddenly after more than three months? People have been talking about identical anchor text and same Class C for many months now; it's nothing new.

It isn't so much the number of links with the identical anchor text, it's the percentage, and that's been axiomatic for farther back than last Fall, it just wasn't as widespread and there wasn't such a broad-reaching net spread.

Aside from having the main full phrase in the domain name, which isn't necessarily always the best thing, what I did was Googlebomb myself with my own internal navigation,

Anthony Parsons
07-27-2004, 12:51 AM
I say good on Google if they have changed the algo. IMO, if you stuff them around, then you deserve to be stuffed around yourself.

seobook
07-27-2004, 01:00 AM
I say good on Google if they have changed the algo. IMO, if you stuff them around, then you deserve to be stuffed around yourself.

I say whatever man. the job of any decent aggressive SEO is to stuff them around. we are only kidding ourselves if we think our jobs are to assist search engines with their relevancy.

I am not saying that I necessarily promote the hardest stuff or promote stuff I do not agree with just to make money, but the SEO services that are worth the most money are those who know how to play within the rules but make their clients' sites seem far more important than they really are.

Anthony Parsons
07-27-2004, 01:29 AM
the job of any decent aggressive SEO is to stuff them around

Aggressive SEO is not stuffing them around though, is it?

Using techniques to have something ranked for a term that isn't even relevant is a major stuff around for the engine, because users don't want to see "sex", when they type in "dog". Hence, users stop using the SE and the SE goes down the crap tube. We'd just be going full circle to the good old days.

If the technique is within the SE's guidelines of acceptable use, then it isn't stuffing them around, is it? No.

Displaying results that are relevant for relevant terms, is not stuffing them around either, is it? No.

Providing a "New York property agent" under the term "sydney property agent", for rankings, is stuffing them around once again.

we are only kidding ourselves if we think our jobs are to assist search engines with their relevancy

Well, that's about what I provide for my customers. Who says this guy's website, who has thousands of dollars to waste on getting top rankings for a particular term, is any more important than the guy ranked at #1000? The only difference is, the poor bloke at #1000 probably doesn't have the money, isn't willing to outlay for a return on investment, or simply has no idea about the web. Chances are, it's another site built by a designer with a "there ya go" attitude: my job is done, sorry I forgot to tell you that flash websites just won't rank well by themselves.

I don't seem to have any problem achieving relevant rankings for clients within the SE guidelines, as with many many others here.

Marcia
07-27-2004, 01:36 AM
Awww, have a heart Anthony!

There are people who are nothing but relevant, do nothing that's outside of guidelines, and still get stuffed by Google by getting caught up in filters - which incidentally, is easily reversible if someone knows where the dart hit. It's the easiest thing in the world to do, for example by just running site navigation through Dreamweaver Library items or Templates in the normal course of setting up sites. Loads of innocent people out there have been zapped without ever knowing what hit them and with no evil intentions whatsoever.

Anthony Parsons
07-27-2004, 01:45 AM
Totally agree with you Marcia, just not seobook's analogy.

Marcia
07-27-2004, 03:59 AM
I am not saying that I necessarily promote the hardest stuff or promote stuff I do not agree with just to make money, but the SEO services that are worth the most money are those who know how to play within the rules but make their clients' sites seem far more important than they really are.

Not necessarily. From what I've gathered the ones that *really* make the most money are the crash_and_burn crowd that knows how to go well beyond the rules, and by the time they get hit they've already padded their wallets and are all set to just roll out some more for the next round.

I say whatever man. the job of any decent aggressive SEO is to stuff them around. we are only kidding ourselves if we think our jobs are to assist search engines with their relevancy.

If "decent aggressive SEO" refers to those hit_and_run folks it's one thing, but otherwise I think we'll have to clarify the terms "decent" and "aggressive" and make sure we know what we're referring to. There are "decent" SEOs who are competent and pro-active promoters, but not necessarily utilizing what are commonly known to be "aggressive" techniques. Then there are "aggressive" SEOs, some of whom are decent and others who are far from it. It's all too easy to get tripped up by ambiguity.

seobook
07-27-2004, 04:38 AM
Aggressive SEO is not stuffing them around though, is it?
yes it is. most of our actions emphasize the commercial portions of the web and hide some of the best info the web has to offer. do some research about various prescription drugs and you will see exactly what I am talking about.

Using techniques to have something ranked for a term that isn't even relevant is a major stuff around for the engine, because users don't want to see "sex", when they type in "dog". Hence, users stop using the SE and the SE goes down the crap tube. We'd just be going full circle to the good old days.
most SEOs do not try to grab extremely untargeted traffic because it is not smart marketing.

Well let me just say first that, in that sense Spam has gotten a lot better over the years. You don't really much have people trying to appear for off topic terms as they tended to. You now have people who are trying to be very relevant.
source: http://www.e-marketing-news.co.uk/april_2004.html

If the technique is within the SE's guidelines of acceptable use, then it isn't stuffing them around, is it? No.
With enough time and money I can promote anything in the world within their guidelines. It may be more expensive to do that though. When people deviate from the guidelines it is about finding a functional business model and saving money.

Displaying results that are relevant for relevant terms, is not stuffing them around either, is it? No.
Usually we are promoting commercial stuff so this is just a matter of opinion. Search engines want people to search for information and buy from ads.

Who says this guy's website, who has thousands of dollars to waste on getting top rankings for a particular term, is any more important than the guy ranked at #1000? The only difference is, the poor bloke at #1000 probably doesn't have the money, isn't willing to outlay for a return on investment, or simply has no idea about the web.
And my job is to get clients to rank well in a cost effective manner.

I don't seem to have any problem achieving relevant rankings for clients within the SE guidelines, as with many many others here.
I generally do my best to stay inside the guidelines, but occasionally I go outside of them because it does not make good business sense for me to make it my first priority to stay inside them...especially since they are just scripts and on occasion the logic behind them has errors which can dump my site even if it is perfect.

Everyman
07-28-2004, 05:13 PM
My hypothesis is that in this case, Google zapped the sensitivity of their www.google.com/corporate/execs.html page when the word "executives" is in the search term, except in cases where the word is preceded by a word other than the word "touch."

I don't buy the Class C inspection theory, and I don't buy the requirement that you have to mix up your anchor text -- not when we're talking about a handful of domains, with all but one being dot-org domains.

Google doesn't combat spam as rigorously as you might think. But they are organized enough to do a hand tweak of "executives" because it ticked them off. Posters on SEO forums are inclined toward deus ex machina because it's the geeky thing to do.

Here's a real-time look (http://www.google-watch.org/cgi-bin/goobomb.cgi) at various forms of out of touch executives and out of touch management. I expect, over the next 30 days, that Yahoo will soften its ranking for "executives" because I changed the anchors, and Google will firm up on "management" for the same reason.

If management suddenly drops to zero then I'm sure you will all claim that the algo kicked in again. And I'll claim, once again, that they went for the hand tweak. But if it doesn't drop to zero then you are all wrong, and I'll claim that they're disinclined to do the hand tweak this time, since too many people are watching.

bethabernathy
07-28-2004, 05:26 PM
But then what explains the PR of websites' dropping to 0/10 during this same time period? :)

rustybrick
07-28-2004, 05:30 PM
beth, have your rankings been affected? I think there is a thread about your site. Let's try to keep this thread on the topic of link bombs.

bethabernathy
07-28-2004, 05:37 PM
ooops. sorry. :)

orion
07-28-2004, 10:19 PM
Let's put things in perspective.

1. The above scenarios are dealing with schemes to bring pages up (achieve top positions). One element of the scheme consists in adding false relevancy to target pages by linking to them with links stuffed with specific key phrases. The chosen key phrases do not appear in the target pages. Still, querying Google for the key phrases brings the page up. This way of adding false relevancy is nothing new and has been documented before. So no glory to gain here.

2. The second element of the scheme consists in linking secondary pages to the page containing the spurious key phrase from same IP or domain. This may also include hidden external files stuffed with text links.

3. The third element consists in letting others link to the scheme (eg., bloggers), adding sheer weight to the seed links.

Google's apparent reaction to the scheme: they dropped the pages in question like a rock.

Sites using similar link schemes were also affected. This appears to be the case described by the above posters, but not because they used the first element (appending false relevancy via spurious key phrases) or were part of the blogger network of links. Only they know that. Thus, we may be talking about different scenarios.

In Daniel's case, it appears he bombed Google using the first element of the scheme. Whether he used the other elements or not, only he knows. Me? I take his word at face value. You? I don't know. Google's reaction?: they took action. They have the right to do so. Now the question is: did Google remove his pages manually or not? This is a key question in Daniel's posts.

To be fair to the facts, Daniel states "And I'll claim, once again, that they went for the hand tweak."

Whether or not this is the case, only Google knows. Anything else would be pure speculation. Certainly, this would not be the first time "that they went for the hand tweak" --a similar action was revealed in the famous SearchKing vs. Google case. Remember? So nothing new here and no new glory to gain out of all this from Google, either.

Daniel has raised an important issue, both from the relevancy and business standpoints by -not sure if "exposing" is the right word- using the "out-of-touch" + executives, management,....+ etc.- key phrase. He has added some focus to Google's management page. This is happening at a time when reality is eating Google's management "dream" team, apparently confirming Daniel's allegations not from the link side but literally from the management side. Let's see.

1. First, Google has fired its Director of Business Operation, venerable IT expert Dr. Brian Reid. Dr. Reid alleges age discrimination. Time will tell.

2. Now this news. According to http://www.enn.ie/frontpage/news-9545272.html, the SEC has served Google's general counsel and vice president of corporate development, David Drummond, with an injunction "alleging violation of federal securities laws, including the anti-fraud provisions." The allegations relate to Drummond's work with Irish e-learning company SmartForce, which merged with US-based SkillSoft in 2002 and then admitted to several quarters of accounting problems, prompting an 18-month-long investigation. While this may not affect Google's IPO, it sure will have an impact in Wall Street circles, especially when the company's general counsel is under investigation by the SEC. In an amended S-1 document, Google has no option but to disclose that Drummond received a notice from the SEC on July 20, saying the agency will recommend the injunction. Drummond intends to submit a response to the SEC. He has the right to do so.

The facts: in addition to link spammers, Google has now all the relevant sharks, inside and outside, ...waiting. Welcome to the real big leagues, Google.

Orion

AussieWebmaster
07-29-2004, 12:33 AM
<quote>How do we confirm this? Webmasters who see a page drop out for their keywords, and who have optimized for these words in inbound anchor text, should play with the placement of each keyword in the search box and see if this changes the results dramatically for the ranking of their page.</quote>

An interesting way to use the experience.

There is another aspect that Orion brought up and no one has commented on, and it ties to the recent discussion on hyphens. When you do the hyphenated search for out-of-touch versus the plain out of touch, the drop in SERP numbers is interesting. The count is lower because with the hyphenated search the words all have to be there, together and in order. The non-hyphenated search brings pages with one or all of the terms in any place.

Everyman
07-29-2004, 10:53 AM
Orion: I did number one only. I never even thought about number two. When I started the bomb in late February, I thought number three would be a factor because I thought bloggers, for example, would pick up on it. But no one picked up on it. As it turned out, I didn't need them anyway -- my domains were enough and exceeded all my expectations.

AussieWebmaster: What you say is true of using quotation marks in the search box, but not quite true when using hyphens. For the most part, Google treats hyphens as spaces. However, I agree that there is a difference in the ranking of out of touch (no hyphens) vs. out-of-touch. It probably has to do with when in the algo process the hyphens are stripped out.

Orion: How many stupid columns have you read that start out, "If you do a Google search for blah, blah, you find blah, blah results." It's entirely bogus, and Google knows it. But every column like this sounds like the "Ka-Ching" of a cash register for Google's public relations department, and for all those at Google waiting to cash out their options.

The reason why they don't like my bomb is because these same stupid columnists will use it for a cheap shot, once Google's reputation drops like a rock following the drop in their stock price. Do you think these same columnists will bother to find out why Google's execs.html page shows up as number one? No way. These same columnists can't even figure out the difference between using quotation marks in the search box and not using quotation marks.

(I believe Google is partly to blame. Their help system for searchers could be a lot more informative, creative, and interactive. Their "total hits" numbers could be a lot less inflated. They could all afford to have a less bloated image of themselves at the Googleplex.)

My bomb is a taste of their own medicine. They'd call it "blowback" at the CIA, where for decades during the Cold War they were manipulating press coverage of world events.

The reason Google doesn't like my bomb is that it reminds them of what they themselves have been doing in terms of spin and bogus public relations. It's much more powerful than mere spam, which Google has been tolerating for a long time now.

orion
07-29-2004, 11:52 AM
To Aussie and Daniel:

Google does not always treat hyphens as mere spaces, especially if the hyphens are part of an obvious natural language expression. In post #7 we did a quick analysis of hyphens and these were the results (CASE II AND CASE III).

CASE II AND CASE III ANALYSIS (The query mode was FINDALL)

1. out-of-touch returns 424,000 results.
2. out of touch returns 11,900,000 results.
3. out-of-touch executives returns 17,600 results.
4. out of touch executives returns 312,000 results.

We concluded as follows:

"It is clear that the above use of hyphens introduces a degree of selectivity in the queries, affecting the end results (less documents retrieved). These results also suggest that CASE II and CASE III are different scenarios."

If Google treated hyphens as spaces all the time, there would be no reason for the huge difference in the sets returned (thousands vs. millions). As for the drop in results, it is clear that a different set of results is returned, with some documents occurring in both sets. We are currently investigating why this is happening and which selectivity criterion is influencing the results, aside from the natural occurrence of hyphens.

To Daniel:

1. "Orion, How many stupid columns have you read that start out, "If you do a Google search for blah, blah, you find blah, blah results."

Daniel, honestly, a lot.

2. "Do you think these same columnists will bother to find out why Google's execs.html page shows up as number one? No way. These same columnists can't even figure out the difference between using quotation marks in the search box and not using quotation marks."

Daniel, while some columnists may know the difference, not all columnists seem to know the difference or don't understand the topic or even the effect of hyphens in natural language and in forced expressions. I must agree with you here on a case-by-case basis.

Orion

AussieWebmaster
07-29-2004, 04:12 PM
I think you will find that there is a difference between the spider that collects for the database and what it sees and how it filters and the actual mechanics of the search engine itself.

What we are seeing here is that when you use hyphens in a search, the search engine does not see the hyphens as just spaces but also as a condition - that the words must all be on the page and in the same order as in the search.

orion
07-29-2004, 05:56 PM
Well put, Aussie. But Daniel has raised the issue of hyphens being the same as spaces, or being interpreted as spaces by Google. The following information may shed more light on the hyphenation issue, spider or not, and in the process on parsing issues in connection with hyphenation.

I have mentioned that for a given query and queried database, the results obtained in EXACT mode are a subset of the results obtained in FINDALL mode. Unlike FINDALL searches, EXACT searches are searches with regard for ordering and proximity. I also have mentioned that the use of hyphens introduces a degree of selectivity, too. However, we haven't addressed the question of the effect of hyphens occurring in natural language expressions and in forced expressions. By "forced expressions" we mean expressions in which hyphens are included contrary to proper copy style, average usage or in an arbitrary fashion.
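
To make the two modes concrete, here is a minimal Python sketch over a toy in-memory collection. The function names (`tokenize`, `findall_match`, `exact_match`) and the sample documents are hypothetical illustrations, not Google's implementation, and EXACT is modeled simply as the query terms appearing adjacent and in order.

```python
import re

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit.

    Note: this toy tokenizer treats hyphens as plain separators.
    """
    return re.findall(r"[a-z0-9]+", text.lower())

def findall_match(query, doc):
    """FINDALL (AND): every query term occurs somewhere, in any order."""
    doc_terms = set(tokenize(doc))
    return all(term in doc_terms for term in tokenize(query))

def exact_match(query, doc):
    """EXACT: the query terms occur adjacent and in the given order."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Hypothetical sample collection
docs = [
    "computer based instrumentation for labs",
    "instrumentation that is based on a computer",
    "a computer course",
]
query = "computer based instrumentation"
findall_hits = [d for d in docs if findall_match(query, d)]
exact_hits = [d for d in docs if exact_match(query, d)]
```

In this toy collection every EXACT hit is also a FINDALL hit, which is the subset relationship described above.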

We conducted the searches in FINDALL (default in Google, also known as AND) and EXACT (in Google, same as using quotes) modes. We also conducted the search removing the hyphens. The following results were obtained. (For additional information, to learn what an EXACT and FINDALL search mean -or do not mean-, visit the "Keywords Co-Occurrence and Semantic Connectivity" SEW thread. The following results were added today to my research site. Check link in post #7 of this thread)

NATURAL SEQUENCES

FINDALL MODE
1. computer based instrumentation, 1,010,000
2. computer-based instrumentation, 67,800
EXACT MODE
3. computer based instrumentation, 1,870
4. computer-based instrumentation, 1,870

FORCED SEQUENCES

FINDALL MODE
5. computer-based-instrumentation, 1,870
EXACT MODE
6. computer-based-instrumentation, 1,870

It is clear that
1. Google does not always interpret hyphens as spaces, as can be seen from the corresponding sets of results.
2. Results obtained in EXACT mode should be subsets of the corresponding results in FINDALL mode.
3. Results obtained in queries 5-6 should be a subset of the results obtained in query 1 (1,010,000 results).
4. Hyphenation in queries 5 and 6 is invalid from the copy style standpoint.
5. Hyphenation in queries 2 and 4 is valid from the copy style standpoint.
6. In query 2, being a natural occurrence, computer-based is considered a single term (67,800 results only). This degree of ordering is well discernible from the degree introduced by EXACT searches.
7. Queries 3-6 return identical results since the sequence is interpreted as a single queried term.

The degree of ordering introduced in the query in FINDALL mode and due to hyphenation is not always discernible from the degree of ordering introduced by EXACT queries, especially with short queries.

NATURAL SEQUENCES

FINDALL MODE

1. computer based, 15,700,000
2. computer-based, 3,330,000
EXACT MODE
3. computer based, 3,370,000
4. computer-based, 3,370,000

Discernibility is even worse with short and forced sequences

FORCED SEQUENCES

FINDALL MODE

1. ambiguous hello, 51,700
2. ambiguous-hello, 22
EXACT MODE
3. ambiguous hello, 22
4. ambiguous-hello, 22

The effect of hyphenation, or what is considered "natural" or "forced", is not always a black-and-white thing. It can be affected by cultural usage and geographic location. It also depends on the hyphenation rules used by the queried IR system. As pointed out in this article about hyphenation (http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen): "Hyphenation styles are culturally-determined, and the same language may be hyphenated differently in different countries - for example, British and American styles of hyphenation of English are very different."

For an SEO, whether or not these results are relevant in other scenarios (e.g., hyphens in titles, URLs, meta data, etc.) depends on how the target system interprets hyphens. In the case of Google, this is quite predictable. The above numerical results may change over time.

Orion

orion
07-30-2004, 07:43 PM
<quote>What we are seeing here is that when you use hyphens in a search, the search engine does not see the hyphens as just spaces but also as a condition - that the words must all be on the page and in the same order as in the search.</quote>

Aussie, this would only be true for the terms that are hyphenated in the query since they are interpreted as a single term.

A search in FINDALL is a search without regard for sequence or proximity. Using hyphens with queries in FINDALL mode introduces a degree of selectivity. This effect is not the same behavior observed in EXACT mode, as we have demonstrated in previous posts. With long queries the difference is well discernible. With short queries it is not, and the result can be mistaken for an EXACT sequence.

A search for computer-based instrumentation in FINDALL mode is interpreted as Query= k1 + k2 and returns results in which k1=computer-based must be present in that sequence (with or without hyphen in the set of retrieved results). However the k2= instrumentation can be anywhere in the document, before or after k1 and without regard for proximity.
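
The interpretation above can be sketched in Python as follows. This is a toy model under stated assumptions: `parse_query`, `contains_sequence` and `matches` are hypothetical names, a hyphenated group is treated as a sequence that must appear adjacent and in order, and all groups are ANDed together. It illustrates the observed behavior and is not Google's actual code.

```python
import re

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query):
    """Split on whitespace; each hyphenated chunk becomes one ordered group."""
    return [tokenize(chunk) for chunk in query.split()]

def contains_sequence(doc_tokens, seq):
    """True if seq occurs in doc_tokens as a contiguous, ordered run."""
    n = len(seq)
    return any(doc_tokens[i:i + n] == seq for i in range(len(doc_tokens) - n + 1))

def matches(query, doc):
    """FINDALL over groups: every group must occur, in any position."""
    doc_tokens = tokenize(doc)
    return all(contains_sequence(doc_tokens, group) for group in parse_query(query))

# Hypothetical sample documents
doc_a = "Instrumentation that is computer based"
doc_b = "A computer lab with PC based instrumentation"
```

Here `matches("computer-based instrumentation", doc_a)` holds even though k2 comes before k1, while `doc_b` only matches the unhyphenated query, because its "computer" and "based" are not in sequence.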

We conclude that term co-occurrence and sequencing, like the degree of selectivity introduced by hyphens, are relevant to copy style. The fact that hyphenation rules differ between target IR systems, and the fact that hyphenation is culturally oriented, have serious implications for copy style. SEO and SEM specialists may need to revisit this subject carefully.

Orion

AussieWebmaster
07-30-2004, 08:42 PM
Thanks mate. I always enjoy reading your posts.
I think I am actually learning something. Now I have to find ways to use it.

Everyman
08-03-2004, 12:55 AM
My observations and interpretations about this topic have been enshrined at Google Watch: Google defuses a Google bomb (http://www.google-watch.org/gbomb.html).

orion
08-03-2004, 12:04 PM
Daniel, you have elegantly exposed the major flaws of rank models based on link citation: they are way too easy to manipulate.

Now, why Google appears to selectively defuse some bombs and not others speaks volumes about them. If one wants to be fair, one should be fair all the time, in all cases, not in a selective manner.


Orion

withoutwax
08-13-2004, 11:21 AM
<quote>If one wants to be fair, one should be fair all the time, in all cases, not in a selective manner.</quote>

Assuming the playing field is level, i.e. if webmasters were ignorant of how to manipulate it to improve their rankings, then what you say is true. However, if they are not, then it seems fair to be selective in an attempt to restore the balance.

I, Brian
09-09-2004, 07:33 AM
Didn't the hyphens issue only really start playing a role after the Florida Update?

The Googlewatch article does make an interesting argument regarding the removal of the "out of touch executives" bomb, while the Jewwatch bomb remains.

Everyman
09-09-2004, 08:10 AM
Jewwatch is at number two now, where it has been for months. The "waffles" bomb now has johnkerry.com at number two. I checked it a few weeks ago and it was back at number one. It's true that it had dropped to 16 or so in mid-July when Danny Sullivan mentioned it, but this was a temporary glitch.

The miserable failure and French military victories are still at number one.

My out-of-touch executives and out-of-touch management haven't moved a bit since mid-July, neither on Google nor on Yahoo. I'm still number one for both on Yahoo, which sort of surprises me since I changed my anchor text from "executives" to "management." I check it every day with my bomb tool.

I remain convinced that Google tweaked the sensitivity to "executives." All the issues I brought up in July are still true. There have been no update movements of any consequence for the results I've been watching since July. For "executives" to drop off the radar like it did all of a sudden in mid-July had to be a tweak at the Googleplex. It's just that Google neglected to tweak "management" at the same time, and at this point it would be too conspicuous to go back and correct this oversight.

I'm still the first Google bomb that got defused by Google. Now all I have to do is convince Danny, and it will be official.

orion
09-09-2004, 10:12 AM
<quote>Assuming the playing field is level, i.e. if webmasters were ignorant of how to manipulate it to improve their rankings, then what you say is true. However, if they are not, then it seems fair to be selective in an attempt to restore the balance.</quote>

The issue here is not whether Google can run its web property as it pleases. This is more a credibility issue; doing one thing in private, but then... I would assume that now they are public, things should change for the better.

About hyphens.

Hyphen mapping, or delimiter mapping in general, depends on the parsing rules used by a SE. We have completed research work on hyphenated queries and will soon present results. It is clear to me that some (not all) search engines have been interpreting hyphens in the same way as Google interprets this and other delimiters. Here is a hint. Some IR systems interpret hyphens as "split" tokens, others as "join" tokens. Still others interpret hyphenated queries in other manners. I will soon post the results.

In my personal view, Everyman's work points out the obvious with regard to Google's link-based model.

Orion

lots0
09-15-2004, 03:12 AM
In regards to the hyphen - Over time Google has changed how it treats the hyphen and the underscore. I believe that Google currently uses the hyphen as what Orion calls a "join" token for most English queries; in the past Google has treated the hyphen as a “split token” (space). I also believe that geo targeting and local grammar rules are now being played with a little by the googlies, so for some of the baby googles different rules apply.

Orion, I hope you do make your hyphen research public, I for one would love to see it.

-Different Subject-
In this thread I have read several references to SEOs creating “false relevance” or exaggerated relevance in pages so that the pages will rank better. I do not believe that there is such a thing as “false relevance”, something is either relevant or it is not and I don’t believe that you can exaggerate relevance. Relevance, the amount of relevance or the lack thereof (in this context) is totally, completely and ONLY in the mind of the person making the query.

I think that the job of any good SEO is to create a page that truly IS the most relevant (to the search engines and to the searcher) for the targeted keywords. If the searcher does not find your page relevant the chances of making a sale or a conversion drop to almost nothing and if the search engine does not find your page to be relevant, the searcher will never find it in the first place.

lots0
09-15-2004, 03:23 AM
<quote>My observations and interpretations about this topic have been enshrined...</quote>
I added the bolding.

Strange choice of words, in my opinion....

orion
09-15-2004, 11:51 AM
Hi, Lots0

About hyphens

Here is a preview of what we have. Hyphens in Google are interpreted like a join token in the following sense: hyphenated queries introduce a degree of ordering to a query mode. So let's say we search in the default FINDALL mode and add hyphens to some portions of the query. Then the hyphenated portion becomes what I call a "localized EXACT" mode within the FINDALL mode. This effect causes the system to return fewer results. For short queries the effect is less dramatic, depending on the degree of hyphenation in both queries and documents. In most cases a two-word phrase, when hyphenated and queried in FINDALL, is interpreted as a query in EXACT mode. I'm working on several R&D projects and this one is in its last stages. Since I have to deal with the other projects, I haven't found time to complete it, but a hint is given on my research site.

About false relevance.

You're right. What is relevant is something that a user decides, which makes it in the domain of human perception.

The "false relevance" above refers to the scoring of relevance by SEs (relevance ranking is the common technical term), so the expression "false relevance" implies "false scoring". When one tricks an IR system into assigning a score to a phrase or content, it is about gaming the relevance scoring. So the expression "false relevance" is valid in this context. It is in this context that I used the expression and not in the context of human perception.

I hope this helps to clarify the above.

Orion

lots0
09-17-2004, 01:05 AM
<quote>The "false relevance" above refers to the scoring of relevance by SEs (relevance ranking is the common technical term), so the expression "false relevance" implies "false scoring". When one tricks an IR system into assigning a score to a phrase or content, it is about gaming the relevance scoring. So the expression "false relevance" is valid in this context. It is in this context that I used the expression and not in the context of human perception.</quote>

Playing devil's advocate:
If someone is, as you say, “gaming” the relevance scoring (increasing the relevance factors), then as a point of fact would you not also be increasing the actual relevance of the page (as determined by the SE algo), thus making the page truly relevant, in the eyes of the SEs, for the query?

“False Scoring”? If you increase the relevance score, is not the page by default more relevant?

orion
09-17-2004, 10:41 PM
About hyphens and Google. Our research suggests that hyphens act as a localized EXACT mode within a FINDALL mode. Because of this, many think that Google interprets hyphens as spaces. We have found many instances in which this is not the case. In the particular case of a hyphenated two-term query, the query in FINDALL mode tends to work as a query in EXACT mode.

About the relevance part. There are many ways to game a scoring system, especially those based on link metrics. Daniel Brandt's link bomb is a good example. One can force a link-based scoring system to assign false weight to a document for a given query even when the queried terms are not present in the document, or when the query does not even relate to the content of the document.
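
As a toy illustration of that last point, the Python sketch below scores a page purely by how many inbound links carry the query terms in their anchor text. The scoring rule, the names `anchor_score` and `on_page`, and the sample page text and anchors are all assumptions made for illustration; real link-based ranking is far more involved.

```python
import re

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def anchor_score(query, inbound_anchors):
    """Count inbound links whose anchor text contains every query term."""
    terms = set(tokenize(query))
    return sum(1 for anchor in inbound_anchors if terms <= set(tokenize(anchor)))

def on_page(query, page_text):
    """True if every query term occurs in the page's own text."""
    return set(tokenize(query)) <= set(tokenize(page_text))

# Hypothetical target page and inbound anchor texts
page_text = "Corporate information: our management team"
anchors = ["out-of-touch executives"] * 30 + ["management team"]
query = "out-of-touch executives"

score = anchor_score(query, anchors)   # weight earned from links alone
present = on_page(query, page_text)    # whether the page itself uses the terms
```

In this sketch the page earns its whole score from anchor text while `on_page` reports that the page never uses the queried terms, which is the "false scoring" situation described above.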


Orion

Lance Housley
09-20-2004, 12:01 PM
When I'm teaching a course on how to search Google I usually observe that hyphenating search terms is equivalent to putting them in quotation marks as a phrase search. However, it's not really that simple. For the purposes of the research reported below, I have assumed that the highlighting of terms in the SERPS is accurate, and really does reflect what Google has matched...

If I search for the phrase "apple pie" using the quotation marks, then Google matches documents which have apple pie as adjacent words, and also documents that have apple-pie as a hyphenated term in the text or the title. It does not seem entirely consistent about matching apple-pie where it occurs in the URL - some get highlighted, others do not. It does not match applepie where that occurs as a single word in the text or in the URL.

If I search for the hyphenated term apple-pie then Google matches documents which have apple pie as adjacent words, and also documents that have apple-pie as a hyphenated term, and also documents that have applepie as a single word. It also seems fairly consistently to match occurrences of apple-pie and of applepie in the URL.

If I search (using quotation marks as well as a hyphen) for "apple-pie" then Google matches documents which have apple pie as adjacent words, and also documents that have apple-pie as a hyphenated term in the text or the title. It does seem rather more consistent about matching apple-pie where it occurs in the URL. It does not match applepie where that occurs as a single word in the text or in the URL.


Finally, if I search for applepie as a single term, Google matches ONLY those documents where the single word appears.


My conclusions, based on these very simple tests, are that Google treats the hyphen as a "split token" when indexing a page, but as both a "split token" AND a "join token" when conducting a search. Nonetheless, Google does seem a little less sure about how to match sub-strings in the URL.
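
One way to model these conclusions is the rough Python sketch below. The names `index_tokens`, `query_variants`, `matches` and `hits` are hypothetical, and the rules merely encode the four text-matching observations in this post (URL matching is left out); they are not a description of Google's internals.

```python
import re

def index_tokens(text):
    """Index time: the hyphen acts as a split token."""
    return re.findall(r"[a-z0-9]+", text.lower())

def query_variants(query, quoted=False):
    """Query time: an unquoted hyphenated term is tried split AND joined."""
    split_form = index_tokens(query)
    variants = [split_form]
    if "-" in query and not quoted:
        variants.append(["".join(split_form)])
    return variants

def contains_sequence(tokens, seq):
    """True if seq occurs in tokens as a contiguous, ordered run."""
    n = len(seq)
    return any(tokens[i:i + n] == seq for i in range(len(tokens) - n + 1))

def matches(query, doc, quoted=False):
    tokens = index_tokens(doc)
    return any(contains_sequence(tokens, v) for v in query_variants(query, quoted))

# Hypothetical sample documents, one per spelling
docs = {"spaced": "hot apple pie", "hyphenated": "hot apple-pie", "joined": "hot applepie"}

def hits(query, quoted=False):
    """Names of the sample documents matched by the query, sorted."""
    return sorted(name for name, text in docs.items() if matches(query, text, quoted))
```

With these rules, `hits("apple-pie")` returns all three documents, the quoted forms return only the spaced and hyphenated ones, and `hits("applepie")` returns only the joined one.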

orion
09-20-2004, 01:24 PM
Hi. Lance.

Precisely, that's what our research reveals. The reason is that hyphens act as localized EXACT mode (same as using double quotes). However, note that a search in EXACT mode is not exactly a search for phrases (see previous post at this thread or at the Keywords Co-occurrence thread), but a search with regard for order and proximity.

For a two-term hyphenated query in FINDALL mode of the form k1-k2, the query is often interpreted by Google as a query in EXACT mode; thus the total number of results should be almost the same. It is this effect that makes people believe that Google interprets hyphens as spaces.

However, for k1-k2 + k3 in FINDALL mode (a search without regard for order or proximity), k3 can appear before or after k1-k2, while k1 and k2 often appear as prescribed by the EXACT mode. The total number of results will now differ from that of a query of the type k1 + k2 + k3. This dissimilarity effect increases with the size of the query.

I will soon publish the entire research work.

Orion

Lance Housley
09-21-2004, 05:48 AM
<quote>I will soon publish the entire research work. Orion</quote>

I'm looking forward to it, Orion, but would make one plea -

Most of what is written in these forums (?fora) is written from the point of view of the website creator and often seems irrelevant to the searcher. Of course, in reality it is not irrelevant at all, since it impacts on the results.

So when you write up your research, do please remember that the results could be (perhaps ought to be?) of enormous interest to those who retrieve as well as to those who create...

Lance Housley
09-21-2004, 06:16 AM
<quote>For the purposes of the research reported below, I have assumed that the highlighting of terms in the SERPS is accurate, and really does reflect what Google has matched...</quote>

I really ought to point out the flaw in my reasoning in this post - it is not entirely true that one can rely on the highlighting in Google's SERPS to show what Google has matched.

For example, if I try a search for "arvo part" I get about 24000 results. But pretty close to the top (currently the 5th result) is a link to Classical Net - Basic Repertoire List - Pärt at www.classical.net/music/comp.lst/part.html (http://www.classical.net/music/comp.lst/part.html) where the phrase arvo pärt occurs a good number of times but the unaccented phrase occurs absolutely nowhere in the text, and only once in the URL of an outbound link.
(One might suppose this link to indicate that the linked-to page is actually more relevant than the page I found, and probably should not be taken as evidence of the found page's value at all. Fortunately Google is too clever to accept that supposition.)

However, Google has obviously matched something. The simple fact that the page figures in my results demonstrates that. But nothing is highlighted since the phrase I searched for does not actually occur. I draw the conclusion that the highlighting cannot be taken to reliably show what has been matched in all circumstances.

Meanwhile, I'm still trying to fathom what, if anything, Google does about cross-matching accented characters with non-accented ones.
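
For what it's worth, one common way an IR system can cross-match accented and unaccented forms is to fold diacritics at both index and query time. The Python sketch below uses the standard `unicodedata` module; `fold_accents` and `folded_phrase_match` are hypothetical helpers, and whether Google does anything like this is exactly the open question here.

```python
import unicodedata

def fold_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def folded_phrase_match(query, doc):
    """Case-insensitive substring match after folding accents on both sides."""
    return fold_accents(query).lower() in fold_accents(doc).lower()

# Hypothetical page text containing only the accented form
page = "Basic Repertoire List: works by Arvo Pärt"
plain_match = "arvo part" in page.lower()
folded_match = folded_phrase_match("arvo part", page)
```

A literal comparison misses the accented phrase, while the folded comparison finds it. A highlighter that works on the literal text would then have nothing to highlight, even though the page was matched.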

orion
09-21-2004, 11:14 AM
<quote>So when you write up your research, do please remember that the results could be (perhaps ought to be?) of enormous interest to those who retrieve as well as to those who create...</quote>

Hi, Lance.

Good point, Lance. I must do that.

Often those who design and those who search have different degrees of knowledge about querying. In my view, once one understands the difference between EXACT searches, phrase searches and exact matches, things are a lot clearer. Not all SEs interpret these in the same way, and some make no distinction between them.

In the case of Google, an EXACT mode search (using quotes) is definitely not a phrase search, but a search with regard for order and proximity. This in great part may explain your example, above (k1=apple, k2=pie). The proximity part is affected by which delimiters, symbols and stopwords are to be ignored (stopword and regexp library). This also explains why EXACT searches (with quotes) in Google return documents with k1 and k2 separated by certain symbols, delimiters and stop terms. From the user standpoint the result text may not look like a "phrase", but still is a match with regard for order and proximity (k1 then k2).
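
A minimal Python sketch of "order and proximity" matching in which ignorable tokens do not count toward the distance between k1 and k2. The stopword set and the names `significant_tokens` and `ordered_proximity_match` are assumptions for illustration only; the actual stopword and delimiter lists used by any engine are not known here.

```python
import re

STOPWORDS = {"a", "an", "and", "of", "the", "with"}  # hypothetical list

def significant_tokens(text):
    """Split on delimiters and symbols, then drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def ordered_proximity_match(query, doc):
    """True if the query terms appear in order and adjacent once
    ignorable tokens have been removed from both query and document."""
    q, d = significant_tokens(query), significant_tokens(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))
```

Under these assumptions a quoted search for apple pie would accept "apple/pie recipes" and "apple and the pie", neither of which reads like a phrase, yet would reject "pie apple" (wrong order) and "apple crumble pie" (a significant term in between).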

I'll also discuss the highlighting part, which in some engines is just a usability feature, commonly mistaken for a phrase match.

Orion

Everyman
02-27-2005, 05:00 PM
Final proof that Google did a hand job on my "out-of-touch executives" Googlebomb last July. This is a report on how it ranks as of February 27. It's been a year now since this Googlebomb was first launched. It was doing extremely well in Google until it disappeared suddenly in July. There has been lots of churn in the rankings since July, so it's time to re-examine Danny's suggestion that the disappearance of my Googlebomb was merely a "sign of new link analysis shift."

There are two forms of the Googlebomb. One is "out-of-touch executives" and the other is "out-of-touch management." The second form is largely derivative of the first, because the word "management" is in Google's title for the target page. My argument was that Google's page at www.google.com/corporate/execs.html was desensitized for this bomb.

I took all my Googlebomb links down last July, but the bomb lives on through other links on forums and blogs. Each of the two forms of the bomb was tested with three variations. One is with quotes around the phrase plus the hyphens in "out-of-touch," another is no quotes but using the hyphens, and the third is no quotes and no hyphens.

That gives you two forms and three variations per form, for a total of six tests.
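
The six queries can be enumerated mechanically. A small Python sketch follows; the helper name `variations` is hypothetical, and the query strings are built from the descriptions above.

```python
forms = ["out-of-touch executives", "out-of-touch management"]

def variations(phrase):
    """Quoted with hyphens, unquoted with hyphens, unquoted without hyphens."""
    return [f'"{phrase}"', phrase, phrase.replace("-", " ")]

# Two forms x three variations = six test queries
tests = [query for form in forms for query in variations(form)]
```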

All six tests rank Google's page at number one today on both Yahoo and MSN.

On Google, the variation with quotation marks plus hyphens ranks at number 66 for "out-of-touch executives" and at number 69 for "out-of-touch management."

The other four do not show up in Google, except that I did find out of touch management without quotes ranking at 790.

"Well, that just shows that Google, unlike Yahoo and Microsoft, is fixing the Googlebomb problem," all the idiot Google cultists will reply. "Take off your tin foil hat!"

Sure they are. Look at the Googlebombs that are still thriving:

french military victories -- still number one after two years

miserable failure -- still number one

waffles -- number one (for johnkerry.com)

jew -- almost always number one for jewwatch.com, but number three on a few IP addresses

Why is it that Google can do a hand job on "out-of-touch executives" but they cannot do anything about jewwatch.com except show their apology in a special sponsored link:

Our search results are generated completely objectively and are independent of the beliefs and preferences of those who work at Google. Some people concerned about this issue have created online petitions to encourage us to remove particular links or otherwise adjust search results. Because of our objective and automated ranking system, Google cannot be influenced by these petitions. The only sites we omit are those we are legally compelled to remove or those maliciously attempting to manipulate our results.
The answer is that criticizing Google, Inc. is always considered "malicious" by Google. Even though everyone might be laughing, either Sergey or Larry didn't think it was funny, and that was the end of my Googlebomb. But criticizing the French or a presidential candidate is considered funny by Sergey or Larry, so those survive.

Unless, of course, you are doing AdWords. In that case you couldn't even mildly criticize a presidential candidate.

AussieWebmaster
02-27-2005, 10:07 PM
It's a pity they fired the now-infamous blogger; he might have uncovered and shared some insight into this!

I, Brian
02-28-2005, 08:06 AM
Google is pretty screwy at the moment - it's not the best time for forming conclusions on linking patterns.

jorock
02-28-2005, 02:30 PM
Is there any way to tell how Google indexes dashes in inbound link text?

apple-pie = apple pie
or
apple-pie = applepie
or
apple-pie = apple-pie
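
The three possibilities can be written out as three tokenizers in Python. This is only a sketch of the hypotheses in the question; the function names are made up, and nothing here says which behavior, if any, Google actually uses.

```python
def split_on_hyphen(text):
    """Hypothesis 1: apple-pie = apple pie (hyphen acts as a word break)."""
    return text.replace("-", " ").split()

def drop_hyphen(text):
    """Hypothesis 2: apple-pie = applepie (hyphen is removed and the parts are joined)."""
    return text.replace("-", "").split()

def keep_hyphen(text):
    """Hypothesis 3: apple-pie = apple-pie (hyphenated word stays a single token)."""
    return text.split()

for tokenize in (split_on_hyphen, drop_hyphen, keep_hyphen):
    print(tokenize.__name__, tokenize("apple-pie"))
```

One way to probe this from the outside would be to search for each resulting token form and compare which pages with hyphenated anchor text come back, though ranking noise would make any conclusion tentative.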