Search Engine Watch
SEO News

Old 11-16-2004   #1
Wail
Another member
 
Join Date: Jun 2004
Posts: 247
Google quirks summary (all lies...)

Google is experiencing some… quirks currently. Why not, I thought, put them together in a summary and see if there is a trend. I think there might be a common factor between two of the four quirks.

Quirk One:

Google lies.

The link: feature no longer seems able to return meaningful information. We still get PR 4 pages that carry backlinks (and this tally is occasionally adjusted), but we now also get pages below PageRank 4 that carry backlinks. We also get pages with no link to the site in question, and often it seems highly unlikely that they ever linked to our study site.

Some observations

That Google has dropped the PageRank qualifier is an assumption, but a fairly strong one. The PageRank 4 threshold was imposed by Google back when PageRank 4 probably held a different value in their algorithms. As Yahoo does not filter backlinks in this way, many people have noticed that Yahoo is the better search engine for backlink investigations, so we might assume Google moved to counter this.

Previously, redirecting links (such as affiliate links) did not count as backlinks. These days the link: command can turn up sites whose only links to the study site are affiliate link-throughs.

Quirk Two:

Google lies.

The site: feature does not always return pages from the specified domain alone. When other domains appear in the restricted search, their pages all redirect to the main site. The redirect is either a zero-second meta refresh (a technique/occurrence now known as PageJacking) or a 301 redirect.

Some observations

Some users report quirk two as a cache issue. If they repeat the site search often enough then, eventually, Google’s results update and report on only one domain. This is not the case in all the searches where this quirk occurs. Some of the redirecting domains which can be included in this quirk are old and have not seen the light of day in a normal Google search in months.

It seems quite possible that Quirk One and Quirk Two are related. Quirk Two shows that Google’s understanding/definition of what a page’s URL might be (ie, what domain it is on) is not as expected and therefore it’s not surprising we get mysterious backlink results.

Quirk Three:

Google lies.

‘“the” is a very common word and was not included in your search’

A google for [r the b] returns 46,200,000 results; the word “the” is included in the blue Web bar, and the Google desktop search highlights the “the”.

A google for [r b] returns 15,900,000 results (i.e., fewer) and different SERPs compared to [r the b].

Similar SERP differences can be had by including or omitting the search term “and” for [r and b] or [r & b].

Some observations

It seems likely that the [r the b] search is being translated as [r %wildcard% b], as this could explain why there are roughly three times as many results. The same could be true of [r and b].

The Google desktop does not omit ‘stop words’. You can search happily for the word ‘the’. [Note: there’s no lie here; Google doesn’t say the Desktop Search omits common words.]

Quirk Four:

Google lies.

The Google toolbar’s assessment of a page’s PageRank can differ by as much as two units from the Google directory’s assessment of the same page’s PageRank.

Some observations

This is not a new quirk (none of these are especially new) but does illustrate yet another example of disharmony within the collection of Google technologies.
Old 11-16-2004   #2
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
Forgive me for asking a silly question: What is your point?
Old 11-16-2004   #3
Wail
Another member
 
Join Date: Jun 2004
Posts: 247
There's no point of action here. This is an informal analysis. This is a set of observations which, I hope, some people might find useful, interesting, or use as a discussion spark.
Old 11-16-2004   #4
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
It resonates with me. We had a session recently at SES Stockholm where people were once again complaining about the inaccuracy of Google's backlink command. My comment was I understand why they limit the tool -- but if they are going to do so, they should make that clear on the site. The site says:
Quote:
Some words, when followed by a colon, have special meanings to Google. One such word for Google is the link: operator. The query link:siteURL shows you all the pages that point to that URL. For example, link:www.google.com will show you all the pages that point to Google's home page. You cannot combine a link: search with a regular keyword search


The command, of course, does NOT show ALL the links that point to a URL. So Google ought to make that clear.

It would be useful to compare and contrast each of the lies to Google's competitors, though. Google's a big target -- but if others are the same, then the entire industry should be called to task over "lies." And some of this isn't new. AltaVista used to get plagued with complaints over inaccuracy. And so the circle turns...

FYI, Jeremy Zawodny had a similar post to this one last year: Lies Google Tells Me. He was upset about how Google was saying stopwords were ignored when in reality that wasn't quite the case.



My personal favorite is this. Check out this Google cached page from HowStuffWorks.com. At the top, there's the usual Google disclaimer:
Quote:
Google is not affiliated with the authors of this page nor responsible for its content.


That's not true. Google is indeed affiliated with the page. That page carries AdSense ads, not to mention a Google search box on it. Google is earning money off that page. There's definitely an affiliation.

I pointed this out in my What Happened To My Site On Google? article last year, the members edition that dealt with AdSense and site ranking. Since then, still no change to the text.

I still don't feel strongly enough to call them lies. I don't think Google or other search engines, when these things happen, are overtly trying to lie. To me, it's more a case that they are so busy going forward that they don't go back and catch up on the documentation. But it doesn't make it less annoying.

Last edited by dannysullivan : 11-16-2004 at 07:36 AM. Reason: fixed typo
Old 11-16-2004   #5
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
Google have a long history of (as danny has pointed out above) telling fibs and generally misbehaving with tools and commands that search marketers might use.

We were talking about copyright issues recently and the fact that they do not provide much in the way of payback (in terms of traffic) from either cached pages or specifically the 'define:' command.

All in all there are a few skeletons in the closet and a number of things that may be tested in law before too long.

oh... one last thing: I'm not certain of the validity of this, but check out this post at passingnotes: google encourages cheating but does google cheat?

I'd say that they have some house cleaning to do before the £@!t hits the fan in court over some of this stuff...
Old 11-16-2004   #6
Wail
Another member
 
Join Date: Jun 2004
Posts: 247
Thanks Danny,

I was posting a talking point... and yeh, drumming at the dramatic with the lies angle. It is interesting to see what people think of issues/quirks as they arrive. I guess a quirk only becomes an issue when Joe Public gets concerned about it.

The cache is a great example. Here's my favourite "Google's not affiliated with...": Blogger.
Old 11-19-2004   #7
webcertain
Multilingual web marketing
 
Join Date: Jun 2004
Location: York, North Yorkshire, UK
Posts: 53
Just to add one problem to the cache issue:
This cache provides interesting data:
"This is Google's cache of http://www.kra.co.uk/web/portfolio.html as retrieved on 31 Dec 1969 23:59:59 GMT.
Google's cache is the snapshot that we took of the page as we crawled the web..."
I wasn't even born in 1969.
Cheers,
Johann

Last edited by webcertain : 11-19-2004 at 05:49 AM.
Old 11-19-2004   #8
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
Yeah, that's some kind of timestamp issue they have right now; there are quite a few reports of that floating around...

The UNIX epoch starts Jan 1st 1970 and a Unix timestamp is the number of seconds elapsed since that time, so 31 Dec 1969 23:59:59 GMT is exactly one second before it...
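
For anyone who wants to see the arithmetic, here is a minimal Python sketch. It assumes (this is only a guess, nothing Google has confirmed) that the cache date was rendered from a timestamp of -1, a value some C time functions return on error. The names EPOCH and from_unix are my own, purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: midnight UTC on 1 January 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_unix(seconds: int) -> datetime:
    """Convert a Unix timestamp to a UTC datetime by offsetting from the epoch."""
    return EPOCH + timedelta(seconds=seconds)

print(from_unix(0).isoformat())   # 1970-01-01T00:00:00+00:00
print(from_unix(-1).isoformat())  # 1969-12-31T23:59:59+00:00
```

A timestamp of -1 lands on exactly the date shown in the cache header, which is why this looks like an unset or failed timestamp rather than a real crawl date.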

Nick
Old 11-19-2004   #9
webcertain
Multilingual web marketing
 
Join Date: Jun 2004
Location: York, North Yorkshire, UK
Posts: 53
Google's own 31/12/1999 23:59 bug
Old 11-19-2004   #10
Wail
Another member
 
Join Date: Jun 2004
Posts: 247
I actually have an outstanding bug report at Google too; where the toolbar button couldn't find a cache but the search results could.

It's not been a healthy week for Google tech!
Old 11-20-2004   #11
GoogleGuy
Unofficial Representative
 
Join Date: Jul 2004
Location: Mountain View, CA
Posts: 66
Strange way to put it, Wail. Any search engine has quirks, but that doesn't mean that they're lies or intentional. I'll take a stab at the things you mentioned.

Quirk 1: The link: command. I think I gave a pretty complete backgrounder in this thread to let people know the status of the link: command. Someone pointed out that on our features.html page we were saying that we returned all backlinks, and our webmaster changed the page the next day (today) to make the page more accurate.

Quirk 2: site: command returns pages from additional domains. I recall seeing this happen several months ago, but I haven't seen it recently. I can believe that if we consider two pages as dupes, we might do this though. If you want to mention specific instances, I'll be happy to ask someone to check it out and see if there's any bug involved, but it doesn't seem as if Google's intentionally doing anything wrong here?

Quirk 3: the. Yes, "the" is treated as a stopword. You can add a plus sign (+the) if you really want to search for "the". You can also quote the search, so [three the mice] will match lots of docs, but ["three the mice"] will find the 50 or so docs that actually have the phrase "three the mice".
Handy tip #1: Danny's written about stopwords plenty of times in the past. (So has Tara Calishain, if you actually look at the results for ["three the mice"].) You can also use '*' to match any word in quotes. So "three * mice" will match "three blind mice", "three rapping mice", "three lying mice", but not "three sarcastic weasel mice" because that's too many words.
Handy tip #2: When you're talking about searches, use [ and ] to make the search clear. So [to be or not to be] would be a search without the quotes, while ["to be or not to be"] would be the search with the quotes.
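
The behaviour described for Quirk 3 can be mimicked with a toy model in Python. This is only a sketch of the rules as stated in this post, not how Google actually processes queries: the STOPWORDS set is a hypothetical list, and effective_terms and phrase_pattern are made-up helper names.

```python
import re

# Hypothetical stopword list for illustration; Google's real list is not public.
STOPWORDS = {"the", "and", "a", "of"}

def effective_terms(query: str) -> list[str]:
    """Drop stopwords from an unquoted query unless they are forced with '+'."""
    terms = []
    for token in query.lower().split():
        if token.startswith("+"):
            terms.append(token[1:])  # +the keeps the stopword
        elif token not in STOPWORDS:
            terms.append(token)
    return terms

def phrase_pattern(phrase: str) -> re.Pattern:
    """Compile a quoted phrase in which '*' stands for exactly one word."""
    parts = [r"\S+" if token == "*" else re.escape(token) for token in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

print(effective_terms("three the mice"))   # ['three', 'mice']
print(effective_terms("three +the mice"))  # ['three', 'the', 'mice']

pattern = phrase_pattern("three * mice")
print(bool(pattern.search("three blind mice")))             # True
print(bool(pattern.search("three sarcastic weasel mice")))  # False
```

The last line fails to match for the reason given above: the single '*' can only stand in for one word, and "sarcastic weasel" is two.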

Quirk 4: "toolbar PageRank and directory PageRank can differ by up to two units". Given that the toolbar and the directory go up to different maximum values (10 and 8, yah?), and that in a fully incremental index, updates can happen asynchronously in one area like the directory compared to the toolbar display, this wouldn't surprise me an iota.
Handy tip #3: If you're trying to get subpixel accuracy by comparing directory vs. toolbar PageRank, you're paying too much attention to PageRank. The time would be better spent looking at your server logs to see what keywords people are using to reach your site, in my opinion.

I agree that it's a good discussion spark, but making the leap to say that toolbar PR and directory PR having different scales is somehow Google lying sounds like too much of a leap, to me at least.

Okay, I gotta get some sleep. GregB: I'm working, but I'm not at work right now. Sorry, inside joke for Greg..
Old 11-20-2004   #12
bobmutch
seocomapny.ca|Project Support Open Source|Top 40 Dirs rated by Inbound Link Quality
 
Join Date: Aug 2004
Location: london.on.ca
Posts: 575
GoogleGuy: Ok, I will go and check my logs after this post. Just a quick question concerning one of your comments.

"10 and 8 yah?": So the Toolbar Scale is 10 (1-10) and the Google Directory is a scale of 8 (cleardot.gif, 5/35, 11/29, 16/24, 22/18, 27/13, 32/8, 38/2, 44/0 [pos.gif/neg.gif]). Ok, there are 9 units here. Which one are you not counting?

Currently www*google*com is the only domain as far as I can see that has a Ranking in the Google Directory of 44/0. How come www*google*com has a 44/0 and no one else does when it is not even the strongest site (most PR10 pages) on the internet?

(Currently my PR10 Pages page shows that Google has 42 PR10 pages and Adobe has 49 PR10 pages.)

I have seen a number of different graphics comparing the 2 scales.

Chris Raimondi (Raimondi's site is down) has one with a GD scale of 8 and a TB scale of 11.


(Graphic is mine but based on Raimondi's scale.)

Mark Sobek has one with a GD scale of 7 and a TB scale of 10.


(This graphic is a copy from Sobek's site.)

I am guessing from what you are saying that the GD scale of 8 and a TB scale of 10 look like this?


(Graphic is mine based off GG comments?)
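
One possible way to reconcile some of the differing counts above is simply whether the empty level is counted. The Python sketch below tallies the directory bar widths listed earlier in this post and the toolbar values 0 to 10, with and without the zero level. That this is the source of the disagreement is an assumption on my part, and DIRECTORY_BARS, TOOLBAR_VALUES and count_levels are illustrative names only.

```python
# (pos.gif, neg.gif) widths as listed in the post above; None stands for cleardot.gif.
DIRECTORY_BARS = [None, (5, 35), (11, 29), (16, 24), (22, 18),
                  (27, 13), (32, 8), (38, 2), (44, 0)]
# Toolbar PageRank values 0 through 10.
TOOLBAR_VALUES = list(range(0, 11))

def count_levels(levels, include_empty: bool) -> int:
    """Count scale levels, optionally skipping the empty/zero level."""
    return sum(1 for level in levels if include_empty or level not in (None, 0))

directory_counts = (count_levels(DIRECTORY_BARS, True), count_levels(DIRECTORY_BARS, False))
toolbar_counts = (count_levels(TOOLBAR_VALUES, True), count_levels(TOOLBAR_VALUES, False))

print(directory_counts)  # (9, 8)
print(toolbar_counts)    # (11, 10)
```

Counted this way, "8 and 10" leaves out the empty level on both scales, while a toolbar scale of 11 includes it.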

While some may think these kinds of questions are silly and don't matter, this kind of information is important to me.

Last edited by bobmutch : 11-21-2004 at 10:04 AM.
Old 11-20-2004   #13
GoogleGuy
Unofficial Representative
 
Join Date: Jul 2004
Location: Mountain View, CA
Posts: 66
Quote:
How come www*google*com has a 44/0 and no one else does when it is not even the strongest site (most PR10 pages) on the internet?
We didn't do anything special for Google, if that's what you're asking. There's too many things to do and not enough time to do it in; it would be silly/sophomoric to do things like tweak the display of Google's PR up.

I understand if people think that discerning fine levels of granularity of PageRank are fun; I just think there's better ways to spend your time. I don't think much about PageRank these days--that's why I wasn't even positive whether eight was the maximum value over in the directory. I'd recommend things like log analysis, keyword research, spending time developing new content that targets new phrases instead of one single glory or vanity keyword phrase, reading new papers, or spending time on non-SEO things like the family, pets, etc.
Old 11-20-2004   #14
bobmutch
seocomapny.ca|Project Support Open Source|Top 40 Dirs rated by Inbound Link Quality
 
Join Date: Aug 2004
Location: london.on.ca
Posts: 575
GoogleGuy: No, I wasn't fishing to see if Google hand-changed it. I was asking why it had a 44/0 and I am thinking your answer is because it is stronger than any of the other sites in the Google Directory. I will accept that.

I would love to know how many units there are in the TBPR and GDPR scales. You said 10 and 8? Could you check on that for me please : )

Last edited by bobmutch : 11-20-2004 at 11:30 PM.
Old 11-21-2004   #15
chris
www.searchguild.com - like Threadwatch.org but with more threads :)
 
Join Date: Jun 2004
Posts: 21
Quote:
Chris Riding (Ridings site is down) has one with a GD scale of 8 and a TB scale of 11
Nope, not I. That would have been Chris Raimondi's searchnerd.com one. (I disagree that you could make that comparison between the two scales).
Old 11-22-2004   #16
Wail
Another member
 
Join Date: Jun 2004
Posts: 247
Cool.

Thanks for the reply GoogleGuy. The forums are a strange world; any post which gets a GoogleGuy reply (I'm a poet and don't even know-it) counts as a success.
Old 11-26-2004   #17
orion
 
 
Join Date: Jun 2004
Posts: 1,044
PR/Damage Control

I’m still waiting to hear the GoogleGuy Team explanation on their OR implementation at the the character “|” thread. The message users, testers, and university MLS researchers are getting is that Google’s OR implementation is contrary to query theory. See

1. Laura Cohen’s Boolean Searching on the Internet
2. Greg Notess’s Google Inconsistencies

Meanwhile, someone named kendos at the http://googleguy.zorgloob.com site conveniently and selectively is quoting some SEW and WMW posts.

Orion

Last edited by orion : 11-26-2004 at 11:58 AM.
Old 11-26-2004   #18
Wail
Another member
 
Join Date: Jun 2004
Posts: 247
Another nice one to look at is:

inurl:-
versus
inurl:_

That's a hark back to the Unix way of doing it; where, weirdly, _ counts within the alpha-numeric range (whereas | doesn't).
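
The convention is easy to see with a regular expression, where the \w "word character" class includes the underscore but not the hyphen or the pipe. A short Python sketch follows; it shows only the regex convention, not Google's tokenizer, and the sample string is made up.

```python
import re

# Made-up URL fragment mixing the three separators under discussion.
sample = "foo_bar-baz|qux"

# \w matches letters, digits and underscore, so '_' joins words
# while '-' and '|' split them.
words = re.findall(r"\w+", sample)
print(words)  # ['foo_bar', 'baz', 'qux']
```

If inurl: splits on non-word characters in a similar way, a hyphen would act as a separator while an underscore would be treated as part of the word, which would account for the two searches behaving differently.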