Google malfunctioning?
I, Brian
09-07-2004, 09:30 AM
http://www.business-talk.co.uk/showthread.php?t=947
While I know that the lack of a PR update isn't necessarily a worry, there seem to be some interesting indexing problems.
Business.com seems one of the most serious indicators that something is up - but I've had someone ring me today about disappearing from the index, and another forum member still hasn't had their index page cached after an expired domain filter was lifted a few weeks back - despite having a few thousand decent links to their sites.
I've set up new pages recently as well, and despite links to them being on PR6 pages, Google does seem to be taking its sweet time to come through and index the new pages.
Is something major happening to the Google crawl? Or is this just another minor hiccup, akin to when Google made the Google directory PR0 a few weeks back?
Marcia
09-07-2004, 12:24 PM
Absolutely no information whatsoever Brian, but my intuition is telling me that they just *may* be working on dealing with that homepage problem that's been cropping up for months. If they're not they need to be, so it's just liable to be a close enough wild guess.
I did see something on one of the data centers that was different from the regular SERPs. A site showed up that's been MIA from the index because of a homepage mess-up on the site's part; they messed up big time by linking both with and without the www (and threw some dup pages and meta redirects in to boot). So it figures that the site disappeared, but there it was for the search at the other data center, just like it would have normally been without the problem.
I can see the issue from Google's end: there was a lot of spamming going on a couple of years ago using the www and non-www with inbounds, so it can't be easy if that's what's happening.
That's my guess with recent events - I imagine it would make it difficult to come up with accurate PR.
sugarrae
09-07-2004, 12:31 PM
Business.com may not come up when you type www.business.com in the search, but Google sure knows it's there:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=allinurl%3Abusiness.com
and has a cache from yesterday:
http://216.239.39.104/search?q=cache:Zrzt5s-k5sMJ:www.business.com/+allinurl:business.com&hl=en
Lots of results also show for:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=site%3Abusiness.com
Interesting glitch though.
andrewgoodman
09-07-2004, 01:08 PM
The exact same type of thing happened to LookSmart's home page some time ago (I think it was December 2002). PR0 for a while.
My belief at the time was that Google might be messing around with a direct competitor. Perhaps there were partnership or buyout talks ongoing. Anything is possible I suppose.
Business.com profits from every click on its results in the organic listings... to the point where they are actually doing a fair bit of AdWords advertising to increase traffic to directory pages. ("Keyword arbitrage.")
So if Google "accidentally" yanks companies like this out of their index for a while, it wouldn't be that shocking to me.
(Although Business.com's home page seems to be at PR0, many internal pages seem to be at 5 or 6. And oh yes, of course we need to be aware that PageRank Doesn't Matter (http://www.google.com/search?sourceid=navclient&ie=UTF-8&q=%22pagerank+doesn%27t+matter%22). Or at least the PR reported on the toolbar doesn't necessarily matter.)
LookSmart's home page is currently a PR9. Category pages (http://search.looksmart.com/p/browse/us1/us317828/us317851/) in LookSmart's directory appear to be showing PRs of between 0 and 7.
Another thing to note is that both LookSmart and Business.com include AdSense ads as part of their mix. The question is the extent to which they do it, and the revenue shares involved. Those negotiations must be interesting to say the least.
I, Brian
09-07-2004, 02:02 PM
Certainly there's a cache - but not via the toolbar (though links are now showing). Toolbar issue only? I'm figuring it's a symptom of a possibly deeper problem.
I'm not sure if Google has purposefully blasted business.com - but it definitely looks like a new design since I last looked, and that infamously complicates indexing.
I also remember the Google directory went PR0 a couple of weeks back - but, overall, I'm wondering if something somewhere has gone a little awry at Google.
Compar just posted a link in v7n that suggests I'm not the only one thinking this way:
http://www.w3reports.com/index.php?itemid=549
The author claims that the way Google stores page info is probably maxed out - though that does depend upon the assumptions being presented.
Overall, the article suggests that either there's a strawman to burn, or else Google is currently - and has been for a while now - reconstructing how it stores page information. This could explain not simply major sites going PR0 and losing cache, but also the PR update drought.
Something to think about, anyway.
cariboo
09-08-2004, 03:31 AM
I, Brian, I think you pointed out the right issue: the size of Google's index. But I don't agree with you when you explain it by a "technical limitation".
This ID problem is an urban legend. Even a low-skilled techie could solve it without any problems, and Google has a big team of PhDs. In fact, Google has a special "know how" about building huge "scalable" systems, with hundreds of machines working together... Gmail is their latest demonstration of this know how.
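To put the ID argument in numbers, here is a rough sketch. It assumes the claimed limit comes from a fixed-width unsigned integer used as a page ID; the 32-bit and 64-bit widths in the usage note are illustrative assumptions, not something confirmed in this thread, and the function names are made up for the example.

```python
# Hypothetical illustration: capacity of a fixed-width unsigned page ID.
# Nothing here describes how Google actually stores page information.

def id_capacity(bits: int) -> int:
    """Number of distinct values an unsigned ID of the given width can hold."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    return 2 ** bits


def bits_needed(pages: int) -> int:
    """Smallest ID width (in bits) that can number `pages` distinct pages."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    # IDs run from 0 to pages - 1, so the largest ID decides the width.
    return max(1, (pages - 1).bit_length())
```

If the ID were 32 bits wide, id_capacity(32) gives 4,294,967,296 pages, and one page more already needs bits_needed(4294967297) == 33 bits. Widening the field (say to 64 bits) removes the ceiling, which is why a hard ID limit looks like a solvable engineering detail rather than a wall.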
Somebody already gave a good explanation of this size issue: Danny Sullivan.
http://searchenginewatch.com/searchday/article.php/3071371
This article dates back to September 2003...
Well, I firmly believe Danny's assumptions are not just assumptions, but a rather good description of how it works...
Index size has nothing to do with relevancy. It's a good thing to show a bigger index than your competitors, but it's only a matter of marketing.
The fundamental problem is not to index everything, but to index important pages... The web contains a huge amount of "junk" pages, with no information in them... Dynamic pages can produce many pages with duplicate content. And there are also many "out of date" pages that nobody is interested in, even as an archive...
I think Google doesn't crawl as many pages as they claim on their home page... Danny estimated it at 75%. (GoogleGuy quoted this article a few months ago. What he said about it is interesting: he described Danny Sullivan's comments as the best thing written on this subject.)
They don't do it, because they don't have to... I believe they have been building, brick by brick, a more efficient search engine, focused on "freshness". The first brick, laid in April/May 2003, was to change their way of crawling sites. It became fully operational in July 2003.
Googlebot used to be an archetype of a batch crawler... But it became an incremental crawler last year.
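For anyone who wants the batch/incremental difference in concrete terms: a batch crawler re-fetches everything in one big pass, while an incremental crawler revisits each page on its own schedule. Below is a toy sketch of the second idea. The class name, the halve/double policy and the one-unit minimum interval are my own simplifications for illustration; they are not taken from the papers and say nothing about how Googlebot really schedules visits.

```python
import heapq


class IncrementalScheduler:
    """Toy revisit scheduler: every URL carries its own revisit interval."""

    def __init__(self):
        self._queue = []      # min-heap of (next_due_time, url)
        self._interval = {}   # url -> current revisit interval

    def add(self, url, interval, now=0.0):
        """Register a URL and schedule its first visit one interval from now."""
        self._interval[url] = interval
        heapq.heappush(self._queue, (now + interval, url))

    def pop_due(self, now):
        """Return the URLs whose revisit time has arrived, earliest first."""
        due = []
        while self._queue and self._queue[0][0] <= now:
            _, url = heapq.heappop(self._queue)
            due.append(url)
        return due

    def record_visit(self, url, changed, now):
        """Revisit sooner if the page changed, later if it did not."""
        interval = self._interval[url]
        # Assumed policy: halve on change (floor of 1.0), double otherwise.
        interval = max(1.0, interval / 2) if changed else interval * 2
        self._interval[url] = interval
        heapq.heappush(self._queue, (now + interval, url))
```

Pages that keep changing drift towards short intervals and stay fresh in the index, while static pages are visited less and less often, which fits the "freshness" focus described above.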
People interested in this subject can read these papers, written by Cho and Garcia-Molina, from Stanford.
"The Evolution of the Web and Implications for an Incremental Crawler", Junghoo Cho, Hector Garcia-Molina, Department of Computer Science, Stanford, CA 94305, December 2, 1999
"Searching the Web", Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan
"Efficient Crawling Through URL Ordering", Junghoo Cho, Hector Garcia-Molina, Lawrence Page, Department of Computer Science, Stanford University
"Parallel Crawlers", Junghoo Cho (University of California, Los Angeles), Hector Garcia-Molina (Stanford University)
unreviewed
09-09-2004, 10:08 PM
I think the trouble at business.com is that they seem to have changed all sub-pages to point to business.com/index.asp. If you look at the Wayback Machine, they didn't link to index.asp before; the links went direct to the root URL, business.com.
AussieWebmaster
09-09-2004, 11:19 PM
SiteProNews did a story today on the topic. Apparently it was the changes made at Business.com that lost them PR: a 302 redirect where a 301 should have been used.
unreviewed
09-10-2004, 08:38 AM
apple.com - redirect 302 - PR 10
ibm.com - redirect 302 - PR 9
microsoft.com - redirect 302 - PR 10
Lots of good web sites using a 302.
AussieWebmaster
09-10-2004, 10:44 AM
But are all their pages using 302s for what are really permanent redirects? Or do they use 302s to redirect temporarily?
unreviewed
09-10-2004, 01:06 PM
Not sure what you mean, AW. A permanent redirect would be a 301.
The only "big gun" web site that I know of, that did use a 301, was w3.org, and they are no longer using a redirect. At least not a couple of days ago when I last checked.
I think the webmaster at Business.com just plain goofed. You should never link to your homepage as domain.xxx/index.xxx. There are very good reasons to link only to the root domain: PR reasons, and the fact that you may change your homepage later. Perhaps in this case they switched to ASP ... I don't know, but I can see in the Wayback Machine that they were properly linking to the root domain and now they are not.
Regardless of whether that is the problem, they should change the site-wide sub links to point just to the root URL.
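Since the 301/302 distinction keeps coming up, here is a small sketch of it in code. It only encodes the standard HTTP meaning of the status codes (301 and 308 permanent, 302 and 307 temporary; 308 is a later addition to HTTP). The function names are made up for the example, and nothing here claims to show how Google treats either code.

```python
import http.client

# Standard HTTP semantics, not search-engine behaviour.
PERMANENT_REDIRECTS = {301, 308}
TEMPORARY_REDIRECTS = {302, 307}


def redirect_kind(status: int) -> str:
    """Classify an HTTP status code as a permanent or temporary redirect."""
    if status in PERMANENT_REDIRECTS:
        return "permanent"
    if status in TEMPORARY_REDIRECTS:
        return "temporary"
    return "other"


def head_status(host: str, path: str = "/", timeout: float = 10.0):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None). Needs network access.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

To check a site yourself, call head_status() on the bare domain and pass the status to redirect_kind(); what any given site returns today may differ from what was observed in this thread.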
AussieWebmaster
09-10-2004, 01:19 PM
I think there has just been miscommunication here.
Business.com had problems from using a 302 where a 301 should have been used, as well as the index.html (etc.) issue.
I know 301s are permanent... but you had mentioned others using 302s, and I was clarifying whether they were temporary redirects for short-term solutions, such as promotions or other reasons.
unreviewed
09-10-2004, 07:04 PM
I see.
But no, those sites listed have been using a 302 since I first noticed, late last year. They simply prefer to use a 302 rather than a 301. To suggest that Google will give you a PR 0 for using a 302 would mean that Google is indeed broken. I think that the fact that Google will treat
domain.xxx
www.domain.xxx
domain.xxx/homepage.xxx
www.domain.xxx/homepage.xxx
... all as different pages is the most obvious factor. I certainly hope that Google isn't having problems with 302s. Just think of how THAT would affect us; I think the forums would be "lousy" with posts from webmasters crying out loud.
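For what it's worth, collapsing those four variants into one address is easy to sketch on the site owner's side. This is a minimal example, assuming you prefer the www host and that the homepage file is one of a few common default names; the DEFAULT_DOCS set and the canonical_url() helper are hypothetical, and this is not a description of anything Google does.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of "homepage" file names; adjust to whatever your server uses.
DEFAULT_DOCS = {"index.html", "index.htm", "index.asp", "index.php", "default.asp"}


def canonical_url(url: str, prefer_www: bool = True) -> str:
    """Map www/non-www and /index.xxx variants of a URL to one canonical form."""
    if "://" not in url:
        url = "http://" + url  # bare "domain.xxx" style input
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if prefer_www else bare
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCS:
        path = head + "/"  # drop the default document, keep the directory
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, parts.fragment))
```

Run over business.com, www.business.com, business.com/index.asp and www.business.com/index.asp, every variant comes out as the same single URL. Using that one form for all internal links (and a permanent redirect from the others) means there is only one homepage URL left to collect links.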
I, Brian
09-11-2004, 04:22 AM
Just to add to the topic, even if Google is not malfunctioning, it seems to be changing other aspects - here's a report claiming that the toolbar checksum has been changed after the prior one was cracked (actually, much earlier than the reported June):
http://www.prweaver.com/blog/2004/09/09/10-toolbar-checksum-algorithm