Search Engine Watch
Old 09-07-2004   #1
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
Google malfunctioning?

http://www.business-talk.co.uk/showthread.php?t=947

While I know that the lack of a PR update isn't necessarily a cause for worry, there seem to be some interesting indexing problems.

Business.com seems to be one of the most serious indicators that something is up - but I've had someone ring me today about disappearing from the index, and another forum member still hasn't had their index page cached after an expired domain filter was lifted a few weeks back - even though he has a few thousand decent links to his sites.

I've set up new pages recently as well, and despite links to them being on PR6 pages, Google does seem to be taking its sweet time to come through and index the new pages.

Is something major happening to the Google crawl? Or is this just another minor hiccup, akin to when Google made the Google directory PR0 a few weeks back?
Old 09-07-2004   #2
Marcia
 
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Absolutely no information whatsoever, Brian, but my intuition is telling me that they just *may* be working on dealing with that homepage problem that's been cropping up for months. If they're not, they need to be, so it's liable to be a close enough wild guess.

I did see something on one of the data centers that was different from the regular SERPs. A site showed up that's been MIA from the index because of a homepage mess-up on the site's part; they messed up big time by linking both with and without the www (and threw some dup pages and meta redirects in to boot). So it figures that the site disappeared, but there it was for the search at the other data center, just as it would normally have been without the problem.

I can see the issue from Google's end: there was a lot of spamming going on a couple of years ago using the www and non-www with inbounds, so it can't be easy if that's what's happening.

That's my guess with recent events - I imagine it would make it difficult to come up with accurate PR.
Old 09-07-2004   #3
sugarrae
I Should Be Working
 
Join Date: Jun 2004
Location: In front of the computer.
Posts: 116
Business.com may not come up when you type www.business.com in the search, but Google sure knows it's there:

http://www.google.com/search?hl=en&l...3Abusiness.com

and has a cache from yesterday:

http://216.239.39.104/search?q=cache...ness.com&hl=en

Lots of results also show for:

http://www.google.com/search?hl=en&l...3Abusiness.com

Interesting glitch though.
Old 09-07-2004   #4
andrewgoodman
 
 
Join Date: Jun 2004
Location: Toronto
Posts: 637
The exact same type of thing happened to LookSmart's home page some time ago (I think it was December 2002). It was PR0 for a while.

My belief at the time was that Google might be messing around with a direct competitor. Perhaps there were partnership or buyout talks ongoing. Anything is possible I suppose.

Business.com profits from every click on its results in the organic listings... to the point where they are actually doing a fair bit of AdWords advertising to increase traffic to directory pages. ("Keyword arbitrage.")

So if Google "accidentally" yanks companies like this out of their index for a while, it wouldn't be that shocking to me.

(Although Business.com's home page seems to be at PR0, many internal pages seem to be at 5 or 6. And oh yes, of course we need to be aware that PageRank Doesn't Matter. Or at least the PR reported on the toolbar doesn't necessarily matter.)

LookSmart's home page is currently a PR9. Category pages in LookSmart's directory appear to be showing PRs of between 0 and 7.

Another thing to note is that both LookSmart and Business.com include AdSense ads as part of their mix. The question is the extent to which they do it, and the revenue shares involved. Those negotiations must be interesting to say the least.
Old 09-07-2004   #5
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
Certainly there's a cache - but not via the toolbar (though links are now showing). Toolbar issue only? I'm figuring it's a symptom of a possibly deeper problem.

I'm not sure if Google has purposefully blasted business.com - but it definitely looks like a new design since I last looked, and that infamously complicates indexing.

I also remember the Google directory went PR0 a couple of weeks back - but, overall, I'm wondering if something somewhere has gone a little awry at Google.

Compar just posted a link in v7n that suggests I'm not the only one thinking this way:
http://www.w3reports.com/index.php?itemid=549

The author claims that the way Google stores page info is probably maxed out - though that does depend upon the assumptions being presented.

Overall, the article suggests that either there's a strawman to burn, or else Google is currently - and has been for a while now - reconstructing how it stores page information. This could explain not simply major sites going PR0 and losing cache, but also the PR update drought.

Something to think about, anyway.
Old 09-08-2004   #6
cariboo
Member
 
Join Date: Jun 2004
Location: Paris, France
Posts: 33
I, Brian, I think you pointed out the right issue: the size of Google's index. But I don't agree with you when you explain it by a "technical limitation".

This ID problem is an urban legend. Even a low-skilled techie could solve it without any problems, and Google has a big team of PhDs. In fact, Google has special know-how in building huge "scalable" systems, with hundreds of machines working together... Gmail is their latest demonstration of this know-how.

Somebody has already given a good explanation of this size issue: Danny Sullivan.

http://searchenginewatch.com/searchd...le.php/3071371

This article dates back to September 2003...

Well, I firmly believe Danny's assumptions are not just assumptions, but a rather good description of how it works...

The index size has nothing to do with relevancy. It's a good thing to show a bigger index than your competitors, but it's only a matter of communication.

The fundamental problem is not to index everything, but to index the important pages... The web contains a huge number of "junk" pages with no information in them... Dynamic pages can produce many pages with duplicate content. And there are also many "out of date" pages that nobody is interested in, even as an archive...

I think Google doesn't crawl as many pages as they claim on their home page... Danny estimated it at 75%. (GoogleGuy quoted this article a few months ago. What he said about it is interesting: he described Danny Sullivan's comments as the best thing written on this subject.)

They don't do it because they don't have to... I believe they have been building, brick by brick, a more efficient search engine, focused on "freshness". The first brick, laid in April/May 2003, was to change their way of crawling sites. It became fully operational in July 2003.

Googlebot used to be the archetype of a batch crawler... But it became an incremental crawler last year.
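
To make the batch vs. incremental distinction concrete, here is a rough toy sketch in Python. It is only an illustration of the general idea (each URL gets its own revisit interval, shortened when the page changed and lengthened when it did not); it is not a description of how Googlebot actually works, and the class name, the interval bounds, and the halving/doubling rule are all assumptions made up for this example.

```python
import heapq


class IncrementalCrawler:
    """Toy scheduler: revisit each URL on its own interval instead of in one batch.

    Hypothetical illustration only; the interval rule is an assumption.
    """

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.queue = []       # heap of (next_due_time, url)
        self.interval = {}    # url -> current revisit interval
        self.last_seen = {}   # url -> content seen on the last visit

    def add(self, url, now=0.0):
        """Register a URL and make it due immediately."""
        self.interval[url] = self.min_interval
        heapq.heappush(self.queue, (now, url))

    def crawl_due(self, now, fetch):
        """Fetch every URL whose due time has passed; return the URLs visited."""
        visited = []
        while self.queue and self.queue[0][0] <= now:
            _, url = heapq.heappop(self.queue)
            content = fetch(url)
            if self.last_seen.get(url) != content:
                # Page changed (or is new): come back sooner.
                self.interval[url] = max(self.min_interval, self.interval[url] / 2)
            else:
                # Page unchanged: back off.
                self.interval[url] = min(self.max_interval, self.interval[url] * 2)
            self.last_seen[url] = content
            heapq.heappush(self.queue, (now + self.interval[url], url))
            visited.append(url)
        return visited
```

A batch crawler would instead walk the whole URL list on a fixed schedule and swap in the new index at the end; the point of the sketch is that frequently changing pages get visited more often than static ones.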

People interested in this subject can read these papers, written by Cho and Garcia-Molina, from Stanford.

The Evolution of the Web and Implications for an Incremental Crawler
Junghoo Cho, Hector Garcia-Molina
Department of Computer Science, Stanford, CA 94305
December 2, 1999

Searching the Web
Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan

Efficient Crawling Through URL Ordering
Junghoo Cho, Hector Garcia-Molina, Lawrence Page
Department of Computer Science, Stanford University

Parallel Crawlers
Junghoo Cho (University of California, Los Angeles), Hector Garcia-Molina (Stanford University)
Old 09-09-2004   #7
unreviewed
Member
 
Join Date: Jun 2004
Posts: 46
I think the trouble at business.com is that they seem to have changed all sub-pages to point to business.com/index.asp. If you look at the Wayback Machine, they didn't link to index.asp before; the links went directly to the root URL, business.com.
Old 09-10-2004   #8
AussieWebmaster
Forums Editor, SearchEngineWatch
 
 
Join Date: Jun 2004
Location: NYC
Posts: 8,154
SiteProNews did a story today on the topic. Apparently it was the changes made at Business.com that lost them PR: a 302 vs. a 301 redirect.
Old 09-10-2004   #9
unreviewed
Member
 
Join Date: Jun 2004
Posts: 46
apple.com - redirect 302 - PR 10
ibm.com - redirect 302 - PR 9
microsoft.com - redirect 302 - PR 10


Lots of good web sites using a 302.
Old 09-10-2004   #10
AussieWebmaster
Forums Editor, SearchEngineWatch
 
 
Join Date: Jun 2004
Location: NYC
Posts: 8,154
Quote:
Originally Posted by unreviewed
apple.com - redirect 302 - PR 10
ibm.com - redirect 302 - PR 9
microsoft.com - redirect 302 - PR 10


Lots of good web sites using a 302.
But are all their pages 302 redirects that are really permanent redirects? Or do they use 302s to redirect temporarily?
Old 09-10-2004   #11
unreviewed
Member
 
Join Date: Jun 2004
Posts: 46
Not sure what you mean, AW; a permanent redirect would be a 301.

The only "big gun" web site that I know of that did use a 301 was w3.org, and they are no longer using a redirect. At least not a couple of days ago when I last checked.

I think the webmaster at Business.com just plain goofed. You should never link to your homepage as domain.xxx/index.xxx. There are very good reasons to link only to the root domain: PR reasons, and the fact that you may change your homepage. Perhaps in this case they switched to ASP ... I don't know, but I can see in the Wayback Machine that they were properly linking to the root domain and now they are not.

Regardless of whether that is the problem, they should change the site-wide sub links to point just to the root URL.
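
For anyone who wants the idea spelled out, here is a minimal Python sketch of a permanent redirect from a homepage file back to the root URL. It only computes the status code and target; it is not tied to any particular web server, and the function name and the list of default-document names are assumptions for illustration, not anything Business.com is known to use.

```python
from http import HTTPStatus

# Hypothetical list of default-document names; adjust for the server in question.
INDEX_FILES = {"index.asp", "index.html", "index.htm", "index.php", "default.asp"}


def homepage_redirect(path):
    """Return (status, location) if `path` ends in a homepage file, else None."""
    directory, _, filename = path.rpartition("/")
    if filename.lower() in INDEX_FILES:
        # 301 tells the client the move is permanent; a 302 (HTTPStatus.FOUND)
        # would say the target is only temporary.
        return HTTPStatus.MOVED_PERMANENTLY, directory + "/"
    return None
```

With this, a request for /index.asp would be answered with a 301 pointing at /, while any other path is left alone.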
Old 09-10-2004   #12
AussieWebmaster
Forums Editor, SearchEngineWatch
 
 
Join Date: Jun 2004
Location: NYC
Posts: 8,154
Quote:
Originally Posted by unreviewed
Not sure what you mean, AW; a permanent redirect would be a 301.

The only "big gun" web site that I know of that did use a 301 was w3.org, and they are no longer using a redirect. At least not a couple of days ago when I last checked.

I think the webmaster at Business.com just plain goofed. You should never link to your homepage as domain.xxx/index.xxx. There are very good reasons to link only to the root domain: PR reasons, and the fact that you may change your homepage. Perhaps in this case they switched to ASP ... I don't know, but I can see in the Wayback Machine that they were properly linking to the root domain and now they are not.

Regardless of whether that is the problem, they should change the site-wide sub links to point just to the root URL.
I think there has just been a miscommunication here.
Business.com had problems with a 302 where a 301 should have been used, as well as the index.html etc. issue.

I know 301s are permanent... but you had mentioned others using 302s, and I was clarifying whether they were temp redirects for short-term solutions to promotions or other reasons.
Old 09-10-2004   #13
unreviewed
Member
 
Join Date: Jun 2004
Posts: 46
I see.

But no, those sites listed have been using a 302 since I first noticed, late last year. They simply prefer to use a 302 rather than a 301. To suggest that Google will give you a PR 0 for using a 302 would mean that Google is indeed broken. I think the fact that Google will treat

domain.xxx
www.domain.xxx
domain.xxx/homepage.xxx
www.domain.xxx/homepage.xxx

... all as different pages is the most obvious factor. I certainly hope that Google isn't having problems with 302s; just think of how THAT would affect us. I think the forums would be "lousy" with posts from webmasters crying out loud.
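
As a rough illustration of why those four variants are a problem, and of what collapsing them into one canonical URL could look like, here is a small Python sketch using only the standard library. The function name, the choice to prefer the www form, the default http scheme, and the set of homepage file names (including the thread's placeholder homepage.xxx) are all assumptions for this example; nothing here is claimed to be what Google does internally.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical homepage file names; "homepage.xxx" is the placeholder used above.
HOMEPAGE_FILES = {"index.asp", "index.html", "index.htm", "homepage.xxx"}


def canonicalize(url, prefer_www=True):
    """Collapse www/non-www and homepage-file variants into one URL."""
    if "://" not in url:
        url = "http://" + url          # assume http when no scheme is given
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if prefer_www else bare
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in HOMEPAGE_FILES:
        path = directory + "/"         # drop the homepage file name
    else:
        path = parts.path or "/"       # an empty path means the root
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Run over the four variants listed above, this returns the same string for each, which is the behaviour a site owner would want to enforce with consistent internal links and redirects.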
Old 09-11-2004   #14
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
Just to add to the topic, even if Google is not malfunctioning, it seems to be changing other aspects - here's a report claiming that the toolbar checksum has been changed after the prior one was cracked (actually, much earlier than the reported June):
http://www.prweaver.com/blog/2004/09...ksum-algorithm