Googlebot Making Invalid Page Requests?


devised
09-09-2005, 07:42 PM
A few of our well-ranking pages have slowly started to drop in the rankings. On further investigation, we noticed that a handful of our deep pages (2 categories deep) now appear in the index as URL-only listings. When checking the log files, we found that Gbot was not requesting the right pages anymore and was leaving out the category and filenames of the page requests. The 10 bad requests came from various categories of our site and all looked similar to the following (notice that the 2nd category and the filename are missing from the request):
xx.xxx.xx.xx - - [17/Jun/2005:00:34:19 -0400] "GET /category_name//.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1;
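For anyone who wants to hunt for the same pattern in their own logs, here is a minimal Python sketch. It assumes the log is in Apache's Combined Log Format, as the line above appears to be, and it tolerates a user-agent field that is cut off. The function name and the "empty segment or empty filename" heuristic are my own assumptions, not anything Google documents.

```python
import re

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
# The closing quote of the user-agent is optional so truncated lines still match.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"?'
)


def malformed_googlebot_requests(lines):
    """Yield (time, path) for Googlebot 404s whose path looks incomplete.

    Heuristic (an assumption): a path is suspicious if it contains an empty
    directory segment ("//") or a filename that is only an extension (".html").
    Note that this also flags dotfiles such as "/.htaccess".
    """
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        if "Googlebot" not in match.group("agent"):
            continue
        if match.group("status") != "404":
            continue
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        segments = path.split("/")[1:]  # drop the empty string before the leading slash
        has_empty_segment = "" in segments[:-1]
        has_empty_filename = bool(segments) and segments[-1].startswith(".")
        if has_empty_segment or has_empty_filename:
            yield match.group("time"), path


if __name__ == "__main__":
    sample = (
        'xx.xxx.xx.xx - - [17/Jun/2005:00:34:19 -0400] '
        '"GET /category_name//.html HTTP/1.1" 404 - "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1;'
    )
    for when, path in malformed_googlebot_requests([sample]):
        print(when, path)
```

To run it against a real log, pass an open file object instead of the one-line list; the generator reads it line by line.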

I've read/heard recently (can't remember where) that Google has been experimenting with removing query string variables to see if the page loads without them and was wondering if that might have something to do with it, but I can't find any more info on it. Does that ring a bell with anyone? Can someone let me know where that article/interview is?

Has anyone seen Gbot behave like this?

Many thanks in advance for any info!

softplus
09-11-2005, 03:04 PM
... or perhaps it's testing for 404s that don't 404? It does do that if you've submitted a Google sitemap... however, the URL you mention doesn't look like the other bad-page-check URLs that Google (and Yahoo) use...

Testing parameters would make sense as well...

Chris_D
09-11-2005, 07:37 PM
As softplus noted, it could be Googlebot testing your site's 404 response. However, they usually request completely unrelated URLs when testing for a 404 response.
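If you want to see what a crawler would get from that kind of test, you can run the same check yourself. The following is a rough Python sketch that requests a URL that almost certainly does not exist and reports whether the server answers with a real 404 (or 410). The function names and the probe path format are made up for illustration; this is not how Google builds its test URLs.

```python
import uuid
import urllib.error
import urllib.request
from urllib.parse import urljoin


def http_status(url):
    """Return the final HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def make_probe_url(base_url):
    """Build a URL at the site root that almost certainly does not exist."""
    return urljoin(base_url, "/no-such-page-" + uuid.uuid4().hex + ".html")


def returns_real_404(base_url, fetch_status=http_status):
    """True if a nonexistent URL yields 404 or 410 rather than a "soft 404"."""
    return fetch_status(make_probe_url(base_url)) in (404, 410)
```

Calling `returns_real_404("http://www.yoursite.example/")` makes one live request. The `fetch_status` parameter is only there so the check can be exercised without network access.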

Do you have Javascript based navigation on the site?

Gbot was not requesting the right pages anymore and was leaving out the category and filenames of the page requests

I've seen this recently on another site - the issue seems to be that Google is now trying to parse the Javascript based navigation (the client also had some non Javascript nav).

The way the Javascript is written means that Googlebot is only getting it half right. The Javascript URLs don't have the full path name, so Googlebot is requesting URLs like www.somesite.com/blahblah.php when the correct URL is actually www.somesite.com/dir1/dir2/blahblah.php. Only the blahblah.php part is easily identifiable in the script, and some other variables are also missing from the file name (again, where they aren't obvious from the Javascript).

<EDIT>20051003 - It turns out that Google wasn't trying to parse the Javascript - there were some server issues related to testing an alternate Nav system - which Google picked up - nothing to do with Googlebot parsing Javascript</EDIT>

Let me know - but I'm reasonably sure that you have some Javascript nav on the site....

devised
09-11-2005, 08:29 PM
Do you have Javascript based navigation on the site?

Nope. I'm personally against using JavaScript for navigation and most other purposes. If a user has control over making something on the site work or not, then I'd rather use other means.

Over the last couple days, Googlebot has returned and has requested those 10 pages that it had problems with, but this time it requested the correct versions of the url. The index still hasn't been updated yet (it still shows the url-only pages when using the site: operator), but hopefully that will change soon.

Also, I've never submitted a sitemap to Google either.

Chris_D
09-11-2005, 08:55 PM
I'm personally against using JavaScript for navigation and most other purposes

That's 2 of us

:)

Are there any external links to these odd pages?

devised
09-12-2005, 01:16 AM
Are there any external links to these odd pages?

Not that I can find.

Rob
09-14-2005, 02:34 PM
Could it be that they are relative links that the crawler is having problems with?

I've seen this on a couple of sites that use FrontPage includes that have relative links - the links work in browsers but can cause spidering problems.
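To illustrate why a shared include with relative links can go wrong, here is a small Python sketch using the standard library's `urljoin`, which follows the usual browser rules for resolving a link against the page it appears on. The example.com URLs and the `resolve_links` helper are hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href the way a browser or crawler would on page_url."""
    return [urljoin(page_url, href) for href in hrefs]


# The same include, with the same relative link, served from two different pages.
include_links = ["blahblah.html"]

print(resolve_links("http://www.example.com/dir1/dir2/index.html", include_links))
# ['http://www.example.com/dir1/dir2/blahblah.html']

print(resolve_links("http://www.example.com/index.html", include_links))
# ['http://www.example.com/blahblah.html']

# A root-relative or absolute link resolves the same way from any page.
print(resolve_links("http://www.example.com/index.html", ["/dir1/dir2/blahblah.html"]))
# ['http://www.example.com/dir1/dir2/blahblah.html']
```

So a relative link that is correct on one page points somewhere else when the same include is pulled into a page in another directory, which is one way a spider ends up requesting URLs with parts of the path missing.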

devised
09-14-2005, 03:41 PM
Could it be that they are relative links that the crawler is having problems with?

I've seen this on a couple of sites that use FrontPage includes that have relative links - the links work in browsers but can cause spidering problems.

No, I used absolute links.

Googlebot came back and requested those 10 pages again, and they are all back in the SERPs and ranking again. However, now there are 10 other deep pages that have the same problem. I think Googlebot's been drinking on the job.