View Full Version : Google Bots Overloading Server?
panana
09-18-2008, 02:15 AM
Hello All -
I had a friend email me that he needed help. He got a message from his hosting company saying that they were on the verge of suspending his site because the Google bot(s) were hitting his site so much that it was slowing the shared server down to a snail's pace.
What I want to know is - Is it possible for Google bots to be accessing a site so much that they would be bringing down a server? Here's a quote from the message they sent him:
"We have found that your site is causing server wide load problems for the
shared web server it is using. Specifically, the site is being crawled
excessively by Googlebots.
In order to resolve this issue, we will need for you to restrict the
information on your site available to a Googlebot. You can do so with the
appropriate directives in a robots.txt file, or by adding the meta tag <meta
name="Googlebot" content="nofollow" /> to the webpage."
This just doesn't sound right. To be an ongoing problem, Google would have to be hitting the site hundreds of times each and every day. His site is not ranked very high, so I can't see Google coming by that often.
As a friend, I especially don't like them forcing him to use a NoFollow and bringing the value of his internal pages down just to satisfy them.
Is this possibly an attack of some kind, using the Google bot identity as a front? I can't see this being a legitimate, constant problem originating from Google's cataloging.
Any help is appreciated. Hope everyone is having a great Fall so far -
Jon
JohnW
09-18-2008, 07:38 AM
>You can do so with the
appropriate directives in a robots.txt file, or by adding the meta tag <meta
name="Googlebot" content="nofollow" /> to the webpage."
What a load of garbage. Anyhow, in Google Webmaster Tools there is an option that gives you some control over the rate of Gbot crawling. Also, if you are worried, the log files may help you confirm whether it's really Gbot.
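As a rough illustration of the log-file check, here is a minimal Python sketch that counts requests per IP and hour for any User-Agent containing "Googlebot". It assumes the common Apache/Nginx "combined" log format; the function name `hits_per_hour`, the sample lines, and the IP addresses are made up for illustration and do not come from the site in question.

```python
import re
from collections import Counter

# Assumption: the server writes the "combined" access log format.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]*\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def hits_per_hour(lines, token="Googlebot"):
    """Count requests per (ip, day, hour) whose User-Agent contains `token`."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and token.lower() in m.group("agent").lower():
            counts[(m.group("ip"), m.group("day"), m.group("hour"))] += 1
    return counts

# Hypothetical sample lines (documentation-range IPs), for illustration only.
SAMPLE = [
    '192.0.2.10 - - [18/Sep/2008:02:15:01 -0400] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [18/Sep/2008:02:15:09 -0400] "GET /b HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [18/Sep/2008:02:16:30 -0400] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"',
]

if __name__ == "__main__":
    for (ip, day, hour), n in hits_per_hour(SAMPLE).most_common():
        print(f"{ip} {day} {hour}:00 -> {n} requests")
```

A User-Agent match only shows what the client claims to be, so a high count here says how heavy the crawling is, not whether it is really Google.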
jimbeetle
09-18-2008, 09:15 AM
Yeah, garbage. I'd also have your friend either trace the e-mail headers or contact the host to make sure it actually came from them.
Wouldn't be the first time that a competitor tried to get someone to nuke themselves.
freeflyer
09-18-2008, 12:22 PM
It's not garbage... a bot can create thousands of consecutive requests, particularly to the database. A poorly executed ecommerce site (such as certain modded osCommerce sites) has particularly bad problems with bots slowing the server to a crawl, but the site has to be badly written. Admittedly the server has to be pants in the first place, but it can happen.
mcanerin
09-18-2008, 01:08 PM
This is why most search engines support the crawl-delay command.
The syntax is:
User-agent: *
Disallow:
Crawl-delay: 5
The number is seconds.
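To see how a well-behaved crawler might read those three lines, here is a small Python sketch using the standard library's `urllib.robotparser`. The bot name `ExampleBot`, the path, and the helper `read_policy` are hypothetical. Support for Crawl-delay varies by crawler, and Googlebot is generally reported to ignore it, so treat this as a picture of what the file says rather than of what any particular engine will do.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post: nothing is disallowed, but a 5-second delay is requested.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Crawl-delay: 5
"""

def read_policy(robots_txt, agent, path="/"):
    """Return (allowed, delay_seconds) as a polite crawler would read them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path), parser.crawl_delay(agent)

if __name__ == "__main__":
    # "ExampleBot" and the path are placeholders, not real crawler details.
    allowed, delay = read_policy(ROBOTS_TXT, "ExampleBot", "/products/42")
    print("allowed:", allowed, "delay:", delay)
```

The empty `Disallow:` line leaves every URL crawlable, so the delay is the only thing this file asks for.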
Having said this, I'm very suspicious of this email. Either your ISP is admitting to having hardware that is extremely underpowered (search engines don't even download images - it's not bandwidth, it's just hits) or this is a trick by someone to get you to commit suicide.
If it's actually from your ISP - switch ISPs, or fix the underlying issue with your database, or trace the IPs of the googlebot visits and verify that they ARE googlebots and not just a DoS attack using googlebot agent strings.
If it's actually from your competitor (or anyone other than your ISP) - keep doing what you are doing, since you are obviously scaring someone who is too lazy to actually try to compete with you instead, and therefore doesn't deserve to rank well anyway.
Above all, NEVER use a robots.txt disallow function for bandwidth control unless the robots are actually spidering areas you don't want them to spider.
Ian.
JohnW
09-18-2008, 02:57 PM
>its not garbage
Sorry, but I think it is garbage for a hosting company to give this kind of stupid advice about nofollow (like this is supposed to stop crawling?) and encourage their clients to commit nofollow/robots.txt suicide with their Google rankings. If it's free hosting, then fine, let the client know that they have to start paying.
JohnW
09-18-2008, 03:04 PM
Ian, good point about robots crawl delay. Still, if Gbot is in fact the problem, isn't WMT a good place to deal with it?
panana
09-18-2008, 04:01 PM
Thanks everyone for the replies and all the help! My friend actually did call the host and talked to them so this is a real email from them(!).
Again, greatly appreciate the info and confirming that this just couldn't be happening. At least not the way the host is saying it is.
jimbeetle
09-19-2008, 11:33 AM
Okay then, legitimate, but still not the best advice. Best bet now is to take the steps Ian outlined above:
-crawl delay
-check that requests actually are from googlebot
-check db and site structure
-consider host change if all above checks out
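For the second step in the list above, one commonly described check is a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm it. The Python sketch below assumes genuine Googlebot hosts resolve to names under googlebot.com or google.com (an assumption worth checking against Google's current guidance); the function name, the stub resolvers, and the sample addresses are hypothetical. It handles IPv4 only.

```python
import socket

# Assumption: genuine crawler hostnames end in one of these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-confirm the name."""
    try:
        host = reverse(ip)[0]
        if not host.lower().endswith(GOOGLE_SUFFIXES):
            return False
        # The name must resolve back to the same IP, otherwise the PTR record is spoofed.
        return ip in forward(host)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False

if __name__ == "__main__":
    # Stub resolvers so the demo runs offline; the values are invented.
    def stub_reverse(ip):
        return ("crawl-192-0-2-10.googlebot.com", [], [ip])

    def stub_forward(host):
        return (host, [], ["192.0.2.10"])

    print(is_real_googlebot("192.0.2.10", stub_reverse, stub_forward))
    print(is_real_googlebot("192.0.2.99", stub_reverse, stub_forward))
```

Calling `is_real_googlebot(ip)` without the stub arguments performs live DNS lookups, which is what you would run against the IPs pulled from the access log.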
freeflyer
09-20-2008, 05:20 AM
>Sorry, but I think it is garbage for a hosting company to give this kind of stupid advice about nofollow (like this is supposed to stop crawling?) and encourage their clients to commit nofollow/robots.txt suicide with their Google rankings. If it's free hosting, then fine, let the client know that they have to start paying.
John, admittedly the advice was garbage, but not the reasoning behind it (i.e. bots can slow servers), which is what it looked like was being said :)
Marcia
09-21-2008, 04:48 AM
>(ie bots can slow servers)
It sounds like those servers are lacking capability in the first place. Also, in cases I've heard of where a site was using too much in the way of server resources, the host asked them to upgrade to a dedicated server.
Their suggestion is a little too "SEO-savvy" for my liking; it sounds like an out-of-line suggestion.
freeflyer
09-22-2008, 10:27 AM
The bots only slow the server IF the site is written poorly, i.e. generating hundreds of unnecessary queries every time, when a handful is all that's needed.
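A toy illustration of that difference, as a Python sketch using an in-memory SQLite database. The `product` and `category` tables, the row counts, and all names are invented; the point is only to compare one lookup per row against a single JOIN for the same page.

```python
import sqlite3

class CountingDB:
    """Thin wrapper that counts how many SQL statements a page issues."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

def make_db(n_products=200):
    """Build a hypothetical catalogue: 10 categories, n_products products."""
    db = CountingDB()
    db.conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
    db.conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER)")
    db.conn.executemany("INSERT INTO category VALUES (?, ?)",
                        [(i, f"cat{i}") for i in range(10)])
    db.conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                        [(i, f"item{i}", i % 10) for i in range(n_products)])
    return db

def listing_slow(db):
    """One query for the products, then one more per product for its category."""
    rows = db.query("SELECT id, name, category_id FROM product ORDER BY id")
    return [(name, db.query("SELECT name FROM category WHERE id = ?", (cat,))[0][0])
            for _id, name, cat in rows]

def listing_fast(db):
    """The same page data from a single JOIN."""
    return db.query("SELECT p.name, c.name FROM product p "
                    "JOIN category c ON c.id = p.category_id ORDER BY p.id")

if __name__ == "__main__":
    slow_db, fast_db = make_db(), make_db()
    assert listing_slow(slow_db) == listing_fast(fast_db)
    print("slow:", slow_db.queries, "queries; fast:", fast_db.queries, "query")
```

With 200 products the slow version issues 201 statements per page view and the fast one issues 1, so every bot request to the slow page costs the database roughly two hundred times as many round trips.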
jimbeetle
09-22-2008, 12:21 PM
>the host asked them to upgrade to a dedicated server
The absence of the upsell in the e-mail from the host is what first made me think it might be a scam.
ScottG
09-22-2008, 06:39 PM
>(search engines don't even download images - it's not bandwidth, it's just hits)
Really? I thought that they just didn't come around as often as their normal bot might.
http://images.google.com/images?as_st=y&um=1&hl=en&client=firefox-a&rls=org.mozilla%3Aen-GB%3Aofficial&q=site%3Ablog.searchenginewatch.com+2008&btnG=Search+Images
And here is some "obama" pic (fake?):
http://blog.searchenginewatch.com/blog/img/obama%20picture.jpg
that is on SEW. Another site re-used it, and this was re-associated with SEW as well:
http://images.google.com/images?um=1&hl=en&client=firefox-a&rls=org.mozilla%3Aen-GB%3Aofficial&q=http%3A%2F%2Fblog.searchenginewatch.com%2Fblog%2Fimg%2F&btnG=Search+Images
And with engines like Cuil, lol, image bot = fail:
http://www.robleto.com/2008/07/28/cuil-picture-association-fail/
http://pixelbits.wordpress.com/2008/07/29/hey-cuil-got-glasses/
And I've seen some references in the past to: "Googlebot-Image/1.0"
thedevnull
09-26-2008, 09:29 AM
I think you can set the crawl rate in Google Webmaster Tools as well...
JohnW
09-26-2008, 10:16 AM
Yeah, that was my take as well, but WMT is not totally clear about how it works. I think that in robots.txt you can only set the delay, not the rate.