Site getting hammered by crawler


Marcia
01-16-2007, 08:34 PM
My logs keep showing this:

Host: 72.30.216.22
/suspended.page/
Http Code: 404 Date: Jan 17 01:12:31 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Host: 74.6.67.78
/suspended.page/
Http Code: 404 Date: Jan 17 01:11:24 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Host: 74.6.74.155
/suspended.page/
Http Code: 404 Date: Jan 17 01:08:45 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Host: 74.6.71.43
/suspended.page/

...and so on, tens of thousands of times in a row.

Has anyone seen this before? Is this a crawler problem, or has some kind of server misconfiguration suddenly appeared?
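One quick way to check whether this really is one crawler hammering a single URL, rather than many different pages failing, is to tally the entries. The sketch below is a minimal, hypothetical helper (`tally` and `ENTRY_RE` are illustrative names, not part of any tool). It assumes the entries are pasted as plain text in exactly the five-line layout shown above; a raw Apache access log would need a different pattern.

```python
import re
from collections import Counter

# Assumes entries are pasted in the five-line layout shown in the post:
# "Host: ..." / path / "Http Code: ..." / "Referer: ..." / "Agent: ...".
# A truncated entry (e.g. one missing its "Http Code" line) is skipped.
ENTRY_RE = re.compile(
    r"Host:[ \t]*(?P<host>\S+)[ \t]*\n"
    r"(?P<path>\S+)[^\n]*\n"
    r"Http Code:[ \t]*(?P<code>\d{3})[^\n]*\n"
    r"Referer:[^\n]*\n"
    r"Agent:[ \t]*(?P<agent>[^\n]*)"
)


def tally(log_text):
    """Count (path, status, agent) triples and collect the distinct client IPs."""
    counts = Counter()
    hosts = set()
    for m in ENTRY_RE.finditer(log_text.replace("\r\n", "\n")):
        agent = m.group("agent").strip()
        # Collapse the long Slurp user-agent string into a short label.
        label = "Yahoo! Slurp" if "Yahoo! Slurp" in agent else agent
        counts[(m.group("path"), int(m.group("code")), label)] += 1
        hosts.add(m.group("host"))
    return counts, hosts
```

Run on the entries quoted above, every complete entry collapses into the single key `("/suspended.page/", 404, "Yahoo! Slurp")`, requested from several different Yahoo IPs. That pattern suggests a crawler re-requesting one stored URL rather than a loop generated by links on your own pages, though a tally alone cannot prove it.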

vicyankees
01-18-2007, 02:04 AM
Do you have something in your base web page that the server's error page also pulls from? It looks like the spider hit an error and fell into a continuous loop that could eventually crash the server.

evilgreenmonkey
01-18-2007, 03:13 AM
Hi Marcia,

/suspended.page/ is a URL used mainly by the cPanel control panel. It appears when a site exceeds its bandwidth limit or is suspended by the web host. While the suspension is active, every URL on that domain redirects (302, I think) to this page, so the page sometimes ends up in search engines' indexes.

If I were to hazard a guess, I'd say the site in question was suspended during a Slurp crawl, so every URL Slurp had in its index was visited and redirected to this URL.

I would counteract this by sticking the following in your robots.txt:

User-agent: *
Disallow: /suspended.page/
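To check that those two lines block Slurp from the folder without blocking the rest of the site, you can run them through Python's standard `urllib.robotparser`. This is a minimal sketch assuming the rules are exactly the two lines above; `build_parser`, `slurp_may_fetch` and the www.example.com URLs are hypothetical, and Slurp's own matching may differ from Python's parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /suspended.page/
"""

# Slurp's user-agent string, copied from the log entries in this thread.
SLURP_UA = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching anything over the network."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def slurp_may_fetch(rp, url):
    """Return True if the parsed rules allow Slurp to crawl url."""
    # "User-agent: *" becomes the default entry, so it applies to Slurp too.
    return rp.can_fetch(SLURP_UA, url)
```

With these rules, `/suspended.page/` and anything under it is refused, while ordinary pages such as `/index.html` remain crawlable.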

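And then plead with your hosting provider to add the following to the top of the index.php file in the suspended.page folder:

<?php
// Must be sent before any other output: tells crawlers the site is temporarily unavailable.
header("HTTP/1.0 503 Service Unavailable");
// Optional: suggest how long crawlers should wait before retrying, in seconds.
header("Retry-After: 3600");
?>

This should tell the spiders not to index the page and that the site is temporarily down for maintenance (rather than indexing a default suspension page).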

If you have a cPanel reseller account (i.e. you use WHM and can create new hosting accounts), you should be able to edit the contents of suspended.page yourself, as it's usually located in the root folder of your main hosting account.


Rob