Recognizing Search Engine Bots


ymgem
12-18-2005, 05:01 PM
Hi,

I'm not sure this is in the right forum, but there wasn't another, more appropriate one.

I am very, very new at site creation and have only just uploaded my very first brand new site, so please excuse me if my questions seem a bit naive.

How can I know, from my webstats, which bot has read my site?

The obvious ones, like Google or MSN, are named outright, but what are:

1. BLA
2. ia_archiver-web.archive.org
3. MetaTagRobot

As I said, my site is very, very new, so what other ones should I expect over the coming weeks?

And the million dollar question, should anybody know the answer, is how long after they appear in my stats should I expect to receive visitors from search engines?

dannysullivan
12-19-2005, 10:12 AM
In terms of search engine traffic, there are only four main spiders you care about:

Googlebot, which is Google's spider, and more info on it here:
http://www.google.com/webmasters/bot.html

Slurp, which is Yahoo's spider, and more info on it here:
http://help.yahoo.com/help/us/ysearch/slurp/

MSNBot, which is MSN's spider, and more info on it here:
http://search.msn.com/docs/siteowner.aspx

Teoma, which is the Ask Jeeves spider, and more info on it here:
http://sp.ask.com/docs/about/tech_crawling.html

Good stats programs will already identify these for you.

Beyond the ones above, the major search engines may also operate additional spiders that use different user agent names (the user agent names are the ones that begin each entry above: Googlebot, Slurp, MSNBot and Teoma).
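
If your stats program does not label the spiders for you, a rough way to do it yourself is to look for those names inside the user agent string recorded in your logs. The sketch below is only an illustration: the function name identify_spider and the case-insensitive substring match are my own assumptions, not anything the search engines publish, and a user agent string can be faked by anyone.

```python
# Spider names taken from the post above. Matching on lowercase substrings
# is an assumption for illustration, not an official identification rule.
KNOWN_SPIDERS = {
    "googlebot": "Google",
    "slurp": "Yahoo",
    "msnbot": "MSN",
    "teoma": "Ask Jeeves",
}


def identify_spider(user_agent):
    """Return the search engine name for a user agent string, or None."""
    ua = user_agent.lower()
    for token, engine in KNOWN_SPIDERS.items():
        if token in ua:
            return engine
    # Anything else (for example ia_archiver or MetaTagRobot) is not one
    # of the four main search engine spiders.
    return None
```

For example, identify_spider("msnbot/1.0") returns "MSN", while a name such as "MetaTagRobot" returns None because it is not one of the four main spiders.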

A long-standing public database of spiders is here:
http://www.robotstxt.org/wc/active/html/

However, not every spider is listed in it, and the entries that are there may be out of date.

The IAB has a regularly updated list of spiders here:
http://iab.net/standards/spiders/Spiders.asp

However, it's only accessible to IAB members.

Need to ban some misbehaving spiders? Here's a bad bots list:
http://www.kloth.net/internet/badbots.php
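
One common first step for banning a spider is a robots.txt rule. The sketch below uses Python's standard urllib.robotparser module to check what a given set of rules would allow before you upload them. The spider name "BadBot", the example.com URLs and the helper is_allowed are hypothetical, made up for illustration. Keep in mind that robots.txt is only a request: a spider that truly misbehaves may ignore it, in which case you would have to block it at the server instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one spider called "BadBot" and allow
# every other spider to crawl the whole site.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, is_allowed("BadBot", "http://example.com/page.html") is False, while the same call for "Googlebot" is True.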