new GOOGLEBOT?
critter
09-28-2004, 04:57 PM
Hello All..
I recently noticed, and then read about, a new GOOGLEBOT...
This is the old one we all know so well:
66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
This is the NEW one:
66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
What does everyone think?
My feeling is that this is being used to catch duplicate content, stolen content, cloaked sites, doorway pages and any other method of spam. It seems to me that by using two spiders, they can index your site fully with one spider and partially with the second, then compare the two sets of data to spot any discrepancies and tactics that are unlawful in Google's EYES..
Any thoughts?
CRITTER
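For anyone who wants to look for the same thing in their own logs, here is a minimal sketch that parses the two lines quoted above and labels each user agent as the old or the new form. It assumes the Apache "combined" log format; the names `LOG_PATTERN`, `classify_googlebot` and `parse_line` are made up for this illustration, and the "starts with Mozilla/5.0 (compatible;" test is only a guess at a workable rule based on these two samples.

```python
import re

# Apache "combined" log format, as in the two lines quoted in the post.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify_googlebot(agent):
    """Label a user-agent string as 'old', 'new' or 'other' (hypothetical rule)."""
    if "Googlebot" not in agent:
        return "other"
    return "new" if agent.startswith("Mozilla/5.0 (compatible;") else "old"


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["googlebot"] = classify_googlebot(entry["agent"])
    return entry


old_line = ('66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] '
            '"GET /robots.txt HTTP/1.0" 404 1227 "-" '
            '"Googlebot/2.1 (+http://www.google.com/bot.html)"')
new_line = ('66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] '
            '"GET / HTTP/1.1" 200 38358 "-" '
            '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

for line in (old_line, new_line):
    entry = parse_line(line)
    print(entry["ip"], entry["protocol"], entry["googlebot"])
```

Note that the two samples also differ in protocol version (HTTP/1.0 versus HTTP/1.1), which the parsed `protocol` field makes easy to compare.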
Nick W
09-28-2004, 05:03 PM
>>thoughts
Is it April Fools'? Are they really using a Moz UA ID, or are you pulling my leg? :)
The IP is certainly theirs....
Nick
rustybrick
09-28-2004, 05:07 PM
Did you get that from WMW or from an article at WebProNews (http://www.webpronews.com/insiderreports/searchinsider/wpn-49-20040928DidGoogleUnleashAdditionalGooglebots.html)?
critter
09-28-2004, 05:09 PM
>>You got that from WMW or an article at WebProNews?
I had noticed this just prior to the WPN article... I'm curious though, which is why I asked. I noticed it hit my site...
Incubator
09-28-2004, 05:10 PM
I just got my WebProNews email in and there's definite mention of it
Cheers
WC
seomike
09-28-2004, 05:27 PM
I just pulled up some site tracking from one of my larger sites, and all my Googlebots are in a completely different IP range:
64.68.82.33 crawler11.googlebot.com
64.68.82.174 crawler14.googlebot.com
64.68.82.189 crawler15.googlebot.com
64.68.82.182 crawler14.googlebot.com
64.68.82.28 crawler10.googlebot.com
64.68.82.184 crawler14.googlebot.com
64.68.82.170 crawler14.googlebot.com
64.68.82.13 crawler10.googlebot.com
64.68.82.141 crawler13.googlebot.com
64.68.82.143 crawler13.googlebot.com
64.68.82.144 crawler13.googlebot.com
64.68.82.185 crawler14.googlebot.com
64.68.82.146 crawler13.googlebot.com
64.68.82.168 crawler14.googlebot.com
64.68.82.201 crawler15.googlebot.com
64.68.82.30 crawler10.googlebot.com
64.68.82.164 crawler14.googlebot.com
64.68.82.14 crawler10.googlebot.com
64.68.82.197 crawler15.googlebot.com
64.68.82.167 crawler14.googlebot.com
64.68.82.47 crawler11.googlebot.com
64.68.82.169 crawler14.googlebot.com
64.68.82.79 crawler12.googlebot.com
64.68.82.181 crawler14.googlebot.com
64.68.82.136 crawler13.googlebot.com
64.68.82.178 crawler14.googlebot.com
64.68.82.45 crawler11.googlebot.com
64.68.82.10 crawler10.googlebot.com
64.68.82.44 crawler11.googlebot.com
64.68.82.142 crawler13.googlebot.com
Notice how crawler14 changes its IP. That is how they try to catch cloakers and spammers. I'd say it's nothing new, just an updated spider that is probably more adept at reading CSS, Java, redirects and Flash .swf files.
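To see which crawler hostnames show up under more than one IP, the list above can be grouped by hostname. This is only a sketch using the first five rows of the list as sample input; the function name `group_ips_by_host` is made up for the example.

```python
from collections import defaultdict


def group_ips_by_host(pairs):
    """Map each crawler hostname to the sorted list of IPs seen for it."""
    hosts = defaultdict(set)
    for ip, host in pairs:
        hosts[host].add(ip)
    return {host: sorted(ips) for host, ips in hosts.items()}


# First five rows of the list in the post.
sample = [
    ("64.68.82.33", "crawler11.googlebot.com"),
    ("64.68.82.174", "crawler14.googlebot.com"),
    ("64.68.82.189", "crawler15.googlebot.com"),
    ("64.68.82.182", "crawler14.googlebot.com"),
    ("64.68.82.28", "crawler10.googlebot.com"),
]

grouped = group_ips_by_host(sample)
for host, ips in sorted(grouped.items()):
    print(host, len(ips), ips)
```

Run over the full list, the same grouping shows every hostname with several addresses, not just crawler14.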
Incubator
09-28-2004, 05:35 PM
Fantomaster, I would like to hear any input you have.
Cheers
WC
We (fantomaster.com) first saw this user agent over 3 months ago, so it's not a new one:
Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
During the last 2 weeks we noticed lots of new spiders (about 200) coming from 3 new IP ranges:
66.249.64.xxx
66.249.65.xxx
66.249.66.xxx
It's not unusual for search engines to use new IP ranges from time to time. We have often seen this in the past.
Therefore I can't see any indication that the new IP ranges are used to decloak pages. We are monitoring over 10,000 domains, most of them normal domains with no cloaked pages.
And those domains are also heavily spidered from the new IP ranges.
Dirk
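A quick way to test whether a logged address falls in the three ranges listed above is Python's standard `ipaddress` module. This sketch assumes each "xxx" stands for a full /24 block; `NEW_RANGES` and `in_new_ranges` are names invented for the example.

```python
import ipaddress

# The three ranges from the post, read as /24 networks (an assumption).
NEW_RANGES = [
    ipaddress.ip_network("66.249.64.0/24"),
    ipaddress.ip_network("66.249.65.0/24"),
    ipaddress.ip_network("66.249.66.0/24"),
]


def in_new_ranges(ip):
    """Return True if the address lies in one of the three new ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in NEW_RANGES)


# Both addresses from the first post fall in the new ranges;
# the 64.68.82.x crawlers from seomike's list do not.
for ip in ("66.249.64.47", "66.249.66.129", "64.68.82.33"):
    print(ip, in_new_ranges(ip))
```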
Incubator
09-28-2004, 06:29 PM
Thanks Dirk, appreciate the input. I didn't think it was due to cloaking, because if the IP, being new, wasn't in the bot list, it would have been redirected to the user home page anyway.
Thanks again Dirk
cheers
WC
I, Brian
10-10-2004, 10:53 AM
Platinax is a new domain - I installed a forum from scratch - but I kept the forum closed while I applied Dani's mod_rewrite solution to it.
Cool. Now my forum is all static URLs.
Except that the new crawler 66.249.6x.xx has somehow found a way into my dynamic content and is indexing that instead. There's a good 20 or so Googlebots on the forums right now, and they are all eating up the dynamic content, not the static content.
I closed off a lot of areas with robots.txt to prevent the indexing of possible duplicate content - static archives, printer friendly version, etc - plus the memberlist, user profiles, and a host of other sections to help focus PR into topics.
So I'm completely baffled at how the spider can not only have found dynamic content, but is greedily consuming it - while pretty much ignoring the static content.
There's something not right there.
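One thing worth checking in a situation like this is whether the dynamic URLs are actually covered by the robots.txt rules, or only the sections that were explicitly closed off. The sketch below uses Python's standard `urllib.robotparser`; the rules and paths in `ROBOTS_TXT` are hypothetical stand-ins, not the real Platinax file, and `allowed` is a helper name made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, loosely modelled on the sections described in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/archive/
Disallow: /forums/printthread.php
Disallow: /forums/memberlist.php
Disallow: /forums/member.php
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: one closed-off section and one dynamic thread URL.
for url in ("/forums/memberlist.php", "/forums/showthread.php?t=1"):
    print(url, allowed("Googlebot", url))
```

With rules like these, the member list is blocked but a dynamic thread URL is still fetchable, so a spider that learns of such a URL from any link is free to crawl it.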
Marcia
10-10-2004, 11:35 AM
Brian, is there any kind of redirection being used somewhere along the line in the process that could be returning a redirect - like a 302?
I, Brian
10-10-2004, 11:39 AM
I'm really not all that clued up on redirects as I normally have little use for them myself - there are a lot of "[L]"'s about in the .htaccess file, though. :)
What's interesting is that the first bots to come in were from the normal range, and they picked up on the static files pretty much immediately and left with 100 or so of those. So it was fascinating to see the new bots picking up an entirely different set of content when they came in today.
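The old-range versus new-range comparison described here can be tallied from the access log. This is a rough sketch only: the sample requests are invented, the "66.249.6" prefix test is a crude stand-in for the 66.249.6x.xx range, and treating `.php` or a query string as "dynamic" and `.html` as "static" is an assumption about how the forum URLs look.

```python
from collections import Counter


def url_kind(path):
    """Classify a request path as 'dynamic', 'static' or 'other' (assumed rule)."""
    if "?" in path or path.endswith(".php"):
        return "dynamic"
    if path.endswith((".html", ".htm")):
        return "static"
    return "other"


def tally(requests):
    """Count requests per (bot range, URL kind) pair."""
    counts = Counter()
    for ip, path in requests:
        bot_range = "new" if ip.startswith("66.249.6") else "old"
        counts[(bot_range, url_kind(path))] += 1
    return counts


# Invented sample requests, for illustration only.
sample_requests = [
    ("64.68.82.33", "/forums/t12.html"),
    ("66.249.65.10", "/forums/showthread.php?t=12"),
    ("66.249.66.129", "/forums/showthread.php?t=13"),
]

counts = tally(sample_requests)
for key, count in sorted(counts.items()):
    print(key, count)
```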
Brian, more and more spiders are also crawling dynamic pages.
BTW, if you are using the Google toolbar with PageRank enabled you send Google the info about the dynamic URLs directly. So they know what to spider.
Dirk
I, Brian
10-10-2004, 12:57 PM
Indeed, Googlebots normally have no problem spidering vBulletin 3's - but I was sort of hoping to have the mod_rewritten HTML versions indexed, not the dynamic versions! I don't use the toolbar in my main browser either. I'm actually a little stumped as to how and why the Googlebots have sniffed their way into the dynamic URLs, when I thought all backdoors were closed to them. Has to be some small flaw somewhere.
Dirk, you're probably the biggest specialist in SEO when it comes to spiders - have you seen any unusual behaviour in the new bots? And would you personally see any connection between the new HTTP/1.1 bots and the AdSense Mediabots? Or would you say the shared IP range is simply a matter of focussed resource use, rather than a deeper relationship between the two?
No, we have not seen unusual behaviour of the new bots.
It's not only the IP range that is shared. We have also noticed Googlebots and Mediapartners bots sharing single IPs.
I think this is only a matter of shared resources, nothing else.
BTW, you could send me your mod_rewrite directives and I would check them.
Dirk
I, Brian
10-10-2004, 04:07 PM
Thanks for the comments, Marcia and Dirk. And indeed - I saw the same single IP being used by both as well. That's what got me wondering if it implied any kind of working relationship.
As for the mod_rewrite code - thanks for the comments and offers - but to try and keep this thread on track, I've opened a different thread (http://forums.searchenginewatch.com/showthread.php?t=2087) on that issue, if that's okay. :)