Search Engine Watch
03-15-2006 #1
Chris Boggs
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
Googlebot called me on my Timpani Live Help

I just got a call (request for chat) from the Googlebot to my live help system (we use Timpani). This seems very interesting, because it indicates that the crawler was not only able to read the JavaScript but also execute the command to "call."

Does this seem out of the ordinary to anyone?

I wonder if we'll get "bonus points" for actually answering. By the way, Googlebot is a little rude... it didn't even respond to my greeting.

03-15-2006 #2
rustybrick
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
This is most likely someone changing their browser's user agent to Googlebot.

03-15-2006 #3
Chris Boggs
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
Damn, it's gone now, but the system also shows me the IP address and registration info, and the registration showed Mountain View, CA, so I think it was legit. I should have printed the chat info.

Last edited by Chris Boggs : 03-15-2006 at 01:49 PM.

03-15-2006 #4
Wail
Another member
Join Date: Jun 2004
Posts: 247
I find this really interesting. I've always argued that bots wouldn't read JavaScript, not that bots /couldn't/ read JavaScript. The danger is that they'll start to loop, or order things, or try to talk to Chris.

However, JavaScript isn't going away, and AJAX keeps growing in popularity. It must be tempting for search engines to have their cake and eat it too: to run JavaScript, but only in an information-discovering way.

It sounds like this really was Googlebot too.

03-15-2006 #5
Chris Boggs
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
I'll check our log files to see whether G was in fact crawling at the time. Or can that be spoofed the same way?

03-15-2006 #6
Wail
Another member
Join Date: Jun 2004
Posts: 247
The user agent can be spoofed easily; the IP address is not so easy to spoof. If you spot Googlebot in your logs, run a tracert on the IP and see whether it leads back to Google Inc.'s network.

If you do then I'll believe it's Googlebot.

I'd be interested to know its exact user agent too. We saw Googlebot/test hitting .js files back in late 2004 or early 2005, as I recall.
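
Wail's point is that a spoofed user agent won't survive a network check. The verification Google itself documents for Googlebot is a reverse DNS lookup on the IP followed by a forward lookup on the resulting hostname, rather than a tracert. A minimal Python sketch of that two-step check, assuming the standard library resolvers; the function name and the exact domain list are assumptions, and the resolvers are parameters only so the logic can be exercised without network access:

```python
import socket
from typing import Callable, List, Tuple

# Both resolvers return (hostname, aliases, addresses), matching the shape of
# socket.gethostbyaddr and socket.gethostbyname_ex.
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Assumed list of crawler domains; check Google's current documentation.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        hostname, _, _ = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLE_DOMAINS):
        return False
    try:
        _, _, addresses = forward(hostname)
    except OSError:
        return False
    # Anyone who controls an IP block can set its reverse DNS to a googlebot.com
    # name, but only Google controls what that name resolves to going forward.
    return ip in addresses
```

Called with just an IP from a log line, it performs real DNS lookups; a hostname in the right domain is not enough on its own, because the forward lookup has to point back at the same address.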

03-15-2006 #7
Chris Boggs
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
Browser: Googlebot/2.1 (+http://www.google.com/bot.html)
Host address: crawl-66-249-64-44.googlebot.com
Host IP: 66.249.64.44
Country: United States
City: Mountain View
Organization: GOOGLE
World Region: California
Postal Code: 94043
Time Zone: America/Los_Angeles
ISP: GOOGLE
Connection Type: Unknown

(note: I checked...it's legit.)

Last edited by Chris Boggs : 03-15-2006 at 02:19 PM.

03-15-2006 #8
rustybrick
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Hmm, can you show the code used on your site?

03-15-2006 #9
Chris Boggs
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
this should be it.

Quote:
<a href='http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat' target='chat74320687' onClick="javascript:window.open('http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&imageUrl=http://www.g3group.com/assets/images/chat&referrer='+escape(document.location),'chat74320687','width=472,height=320');return false;" ><img src='http://server.iad.liveperson.net/hc/74320687/?cmd=repstate&site=74320687&&ver=1&imageUrl=http://www.g3group.com/assets/images/chat' alt="chat with us" name='hcIcon' width=125 height=60 border=0></a>
<!-- END LivePerson Button code -->

03-15-2006 #10
Rob
Canuck SEM
Join Date: Jun 2004
Location: Kelowna, BC
Posts: 234
This doesn't surprise me; I've seen some very interesting crawling coming from the Mozilla Googlebot.

For one, I've seen it submit forms - actually fill them out and hit the submit button.

It's also trying to request pages that don't exist, but it's doing so intelligently.

For example, if I have a site with pages named page1.html, page2.html and page3.html, it's also requesting page4.html, page5.html and so on, even though they don't exist.
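
Nobody outside Google knows how (or whether) Googlebot guesses URLs like this. Purely to illustrate the pattern Rob describes, here is a hypothetical Python sketch that spots numbered filenames and extrapolates the next few; the function name and the `extra` parameter are inventions for illustration, not anything Google has described:

```python
import re
from typing import Dict, List, Tuple

# A run of digits right before the file extension, e.g. "page3.html".
NUMBERED = re.compile(r"^(?P<stem>.*?)(?P<num>\d+)(?P<ext>\.[A-Za-z0-9]+)$")


def guess_next_pages(known: List[str], extra: int = 2) -> List[str]:
    """Hypothetical: given numbered filenames, guess the next `extra` ones."""
    highest: Dict[Tuple[str, str], int] = {}  # (stem, ext) -> highest number seen
    for name in known:
        match = NUMBERED.match(name)
        if match:
            key = (match["stem"], match["ext"])
            highest[key] = max(highest.get(key, 0), int(match["num"]))
    guesses: List[str] = []
    for (stem, ext), top in highest.items():
        guesses.extend(f"{stem}{top + i}{ext}" for i in range(1, extra + 1))
    return guesses
```

Fed page1.html through page3.html, it proposes page4.html and page5.html, which is the kind of request Rob is seeing in his logs.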

It's also a greedy bot: it's requesting as many as 8 times more pages per day than the old Googlebot.

I figure that, since it's built on a modern browser engine, we've only just begun to see what it will be able to do. Just think of all the things your normal browser can handle (CSS, JS, Flash, movies, etc.); I think the new bot can handle them now, or soon will.

03-16-2006 #11
Mike
Member
Join Date: Jun 2004
Posts: 52
Three weeks ago I found out that Google is looking at on-page JavaScript on many of my clients' sites, picking out what it thinks are URLs, and trying to fetch those pages.

For example, one client uses HitBox, and we have various categories on the site. The categories are defined in the HitBox JavaScript like this:
Code:
var _mlc="/CategoryName";
Google sees the leading / and assumes this is a URL. It then tried to request site.com/CategoryName and, of course, got a 404, which it then reported as a 404 error in the Google Sitemaps account!
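
Google hasn't said how it picks URLs out of scripts, but the behavior Mike describes is consistent with a very simple heuristic. A hypothetical Python sketch, assuming the crawler treats any quoted string that starts with "/" as a site-relative path; the function name is invented and `site.com` stands in for the client's domain, as in Mike's post:

```python
import re
from typing import List
from urllib.parse import urljoin

# Single- or double-quoted string literals; escaped quotes are not handled.
STRING_LITERAL = re.compile(r"""(["'])(.*?)\1""")


def guess_urls_from_js(script: str, page_url: str) -> List[str]:
    """Hypothetical heuristic: any quoted string starting with '/' is a path."""
    guesses = []
    for _quote, value in STRING_LITERAL.findall(script):
        if value.startswith("/") and " " not in value:
            # Resolve the path against the page the script was found on.
            guesses.append(urljoin(page_url, value))
    return guesses


print(guess_urls_from_js('var _mlc="/CategoryName";', "http://site.com/index.html"))
```

Run against the HitBox line, it yields http://site.com/CategoryName even though the string is a category label rather than a page, which is how a heuristic like this would end up producing the 404.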

03-16-2006 #12
Wail
Another member
Join Date: Jun 2004
Posts: 247
In Chris' example code we can see that there is, in fact, a fully qualified URL in the standard HTML for Googlebot to follow. The open question is whether simply requesting that URL would trigger the live chat session.

That said, I've seen other evidence this week that Googlebot is grabbing more URL data from JavaScript (or rather, evidence that suggests it, since nothing is scientific in this industry), and Mike's example above further supports the speculation.

So, say you're an SEO whose clients have just spent money changing their navigation from JavaScript to non-JavaScript links. It cost them, and they're less happy with the way their site looks now. How do you handle this?

At this point I like to remind myself that there's no evidence, not even this sort of speculative evidence, that there are any promotional benefits to be had from JavaScript-based navigation, so right now it's business as usual.

03-16-2006 #13
rustybrick
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Right, I would try adding rel="nofollow" to the link.
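
For reference, here is how a crawler that honors the attribute might sort links: a minimal Python sketch using the standard `html.parser` module, assuming "honoring" simply means not queuing nofollow links (whether real spiders actually skip them is an open question). The class name is hypothetical:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class FollowableLinks(HTMLParser):
    """Collects <a href> values, setting aside links whose rel includes "nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.follow: List[str] = []
        self.skipped: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":  # HTMLParser lower-cases tag and attribute names
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens, e.g. rel="external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.follow.append(href)


parser = FollowableLinks()
parser.feed('<a href="/about">About</a> <a rel="external nofollow" href="/chat">Chat</a>')
```

After feeding it, `parser.follow` holds "/about" and `parser.skipped` holds "/chat".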

03-16-2006 #14
Wail
Another member
Join Date: Jun 2004
Posts: 247
Yeah. That would be an interesting study.

There are some signs that spiders do follow "nofollow" links (after all, the concept is to "link condom" the anchor so it doesn't count for PageRank), but it's hard to tell whether a spider found a page via some other route (such as a PageRank request from the toolbar). The live chat box would be as good a 'closed lab' as any other public page.

Also, there's no way to add rel='nofollow' to a JavaScript variable. That is, Mike can't take his var _mlc="/CategoryName"; and mark it 'nofollow'.

03-16-2006 #15
Mike
Member
Join Date: Jun 2004
Posts: 52
I agree. I used the HitBox thing as an example when speaking to another client. The bottom line is that even though Google is looking for links in JavaScript, it doesn't mean it's time to announce "OK, JavaScript for everybody!" It just means they're starting, that it's not 100% accurate yet, that we don't know what the other engines are doing yet, etc.

I think it's gonna be a couple of years before we can tell all our clients to go nuts with JavaScript menus.

03-17-2006 #16
rabbit999
Posts: n/a
Googlebot and msnbot visit our Timpani Live Help almost daily at my workplace too.

I tend to "Refuse chat" when I see their hostname displayed.

I sent a support request to Timpani about this problem, and they never replied...

03-17-2006 #17
Chris Boggs
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
OK, thanks everyone. It looks like the answer to my original question, "Does this seem out of the ordinary to anyone?", is mostly "no." A developer told me this seemed very strange, but I'm not a developer myself. According to some here and elsewhere, though, there's a clear HTML path for Googlebot to follow, even though it sits within a JavaScript portion of the code? Can someone "dumb it down" for me?

03-17-2006 #18
Wail
Another member
Join Date: Jun 2004
Posts: 247
Quote:
<a href='http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat' target='chat74320687' onClick="javascript:window.open('http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&imageUrl=http://www.g3group.com/assets/images/chat&referrer='+escape(document.location),'chat74320687','width=472,height=320');return false;" ><img src='http://server.iad.liveperson.net/hc/74320687/?cmd=repstate&site=74320687&&ver=1&imageUrl=http://www.g3group.com/assets/images/chat' alt="chat with us" name='hcIcon' width=125 height=60 border=0></a>
<!-- END LivePerson Button code -->
<a href='http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat'

Search engines can follow this. It isn't JavaScript; it's a straight link. Having JavaScript nearby doesn't stop the spider from finding this bit of the code.
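
A quick way to see this: a plain HTML parser that never runs JavaScript still picks up the `href`, while the `onClick` handler is just another attribute string to it. A minimal Python sketch using the standard `html.parser` module; the class name and the chat.example.com URL are hypothetical stand-ins for the LivePerson link:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class AnchorDump(HTMLParser):
    """Records each <a> tag's href and whether it also has an onclick handler."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: List[Tuple[Optional[str], bool]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            attributes = dict(attrs)  # attribute names arrive lower-cased
            self.anchors.append((attributes.get("href"), "onclick" in attributes))


BUTTON = (
    "<a href='http://chat.example.com/start?site=1' "
    "onClick=\"window.open('http://chat.example.com/start?site=1');return false;\">"
    "<img src='chat.gif' alt='chat with us'></a>"
)

parser = AnchorDump()
parser.feed(BUTTON)
print(parser.anchors)  # [('http://chat.example.com/start?site=1', True)]
```

The script never runs, yet the link target is right there in the parsed attributes, ready to be queued like any other link.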

Now, if we had said something like:

document.write('<a href=\'http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat\'>');

Then that would have been a different story, because JavaScript is actually being used to write that address (it's not in the actual HTML code). I don't think anyone here would be terribly surprised to see that Googlebot is smart enough to extract this fully formed URL from the JavaScript, and (we speculate) the bot seems increasingly likely to do so these days.
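
Pulling a fully formed URL out of a document.write call doesn't require running the script; a pattern match over the script text is enough. A hypothetical Python sketch of that kind of extraction (nobody here knows what Googlebot actually does), with chat.example.com standing in for the real address and the function name invented for illustration:

```python
import re
from typing import List

# Absolute http(s) URLs in script text. The match stops at quotes, backslashes
# (so \' escapes end it), whitespace and angle brackets.
ABSOLUTE_URL = re.compile(r"""https?://[^\s'"\\<>]+""")


def urls_in_script(script: str) -> List[str]:
    """Hypothetical: pull fully qualified URLs out of JavaScript without executing it."""
    return ABSOLUTE_URL.findall(script)


snippet = "document.write('<a href=\\'http://chat.example.com/start?site=1&byhref=1\\'>');"
print(urls_in_script(snippet))  # ['http://chat.example.com/start?site=1&byhref=1']
```

Relative addresses or URLs assembled from variables at runtime would slip past a pattern like this, which is one reason extraction from JavaScript is likely to stay partial.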

03-17-2006 #19
GoogleGuy
Unofficial Representative
Join Date: Jul 2004
Location: Mountain View, CA
Posts: 66
I'd slap a nofollow on any link that you don't want Googlebot to follow, just to be safe.

03-17-2006 #20
PhilC
Member
Join Date: Oct 2004
Location: UK
Posts: 1,657
Quote:
Originally Posted by Rob
It's also trying to request pages that don't exist, but it's doing so intelligently.

For example, if I have a site with pages named page1.html, page2.html and page3.html, it's also requesting page4.html, page5.html and so on, even though they don't exist.
This rings a bell. Unless my oldtimer's disease is worse than I thought, I remember a Googler (maybe Matt) saying that Googlebot requests files that are expected *not* to exist, to check whether the site returns 404s for them. I suggest that's what happened here, and that, coincidentally, you have pages with some of those filenames.
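
PhilC's explanation corresponds to a check site owners can run themselves: request a URL that can't exist and see whether the server answers with a real 404 rather than a 200 "soft 404" page. A minimal Python sketch; the function names, the random filename, and treating 410 as acceptable are assumptions for illustration:

```python
import urllib.error
import urllib.request
import uuid
from typing import Callable


def fetch_status(url: str) -> int:
    """Return the final HTTP status for url (urlopen follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:  # raised for 4xx/5xx responses
        return err.code


def returns_real_404(base_url: str, fetch: Callable[[str], int] = fetch_status) -> bool:
    """Request a random, almost certainly missing page and expect 404 or 410."""
    probe = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}.html"
    return fetch(probe) in (404, 410)
```

`returns_real_404("http://www.example.com")` makes one real request; a False result means missing pages come back with a success status (for example, via a redirect to the home page), which is what a probe for nonexistent files would reveal.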