Googlebot called me on my Timpani Live Help


Chris Boggs
03-15-2006, 01:39 PM
I just got a call (request for chat) from the Googlebot to my live help system (we use Timpani). This seems very interesting, because it indicates that the crawler was not only able to read the JavaScript but also execute the command to "call."

Does this seem out of the ordinary to anyone?

I wonder if we will get "bonus points" for actually answering? By the way the Googlebot is a little rude...didn't even respond to my greeting.

rustybrick
03-15-2006, 01:45 PM
This is most likely someone changing their user agent to Googlebot in their browser.

Chris Boggs
03-15-2006, 01:46 PM
Damn, it's gone now, but the system also shows me IP and registration, and the reg info showed Mountain View, CA, so I think it was legit. Should have printed the chat info. :(

Wail
03-15-2006, 01:49 PM
I find this really interesting. I've always argued that bots wouldn't read JavaScript - rather than bots /couldn't/ read JavaScript. The danger is that they'll start to loop, or order things, or try and talk to Chris.

However, JavaScript isn't going away. AJAX grows in popularity. It must be tempting for search engines to try and have their cake and eat it: that is, to run JavaScript, but only in an information-discovering way.

It sounds like this really was Googlebot too.

Chris Boggs
03-15-2006, 01:50 PM
I will analyze our log files and see if G was in fact crawling at the time, unless that can be spoofed in the same manner?

Wail
03-15-2006, 01:53 PM
The user agent can be spoofed easily. The IP address is not so easy to spoof. If you spot Googlebot in your logs, run a tracert on the IP and see if you return to Google Inc.'s network.

If you do then I'll believe it's Googlebot.

I'll be interested to know its exact user agent too. We've seen Googlebot/test hit .js files back in early 2005/late 2004 as I recall.
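A common alternative to a traceroute is a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname. The sketch below assumes genuine Googlebot hosts resolve to names under googlebot.com or google.com (the host in this thread was crawl-66-249-64-44.googlebot.com). The function name and the two lookup parameters are made up for illustration, so treat it as a rough check rather than a definitive one.

```python
import socket

# Assumption: genuine Googlebot hosts reverse-resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip.

    reverse_lookup and forward_lookup are hypothetical hooks so the logic can be
    exercised without touching the network; by default real DNS lookups are used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        # The user agent is ignored entirely: only DNS evidence counts here.
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup guards against someone faking their reverse DNS record.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (no DNS record, lookup failure).
        return False
```

Calling is_verified_googlebot("66.249.64.44") would run the real lookups against the address from the chat session.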

Chris Boggs
03-15-2006, 02:14 PM
Browser: Googlebot/2.1 (+http://www.google.com/bot.html)
Host address: crawl-66-249-64-44.googlebot.com
Host IP: 66.249.64.44
Country: United States
City: Mountain View
Organization: GOOGLE
World Region: California
Postal Code: 94043
Time Zone: America/Los_Angeles
ISP: GOOGLE
Connection Type: Unknown

(note: I checked...it's legit.)

rustybrick
03-15-2006, 02:37 PM
Hmm, can you show the code used on your site?

Chris Boggs
03-15-2006, 02:55 PM
this should be it.

<a href='http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat' target='chat74320687' onClick="javascript:window.open('http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&imageUrl=http://www.g3group.com/assets/images/chat&referrer='+escape(document.location),'chat74320687 ','width=472,height=320');return false;" ><img src='http://server.iad.liveperson.net/hc/74320687/?cmd=repstate&site=74320687&&ver=1&imageUrl=http://www.g3group.com/assets/images/chat' alt="chat with us" name='hcIcon' width=125 height=60 border=0></a>
<!-- END LivePerson Button code -->

Rob
03-15-2006, 03:26 PM
This doesn't surprise me. I've seen some very interesting crawling coming from the Mozilla Googlebot.

For one, I've seen it submit forms - actually fill them out and hit the submit button.

It's also trying to request pages that don't exist, but it's doing so intelligently.

i.e., if I have a site with pages named page1.html, page2.html and page3.html, it's also requesting page4.html, page5.html and so on, even if they don't exist.

It's also a greedy bot - it's requesting as many as 8 times more pages in a day than the old Googlebot.

I figure, based on the fact that it's built on a modern browser engine, we've only just begun to see what it will be able to do. Just think of all the things your normal browser can handle (CSS, JS, Flash, movies etc.) and I think this is what the new bot can now, or soon will be able to, handle.

Mike
03-16-2006, 12:11 PM
I found out 3 weeks ago that google is looking at on-page javascript on many of my clients' sites, seeing what it thinks are URLs, and trying to go to those pages.

For example, a client has HitBox, and we have various categories on the site. Categories are defined in the HB JavaScript by var _mlc="/CategoryName"; Google sees the leading / and assumes this is a URL. It then tried to request site.com/CategoryName, and of course got a 404. Then it reported this as a 404 error in the Google Sitemaps account!
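To make the HitBox case concrete, here is a minimal sketch of the kind of heuristic that would produce this behaviour: treat any quoted string in a script that starts with "/" as a site-relative URL. This is a guess at the general idea, not Google's actual method, and the regex and the function name guess_urls are invented for illustration.

```python
import re
from urllib.parse import urljoin

# Hypothetical heuristic: a quoted string starting with "/" "looks like" a path.
PATH_LIKE = re.compile(r"""["'](/[\w./-]*)["']""")


def guess_urls(script_source, base_url):
    """Return absolute URLs guessed from path-like string literals in a script."""
    return [urljoin(base_url, path) for path in PATH_LIKE.findall(script_source)]


# The HitBox variable from the post is not a link, but the heuristic cannot tell.
hitbox_snippet = 'var _mlc="/CategoryName";'
print(guess_urls(hitbox_snippet, "http://site.com/"))
```

A crawler working this way has no means of knowing that _mlc is an analytics category rather than a page, which would explain the 404s showing up in Sitemaps.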

Wail
03-16-2006, 12:17 PM
In Chris' example code we can see that there is, in fact, a fully qualified URL in the standard HTML for Googlebot to follow. It depends on whether that would trigger the LiveChat session or not.

That said, I've seen other evidence to say that Googlebot is grabbing more URL data from JavaScript this week (or rather, evidence which suggests it as nothing is scientific in this industry) and Mike's example above is further supporting speculation.

So, as an SEO, your clients have just spent money to change their navigation from JavaScript to non-JavaScript. It cost them, and they're less happy with the way their site looks now. How do you handle this?

At this point in time I like to remind myself that there's no evidence, not even this sort of speculative evidence, that there are any promotional benefits to be had from JavaScript based navigation and so, right now, it's business as usual.

rustybrick
03-16-2006, 01:15 PM
Right, I would try adding a rel="nofollow" to the URL.

Wail
03-16-2006, 01:36 PM
Yeah. That would be an interesting study.

There are some signs to suggest that spiders do follow "nofollow" links (after all, the concept is to link condom the anchor and have it not count for PageRank) but it's hard to tell whether a spider found a page via some other method (a PageRank request via toolbar). The LiveChat talk box would be as good a 'closed lab' as any other public page.

In fact, there's no way you can add "rel='nofollow'" to a JavaScript variable. I.e., Mike can't take his var _mlc="/CategoryName"; and mark it 'nofollow'.

Mike
03-16-2006, 03:56 PM
I agree -- I used the hitbox thing as an example when speaking to another client -- bottom line is that even though google is looking for links in javascript, it doesn't mean it's time to announce "ok javascript for everybody!" -- just means they're starting, that it's not 100% accurate yet, and that we don't know what the other engines are doing yet, etc

I think it's gonna be a couple of years before we can tell all our clients to go nuts with javascript menus..

rabbit999
03-17-2006, 12:39 AM
Googlebot and msnbot visit Timpani Live help almost on a daily basis at my workplace too.

I tend to "Refuse chat" when I see their hostname displayed.

I sent a support request to Timpani regarding this problem, and they never replied.....

Chris Boggs
03-17-2006, 09:38 AM
OK thanks everyone. It looks like the answer to my original question of "Does this seem out of the ordinary to anyone?" is mostly a "no." I was told by a developer that this seemed very strange. I am not a developer. According to some here and elsewhere, though, this involves a clear html path for the Googlebot, even though this is within a JavaScript portion of the code? Can someone "dummy it down" for me?

Wail
03-17-2006, 10:16 AM
<a href='http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat' target='chat74320687' onClick="javascript:window.open('http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&imageUrl=http://www.g3group.com/assets/images/chat&referrer='+escape(document.location),'chat74320687 ','width=472,height=320');return false;" ><img src='http://server.iad.liveperson.net/hc/74320687/?cmd=repstate&site=74320687&&ver=1&imageUrl=http://www.g3group.com/assets/images/chat' alt="chat with us" name='hcIcon' width=125 height=60 border=0></a>
<!-- END LivePerson Button code -->


<a href='http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat'

Search engines can follow this. This isn't JavaScript, this is a straight link. Just because there's JavaScript nearby does not stop the spider finding this bit of the code.

Now, if we had said something like:

document.write('<a href=\'http://server.iad.liveperson.net/hc/74320687/?cmd=file&file=visitorWantsToChat&site=74320687&byhref=1&imageUrl=http://www.g3group.com/assets/images/chat\'>');

Then that would have been a different story as JavaScript is actually being used to write that address (it's not in the actual code). I don't think anyone here would be terribly surprised to see that Googlebot is smart enough to extract this fully formed URL from the JavaScript. It seems that the bot is increasingly likely to do so these days (we speculate).
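The difference between the two cases can be sketched with a plain HTML parser that never runs any script. The page below is a made-up stand-in using example.com URLs, not the real LivePerson code, and StaticLinkCollector and static_links are hypothetical names. A parser like this sees the href that sits in the markup, but treats the document.write call as opaque script text.

```python
from html.parser import HTMLParser


class StaticLinkCollector(HTMLParser):
    """Collects href values from <a> tags that are present in the HTML source."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def static_links(html):
    collector = StaticLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical page: one ordinary link, one link that only exists after script runs.
page = """
<a href='http://example.com/chat?cmd=visitorWantsToChat' onClick="return false;">chat with us</a>
<script>
document.write("<a href='http://example.com/written-by-script'>");
</script>
"""
print(static_links(page))
```

Only the first URL is collected. Reaching the second would take either script execution or some pattern matching over the script text, which is the behaviour being speculated about here.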

GoogleGuy
03-17-2006, 06:28 PM
I'd slap a nofollow on any link that you don't want Googlebot to follow, just to be safe.

PhilC
03-17-2006, 07:23 PM
Rob wrote: "It's also trying to request pages that don't exist, but it's doing so intelligently. ie, if I have a site with pages named page1.html page2.html and page3.html it's also requesting page4.html page5.html and so on even if they don't exist."

This rings a bell. Unless my oldtimer's disease is worse than I thought, I remember a Googler (maybe Matt) saying that Googlebot requests files that are expected *not* to exist, to check if the site returns 404s or not. I suggest that that's what happened here, and that, coincidentally, you have pages with some of those filenames.
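As a rough illustration of that kind of check, the sketch below requests a randomly named page that should not exist and looks at the status code. It only shows the general idea, not how Googlebot actually does it. The names random_probe_url, returns_real_404s and the caller-supplied fetch_status function are all hypothetical.

```python
import uuid
from urllib.parse import urljoin


def random_probe_url(base_url):
    """Build a URL on the same site whose filename should not exist."""
    return urljoin(base_url, "/" + uuid.uuid4().hex + ".html")


def returns_real_404s(base_url, fetch_status):
    """Return True if the site answers a made-up URL with a 404.

    fetch_status is a caller-supplied function mapping a URL to an HTTP status
    code, so the check can be wired to any HTTP client (or faked in a test).
    """
    return fetch_status(random_probe_url(base_url)) == 404
```

A site that answers such a request with 200 and a friendly error page is the case a crawler would want to detect, since every made-up URL would otherwise look like real content.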

Alan Perkins
03-21-2006, 09:14 AM
All that's required is for server.iad.liveperson.net/robots.txt to contain:

User-agent: *
Disallow: /

This would save a lot of stress on LivePerson servers and a lot of garbage URLs indexed in search engines. For example, a site:server.iad.liveperson.net (http://www.google.com/search?&q=site%3Aserver%2Eiad%2Eliveperson%2Enet) search in Google reveals 122,000 results.

GoogleGuy wrote: "I'd slap a nofollow on any link that you don't want Googlebot to follow, just to be safe."

Does that work? I thought nofollow just stopped credit being assigned to the link - not the link being followed in the first place.
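For the robots.txt rule proposed in this post, here is a small sketch of how a compliant parser would read it, using Python's standard urllib.robotparser as a stand-in for a crawler. The helper name allowed is made up, and the chat URL is a shortened form of the one in Chris's button code. Whether any particular bot honours the file is up to that bot.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two-line file suggested in the post: block every bot from everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# For contrast: an empty Disallow means nothing is blocked.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

CHAT_URL = (
    "http://server.iad.liveperson.net/hc/74320687/"
    "?cmd=file&file=visitorWantsToChat&site=74320687"
)

print(allowed(BLOCK_ALL, "Googlebot", CHAT_URL))
print(allowed(ALLOW_ALL, "Googlebot", CHAT_URL))
```

Because the chat link points at LivePerson's host rather than the client's site, the file would have to live on server.iad.liveperson.net, which is why site owners can't fix this themselves.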

natasha499
03-22-2006, 01:58 PM
I have Timpani also, and the Google and MSN bots open a chat window almost every day.

Chris Boggs
04-07-2006, 02:37 PM
Just as a follow up to this. Googlebot visits fairly regularly, and I always say a nice hello. The chat seems to be discontinued by the bot within one to one and a half minutes usually (still no response to my hellos :p).

Anyone care to guess why this takes this amount of time and not longer or less? I wonder if it opens the chat and then goes on to other links before returning and ending the chat? Perhaps my saying hello causes it to pause? I'll see if not answering beyond the canned response makes the session end sooner. I am curious to see people's thoughts on what may be going through the Googlebot's "mind" as it confronts such software.

It would seem to me that once it had visited this link, it would be smart enough not to come back.

PhilC
04-07-2006, 02:54 PM
I'm not well versed in the technical side of live chat, but, unless a person specifically logs out, he/she/it will be timed out after a certain period of inactivity. In other words, googlebot doesn't log off - it gets timed out. Alternatively, it spiders the log off link.

What it doesn't do is hang around for a minute or two and then log off.

Chris Boggs
04-18-2006, 10:30 AM
This morning at 8:25 EST, I got a "call" from Alexa. (note: also did not reply to my friendly "good morning")

Visitor Info
Contact ID: 209.237.238.224
Country: United States
City: San Francisco
Organization: Alexa Internet
Postal Code: 94129
ISP: United Layer
Connection Type: Corporate
Time Zone: America/Los_Angeles
IP: 209.237.238.224
State/Province: California

I checked the IP, and it is registered to Alexa. Now what would Alexa be doing crawling Timpani???

Chris Boggs
04-26-2006, 11:27 AM
MSN bot just joined for the after-the-after-party...

Contact ID: msnbot.msn.com
Country: United States
City: Redmond
Organization: Microsoft Corp
Postal Code: 98052
ISP: Microsoft Corp
Connection Type: Corporate
Time Zone: America/Los_Angeles
IP: 65.55.246.90
Host: msnbot.msn.com
State/Province: Washington

Browser: msnbot/0.9

Alan Perkins
04-26-2006, 11:48 AM
I'm surprised you're surprised. :)

It's going to happen frequently unless LivePerson creates that robots.txt file.

Chris Boggs
04-27-2006, 04:26 PM
Alan Perkins wrote: "I'm surprised you're surprised. :) It's going to happen frequently unless LivePerson creates that robots.txt file."

Yeah Alan, not really surprised any more but it's nice to keep track of...

What I am surprised at is that Yahoo has never "called," nor has Ask...and also that Alexa got into the mix...