robots.txt and landing pages - Adbot broken?


JohnW
08-05-2006, 02:35 PM
Like most folks, I assume, we create robots-blocked subdirectories for PPC landing pages. I have noticed that this seems to create a problem for the AdWords robot. When you ask the AdWords robot to crawl the page (using the "Site-Related Keywords" tool), the result is null and the tool reports that robots.txt has blocked the page. Isn't the AdWords robot supposed to ignore robots.txt unless robots.txt specifically names AdsBot?

JohnW
08-07-2006, 03:55 PM
Hmmm. I guess everyone with the answers is at SES. The above post is of particular concern to me in that I am afraid a robots issue will affect Quality Score. Anybody know anything about this?

Brian M
08-07-2006, 04:39 PM
Hi John,

Yes, I believe you are correct and have a reason to be concerned. I also noticed this a while back and have been getting around this by creating a new subdirectory for PPC landing pages that I am testing, but I do not block any robot access at all because it stops Adbot.

In theory - as long as there is no link on the web to the landing page, it should never be found. However, this theory is being shot full of holes by search engines that are trying to read JavaScript. They follow Google ads into the site and index the resulting PPC tracking code whenever the ad appears in an Adsense site ad.

I just wish I were going to SES so I could ask this question in person. Hopefully, somebody there will notice this thread and get an answer for us soon...

Brian M

JohnW
08-07-2006, 10:32 PM
>In theory - as long as there is no link on the web to the landing page, it should never be found. However, this theory is being shot full of holes by search engines that are trying to read JavaScript.

Also if a competitor is paying attention he may link to these pages. Also Google may index them simply because someone visited the page with the toolbar.

I found some info - if I am reading it correctly the Google documentation states that you have to specifically disallow adsbot if you want to keep it out.

https://adwords.google.com/support/bin/answer.py?answer=38197

Doesn't this imply that a normal robots.txt will not stop adsbot? Either the documentation is wrong, or the adbot is broken.

MattCutts
08-09-2006, 10:46 PM
A crawl person mentioned this thread and wanted to get specifics. Would you mind telling us more? You could drop the info in this thread, or as a comment on my blog if you'd prefer. If you want to keep it private, you can comment on my blog with a new email address that hasn't had comments approved and say

*** DO NOT APPROVE! MATT ASKED FOR THE SPECIFICS ON THIS ROBOTS.TXT ISSUE ***

or something at the top. Sound good?

AussieWebmaster
08-10-2006, 09:04 AM
I would like to see this one posted here.... share the info with the forum.

JohnW
08-10-2006, 11:29 AM
Hi Matt, not sure how much I can add to what I already posted, but here goes:

I remember reading that AdsBot will crawl pages even if robots are generically blocked in robots.txt, unless the robots.txt specifically disallows AdsBot. This is a reasonable practice because competent webmasters will have their landing page folders disallowed so as to prevent duplicate content problems, yet would still need the pages to be visible to AdsBot so as to establish Quality Score.

I looked for more info and found the page I mentioned earlier
https://adwords.google.com/support/bin/answer.py?answer=38197
Here is an excerpt:

• To prevent AdsBot-Google from accessing your site, add the following to your robots.txt file:
User-agent: AdsBot-Google
Disallow: /
• To prevent AdsBot-Google from accessing parts of your site, add the following to your robots.txt file:
User-agent: AdsBot-Google
Disallow: /exclude/

This seems to reinforce the notion that the user agent AdsBot-Google will not obey a normal robots file like the following

User-agent: *
Disallow: /landingpages/

So, when you are in an AdWords account Ad Group and you select “Keyword tool”, you are presented with a page that has two tabs: the first is called “Keyword Variations” and the second “Site-Related Keywords”. On the second tab you are prompted to “Enter a webpage URL to find keywords related to the content on the page.” You enter the landing page URL and then select the “Get Keywords” button. Here’s what I get if the otherwise normal and healthy page is in a folder with a generic robots disallow:

“Unable to access http://www.domain.com/LP/Nx.html.
We're sorry, but we can't seem to access the URL http://www.domain.com/LP/Nx.html. Permission to crawl this URL may have been denied in the site's robots.txt file. Please enter a different URL to begin the keyword generation process again.”

It appears that whatever robot is assigned to spider the pages for this AdWords feature is indeed honoring the generic User-agent: * disallow even though according to the available documentation, it should not.
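
To make the two behaviors concrete, here is a minimal Python sketch. It is not Google's code and is not taken from the AdWords documentation. It compares a standard robots.txt parser (Python's urllib.robotparser), which applies the generic User-agent: * group to every agent, with a hypothetical helper, named_only_allowed, that models what the help page seems to describe: obey only groups that name the agent and ignore the wildcard group. The helper name, the sample files and the page path are assumptions for illustration only.

```python
from urllib import robotparser

# Sample files (assumptions): the generic block from the post, plus a
# variant that also names AdsBot-Google explicitly.
GENERIC = """\
User-agent: *
Disallow: /landingpages/
"""

EXPLICIT = GENERIC + """
User-agent: AdsBot-Google
Disallow: /landingpages/
"""


def standard_allowed(robots_txt, agent, url):
    """Standard behavior: the '*' group applies to any agent not named."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def named_only_allowed(robots_txt, agent, path):
    """Hypothetical model: obey only groups that name `agent`; ignore '*'."""
    agents, in_rules, blocked = [], False, []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field == "disallow":
            in_rules = True
            if value and agent.lower() in agents:
                blocked.append(value)
    return not any(path.startswith(prefix) for prefix in blocked)


if __name__ == "__main__":
    url = "http://www.domain.com/landingpages/offer.html"
    path = "/landingpages/offer.html"
    print("standard, Googlebot:", standard_allowed(GENERIC, "Googlebot", url))
    print("standard, AdsBot-Google:", standard_allowed(GENERIC, "AdsBot-Google", url))
    print("named-only, generic file:", named_only_allowed(GENERIC, "AdsBot-Google", path))
    print("named-only, explicit file:", named_only_allowed(EXPLICIT, "AdsBot-Google", path))
```

If the tool behaved like named_only_allowed, the generic file would not stop it. The error message above looks like the standard_allowed behavior instead.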

If you need a site-specific example let me know but this issue should be easy to replicate.

JohnW

MattCutts
08-10-2006, 01:50 PM
I'm pointing folks here. Thanks, JohnW.

MattCutts
08-10-2006, 06:43 PM
JohnW, the crawl team tried to recreate this behavior by setting up a host and varying the robots.txt, and they didn't see the behavior you saw. Could I get your hostname to check into it deeper? Either here or, if you want to do it privately, via a comment on my blog is fine.

JohnW
08-11-2006, 01:00 PM
Matt, I sent the info through your blog.

JohnW

AussieWebmaster
08-11-2006, 03:54 PM
I hope you guys share this information here as well. Inquiring minds want to know.

MattCutts
08-12-2006, 09:22 PM
Got it, JohnW. Thanks.

AussieWebmaster, at first glance it looks as though Googlebot is working correctly and Adsbot is working correctly, but the request from AdWords keyword tool is checking for the Googlebot agent in robots.txt instead of Adsbot. I'm passing it all over to the crawl team to dig into in more detail though.
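
A rough Python illustration of why the agent name used for the lookup matters. It uses the standard library's urllib.robotparser and a made-up robots.txt, not the real keyword tool or any Google code. The sample file, including the explicit AdsBot-Google group with an empty Disallow, and the verdict helper are assumptions. The URL is the placeholder from the error message earlier in the thread.

```python
from urllib import robotparser

# Hypothetical robots.txt: everyone is kept out of /LP/, but a separate
# group names AdsBot-Google and disallows nothing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /LP/

User-agent: AdsBot-Google
Disallow:
"""


def verdict(agent, url, robots_txt=ROBOTS_TXT):
    """Return 'allowed' or 'blocked' for the rules that apply to `agent`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return "allowed" if parser.can_fetch(agent, url) else "blocked"


if __name__ == "__main__":
    url = "http://www.domain.com/LP/Nx.html"
    # Same file, same URL; only the agent name used for the lookup differs.
    for agent in ("Googlebot", "AdsBot-Google"):
        print(agent, "->", verdict(agent, url))
```

A checker that looks up the rules under the Googlebot name falls through to the wildcard group and reports the page as blocked. A lookup under AdsBot-Google on the same file does not.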

AussieWebmaster
08-12-2006, 09:53 PM
>Got it, JohnW. Thanks.

>AussieWebmaster, at first glance it looks as though Googlebot is working correctly and Adsbot is working correctly, but the request from AdWords keyword tool is checking for the Googlebot agent in robots.txt instead of Adsbot. I'm passing it all over to the crawl team to dig into in more detail though.

Thanks Matt....

JohnW
08-15-2006, 02:08 PM
I got an email from the AdWords rep. He said not to worry, that the AdWords Site-Related Keywords tool does not use the AdWords robot, it uses the regular Gbot. So there are no QS problems. But also no recognition that anything is a problem with this.

Matt, do you see a disconnect here? Would this tool be better if it could access pages that are in a robots disallowed folder?

Am I the only one that disallows robots from landing page folders, or, hmmm... am I the only one that has tried to use this tool?

AdWordsRep
08-15-2006, 02:57 PM
>I got an email from the AdWords rep. He said not to worry...

Just to be very clear, this was not from me. ;) I've entirely deferred to MattCutts in this thread, as he is far more expert in these matters than am I.

(JohnW, my screen name here on SEW is AdWordsRep, in case you wonder what I'm going on about.)

AWR

AussieWebmaster
08-15-2006, 04:27 PM
>Just to be very clear, this was not from me. ;) I've entirely deferred to MattCutts in this thread, as he is far more expert in these matters than am I.

>(JohnW, my screen name here on SEW is AdWordsRep, in case you wonder what I'm going on about.)

>AWR

I think we realised he meant his ad words rep as in account rep and not you mate!

AdWordsRep
08-15-2006, 04:48 PM
>I think we realised he meant his ad words rep as in account rep and not you mate!

Yeah, I kinda figured that was going to be the case. But then, I sometimes fall back into my old habit of being one of those folks who "just has to make sure".

Plus, I get to up my post count by two by 1) Just making sure, and 2) Explaining that I was just making sure. Heheh.

Next post, coming soon: explaining why I was explaining that I was just making sure. :)

AWR

JohnW
08-15-2006, 10:29 PM
AWR, I hope I didn’t offend you by referring to someone else as “the” AdWords rep as clearly this title should be reserved for you ;-)