Search Engine Watch
Old 01-14-2006   #1
DanThies
Keyword Research Super Freak
 
Join Date: Jun 2004
Location: Texas, y'all
Posts: 142
URL Removal tool doesn't support robots.txt extensions

One of my students emailed me about this a while back, and I didn't see any reference to it anywhere.

Googlebot supports an extension to the robots.txt syntax, which allows webmasters to use wildcards in disallow directives:
Quote:
From Google's webmaster info pages:
Additionally, Google has introduced increased flexibility to the robots.txt file standard through the use of asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name.
...
To remove all files of a specific file type (for example, .gif), you'd use the following robots.txt entry:
User-agent: Googlebot
Disallow: /*.gif$
While this is true when Googlebot reads your robots.txt file, Google's URL removal tool does not understand these extensions. If you feed it a robots.txt file that uses them, it generates an error message telling you that wildcards aren't allowed.
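For anyone wondering what the extended syntax actually matches, here's a rough Python sketch of the rules as the quote describes them: "*" matches any sequence of characters, and a trailing "$" marks the end of the name. This is only my reading of the quoted text, not Googlebot's real implementation, and the function names (pattern_to_regex, is_disallowed, uses_extensions) are made up for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard Disallow pattern into a compiled regex.

    Assumption: "*" means "any sequence of characters" and a trailing
    "$" means "the path must end here", as in the quoted description.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without "$", a pattern is treated as a prefix, so no end anchor is added.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    """Return True if the path matches the Disallow pattern from its start."""
    return pattern_to_regex(pattern).match(path) is not None


def uses_extensions(pattern: str) -> bool:
    """Return True if the pattern relies on the "*" or trailing "$" extensions."""
    return "*" in pattern or pattern.endswith("$")


if __name__ == "__main__":
    for path in ["/images/logo.gif", "/images/logo.gif?size=2", "/logo.gift"]:
        print(path, is_disallowed(path, "/*.gif$"))
```

A check like uses_extensions is handy for spotting which lines in a robots.txt file would likely trip the wildcard error before you submit the file to the removal tool.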

Matt Cutts confirmed this. It really shouldn't be a huge problem under normal circumstances, though, since it should only take a few days for Googlebot to pick up changes in the robots.txt file and drop any pages that are disallowed.