One of my students emailed me about this a while back, and I didn't see any reference to it anywhere.
Googlebot supports an extension to the robots.txt syntax, which allows webmasters to use wildcards in disallow directives:
Quote:
From Google's webmaster info pages:
Additionally, Google has introduced increased flexibility to the robots.txt file standard through the use asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name.
...
To remove all files of a specific file type (for example, .gif), you'd use the following robots.txt entry:
User-agent: Googlebot
Disallow: /*.gif$
While this is true when Googlebot reads your robots.txt file, Google's URL removal tool does not understand these extensions. If you feed it a robots.txt file that uses them, it will generate an error message telling you that wildcards aren't allowed.
Matt Cutts confirmed this. It really shouldn't be a huge problem under normal circumstances, though, since it should only take a few days for Googlebot to pick up changes to the robots.txt file and drop any pages that are disallowed.
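
If you want to check which URLs a wildcard rule would catch before relying on it, here is a rough Python sketch of the matching rules as the quote describes them: "*" matches any sequence of characters, and a trailing "$" marks the end of the name. The function names are my own, and this is only an approximation of the described behaviour, not Googlebot's actual code. It also assumes the usual robots.txt convention that a pattern is matched from the start of the URL path.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Disallow pattern using '*' and a trailing '$' into a regex.

    Hypothetical helper: an approximation of the rules quoted above.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" so that
    # every "*" in the pattern matches any sequence of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    """Return True if `path` would be blocked by the Disallow `pattern`."""
    if not pattern:
        # An empty Disallow value blocks nothing.
        return False
    # re.match anchors at the start of the path (assumed prefix matching).
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ("/images/logo.gif", "/images/logo.gif?size=2", "/index.html"):
        print(path, is_disallowed(path, "/*.gif$"))
```

With the rule from the quote, /*.gif$, only paths that actually end in .gif are blocked. Drop the "$" and anything containing .gif after the first slash is blocked as well.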