View Full Version : Prevent PDF Indexing??


inpursuit
01-03-2007, 01:22 AM
I'm using Joomla and want to prevent Google from indexing the PDFs.

I was told it's something in the robots.txt file.

Can someone paste the line of code here so I can pop it in? Thanks. I did a search and can't find the code.

St0n3y
01-03-2007, 11:03 AM
You need to look at how to develop a robots.txt file first. If you put all your PDFs in a single directory (folder), then you can easily exclude that particular directory from getting spidered by the engines.

Here is an example:

User-agent: *
Disallow: /pdf/
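
If you want to check a rule like this before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com, the sample paths, and the helper name allowed are placeholders made up for illustration, not anything from the poster's site.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set from the example above.
rules = """\
User-agent: *
Disallow: /pdf/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def allowed(path):
    # www.example.com is a placeholder host; only the path matters here.
    return parser.can_fetch("Googlebot", "http://www.example.com" + path)

print(allowed("/pdf/brochure.pdf"))   # file inside the excluded directory
print(allowed("/docs/brochure.pdf"))  # PDF stored somewhere else
```

The first call should report the file as blocked and the second as allowed, which shows the limit of this approach: only PDFs that actually live under /pdf/ are covered. Note that this module does plain prefix matching and does not understand wildcard rules.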

rainborick
01-03-2007, 02:38 PM
You can also use wildcards in your robots.txt file for Googlebot now, so you can exclude files by file name extension (essentially, by file type) without having to restructure the site to segregate such files into a particular directory. See http://www.google.com/support/webmasters/bin/answer.py?answer=40367 for details. Yahoo! also supports wildcards in the robots.txt file, but I don't know if MSN does. Good luck!

mcanerin
01-03-2007, 02:50 PM
MSN, Yahoo and Google all now support wildcards for files.

Note that this is not an official part of the robots.txt initiative (which IMO is badly outdated) but rather functionality that was independently added by the 3 search engines because there was a need for it.

Usage:

Disallow: /*.[file extension]$


Example:

Disallow: /*.PDF$

You will notice that it's a simple wildcard (*), but there is a dollar sign ($) at the end. It works like the end-of-string anchor in regular expressions and tells the spider that the URL must end with the extension.

When you use the wildcard to block a file type, end the rule with the dollar sign. Without it, the rule matches any URL that merely contains the extension, for example one with a query string after the file name.
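
To make the effect of * and $ concrete, here is a rough Python sketch that translates such a pattern into a regular expression. It follows the behaviour Google documents for these two characters and is not the engines' actual code; the function names and sample paths are made up for illustration.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin with ".*" so every * means
    # "any run of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(pattern, path):
    # Rules are matched from the start of the path, hence match() not search().
    return robots_pattern_to_regex(pattern).match(path) is not None

print(is_blocked("/*.pdf$", "/docs/guide.pdf"))             # ends in .pdf
print(is_blocked("/*.pdf$", "/docs/guide.pdf?download=1"))  # does not end in .pdf
print(is_blocked("/*.pdf", "/docs/guide.pdf?download=1"))   # no $, so a prefix match is enough
print(is_blocked("/*.PDF$", "/docs/guide.pdf"))             # case differs
```

The last call is worth noting: Google treats the path as case-sensitive, so a rule written with .PDF will not catch files named .pdf. Use the case your file names actually have, or add one line for each.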

Ian