
PDF version of HTML pages


SEOchica
01-10-2005, 06:07 PM
I have a site with a lot of technical literature on it, currently in PDF format. I would like to make HTML versions of these PDFs for two reasons. The first is so that users can view the information quickly, without having to wait for Adobe to load. The second is so that salespeople can print off the PDFs for clients, and users can print off a nicely formatted version.

Are SEs going to see this as duplicate content? I'm guessing yes. I know that you can keep an SE from indexing pages, but I'd like the HTML versions indexed and not the PDFs. How would I go about doing that?

srikanth
01-11-2005, 08:50 AM
Hi SEOChica, keep the PDF documents in a separate folder and disallow spiders from that folder using a robots.txt file.

Delete any PDF documents that are outside of that folder, and set up a custom 404 error page for users.
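Before deleting anything, it can help to list which PDFs still sit outside the blocked folder. The sketch below is a minimal, hypothetical helper built on Python's standard `pathlib`: it assumes the site lives in a local directory and that the blocked folder is named `FolderNameHere` (the placeholder used in this post). It only lists the stray files; it does not delete them.

```python
from pathlib import Path


def find_stray_pdfs(site_root: str, pdf_folder: str = "FolderNameHere") -> list[Path]:
    """Hypothetical helper: list PDFs under site_root that are NOT inside pdf_folder."""
    root = Path(site_root)
    blocked = (root / pdf_folder).resolve()
    stray = []
    # rglob("*") walks every file; the suffix check is case-insensitive so .PDF is caught too.
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        resolved = path.resolve()
        # Keep the file only if the blocked folder is not one of its parent directories.
        if blocked not in resolved.parents:
            stray.append(path)
    return sorted(stray)


if __name__ == "__main__":
    for pdf in find_stray_pdfs("."):
        print(pdf)
```

Anything it prints would be a candidate to move into the blocked folder (or remove), with the custom 404 page catching old links.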

Code for the robots.txt file:

User-agent: *
Disallow: /FolderNameHere/
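One way to sanity-check rules like these before uploading them is Python's standard-library `urllib.robotparser`. The sketch below parses the two lines above and asks whether a crawler that honours robots.txt may fetch a given path. The paths `/FolderNameHere/datasheet.pdf` and `/docs/datasheet.html` and the user agent `Googlebot` are hypothetical examples; the check only reflects the rules as written, not how any particular search engine treats them.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above; "/FolderNameHere/" is the placeholder folder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /FolderNameHere/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules above let user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ["/FolderNameHere/datasheet.pdf", "/docs/datasheet.html"]:
        print(path, "crawlable" if is_crawlable(path) else "blocked")
```

With these rules, anything under `/FolderNameHere/` comes back as blocked for every user agent, while the HTML pages elsewhere remain crawlable.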

HuckerNut
01-11-2005, 01:12 PM
SEOchica -
I could be wrong, but I've never seen any penalties from having both a PDF and an HTML file, as long as they are on the same site.

powerofeyes
01-11-2005, 03:51 PM
We have never seen a problem with two versions of the same document, and we have done this on more than 5 client sites. Having two versions of the same document is fine.

A PDF version is definitely different from the HTML version. One suggestion is to include the HTML version in your main site template; that way the PDF and HTML versions will have at least a 5% difference.

qcguide
01-11-2005, 04:21 PM
Quote (HuckerNut): "I could be wrong, but I've never seen any penalties from having both a pdf and html file as long as they are from the same site."
FWIW, neither have I.

I don't think it is worth moving your PDF files because of that.

The worst that would happen is Google indexing one version and having the other version show up as supplemental.

pleeker
01-11-2005, 05:56 PM
Quote (qcguide): "FWIW, neither have I."

Ditto here. In fact, based on recent experiences with one client (and their competition), I'm starting to believe there's a real benefit in offering a PDF version of content in addition to the regular HTML page.

Might be worth a new topic, actually....

St0n3y
01-12-2005, 07:51 PM
I think it would be a mistake to create PDF versions of your website as a means to have more "chances" at top rankings. Google and other engines may not penalize this now, but if it becomes an "SEO trick", they are likely to change the way they treat it, and you could find yourself penalized before you know it. If you do need both, I would recommend, as mentioned above, putting the PDFs in a separate folder and excluding that folder from search spiders. That's simply the safest way.

SEOchica
01-13-2005, 10:26 AM
Actually, I'm not creating a PDF version of the site. There is a section of the site with technical documents that are currently in PDF. I am going to provide an HTML version so that end users can get information quickly and browse through the docs without having to download every PDF.

I will most likely take srikanth's advice and keep the spiders out of the PDF folder.

pleeker
01-13-2005, 01:35 PM
I appreciate the whole "better safe than sorry" idea, but if it's a convenience for the end user to also be able to access the same information in PDF form, then you should do so. Using robots.txt to block the spiders? If you feel you must.

On the other hand, I'm still trying to figure out why one client's pages started ranking better after we posted a PDF with similar (not identical) content to those pages. We made that move after seeing a PDF of similar content (about a new product on the market), posted on a competitor's site, rank No. 1 on Google for some time. I'd like to learn more about what value is given to PDFs in the various algorithms.