
How to prevent indexing of HTTPS static (HTML) pages?


jameswatt
06-13-2008, 06:35 AM
Hi folks,

My site has a few pages served over HTTPS and a few served over HTTP.
The problem is that I have linked certain pages (the home page, the sitemap page and the services page) in the footer section of the HTTPS pages. Now Google has indexed my domain as https://www.example.com, and Yahoo and MSN have indexed certain static pages that are not linked in the footer section of my HTTPS pages.

For example:
Pages linked in the footer section of the HTTPS pages:
https://www.example.com/index.php
https://www.example.com/services.php
https://www.example.com/sitemap.html

Pages that are not linked in the footer section of the HTTPS pages but still got indexed in Yahoo and MSN:
https://www.example.com/ihome.php
https://www.example.com/toolresource.html
https://www.example.com/insertm.html

If you click on the pages above, each one redirects to its HTTP version with a 302 redirect.

Now the big question: how have the search engines (Yahoo and MSN) indexed the HTTPS versions of static HTML pages when none of my pages link to them?

How can I remove pages like https://www.example.com/insertm.html using the robots.txt file or the .htaccess file?

Questions:
How can I prevent the HTTPS versions of my pages from being indexed, except for my landing page?
What should I do to stop search engines from crawling my main domain over HTTPS (i.e. https://www.example.com)?

It would be great if someone could help me with this!

beu
06-13-2008, 12:37 PM
302 is the problem; it should be a 301! With a 302 (temporary) redirect, the engines keep the redirecting URL (HTTPS) in their index instead of the destination URL (HTTP). Then, by following the links on those HTTPS URLs, the engines are finding other URLs.
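To illustrate the 301 fix, here is a minimal sketch written as a Python WSGI application, assuming a Python front end rather than the Apache/PHP setup discussed in this thread. It answers HTTPS requests with `301 Moved Permanently` pointing at the HTTP equivalent, except for a set of pages meant to stay secure. `SECURE_PATHS`, `/landing.php` and `app` are hypothetical names for illustration only.

```python
from wsgiref.util import request_uri

# Hypothetical: pages that are supposed to stay on HTTPS.
SECURE_PATHS = {"/landing.php"}


def app(environ, start_response):
    """Redirect HTTPS requests for non-secure pages to HTTP permanently."""
    path = environ.get("PATH_INFO", "/")
    is_https = environ.get("wsgi.url_scheme") == "https"

    if is_https and path not in SECURE_PATHS:
        # Rebuild the full requested URL (including any query string),
        # then swap only the scheme.
        location = "http://" + request_uri(environ).split("://", 1)[1]
        # 301 (not 302) tells crawlers the HTTP URL is the one to keep.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page body"]
```

The same decision (scheme check, then a permanent redirect to the HTTP URL) is what a redirect rule in `.htaccess` would express on Apache.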

jameswatt
06-16-2008, 03:02 AM
I have not linked every page to its HTTPS version. I used a script for my auto insurance section so that all of its pages are displayed over HTTPS.
The following pages are not linked anywhere with the HTTPS protocol, but they are still indexed in Yahoo and MSN:

https://www.example.com/ihome.php
https://www.example.com/toolresource.html
https://www.example.com/insertm.html

How is that possible?

beu
06-16-2008, 03:43 PM
I'd be willing to bet they are linked to from some other page. Post your URL here or PM me and we'll help get to the bottom of this for you. :)

JohnW
06-16-2008, 08:52 PM
Like beu said, a 301 will fix this.

Another solution, which IMO is preferable to using 301s, is to have two robots.txt files, one for HTTPS and one for HTTP, and disallow the HTTPS pages as needed.
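The two-robots.txt idea can be sketched in Python as follows. The sketch picks the robots.txt body based on the protocol of the request, then checks URLs with the standard library's `urllib.robotparser`. On Apache, the selection itself would typically be a rewrite rule in `.htaccess`. `/landing.php`, `robots_txt_for` and `crawlable` are hypothetical names; the Allow line answers the "except my landing page" question under that assumption.

```python
from urllib.robotparser import RobotFileParser

# Served as http://www.example.com/robots.txt: crawl everything.
HTTP_ROBOTS = """\
User-agent: *
Disallow:
"""

# Served as https://www.example.com/robots.txt: block everything except a
# hypothetical landing page. Allow comes first because some parsers
# (including Python's) apply the first rule whose path matches.
HTTPS_ROBOTS = """\
User-agent: *
Allow: /landing.php
Disallow: /
"""


def robots_txt_for(scheme):
    """Pick the robots.txt body to serve for a request made over `scheme`."""
    return HTTPS_ROBOTS if scheme == "https" else HTTP_ROBOTS


def crawlable(url):
    """Check a URL against the robots.txt its own protocol would receive."""
    scheme = url.split("://", 1)[0]
    parser = RobotFileParser()
    parser.parse(robots_txt_for(scheme).splitlines())
    return parser.can_fetch("*", url)
```

With these rules, `https://www.example.com/insertm.html` is blocked while `http://www.example.com/insertm.html` stays crawlable. Note that robots.txt stops crawling; URLs that are already indexed may take time to drop out.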

>how come it possible

One possibility is that if you use relative links, once a search engine gets into your site via HTTPS, all of the pages can get indexed that way.
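The relative-link effect can be shown with Python's `urllib.parse.urljoin`, which resolves a link against the page it appears on the same way a crawler does. The footer hrefs below are hypothetical, since the thread does not say which links on the site are relative.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href relative to the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical footer: two relative links and one absolute HTTP link.
footer = ["insertm.html", "/toolresource.html", "http://www.example.com/ihome.php"]

for page in ("http://www.example.com/index.php", "https://www.example.com/index.php"):
    for url in resolve_links(page, footer):
        print(url)
```

Seen from the HTTPS copy of the page, the two relative links resolve to `https://www.example.com/...`, while the absolute link keeps its `http` scheme. Writing internal links as absolute HTTP URLs is one way to stop the HTTPS copies from spreading.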

beu
06-17-2008, 02:29 AM
Great tip JohnW!

seomike
06-17-2008, 11:57 AM
Are your HTTPS pages all in one directory, like example.com/admin/ or example.com/cart/?

If they are, I can help you with an .htaccess script.
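Keeping the HTTPS pages in one directory makes the rule simple: anything under that directory should be HTTPS, and everything else should be HTTP. Here is a minimal Python sketch of that decision, assuming a hypothetical secure directory `/cart/` (one of seomike's examples) and a hypothetical helper name `canonical_redirect`. A real `.htaccess` script would express the same check as rewrite rules and issue the 301.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical secure directory; the thread mentions /admin/ or /cart/.
SECURE_PREFIX = "/cart/"


def canonical_redirect(url):
    """Return the URL to 301-redirect to if the scheme is wrong for its directory, else None."""
    parts = urlsplit(url)
    wanted = "https" if parts.path.startswith(SECURE_PREFIX) else "http"
    if parts.scheme == wanted:
        return None  # already on the right protocol, serve normally
    # Keep host, path and query; change only the scheme.
    return urlunsplit(parts._replace(scheme=wanted))
```

For example, `canonical_redirect("https://www.example.com/insertm.html")` would send the crawler to `http://www.example.com/insertm.html`, and an HTTP request under `/cart/` would be sent to HTTPS.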