ptmagnolia82
05-04-2005, 03:15 PM
Hello,
I've recently noticed that Google is indexing some of our pages under both http:// and https://. I'm concerned about a duplicate content penalty and am looking for possible solutions. I believe the spider reaches an https:// URL through one of our log-in forms (which has to be secure), then follows a link from that page, and the problem begins from there.
I could block the log-in page from the spider via the robots.txt file, but I'm not sure that it's the only place where the loophole is present. Is it possible to disallow access to any page starting with "https://" using robots.txt, so that the fix applies site-wide instead of patching specific pages?
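To make the idea concrete, here is a rough sketch of what I have in mind. As far as I understand, crawlers request robots.txt separately for each protocol and host, so one option might be to serve a different file over https://. Everything named here is an assumption for illustration only: the helper names `robots_txt_for` and `allowed`, the host www.example.com, and the premise that our server can tell which scheme a request arrived on.

```python
import urllib.robotparser

# Hypothetical robots.txt bodies, one per scheme (not our real files).
ROBOTS_HTTP = "User-agent: *\nDisallow:\n"     # empty Disallow: crawl everything
ROBOTS_HTTPS = "User-agent: *\nDisallow: /\n"  # block every path on the https:// side


def robots_txt_for(scheme: str) -> str:
    """Return the robots.txt body to serve for the scheme of the request."""
    if scheme.lower() == "https":
        return ROBOTS_HTTPS
    return ROBOTS_HTTP


def allowed(scheme: str, path: str, agent: str = "*") -> bool:
    """Check whether a crawler honoring the served file may fetch the path."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt_for(scheme).splitlines())
    # www.example.com is a placeholder host; only the path is matched.
    return parser.can_fetch(agent, f"{scheme}://www.example.com{path}")


if __name__ == "__main__":
    for scheme in ("http", "https"):
        print(scheme, allowed(scheme, "/products.html"))
```

The point of the sketch is that the rules themselves only match paths, so the http/https split would have to come from which file gets served, not from a "Disallow: https://" line.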
Thanks for the help!