spidering https or secured sites
yuwyma
01-05-2006, 07:37 PM
Can anyone confirm whether the Yahoo! spider can index https or secured web sites (no logins)?
Chris_D
01-05-2006, 08:07 PM
Hi and welcome to SEW Forums!
Yahoo! does not index https:// pages (port 443 secure).
There are several ways to prevent our crawler from indexing your site or portions of your site:
- create a "robots.txt" file on your web site to prevent our crawler from indexing your site
- add a "noindex" meta tag to your documents
- remove the original document from your web site
- host the document on a secure section of your web site (HTTPS or login)
http://help.yahoo.com/help/us/ysearch/deletions/deletions-03.html
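A minimal sketch of the robots.txt option from the list above, using Python's standard `urllib.robotparser` to check what a crawler would be allowed to fetch. The rules and the example.com URLs are hypothetical, and `Slurp` is assumed here to be the user-agent token Yahoo!'s crawler matched on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Yahoo!'s crawler (assumed token "Slurp")
# out of /private/, and block every other crawler from the whole site.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)
print(rp.can_fetch("Slurp", "http://www.example.com/index.html"))      # True
print(rp.can_fetch("Slurp", "http://www.example.com/private/a.html"))  # False
print(rp.can_fetch("SomeBot", "http://www.example.com/index.html"))    # False
```

Parsing from a string keeps the check offline; a real crawler would fetch `/robots.txt` from the site root first.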
yuwyma
01-05-2006, 08:16 PM
Hi Chris, thanks for the reply. That's what I saw on their help page, but I have received conflicting information. I have contacted someone at Yahoo! and am still waiting for confirmation. I am also trying to check the access logs for the Yahoo! spider, but unfortunately it will be a few days before I have access.
Chris_D
01-05-2006, 11:02 PM
I think you'll find that Yahoo! will generally only 'index' a secure page when the request for the non-secure variant returns a 302 (Moved Temporarily) redirect to the secure variant.
e.g. Try this search:
http://search.yahoo.com/search?p=https%3A%2F%2Fwww.bnz.co.nz%2FInternet_Banking%2F1%2C1184%2C10-144-579%2C00.html
Now try this search:
http://search.yahoo.com/search?p=http%3A%2F%2Fwww.bnz.co.nz%2FInternet_Banking%2F1%2C1184%2C10-144-579%2C00.html
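The two search URLs differ only in the scheme of the page address, which is percent-encoded into the `p` query parameter. A small sketch that rebuilds them with Python's `urllib.parse.urlencode`; the helper name `yahoo_search_url` is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://search.yahoo.com/search"


def yahoo_search_url(query: str) -> str:
    """Build a search URL with the query percent-encoded into the `p` parameter."""
    # urlencode() quotes values with safe="", so ':' '/' ',' become %3A %2F %2C
    return SEARCH_ENDPOINT + "?" + urlencode({"p": query})


page = "www.bnz.co.nz/Internet_Banking/1,1184,10-144-579,00.html"
print(yahoo_search_url("https://" + page))
print(yahoo_search_url("http://" + page))
```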
The http version of the page gets indexed. Remember that Yahoo! doesn't show the http:// prefix in the SERPs.
The http page returns a 302 redirect to the https page, which means: index the requested URL (http://) but with the content of the 302 target page.
The http page therefore gets indexed with the content of the https:// page. If you click on the link in the SERPs you'll get 302'd to the secure page, making it appear that the secure page itself was indexed.
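The 302 behaviour described above can be modelled as a toy crawler: fetch the requested http:// URL, follow the 302 to get the content, but file that content under the URL originally requested. This is a hypothetical sketch of the behaviour as described in this thread, not Yahoo!'s actual code; the `Response` type, the `SITE` table, and the example.com URLs are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    status: int
    body: str = ""
    location: Optional[str] = None  # Location header for redirects


# Hypothetical site: the http:// URL answers with a 302 to the https:// URL.
SITE: Dict[str, Response] = {
    "http://www.example.com/banking": Response(302, location="https://www.example.com/banking"),
    "https://www.example.com/banking": Response(200, body="Internet Banking login"),
}


def crawl(url: str, site: Dict[str, Response], max_hops: int = 5) -> Dict[str, str]:
    """Return {indexed_url: content} following the 302 rule described in the thread."""
    if url.startswith("https://"):
        return {}  # per the thread, https:// URLs are not indexed directly
    requested = url
    for _ in range(max_hops):
        resp = site[url]
        if resp.status == 302 and resp.location:
            url = resp.location  # fetch the redirect target's content...
            continue
        if resp.status == 200:
            # ...but index it under the URL that was originally requested.
            return {requested: resp.body}
        break
    return {}


print(crawl("http://www.example.com/banking", SITE))
# {'http://www.example.com/banking': 'Internet Banking login'}
print(crawl("https://www.example.com/banking", SITE))
# {}
```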
PM me the url you are inquiring about & I'll have a look.
Also, many https:// sites use robots.txt or a robots meta tag when they don't want to be indexed.
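A rough sketch of how a crawler might detect the robots meta tag mentioned here, using Python's standard `html.parser`. The class and function names are hypothetical, and only the common `<meta name="robots" content="...">` form is handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()  # lower-cased tokens such as "noindex"

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindex(html: str) -> bool:
    """True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


print(is_noindex('<head><meta name="robots" content="noindex, nofollow"></head>'))  # True
print(is_noindex('<head><meta name="description" content="noindex"></head>'))      # False
```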