How to index pages that require login


qante
11-15-2005, 04:13 AM
Hi all,

Is it possible to index pages that are only available to members after they have logged in?

I have a site with many Word documents which I'd like to get indexed by the search engines. But I do not want people to be able to access and download these documents without first logging into their account (which is free of course, it's just so that I can keep track of who is downloading).

Is it possible to expose a special sub-site for robots? Or would that be considered spamming the search engines?

Any help is much appreciated.

/a

NuevoJefe
11-16-2005, 04:53 AM
How tech-savvy are your users?

qante
11-16-2005, 06:09 AM
Any help is appreciated - tell me what you got :)

NuevoJefe
11-16-2005, 03:49 PM
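Ok, well you basically just have to do user-agent delivery, i.e. serve different content depending on the User-Agent header the visitor sends. First you'll have to mark the pages noarchive, or people will just be able to read the search engines' cached versions.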
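As a rough illustration of the noarchive step, here is a minimal Perl sketch that emits the robots meta tag in a page's head. The sub name protected_head and the page title are made up for the example; only the noarchive directive itself comes from the advice above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the <head> markup for a protected page.
# The "noarchive" robots directive asks search engines not to offer
# a cached copy of the page.
sub protected_head {
    my ($title) = @_;
    return <<"HTML";
<head>
<title>$title</title>
<meta name="robots" content="noarchive">
</head>
HTML
}

# Example call with a made-up title.
print protected_head('Member documents');
```

A meta tag only works for HTML pages. For non-HTML files such as Word documents, the same directive can be sent as an X-Robots-Tag: noarchive HTTP response header instead.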

Second, you need to add the code. This is Perl and it's old, so there may be better ways to do it, and you'll want an updated list of the bots you want to allow.

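# $browser holds the visitor's User-Agent string.
# /x lets the pattern span several lines, /i ignores case.
if ( $browser =~
/robot|slurp|crawl|scooter|googlebot|libwww|
spider|psbot|openbot|zyborg|webstream\.net|archiver|
internetseer|pompos|ask\ jeeves|teleportpro|mercator|
python-urllib|webzip|slysearch|netsweeper/xi ) {
return; # known crawler: serve the page, no redirect
}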

If the visitor is not a crawler, redirect them to the correct page.

print $cgi->redirect(-uri =>
"http://$ENV{'HTTP_HOST'}/cgi-bin/getReg/$section/$topic");
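Putting the two pieces together, a self-contained sketch might look like the following. It is not the original script: the sub names is_crawler and response_headers, the 'docs' and 'example' values for section and topic, and the 'localhost' fallback are all made up for illustration, and the redirect header is printed by hand instead of through the CGI module so the example runs without extra modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Crawler substrings from the list above.
# /x lets the pattern span lines, /i makes it case-insensitive.
my $CRAWLER_RE = qr/
    robot | slurp | crawl | scooter | googlebot | libwww |
    spider | psbot | openbot | zyborg | webstream\.net | archiver |
    internetseer | pompos | ask\ jeeves | teleportpro | mercator |
    python-urllib | webzip | slysearch | netsweeper
/xi;

# Returns 1 when the User-Agent string looks like a known crawler, else 0.
sub is_crawler {
    my ($user_agent) = @_;
    return 0 unless defined $user_agent;
    return $user_agent =~ $CRAWLER_RE ? 1 : 0;
}

# Builds the CGI response headers for one visitor.
# $section and $topic are hypothetical identifiers for the requested document.
sub response_headers {
    my ($user_agent, $host, $section, $topic) = @_;
    if ( is_crawler($user_agent) ) {
        # Crawlers are served the document itself.
        return "Content-Type: text/html\r\n\r\n";
    }
    # Everyone else is sent to the registration/login script.
    return "Status: 302 Found\r\n"
         . "Location: http://$host/cgi-bin/getReg/$section/$topic\r\n\r\n";
}

# Outside a web server the environment variables are unset,
# so this prints the redirect headers.
print response_headers(
    $ENV{HTTP_USER_AGENT}, $ENV{HTTP_HOST} // 'localhost', 'docs', 'example'
);
```

Keeping the pattern in one place makes it easier to update the bot list later.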

The problem with UA-delivery is that savvy people can simply change their UA, for example with a Firefox extension, and get in as if they were a crawler. There are better ways of cloaking, but I think they're probably too much effort unless you have lots of valuable content.