Session ID Removal


SEOMalc
09-13-2006, 11:55 AM
Good Afternoon,

I am trying to assist a client in removing session IDs based on the user.

I understand the premise, having read many vague answers on random SEO company websites, but I have been unable to find any actual code examples.

Could anyone point me in the right direction for details on how to check whether the visitor is Googlebot (or another crawler) and, if so, remove the session ID?

The client is using JSP but I am pretty sure I could get the gist from almost any language (please not BBC Basic).

Many thanks

SEOMalc.

tonerman
10-13-2006, 12:19 AM
If you don't want the session IDs indexed, it sounds like you don't want Googlebot to spider dynamic URLs at all. If that's true, you can use Google's robots.txt wildcard extension and put

User-agent: Googlebot
Disallow: /*?

in your robots.txt file. That will stop Googlebot from crawling any URL with a "?" in it. I have learned that once you add a User-agent: Googlebot section to your robots.txt, Googlebot ignores everything in the User-agent: * section, so you have to copy all your User-agent: * rules under Googlebot as well.
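To make the matching concrete, here is a small Java sketch (Java because the client is on JSP) of how a Google-style robots.txt pattern such as `/*?` is checked against a URL path plus query string: `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a literal prefix match. The class and method names are hypothetical, and the sketch covers only pattern matching, not user-agent group selection or Allow/Disallow precedence.

```java
import java.util.regex.Pattern;

public class RobotsWildcard {

    /**
     * Checks a URL path (including any query string) against one robots.txt
     * Disallow pattern, using Google's wildcard extension:
     * '*' matches any run of characters, a trailing '$' anchors the match
     * at the end of the URL, and everything else must match literally
     * starting from the beginning of the path.
     */
    static boolean matches(String pattern, String pathAndQuery) {
        boolean anchored = pattern.endsWith("$");
        String body = anchored ? pattern.substring(0, pattern.length() - 1) : pattern;

        // Quote each literal piece and put ".*" wherever the pattern had '*'.
        StringBuilder regex = new StringBuilder();
        String[] pieces = body.split("\\*", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(pieces[i]));
        }
        if (anchored) {
            regex.append("$");
        }

        // lookingAt() requires a match at the start but not at the end,
        // which gives robots.txt its prefix-matching behaviour.
        return Pattern.compile(regex.toString()).matcher(pathAndQuery).lookingAt();
    }

    public static void main(String[] args) {
        String rule = "/*?"; // the Disallow pattern from the robots.txt lines above
        System.out.println(matches(rule, "/page.html?sid=127y3747t5t65345"));
        System.out.println(matches(rule, "/page.html"));
    }
}
```

Running `main` prints `true` for the session-ID URL and `false` for the static one, which is the split the `/*?` rule is meant to produce.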

Matt Cutts also advised that every website have a sitemap with static links to every page.

We did what you want to do with mod_rewrite (stripping the session ID and other dynamic parameters out of the displayed URL), but I didn't code it, so I can't tell you how.
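The rewrite described above was done in mod_rewrite; since the client is on JSP, here is the same idea sketched in Java instead. It removes session-ID parameters from a URL while keeping any other query parameters. The parameter names in `SESSION_PARAMS` are assumptions (`sid` matches the example URLs later in this thread; `;jsessionid=` is the path parameter Java servlet containers use), so adjust them to whatever the site actually emits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SessionIdStripper {

    // Hypothetical session parameter names, compared case-insensitively.
    static final Set<String> SESSION_PARAMS = Set.of("sid", "jsessionid", "phpsessid");

    /** Returns the URL with session-ID parameters removed and other parameters kept. */
    static String stripSessionId(String url) {
        // Servlet containers may append ";jsessionid=..." before the query string.
        String cleaned = url.replaceAll("(?i);jsessionid=[^?#]*", "");

        int q = cleaned.indexOf('?');
        if (q < 0) {
            return cleaned; // no query string, nothing more to strip
        }
        String base = cleaned.substring(0, q);
        String query = cleaned.substring(q + 1);

        // Keep any "#fragment" aside so it is not treated as part of the query.
        String fragment = "";
        int hash = query.indexOf('#');
        if (hash >= 0) {
            fragment = query.substring(hash);
            query = query.substring(0, hash);
        }

        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            String name = pair.split("=", 2)[0].toLowerCase(Locale.ROOT);
            if (!SESSION_PARAMS.contains(name)) {
                kept.add(pair);
            }
        }
        return base + (kept.isEmpty() ? "" : "?" + String.join("&", kept)) + fragment;
    }

    public static void main(String[] args) {
        // Two visits that differ only by session ID collapse to one URL.
        System.out.println(stripSessionId("http://blah.com/page.html?sid=127y3747t5t65345"));
        System.out.println(stripSessionId("http://blah.com/page.html?sid=45n78gn7hr7h5h51"));
    }
}
```

Both calls in `main` print `http://blah.com/page.html`, which is the point: one page, one URL for the crawler to index.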

SEOMalc
10-13-2006, 05:14 AM
The issue was more that Google was having trouble indexing the site, possibly because the URLs changed on each visit as a new session ID was tagged on, e.g.

blah.com/page.html?sid=127y3747t5t65345
blah.com/page.html?sid=45n78gn7hr7h5h51

etc.

The solution we came up with in the end was to modify how session IDs were assigned. We now have it set up so that if the User-Agent matches any of the entries in a list we have produced, no session ID is generated, so the URLs are always the same. It means we have to keep on top of any User-Agent changes, but it seems to work fine.
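The approach described above (no session for User-Agents on a list) can be sketched in Java roughly as follows. The token list is purely illustrative, since the actual list isn't given in the thread; it would need the same upkeep as any User-Agent list, and a User-Agent header can be faked, so this only decides whether to bother creating a session.

```java
import java.util.List;
import java.util.Locale;

public class CrawlerCheck {

    // Hypothetical, illustrative list of crawler User-Agent substrings.
    // The real list would have to be maintained as crawlers change.
    static final List<String> CRAWLER_TOKENS = List.of(
            "googlebot", "slurp", "msnbot", "bingbot");

    /** True if the User-Agent header contains any token from the list. */
    static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false; // no header: treat as an ordinary visitor
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : CRAWLER_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** Decides whether a session (and so a session ID in URLs) should be created. */
    static boolean shouldCreateSession(String userAgent) {
        return !isCrawler(userAgent);
    }

    public static void main(String[] args) {
        String bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        String browser = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
        System.out.println(shouldCreateSession(bot));
        System.out.println(shouldCreateSession(browser));
    }
}
```

In a JSP application the decision would sit wherever sessions get created: the page directive `<%@ page session="false" %>` stops a JSP from creating a session implicitly, and `request.getSession(false)` returns the existing session (or null) without creating one, so `request.getSession(true)` would only be called when `shouldCreateSession` allows it.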

htaccesselite
11-03-2006, 09:14 PM
The approach I currently use is to remove the SID from URLs for all users, using just php.ini:

[Session]
session.save_handler = files
session.save_path = "/tmp"
session.use_cookies = 1
session.use_only_cookies = 1
session.name = PHPSESSID
session.auto_start = 0
session.cookie_lifetime = 6000
session.cookie_path = /
session.cookie_domain = htaccesselite.com
session.serialize_handler = php
session.gc_probability = 1
session.gc_divisor = 1000
session.gc_maxlifetime = 1440
session.bug_compat_42 = 0
session.bug_compat_warn = 1
session.referer_check =
session.entropy_length = 0
session.entropy_file =
session.cache_limiter = none
session.cache_expire = 180
session.use_trans_sid = 0
session.hash_function = 0
session.hash_bits_per_character = 5
url_rewriter.tags = ""


php.ini configuration (http://www.php.net/manual/en/configuration.php#70854)

How to do this on Powweb-type hosts:

http://www.htaccesselite.com/htaccess/multiple-custom-php-ini-vt25.html

panana
11-28-2006, 06:23 PM
Aloha -

Great thread. I am dealing with a client who is using session IDs that are particularly messy. I finally got them to fix it so that session IDs aren't added to a URL until a visitor logs in or adds something to his cart. We also added a sitemap with the static URLs, of course.

The first problem is that Yahoo has already cataloged URLs with session IDs, which I want to get out of their database. How do I get them out? I'm concerned that these dynamic URLs are not only weaker candidates for ranking, but might also cause a duplicate content penalty.

I considered such things as making a blank page at the dynamic/session ID URL Yahoo has for the page and doing a 301 redirect to the static one. Unfortunately, the server/host doesn't seem to allow that: as I found out, you just can't upload a page with a URL like that.

I can't do a blanket redirect, because the Session IDs are needed for visitors making a purchase. I just want to tell Yahoo to use the static URL and drop the dynamic one.
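One way around both problems (no file can live at the session-ID URL, and a blanket redirect would break checkout) is to decide in code: redirect only when the session ID in the URL no longer belongs to a live session. Session-ID URLs sitting in a search index point at sessions that expired long ago, while a shopper mid-purchase carries a live one, so real customers are never redirected. This Java sketch assumes the parameter is called `sid` (as in the example URLs earlier in the thread), and the `liveSessionIds` set is a hypothetical stand-in for whatever session store the site actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class StaleSessionRedirect {

    // Assumed session parameter name, taken from the example URLs in the thread.
    static final String SESSION_PARAM = "sid";

    /**
     * Returns the URL to 301-redirect to when the query string carries a
     * session ID that does not match any live session (for example, an old
     * URL picked up by a search engine). Returns empty when the request
     * should be served normally.
     */
    static Optional<String> redirectTarget(String path, String query, Set<String> liveSessionIds) {
        if (query == null || query.isEmpty()) {
            return Optional.empty();
        }
        String sid = null;
        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            String[] kv = pair.split("=", 2);
            if (kv[0].equals(SESSION_PARAM)) {
                sid = kv.length > 1 ? kv[1] : "";
            } else {
                kept.add(pair); // keep every non-session parameter
            }
        }
        // No session ID, or a live one (a real shopper): serve the page as usual.
        if (sid == null || liveSessionIds.contains(sid)) {
            return Optional.empty();
        }
        // Stale session ID: point at the same page without it.
        return Optional.of(kept.isEmpty() ? path : path + "?" + String.join("&", kept));
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("45n78gn7hr7h5h51");
        System.out.println(redirectTarget("/page.html", "sid=127y3747t5t65345", live));
        System.out.println(redirectTarget("/page.html", "sid=45n78gn7hr7h5h51", live));
    }
}
```

In a servlet filter this would be fed `request.getRequestURI()` and `request.getQueryString()`; when a target comes back, `response.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY)` plus `response.setHeader("Location", target)` sends the 301, so the search engine is told the static URL is the permanent address.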

We put a rule in the robots.txt file to ignore dynamic URLs. But my impression is that this will not affect dynamic pages already in the search engine index. Clicking the dynamic URL's listing in Yahoo still brings up a working page, so unless something changes, I assume that URL will stay in Yahoo.

Working with the webmaster is difficult, so using their help and greater programming knowledge is not an immediately open option.

I looked around for a solution to this, but most answers revolved around the idea that 'the page will end up a 404 and be dropped eventually', which doesn't seem to be the case here.

If I'm missing something obvious, put it down to overwork and holiday stress. Thanks for any help you can give me -

jon