Yahoo! Indexing URLs with Session ID's
Marcia
07-04-2004, 03:32 AM
I caught a couple earlier and it's the first time I'd ever noticed it, so I just ran a search for SESSIONID
http://search.yahoo.com/search?fr=slv1-&p=allinurl%3aSESSIONID
I've never had to deal with session ID's myself, but could that cause any problems for people who do?
DianeV
07-04-2004, 09:32 AM
Assuming for the sake of argument that the purpose of the session ID is to track users, and that a unique session ID is handed out for every request for any given page, my guess is that the same page could get spidered multiple times with different session IDs, thus making it appear as though the site had duplicate pages, and possibly lots of them.
One obvious solution would be not to assign session IDs to SE spiders. Wouldn't be cloaking, per se, as the site would be showing the same page to all -- session ID or no.
rustybrick
07-04-2004, 10:00 AM
I've never had to deal with session ID's myself, but could that cause any problems for people who do?
I think it's more of a problem for Yahoo!'s spiders. :)
respree
07-22-2004, 05:20 PM
I've had a horrible experience with this, where Slurp is indexing my pages with a session and shopping cart ID. I've got a #3 ranking on a rather important term for my site, and while I have no doubt people are clicking on this link, they cannot buy (because that shopping cart ID has already been used by another customer). :mad:
I haven't noticed this problem with any of the other search engines.
Is there any way to remove the offending indexed URL?
Any advice greatly appreciated.
polarmate
07-22-2004, 06:32 PM
respree, you could serve a 404 for that particular URL so that it is dropped. Or a 301 for visitors who come from the SERPs.
Dodger
07-23-2004, 04:59 AM
The SE referrals could easily be stripped of Cart IDs and Session IDs with a couple of lines of code ... actually, any external referral probably should have them stripped for that matter.
I am wondering how the spider got hold of a shopping cart ID in the first place. SIDs are one thing, but Cart IDs are another. What Cart are you using, respree?
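For what it's worth, the "couple of lines of code" idea could look roughly like the sketch below. It is only an illustration, not anything a particular cart ships with: the parameter names (sid, sessionid, cartid) and the helper names are made up for the example, and it assumes the IDs travel as ordinary name=value query parameters, which is not true of every cart.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; a real cart will use its own.
SESSION_PARAMS = {"sid", "sessionid", "cartid"}


def strip_session_params(url):
    """Return url with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def is_external(referrer, own_host):
    """True when a referrer is present and points at a different host."""
    if not referrer:
        return False
    return urlsplit(referrer).hostname != own_host
```

A request handler would call is_external() on the Referer header and, when it returns True, send the visitor to strip_session_params(requested_url) before handing out a fresh ID.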
respree
07-23-2004, 09:39 AM
Hi Dodger:
We use a shopping cart program called SoftCart (http://www.mercantec.com/Products/SoftCart.html).
If you'd like to take a look, here is the Yahoo! listing (http://search.yahoo.com/search?p=canvas+transfers&ei=UTF-8&fr=fp-tab-web-t&cop=mss&tab=) (see result #3).
Slurp indexed the page as:
http://www.respree.com/cgi-bin/SoftCart.exe/scstore/sitepages/canvas-prints3.html?L+scstore+yzfg9258ffa260a2+1072502833
where "yzfg9258" (in the URL above) represents the shopping cart ID.
It should have indexed it as:
http://www.respree.com/scstore/sitepages/canvas-prints3.html
Here's how Google indexed the same page:
http://www.google.com/search?sourceid=navclient&q=canvas+transfers+discover
(See result #1)
Could you provide some guidance on what code could fix this? Much appreciated.
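As a rough illustration of the mapping between the indexed URL and the intended URL quoted above, here is a minimal sketch. It is based only on that one example: it assumes the cart state lives entirely in the /cgi-bin/SoftCart.exe path prefix and the query string, and that nothing else in the query string is needed to render the page. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefix taken from the indexed URL quoted in this post.
CART_PREFIX = "/cgi-bin/SoftCart.exe"


def canonical_url(url):
    """Drop the cart prefix and the query string (assumed to hold only cart state)."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith(CART_PREFIX + "/"):
        path = path[len(CART_PREFIX):]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Applied to the URL Slurp indexed, this returns http://www.respree.com/scstore/sitepages/canvas-prints3.html; a URL that is already clean comes back unchanged.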
Dodger
07-23-2004, 08:28 PM
We use a shopping cart program called SoftCart (http://www.mercantec.com/Products/SoftCart.html).
If you'd like to take a look, here is the Yahoo! listing (http://search.yahoo.com/search?p=canvas+transfers&ei=UTF-8&fr=fp-tab-web-t&cop=mss&tab=) (see result #3).
That is the same problem I noticed on some other stores from SoftCart's list of Merchants (http://www.mercantec.com/Merchants/SoftCartCustomers.html) that are using the software. One in particular is Main Street Toys (http://search.yahoo.com/search?p=site%3Awww.mainsttoys.com&ei=UTF-8&n=100&fl=0&dups=1&fr=my_top&b=901), and every listing I come across is dead in the Yahoo results. Very serious problem indeed.
In the case of Main Street though ... they do have some of the same links present in their Google results as well. Not all, but it appears that Googlebot is also susceptible to these links. The links do not contain any of the ID markers you would normally see in the query parameters (SID, ID, etc.). They also contain alpha characters, so that is why I think Googlebot is getting tripped up on them.
Since this is affecting not only you, but all their Merchants that use this Cart ... I would go to their Support people and see what they can do to eliminate the problem for you and the other Merchants. They should be able to come up with a fix for it. It may be something simple in the setup of the Cart -- in osCommerce, for example, they have a spider-friendly URL setting you can turn on (it is in beta though ... and is not foolproof).
First look at their FAQ http://www.mercantec.com/Support/FAQ/SoftCartFAQ.html for any help. There are some features of this product that can be controlled -- such as indexing (I have no idea what index they are talking about).
There is also mention of ID numbers appearing in the URL that will change from page to page. According to the FAQ, this is done for anti-caching purposes so it will serve up the most current page. I think this (excuse me) is a load of hogwash. Caching can be handled a lot more efficiently than that internally, and it does not need a URL that is constantly changing.
Also, you may try the manual that is available for this. It may be that you can turn certain features on and off. If this anti-cache feature is one that can be turned off, I think that is where your problem lies. The URLs you are seeing in Yahoo and Google may just be this cache parameter. If so, it needs to be turned off completely. But I would contact Support for the cart to have them work around this in the future. They should also inform their customers what is going on so they do not run into the same problems.
jcoronella
08-04-2004, 05:09 PM
I'm not sure this is really anything to be concerned about. Try:
http://search.yahoo.com/search?p=allinurl%3Alinux&ei=UTF-8&fr=slv1-&n=100&fl=0&x=wrt
http://search.yahoo.com/search?p=allinurl%3Adirectory&ei=UTF-8&fr=slv1-&n=100&fl=0&x=wrt
http://search.yahoo.com/search?p=allinurl%3Alinks&ei=UTF-8&fr=slv1-&n=100&fl=0&x=wrt
http://search.yahoo.com/search?p=allinurl%3Apartners&ei=UTF-8&fr=slv1-&n=100&fl=0&x=wrt
and so on. I think it is an issue with what allinurl is actually returning.
Dodger
08-10-2004, 09:01 PM
I don't think allinurl: is a valid search query at Yahoo. Try inurl: instead ... this is your linux example (http://search.yahoo.com/search?p=inurl%3Alinux&ei=UTF-8&n=100&fl=0&x=wrt)
Mikkel deMib Svendsen
08-11-2004, 05:30 PM
Session IDs are not as easy for search engines to avoid as one might think. There is no standard for how to implement session IDs across the web - it's pretty much up to every single vendor and web developer. But trust me, search engine spider engineers do not like them! Session IDs really mess up spidering.
I've seen examples of up to 600,000 versions of the same page indexed with different session IDs. This is, of course, a waste of any engine's spidering and indexing resources and has no value to users. When such problems are identified by engines they will deal with it - some way or another. The problem is that we, as website owners, have no control over which way they choose to deal with the situation. I've seen them do anything from removing the entire site to just de-ranking pages. In any case it usually ends up hurting the website's visibility in search engines.
The best way to deal with it, in my opinion, is not to issue session IDs at all before you really need them. You do not need to preserve state when users are just looking at your products. You only need to keep track of them once they start shopping, and search engines don't do that anyway.
Knowing that this is not always a possible option with the system you have, the second best option is to identify important spiders by agent name and just not give them session IDs at all. This is a valid solution that search engines seem to handle fine. Several of the large search engines have confirmed at conferences that it's an accepted method to use - in other words: just because we identify their spiders in order to remove session IDs, they will not look at it as cloaking (in case you don't know what that is, forget about it :)).
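A minimal sketch of that second option, assuming the only signal used is the User-Agent header and that matching on the substrings "slurp" and "googlebot" (the two crawlers named in this thread) is good enough. The function names and the new_session_id callback are hypothetical; a real list of agents would be longer and needs maintaining.

```python
import re

# Substrings for the crawlers mentioned in this thread; extend as needed.
SPIDER_PATTERN = re.compile(r"slurp|googlebot", re.IGNORECASE)


def is_spider(user_agent):
    """True when the User-Agent header looks like a known crawler."""
    return bool(user_agent) and SPIDER_PATTERN.search(user_agent) is not None


def session_id_for(user_agent, new_session_id):
    """Return a fresh session ID for ordinary visitors, None for known spiders."""
    if is_spider(user_agent):
        return None
    return new_session_id()
```

The page itself stays the same for everyone; only the ID is withheld, which is the distinction from cloaking made above.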
So, to summarize my experience:
- Session IDs ARE a problem
- YOU have to fix it - you may not like how the engines deal with it
respree
08-11-2004, 05:40 PM
Thanks for your input, guys.
I believe I have the indexing problem fixed on a go-forward basis. A Perl script detects if the visitor is a bot, then serves them a page without the ID's.
My problem now is what to do with the pages that have been indexed in the past. My tech admin is currently using mod_rewrite to try to remove the session and shopping cart ID's from the URLs that were incorrectly indexed. Hopefully, his efforts will be successful.
I appreciate all the input.
Mikkel deMib Svendsen
08-11-2004, 05:50 PM
Sounds good :)
Now, the second part ...
If search engines get one URL (without the session ID) and users another, it is likely that you will receive some (or hopefully a lot) of inbound links in the format the users see. Most people will simply copy and paste the URL from the location bar if they want to link to a specific product. As the session ID is most likely not valid after the session ends (there are some security issues related to this!), you do not really want people to return to these URLs - and you want spiders to know that the "real" URL looks different. So, I would recommend that you 301 redirect requests for URLs with invalid session IDs to the original URL, without the ID - or, if it's a user, issue a new one (if you really have to).
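The redirect rule can be sketched as a small decision function. This is only an illustration under stated assumptions: the session ID is carried in a query parameter called sid (a made-up name), valid_sessions stands in for whatever store the site uses to track live sessions, and the function just returns the status code and target rather than talking to a real web server.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SESSION_PARAM = "sid"  # hypothetical parameter name


def respond(url, valid_sessions):
    """Return (status, location) for a requested URL.

    valid_sessions is a set of currently live session IDs (assumed store).
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    sid = next((value for key, value in params if key == SESSION_PARAM), None)
    if sid is None or sid in valid_sessions:
        # No ID, or a live one: serve the page as usual.
        return 200, None
    # Stale ID: point browsers and spiders at the URL without it.
    kept = [(key, value) for key, value in params if key != SESSION_PARAM]
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    return 301, clean
```

Issuing a new ID to a human visitor after the redirect is a separate step and is left out here.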