&id - now it's official. It's worth noting that the ampersand is important.
The new Google Webmaster Guidelines make it official - &id= as a parameter in a URL is bad.
http://www.google.com/webmasters/guidelines.html
I didn't find the bullet point very helpful:
Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index.
Did they really mean &id=, or was it id=? For example, would ?id= be safe to use?
The goal here is to reduce session IDs being included in the search results. Sessions are normally appended to the end of the URL. So the logic might be to allow ?id= but not &id=.
We can't search for question marks in inurl: searches in Google. The search inurl:".asp?id=" (http://www.google.com/search?q=inurl:%22.asp%3Fid%3D%22) returns 16,400,000 results.
I take that as proof: &id= is bad, whereas ?id= is acceptable. That's an easy fix. If you happen to be using id as a hard-coded parameter in your site, you may find it easier to change the order of the query string elements than anything else.
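For anyone who wants to try the reordering idea, here is a minimal sketch in Python. The function name move_param_first and the example.com URL are made up for illustration, and it assumes the only goal is to get id to the front of the query string so that it reads ?id= instead of &id=.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def move_param_first(url, name="id"):
    """Return url with the `name` parameter moved to the front of the query string."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Split the pairs into the target parameter and everything else, keeping order.
    first = [p for p in pairs if p[0] == name]
    rest = [p for p in pairs if p[0] != name]
    return urlunsplit(parts._replace(query=urlencode(first + rest)))

# Hypothetical URL: "id" starts out as the second parameter.
print(move_param_first("http://www.example.com/page.asp?cat=5&id=12"))
```

The script behind the URL has to read its parameters by name rather than by position for this to be a safe change.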
It's the id that causes the damage but as always try not to use either/or &id.
robwatts
06-06-2005, 02:44 PM
Whatever you do, make sure you don't replace those id's with ego's...:p
I, Brian
06-06-2005, 06:29 PM
I'm curious as to why Google views "&ID" so heinously - do I presume that such URLs potentially mess with the docID storage of data?
It's the id that causes the damage but as always try not to use either/or &id.
That's just it. It's not id which causes the damage. It's precisely and explicitly &id which will cause the trouble.
I'm curious as to why Google views "&ID" so heinously - do I presume that such URLs potentially mess with the docID storage of data?
Google's a stat monster. The statistical evidence really must lean heavily towards the situation where &id in the query string signifies the presence of a session variable.
The variable docID is fine if it's not being used to hold a session value.
semanticist
06-09-2005, 10:58 AM
Nice find, Wail.
This is a logical move for G, but I have a feeling a lot of dolphins will get caught in this tuna net. Plenty of other IDs (products, etc.) use id=, and plenty of session IDs use arguments named things other than id=, such as sid=, etc.
That's just it. It's not id which causes the damage. It's precisely and explicitly &id which will cause the trouble.
Are you sure? That seems counter-intuitive (yes, I know they put & in the example, but...). If id= were the only variable in a dynamic URL (as per your example in your first post), then it would say ?id=, yet it would still be a session ID. And this would be acceptable? Could they have put the & in the example just to further show they were talking about specific arguments, but not have been focusing so much on the ampersand itself?
I like the dolphin and tuna analogy!
I'm sure as I can be. I can find lots of ?id= URLs in Google still but can't find any &id=. I included a search sample in my first post.
I'd love for someone to post a Google query in this thread that returns results with &id= URLs in them.
As for &sid... yeah, this could become a can of worms, could poor Sid be the next to go? (worms, tuna, eww!)
>That's just it. It's not id which causes the damage.
Are you suggesting that I may be w..wr..wron [can't even say it!] ;)
I'm a simple man, I like my SEO simple too, takes the thinking out of the game, so....
&id is bad
&keyword is not bad
therefore id is to be avoided
Chris_D
06-09-2005, 09:06 PM
Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index
Session IDs are bad, because you end up with what Google perceives as duplicate content - i.e. the same content attributed to multiple URLs.
e.g.
www.example.com/index.html&id=1232456
www.example.com/index.html&id=1232478
www.example.com/index.html&id=1232489
are all really the home page of the domain example.com - but each visitor has a session id... so every googlebot visit gets a 'new' url.
I see a lot of sites that insist on appending a session ID to the URL when, for example, you block cookies.
I'm glad Google have made a public statement about this - and great catch Wail!!
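To make the duplicate content point concrete, here is a small Python sketch. It assumes well-formed URLs in which the session value travels in a parameter named id, and the page=home parameter and the strip_param helper are invented for the example. Once the session parameter is dropped, the three "different" URLs turn out to be one page.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_param(url, name="id"):
    """Return url without any query parameter called `name`."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Hypothetical URLs: same page, three visitors, three session values.
urls = [
    "http://www.example.com/index.html?page=home&id=1232456",
    "http://www.example.com/index.html?page=home&id=1232478",
    "http://www.example.com/index.html?page=home&id=1232489",
]

# A set keeps only distinct URLs once the session parameter is gone.
canonical = {strip_param(u) for u in urls}
print(len(urls), "crawled URLs ->", len(canonical), "distinct page(s)")
```

This only illustrates why a crawler would see duplicates. It says nothing about how Google actually handles them.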
Pyrrhonist
06-13-2005, 05:42 PM
I consider the removal of session IDs from the URL to be some of the first work performed when presented with a new site. Therefore, the only time [&|?]id= would be present on a page is if it was for a cart, or for some other legitimate purpose.
Then again, I'm a bit of a stickler for traditional URL strings, so I usually end up rewriting all the URLs anyway.
Jeff Martin
06-13-2005, 09:03 PM
Even with the progress G has made indexing URLs with one name/value pair, this just strengthens the case to have a mod rewrite in place, or to never implement a CMS with a dynamic URL framework to begin with.
Listen up CIO/CTOs - talk to your web marketing / IT folk BEFORE buying that $25,000 CMS.
kevsh
06-14-2005, 05:32 PM
That's just it. It's not id which causes the damage. It's precisely and explicitly &id which will cause the trouble.
Sorry, lost me on this one - you were agreeing with the previous post (it's best not to use either &ID or ?ID) but then you state that only &ID is the issue ... In fact the ID is what you stay away from, period:
No &ID
No ?ID
No ID anywhere, anyhow, anyway
:)
mcanerin
06-14-2005, 05:55 PM
So www.idiot.com is now verboten? :rolleyes:
How about:
www.amazon.com/exec/obidos/ASIN/0735712565/qid=1118691172/sr=2-5/ref=pd_bbs_b_2_5/102-3328444-8920168
or:
http://www.state.id.us/ (oh look! a .us site!) ;)
or:
www.bi.go.id/web/ (Bank of Indonesia - actually, the country code TLD for Indonesia is .id - man, they are screwed!)
or:
vos.ucsb.edu/browse.asp?id=3
or:
well, you get my drift, I expect. It's not the "id" it's the "&id" as a complete string. Not the "&", not the "id", the combination of the two.
Ian
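The point about matching the complete string can be checked with a few lines of Python. This is only a sketch of a literal substring test. Whether Google's filter works this way, or ignores case, is an assumption, and the last URL is a hypothetical added to show a positive match.

```python
def contains_amp_id(url):
    """True only if the literal string "&id=" appears in the URL (case ignored by assumption)."""
    return "&id=" in url.lower()

# The URLs from the post above, plus one hypothetical URL that does contain "&id=".
urls = [
    "www.idiot.com",
    "www.amazon.com/exec/obidos/ASIN/0735712565/qid=1118691172/sr=2-5/ref=pd_bbs_b_2_5/102-3328444-8920168",
    "http://www.state.id.us/",
    "www.bi.go.id/web/",
    "vos.ucsb.edu/browse.asp?id=3",
    "www.example.com/browse.asp?cat=2&id=3",  # hypothetical
]

for url in urls:
    print(contains_amp_id(url), url)
```

Only the hypothetical last URL matches. The others contain "id" somewhere, but never the two characters "&" and "id=" together.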
Pyrrhonist
06-14-2005, 06:05 PM
what about:
www.google.us/?id=ab7ded123415322dfaa7d5e35cb (right back at ya, mcanerin ;))
The & would only occur if it's a 2nd level item, and there are several cart systems that use ONLY the session id for tracking.
check out www.energyalternatives.ca
(click the online cart link)
What a stupid reason. Session variable in PHP is always sessid unless you predefine that (not many users do that). I haven't seen a session put into ID. "ID" is commonly used as article ID or page ID or member ID.
ID for session? I bet that this is so in less than 5% of cases. How can Google make such a mistake? :p
mcanerin
06-14-2005, 07:05 PM
I'm inclined to agree about "sessionid=" unless they are already filtering it out. I think they are, if memory serves.
<checks> Never mind, they are not: http://www.google.com/search?q=allinurl:sessionid&hl=en&lr=&c2coff=1&start=20&sa=N
Ian
Pyrrhonist
06-14-2005, 07:15 PM
What a stupid reason. Session variable in PHP is always sessid unless you predefine that (not many users do that). I haven't seen a session put into ID. "ID" is commonly used as article ID or page ID or member ID.
trans_sid session ID is always sessid, but I've seen quite a few implementations where ID = sessid. phpBB (the bulletin board system) used to use this - I don't know if they've changed now or not. Invision Power Board uses "s" as its identifier.
Does anyone know off hand what IIS uses when it appends a sid?
mcanerin
06-14-2005, 09:35 PM
Does anyone know off hand what IIS uses when it appends a sid?
ASP by default uses session cookies - I think you have to actually program URL-based session IDs in the coding language of your choice - I don't believe there is a "default" URL session for IIS/ASP/NET
This is good from a URL standpoint, but of course has its own challenges for shopping carts, etc.
Ian
Mikkel deMib Svendsen
06-14-2005, 10:21 PM
This is good from a URL standpoint, but of course has its own challenges for shopping carts, etc.
Actually not :)
There is no reason to preserve state before users start shopping - and engines don't shop on your site anyway. Just don't require cookies or session ID'ed URLs before you really need them.
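A rough Python sketch of that idea: no session exists while someone is only browsing, and one is created the first time something goes into the cart. The Visitor class, the in-memory sessions dict, and the /product/ URL pattern are all hypothetical stand-ins, not any particular cart system.

```python
import secrets

class Visitor:
    """Hypothetical per-visitor state; session_id stays None until it is needed."""
    def __init__(self, session_id=None):
        self.session_id = session_id

# session_id -> list of product ids (in-memory stand-in for a real session store)
sessions = {}

def view_product(visitor, product_id):
    # Browsing needs no state, so the URL carries no session value at all.
    return f"/product/{product_id}"

def add_to_cart(visitor, product_id):
    # State is first needed here, so this is where the session gets created.
    if visitor.session_id is None:
        visitor.session_id = secrets.token_hex(8)
        sessions[visitor.session_id] = []
    sessions[visitor.session_id].append(product_id)
    return visitor.session_id

spider = Visitor()
print(view_product(spider, 3), spider.session_id)  # clean URL, still no session

shopper = Visitor()
add_to_cart(shopper, 3)
print(len(sessions))  # one session, created only for the visitor who shopped
```

A spider that never adds anything to a cart never triggers a session, so every page it requests has a single stable URL.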
mcanerin
06-14-2005, 11:21 PM
Actually, I was referring to issues within shopping carts, as well as crossing servers, using certain clustered systems, etc, not SE issues - sorry, I wasn't clear.
Ian
Chris_D
06-15-2005, 12:46 AM
Hmmm....
If Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index. http://www.google.com/webmasters/guidelines.html
Then what's this then ehh... there are lots of indexed pages with &id= in urls here:
http://www.google.com/search?hl=en&lr=&q=+site:www.mamboserver.com+mamboserver
Not supplemental - main index. Included. Indexed.
Maybe GoogleGuy can clarify this contradiction? Maybe the Google guideline page should say "as we don't really want to include these pages in our index".....
:)
I, Brian
06-15-2005, 05:13 AM
I second that - I see "ID" as a parameter in a dynamic URL being variably indexed - I wonder if the use of sessions on different platforms may be a useful factor here, i.e. PHP vs ASP applications.
Mikkel deMib Svendsen
06-15-2005, 05:29 AM
I wonder if the use of sessions on different platforms may be a useful factor here, i.e. PHP vs ASP applications.
No, I don't think so :) - it's a platform-independent problem ...
The problem with session IDs is really very simple. 99% of the time it's due to what I usually call "lazy programming"! Not a single one of the websites I have ever consulted on really needs to preserve state on any of the pages that they want to get indexed: Front page, category pages, product pages, info pages, community pages, article archives etc. - NONE of them need session IDs in cookies or URLs. It's just lazy programming when you find them.
Sorry, lost me on this one - you were agreeing with the previous post (it's best not to use either &ID or ?ID) but then you state that only &ID is the issue ... In fact the ID is what you stay away from, period:
No &ID
No ?ID
No ID anywhere, anyhow, anyway
:)
I was stating, as clearly as I know how, that it's precisely and explicitly &id which will cause the trouble.
I made that comment because my initial Google searches easily turned up ?id= variables in query strings and couldn't find any &id= statements.
Since then Chris_D has posted http://www.google.com/search?hl=en&lr=&q=+site:www.mamboserver.com+mamboserver which clearly shows that &id= is indexed by Google too!
It's clearly a foggy issue.
I have seen actual full-blown session (or session-like) query strings in Google's index. I think if enough static pages link to the same session over a period of time then Google will overcome its reluctance to index them. The same might be true for ?id= or &id=.
I think it's worth advising against having &id= in URLs since Google clearly says it's an issue. We're just debating how much of an issue it is.
Mikkel deMib Svendsen
06-15-2005, 05:59 AM
The issue with Google is that they try to guess what parameter may hold session variables (or any other non-qualifying parameter such as click IDs, time-stamped URLs etc). Ever since they started doing that, the best advice has been to stay away from any parameter name or value that may look like a session-dependent variable.
Google does in fact index many obscure URLs, often because other pages are linking to them. However, true sessionized URLs will almost never be linked from more than one page at most, because only one user gets the exact ID.
This guessing game is, I must say, a very dangerous path - not least for webmasters. You need to make sure your system can handle these "random" requests from spiders.
However, the bottom line is the same as it has always been: Make sure to serve a simple, one-dimensional architecture to spiders (and users in general).
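If you want to check your own URLs for parameters that might look session dependent, something like the Python sketch below could be a starting point. The name list comes from the parameter names mentioned in this thread (id, sid, s, sessid, sessionid), and the "16 or more hex characters" rule for values is purely an assumption. None of this is Google's actual logic, which nobody outside Google knows.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Parameter names mentioned in this thread as possible session carriers (assumed list).
SUSPECT_NAMES = {"id", "sid", "s", "sessid", "sessionid"}
# Assumed rule of thumb: a long run of hex characters looks like a session token.
LONG_TOKEN = re.compile(r"^[0-9a-fA-F]{16,}$")

def suspect_params(url):
    """Return the names of query parameters that might be mistaken for session data."""
    flagged = []
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if name.lower() in SUSPECT_NAMES or LONG_TOKEN.match(value):
            flagged.append(name)
    return flagged

# Hypothetical URLs for illustration.
print(suspect_params("http://www.example.com/browse.asp?cat=2&id=3"))
print(suspect_params("http://www.example.com/article.asp?article=3"))
print(suspect_params("http://www.example.com/?token=ab7ded123415322dfaa7d5e35cb"))
```

Anything the check flags is a candidate for renaming or for moving into the path of a rewritten URL.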