
% in URL


Incubator
05-12-2005, 11:05 AM
Hello, I was wondering if anyone could confirm something for me. A query URL we have for a client ends with something like /products/details/7000%5Fs1/

Will the % sign cause a problem for spidering? If so, would a mod_rewrite rule help at all, or will it be OK left the way it is?

Cheers

WC

Bernard
05-12-2005, 11:25 AM
I see listings in SERPs all the time with % in the URI. I assume that means the spiders were able to do their business. :)

David Wallace
05-13-2005, 12:46 PM
%20 indicates a space in a file name. For example, domain.com/page one.html may look like this in the SERPs or in your address bar:

domain.com/page%20one.html

The space is simply replaced with %20.
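A minimal sketch of that substitution, assuming Python 3 and its standard urllib.parse module (just an illustration, not something anyone in the thread used). quote() percent-encodes the space while leaving "/" alone by default, and unquote() reverses it:

```python
from urllib.parse import quote, unquote

# Percent-encode a path containing a space; "/" is kept as-is by default.
encoded = quote("/page one.html")
print(encoded)           # /page%20one.html

# Decoding turns %20 back into a space.
print(unquote(encoded))  # /page one.html
```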

Bernard
05-13-2005, 01:45 PM
Sorry WC, just noticed that you were asking about %5F.

%20 = Hexadecimal for 32 = ASCII value for ' ' (space).

%5F = Hexadecimal for 95 = ASCII value for '_' (underscore).
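To check what a %XX escape stands for, parse the two characters after the % as a hexadecimal number. The sketch below assumes Python 3.9+; decode_escape is a hypothetical helper written only for this illustration. It also decodes the path from the original question with the standard urllib.parse.unquote:

```python
from urllib.parse import unquote


def decode_escape(escape: str) -> tuple[int, str]:
    """Hypothetical helper: turn one %XX escape into its byte value and character."""
    if len(escape) != 3 or escape[0] != "%":
        raise ValueError(f"not a %XX escape: {escape!r}")
    value = int(escape[1:], 16)  # the two characters after % are hex digits
    return value, chr(value)     # only a plain ASCII character when value <= 127


for esc in ("%20", "%5F"):
    value, char = decode_escape(esc)
    print(esc, value, repr(char))
# %20 32 ' '
# %5F 95 '_'

# The path from the question decodes to an ordinary underscore.
print(unquote("/products/details/7000%5Fs1/"))  # /products/details/7000_s1/
```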

I'm not sure if it is a legal character for all file systems, spiders, etc.:

For original character sequences that contain non-ASCII characters, however, the situation is more difficult. Internet protocols that transmit octet sequences intended to represent character sequences are expected to provide some way of identifying the charset used, if there might be more than one [RFC2277]. However, there is currently no provision within the generic URI syntax to accomplish this identification. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.

It is expected that a systematic treatment of character encoding within URI will be developed as a future modification of this specification.

RFC 2396 - Sec. 2.1 (http://www.faqs.org/rfcs/rfc2396.html)

Incubator
05-13-2005, 01:55 PM
WOW, thanks for that, it's way over my head... Orion, can you jump in here and break it down in layman's terms? Cheers

WC

Bernard
05-13-2005, 02:19 PM
Well, I'm no Orion, but my understanding is that for characters outside basic US-ASCII (that is, anything with a value above 127 decimal), there is no way in a URI to specify how to interpret the character, i.e. different character sets may interpret the same value differently.
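A small sketch of that ambiguity, assuming Python 3's standard urllib.parse (an illustration only; "é" is an arbitrary example character). The same character encodes to different escapes depending on the charset chosen, and the same escape decodes to different text depending on the charset assumed:

```python
from urllib.parse import quote, unquote

e_acute = "\u00e9"  # the character "é"

# Encoding: the escape you get depends on the charset chosen.
print(quote(e_acute, encoding="utf-8"))    # %C3%A9
print(quote(e_acute, encoding="latin-1"))  # %E9

# Decoding: the text you get depends on the charset assumed.
print(unquote("%E9", encoding="latin-1"))  # é
# A lone 0xE9 byte is not valid UTF-8, so unquote substitutes U+FFFD
# (its default is errors="replace").
print(repr(unquote("%E9", encoding="utf-8")))  # '\ufffd'
```

Nothing in the URL itself says which of these interpretations is intended.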

On a web page, you can specify the correct interpretation for characters by specifying the character set with something like:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

There is no way to do that with a URI. Aside from search engines and spiders, using escape codes like %5F *might* cause problems with cell phones, PDAs, or accessibility devices.

Incubator
05-13-2005, 03:07 PM
Thanks Bernard, now it's making sense...

WC