[whatwg] URL parsing and same-document references [was: Re: Citing multiple <blockquote> elements in HTML5]

Calogero Alex Baldacchino alex.baldacchino at email.it
Sat Dec 13 10:09:17 PST 2008

Nils Dagsson Moskopp wrote:
> On Friday, 12.12.2008, at 20:36 +0100, Calogero Alex
> Baldacchino wrote:
>> The above (apart from the 'double check' I was suggesting) is how
>> Firefox (2.x and 3.0.4) behaves: both href="#foo%20bar" and, in a
>> different page, href="./example.html#foo%20bar" match id="foo bar".
>> IE7 and Opera 9.x instead perform an exact comparison and show, in the
>> address bar, a URL that may contain blank spaces. They thus apply the
>> relaxation allowed by the URL parsing rules, but the complete URI
>> string does not conform to RFC 3986.
> Whenever I copy and paste a URI from the address bar into any other
> program, I am severely annoyed by this, especially when spaces
> (delimiters!) are part of the fake URI. A chat or office program, for
> example, is unable to highlight the fake URI anymore (how could it?),
> and pasting it into source code can create all kinds of validation
> errors. And whenever I get a bastardized URI via chat or mail, only a
> part of it is clickable.
> Can someone from the web browser faction please state whether there is
> any data to support breaking RFC compatibility? Because as I see it,
> it's something that makes it appear nicer, but breaks whenever URIs are
> to be transferred / communicated.

Actually, I'm not from any faction, to be honest. I think a rationale for
that may be "people write strange things, both in address bars and in
HTML code", so relaxing the rules when parsing a URL makes sense; but I
think that when resolving and recomposing a whole URI the strictest rules
should be applied.

> Getting to the problem mentioned here, the robustness principle says
> that id="foo bar" should be accepted but is nevertheless invalid, because
> a fragment with a space can never be part of a URI.

Indeed, that's not part of a URI, but a dereferenced component: when
splitting a URI into its components, there is no need to keep %-encoded
characters. RFC 3986 says separated components can be decoded; thus,
AIUI, both href="#foo bar" and id="foo bar" respect the conformance
rules, but when resolving "#foo bar" into a complete, absolute URI, the
result should always look like
"http://example.org/something.html#foo%20bar" to be conforming.
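
To make the distinction concrete, here is a minimal sketch in Python of
the two steps described above: decoding the fragment once it has been
split off (for comparison against an id), and re-encoding it when the
reference is resolved into a complete URI. The function names
fragment_target and resolve_strict and the FRAGMENT_SAFE constant are
hypothetical, chosen only for illustration; this is not how any browser
actually implements it.

```python
from urllib.parse import quote, unquote, urljoin, urlsplit, urlunsplit

# Characters RFC 3986 allows unescaped in a fragment in addition to the
# unreserved set: sub-delims, ":", "@", "/" and "?".
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"


def fragment_target(href: str) -> str:
    """Return the decoded fragment, for comparison against an id value."""
    return unquote(urlsplit(href).fragment)


def resolve_strict(base: str, href: str) -> str:
    """Resolve href against base, then percent-encode the fragment.

    The fragment is decoded first so that an already encoded input
    such as "#foo%20bar" is not encoded a second time.
    """
    parts = urlsplit(urljoin(base, href))
    fragment = quote(unquote(parts.fragment), safe=FRAGMENT_SAFE)
    return urlunsplit(parts._replace(fragment=fragment))


base = "http://example.org/something.html"
print(fragment_target("#foo%20bar"))     # decoded component
print(resolve_strict(base, "#foo bar"))  # recomposed URI
```

With this approach both "#foo bar" and "#foo%20bar" yield the same
decoded target and the same recomposed URI, which is the behaviour I am
arguing for.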

