[whatwg] URL parsing and same-document references [was: Re: Citing multiple <blockquote> elements in HTML5]

Sat Dec 13 15:12:39 PST 2008

Nils Dagsson Moskopp wrote:
> On Saturday, 13.12.2008, at 19:09 +0100, Calogero Alex Baldacchino
> wrote:
>   
>> Actually I'm not from any faction, to be honest. I think a rationale for 
>> that may be "people write strange things, both in address bars and in 
>> html code", thus relaxing rules when parsing an URL is meaningful; but I 
>> think when resolving and recomposing a whole URI the strictest rules 
>> should be applied.
>>     
> Accepting weird input is not a problem here, outputting is. Try writing
> a valid URI into the address bar, then get an invalid displayed.
>
>
> Greetings
>   

Could you give an example, please? I wasn't able to reproduce this in
IE7 or Opera 9.27 (e.g.,
"http://real.addressofasite.com/index.html#foo%20bar" wasn't changed
into "http://real.addressofasite.com/index.html#foo bar").

Anyway, I guess you got the point. Relaxed parsing rules are for input
URLs, but after parsing, a normalization and/or the resolution algorithm
should be applied, and the displayed URL, being absolute and complete,
should conform to RFC 3986.

The current resolution algorithm (section 2.5.3 of the HTML5 spec) does
not mention fragment identifiers explicitly. Its 10th step says "Apply
any relevant conformance criteria of RFC 3986 and RFC 3987, returning an
error and aborting these steps if appropriate.", but step 9 says "Apply
the algorithm described in RFC 3986 section 5.2 Relative Resolution,
using url as the potentially relative URI reference (R), and base as the
base URI (Base)". AIUI, the algorithm described in section 5.2 of
RFC 3986 can be applied to each component of a URI without building a
complete URI, leaving each part separate and held as a property of an
object (a component recomposition algorithm is defined in section 5.3 of
RFC 3986, but that's not a 'must'). When a single component of a URI is
handled on its own, RFC 3986 does not require %-encoding as a 'must'.
Hence the freedom of interpretation and the different behaviours in
different UAs, which lead to inconsistent results when copying a URL
from one UA and pasting it into another.

I think a uniform behaviour should be defined as standard (and
implemented!) instead. The concern you raised about copy&paste perhaps
leads to a further issue: how line breaks should be handled by the
parsing rules (e.g. stripped, like leading and trailing characters).
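
To make the point concrete, here is a minimal sketch in Python of what I
mean by "resolve, then apply the strict rules on output". It is only an
illustration, not what any UA actually does: the function name
resolve_strict and the FRAGMENT_SAFE constant are my own, and the
standard library's urljoin stands in for the RFC 3986 section 5.2
algorithm. Keeping "%" in the safe set is a shortcut to preserve
escapes that are already present; it does not check that each "%" is
followed by two hex digits.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Characters allowed unescaped in a fragment per RFC 3986 (sub-delims,
# ":", "@", "/", "?"); unreserved characters are never escaped by quote().
# "%" is kept so that existing escapes are not encoded twice (assumption:
# the input's percent-escapes are already well-formed).
FRAGMENT_SAFE = "!$&'()*+,;=:@/?%"


def resolve_strict(base: str, reference: str) -> str:
    """Resolve `reference` against `base`, then %-encode the fragment."""
    # Relaxed step: accept whatever was typed and resolve it.
    resolved = urljoin(base, reference)
    # Strict step: split into components and escape the fragment
    # before recomposing the absolute URL.
    parts = urlsplit(resolved)
    fragment = quote(parts.fragment, safe=FRAGMENT_SAFE)
    return urlunsplit(parts._replace(fragment=fragment))
```

With this, resolve_strict("http://real.addressofasite.com/index.html",
"#foo bar") gives "http://real.addressofasite.com/index.html#foo%20bar",
and a reference that is already escaped ("#foo%20bar") comes out
unchanged, so the same string results whichever form was typed.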

Regards,
Alex
