[whatwg] URL parsing and same-document references [was: Re: Citing multiple <blockquote> elements in HTML5]
Calogero Alex Baldacchino
alex.baldacchino at email.it
Sat Dec 13 15:12:39 PST 2008
Nils Dagsson Moskopp wrote:
> On Saturday, 13.12.2008, at 19:09 +0100, Calogero Alex Baldacchino
> wrote:
>
>> Actually, I'm not from any faction, to be honest. I think a rationale for
>> that may be "people write strange things, both in address bars and in
>> HTML code", so relaxing the rules when parsing a URL makes sense; but I
>> think that when resolving and recomposing a whole URI the strictest rules
>> should be applied.
>>
> Accepting weird input is not the problem here; the output is. Try writing
> a valid URI into the address bar, then get an invalid one displayed.
>
>
> Greetings
>
Could you give an example, please? I wasn't able to reproduce this in
IE7 or Opera 9.27 (e.g.,
"http://real.addressofasite.com/index.html#foo%20bar" wasn't changed
into "http://real.addressofasite.com/index.html#foo bar").
Anyway, I guess you got the point. Relaxed parsing rules are for input
URLs, but after parsing, a normalization and/or the resolution algorithm
should be applied, and the displayed URL, being absolute and complete,
should conform to RFC 3986.

The current resolution algorithm (section 2.5.3 of the HTML5 spec) does not
mention fragment identifiers explicitly. Its 10th step says "Apply any
relevant conformance criteria of RFC 3986 and RFC 3987, returning an error
and aborting these steps if appropriate.", but step 9 says "Apply the
algorithm described in RFC 3986 section 5.2 Relative Resolution, using url
as the potentially relative URI reference (R), and base as the base URI
(Base)". AIUI, the algorithm described in section 5.2 of RFC 3986 may be
applied to each component of a URI without building a complete URI, leaving
each part separate and held as a property of an object (a component
recomposition algorithm is defined in section 5.3 of RFC 3986, but it is
not a 'must'). When a single component of a URI is handled on its own,
RFC 3986 does not make %-encoding a 'must' either. Hence the freedom of
interpretation and the different behaviours in different UAs, which lead to
inconsistent results when copying a URL from one UA and pasting it into
another.

I think a uniform behaviour should be defined as standard (and
implemented!) instead. The concern you raised about copy&paste perhaps
leads to a further issue: how line breaks should be handled by the parsing
rules (e.g. stripped like leading and trailing characters).
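
To make the "relaxed input, strict output" idea concrete, here is a rough
sketch in Python. It is only an illustration, not the HTML5 algorithm:
resolve_strict and FRAGMENT_SAFE are names I made up, and I am assuming
that urllib.parse.urljoin is a close enough stand-in for RFC 3986 section
5.2 resolution and that stripping surrounding whitespace is the only
relaxed-input rule applied.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Hypothetical constant: characters left unescaped in a fragment.
# RFC 3986 allows pchar, "/" and "?" there; "%" is kept so that
# escapes already present in the input are not encoded twice.
FRAGMENT_SAFE = "!$&'()*+,;=:@/?%"


def resolve_strict(url: str, base: str) -> str:
    """Resolve url against base, then emit a %-encoded fragment.

    Hypothetical helper: relaxed on input, strict on output.
    """
    # Relaxed input: drop leading and trailing whitespace, line breaks included.
    cleaned = url.strip()
    # Resolution step (urljoin is assumed to approximate RFC 3986 section 5.2).
    absolute = urljoin(base, cleaned)
    # Strict output: %-encode the fragment, then recompose the whole URL.
    parts = urlsplit(absolute)
    fragment = quote(parts.fragment, safe=FRAGMENT_SAFE)
    return urlunsplit(parts._replace(fragment=fragment))


base = "http://real.addressofasite.com/index.html"
print(resolve_strict("  #foo bar\n", base))
# prints http://real.addressofasite.com/index.html#foo%20bar
```

With something like this, both "#foo bar" and "#foo%20bar" come out as the
same absolute URL, so copying it from one UA into another would give the
same result.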
Regards,
Alex