[whatwg] Web Addresses vs Legacy Extended IRI

Mon Mar 23 14:16:21 PDT 2009

Ian Hickson wrote:
> [cc'ed DanC since I don't think Dan is on the WHATWG list, and he's the 
> editor of the draft at this point]
> 
> On Mon, 23 Mar 2009, Julian Reschke wrote:
>>> For example, curl will not refuse to fetch the URL 
>>> http://example.com/% despite that URL being invalid.
>> Should it refuse to?
> 
> The URI/IRI specs don't say, because they don't cover error handling.

Indeed.

> This is what the Web addresses spec is supposed to cover. It doesn't 
> change the rules for anything that the URI spec defines, it just also says 
> how to handle errors.
> 
> That way, we can have interoperability across all inputs.
> 
> I personally don't care if we say that http://example.com/% should be 
> thrown out or accepted. However, I _do_ care that we get something that is 
> widely and uniformly implemented, and the best way to do that is to write 
> a spec that matches what people have already implemented.

I'm OK with doing that for browsers.

I'm *very* skeptical about the idea that it needs to be the same way 
everywhere else.

>>> Thus, we need a spec they are willing to follow. The idea of not 
>>> limiting it to HTML is to prevent tools that deal both with HTML and 
>>> with other languages (like Atom, CSS, DOM APIs, etc) from having to 
>>> have two different implementations if they want to be conforming.
>> I understand that you want everybody to use the same rules, and you want 
>> these rules to be the ones needed for HTML content. I disagree with 
>> that.
> 
> I want everyone to follow the same rules. I don't care what those rules 
> are, so long as everyone (or at least, the vast majority of systems) is
> willing to follow them. Right now, it seems to me that most systems do the 
> same thing, so it makes sense to follow what they do. This really has 
> nothing to do with HTML.

Your perspective on "most systems" differs from mine.

>> Do not leak that stuff into places where it's not needed.
> 
> Interoperability and uniformity in implementations are important 
> everywhere. If there are areas that are self-contained and never interact 
> with the rest of the Internet, then they can do whatever they like. I do 
> not believe I have ever suggested doing anything to such software. 
> However, 'curl' obviously isn't self-contained; people will take URLs from 
> e-mails and paste them into the command line to fetch files from FTP 
> servers, and we should ensure that this works the same way whether the 
> user is using Pine with wget or Mail.app with curl or any other 
> combination of mail client and download tool.

How many people paste URLs into command lines? And of these, how many 
remember that they likely need to quote them?

>> For instance, there are lots of cases where the Atom feed format can be 
>> used in absence of HTML.
> 
> Sure, but the tools that use Atom still need to process URLs in the same 
> way as other tools. It would be very bad if a site had an RSS feed and an 
> Atom feed and they both said that the item's URL was http://example.com/% 
> but in one feed that resulted in one file being fetched but in another it 
> resulted in another file being fetched.

Yes, that would be bad.

What seems more likely, however, is that one tool refuses to fetch the 
file (because its URI parser rejected it), while the other tool puts the 
invalid URL onto the wire, in which case the server's behavior decides.

I think this is totally ok, and the more tools reject the URL early, the 
better.
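
To illustrate what "rejecting early" could look like, here is a minimal 
sketch. It assumes a strict reading of RFC 3986, under which every "%" must 
be followed by two hex digits. The helper name has_valid_percent_encoding is 
made up for this example, and it checks only percent-encoding, not the rest 
of the URI syntax:

```python
import re

# Assumption: strict RFC 3986 reading, where "%" must be followed by two hex digits.
# Matches any "%" that does NOT start a well-formed pct-encoded triplet.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_valid_percent_encoding(url: str) -> bool:
    """Hypothetical helper: True if every '%' in url begins a %HH triplet."""
    return _BAD_PERCENT.search(url) is None


print(has_valid_percent_encoding("http://example.com/%"))    # False
print(has_valid_percent_encoding("http://example.com/%20"))  # True
```

A tool that runs a check like this before opening a connection would refuse 
http://example.com/% outright instead of leaving the decision to the server.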

>>>> If you think it's worthwhile, propose that change to the relevant 
>>>> standards body (in this case IETF Applications Area).
>>> This was the first thing we tried, but the people on the URI lists 
>>> were not interested in making their specs useful for the real world. 
>>> We are now routing around that negative energy. We're having a meeting 
>>> later this week to see if the IETF will adopt the spec anyway, though.
>> Adopting the spec is not the same thing as mandating its use all over 
>> the place.
> 
> I think it is important that we have interoperable use of URLs in the 
> transitive closure of places that use URLs, starting from any common 
> starting point, like the "URL in an e-mail" example above. I believe this 
> includes most if not all Internet software. I also believe that in 
> practice most software is already doing this, though often in subtly 
> different ways since the URI and IRI specs did not define error handling.

If the consequence of this is that invalid URLs do not interoperate, 
then I think this is a *feature*, not a bug.

Best regards, Julian