[whatwg] Web Addresses vs Legacy Extended IRI

Mon Mar 23 11:44:43 PDT 2009

[cc'ed DanC since I don't think Dan is on the WHATWG list, and he's the 
editor of the draft at this point]

On Mon, 23 Mar 2009, Julian Reschke wrote:
> > 
> > For example, curl will not refuse to fetch the URL 
> > http://example.com/% despite that URL being invalid.
> 
> Should it refuse to?

The URI/IRI specs don't say, because they don't cover error handling.

This is what the Web addresses spec is supposed to cover. It doesn't 
change the rules for anything that the URI spec defines; it just also says 
how to handle errors.

That way, we can have interoperability across all inputs.

I personally don't care if we say that http://example.com/% should be 
thrown out or accepted. However, I _do_ care that we get something that is 
widely and uniformly implemented, and the best way to do that is to write 
a spec that matches what people have already implemented.
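
To make the two options concrete, here is a minimal Python sketch of 
"accepted" versus "thrown out" for that URL. It is an illustration only, 
not text from any spec: the lenient half uses the standard urllib.parse 
module, and has_invalid_percent is a hypothetical helper written for this 
example.

```python
import re
from urllib.parse import urlsplit, unquote

# Hypothetical strict check (an assumption for illustration): flag any "%"
# that is not followed by exactly two hexadecimal digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_invalid_percent(url: str) -> bool:
    """Return True if the URL contains a malformed percent-escape."""
    return _BAD_PERCENT.search(url) is not None

url = "http://example.com/%"

# Lenient handling: the standard library keeps the stray "%" in the path
# and leaves it untouched when decoding.
path = urlsplit(url).path
decoded = unquote(path)

# Strict handling: the same input is reported as invalid.
strict_rejects = has_invalid_percent(url)
```

Either behaviour is defensible on its own; the point is that two tools 
making different choices here will disagree about the same input.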

> > Thus, we need a spec they are willing to follow. The idea of not 
> > limiting it to HTML is to prevent tools that deal both with HTML and 
> > with other languages (like Atom, CSS, DOM APIs, etc) from having to 
> > have two different implementations if they want to be conforming.
> 
> I understand that you want everybody to use the same rules, and you want 
> these rules to be the ones needed for HTML content. I disagree with 
> that.

I want everyone to follow the same rules. I don't care what those rules 
are, so long as everyone (or at least, the vast majority of systems) is
willing to follow them. Right now, it seems to me that most systems do the 
same thing, so it makes sense to follow what they do. This really has 
nothing to do with HTML.

> Do not leak that stuff into places where it's not needed.

Interoperability and uniformity in implementations are important 
everywhere. If there are areas that are self-contained and never interact 
with the rest of the Internet, then they can do whatever they like. I do 
not believe I have ever suggested doing anything to such software. 
However, 'curl' obviously isn't self-contained; people will take URLs from 
e-mails and paste them into the command line to fetch files from FTP 
servers, and we should ensure that this works the same way whether the 
user is using Pine with wget or Mail.app with curl or any other 
combination of mail client and download tool.

> For instance, there are lots of cases where the Atom feed format can be 
> used in absence of HTML.

Sure, but the tools that use Atom still need to process URLs in the same 
way as other tools. It would be very bad if a site had an RSS feed and an 
Atom feed that both gave the item's URL as http://example.com/%, but 
fetching it from one feed retrieved a different file than fetching it from 
the other.

> > > If you think it's worthwhile, propose that change to the relevant 
> > > standards body (in this case IETF Applications Area).
> > 
> > This was the first thing we tried, but the people on the URI lists 
> > were not interested in making their specs useful for the real world. 
> > We are now routing around that negative energy. We're having a meeting 
> > later this week to see if the IETF will adopt the spec anyway, though.
> 
> Adopting the spec is not the same thing as mandating its use all over 
> the place.

I think it is important that we have interoperable use of URLs in the 
transitive closure of places that use URLs, starting from any common 
starting point, like the "URL in an e-mail" example above. I believe this 
includes most if not all Internet software. I also believe that in 
practice most software is already doing this, though often in subtly 
different ways since the URI and IRI specs did not define error handling.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'