[whatwg] Proposal for improved handling of '#' inside of data URIs

Bjoern Hoehrmann derhoermi at gmx.net
Sat Sep 10 17:50:19 PDT 2011

* Daniel Holbert wrote:
>  In particular: when a "#" character is followed by ">" or "<" in a data 
>URI, I propose that we *don't* treat the "#" as a delimiter, and instead 
>just treat it as part of the encoded document.

Your proposal does not explain whether this applies to base64-encoded
ones, whether the angle brackets have to occur literally or whether they
can also occur in their percent-encoded form, or how you handle multiple
'#' characters, as in data:...,...#...<...#example. You also don't say at
which layer this would happen. Obviously, putting this in the URI syntax
specification, with an expectation that all parsing libraries would be
updated to treat the 'data' scheme as a special case, is unlikely to go
down well (the problems start with angle brackets being disallowed in
URIs entirely).

If treating the part after the first "#" as the fragment identifier
doesn't cause compatibility problems, as you seem to be suggesting, then
that's great; explaining URI processing would be much simpler. We also do
not have special rules for <http://example.com/search?q=#hashtag>, even
though someone crafting such an address most likely means the "#" to be
data.
There are a number of implementations where the "#" is treated as data,
'javascript' and 'mailto' come to mind, but there it's unreliable and
not widely used, and, more importantly, it's all or nothing, not
guesswork.
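
To make the first-"#" rule concrete, here is a minimal sketch using
Python's standard urllib.parse module. The thread does not mention any
particular library, and the helper name split_at_first_hash is made up
for this illustration; it only shows how one generic URI parser treats
the two addresses discussed above.

```python
from urllib.parse import urlsplit


def split_at_first_hash(url: str) -> tuple[str, str]:
    """Return (text before the first '#', fragment as parsed by urlsplit).

    Hypothetical helper for illustration only.
    """
    fragment = urlsplit(url).fragment
    before, _, _ = url.partition("#")
    return before, fragment


# The '<' and the second '#' both end up in the fragment, not in the data.
print(split_at_first_hash("data:text/html,<p>#1</p>#example"))

# The '#' that was probably meant as search text becomes the delimiter.
print(split_at_first_hash("http://example.com/search?q=#hashtag"))
```

A parser like this has no notion of what the author meant; it cuts at
the first "#" regardless of scheme or of what follows.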

You have to escape all sorts of characters in 'data' URLs to make them
work reliably. You have to escape spaces, for instance, in order to use
them as part of a white-space separated list of URLs or other syntax
that relies on URLs containing no spaces, and you have to escape '#'s
so they work reliably right now and for however long the current pack
of browsers will be around, even if you don't care about all the
non-browser implementations that are unlikely to support this.
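
As a sketch of what escaping looks like in practice, the following
again uses Python's urllib.parse. The function make_data_url and its
media_type parameter are hypothetical names for this example, and the
input is assumed to be plain ASCII text that is percent-encoded rather
than base64-encoded.

```python
from urllib.parse import quote, unquote, urlsplit


def make_data_url(text: str, media_type: str = "text/html") -> str:
    """Build a 'data' URL with the payload fully percent-encoded.

    Hypothetical helper; assumes ASCII input and no base64.
    """
    # safe="" makes quote() escape everything except letters, digits
    # and "_.-~", so '#', spaces and angle brackets are all encoded.
    return "data:" + media_type + "," + quote(text, safe="")


url = make_data_url('<a href="#top">back to top</a>')
print(url)

# With the '#' escaped as %23, a generic parser finds no fragment.
print(urlsplit(url).fragment == "")

# Decoding the payload gives back the original text.
print(unquote(url.split(",", 1)[1]))
```

Encoding more than strictly necessary costs a few bytes, but the result
contains no literal "#" or space, so it survives both a first-"#" split
and a white-space separated list.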

If there isn't very clear evidence that this is needed for reasons of
compatibility, it seems preferable by far to have simpler rules that
actually reflect how this stuff works everywhere than to have some magic
rules that apply some of the time and that robust code cannot rely upon.
I wouldn't mind such fixups in the address bar, as that is a user-input,
"do what I mean" interface, but beyond that it just adds complexity for
very little convenience in edge cases.
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 