[whatwg] Proposal for improved handling of '#' inside of data URIs
Michael A. Puls II
shadow2531 at gmail.com
Sun Sep 11 07:21:48 PDT 2011
On Sat, 10 Sep 2011 17:15:09 -0400, Daniel Holbert <dholbert at mozilla.com> wrote:
> Browsers handle the "#" character in data URIs very differently, and the
> arguably "correct" behavior is probably not what authors actually want
> in many cases.
> This could be more intuitive/do-what-I-mean if we restricted the cases
> under which "#" is treated as a fragment-ID delimiter inside of data
> URIs. In particular: when a "#" character is followed by ">" or "<" in
> a data URI, I propose that we *don't* treat the "#" as a delimiter, and
> instead just treat it as part of the encoded document.
Not only must "#" be "%23" if you don't want it as a frag id, but ">" and
"<" should be "%3E" and "%3C".
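To illustrate the point, here is a minimal sketch of how a generic URI parser would split a data URI at the first raw "#". The helper splitFragment() and the sample markup are hypothetical, made up for this example; real browsers differed in 2011, which is what the proposal is about.

```javascript
// Hypothetical helper: split a URI at the first raw "#", as generic URI
// syntax prescribes. Not any particular browser's implementation.
function splitFragment(uri) {
  const i = uri.indexOf("#");
  return i === -1
    ? { body: uri, fragment: null }
    : { body: uri.slice(0, i), fragment: uri.slice(i + 1) };
}

const markup = "<p style='color:#f00'>hi</p>";

// Raw markup: the "#" in the colour value cuts the document short.
const raw = splitFragment("data:text/html," + markup);

// Percent-encoded markup: "#" became "%23", so there is no fragment.
const escaped = splitFragment("data:text/html," + encodeURIComponent(markup));

console.log(raw.body);         // everything before the "#"
console.log(raw.fragment);     // the rest of the markup, taken as a fragment
console.log(escaped.fragment); // null
```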
Encoding the data (markup for example) for the data URI is simple. Just
use encodeURIComponent(markup) (on a UTF-8 page) in JS on the data. You
still hand-author the markup. You just paste the markup into a textarea
and have something (like encodeURIComponent()) percent-encode it for you.
Of course, if you can percent-encode everything needed as you type, you
can hand-author the URI data. But, who wants to do that, except for simple
data? It's like hand-authoring MIME messages. It's not something you would
normally do to create an email or MHT file.
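As a rough sketch of the workflow described above: the function name toDataURI(), the default text/html type and the charset parameter are my own assumptions for illustration, not something any tool mandates.

```javascript
// Hypothetical helper: build a data URI from hand-authored markup by
// percent-encoding it with encodeURIComponent().
function toDataURI(markup, mimeType = "text/html") {
  // encodeURIComponent() escapes "#", "<", ">" and most other reserved
  // characters, emitting non-ASCII characters as UTF-8 byte escapes.
  return "data:" + mimeType + ";charset=utf-8," + encodeURIComponent(markup);
}

const pasted = '<a href="#top">Top</a>'; // e.g. the value of a textarea
const dataURI = toDataURI(pasted);

console.log(dataURI);
```

Decoding the part after the comma with decodeURIComponent() gives back the original markup, so the "#" survives as data rather than as a fragment delimiter.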
If you need to encode the data URI data as base64 instead, you can do
encodeURIComponent(btoa(unescape(encodeURIComponent(markup)))); (on a
UTF-8 page).
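The base64 one-liner can be unpacked roughly as follows. This is only a sketch: toBase64DataURI() and fromBase64Payload() are hypothetical names, and the charset parameter is an assumption. It relies on btoa()/atob() and the legacy unescape()/escape() functions being available, as they are in browsers.

```javascript
// Hypothetical helper: base64 variant of a data URI builder.
function toBase64DataURI(markup, mimeType = "text/html") {
  // Step 1: turn the string into a "binary string" of UTF-8 bytes,
  // since btoa() rejects characters above U+00FF.
  const utf8Bytes = unescape(encodeURIComponent(markup));
  // Step 2: base64-encode those bytes.
  const b64 = btoa(utf8Bytes);
  // Step 3: percent-encode the base64 text, because "+", "/" and "="
  // are not safe to leave raw everywhere a URI may be used.
  return "data:" + mimeType + ";charset=utf-8;base64," + encodeURIComponent(b64);
}

// Hypothetical inverse, to check that nothing was lost.
function fromBase64Payload(payload) {
  return decodeURIComponent(escape(atob(decodeURIComponent(payload))));
}

const source = "<p>é #1</p>";
const b64URI = toBase64DataURI(source);
const b64Payload = b64URI.slice(b64URI.indexOf(",") + 1);

console.log(b64URI);
console.log(fromBase64Payload(b64Payload) === source);
```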
And, there's already <http://software.hixie.ch/utilities/cgi/data/data>
Given that, I personally don't think browsers should be too lax with
(bookmarklet) authors already get away with that (even though there's
pages like <http://shadow2531.com/js/jsuri.html>), but at the same time
often run into unexpected (to them) percent-decoding of the URI data
before it's executed.