[whatwg] Proposal for improved handling of '#' inside of data URIs

Michael A. Puls II shadow2531 at gmail.com
Sun Sep 11 11:44:13 PDT 2011

On Sun, 11 Sep 2011 12:14:08 -0400, Glenn Maynard <glenn at zewt.org> wrote:

> On Sun, Sep 11, 2011 at 10:21 AM, Michael A. Puls II
> <shadow2531 at gmail.com> wrote:
>> Not only must "#" be "%23" if you don't want it as a frag id, but ">" and
>> "<" should be "%3E" and "%3C".
> I'm not sure about the spec on this, but Firefox actively unencodes %3E and
> %3C.  Pasting this into the address bar and copying it back out turns them
> back into literal < and > characters:
> data:text/html,foo<div style=height:3000px></div><span id='vector<int>'>bar</span>#vector%3Cint%3E
> which suggests that escaping these characters isn't necessary or encouraged.

Firefox aggressively decodes %HH in the address field to make the URI
human-readable (which I hate btw, but that's another discussion). It
usually copies the correct/original value to the clipboard, though. In this
case, however, Firefox copies < and > to the clipboard decoded, just as you
say. I don't think that's a good idea, and I wonder whether it's intentional,
as you suspect. Chrome does it too. Opera doesn't and leaves them
encoded, which I think is better.

I love what Safari does, and I think it's the right thing to do.
It resolves the data URI and converts raw spaces to %20 and < and > to
%3C and %3E (along with anything else that should be percent-encoded in a
URI) if they're not already encoded. It could be argued that Safari
shouldn't do that visually. But as far as copying to the clipboard goes, it
should, and it does, which is awesome.
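
As a rough illustration of that kind of normalization (not Safari's actual
code), here is a minimal Python sketch. The function name
normalize_data_uri and the SAFE character list are my own assumptions; the
only library call is urllib.parse.quote. "%" is treated as safe so that
existing %HH escapes are not double-encoded, which also means a stray
literal "%" is left alone.

```python
from urllib.parse import quote

# Assumption: punctuation that may appear raw in a URI, plus "%" so that
# escapes that are already present are not encoded a second time.
SAFE = "%:/?#[]@!$&'()*+,;=-._~"

def normalize_data_uri(uri: str) -> str:
    """Hypothetical helper: percent-encode raw spaces, <, > and anything
    else outside SAFE, leaving existing %HH sequences untouched."""
    return quote(uri, safe=SAFE)

print(normalize_data_uri("data:text/html,foo <b>bar</b>"))
# data:text/html,foo%20%3Cb%3Ebar%3C/b%3E

print(normalize_data_uri("data:text/html,a%3Cb"))
# data:text/html,a%3Cb  (already encoded, so unchanged)
```

Note that a raw "#" is kept as-is here, so it still starts the frag id.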

I don't think < and > are in the list of safe URI characters. All
URI-based functions seem to percent-encode them too. Keeping them encoded
is definitely good for data URIs in text/plain documents so they don't
interfere with the < and > that encase the URI.
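
To make the text/plain case concrete, here is a small Python sketch that
builds a data URI from markup, encoding "#", "<" and ">" inside the data so
that the only raw "#" is the one that starts the frag id. The helper name
make_data_uri and the sample markup are assumptions for illustration only.

```python
from urllib.parse import quote

def make_data_uri(html: str, fragment: str = "") -> str:
    """Hypothetical helper: percent-encode the payload and the frag id
    separately, then join them with a single raw '#'."""
    # safe="" encodes everything except letters, digits and "_.-~",
    # so "#", "<", ">", "/" and spaces in the payload all become %HH.
    uri = "data:text/html," + quote(html, safe="")
    if fragment:
        uri += "#" + quote(fragment, safe="")
    return uri

uri = make_data_uri("<a href='#top'>up</a>", "vector<int>")

# Encased in < and > the way a URI is usually written in plain text.
print("<" + uri + ">")
# <data:text/html,%3Ca%20href%3D%27%23top%27%3Eup%3C%2Fa%3E#vector%3Cint%3E>
```

Because no raw < or > is left inside the URI, the enclosing pair stays
unambiguous.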

