[whatwg] base64 entities

Martin Janecke whatwg.org at kaor.in
Thu Aug 26 01:38:44 PDT 2010

On 26.08.10 01:41, Adam Barth wrote:
> On Wed, Aug 25, 2010 at 1:55 PM, Ian Hickson <ian at hixie.ch> wrote:
>> On Wed, 25 Aug 2010, Adam Barth wrote:
>>> HTML should support Base64-encoded entities to make it easier for
>>> authors to include untrusted content in their documents without
>>> risking XSS.
>> Seems like a fine idea. Get browsers to implement it and I'll spec it.
> I've posted a patch for WebKit:
> https://bugs.webkit.org/show_bug.cgi?id=44641
> Some subtleties:
> 1) Some base64 decoders tolerate newlines.  We don't want to decode
> entities with newlines.
> 2) Decoding base64 results in binary data.  We'll need to convert that
> data to characters in order to deal with it in the DOM.  We always
> use UTF-8 for that transformation, regardless of the document's
> encoding.
> 3) Null characters are replaced with U+FFFD.
> 4) The empty base64 entity &%; is consumed and is replaced with the
> empty string.
> 5) Invalid base64 is rejected and the entity is not decoded.
> Adam
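
For reference, the five rules quoted above could be sketched roughly as
follows. This is my own illustration in Python, not the WebKit patch. The
function name decode_base64_entity, the requirement for "=" padding, and
the choice to leave an entity undecoded when its bytes are not valid UTF-8
are all assumptions; the quoted message does not specify them.

```python
import base64
import binascii
import re

# Hypothetical syntax: "&%" + base64 payload + ";".
# The character class excludes whitespace, so newlines never match (rule 1).
_ENTITY = re.compile(r"&%([A-Za-z0-9+/]*={0,2});")


def decode_base64_entity(entity):
    """Return the decoded text, or None if the entity is left undecoded."""
    match = _ENTITY.fullmatch(entity)
    if match is None:
        return None  # rules 1 and 5: newline or other invalid character
    payload = match.group(1)
    if payload == "":
        return ""  # rule 4: the empty entity becomes the empty string
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return None  # rule 5: invalid base64 (assumes padding is required)
    try:
        # Rule 2: always UTF-8, regardless of the document's encoding.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # assumption: invalid UTF-8 is treated like invalid base64
    return text.replace("\x00", "\ufffd")  # rule 3: NUL becomes U+FFFD
```

A real tokenizer would of course work on the input stream rather than on
an isolated string; the sketch only shows how the rules interact.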

Is it necessary to consider compatibility issues here? In HTML4 this
seems to have been valid markup (checked with http://validator.w3.org/check):

<meta http-equiv="Content-type" content="text/html; charset=US-ASCII">
<title>base64 entity test</title>
<p>Look at these fine ASCII characters: &%4oCT;</p>

Under the proposal, &%4oCT; would be interpreted differently. Could this
change the meaning of old documents? Do we also have to consider old
documents that were not completely valid (e.g. lacked a doctype declaration)?
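
To make the difference concrete, here is a small Python check of what the
payload in the example above would decode to if rule 2 (always UTF-8) is
applied. It is only an illustration using Python's standard base64 module,
not browser behaviour.

```python
import base64

payload = "4oCT"  # the part between "&%" and ";" in the example above

raw = base64.b64decode(payload, validate=True)  # three bytes: E2 80 93
text = raw.decode("utf-8")                      # one character: U+2013

print(raw.hex(), "U+%04X" % ord(text))
```

So a document declared as US-ASCII that used to show the literal text
&%4oCT; would instead show a single EN DASH, which is not an ASCII
character.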
