[whatwg] base64 entities

Thu Aug 26 23:02:03 PDT 2010

On Thu, Aug 26, 2010 at 3:52 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> On 8/26/10 6:45 PM, Adam Barth wrote:
>>>
>>> Note that this issue means that using atob or btoa for dealing with this is
>>> a huge pain if non-ASCII chars are involved, since those take and return
>>> byte arrays masquerading as JS strings, not actual Unicode strings.
>>
>> I'm slightly confused how that works.  How do you represent arbitrary
>> binary data as characters?
>
> You mean how do atob/btoa take their binary data in JS-land?  You take your
> byte array, and convert it to a sequence of two-byte units by setting the
> high byte to 0.  This sequence of two-byte units is a JS string.

Crazy.
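
To make sure I follow, here is a rough sketch of that convention. The
helper names (bytesToBinaryString, binaryStringToBytes) are made up for
illustration; only btoa, atob, String.fromCharCode, and charCodeAt are
real, and Uint8Array assumes an engine with typed arrays.

```javascript
// Hypothetical helper: pack each byte into one 16-bit code unit
// whose high byte is 0, giving a "binary string".
function bytesToBinaryString(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b & 0xff);
  }
  return s;
}

// Hypothetical helper: recover the bytes by reading the low byte
// of each code unit.
function binaryStringToBytes(s) {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    bytes[i] = s.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// The UTF-8 encoding of U+00E9 is the two bytes C3 A9.
const utf8Bytes = [0xc3, 0xa9];
const encoded = btoa(bytesToBinaryString(utf8Bytes)); // "w6k="

// atob hands back a two-unit string (U+00C3, U+00A9), not U+00E9.
const roundTripped = binaryStringToBytes(atob(encoded));
```

The string atob returns has length 2 here, which is the "byte array
masquerading as a JS string" problem: the caller still has to decode
the UTF-8 by hand.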

>> Another option is to provide a base64
>> encoder/decoder that uses UTF8 to encode/decode the binary.
>
> Not sure what the exact proposal here is.

The pipeline that makes sense to me is the following:

Unicode base64 characters
--base64decode-->
byte array
--UTF-8 decode-->
Unicode characters

Once we have real byte arrays in JavaScript, it probably makes sense
to expose a base64 decode function that takes Unicode and produces an
honest byte array.  We might also want to expose a function that takes
a byte array and interprets it as UTF-8 (to produce Unicode
characters).
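
As a minimal sketch of that two-step pipeline, assuming an engine that
has Uint8Array and a TextDecoder-style UTF-8 decoder (neither is
something we can rely on today), it might look like the following. The
names base64ToBytes and base64ToUnicode are hypothetical, not a
proposed API.

```javascript
// Step 1 (hypothetical name): base64 characters -> honest byte array.
// atob yields one code unit per byte, so copy the low bytes out.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Step 2 (hypothetical name): byte array -> Unicode characters,
// interpreting the bytes as UTF-8.
function base64ToUnicode(b64) {
  return new TextDecoder("utf-8").decode(base64ToBytes(b64));
}

// "w6k=" is the base64 of the bytes C3 A9, i.e. U+00E9 in UTF-8.
const decoded = base64ToUnicode("w6k=");
```

Keeping the two steps separate means the byte-array function stays
useful for data that is not text at all.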

>> Because <script> does not decode entities in HTML, the attacker will
>> be limited to what he or she can do with alphanumeric characters
>
> OK.  I had misunderstood what you were proposing for <script> here.  The
> point is that inside <script> this base64 thing will only be useful for
> setting innerHTML, right?

Yes.  The point is that it's safe in most (all?) contexts, although
it's most useful between tags and in attributes.

Adam