Does ECMAScript currently have a built-in function for encoding and decoding Base64?  If we implement these Base64-encoded entities, we might want a built-in Base64 encoder/decoder.<div><br></div><div>- Ryosuke</div>


<div><br><div class="gmail_quote">On Wed, Aug 25, 2010 at 1:50 PM, Adam Barth <span dir="ltr"><<a href="mailto:w3c@adambarth.com">w3c@adambarth.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


== Summary ==<br>

<br>

HTML should support Base64-encoded entities to make it easier for<br>

authors to include untrusted content in their documents without<br>

risking XSS.  For example,<br>

<br>

&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;<br>

<br>

would decode to "HTML5's &lt;canvas&gt; element is awesome." (plus a trailing<br>

newline, encoded by the final Cg==).  Notice that the < and > characters<br>

get emitted by the parser as character tokens, which means an attacker<br>

can't use them for XSS.  These entities can be used safely both in<br>

content between tags and in attribute values.<br>
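The example payload can be checked with ordinary Base64 decoding. The sketch below uses Python's standard `base64` module purely as an illustration; it is not how a browser would implement the proposal, and `decode_payload` is a hypothetical helper name.

```python
import base64
import html

# Payload of the example entity: everything between "&%" and ";".
PAYLOAD = "SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg=="


def decode_payload(payload: str) -> str:
    """Decode a standard (RFC 4648) Base64 payload as UTF-8 text."""
    # validate=True rejects characters outside the Base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(payload, validate=True).decode("utf-8")


text = decode_payload(PAYLOAD)
print(repr(text))  # the final "Cg==" quantum decodes to a newline

# If the decoded string is handled as character data, serializing it back
# into HTML escapes the angle brackets instead of producing a tag.
print(html.escape(text, quote=False))
```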

<br>

== Use Case ==<br>

<br>

Authors often combine trusted and untrusted text into HTML documents.<br>

If they do so naively, an attacker can supply HTML markup, including<br>

script, in the untrusted content, resulting in a cross-site scripting<br>

attack.  Authors want a way to include untrusted content safely in HTML<br>

documents without risking XSS.<br>

<br>

== Workarounds ==<br>

<br>

Currently, authors must carefully escape all untrusted content to<br>

prevent an attacker from injecting HTML.  Unfortunately, authors often<br>

apply the wrong escaping or forget to escape entirely, resulting<br>

in security vulnerabilities.  Escaping content in HTML is tricky<br>

because authors need to use different escaping rules for different<br>

contexts.  For example, PHP's htmlspecialchars isn't sufficient in the<br>

following contexts:<br>

<br>

&lt;img alt=&lt;?php echo htmlspecialchars($name) ?&gt; src="..."&gt;<br>

<br>

&lt;script&gt;<br>

elmt.innerHTML = 'Hi there &lt;?php echo htmlspecialchars($name) ?&gt;.';<br>

&lt;/script&gt;<br>
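Both failures can be reproduced outside a browser. The sketch below uses Python's `html.escape` as a stand-in for PHP's `htmlspecialchars` (an assumption: like `htmlspecialchars`, it leaves spaces, `=`, and backslashes untouched), and the attacker-supplied names are made up. In the unquoted `alt` attribute, a space ends the value, so the rest of the input becomes a new attribute such as `onerror`. In the script block, a JavaScript escape like `\x3c` passes through HTML escaping unchanged and turns into a real `<` when the string literal is evaluated, so `innerHTML` receives markup.

```python
import html

# html.escape stands in for PHP's htmlspecialchars here (assumption): both
# escape &, <, > and double quotes, but neither touches spaces, "=" or "\".

# Context 1: unquoted attribute value. A space ends the value, so the
# attacker's text becomes a second attribute.
name = "x onerror=alert(1)"
attr_fragment = "alt=" + html.escape(name)
print(attr_fragment.split())  # whitespace separates attributes

# Context 2: single-quoted JavaScript string later assigned to innerHTML.
name = r"\x3cimg src=x onerror=alert(1)\x3e"
js_string_body = "Hi there " + html.escape(name) + "."
# Approximate the JavaScript engine's handling of \xHH escapes with
# Python's unicode_escape codec (both map \x3c to "<" and \x3e to ">").
seen_by_inner_html = js_string_body.encode("ascii").decode("unicode_escape")
print("\x3cimg" in seen_by_inner_html)  # True: markup reaches innerHTML
```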

<br>

Some frameworks convert untrusted content to a series of hex entities,<br>

but that greatly increases the length of the content.<br>
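To put rough numbers on that cost: for printable ASCII, a hexadecimal character reference of the form `&#xHH;` spends six characters per input character, while Base64 spends four characters per three input bytes, plus three characters for the proposed `&%` prefix and `;` suffix. A small sketch, with hypothetical helper names and a made-up sample string:

```python
import base64
import math


def hex_entities(text: str) -> str:
    """Escape every code point as a hexadecimal character reference."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


def base64_entity(text: str) -> str:
    """Wrap the UTF-8 Base64 of text in the proposed &%...; syntax."""
    return "&%" + base64.b64encode(text.encode("utf-8")).decode("ascii") + ";"


sample = "O'Reilly & Sons: 1 > 0, see https://example.com/?a=1&b=2"
n = len(sample.encode("utf-8"))

hex_form = hex_entities(sample)
b64_form = base64_entity(sample)

# Each printable ASCII character costs 6 characters as &#xHH;.
print("hex entities:", len(hex_form), "characters")
# Base64 costs 4 characters per 3 bytes (rounded up) plus 3 for "&%" and ";".
print("base64 entity:", len(b64_form), "characters, expected",
      4 * math.ceil(n / 3) + 3)
```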

<br>

== Proposal ==<br>

<br>

We should add a new kind of HTML entity that authors can use to<br>

include untrusted content.  In particular, authors should be able to<br>

supply untrusted content in Base64, whose alphabet (A-Z, a-z, 0-9, +, /,<br>

and =) contains none of the characters that are special in HTML, such as<br>

<, >, &, quotes, or whitespace.  We can avoid clashes with existing or<br>

future entities by using a new character after the & escape character.<br>

For example, we could use the % character:<br>

<br>

&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;<br>
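A parser change along these lines might look like the sketch below. It is only an illustration under assumptions the proposal does not spell out: it works on a run of text (no tag handling), treats the payload as standard Base64 of UTF-8, and keeps a malformed entity as literal text. `expand_base64_entities` and the `("chars", ...)` token shape are hypothetical. The property it illustrates is that decoded text is emitted directly as character tokens and never re-enters the tokenizer.

```python
import base64
import binascii
import re

# Hypothetical pattern for the proposed entity: "&%", a Base64 payload, ";".
B64_ENTITY = re.compile(r"&%([A-Za-z0-9+/]*={0,2});")


def expand_base64_entities(text: str) -> list[tuple[str, str]]:
    """Split a run of text into ("chars", ...) tokens, decoding &%...; entities.

    Decoded payloads are emitted as character tokens and are never fed back
    into the tokenizer, so any markup inside them stays inert text.
    Malformed payloads are kept as literal text (an assumed error policy).
    """
    tokens = []
    pos = 0
    for match in B64_ENTITY.finditer(text):
        if match.start() > pos:
            tokens.append(("chars", text[pos:match.start()]))
        try:
            decoded = base64.b64decode(match.group(1), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            decoded = match.group(0)  # leave the malformed entity untouched
        tokens.append(("chars", decoded))
        pos = match.end()
    if pos < len(text):
        tokens.append(("chars", text[pos:]))
    return tokens
```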

<br>

Authors could then supply untrusted content as follows:<br>

<br>

&lt;img alt=&lt;?php echo htmlescape($name) ?&gt; src="..."&gt;<br>

<br>

where htmlescape is defined as follows:<br>

<br>

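function htmlescape($text) {<br>

  // base64_encode() emits only A-Z, a-z, 0-9, +, / and =, so the result<br>

  // cannot end an attribute value or open a tag.<br>

  return "&%".base64_encode($text).";";<br>

}<br>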
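For readers outside PHP, the helper translates directly. The Python sketch below is a hypothetical port, not an existing library function; it assumes the text should be encoded as UTF-8 before Base64 encoding (PHP's `base64_encode` works on whatever bytes the string holds). The final check confirms that, apart from the `&%` prefix and `;` suffix, the output stays inside the Base64 alphabet, which is why it is safe even in the unquoted `alt` attribute above.

```python
import base64
import string

# Characters that standard Base64 (RFC 4648, section 4) can emit.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def htmlescape(text: str) -> str:
    """Hypothetical Python port of the PHP htmlescape() helper."""
    # Assumption: the document is UTF-8, so encode the text as UTF-8 bytes.
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "&%" + payload + ";"


untrusted = 'x onerror=alert(1) "quoted" \\x3c'
escaped = htmlescape(untrusted)
print(escaped)

# Nothing between the prefix and suffix can end an unquoted attribute
# value or start a tag.
print(set(escaped[2:-1]) <= BASE64_ALPHABET)
```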

<font color="#888888"><br>

Adam<br>

</font></blockquote></div><br></div>