Does ECMAScript currently have a built-in function for encoding and decoding base-64? We might want a built-in base-64 encoder / decoder if we are implementing these base64-encoded entities.<div><br></div><div>- Ryosuke</div>
<div><br><div class="gmail_quote">On Wed, Aug 25, 2010 at 1:50 PM, Adam Barth <span dir="ltr"><<a href="mailto:w3c@adambarth.com">w3c@adambarth.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
== Summary ==<br>
<br>
HTML should support Base64-encoded entities to make it easier for<br>
authors to include untrusted content in their documents without<br>
risking XSS. For example,<br>
<br>
&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;<br>
<br>
would decode to "HTML5's <canvas> element is awesome." Notice that<br>
the < and > characters get emitted by the parser as character tokens.<br>
That means they can't be used by an attacker for XSS. These entities<br>
can be used safely both in intertag content and in attribute<br>
values.<br>
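<br>
As a rough illustration of the decoding step, here is a minimal<br>
JavaScript sketch. The helper name decodeBase64Entity is hypothetical,<br>
and the sketch assumes a UTF-8 payload and Node.js's global Buffer;<br>
the proposal itself does not specify a character encoding.<br>
<br>
```javascript
// Hypothetical helper: decode one "&%<base64>;" entity into plain text.
// Assumes the payload is UTF-8 and that Node.js's global Buffer is available.
function decodeBase64Entity(entity) {
  // Accept only the standard Base64 alphabet with optional "=" padding.
  const match = /^&%([A-Za-z0-9+\/]*={0,2});$/.exec(entity);
  if (match === null) {
    throw new SyntaxError("not a Base64 entity: " + entity);
  }
  return Buffer.from(match[1], "base64").toString("utf8");
}

const text = decodeBase64Entity(
  "&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;"
);
// The result is ordinary text; "<canvas>" is never parsed as a tag.
console.log(JSON.stringify(text));
```
<br>
Decoded this way, the sample payload also carries a trailing newline<br>
after "awesome.". A real parser would emit the decoded characters as<br>
character tokens rather than returning a string.<br>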
<br>
== Use Case ==<br>
<br>
Authors often combine trusted and untrusted text into HTML documents.<br>
If done naively, an attacker can supply HTML markup, including script,<br>
in the untrusted text, resulting in a cross-site scripting attack.<br>
Authors want a way to include untrusted content safely in HTML<br>
documents without risking XSS.<br>
<br>
== Workarounds ==<br>
<br>
Currently, authors must carefully escape all untrusted content to<br>
prevent an attacker from injecting HTML. Unfortunately, authors often<br>
apply the wrong escaping or forget to escape entirely, resulting<br>
in security vulnerabilities. Escaping content in HTML is tricky<br>
because authors need to use different escaping rules for different<br>
contexts. For example, PHP's htmlspecialchars isn't sufficient in the<br>
following contexts:<br>
<br>
<img alt=<?php echo htmlspecialchars($name) ?> src="..."><br>
<br>
<script><br>
elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';<br>
</script><br>
<br>
Some frameworks convert untrusted content to a series of hex entities,<br>
but that greatly increases the length of the content.<br>
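<br>
To make the size difference concrete, here is a small JavaScript<br>
sketch. Both function names are hypothetical; toHexEntities is only<br>
one plausible form of the hex-entity workaround, and the Base64 length<br>
assumes the "&%...;" syntax with standard padding.<br>
<br>
```javascript
// Hypothetical sketch of the hex-entity workaround: one numeric character
// reference per code point.
function toHexEntities(text) {
  return Array.from(
    text,
    (ch) => "&#x" + ch.codePointAt(0).toString(16) + ";"
  ).join("");
}

// Length of a "&%...;" entity for a payload of byteCount bytes: Base64 emits
// 4 characters per 3 input bytes (rounded up), plus 3 for "&%" and ";".
function base64EntityLength(byteCount) {
  return 3 + 4 * Math.ceil(byteCount / 3);
}

const sample = "<canvas>"; // ASCII only, so one byte per character
const hexLength = toHexEntities(sample).length;
const base64Length = base64EntityLength(sample.length);
console.log(hexLength, base64Length);
```
<br>
For ASCII text, hex entities cost about six characters per input<br>
character, while Base64 costs about four characters per three bytes.<br>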
<br>
== Proposal ==<br>
<br>
We should add a new kind of HTML entity that authors can use to<br>
include untrusted content. In particular, authors should be able to<br>
supply untrusted content in base64, which nicely avoids any scary<br>
characters. We can avoid clashes with existing or future entities by<br>
using a new character after the & escape character. In particular, we<br>
could use the % character:<br>
<br>
&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;<br>
<br>
Authors could then supply untrusted content as follows:<br>
<br>
<img alt=<?php echo htmlescape($name) ?> src="..."><br>
<br>
where htmlescape is defined as follows:<br>
<br>
function htmlescape($text) {<br>
return "&%".base64_encode($text).";";<br>
}<br>
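<br>
For comparison, a rough JavaScript counterpart of htmlescape might look<br>
like the sketch below. It assumes Node.js's global Buffer and UTF-8 as<br>
the byte encoding; the variable names are made up for illustration.<br>
<br>
```javascript
// Hypothetical JavaScript counterpart of the PHP htmlescape() above.
// Assumes Node.js (global Buffer) and UTF-8 as the byte encoding.
function htmlescape(text) {
  return "&%" + Buffer.from(text, "utf8").toString("base64") + ";";
}

// Untrusted input that would break out of an unquoted attribute if it were
// emitted verbatim.
const untrustedName = '" onerror="alert(1)';
const markup = "<img alt=" + htmlescape(untrustedName) + ' src="...">';
console.log(markup);
```
<br>
The emitted attribute value contains only "&", "%", ";" and Base64<br>
characters, so the quotes and spaces in the input never reach the markup.<br>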
<font color="#888888"><br>
Adam<br>
</font></blockquote></div><br></div>