[whatwg] base64 entities
Adam Barth
w3c at adambarth.com
Wed Aug 25 13:50:14 PDT 2010
== Summary ==
HTML should support Base64-encoded entities to make it easier for
authors to include untrusted content in their documents without
risking XSS. For example,
&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;
would decode to "HTML5's <canvas> element is awesome." followed by a
newline. Notice that the < and > characters get emitted by the parser
as character tokens. That means they can't be used by an attacker for
XSS. These entities can be used safely both in intertag content and in
attribute values.
== Use Case ==
Authors often combine trusted and untrusted text into HTML documents.
If done naively, an attacker can supply HTML markup, including script,
in the untrusted text, resulting in a cross-site scripting attack.
Authors want a way to include untrusted content safely in HTML
documents without risking XSS.
== Workarounds ==
Currently, authors must carefully escape all untrusted content to
prevent an attacker from injecting HTML. Unfortunately, authors often
apply the incorrect escaping or forget to escape entirely, resulting
in security vulnerabilities. Escaping content in HTML is tricky
because authors need to use different escaping rules for different
contexts. For example, PHP's htmlspecialchars isn't sufficient in the
following contexts:
<img alt=<?php echo htmlspecialchars($name) ?> src="...">
<script>
elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';
</script>
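To see why the first context fails, here is a minimal sketch in Python.
It uses the standard library's html.escape as a stand-in for PHP's
htmlspecialchars (an assumption: the two escape a similar, but not
identical, set of characters), and the payload is hypothetical. Because
the payload contains none of the escaped characters, it passes through
unchanged, and in an unquoted attribute the space ends the alt value and
starts a new attribute.

```python
import html

# Hypothetical attacker-supplied value for $name.
payload = "x onerror=alert(1)"

# html.escape rewrites only &, <, >, " and '; the payload has none of them.
escaped = html.escape(payload, quote=True)

# Mirrors the unquoted template: <img alt=... src="...">
markup = "<img alt=" + escaped + ' src="...">'
print(markup)
```

The second context fails for a different reason: the value ends up
inside a JavaScript string that is later assigned to innerHTML, so HTML
escaping alone does not account for both layers.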
Some frameworks convert untrusted content to a series of hex entities,
but that greatly increases the length of the content.
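As a rough illustration of the size difference, the sketch below (in
Python, with hypothetical helper names) encodes the sample sentence from
the summary both ways: once as one hex character reference per
character, and once as a single base64 entity in the &%...; form this
message proposes. The &%...; form is not part of any HTML specification.

```python
import base64

def hex_entities(text: str) -> str:
    # One numeric character reference per code point, e.g. "<" -> "&#x3c;".
    return "".join(f"&#x{ord(ch):x};" for ch in text)

def base64_entity(text: str) -> str:
    # The &%...; form proposed in this message (hypothetical syntax).
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "&%" + encoded + ";"

sample = "HTML5's <canvas> element is awesome."
print(len(sample), len(hex_entities(sample)), len(base64_entity(sample)))
# prints: 36 216 51
```

For ASCII text, each hex reference costs six characters per input
character, while base64 costs four characters per three input bytes
plus three characters of fixed overhead.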
== Proposal ==
We should add a new kind of HTML entity that authors can use to
include untrusted content. In particular, authors should be able to
supply untrusted content in base64, which nicely avoids any scary
characters. We can avoid clashes with existing or future entities by
using a new character after the & escape character. In particular, we
could use the % character:
&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;
Authors could then supply untrusted content as follows:
<img alt=<?php echo htmlescape($name) ?> src="...">
where htmlescape is defined as follows:
function htmlescape($text) {
    // Wrap the base64 form of the untrusted text in the proposed &%...; entity.
    return "&%".base64_encode($text).";";
}
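The round trip can be sketched in Python as follows. The htmlescape
function mirrors the PHP helper above; decode_entity and the ENTITY
pattern are hypothetical, since the message does not specify how a
parser would recognize or decode the entity. The point is only that the
encoded form contains nothing but base64 characters between "&%" and
";", and that decoding recovers the original text exactly.

```python
import base64
import re

# Assumed grammar for the proposed entity: "&%", base64 characters, ";".
ENTITY = re.compile(r"&%([A-Za-z0-9+/]*={0,2});")

def htmlescape(text: str) -> str:
    # Python counterpart of the PHP helper above.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "&%" + encoded + ";"

def decode_entity(entity: str) -> str:
    # Stand-in for what a parser would do before emitting character tokens.
    match = ENTITY.fullmatch(entity)
    if match is None:
        raise ValueError("not a base64 entity")
    return base64.b64decode(match.group(1)).decode("utf-8")

# Hypothetical hostile input for $name.
name = '"><script>alert(1)</script>'
entity = htmlescape(name)
print(entity)
print(decode_entity(entity) == name)
```

Because the entity contains no quotes, spaces, or angle brackets, it
cannot terminate an attribute value or open a tag, even when the
attribute is unquoted as in the img example.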
Adam