[whatwg] base64 entities

Aryeh Gregor Simetrical+w3c at gmail.com
Fri Aug 27 11:44:50 PDT 2010


2010/8/26 Kornel Lesiński <kornel at geekhood.net>:
> Inside strings you replace "</" with "<\/" ("\/" is a valid escape sequence
> for "/"); outside strings you'd need to add a space between "<" and "/" (a
> corner case: x </regexliteral/).

In other words, there's no general way to do it without actually
parsing the JavaScript.

> You might also use <script src="data:">.

Hmm, that's an idea.  It will only work if you want to do it to the
whole script blob, though (like if it's trusted but might contain
"</script>" by mistake).  If you want to encode just particular
untrusted string values, you'd have to use a specialized function of
*some* sort.  I don't see how base-64 encoding is easier than
json_encode(), though.  It's much uglier.
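
For illustration, here is a minimal sketch of the data: idea applied to a
whole script blob.  The helper name scriptTagFromSource is made up for this
example, and Buffer assumes the page is being generated by Node.js; any
server-side base64 encoder would do the same job.

```javascript
// Hypothetical helper: wrap an entire script blob in a base64 data: URL so
// the HTML parser never sees the script text itself.
function scriptTagFromSource(source) {
  // Buffer is Node.js-specific (an assumption of this sketch).
  const encoded = Buffer.from(source, "utf8").toString("base64");
  // The base64 alphabet (A-Z, a-z, 0-9, "+", "/", "=") contains neither "<"
  // nor a double quote, so the value cannot end the attribute or the element.
  return '<script src="data:text/javascript;charset=utf-8;base64,' +
         encoded + '"></script>';
}

// A trusted blob that happens to contain the closing tag by mistake.
const scriptBlob = 'alert("</script>");';
const scriptTag = scriptTagFromSource(scriptBlob);
console.log(scriptTag);
```

The only "</script>" left in the output is the real closing tag, but the
result is unreadable in view-source, which is the ugliness complained about
above.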

On Thu, Aug 26, 2010 at 6:28 PM, And Clover <and-py at doxdesk.com> wrote:
> The simple approach is to use JavaScript string literal escapes:
> `"\x3C/script>"`.
>
> A JSON encoder may offer the option to avoid HTML-special characters in
> string literals, encoded as escapes like `\u003C`. This allows literals to
> be included in a JavaScript block that may or may not be in a CDATA element,
> so may or may not need HTML-encoding.

Makes sense, but it only works for string literals, not blobs of
JavaScript.  The latter would be useful to have too.  (But data: seems
to be a good enough solution.)
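
A rough sketch of what such an encoder option might look like, written in
JavaScript rather than PHP.  The function name jsonForScript is hypothetical,
and the set of characters escaped here ("<", ">", "&") is an assumption about
what "HTML-special" should cover, not a vetted list.

```javascript
// Hypothetical helper: JSON-encode a value, then rewrite HTML-special
// characters as \uXXXX escapes.  In JSON output these characters can only
// occur inside string literals, so the rewrite does not change the value.
function jsonForScript(value) {
  return JSON.stringify(value).replace(/[<>&]/g, function (ch) {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + hex.padStart(4, "0");
  });
}

const literal = jsonForScript("</script> & friends");
console.log(literal);
```

The result contains no "<", ">" or "&", so it reads the same whether or not
the surrounding block is HTML-decoded, and it still parses back to the
original string.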

> This is a common but wrong idiom that should be avoided; it won't validate
> because in HTML4 the `</` sequence itself (ETAGO) ends a script block.

Conveniently, it does validate in HTML5:

http://html5.validator.nu/?doc=data:text/html,%3C!DOCTYPE+html%3E%3Ctitle%3E%3C/title%3E%3Cscript%3Ealert(%22%3C/scr%22+%2B+%22ipt%3E%22);%3C/script%3E

There's no reason for us to worry about HTML 4.

> PHP offers no JS-string-literal-escape function. `addslashes` is very close,
> but won't handle some cases with non-ASCII characters correctly. Better to
> use `json_encode` to transfer the string, then write as text:
>
>    elmt.textContent = <?php echo json_encode('Hi there, ' . $name,
> JSON_HEX_TAG); ?>
>
> (assuming innerText or Text Node backup for IE/older browsers.)

Interesting, that's useful.  Too bad json_encode() only works in PHP 5.2 or
higher (and the JSON_HEX_TAG option needs 5.3).

On Thu, Aug 26, 2010 at 6:45 PM, Adam Barth <w3c at adambarth.com> wrote:
> Escaping just those characters is insufficient.  The appeal of this
> approach is that authors don't need the right blacklist of dangerous
> characters.  By the way, there are already folks doing something
> similar manually now.  They send the untrusted bytes as base64 and
> decode them using JavaScript.

Is this really harder to do correctly than a function like
json_encode()?  I'm having trouble seeing this as really worth it --
it looks like there are already solutions for all use-cases that are
equally easy and considerably less ugly.  What's a use-case that this
makes a lot easier?  Granted that it allows you to use one type of
encoding for everything, but do you expect authors will actually do
that, given that it makes the output so much more indecipherable?
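
To make the comparison concrete, here is a sketch of the manual base64
round trip described in the quoted text.  Both function names are
hypothetical.  Buffer is used on both sides only so that the example runs
under Node.js; in a browser the decoding step would more likely be built on
atob().

```javascript
// Server side (hypothetical): turn untrusted text into base64, whose
// alphabet has no quotes, backslashes or "<" to worry about.
function encodeUntrusted(text) {
  return Buffer.from(text, "utf8").toString("base64");
}

// Page side (hypothetical): recover the original text from the base64 form.
function decodeUntrusted(b64) {
  return Buffer.from(b64, "base64").toString("utf8");
}

const untrusted = 'x"; alert(1); </script>';
const b64 = encodeUntrusted(untrusted);

// The line the server would emit inside the script block.
const emitted = 'var name = decodeUntrusted("' + b64 + '");';
console.log(emitted);
console.log(decodeUntrusted(b64) === untrusted);
```

The emitted line is safe without any blacklist, but nobody reading the page
source can tell what the string says.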


