[whatwg] Specs for window.atob() and window.btoa()
Simetrical+w3c at gmail.com
Fri Jan 7 09:27:52 PST 2011
On Fri, Jan 7, 2011 at 12:01 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> For what it's worth, Firefox's behavior for atob (based on reading the
> source code, sorta) is the following (ignoring various exceptions on
> allocation failures and the like):
> 1) If the input string contains any 16-bit units whose value is greater
> than 0xff, throw INVALID_CHARACTER_ERR.
This seems redundant with step 4 below.
> 2) If the input string's length is greater than 0xFFFFFFFF / 3, throw a
> generic failure code (because otherwise a 32-bit computation of the output
> string length will overflow; this could probably be changed to use 64-bit
This doesn't sound like it should be in the spec. It can fall under
the hardware limitations clause if it actually comes up. I don't like
the hardware limitations clause, but this case seems so unlikely to
come up on the web that it's not worth caring about. (If I ran into
this case somehow as a web developer, passing around >1 GB strings,
I'd definitely feel justified in considering it a bug in Firefox.)
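
To make the overflow concrete, here is a rough sketch. It assumes the
output length is computed as (inputLength * 3) / 4 in unsigned 32-bit
arithmetic; that formula is my guess at the kind of computation meant,
not something taken from the Firefox source, and outputLength32 is a
made-up name.

```javascript
// Largest input length whose tripled value still fits in 32 unsigned bits.
const LIMIT = Math.floor(0xFFFFFFFF / 3);

// Assumed formula: (inputLength * 3) / 4, with the product held in a
// 32-bit unsigned integer as it would be in C.
function outputLength32(inputLength) {
  // ">>> 0" wraps the product modulo 2^32, imitating uint32_t overflow.
  const product = (inputLength * 3) >>> 0;
  return Math.floor(product / 4);
}

console.log(LIMIT, outputLength32(LIMIT), outputLength32(LIMIT + 1));
```

One unit past the limit the product wraps around and the computed
length collapses to a tiny value, which is presumably what the check
guards against.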
> 3) If the length of the source string is 0 mod 4 and the string ends in
> either "=" or "==" then chop off the trailing equals signs from the string.
> If after this step the length is 1 mod 4, throw INVALID_CHARACTER_ERR.
> 4) If the string contains any characters other than those in [A-Za-z0-9+/]
> then throw INVALID_CHARACTER_ERR.
> Step 2 is certainly missing from your spec (and as I said, may not be
> desirable); I haven't verified whether your regexp ends up enforcing exactly
> 3+4 above.
It looks the same to me, although I haven't looked *that* carefully.
Behavior matches in all the tests I could think up.
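
For reference, steps 1, 3 and 4 as described above can be sketched
roughly as follows. This is only my reading of the description, not
Firefox's code; the function name is hypothetical, it returns a
boolean instead of throwing INVALID_CHARACTER_ERR, and step 2 (the
length limit) is left out on purpose.

```javascript
// Hypothetical helper: returns true if the input would pass steps 1, 3
// and 4 as described in this thread, false where those steps would throw
// INVALID_CHARACTER_ERR.
function passesAtobChecks(input) {
  // Step 1: reject any 16-bit unit greater than 0xff.
  for (let i = 0; i < input.length; i++) {
    if (input.charCodeAt(i) > 0xff) return false;
  }

  // Step 3: only when the length is 0 mod 4, chop a trailing "==" or "=".
  let s = input;
  if (s.length % 4 === 0) {
    if (s.endsWith("==")) s = s.slice(0, -2);
    else if (s.endsWith("=")) s = s.slice(0, -1);
  }
  // A remaining length of 1 mod 4 cannot encode a whole number of bytes.
  if (s.length % 4 === 1) return false;

  // Step 4: everything left must be in [A-Za-z0-9+/].
  return /^[A-Za-z0-9+\/]*$/.test(s);
}

console.log(passesAtobChecks("YQ=="), passesAtobChecks("YQ="));
```

Note that "YQ=" fails here: its length is 3, so the "=" is not chopped
in step 3 and is then rejected as an invalid character in step 4.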
> Based on code inspection, that sounds right in terms of what the Firefox
> behavior is.
> Note that it's not that uncommon to use atob on things that came from other
> base64-producing tools, not just from btoa. Not sure whether that matters
I don't think it does. I don't think any base64 encoding
implementation is likely to pad input strings' lengths to a multiple
of six bits using anything other than zero bits. So it's mostly just
a matter of specification and testing simplicity.
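
To illustrate what "anything other than zero bits" means: the last
character before the padding carries 2 or 4 bits that no output byte
uses. A quick sketch (the helper name is made up, and it assumes the
input has already passed validation):

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: returns the value of the trailing bits in the last
// base64 character that a decoder discards. Assumes valid base64 input.
function unusedTrailingBits(b64) {
  const data = b64.replace(/=+$/, "");      // drop the "=" padding
  const leftover = (data.length * 6) % 8;   // 0, 2 or 4 for valid lengths
  if (leftover === 0) return 0;
  const last = ALPHABET.indexOf(data[data.length - 1]);
  return last & ((1 << leftover) - 1);      // keep only the unused low bits
}

console.log(unusedTrailingBits("YQ=="), unusedTrailingBits("YR=="));
```

"YQ==" is what an encoder produces for "a", with the four spare bits
zero. "YR==" has the same first eight bits, so a decoder that ignores
the spare bits would also produce "a"; the only question is whether
the spec should accept it.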