[whatwg] Specs for window.atob() and window.btoa()

Fri Feb 4 10:21:14 PST 2011

On 04/02/2011, at 18:58, Jonas Sicking wrote:
> On Fri, Feb 4, 2011 at 8:37 AM, Jorge <jorge at jorgechamorro.com> wrote:
>> Hi,
>> 
>> Wrt the note "some base64 encoders add newlines or other whitespace to their output. atob() throws an exception if its input contains characters other than +/=0-9A-Za-z, so other characters need to be removed before atob() is used for decoding" in http://aryeh.name/spec/base64.html , I think that in the end it's better to ignore any other chars instead of throwing. Skipping over such chars while decoding is cheaper and requires less memory than scanning the input twice, first to clean it and then to decode it, which is something you'd not want to end up doing, just in case, every time.
>> 
>> Say, for example, that you've got a 4MB base64 string with (perhaps?) some whitespace. In order to clean it up you're going to have to keep it in memory alongside the cleaned-up version, at least while constructing the clean version. But if atob() skipped over anything other than +/=0-9A-Za-z you could just pass it directly, and the whole process would be faster too, given there would be no need to clean it up first. FWIW, that's how nodejs is doing it right now.
> 
> Not sure I follow you. Why not simply measure the length of the string
> (most implementations keep that around for fast access), and
> optimistically allocate enough memory to hold the expected result?
> Then start converting. As you're converting, if you find an
> unrecognized character, just free the allocated memory and throw an
> exception.
> 
> No need to scan twice.

I was thinking about this:

var result = atob(base64_inputStr.replace(/\s/g, ''));

The first scan happens in .replace(), the second in atob(). The intermediate value stays in memory (at least for a little while) along with base64_inputStr.
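
For comparison, here is a rough sketch of the skipping behaviour I have in mind: a single pass that ignores anything outside +/=0-9A-Za-z instead of throwing. The name lenientAtob is hypothetical and not part of any spec. Treating the first "=" as the end of the data is a simplifying assumption. Being plain JS, it only illustrates the single-scan logic, not the memory behaviour of a native implementation.

```javascript
// Hypothetical helper, not part of any spec or library: decodes base64 in
// one pass and skips every character outside the base64 alphabet.
var ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function lenientAtob(input) {
  var out = [];      // decoded bytes, as one-character strings
  var bits = 0;      // bit accumulator
  var bitCount = 0;  // number of valid bits currently held in `bits`

  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i);
    if (ch === '=') break;          // assumption: first '=' ends the data

    var value = ALPHABET.indexOf(ch);
    if (value === -1) continue;     // skip whitespace and any other stray char

    bits = (bits << 6) | value;     // append the 6 bits of this character
    bitCount += 6;

    if (bitCount >= 8) {            // a full byte is available, emit it
      bitCount -= 8;
      out.push(String.fromCharCode((bits >> bitCount) & 0xFF));
      bits &= (1 << bitCount) - 1;  // keep only the leftover low bits
    }
  }
  return out.join('');              // binary string, like atob() returns
}
```

With something like this, lenientAtob(base64_inputStr) would stand in for atob(base64_inputStr.replace(/\s/g, '')), and no cleaned-up intermediate copy of the input is ever built.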
-- 
Jorge.