[whatwg] document.write("\r"): the spec doesn't say how to handle it.

Mon Dec 19 03:28:13 PST 2011

On Wed, Dec 14, 2011 at 2:00 AM, Ian Hickson <ian at hixie.ch> wrote:
> I can remove the text "one at a time", if you like. Would that be
> satisfactory? Or I guess I could change the spec to say that the parser
> should process the characters, rather than the tokenizer, since really
> it's the whole shebang that needs to be involved (stream preprocessor and
> everything). Any opinions on what the right text is here?

I'd like the CRLF preprocessing to be defined as an eager stateful
operation so that there's one bit of state: "last was CR". Then, input
is handled as follows:
If the input character is CR, set "last was CR" to true and emit LF.
If the input character is LF and "last was CR" is true, don't emit
anything and set "last was CR" to false.
If the input character is LF and "last was CR" is false, emit LF.
Else set "last was CR" to false and emit the input character.

Where "emit" feeds into the tokenizer. By "eager", I mean that the
operation described above doesn't buffer. I.e. the first case emits an
LF upon seeing a CR without waiting for an LF also to appear in the
input.
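
The rules above can be sketched roughly as follows. This is an
illustration only, not proposed spec text: the names NewlineNormalizer,
feed and emit are made up for the example, and the emit callback merely
stands in for "feeds into the tokenizer".

```python
class NewlineNormalizer:
    """Eager CR/LF normalizer with exactly one bit of state (hypothetical name)."""

    def __init__(self, emit):
        self.emit = emit          # assumption: callback standing in for the tokenizer
        self.last_was_cr = False  # the single bit of state: "last was CR"

    def feed(self, text):
        """Process one chunk of input, e.g. the argument of one document.write() call."""
        for ch in text:
            if ch == "\r":
                # Eager: emit LF right away, without waiting to see whether an LF follows.
                self.last_was_cr = True
                self.emit("\n")
            elif ch == "\n":
                if self.last_was_cr:
                    # Second half of a CRLF pair; its LF has already been emitted.
                    self.last_was_cr = False
                else:
                    self.emit("\n")
            else:
                self.last_was_cr = False
                self.emit(ch)


# A CRLF pair split across two writes still collapses to a single LF,
# and the LF is emitted during the first write.
tokens = []
normalizer = NewlineNormalizer(tokens.append)
normalizer.feed("a\r")
normalizer.feed("\nb")
```

Because the state survives between feed() calls, a CR at the end of one
document.write() and an LF at the start of the next are treated the same
way as a CRLF inside a single write.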

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/