[whatwg] CR "entities" and LFCR

Henri Sivonen hsivonen at iki.fi
Fri Jun 8 05:17:31 PDT 2007

On Jun 7, 2007, at 15:00, Anne van Kesteren wrote:

> These should be converted to LF too. One thing that might be  
> interesting to look into is the handling of LFCR in browsers (as  
> opposed to CRLF). I haven't done that yet... Some browsers (just  
> tested Opera) also normalize two newline entities following each  
> other (CRLF pair).

This requires more code. I haven't analyzed the perf impact, but  
intuitively this requires either naïve and inefficient buffer  
retraversal in the tree builder or additional complexity to the  
tokenizer's buffer management (assuming the tokenizer is doing  
efficient buffering to begin with).
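
To make the buffer-management point concrete, here is a minimal sketch of newline normalization over chunked input. It is an illustration only, not code from any actual tokenizer. The class name NewlineNormalizer and its feed() method are hypothetical. It handles only literal CR and CRLF, and shows the state that has to survive a buffer boundary.

```python
class NewlineNormalizer:
    """Hypothetical streaming normalizer: CR and CRLF both become LF."""

    def __init__(self):
        # True when the last character seen was CR. If the next character
        # (possibly in the next chunk) is LF, it is the second half of a
        # CRLF pair and must be dropped.
        self.pending_cr = False

    def feed(self, chunk: str) -> str:
        out = []
        for ch in chunk:
            if ch == "\r":
                out.append("\n")
                self.pending_cr = True
            elif ch == "\n":
                if not self.pending_cr:
                    out.append("\n")
                self.pending_cr = False
            else:
                out.append(ch)
                self.pending_cr = False
        return "".join(out)


normalizer = NewlineNormalizer()
# The CRLF pair is split across two chunks and still yields a single LF.
split_pair = normalizer.feed("a\r") + normalizer.feed("\nb")
# LFCR is not a pair under these rules, so it yields two LFs.
lfcr = NewlineNormalizer().feed("a\n\rb")
```

Extending the same treatment to escaped CRs would mean carrying this flag across character reference boundaries as well, which is the extra tokenizer complexity in question.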

You can't protect the DOM from getting CRs if someone insists on  
putting them there using JS or XML. Is it worthwhile to prevent  
escaped CRs from ending up in the DOM as CRs in HTML? Is special  
handling required for compat?
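
For reference, the XML behavior can be checked with a small sketch using Python's standard xml.etree.ElementTree (backed by expat). The sample document is made up for illustration. XML 1.0 normalizes literal CR and CRLF to LF before parsing, but a CR written as a character reference is not subject to that step.

```python
import xml.etree.ElementTree as ET

# Literal CRLF, literal CR, then an escaped CR, then an escaped CR LF pair.
doc = b"<p>a\r\nb\rc&#13;d&#13;&#10;e</p>"
text = ET.fromstring(doc).text

# Literal line breaks come out as LF; the escaped CRs survive as CR.
print(repr(text))  # 'a\nb\nc\rd\r\ne'
```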

I'd try doing exactly what XML does here unless compat requires otherwise.

Henri Sivonen
hsivonen at iki.fi

