[whatwg] Surrogate pairs and character references

Øistein E. Andersen liszt at coq.no
Wed Sep 16 17:38:05 PDT 2009

It is much clearer now.  Thanks.  Just a few minor issues:

> "Bytes or sequences of bytes in the original byte stream that could  
> not be converted to Unicode characters must be converted to U+FFFD  

With the new definition of Unicode characters as Unicode scalar  
values, this excludes surrogate code points, which are also handled  
separately (and cause a parse error) in the step quoted below.  You  
may want to say "Unicode code points" rather than "Unicode characters".
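
To make the distinction concrete, here is a minimal sketch in Python.  The helper names are hypothetical and not taken from the draft; the ranges are those of the Unicode codespace and the surrogate block.

```python
# Hypothetical helpers illustrating "code point" versus "scalar value".
# The names are assumptions for illustration, not terms defined by the draft.

def is_code_point(n: int) -> bool:
    """True for any integer in the Unicode codespace, U+0000 to U+10FFFF."""
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    """True for surrogate code points, U+D800 to U+DFFF."""
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    """True for code points that are not surrogates."""
    return is_code_point(n) and not is_surrogate(n)
```

Under these definitions U+D800 is a code point but not a scalar value, which is why a sentence phrased in terms of "Unicode characters" (scalar values) no longer covers surrogates.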

"U+FFFD REPLACEMENT CHARACTERs" is sufficient, used elsewhere and  
probably reads better than "U+FFFD REPLACEMENT CHARACTER code points".

> All U+0000 NULL characters and code points in the range U+D800 to
> U+DFFF in the input must be replaced by U+FFFD REPLACEMENT
> CHARACTERs. Any occurrences of such characters and code points are
> parse errors.

The phrase "characters and code points" (in the second sentence) is  
awkward given that all characters are in fact code points.
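
For illustration, the quoted step could be sketched as follows in Python.  This is only a sketch under stated assumptions: the function name and the error count are hypothetical, and the input is assumed to be a sequence of code points that may still contain lone surrogates.

```python
from typing import List, Tuple

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def replace_invalid(text: str) -> Tuple[str, int]:
    """Replace U+0000 and surrogate code points with U+FFFD.

    Returns the cleaned text and the number of replacements made,
    each of which corresponds to one parse error in the quoted step.
    The function name and return shape are assumptions for illustration.
    """
    out: List[str] = []
    errors = 0
    for ch in text:
        cp = ord(ch)
        if cp == 0x0000 or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)
            errors += 1
        else:
            out.append(ch)
    return "".join(out), errors


# Usage: one NULL and one lone surrogate give two replacements.
cleaned, errors = replace_invalid("a\x00b\ud800c")
```

A single test on the code point covers both cases, which is the point of the remark above: U+0000 is itself a code point.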

Øistein E. Andersen
