[whatwg] Surrogate pairs and character references
ian at hixie.ch
Thu Sep 24 02:06:13 PDT 2009
On Thu, 17 Sep 2009, Øistein E. Andersen wrote:
> It is much clearer now. Thanks. Just a few minor issues:
> > "Bytes or sequences of bytes in the original byte stream that could not be
> > converted to Unicode characters must be converted to U+FFFD REPLACEMENT
> > CHARACTER code points."
> With the new definition of Unicode characters as Unicode scalar values, this
> excludes surrogate code points, which are also handled separately (and cause a
> parse error) in the step quoted below. You may want to say "Unicode code
> points" rather than "Unicode characters".
> "U+FFFD REPLACEMENT CHARACTERs" is sufficient, used elsewhere and probably
> reads better than "U+FFFD REPLACEMENT CHARACTER code points".
> > All U+0000 NULL characters and code points in the range U+D800 to U+DFFF in
> > the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences
> > of such characters and code points are parse errors.
> The phrase "characters and code points" (in the second sentence) is awkward
> given that all characters are in fact code points.
Yeah, but if I change it, it sounds even more awkward, because then it
doesn't match the previous sentence. I'd rather have it be technically
redundant than confuse people into thinking that I meant something more
subtle than what the spec actually says.
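For illustration only (this is not spec text), the two requirements quoted
above can be sketched in Python. The function names decode_stream and
preprocess are made up for this example, and the sketch assumes that Python's
built-in errors="replace" decoding is an acceptable stand-in for the spec's
"could not be converted" rule. How many U+FFFD characters a given run of bad
bytes produces depends on the decoder, so treat that part as approximate.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_stream(data: bytes, encoding: str = "utf-8") -> str:
    """Decode the byte stream, turning undecodable bytes into U+FFFD.

    Hypothetical helper: relies on Python's errors="replace" handler.
    """
    return data.decode(encoding, errors="replace")


def preprocess(text: str) -> tuple[str, int]:
    """Replace U+0000 and surrogate code points (U+D800..U+DFFF) with U+FFFD.

    Returns the cleaned text and the number of parse errors, counting one
    parse error per replaced code point.
    """
    out = []
    parse_errors = 0
    for ch in text:
        cp = ord(ch)
        if cp == 0x0000 or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)
            parse_errors += 1
        else:
            out.append(ch)
    return "".join(out), parse_errors


if __name__ == "__main__":
    cleaned, errors = preprocess("a\x00b\ud800c")
    print(repr(cleaned), errors)
```

Note that a lone surrogate and a NULL both go through the same branch, which
is why the spec sentence can talk about them together, and that a character
outside the BMP such as U+1047E is a single scalar value here and passes
through untouched.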
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'