[whatwg] 9.2.2: replacement characters. How many?
Ian Hickson
ian at hixie.ch
Thu Jun 14 17:25:05 PDT 2007
On Fri, 3 Nov 2006, Elliotte Harold wrote:
>
> Section 9.2.2 of the current Web Apps 1.0 draft states:
>
> Bytes or sequences of bytes in the original byte stream that could not
> be converted to Unicode characters must be converted to U+FFFD
> REPLACEMENT CHARACTER code points.
>
> I'm concerned about the "or". For example, suppose there are six upper
> halves of a Unicode surrogate pair in a row and no lower halves. Does
> that turn into six replacement characters or one? Both interpretations
> seem possible.
>
> I suppose I prefer six rather than one, but I don't care a great deal as
> long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to
the encoding specifications to define it. Any suggestions?
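
To make the two readings concrete, here is a minimal Python sketch. It
assumes the input has already been split into UTF-16 code units. The
function name decode_utf16_units and its collapse flag are hypothetical,
made up for illustration; neither comes from the draft or from any
encoding specification.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units, collapse=False):
    """Decode a list of UTF-16 code units into a string.

    Hypothetical helper for illustration only.
    collapse=False: every unpaired surrogate becomes its own U+FFFD.
    collapse=True:  a run of adjacent unpaired surrogates becomes one U+FFFD.
    """
    out = []
    i = 0
    in_error_run = False  # True while we are inside a run of bad units
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and next_is_low:
            # Well-formed surrogate pair: combine into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
            in_error_run = False
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate (high or low): cannot be converted.
            if not (collapse and in_error_run):
                out.append(REPLACEMENT)
            in_error_run = True
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(u))
            i += 1
            in_error_run = False
    return "".join(out)


six_highs = [0xD800] * 6  # six high surrogates, no low surrogates
print(len(decode_utf16_units(six_highs)))                 # 6
print(len(decode_utf16_units(six_highs, collapse=True)))  # 1
```

Both outputs are consistent with the sentence quoted from 9.2.2, which is
the ambiguity being reported: the per-unit reading yields six U+FFFD and
the per-sequence reading yields one.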
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'