[whatwg] 9.2.2: replacement characters. How many?

Ian Hickson ian at hixie.ch
Thu Jun 14 17:25:05 PDT 2007


On Fri, 3 Nov 2006, Elliotte Harold wrote:
>
> Section 9.2.2 of the current Web Apps 1.0 draft states:
> 
> Bytes or sequences of bytes in the original byte stream that could not 
> be converted to Unicode characters must be converted to U+FFFD 
> REPLACEMENT CHARACTER code points.
> 
> I'm concerned about the "or". For example, suppose there are six upper 
> halves of a Unicode surrogate pair in a row and no lower halves. Does 
> that turn into six replacement characters or one? Both interpretations 
> seem possible.
> 
> I suppose I prefer six rather than one, but I don't care a great deal as 
> long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to 
each encoding's specification to define how many U+FFFD characters a 
malformed byte sequence produces. Any suggestions?
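
To make the two readings concrete, here is a minimal sketch in Python. It 
assumes the input is already split into UTF-16 code units, and the function 
name and the collapse_runs flag are hypothetical, made up for illustration; 
neither comes from the draft or from any encoding specification.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_high(unit):
    """True for a high (leading) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True for a low (trailing) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def decode_utf16_units(units, collapse_runs=False):
    """Decode a list of 16-bit code units into a str.

    collapse_runs=False: every unpaired surrogate becomes its own U+FFFD.
    collapse_runs=True:  a run of adjacent unpaired surrogates becomes
                         a single U+FFFD.
    """
    out = []
    i = 0
    last_was_error = False
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed surrogate pair: combine into one code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
            last_was_error = False
        elif is_high(unit) or is_low(unit):
            # Unpaired surrogate: emit U+FFFD unless we are collapsing
            # and the previous unit was also an error.
            if not (collapse_runs and last_was_error):
                out.append(REPLACEMENT)
            last_was_error = True
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(unit))
            i += 1
            last_was_error = False
    return "".join(out)


# Six high surrogates in a row with no low surrogates, between "A" and "B".
units = [0x41] + [0xD800] * 6 + [0x42]
per_unit = decode_utf16_units(units)
collapsed = decode_utf16_units(units, collapse_runs=True)
```

With this input, per_unit contains six replacement characters and 
collapsed contains one. Both results are consistent with the sentence 
quoted from 9.2.2, which is the ambiguity raised above.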

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


