[whatwg] Parsing Numeric Character References

Ian Hickson ian at hixie.ch
Wed Jun 6 15:38:45 PDT 2007


On Sun, 12 Mar 2006, Lachlan Hunt wrote:
> 
> [The spec] does not cover [entities for] the characters in the range 
> from #x80 to #x9F, which have historically been treated as code points 
> from the Windows-1252 repertoire, rather than the control characters 
> from Unicode.  AFAIK, this is already interoperably implemented in all 
> browsers.

Fixed.
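
For illustration, the remapping discussed above can be sketched in a few 
lines of Python. This is only a sketch, not the spec's algorithm: the 
function name is hypothetical, and it leans on Python's "cp1252" codec as 
a stand-in for the Windows-1252 repertoire. Five bytes in that range 
(0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned in that codec; passing them 
through unchanged is an assumption made here.

```python
def resolve_numeric_reference(code_point: int) -> str:
    """Map a numeric character reference to a character (illustrative only)."""
    if 0x80 <= code_point <= 0x9F:
        try:
            # Treat the number as a Windows-1252 byte rather than a C1 control.
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Assumption: bytes unassigned in cp1252 pass through unchanged.
            return chr(code_point)
    # Everything else is taken as the Unicode code point it names.
    return chr(code_point)


print(hex(ord(resolve_numeric_reference(0x80))))  # EURO SIGN
print(hex(ord(resolve_numeric_reference(0x99))))  # TRADE MARK SIGN
```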

> Characters in the range from #x01 to #x19 (except for whitespace 
> characters) are not treated interoperably across platforms.  On Windows, 
> Firefox, IE and Opera all displayed characters from some repertoire I 
> couldn't identify.  But on Mac: all the browsers displayed either 
> nothing or a box (a place holder character).  I think these should all 
> return U+FFFD.

They return the appropriate <control> characters from Unicode. The reason 
they render on some platforms is that the fonts on some platforms (Windows 
in particular) have glyphs in those positions.


> The use of characters in either of these ranges should be an easy parse 
> error.

I've made the first set a parse error, since those actually don't 
roundtrip as one might expect. But the x01-x19 entities roundtrip fine, 
they just render funkily. We could define something special about these 
characters in the rendering section, but I don't think they should be 
parse errors. Do you agree?
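
To make the roundtrip point concrete, here is a small Python sketch. The 
helper names are hypothetical, the serializer is assumed to always emit a 
hexadecimal reference, and Python's "cp1252" codec stands in for the 
Windows-1252 repertoire (bytes it leaves unassigned are passed through, 
which is an assumption). A reference roundtrips if parsing it and writing 
the result back out yields the same number.

```python
def parse_reference(code_point: int) -> str:
    """Parse a numeric reference, remapping 0x80-0x9F via Windows-1252."""
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            return chr(code_point)  # assumption: unassigned bytes pass through
    return chr(code_point)


def serialize_reference(ch: str) -> str:
    """Write a character back out as a hexadecimal numeric reference."""
    return "&#x%X;" % ord(ch)


def roundtrips(code_point: int) -> bool:
    """True if parse-then-serialize gives back the original reference."""
    return serialize_reference(parse_reference(code_point)) == "&#x%X;" % code_point


print(roundtrips(0x80))  # comes back as &#x20AC;, so no
print(roundtrips(0x01))  # comes back as &#x1;, so yes
```

Under those assumptions, &#x80; comes back as &#x20AC;, which is the 
surprise that motivates flagging it, while &#x01; comes back unchanged.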

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


