[whatwg] Parsing Numeric Character References
Ian Hickson
ian at hixie.ch
Wed Jun 6 15:38:45 PDT 2007
On Sun, 12 Mar 2006, Lachlan Hunt wrote:
>
> [The spec] does not cover [entities for] the characters in the range
> from #x80 to #x9F, which have historically been treated as code points
> from the Windows-1252 repertoire, rather than the control characters
> from Unicode. AFAIK, this is already interoperably implemented in all
> browsers.
Fixed.
> Characters in the range from #x01 to #x19 (except for whitespace
> characters) are not treated interoperably across platforms. On Windows,
> Firefox, IE and Opera all displayed characters from some repertoire I
> couldn't identify. But on Mac: all the browsers displayed either
> nothing or a box (a place holder character). I think these should all
> return U+FFFD.
They return the appropriate <control> characters from Unicode. The reason
they render on some platforms is that the fonts on some platforms (Windows
in particular) have glyphs in those positions.
> The use of characters in either of these ranges should be an easy parse
> error.
I've made the first set a parse error, since those actually don't
roundtrip as one might expect. But the x01-x19 entities roundtrip fine,
they just render funkily. We could define something special about these
characters in the rendering section, but I don't think they should be
parse errors. Do you agree?
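
To make the behaviour described above concrete, here is a rough sketch in
Python. It is not spec text. The function name resolve_numeric_reference is
made up for illustration, and Python's "cp1252" codec is used as a stand-in
for the Windows-1252 repertoire. That codec leaves a few bytes (such as
0x81) unassigned; passing those through as the corresponding code point is
an assumption of this sketch.

```python
from typing import Tuple


def resolve_numeric_reference(code_point: int) -> Tuple[str, bool]:
    """Return (character, is_parse_error) for a numeric character reference.

    Hypothetical helper: it only models the two ranges discussed in this
    thread and does no other validation of the code point.
    """
    if 0x80 <= code_point <= 0x9F:
        # References in this range are remapped to Windows-1252 and flagged
        # as a parse error, because they do not roundtrip as written.
        try:
            char = bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Assumption: bytes that Python's codec leaves unassigned are
            # passed through as the same-numbered code point.
            char = chr(code_point)
        return char, True
    # Everything else, including x01-x19, maps straight to the Unicode
    # code point and is not flagged here.
    return chr(code_point), False


if __name__ == "__main__":
    for cp in (0x80, 0x92, 0x01):
        char, is_error = resolve_numeric_reference(cp)
        print(f"&#x{cp:02X}; -> U+{ord(char):04X} parse_error={is_error}")
```

With this sketch, &#x80; comes out as U+20AC and is flagged, while &#x01;
stays U+0001 and is not.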
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'