[whatwg] Parsing Numeric Character References
Lachlan Hunt
lachlan.hunt at lachy.id.au
Sat Mar 11 20:22:35 PST 2006
Hi,
In section 8.2.1 Tokenising Entities, for a numeric character
reference, it states:
| If one or more characters match the range, then take them all and
| interpret the string of characters as a number (either hexadecimal
| or decimal as appropriate), and return a character token for the
| Unicode character whose codepoint is that number. If the number is
| not a valid Unicode character (e.g. if the number is higher than
| 1114111), or if the number is zero, then return a character token for
| the U+FFFD REPLACEMENT CHARACTER character instead.
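As a rough illustration of the rule quoted above, here is a minimal Python sketch. The function name and signature are hypothetical and are not taken from the spec. It covers only the two cases the quoted text names explicitly (zero, and numbers above 1114111).

```python
REPLACEMENT_CHARACTER = "\uFFFD"
MAX_CODE_POINT = 0x10FFFF  # 1114111, the limit named in the quoted text


def decode_numeric_reference(digits: str, hexadecimal: bool) -> str:
    """Hypothetical helper: map the digits of a numeric character
    reference to a single character, per the quoted rule."""
    number = int(digits, 16 if hexadecimal else 10)
    # Zero and out-of-range numbers become U+FFFD.
    if number == 0 or number > MAX_CODE_POINT:
        return REPLACEMENT_CHARACTER
    # Assumption: every other number is returned as-is; the quoted text
    # gives no further examples of "not a valid Unicode character".
    return chr(number)
```

For example, decode_numeric_reference("41", True) and decode_numeric_reference("65", False) both give "A". Surrogate code points pass through unchanged in this sketch, since the quoted text does not mention them.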
This does not cover the characters in the range from #x80 to #x9F, which
have historically been treated as code points from the Windows-1252
repertoire, rather than the control characters from Unicode. AFAIK,
this is already interoperably implemented in all browsers.
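To make the #x80 to #x9F behaviour concrete, here is a small sketch that reinterprets such a number as a Windows-1252 byte, using Python's built-in cp1252 codec. The function name is hypothetical. Five bytes in that range (#x81, #x8D, #x8F, #x90, #x9D) are unassigned in Python's cp1252 table; falling back to the unchanged code point for those is my assumption, not something I have verified in browsers.

```python
def remap_windows_1252(code_point: int) -> str:
    """Hypothetical helper: treat numbers in 0x80..0x9F as Windows-1252
    bytes instead of Unicode C1 control characters."""
    if 0x80 <= code_point <= 0x9F:
        try:
            # Decode the single byte with the Windows-1252 codec.
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Assumption: unassigned bytes keep their original code point.
            return chr(code_point)
    return chr(code_point)
```

With this, &#x80; would yield the euro sign (U+20AC) and &#x99; the trade mark sign (U+2122), rather than control characters.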
Characters in the range from #x01 to #x19 (except for whitespace
characters) are not treated interoperably across platforms. On Windows,
Firefox, IE and Opera all displayed characters from some repertoire I
couldn't identify. On Mac, however, all the browsers displayed either
nothing or a box (a placeholder character). I think these should all
return U+FFFD.
The use of characters in either of these ranges should be an easy parse
error.
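Putting the suggestions above together, the handling could look roughly like the following sketch. All names are hypothetical. The set of whitespace characters is an assumption, since the exact set is not spelled out here, and the #x01 to #x19 range is used exactly as written above.

```python
from typing import Tuple

REPLACEMENT_CHARACTER = "\uFFFD"
# Assumption: tab, line feed, form feed and carriage return count as whitespace.
WHITESPACE = {0x09, 0x0A, 0x0C, 0x0D}


def resolve_reference(code_point: int) -> Tuple[str, bool]:
    """Hypothetical helper: return (character, is_parse_error)."""
    if code_point == 0 or code_point > 0x10FFFF:
        # Existing rule; the quoted text does not say whether this is a
        # parse error, so the flag is left unset in this sketch.
        return REPLACEMENT_CHARACTER, False
    if 0x80 <= code_point <= 0x9F:
        # Windows-1252 reinterpretation, flagged as a parse error.
        try:
            return bytes([code_point]).decode("cp1252"), True
        except UnicodeDecodeError:
            # Assumption: unassigned bytes keep their original code point.
            return chr(code_point), True
    if 0x01 <= code_point <= 0x19 and code_point not in WHITESPACE:
        # Non-whitespace control characters become U+FFFD, also flagged.
        return REPLACEMENT_CHARACTER, True
    return chr(code_point), False
```

For instance, resolve_reference(0x92) gives the right single quotation mark (U+2019) with the parse error flag set, while resolve_reference(0x0A) gives a plain line feed with no error.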
--
Lachlan Hunt
http://lachy.id.au/