[whatwg] Character References in HTML
ian at hixie.ch
Tue Jun 5 17:20:09 PDT 2007
On Thu, 13 Oct 2005, Lachlan Hunt wrote:
> In HTML4, according to SGML rules, numeric character references in the
> range from &#128; to &#159; are defined as UNUSED, which makes them
> "non-SGML characters". Strictly speaking, it's not an error to refer to
> these characters with character references (even the validator only
> issues a warning: reference to a non-SGML character); but, AIUI, neither
> SGML nor HTML4 assigns any meaning to them.
> Technically, these character references should really refer to the
> Unicode control characters, but reality dictates otherwise for
> text/html, thanks to IE and countless (poorly written) books and
> tutorials. I, therefore, think the spec should say something along
> these lines:
> In HTML, numeric and hexadecimal character references referring to
> code positions in the range from 128 to 159 (0x80 to 0x9F) should be
> re-mapped to code positions in the Unicode character repertoire
> according to the CP1252 to Unicode table [CP1252]. This does not
> apply to XHTML.
Done. (With a must, and with an explicit table, since CP1252 doesn't
define all those characters.)
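
As a rough illustration of the remapping described above, here is a minimal Python sketch. The table holds the 27 positions that Windows-1252 assigns in 0x80 to 0x9F; the names CP1252_REMAP and remap_charref are hypothetical, and passing the five unassigned positions (0x81, 0x8D, 0x8F, 0x90, 0x9D) through unchanged is an assumption of this sketch, not a statement of what the spec's table says.

```python
# Windows-1252 positions 0x80-0x9F that have a Unicode assignment.
# 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in CP1252 and are
# deliberately absent from this table.
CP1252_REMAP = {
    0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E,
    0x85: 0x2026, 0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6,
    0x89: 0x2030, 0x8A: 0x0160, 0x8B: 0x2039, 0x8C: 0x0152,
    0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019, 0x93: 0x201C,
    0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A,
    0x9C: 0x0153, 0x9E: 0x017E, 0x9F: 0x0178,
}


def remap_charref(code_point: int) -> str:
    """Return the character a numeric reference yields in text/html.

    Assumption: code points without a table entry, including the
    unassigned CP1252 positions, are returned unchanged.
    """
    return chr(CP1252_REMAP.get(code_point, code_point))


# &#150; becomes an en dash, &#x80; becomes the euro sign.
print(repr(remap_charref(150)), repr(remap_charref(0x80)))
```

An explicit table is used here rather than a codec lookup because a CP1252 decoder has nothing to return for the unassigned positions.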
> HTML documents must not use numeric or hexadecimal character
> references in this range, although browsers should support them for
> backwards compatibility. Authors should instead refer to the correct
> Unicode code position for these characters.
> Also, I think this would be a nice conformance requirement to see for
> authoring tools:
> HTML Authoring tools should automatically convert these character
> references to either the equivalent Unicode code position or, if the
> file's encoding supports it, the character itself, according to the
> CP1252 to Unicode table [CP1252].
Not done, but it's redundant anyway since simply implementing the spec
will do this automatically (the spec doesn't round-trip the out-of-range
entities through the DOM).
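
To make the round-trip point concrete, here is a small Python sketch that stands in for "parse, then serialise". It assumes Python's html.unescape as the parsing step (it applies the same 0x80 to 0x9F remapping) and an ASCII encode with xmlcharrefreplace as the serialising step. The function name normalise_references is hypothetical, and the sketch only handles plain text content, not full markup.

```python
import html


def normalise_references(text_content: str) -> str:
    """Expand character references, then re-escape non-ASCII characters.

    Sketch only: html.unescape also expands &lt; and &amp;, so this is
    suitable for text content, not for whole documents.
    """
    parsed = html.unescape(text_content)  # "&#150;" -> U+2013
    # Re-serialise for an ASCII-only file: U+2013 -> "&#8211;"
    return parsed.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(normalise_references("1990&#150;2000"))  # prints 1990&#8211;2000
```

The out-of-range reference never survives: what comes back out is the correct Unicode code position, which is the behaviour the proposed authoring-tool requirement asks for.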
> None of that should apply to XHTML, since XML explicitly allows this
> range in the production for Char and, as far as I'm aware, no XHTML UA
> implements this buggy behaviour.
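
The HTML/XML difference can be checked with a short Python comparison. This is only a sketch: it uses the standard library's ElementTree (expat) as a stand-in for an XML parser and html.unescape as a stand-in for text/html reference handling, neither of which is a browser.

```python
import html
import xml.etree.ElementTree as ET

# XML: &#150; refers to code point 150 itself, a C1 control (U+0096).
xml_text = ET.fromstring("<p>&#150;</p>").text

# text/html-style handling: the same reference is remapped to U+2013.
html_text = html.unescape("&#150;")

print(hex(ord(xml_text)), hex(ord(html_text)))
```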
-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.