[whatwg] Fwd: Entity parsing
ian at hixie.ch
Thu Jun 4 16:49:04 PDT 2009
On Fri, 24 Apr 2009, Øistein E. Andersen wrote:
> When a named character reference is followed by a semicolon, it clearly
> has to be expanded, but how to handle non-semicolon-terminated character
> references is less obvious.
> Let &IE4 (resp. &HTML4, &HTML5) be a non-semicolon-terminated named
> character reference from the IE4 (resp. HTML4, HTML5) set, and let .
> (full stop) represent any character other than semicolon, and ^
> (circumflex) any character which is (roughly) not an ASCII letter or
> digit (i.e., [^a-zA-Z0-9]). Not completely unreasonable sets of
> character references to expand (outside of attribute values) include:
> 1) &IE4^
> 2) &IE4.
> 3) &HTML4^
> 4) &IE4. &HTML4^
> 5) &HTML4.
> 6) &IE4. &HTML5^
> 7) &HTML4. &HTML5^
> 8) &HTML5.
> (The set of character references to be expanded in attribute values
> could be obtained by replacing . by ^ above.)
> Currently, Opera follows 1), IE 2), and Safari and Firefox 3).
> My main concern is that &HTML4^ is actually legitimate in HTML4 and
> works in both Safari and Firefox today, and that HTML5 should not change
> the rendering of valid HTML4 pages unless there is a good reason to do so.
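The notation above can be made concrete with a small model. What follows is a minimal Python sketch, not browser or spec code. It assumes tiny hypothetical samples for the IE4, HTML4 and HTML5 reference sets (the real tables are far larger). It also assumes the tokenizer has already matched the reference name. The names `POLICIES`, `follows`, `should_expand` and `keeps_html4_valid` are illustrative inventions. Each policy is written as a union of (set, follow class) clauses, mirroring entries like "&IE4. &HTML4^".

```python
# Hypothetical model of the expansion policies 1)-8) quoted above.
# ASSUMPTION: these sets are tiny illustrative samples, not the real
# IE4/HTML4/HTML5 tables; they are nested as IE4 <= HTML4 <= HTML5.
IE4 = frozenset({"amp", "lt", "gt", "quot", "copy", "nbsp"})
HTML4 = IE4 | {"hellip", "rarr"}
HTML5 = HTML4 | {"colon", "excl"}
SETS = {"IE4": IE4, "HTML4": HTML4, "HTML5": HTML5}

# Each policy is a list of (reference set, follow class) clauses; a reference
# is expanded if any clause matches. "." = any character other than ';',
# "^" = roughly [^a-zA-Z0-9].
POLICIES = {
    1: [("IE4", "^")],
    2: [("IE4", ".")],
    3: [("HTML4", "^")],
    4: [("IE4", "."), ("HTML4", "^")],
    5: [("HTML4", ".")],
    6: [("IE4", "."), ("HTML5", "^")],
    7: [("HTML4", "."), ("HTML5", "^")],
    8: [("HTML5", ".")],
}


def follows(cls: str, next_char: str) -> bool:
    """True if next_char ('' at end of input) belongs to class '.' or '^'."""
    if cls == ".":
        return next_char != ";"
    # '^': anything that is not an ASCII letter or digit.
    return not (next_char.isascii() and next_char.isalnum())


def should_expand(name: str, next_char: str, policy: int,
                  in_attribute: bool = False) -> bool:
    """Decide whether '&' + name, followed by next_char, is expanded.

    ASSUMPTION: the tokenizer has already matched `name` as the reference
    name; next_char is the single character that follows it.
    """
    if next_char == ";":
        # Semicolon-terminated references are always expanded; using the
        # largest sample set here is an assumption of this sketch.
        return name in HTML5
    for set_name, cls in POLICIES[policy]:
        if in_attribute:
            cls = "^"  # in attribute values, '.' is replaced by '^'
        if name in SETS[set_name] and follows(cls, next_char):
            return True
    return False


def keeps_html4_valid(policy: int, sample_chars: str = " ,.<!") -> bool:
    """Does the policy expand every &HTML4^ case (outside attributes)?"""
    return all(should_expand(n, c, policy) for n in HTML4 for c in sample_chars)
```

With these samples, `should_expand("copy", "2", 2)` is true while `should_expand("copy", "2", 3)` is false. Inside an attribute value, `should_expand("copy", "2", 2, in_attribute=True)` is false. `[p for p in POLICIES if keeps_html4_valid(p)]` gives `[3, 4, 5, 6, 7, 8]`. Only 1) (Opera) and 2) (IE) leave some valid &HTML4^ references unexpanded, which is the compatibility concern raised in the message.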
Could you give an example of what you mean? I'm having trouble following
your description above.
As far as I can tell HTML5 more or less matches what legacy pages need,
but if there are specific entities that should be parsed in a different
way than HTML5 says they should, I'm happy to fix this.
Ian Hickson U+1047E
http://ln.hixie.ch/ U+263A
Things that are impossible just take longer.