[whatwg] Entity parsing
ian at hixie.ch
Mon Jun 25 00:28:42 PDT 2007
On Sun, 24 Jun 2007, Øistein E. Andersen wrote:
> Personally, I would prefer something along these lines:
> I. All entities are created equal (the burden of carrying a semicolon
> shall be equally distributed amongst all).
For authors, this is now the case.
For implementations, we are pretty much constrained by what IE does.
> II. Abuse of the semicolon shall not be legally enforced (its omission
> shall be conforming unless it separates the entity from a following
> [ASCII] letter or digit).
Well, I had that allowed before, but people complained. :-) For some of
the entities, though, we have to have a semicolon, for compatibility. So
if you want consistency, it has to be required everywhere.
> III. Entities living in attribute values are to be treated as
> first-class citizens (the same rules shall apply to them).
Again, for authors this is done, but for compatibility reasons we're
constrained in what we can say for implementations.
> We clearly should, to the extent possible, try to avoid bizarre quirks,
> and the current rules for entity parsing are not exactly straightforward
> or intuitive. HTML5 currently follows IE7 much more closely than Safari,
> Firefox and Opera do, which seems to suggest that some of the quirks
> could be dispensed with.
It's possible, though people kept pointing out problems, which is how we
ended up where we are now.
> At any rate, web pages containing "&" + entity name followed by
> [^A-Za-z0-9] are probably more likely not to have been authored for IE
> and therefore relying on standard SGML behaviour, so it would probably
> be more backwards-compatible to treat such occurrences as "&" + entity
> name + ";" (i.e., expand the entity).
Well, we'd have to prove this somehow with real research.
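To make the proposal under discussion concrete, here is a rough sketch of
the rule as quoted above: expand "&" + entity name when it is followed by a
semicolon or by anything other than an ASCII letter or digit. This is not
what HTML5 or any browser does; the function name and the tiny entity table
are assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately tiny entity table (illustration only).
ENTITIES = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "mdash": "\u2014",
    "copy": "\u00a9",
}

# The name pattern is greedy, so it always swallows the whole run of ASCII
# letters and digits after "&". Whatever follows the match is therefore
# either ";" (consumed here) or a character outside [A-Za-z0-9].
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def expand_entities(text):
    """Expand known entities under the proposed rule; leave the rest alone."""

    def replace(match):
        name = match.group(1)
        if name in ENTITIES:
            return ENTITIES[name]
        # Unknown name, or a known name glued to a following letter or
        # digit (e.g. "&copy2007"): keep the original text untouched.
        return match.group(0)

    return _ENTITY_RE.sub(replace, text)
```

Under this sketch, expand_entities("A &mdash B") expands the dash even
without the semicolon, while "&copy2007" is left as written because the
entity name runs straight into a digit.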
> Of course, conformance checkers would be more than welcome to signal
> that a certain current browser is unable to handle "A &mdash B" as
> expected, but this need not mean that all future browsers should be
> required not to handle it "properly" (as per arguably [in the original
> sense] more sensible SGML rules).
Calling SGML "sensible" is a slippery slope! :-)
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'