[whatwg] Entity parsing
Øistein E. Andersen
html5 at xn--istein-9xa.com
Sun Nov 5 06:52:18 PST 2006
>From section 22.214.171.124. Tokenising entities:
> For some entities, UAs require a semicolon, for others they don't.
This applies to IE.
FWIW, the entities not requiring a semicolon are the ones encoding Latin-1 characters,
the other HTML 3.2 entities (&amp;, &gt; and &lt;), as well as &quot; and the uppercase variants (&AMP;, &COPY;, &GT;, &LT;, &QUOT; and &REG;).
IE/mac has its very own interpretation of what `requiring a semicolon' means; it renders
&Deltax literally as `&Deltax', but &Deltax; as `Δx;' (with the final semicolon rendered).
Firefox and Safari, on the other hand, seem to have implemented the SGML notion
of entities (mostly) correctly, not requiring a semicolon before whitespace, tags, etc.
(the definition of `etc.' varies slightly) and not giving preferential treatment to
Opera apparently allows omission of a semicolon only when both IE and Firefox/Safari do.
This means that `&agrave la' is rendered as intended in all (these) browsers, whereas
`na&iumlve' is not (IE only); `Ha&yuml les Roses' works fine, but not the hyphenated
`Ha&yuml-les-Roses' (not Firefox) or the capitalised `HA&Yuml LES ROSES'
(Firefox/Safari only). Making omission of semicolons conforming (in specific
cases) does therefore not seem very compelling, as it would either be confusing
and apparently arbitrary or make conforming documents render inconsistently.
(Parsing still has to be defined, of course, but bear in mind that constructions
like `na&iumlve' are IE-only.)
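
For illustration, the three behaviours described above can be approximated
in a few lines of Python. This is only a rough sketch under stated
assumptions, not a description of what any browser actually implements:
the names ie_like, sgml_like, opera_like and render are made up for this
example, the entity table is Python's HTML 4 list plus the uppercase
variants mentioned above, and neither the IE/mac quirk nor the browsers'
differing sets of terminators (e.g. the hyphen) are modelled.

```python
import re
from html.entities import name2codepoint

# Entity table: HTML 4 names plus the uppercase variants listed in the post.
ENTITIES = dict(name2codepoint)
ENTITIES.update({"AMP": 38, "COPY": 169, "GT": 62, "LT": 60, "QUOT": 34, "REG": 174})

# Assumption: names accepted without a semicolon in the IE-like model,
# i.e. the Latin-1 entities, amp/gt/lt/quot and the uppercase variants.
LEGACY = {name for name, cp in name2codepoint.items() if 0xA0 <= cp <= 0xFF}
LEGACY |= {"amp", "gt", "lt", "quot", "AMP", "COPY", "GT", "LT", "QUOT", "REG"}

# An ampersand, a run of letters/digits, and an optional semicolon.
REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")

def ie_like(m):
    """Legacy names are decoded even when followed directly by letters."""
    run, semi = m.group(1), m.group(2)
    if semi and run in ENTITIES:
        return chr(ENTITIES[run])
    # Without a semicolon, look for the longest legacy name that is a prefix.
    for end in range(len(run), 0, -1):
        if run[:end] in LEGACY:
            return chr(ENTITIES[run[:end]]) + run[end:] + semi
    return m.group(0)

def sgml_like(m):
    """Any known name is decoded, but only if the whole run is the name."""
    name = m.group(1)
    if name in ENTITIES:
        return chr(ENTITIES[name])
    return m.group(0)

def opera_like(m):
    """Decode only when the two models above agree on the result."""
    a, b = ie_like(m), sgml_like(m)
    return a if a == b else m.group(0)

def render(text, model):
    """Replace every entity reference in text according to the given model."""
    return REF.sub(model, text)

if __name__ == "__main__":
    for sample in ["&agrave la", "na&iumlve", "Ha&yuml les Roses", "HA&Yuml LES ROSES"]:
        print(sample, "|", render(sample, ie_like), "|",
              render(sample, sgml_like), "|", render(sample, opera_like))
```

In this sketch, `&agrave la' is decoded by all three models, `na&iumlve'
only by ie_like, and `HA&Yuml LES ROSES' only by sgml_like.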
Øistein E. Andersen