[whatwg] Entity parsing [trema/diaeresis vs umlaut]

Øistein E. Andersen html5 at xn--istein-9xa.com
Tue Jun 26 13:55:20 PDT 2007


On 26 Jun 2007, at 7:49AM, Křištof Želechovski wrote:

> Internet Explorer apparently chose to support English natively
> while SGML preferred remaining language-agnostic.

To be fair, this is not how things developed.

Microsoft first chose to make the semicolon optional not only
when allowed by SGML rules (notably before whitespace and tags),
but in any position, for all named entities /that existed at the time/,
i.e., latin-1.

Unfortunately, this meant that new entities could not be added without
changing the interpretation of already existing pages (e.g., if a page
contained “less&less”, adding the entity &le to the list would result in
its being interpreted as “less≤ss”), although most of the entities have
names that are rather unlikely to appear by chance, and the ampersand
“should” be spelt &amp;.

Microsoft did not dare to risk this, so entities beyond latin-1 require
a semicolon in IE, even in cases where it is optional according
to SGML (and therefore will pass HTML 4.01 validation, I might add).
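The resulting split can be modelled roughly as follows. This is a hypothetical sketch of the behaviour described above, not IE's code; LEGACY stands for the latin-1 names and NEWER for names added later, both as small illustrative subsets:

```python
# Hypothetical model of the rule described above, not IE's actual code:
# names from the legacy latin-1 table may omit the ';' in any position,
# while names added later are recognised only when followed by ';'.

LEGACY = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "eacute": "é"}
NEWER = {"le": "≤", "hellip": "…"}  # illustrative subset of non-latin-1 names


def decode_ie_like(text: str) -> str:
    table = {**LEGACY, **NEWER}
    names = sorted(table, key=len, reverse=True)  # prefer the longest match
    out, i = [], 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        rest = text[i + 1:]
        for name in names:
            if not rest.startswith(name):
                continue
            has_semicolon = rest[len(name):len(name) + 1] == ";"
            if name in NEWER and not has_semicolon:
                continue  # newer names are never matched without ';'
            out.append(table[name])
            i += 1 + len(name) + (1 if has_semicolon else 0)
            break
        else:
            out.append("&")  # no acceptable match: keep the ampersand
            i += 1
    return "".join(out)


print(decode_ie_like("caf&eacute au lait"))  # café au lait
print(decode_ie_like("x &le; y"))            # x ≤ y
print(decode_ie_like("x &le y"))             # x &le y (SGML would accept &le here)
```

The last case is the one mentioned above: "&le" followed by a space is a valid reference under SGML rules, but this model (like IE) leaves it undecoded.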

-- 
Øistein E. Andersen


