[whatwg] Entity parsing [trema/diaeresis vs umlaut]

Křištof Želechovski giecrilj at stegny.2a.pl
Wed Jun 27 12:45:20 PDT 2007

How does that affect the case fianc&eacutee vs &oeliguvre?  The only
difference is that the first one is used in English.

-----Original Message-----
From: Oistein E. Andersen [mailto:html5 at xn--istein-9xa.com] 
Sent: Tuesday, June 26, 2007 10:55 PM
To: giecrilj at stegny.2a.pl; whatwg at whatwg.org
Subject: Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

On 26 Jun 2007, at 7:49AM, Křištof Želechovski wrote:

> Internet Explorer apparently chose to support English natively
> while SGML preferred remaining language-agnostic.

To be fair, this is not how things developed.

Microsoft first chose to make the semicolon optional not only
when allowed by SGML rules (notably before whitespace and tags),
but in any position, for all named entities /that existed at the time/,
i.e., latin-1.

Unfortunately, this meant that new entities could not be added without
changing the interpretation of already existing pages (e.g., if a page
contained "less&less", adding the entity &le to the list would result in
its being interpreted as "less≤ss"), although most of the entities have
names that are rather unlikely to appear by chance, and the ampersand
"should" be spelt &amp;.

Microsoft did not dare to risk this, so entities beyond latin-1 require
a semicolon in IE, even in cases where it is optional according
to SGML (and therefore will pass HTML 4.01 validation, I might add).
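
The behaviour described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not IE's actual algorithm:
the function name decode and the two tables are hypothetical, and the
tables hold a handful of entries rather than the real entity lists.

```python
import re

# Hypothetical miniature tables, for illustration only.
# LEGACY: names that existed at the time (Latin-1 era); semicolon optional anywhere.
LEGACY = {"eacute": "\u00e9", "amp": "&", "lt": "<"}
# STRICT: names added later; recognised only when followed by a semicolon.
STRICT = {"oelig": "\u0153", "le": "\u2264"}


def decode(text, legacy=LEGACY, strict=STRICT):
    """Replace named entity references in text using the two tables."""

    def repl(match):
        name, semi = match.group(1), match.group(2)
        # With a semicolon, an exact match in either table is accepted.
        if semi and name in legacy:
            return legacy[name]
        if semi and name in strict:
            return strict[name]
        # Otherwise only legacy names may match, as the longest prefix.
        for end in range(len(name), 0, -1):
            if name[:end] in legacy:
                return legacy[name[:end]] + name[end:] + semi
        # Nothing matched: leave the text untouched.
        return match.group(0)

    return re.sub(r"&([A-Za-z0-9]+)(;?)", repl, text)


print(decode("fianc&eacutee"))   # legacy name, no semicolon needed
print(decode("&oeliguvre"))      # later name without semicolon: left alone
print(decode("&oelig;uvre"))     # later name with semicolon: decoded
print(decode("less&less"))       # unchanged while "le" is not a legacy name
# Hypothetically moving "le" into the legacy table changes the old page:
print(decode("less&less", legacy={**LEGACY, "le": "\u2264"}))
```

The last call shows why extending the semicolon-optional list was risky:
the same input that used to pass through untouched now decodes
differently.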

Oistein E. Andersen
