[whatwg] Entity parsing [trema/diæresis vs umlaut]

Øistein E. Andersen html5 at xn--istein-9xa.com
Sat Jun 23 14:27:44 PDT 2007

Sander wrote:

> Are there any char-sets that have both umlaut and trema variations of characters?

Unicode does not make the distinction, so this is somewhat unlikely.
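
A quick way to check this, sketched here with Python's standard unicodedata module (the helper name describe is only illustrative): each precomposed letter has a single code point named "WITH DIAERESIS", and its canonical decomposition uses the one combining mark U+0308, whether the letter is read as an umlaut or as a trema.

```python
import unicodedata


def describe(ch):
    """Return the Unicode name of ch and the code points of its NFD form."""
    decomposed = unicodedata.normalize("NFD", ch)
    return unicodedata.name(ch), [f"U+{ord(c):04X}" for c in decomposed]


# German umlaut letters and French/Dutch trema letters alike
for ch in "äöüëïÿ":
    name, parts = describe(ch)
    print(ch, name, parts)
```

Every letter in the loop decomposes to its base letter followed by the same combining character, so nothing in the encoding tells the two uses apart.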

(Personally, I tend to think that the apparent preference for setting umlaut dots
closer to the letter than trema dots can be linked to extrinsic phenomena such as
the preference for steep accents in French typography.)

Kristof Zelechovski wrote:

> Only the vowel U can have either

This is not quite right. All Latin vowels (a, e, i, o, u, y) can take the trema/diæresis
(ä, ë, ï, ö, ü in Dutch; ë, ï, ü*, ÿ** in French), and a, o, u can all be umlauted (ä, ö, ü
in German).

Moreover, the double-dot accent also has other uses (e.g., ä and ë both designate
a stressed schwa in Luxembourgish), so it is probably not advisable
to attempt a complete classification in HTML.
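
For illustration, the existing HTML entity names already make no such classification: every double-dotted letter gets a name ending in "uml", whatever the accent does in a given language. A minimal sketch using Python's standard html module (the list ENTITY_NAMES and the helper resolve are illustrative names, not part of any specification):

```python
import html

# Named character references for the double-dotted lowercase vowels
ENTITY_NAMES = ["auml", "euml", "iuml", "ouml", "uuml", "yuml"]


def resolve(name):
    """Expand a named character reference such as 'uuml' to its character."""
    return html.unescape(f"&{name};")


for name in ENTITY_NAMES:
    ch = resolve(name)
    print(f"&{name}; -> {ch} (U+{ord(ch):04X})")
```

The French ë and ÿ are thus reached through &euml; and &yuml;, even though neither letter is ever an umlaut.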

Øistein E. Andersen

*) possibly only in the word capharnaüm (disregarding the highly unpopular
rectifications orthographiques of 1990) and in proper names
**) only in proper names
