[whatwg] Entity parsing [trema/diaresis vs umlaut]

Kristof Zelechovski giecrilj at stegny.2a.pl
Mon Jun 25 03:44:04 PDT 2007



A stressed schwa is present in a Polish maritime dialect as well (Kaszëbsczi),
and Slovaks write "mäso" for "miaso" (meat), but that is not the point.  All
such uses can be covered by the dieresis; I only want the true umlaut to be
distinct, not as a code point but as an entity name.  BTW,
to clear up another misconception: the dieresis (more verbosely, the double
dot above) is not a double accent, because an unqualified "accent" means the
acute accent by default; the Adobe registry name for the double accent, i.e.
the double acute, is Hungarian umlaut because it is used only in Hungarian
orthography.
To put it plainly: the dieresis is a diacritical mark that has no intrinsic
phonetic connotation, although it is used mostly to separate vowels; the
umlaut, by contrast, has a generic, well-defined phonetic meaning, as its very
name says, and it does not apply to the vowel I, which is already a front
vowel.  I did not intend to make
HTML support all possible linguistic intricacies; I only wanted to eliminate
the common nonsense of denoting ï with &iuml;, or at least to allow authors
to avoid this absurd notation while still having an entity for that letter.
&iuml; should be kept as an alias for &itrema; for backward compatibility;
that is the whole story.  It would be up to the author to decide whether
&uuml; or &utrema; is appropriate; both entities should denote the same
character.
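A rough sketch of what this proposal means for an entity table, assuming Python's standard html.entities module as a stand-in for a parser's table: itrema and utrema are hypothetical names (no HTML specification defines them), each mapped to the same code point as the existing iuml/uuml entry. Under that assumption the two spellings differ only in the source markup; the resulting text is identical.

```python
import html.entities
import unicodedata

# Start from the HTML 4 entity table shipped with Python (name -> code point).
ENTITIES = dict(html.entities.name2codepoint)

# Hypothetical additions: "itrema" and "utrema" are NOT defined by any HTML
# specification. Per the proposal, each resolves to the same code point as the
# existing "iuml"/"uuml" entry, so old documents keep working unchanged.
ENTITIES["itrema"] = ENTITIES["iuml"]
ENTITIES["utrema"] = ENTITIES["uuml"]


def resolve(name: str) -> str:
    """Return the character denoted by an entity name (given without & and ;)."""
    return chr(ENTITIES[name])


for name in ("iuml", "itrema", "uuml", "utrema"):
    ch = resolve(name)
    print(f"&{name}; -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The distinction lives only in the name table, which matches the request for a distinct entity name rather than a distinct code point.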
Cheers
Chris

-----Original Message-----
From: whatwg-bounces at lists.whatwg.org
[mailto:whatwg-bounces at lists.whatwg.org] On Behalf Of Oistein E. Andersen
Sent: Saturday, June 23, 2007 11:28 PM
To: whatwg at whatwg.org
Subject: Re: [whatwg] Entity parsing [trema/diaresis vs umlaut]

Sander wrote:

> Are there any char-sets that have both umlaut and trema variations of
> characters?

Unicode does not make the distinction, so this is somewhat unlikely.
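A quick illustration of this point using Python's standard unicodedata module (chosen here only for demonstration): the precomposed letters are all named "WITH DIAERESIS", whichever orthography reads the dots as a trema or as an umlaut, and each decomposes to a base letter plus U+0308 COMBINING DIAERESIS. This checks Unicode only; it says nothing about older charsets.

```python
import unicodedata

# The precomposed letters carry "DIAERESIS" in their Unicode names, whether a
# given orthography reads the dots as a trema or as an umlaut.
for ch in "\u00e4\u00eb\u00ef\u00f6\u00fc\u00ff":  # ä ë ï ö ü ÿ
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Each one decomposes to a base letter plus U+0308 COMBINING DIAERESIS ...
print(unicodedata.decomposition("\u00fc"))  # 0075 0308
print(unicodedata.name("\u0308"))           # COMBINING DIAERESIS

# ... and that sequence composes back to the same single code point.
print(unicodedata.normalize("NFC", "u\u0308") == "\u00fc")  # True
```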

(Personally, I tend to think that the apparent preference for umlaut dots
closer to the letter than trema dots can be linked to extrinsic phenomena
like the preference for steep accents in French typography.)

Kristof Zelechovski wrote:

> Only the vowel U can have either

This is not quite right. All Latin vowels (a, e, i, o, u, y) can take the
trema/diaeresis (ä, ë, ï, ö, ü in Dutch; ë, ï, ü*, ÿ** in French), and a, o,
u can all be umlauted (ä, ö, ü in German).

Moreover, the double-dot accent also has other uses (e.g., ä and ë both
designate a stressed schwa in Luxembourgeois), so it is probably not
advisable to attempt a complete classification in HTML.

-- 
Oistein E. Andersen

*) possibly only in the word capharnaüm (disregarding the highly unpopular
rectifications orthographiques of 1990) and in proper names
**) only in proper names



