[whatwg] Entity parsing [trema/diæresis vs umlaut]

Øistein E. Andersen html5 at xn--istein-9xa.com
Mon Jun 25 06:45:39 PDT 2007


On 25 Jun 2007, at 11:44AM, Křištof Želechovski wrote:

> A stressed schwa is present in Polish maritime dialect as well (Kaszëbszczi)
> and Slovaks write "mäso" for "miaso" (meat), but that is not the point.  All
> such uses can be covered under the hood of the dieresis;

I really do not understand why these uses of the double-dot diacritic
should be considered as instances of the diæresis (see below).

> the dieresis is not a double accent

I never said "double accent", but you are right in pointing out that I should have
called it a double-dot diacritic rather than a double-dot accent, since
-- strictly speaking -- the only accents are acute, grave and circumflex.

> To make it explicit and plain: the dieresis is a diacritical mark that has
> no intrinsic phonetic connotation, although it is used mostly for separating
> vowels;

As you may know, diæresis derives from the Greek verb διαιρεῖν (diairein),
which means “to divide”, and it does indeed have an intrinsic meaning.

According to the OED, a diæresis is “[t]he sign (¨) marking [a phonological diæresis], or,
more usually, placed over the second of two vowels which otherwise make a
diphthong or single sound, to indicate that they are to be pronounced separately.”

Similarly, umlaut is defined as “[t]he diacritical sign (¨) placed over a vowel to
indicate that [umlaut] has taken place.”

Hence, the use of either term when the double-dot diacritic is performing
another linguistic function is equally abusive.

> the phonetic meaning of umlaut is generic and well-defined by its
> very name and it does not apply to the vowel I.

Indeed. German umlaut notation is further restricted, and I am not quite sure
whether the phonetic phenomenon applies to y either, but this is rather far off topic.

> I did not intend to make HTML support all possible linguistic intricacies;
> I only wanted to eliminate the common nonsense of denoting ï with &iuml;
> [...]
> I only want the true umlaut to be distinct, not as a code point but as an entity name.
> [...]
> It would be up to the author to determine whether &uuml; or &utrema;
> is appropriate; both entities should denote the same character.

Do you really think it is a good idea to introduce twelve new aliases
that do not work in current browsers, do not make the language more
expressive and require authors to make meaningless decisions?
(Is Slovak ä borrowed from German [it is pronounced æ or ɛ] and
therefore &auml; or does it have another origin? Should we use
&atrema; by default? How about Pinyin ü? Swedish words that contain
an ö as a result of umlaut vs those that contain it for a different reason?)
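
To make the compatibility point concrete, here is a minimal sketch in Python.
It assumes that the standard library's html.unescape, which follows the HTML5
table of named character references, is a fair stand-in for how a current
parser treats entity names. The *trema names are the hypothetical aliases
proposed above; they are not defined in any specification.

```python
import html

# Six vowels in both cases: the twelve existing *uml names and the
# twelve hypothetical *trema aliases discussed in this thread.
VOWELS = "aeiouyAEIOUY"


def decode(name):
    """Decode one named character reference, e.g. 'uuml' for '&uuml;'."""
    return html.unescape("&" + name + ";")


for v in VOWELS:
    umlaut_name = v + "uml"      # existing entity name
    trema_name = v + "trema"     # hypothetical alias, not in any spec
    print(f"&{umlaut_name}; -> {decode(umlaut_name)!r}    "
          f"&{trema_name}; -> {decode(trema_name)!r}")
```

Under that assumption, each existing name decodes to a single character,
whereas each hypothetical alias comes back unchanged as literal text, which
is what readers would see until every parser had been updated.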

Trema or diæresis might have been a better choice than umlaut as a generic name,
since umlaut does not apply to all Latin vowels, but it is really too late to fix this now.


On 25 Jun 2007, at 11:51AM, Křištof Želechovski wrote:

> Could I have an example of &otrema; please? 

The canonical example in Dutch seems to be coördinatie, see
http://nl.wikipedia.org/wiki/Trema_in_de_Nederlandse_spelling .

> Something along the lines of zoölogy, but actually required?

Well, such spellings are "actually required" in some varieties of English.
“The New Yorker mandates that authors must coöperate to reëducate our
readership.” — allegedly from the magazine’s style manual.


On 25 Jun 2007, at 11:16AM, Křištof Želechovski wrote:

> there is no language that could make use of this distinction by having both
> &uuml; and &utrema;.  There are languages that use &uuml; and theoretically
> there could be ones that use &utrema;, although I do not know of any valid case
> (I consider the French case invalid).

I have no idea why you consider capharnaüm to be invalid (if this is what you imply),
but perhaps Spanish pingüino and Dutch reünie will be more convincing examples.

French dictionaries require loan-words like angström, führer and länder (plural
of land) to be spelt with an umlaut, but these are of course too rare for
a differentiation tréma/umlaut to have developed, and I would imagine
German imports with umlaut to be only slightly more common in Dutch.

It would be interesting to see whether 19th-c. German actually made a
distinction between umlaut on a, o, u and diæresis on e, i (e.g., Rhomboïd),
but I do not know how consistently the diæresis was used, and words
requiring it are typically foreign words that, unlike the rest, will not have
been printed in Fraktur...

-- 
Øistein E. Andersen


