[whatwg] Entity parsing [trema/diaresis vs umlaut]

Křištof Želechovski giecrilj at stegny.2a.pl
Mon Jun 25 23:46:44 PDT 2007

Of course you are right; I was thinking of the tréma when I wrote that and I
changed it to a dieresis afterwards to make it more English (to get rid of
the red underlines).  A general qui pro quo followed.
Slovak ä is an original invention; the tréma palatalizes the preceding
consonant.  I did not consider capharnaüm invalid but irrelevant: it is a
Hebrew (or Aramaic?) proper name and can be regarded as a transcription.

-----Original Message-----
From: whatwg-bounces at lists.whatwg.org
[mailto:whatwg-bounces at lists.whatwg.org] On Behalf Of Oistein E. Andersen
Sent: Monday, June 25, 2007 3:46 PM
To: whatwg at whatwg.org; giecrilj at stegny.2a.pl
Subject: Re: [whatwg] Entity parsing [trema/diaresis vs umlaut]

On 25 Jun 2007, at 11:44AM, Křištof Želechovski wrote:

> To make it explicit and plain: the dieresis is a diacritical mark that has
> no intrinsic phonetic connotation, although it is used mostly for
> vowels;

As you may know, diaeresis derives from the Greek verb διαιρεῖν (diairein),
which means "to divide", and it does indeed have an intrinsic meaning.

According to the OED, a diaeresis is "[t]he sign (¨) marking [a phonological
diaeresis], or, more usually, placed over the second of two vowels which
otherwise make a diphthong or single sound, to indicate that they are to be
pronounced separately."

Similarly, umlaut is defined as "[t]he diacritical sign (¨) placed over a
vowel to indicate that [umlaut] has taken place."

Hence, the use of either term when the double-dot diacritic is performing
another linguistic function is equally abusive.

> the phonetic meaning of umlaut is generic and well-defined by its
> very name and it does not apply to the vowel I.

Indeed. German umlaut notation is further restricted, and I am not quite
sure whether the phonetic phenomenon applies to y either, but this is rather
far off topic.

> I did not intend to make HTML support all possible linguistic intricacies;
> I only wanted to eliminate the common nonsense of denoting i with &iuml;
> [...]
> I only want the true umlaut to be distinct, not as a code point but as an
> entity name.
> [...]
> It would be up to the author to determine whether &uuml; or &utrema;
> is appropriate; both entities should denote the same character.

Do you really think it is a good idea to introduce twelve new aliases
that do not work in current browsers, do not make the language more
expressive and require authors to make meaningless decisions?
(Is Slovak ä borrowed from German [it is pronounced a or ?] and
therefore &auml;, or does it have another origin? Should we use
&atrema; by default? How about Pinyin ü? Swedish words that contain
an ö as a result of umlaut vs those that contain it for a different reason?)
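
To make the alias proposal concrete, here is a minimal Python sketch. It
assumes the twelve proposed names are formed by replacing the "uml" suffix
of the existing HTML 4 entities with "trema"; only &atrema;, &otrema; and
&utrema; are actually named in this thread, and none of them is part of any
HTML entity table. The names UML_ENTITIES, TREMA_ALIASES and expand are
made up for this illustration.

```python
import html
from html.entities import name2codepoint

# The existing HTML 4 "uml" entities: a, e, i, o, u, y in both cases.
# The length check excludes the bare "uml" entity (the spacing diaeresis).
UML_ENTITIES = sorted(
    n for n in name2codepoint if len(n) == 4 and n.endswith("uml")
)

# Hypothetical aliases: same code points, "trema" instead of "uml".
# These names are an assumption for illustration, not real entities.
TREMA_ALIASES = {n[0] + "trema": name2codepoint[n] for n in UML_ENTITIES}


def expand(text):
    """Resolve the hypothetical trema aliases, then the standard entities."""
    for name, codepoint in TREMA_ALIASES.items():
        text = text.replace("&" + name + ";", chr(codepoint))
    return html.unescape(text)


# A standard entity decoder leaves the unknown alias untouched.
print(html.unescape("re&utrema;nie"))
# With the alias table, both spellings yield the same character.
print(expand("re&utrema;nie"), expand("re&uuml;nie"))
```

Since both names map to the same code point, the distinction exists only in
the source text and is lost as soon as the document is parsed.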

Trema or diaeresis might have been a better choice than umlaut as a generic
term, since umlaut does not apply to all Latin vowels, but it is really too
late to fix this now.

On 25 Jun 2007, at 11:51AM, Křištof Želechovski wrote:

> Could I have an example of &otrema; please? 

The canonical example in Dutch seems to be coördinatie, see
http://nl.wikipedia.org/wiki/Trema_in_de_Nederlandse_spelling .

> Something along the lines of zoölogy, but actually required?

Well, such spellings are "actually required" in some varieties of English.
"The New Yorker mandates that authors must coöperate to reëducate our
readership." - allegedly from the magazine's style manual.

On 25 Jun 2007, at 11:16AM, Křištof Želechovski wrote:

> there is no language that could make use of this distinction by having
> &uuml; and &utrema;.  There are languages that use &uuml; and
> there could be ones that use &utrema;, although I do not know of any valid
> (I consider the French case invalid).

I have no idea why you consider capharnaüm to be invalid (if this is what
you imply), but perhaps Spanish pingüino and Dutch reünie will be more
convincing.

Oistein E. Andersen
