[whatwg] Entity parsing

Kristof Zelechovski giecrilj at stegny.2a.pl
Mon Jun 25 03:16:08 PDT 2007


If there is a character set that sports both, it must be used to put down
some human language.  My point is that there is no language that could make
use of this distinction by having both ü and &utrema;.  There are languages
that use ü and theoretically there could be ones that use &utrema;,
although I do not know of any valid case (I consider the French case
invalid).

Chëërs

Chrïs

 

  _____  

From: whatwg-bounces at lists.whatwg.org
[mailto:whatwg-bounces at lists.whatwg.org] On Behalf Of Sander
Sent: Saturday, June 23, 2007 2:59 PM
To: Kristof Zelechovski; whatwg at whatwg.org
Subject: Re: [whatwg] Entity parsing

 

I hadn't thought of that one ;-)  (in Dutch there are no native words with
umlauts, only some of German or Scandinavian descent).
My question was about char-sets that contain both a trema version and a
(separate) umlaut version of the same character. Are there any?

cheers,
Sander


Kristof Zelechovski schreef: 

Only the vowel U can have either, but I have not seen a valid example of
&utrema;.  The orthography "ambigüe" has recently been changed to "ambiguë"
for consistency.  Polish "nauka" (science) and German "beurteilen" would
make good candidates, but the national rules of orthography do not allow
this distinction: Slavic languages do not have diphthongs except in
borrowed words, and it would cause ambiguity in German (cf. "geübt").
(Incidentally, this leads to bad pronunciation often encountered even in
Polish media.)
Cheers
Chris
 
-----Original Message-----
From: Sander [mailto:html5 at zoid.nl] 
Sent: Friday, June 22, 2007 9:26 PM
To: Kristof Zelechovski
Subject: Re: [whatwg] Entity parsing
 
 
Kristof Zelechovski schreef:
  

A dieresis is not an umlaut so I have to bite my tongue each time I write
or read nonsense like &iuml;.  It feels like lying.  Umlaut means "mixed",
a dieresis means "standalone".  Those are very different things, and "I"
can never get mixed so there is no ambiguïty.  Since "umlaut" is borrowed
from German, I can see no problem in borrowing "tréma" from French.  I
personally prefer "&itrema;" to "&idier;" because of readability, but I
would not insist on that.
  
    

 
"In professional typography, umlaut dots are usually a bit closer to the 
letter's body than the dots of the trema. In handwriting, however, no 
distinction is visible between the two. This is also true for most 
computer fonts and encodings."
[http://en.wikipedia.org/wiki/Umlaut_(diacritic)]
 
Are there any char-sets that have both umlaut and trema variations of 
characters? If so, both entities could exist.
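
As a rough illustration of the point about encodings, here is a small
Python sketch that looks at how Unicode and the HTML 4 entity table treat
these letters. The helper name describe() is made up for this example; the
entity names "utrema", "itrema" and "idier" are only the ones proposed in
this thread and are assumed not to be defined anywhere.

```python
import unicodedata
from html.entities import name2codepoint  # HTML 4.01 named entities


def describe(entity_name):
    """Return (code point, Unicode name, NFD parts) for a named entity.

    Returns None when the entity is not in the HTML 4.01 table.
    """
    if entity_name not in name2codepoint:
        return None
    ch = chr(name2codepoint[entity_name])
    # Canonical decomposition: base letter followed by combining marks.
    parts = [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), parts)


# "uuml" and "iuml" are existing entities; the others are the names
# suggested in this thread and are expected to be missing from the table.
for name in ("uuml", "iuml", "utrema", "itrema", "idier"):
    print(name, describe(name))
```

Under these assumptions, the "uml" entities resolve to characters that
Unicode itself names "... WITH DIAERESIS" and decomposes into the base
letter plus a single combining mark, so the umlaut and trema readings share
one code point there.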
 
cheers,
Sander
 
 
PS: I'd go for "&itrema;" instead of "&idier;" as well, since "trema" is
also the term that's used in Dutch.
 
 
  