[whatwg] Entity parsing [trema/diaeresis vs umlaut]

Křištof Želechovski giecrilj at stegny.2a.pl
Thu Jun 28 04:51:13 PDT 2007

I had a look at the reference page you directed me to: it actually states
that the ISO-8859-1 character set can be used for English.  Although my
hypothesis that the word œuvre is not English remains valid (see also the
citations in the appendix), I admit that the absence of the ligature œ from
the character set (and, consequently, the fact that ISO-8859-1 cannot be
used for encoding French text, which I find rather stunning given the
popularity of the French language) provides a much simpler explanation of
the observed phenomenon.  My fault, I should have checked that first.
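
For what it is worth, the point can be checked mechanically.  The following
is only a minimal sketch, assuming Python 3 and its standard iso-8859-1
codec; the helper name in_latin1 is my own and hypothetical.  It reports
which of the letters discussed in this thread can be encoded in ISO-8859-1.

```python
import unicodedata

# Letters discussed in the thread, written as escapes so that the
# script itself stays pure ASCII.
SAMPLES = ["\u0153", "\u0152", "\u00e6", "\u00c6", "\u00e9"]


def in_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO-8859-1.

    in_latin1 is a hypothetical helper name used for illustration only.
    """
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in SAMPLES:
    verdict = "encodable" if in_latin1(ch) else "not encodable"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {verdict}")
```

Under these assumptions the two oe ligatures (U+0153 and U+0152) are
reported as not encodable, while æ, Æ and é are.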
Best regards


Other Wikipedia entries also disagree, e.g.
Borrowings into English from Latin words featuring œ are often spelled with
the letter e, especially in American English. For example, fœderal became
federal in English, while fœtus became fetus only in American English. Other
œs in English spell out as 2 separate letters oe.
The use of the œ and æ is obsolescent in modern English, and has been used
predominantly in British English. It is usually used to evoke archaism, or
in literal quotations of historic sources.
In English, which has imported words from all three languages, it is now
usual to replace Æ/æ with Ae/ae and Œ/œ with Oe/oe.

Microsoft Word does not accept hors d'œuvre but it has no problem with hors
d'oeuvre.  The American English International keyboard does not provide a
way to type the ligature œ.  The Microsoft Encarta dictionary does not
recognize such a spelling, nor does Reference.com.
The word coeur is not mentioned in any English dictionary I know.

-----Original Message-----
From: Oistein E. Andersen [mailto:html5 at xn--istein-9xa.com] 
Sent: Wednesday, June 27, 2007 11:44 PM
To: giecrilj at stegny.2a.pl; whatwg at whatwg.org
Subject: Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

You might want to have a look at
http://pl.wikipedia.org/wiki/ISO_8859-1 .

Afterwards, consider the following:
1) Latin-1 does not contain all the characters that are required
for typesetting of English.
