[whatwg] Internal character encoding declaration, Drop UTF-32, and UTF and BOM terminology

Michael A. Puls II shadow2531 at gmail.com
Sat Jun 23 17:41:05 PDT 2007


> On Sat, 11 Mar 2006, Henri Sivonen wrote:
> > The encoding labels with LE or BE in them mean BOMless variants where
> > the encoding label on the transfer protocol level gives the endianness.
> > See http://www.ietf.org/rfc/rfc2781.txt. When the spec refers to UTF-16
> > with BOM in a particular endianness, I think the spec should use
> > "big-endian UTF-16" and "little-endian UTF-16".
> >
> > Since declaring endianness on the transfer protocol level has no benefit
> > over using the BOM when the label is right and there's a chance to get
> > the label wrong, the encoding labels with explicit endianness are
> > harmful for interchange. In my opinion, the spec should avoid giving
> > authors any bad ideas by reinforcing these labels by repetition.

FWIW, after reading the labeling part of the RFC again and adding your
suggestion, I came up with this:

big-endian UTF-16 = The big-endian encoding of UTF-16 with a BOM (bytes FE FF)
little-endian UTF-16 = The little-endian encoding of UTF-16 with a BOM (bytes FF FE)
UTF-16BE = The big-endian encoding of UTF-16 without a BOM
UTF-16LE = The little-endian encoding of UTF-16 without a BOM
UTF-16 = big-endian UTF-16 or little-endian UTF-16, falling back to UTF-16BE when there is no BOM
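
To make the distinction concrete, here is a rough Python sketch of how a decoder might apply these labels. The function name decode_utf16 is made up for illustration, and the treatment of a leading FE FF under the UTF-16BE label (kept as U+FEFF rather than stripped) is my reading of the RFC, not something the spec draft says.

```python
import codecs


def decode_utf16(data: bytes, label: str = "UTF-16") -> str:
    """Decode bytes according to the label definitions listed above.

    Hypothetical helper for illustration only.
    """
    label = label.upper()
    if label == "UTF-16BE":
        # BOM-less big-endian: the label alone gives the endianness.
        # A leading FE FF is not treated as a BOM and decodes to U+FEFF.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        # BOM-less little-endian, same reasoning as above.
        return data.decode("utf-16-le")
    if label == "UTF-16":
        if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
            return data[2:].decode("utf-16-le")
        # No BOM present: fall back to big-endian.
        return data.decode("utf-16-be")
    raise ValueError(f"unsupported label: {label}")
```

With the plain UTF-16 label the BOM decides the byte order and is dropped from the output; with UTF-16BE or UTF-16LE the same two bytes would be kept as a character, which is one way a wrong label can go unnoticed.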

-- 
Michael


