[whatwg] UTF and BOM terminology
Henri Sivonen
hsivonen at iki.fi
Sun May 27 01:56:29 PDT 2007
"If the encoding is one of UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, or
UTF-32LE, then authors can use a BOM at the start of the file to
indicate the character encoding."
That sentence should read:
"If the encoding is one of UTF-8, UTF-16, or UTF-32, then authors can
use a BOM at the start of the file to indicate the character encoding."
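To illustrate what "use a BOM to indicate the character encoding" means for a consumer, here is a minimal Python sketch of BOM sniffing. The helper name `sniff_bom` is hypothetical; it uses only the BOM constants from the standard `codecs` module. UTF-32 signatures are checked before UTF-16 ones because the little-endian UTF-32 BOM (FF FE 00 00) begins with the little-endian UTF-16 BOM (FF FE).

```python
import codecs

# Hypothetical helper: map a leading byte order mark to (encoding, byte order).
# UTF-32 entries come first: the little-endian UTF-32 BOM (FF FE 00 00) starts
# with the little-endian UTF-16 BOM (FF FE), so checking UTF-16 first would
# misdetect it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32", "big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32", "little-endian"),
    (codecs.BOM_UTF8, "UTF-8", None),
    (codecs.BOM_UTF16_BE, "UTF-16", "big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16", "little-endian"),
]


def sniff_bom(data: bytes):
    """Return (encoding, byte order, BOM length), or None if no BOM is present."""
    for bom, encoding, order in _BOMS:
        if data.startswith(bom):
            return encoding, order, len(bom)
    return None


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbfhello"))   # ('UTF-8', None, 3)
    print(sniff_bom(b"\xff\xfeh\x00"))       # ('UTF-16', 'little-endian', 2)
    print(sniff_bom(b"hello"))               # None
```

Note that the byte order is reported as an attribute of "UTF-16" or "UTF-32" rather than as a separate encoding name, matching the terminology proposed here.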
The encoding labels with LE or BE in them denote BOMless variants in
which the encoding label at the transfer-protocol level gives the
endianness. See http://www.ietf.org/rfc/rfc2781.txt. When the spec
refers to UTF-16 with a BOM in a particular endianness, I think it
should say "big-endian UTF-16" and "little-endian UTF-16".
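Python's built-in codecs happen to draw the same line for the inputs below, which makes the difference between the labels easy to see. Under the label "UTF-16", a leading FE FF is a BOM that selects the byte order and is not part of the text; under "UTF-16BE", the label already fixes the byte order, so the same two bytes are decoded as content (U+FEFF ZERO WIDTH NO-BREAK SPACE), as RFC 2781 describes.

```python
# "hi" in big-endian UTF-16, preceded by the bytes FE FF.
data = b"\xfe\xff\x00h\x00i"

# Label "UTF-16": FE FF is a BOM; it selects big-endian order and is dropped.
print(repr(data.decode("utf-16")))     # 'hi'

# Label "UTF-16BE": the label gives the endianness, so FE FF is ordinary
# content, U+FEFF ZERO WIDTH NO-BREAK SPACE, and stays in the decoded text.
print(repr(data.decode("utf-16-be")))  # '\ufeffhi'
```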
Declaring endianness at the transfer-protocol level has no benefit
over using the BOM when the label is right, and it adds a chance of
getting the label wrong. The encoding labels with explicit endianness
are therefore harmful for interchange. In my opinion, the spec should
avoid giving authors bad ideas by reinforcing these labels through
repetition.
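To make the interchange hazard concrete, here is a small Python sketch; the payload and the wrong label are hypothetical. The bytes are "hi" in little-endian UTF-16 with a BOM. Reading the BOM recovers the text, while trusting a mistaken "UTF-16BE" label turns every code unit into a different character.

```python
import codecs

# Hypothetical payload: "hi" in little-endian UTF-16 with a BOM,
# i.e. the bytes FF FE 68 00 69 00.
payload = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# Letting the BOM decide the byte order recovers the text.
print(repr(payload.decode("utf-16")))     # 'hi'

# Trusting a wrong transport label ("UTF-16BE") swaps every code unit:
# FF FE -> U+FFFE, 68 00 -> U+6800, 69 00 -> U+6900.
garbled = payload.decode("utf-16-be")
print([hex(ord(c)) for c in garbled])     # ['0xfffe', '0x6800', '0x6900']
```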
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/