[html5] r6498 - [e] (0) Clean up how we refer to UTF-16. Fixing http://www.w3.org/Bugs/Public/sh [...]
whatwg at whatwg.org
Wed Aug 17 15:28:04 PDT 2011
Author: ianh
Date: 2011-08-17 15:28:03 -0700 (Wed, 17 Aug 2011)
New Revision: 6498
Modified:
complete.html
index
source
Log:
[e] (0) Clean up how we refer to UTF-16.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=13396
Modified: complete.html
===================================================================
--- complete.html 2011-08-17 22:20:32 UTC (rev 6497)
+++ complete.html 2011-08-17 22:28:03 UTC (rev 6498)
@@ -3343,6 +3343,10 @@
different <meta charset> elements applying in each case.
-->
+ <p>The term <dfn id=a-utf-16-encoding>a UTF-16 encoding</dfn> refers to any variant of
+ UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
+ a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>
+
<p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
@@ -6627,7 +6631,8 @@
component contains no unescaped non-ASCII characters. <a href=#refsRFC3987>[RFC3987]</a></li>
<li><p>The <a href=#url>URL</a> is a valid IRI reference and the <a href="#document's-character-encoding" title="document's character encoding">character encoding</a> of
- the URL's <code><a href=#document>Document</a></code> is UTF-8 or UTF-16. <a href=#refsRFC3987>[RFC3987]</a></li>
+ the URL's <code><a href=#document>Document</a></code> is UTF-8 or <a href=#a-utf-16-encoding>a UTF-16
+ encoding</a>. <a href=#refsRFC3987>[RFC3987]</a></li>
</ul><p>A string is a <dfn id=valid-non-empty-url>valid non-empty URL</dfn> if it is a
<a href=#valid-url>valid URL</a> but it is not the empty string.</p>
@@ -6819,8 +6824,8 @@
</dl></li>
- <li><p>If <var title="">encoding</var> is a UTF-16 encoding, then
- change the value of <var title="">encoding</var> to UTF-8.</li>
+ <li><p>If <var title="">encoding</var> is <a href=#a-utf-16-encoding>a UTF-16
+ encoding</a>, then change the value of <var title="">encoding</var> to UTF-8.</li>
<li>
@@ -84216,9 +84221,8 @@
<li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
step of the overall "two step" algorithm.</li>
- <li><p>If <var title="">charset</var> is a UTF-16 encoding,
- change the value of <var title="">charset</var> to
- UTF-8.</li>
+ <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16
+ encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
<li><p>If <var title="">charset</var> is not a supported
character encoding, then jump to the second step of the
@@ -84650,12 +84654,14 @@
violation</a> of the W3C Character Model specification, motivated
by a desire for compatibility with legacy content. <a href=#refsCHARMOD>[CHARMOD]</a></p>
- <p>When a user agent is to use the UTF-16 encoding but no BOM has
- been found, user agents must default to UTF-16LE.</p>
+ <p>When a user agent is to use the self-describing UTF-16 encoding
+ but no BOM has been found, user agents must default to little-endian
+ UTF-16.</p>
- <p class=note>The requirement to default UTF-16 to LE rather than
- BE is a <a href=#willful-violation>willful violation</a> of RFC 2781, motivated by a
- desire for compatibility with legacy content. <a href=#refsRFC2781>[RFC2781]</a></p>
+ <p class=note>The requirement to default UTF-16 to little-endian
+ rather than big-endian is a <a href=#willful-violation>willful violation</a> of RFC
+ 2781, motivated by a desire for compatibility with legacy content.
+ <a href=#refsRFC2781>[RFC2781]</a></p>
<hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
encodings. <a href=#refsCESU8>[CESU8]</a> <a href=#refsUTF7>[UTF7]</a> <a href=#refsBOCU1>[BOCU1]</a> <a href=#refsSCSU>[SCSU]</a></p>
@@ -84771,13 +84777,13 @@
earlier section failed to find the right encoding.</li>
<li>If the encoding that is already being used to interpret the
- input stream is a UTF-16 encoding, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>
- <li>If the new encoding is a UTF-16 encoding, change it to
- UTF-8.</li>
+ <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
@@ -88176,7 +88182,7 @@
<p id=meta-charset-during-parse>If the element has a <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute, and its value
is either a supported <a href=#ascii-compatible-character-encoding>ASCII-compatible character
- encoding</a> or a UTF-16 encoding, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
+ encoding</a> or <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
<i>tentative</i>, then <a href=#change-the-encoding>change the encoding</a> to the
encoding given by the value of the <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute.</p>
@@ -88186,8 +88192,8 @@
<code title=attr-meta-content><a href=#attr-meta-content>content</a></code> attribute, and
applying the <a href=#algorithm-for-extracting-an-encoding-from-a-meta-element>algorithm for extracting an encoding from a
<code>meta</code> element</a> to that attribute's value returns
- a supported <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a> or a
- UTF-16 encoding, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
+ a supported <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a> or
+ <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
<i>tentative</i>, then <a href=#change-the-encoding>change the encoding</a> to the
extracted encoding.</p>
Modified: index
===================================================================
--- index 2011-08-17 22:20:32 UTC (rev 6497)
+++ index 2011-08-17 22:28:03 UTC (rev 6498)
@@ -3240,6 +3240,10 @@
different <meta charset> elements applying in each case.
-->
+ <p>The term <dfn id=a-utf-16-encoding>a UTF-16 encoding</dfn> refers to any variant of
+ UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
+ a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>
+
<p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
@@ -6491,7 +6495,8 @@
component contains no unescaped non-ASCII characters. <a href=#refsRFC3987>[RFC3987]</a></li>
<li><p>The <a href=#url>URL</a> is a valid IRI reference and the <a href="#document's-character-encoding" title="document's character encoding">character encoding</a> of
- the URL's <code><a href=#document>Document</a></code> is UTF-8 or UTF-16. <a href=#refsRFC3987>[RFC3987]</a></li>
+ the URL's <code><a href=#document>Document</a></code> is UTF-8 or <a href=#a-utf-16-encoding>a UTF-16
+ encoding</a>. <a href=#refsRFC3987>[RFC3987]</a></li>
</ul><p>A string is a <dfn id=valid-non-empty-url>valid non-empty URL</dfn> if it is a
<a href=#valid-url>valid URL</a> but it is not the empty string.</p>
@@ -6683,8 +6688,8 @@
</dl></li>
- <li><p>If <var title="">encoding</var> is a UTF-16 encoding, then
- change the value of <var title="">encoding</var> to UTF-8.</li>
+ <li><p>If <var title="">encoding</var> is <a href=#a-utf-16-encoding>a UTF-16
+ encoding</a>, then change the value of <var title="">encoding</var> to UTF-8.</li>
<li>
@@ -79663,9 +79668,8 @@
<li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
step of the overall "two step" algorithm.</li>
- <li><p>If <var title="">charset</var> is a UTF-16 encoding,
- change the value of <var title="">charset</var> to
- UTF-8.</li>
+ <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16
+ encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
<li><p>If <var title="">charset</var> is not a supported
character encoding, then jump to the second step of the
@@ -80097,12 +80101,14 @@
violation</a> of the W3C Character Model specification, motivated
by a desire for compatibility with legacy content. <a href=#refsCHARMOD>[CHARMOD]</a></p>
- <p>When a user agent is to use the UTF-16 encoding but no BOM has
- been found, user agents must default to UTF-16LE.</p>
+ <p>When a user agent is to use the self-describing UTF-16 encoding
+ but no BOM has been found, user agents must default to little-endian
+ UTF-16.</p>
- <p class=note>The requirement to default UTF-16 to LE rather than
- BE is a <a href=#willful-violation>willful violation</a> of RFC 2781, motivated by a
- desire for compatibility with legacy content. <a href=#refsRFC2781>[RFC2781]</a></p>
+ <p class=note>The requirement to default UTF-16 to little-endian
+ rather than big-endian is a <a href=#willful-violation>willful violation</a> of RFC
+ 2781, motivated by a desire for compatibility with legacy content.
+ <a href=#refsRFC2781>[RFC2781]</a></p>
<hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
encodings. <a href=#refsCESU8>[CESU8]</a> <a href=#refsUTF7>[UTF7]</a> <a href=#refsBOCU1>[BOCU1]</a> <a href=#refsSCSU>[SCSU]</a></p>
@@ -80218,13 +80224,13 @@
earlier section failed to find the right encoding.</li>
<li>If the encoding that is already being used to interpret the
- input stream is a UTF-16 encoding, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>
- <li>If the new encoding is a UTF-16 encoding, change it to
- UTF-8.</li>
+ <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
@@ -83623,7 +83629,7 @@
<p id=meta-charset-during-parse>If the element has a <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute, and its value
is either a supported <a href=#ascii-compatible-character-encoding>ASCII-compatible character
- encoding</a> or a UTF-16 encoding, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
+ encoding</a> or <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
<i>tentative</i>, then <a href=#change-the-encoding>change the encoding</a> to the
encoding given by the value of the <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute.</p>
@@ -83633,8 +83639,8 @@
<code title=attr-meta-content><a href=#attr-meta-content>content</a></code> attribute, and
applying the <a href=#algorithm-for-extracting-an-encoding-from-a-meta-element>algorithm for extracting an encoding from a
<code>meta</code> element</a> to that attribute's value returns
- a supported <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a> or a
- UTF-16 encoding, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
+ a supported <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a> or
+ <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is currently
<i>tentative</i>, then <a href=#change-the-encoding>change the encoding</a> to the
extracted encoding.</p>
Modified: source
===================================================================
--- source 2011-08-17 22:20:32 UTC (rev 6497)
+++ source 2011-08-17 22:28:03 UTC (rev 6498)
@@ -2202,6 +2202,11 @@
different <meta charset> elements applying in each case.
-->
+ <p>The term <dfn>a UTF-16 encoding</dfn> refers to any variant of
+ UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
+ a BOM, raw UTF-16LE, and raw UTF-16BE. <a
+ href="#refsRFC2781">[RFC2781]</a></p>
+
<p>The term <dfn>Unicode character</dfn> is used to mean a <i
title="">Unicode scalar value</i> (i.e. any Unicode code point that
is not a surrogate code point). <a
@@ -6212,8 +6217,8 @@
<li><p>The <span>URL</span> is a valid IRI reference and the <span
title="document's character encoding">character encoding</span> of
- the URL's <code>Document</code> is UTF-8 or UTF-16. <a
- href="#refsRFC3987">[RFC3987]</a></p></li>
+ the URL's <code>Document</code> is UTF-8 or <span>a UTF-16
+ encoding</span>. <a href="#refsRFC3987">[RFC3987]</a></p></li>
</ul>
@@ -6435,8 +6440,9 @@
</li>
- <li><p>If <var title="">encoding</var> is a UTF-16 encoding, then
- change the value of <var title="">encoding</var> to UTF-8.</p></li>
+ <li><p>If <var title="">encoding</var> is <span>a UTF-16
+ encoding</span>, then change the value of <var
+ title="">encoding</var> to UTF-8.</p></li>
<li>
@@ -95332,9 +95338,9 @@
title="">got pragma</var> is false, then jump to the second
step of the overall "two step" algorithm.</p></li>
- <li><p>If <var title="">charset</var> is a UTF-16 encoding,
- change the value of <var title="">charset</var> to
- UTF-8.</p></li>
+ <li><p>If <var title="">charset</var> is <span>a UTF-16
+ encoding</span>, change the value of <var
+ title="">charset</var> to UTF-8.</p></li>
<li><p>If <var title="">charset</var> is not a supported
character encoding, then jump to the second step of the
@@ -95876,13 +95882,14 @@
by a desire for compatibility with legacy content. <a
href="#refsCHARMOD">[CHARMOD]</a></p>
- <p>When a user agent is to use the UTF-16 encoding but no BOM has
- been found, user agents must default to UTF-16LE.</p>
+ <p>When a user agent is to use the self-describing UTF-16 encoding
+ but no BOM has been found, user agents must default to little-endian
+ UTF-16.</p>
- <p class="note">The requirement to default UTF-16 to LE rather than
- BE is a <span>willful violation</span> of RFC 2781, motivated by a
- desire for compatibility with legacy content. <a
- href="#refsRFC2781">[RFC2781]</a></p>
+ <p class="note">The requirement to default UTF-16 to little-endian
+ rather than big-endian is a <span>willful violation</span> of RFC
+ 2781, motivated by a desire for compatibility with legacy content.
+ <a href="#refsRFC2781">[RFC2781]</a></p>
<hr>
@@ -96006,14 +96013,14 @@
earlier section failed to find the right encoding.</li>
<li>If the encoding that is already being used to interpret the
- input stream is a UTF-16 encoding, then set the <span
+ input stream is <span>a UTF-16 encoding</span>, then set the <span
title="concept-encoding-confidence">confidence</span> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>
- <li>If the new encoding is a UTF-16 encoding, change it to
- UTF-8.</li>
+ <li>If the new encoding is <span>a UTF-16 encoding</span>, change
+ it to UTF-8.</li>
<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
@@ -99925,7 +99932,7 @@
<p id="meta-charset-during-parse">If the element has a <code
title="attr-meta-charset">charset</code> attribute, and its value
is either a supported <span>ASCII-compatible character
- encoding</span> or a UTF-16 encoding, and the <span
+ encoding</span> or <span>a UTF-16 encoding</span>, and the <span
title="concept-encoding-confidence">confidence</span> is currently
<i>tentative</i>, then <span>change the encoding</span> to the
encoding given by the value of the <code
@@ -99938,8 +99945,8 @@
<code title="attr-meta-content">content</code> attribute, and
applying the <span>algorithm for extracting an encoding from a
<code>meta</code> element</span> to that attribute's value returns
- a supported <span>ASCII-compatible character encoding</span> or a
- UTF-16 encoding, and the <span
+ a supported <span>ASCII-compatible character encoding</span> or
+ <span>a UTF-16 encoding</span>, and the <span
title="concept-encoding-confidence">confidence</span> is currently
<i>tentative</i>, then <span>change the encoding</span> to the
extracted encoding.</p>
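
Illustrative note (not part of the commit): the rules touched by this revision can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the specification's algorithm. The label spellings in UTF16_LABELS and the helper names is_utf16_encoding, encoding_from_meta and decode_self_describing_utf16 are hypothetical and chosen only for illustration. It shows (1) a predicate for "a UTF-16 encoding", (2) the replacement of a meta-declared UTF-16 encoding with UTF-8, and (3) the little-endian default when no BOM is found.

```python
import codecs

# Labels treated as "a UTF-16 encoding" in this sketch (assumed spellings).
UTF16_LABELS = {"utf-16", "utf-16le", "utf-16be"}


def is_utf16_encoding(label: str) -> bool:
    """Return True if the label names any UTF-16 variant."""
    return label.strip().lower() in UTF16_LABELS


def encoding_from_meta(label: str) -> str:
    """If the declared charset is a UTF-16 encoding, change it to UTF-8."""
    normalized = label.strip().lower()
    return "utf-8" if is_utf16_encoding(normalized) else normalized


def decode_self_describing_utf16(data: bytes) -> str:
    """Decode UTF-16, honouring a BOM and defaulting to little-endian."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    # No BOM found: default to little-endian, as the revised text requires.
    return data.decode("utf-16-le")
```

The explicit "utf-16-le" and "utf-16-be" codecs are used instead of Python's generic "utf-16" codec so that the no-BOM fallback does not depend on the byte order of the machine running the sketch.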