[html5] r1701 - /
whatwg at whatwg.org
Sat May 24 03:27:58 PDT 2008
Author: ianh
Date: 2008-05-24 03:27:58 -0700 (Sat, 24 May 2008)
New Revision: 1701
Modified:
index
source
Log:
[ct] (0) Shun UTF-32. Make it slightly clearer what 'UTF-16' means.
Modified: index
===================================================================
--- index 2008-05-24 10:20:43 UTC (rev 1700)
+++ index 2008-05-24 10:27:58 UTC (rev 1701)
@@ -33173,17 +33173,20 @@
<tr>
<td>FE FF
- <td>UTF-16BE BOM <!-- followed by a character --> or UTF-32LE BOM
+ <td>UTF-16BE BOM
+ <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
+
<tr>
<td>FF FE
<td>UTF-16LE BOM <!-- followed by a character -->
-
+ <!-- nobody uses this
<tr>
<td>00 00 FE FF
-
- <td>UTF-32BE BOM <!-- this one is redundant with the one above
+ <td>UTF-32BE BOM
+-->
+ <!-- this one is redundant with the one above
<tr>
<td>FF FE 00 00
<td>UTF-32LE BOM
@@ -33205,8 +33208,6 @@
<p>...then the sniffed type of the resource is "text/plain".</p>
- <p class=big-issue>Should we remove UTF-32 from the above?</p>
-
<li>
<p>Otherwise, if any of the first <var title="">n</var> bytes of the
resource are in one of the following byte ranges:</p>
@@ -42216,6 +42217,10 @@
<p>Support for UTF-32 is not recommended. This encoding is rarely used, and
frequently misimplemented.
+ <p class=note>This specification does not make any attempt to support
+ UTF-32 in its algorithms; support and use of UTF-32 can thus lead to
+ unexpected behavior in implementations of this specification.
+
<h5 id=preprocessing><span class=secno>8.2.2.3. </span>Preprocessing the
input stream</h5>
@@ -42298,7 +42303,7 @@
actual encoding of the file.
<ol>
- <li>If the new encoding is UTF-16, change it to UTF-8.
+ <li>If the new encoding is a UTF-16 encoding, change it to UTF-8.
<li>If the new encoding is identical or equivalent to the encoding that is
already being used to interpret the input stream, then set the <a
Modified: source
===================================================================
--- source 2008-05-24 10:20:43 UTC (rev 1700)
+++ source 2008-05-24 10:27:58 UTC (rev 1701)
@@ -31031,13 +31031,15 @@
<tbody>
<tr>
<td>FE FF
- <td>UTF-16BE BOM <!-- followed by a character --> or UTF-32LE BOM
+ <td>UTF-16BE BOM <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
<tr>
<td>FF FE
<td>UTF-16LE BOM <!-- followed by a character -->
+<!-- nobody uses this
<tr>
<td>00 00 FE FF
<td>UTF-32BE BOM
+-->
<!-- this one is redundant with the one above
<tr>
<td>FF FE 00 00
@@ -31055,8 +31057,6 @@
<p>...then the sniffed type of the resource is "text/plain".</p>
- <p class="big-issue">Should we remove UTF-32 from the above?</p>
-
</li>
<li><p>Otherwise, if any of the first <var title="">n</var> bytes
@@ -39803,8 +39803,13 @@
<p>Support for UTF-32 is not recommended. This encoding is rarely
used, and frequently misimplemented.</p>
+ <p class="note">This specification does not make any attempt to
+ support UTF-32 in its algorithms; support and use of UTF-32 can thus
+ lead to unexpected behavior in implementations of this
+ specification.</p>
+
<h5>Preprocessing the input stream</h5>
<p>Given an encoding, the bytes in the input stream must be
@@ -39886,7 +39891,8 @@
<ol>
- <li>If the new encoding is UTF-16, change it to UTF-8.</li>
+ <li>If the new encoding is a UTF-16 encoding, change it to
+ UTF-8.</li>
<li>If the new encoding is identical or equivalent to the encoding
that is already being used to interpret the input stream, then set
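
For readers who want to see the effect of this revision in executable form, here is a minimal sketch, not the specification's algorithm. It covers only the two BOM rows left active in the hunks above (FE FF and FF FE) and the revised list item that maps a UTF-16 encoding to UTF-8. The function names, the table name, and the set of labels treated as "a UTF-16 encoding" are assumptions made for illustration; none of them appear in the specification text.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# taken from the specification.

# The two BOM rows that remain active after r1701 (UTF-32 rows are
# commented out in the spec source).
UTF16_BOMS = (
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)

# Assumption: these labels are what "a UTF-16 encoding" covers.
UTF16_LABELS = {"UTF-16", "UTF-16BE", "UTF-16LE"}


def sniff_utf16_bom(data: bytes):
    """Return the UTF-16 variant named by a leading BOM, or None."""
    for bom, name in UTF16_BOMS:
        if data.startswith(bom):
            return name
    return None


def adjust_new_encoding(name: str) -> str:
    """Map any UTF-16 encoding to UTF-8; leave other names unchanged."""
    if name.upper() in UTF16_LABELS:
        return "UTF-8"
    return name
```

Because the UTF-32 rows are no longer consulted, input that begins with FF FE 00 00 is reported as UTF-16LE in this sketch, and input that begins with 00 00 FE FF matches nothing.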