[html5] r1701 - /

whatwg at whatwg.org
Sat May 24 03:27:58 PDT 2008


Author: ianh
Date: 2008-05-24 03:27:58 -0700 (Sat, 24 May 2008)
New Revision: 1701

Modified:
   index
   source
Log:
[ct] (0) Shun UTF-32. Make it slightly clearer what 'UTF-16' means.

Modified: index
===================================================================
--- index	2008-05-24 10:20:43 UTC (rev 1700)
+++ index	2008-05-24 10:27:58 UTC (rev 1701)
@@ -33173,17 +33173,20 @@
       <tr>
        <td>FE FF
 
-       <td>UTF-16BE BOM <!-- followed by a character --> or UTF-32LE BOM
+       <td>UTF-16BE BOM
+        <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
+        
 
       <tr>
        <td>FF FE
 
        <td>UTF-16LE BOM <!-- followed by a character -->
-
+        <!-- nobody uses this
       <tr>
        <td>00 00 FE FF
-
-       <td>UTF-32BE BOM <!-- this one is redundant with the one above
+       <td>UTF-32BE BOM
+-->
+        <!-- this one is redundant with the one above
       <tr>
        <td>FF FE 00 00
        <td>UTF-32LE BOM
@@ -33205,8 +33208,6 @@
 
     <p>...then the sniffed type of the resource is "text/plain".</p>
 
-    <p class=big-issue>Should we remove UTF-32 from the above?</p>
-
    <li>
     <p>Otherwise, if any of the first <var title="">n</var> bytes of the
      resource are in one of the following byte ranges:</p>
@@ -42216,6 +42217,10 @@
   <p>Support for UTF-32 is not recommended. This encoding is rarely used, and
    frequently misimplemented.
 
+  <p class=note>This specification does not make any attempt to support
+   UTF-32 in its algorithms; support and use of UTF-32 can thus lead to
+   unexpected behavior in implementations of this specification.
+
   <h5 id=preprocessing><span class=secno>8.2.2.3. </span>Preprocessing the
    input stream</h5>
 
@@ -42298,7 +42303,7 @@
    actual encoding of the file.
 
   <ol>
-   <li>If the new encoding is UTF-16, change it to UTF-8.
+   <li>If the new encoding is a UTF-16 encoding, change it to UTF-8.
 
    <li>If the new encoding is identical or equivalent to the encoding that is
     already being used to interpret the input stream, then set the <a

Modified: source
===================================================================
--- source	2008-05-24 10:20:43 UTC (rev 1700)
+++ source	2008-05-24 10:27:58 UTC (rev 1701)
@@ -31031,13 +31031,15 @@
      <tbody>
       <tr>
        <td>FE FF
-       <td>UTF-16BE BOM <!-- followed by a character --> or UTF-32LE BOM
+       <td>UTF-16BE BOM <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
       <tr>
        <td>FF FE
        <td>UTF-16LE BOM <!-- followed by a character -->
+<!-- nobody uses this
       <tr>
        <td>00 00 FE FF
        <td>UTF-32BE BOM
+-->
 <!-- this one is redundant with the one above
       <tr>
        <td>FF FE 00 00
@@ -31055,8 +31057,6 @@
 
     <p>...then the sniffed type of the resource is "text/plain".</p>
 
-    <p class="big-issue">Should we remove UTF-32 from the above?</p>
-
    </li>
 
    <li><p>Otherwise, if any of the first <var title="">n</var> bytes
@@ -39803,8 +39803,13 @@
   <p>Support for UTF-32 is not recommended. This encoding is rarely
   used, and frequently misimplemented.</p>
 
+  <p class="note">This specification does not make any attempt to
+  support UTF-32 in its algorithms; support and use of UTF-32 can thus
+  lead to unexpected behavior in implementations of this
+  specification.</p>
 
 
+
   <h5>Preprocessing the input stream</h5>
 
   <p>Given an encoding, the bytes in the input stream must be
@@ -39886,7 +39891,8 @@
 
   <ol>
 
-   <li>If the new encoding is UTF-16, change it to UTF-8.</li>
+   <li>If the new encoding is a UTF-16 encoding, change it to
+   UTF-8.</li>
 
    <li>If the new encoding is identical or equivalent to the encoding
    that is already being used to interpret the input stream, then set

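For readers skimming the diff, here is a minimal sketch of the byte-order-mark rows that stay active in the sniffing table after r1701, together with the revised "change the encoding" step. It is an illustration under stated assumptions, not the specification's algorithm: the names `BOM_TABLE`, `sniff_bom`, `UTF16_FAMILY`, and `adjust_new_encoding` are hypothetical, only the two rows visible in this diff are modelled, and reading "a UTF-16 encoding" as the labels UTF-16, UTF-16BE, and UTF-16LE is an assumption.

```python
from typing import Optional

# Rows still active after r1701 (only those visible in this diff).
# The UTF-32 rows are commented out in the spec source, so they are omitted.
BOM_TABLE = (
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)

# Assumption: "a UTF-16 encoding" covers these three labels.
UTF16_FAMILY = {"UTF-16", "UTF-16BE", "UTF-16LE"}


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the label of the first matching BOM row, or None."""
    for bom, label in BOM_TABLE:
        if data.startswith(bom):
            return label
    return None


def adjust_new_encoding(label: str) -> str:
    """Sketch of: 'If the new encoding is a UTF-16 encoding, change it to UTF-8.'"""
    if label.upper() in UTF16_FAMILY:
        return "UTF-8"
    return label
```

With the UTF-32 rows gone, a stream starting FF FE 00 00 matches the UTF-16LE row in this sketch, and one starting 00 00 FE FF matches neither row.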


