[html5] r1274 - /
whatwg at whatwg.org
Thu Feb 28 15:26:36 PST 2008
Author: ianh
Date: 2008-02-28 15:26:32 -0800 (Thu, 28 Feb 2008)
New Revision: 1274
Modified:
index
source
Log:
[e] (0) remove 'BOM' from the table of encoding names. add a note saying that encoding errors are still errors.
Modified: index
===================================================================
--- index 2008-02-28 22:52:47 UTC (rev 1273)
+++ index 2008-02-28 23:26:32 UTC (rev 1274)
@@ -38012,31 +38012,31 @@
<tr>
<th>Bytes in Hexadecimal
- <th>Description
+ <th>Encoding
<tbody><!-- nobody uses this
<tr>
<td>00 00 FE FF
- <td>UTF-32BE BOM
+ <td>UTF-32BE
<tr>
<td>FF FE 00 00
- <td>UTF-32LE BOM
+ <td>UTF-32LE
-->
<tr>
<td>FE FF
- <td>UTF-16BE BOM
+ <td>UTF-16BE
<tr>
<td>FF FE
- <td>UTF-16LE BOM
+ <td>UTF-16LE
<tr>
<td>EF BB BF
- <td>UTF-8 BOM <!-- nobody uses this
+ <td>UTF-8 <!-- nobody uses this
<tr>
<td>DD 73 66 73
<td>UTF-EBCDIC
@@ -38044,6 +38044,8 @@
</table>
+ <p class=note>This step looks for Unicode Byte Order Marks (BOMs).
+
<li>
<p>Otherwise, the user agent will have to search for explicit character
encoding information in the file itself. This should proceed as follows:
@@ -38421,6 +38423,11 @@
be converted to Unicode characters must be converted to U+FFFD REPLACEMENT
CHARACTER code points.
+ <p class=note>Bytes or sequences of bytes in the original byte stream that
+ did not conform to the encoding specification (e.g. invalid UTF-8 byte
+ sequences in a UTF-8 input stream) are errors that conformance checkers
+ are expected to report.
+
<p>One leading U+FEFF BYTE ORDER MARK character must be ignored if any are
present.
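The decoding rules touched by the hunk above (replace undecodable bytes with U+FFFD, still treat them as errors, then ignore one leading U+FEFF) can be sketched roughly as follows. This is an illustration only, not the spec's algorithm: the function name decode_with_report and the (start, end) error tuples are hypothetical, and emitting exactly one U+FFFD per invalid span reported by Python's codec is an assumption of this sketch.

```python
def decode_with_report(data: bytes, encoding: str = "utf-8"):
    """Decode `data`, substituting U+FFFD for bytes that cannot be decoded.

    Returns (text, errors), where errors is a list of (start, end) byte
    offsets that a conformance checker could report.
    Hypothetical helper; not taken from the spec.
    """
    pieces = []
    errors = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises at the first invalid sequence.
            pieces.append(data[pos:].decode(encoding))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets relative to the slice.
            pieces.append(data[pos:pos + exc.start].decode(encoding))
            pieces.append("\ufffd")
            errors.append((pos + exc.start, pos + exc.end))
            pos += exc.end
    text = "".join(pieces)
    # Ignore one leading U+FEFF BYTE ORDER MARK, and only one.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text, errors
```

The replacement character keeps the parser going, while the separate error list reflects the note that such bytes remain errors for conformance checkers.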
Modified: source
===================================================================
--- source 2008-02-28 22:52:47 UTC (rev 1273)
+++ source 2008-02-28 23:26:32 UTC (rev 1274)
@@ -35544,25 +35544,25 @@
<thead>
<tr>
<th>Bytes in Hexadecimal
- <th>Description
+ <th>Encoding
<tbody>
<!-- nobody uses this
<tr>
<td>00 00 FE FF
- <td>UTF-32BE BOM
+ <td>UTF-32BE
<tr>
<td>FF FE 00 00
- <td>UTF-32LE BOM
+ <td>UTF-32LE
-->
<tr>
<td>FE FF
- <td>UTF-16BE BOM
+ <td>UTF-16BE
<tr>
<td>FF FE
- <td>UTF-16LE BOM
+ <td>UTF-16LE
<tr>
<td>EF BB BF
- <td>UTF-8 BOM
+ <td>UTF-8
<!-- nobody uses this
<tr>
<td>DD 73 66 73
@@ -35570,6 +35570,9 @@
-->
</table>
+ <p class="note">This step looks for Unicode Byte Order Marks
+ (BOMs).</p></li>
+
<li><p>Otherwise, the user agent will have to search for explicit
character encoding information in the file itself. This should
proceed as follows:
@@ -35979,6 +35982,11 @@
could not be converted to Unicode characters must be converted to
U+FFFD REPLACEMENT CHARACTER code points.</p>
+ <p class="note">Bytes or sequences of bytes in the original byte
+ stream that did not conform to the encoding specification
+ (e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are
+ errors that conformance checkers are expected to report.</p>
+
<p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
any are present.</p>
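As a rough illustration of the BOM table edited in this revision, the lookup might be sketched like this. The names BOM_TABLE and sniff_bom are hypothetical, and only the rows that are not commented out in the spec source (UTF-16BE, UTF-16LE, UTF-8) are included; this is a sketch under those assumptions, not the spec's wording.

```python
# Active rows of the table: leading bytes -> encoding name.
# The UTF-32 and UTF-EBCDIC rows are commented out in the spec source,
# so this sketch leaves them out as well.
BOM_TABLE = [
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none.

    Hypothetical helper for illustration only.
    """
    for prefix, encoding in BOM_TABLE:
        if data.startswith(prefix):
            return encoding
    return None
```

With the UTF-32 rows left out, a stream starting FF FE 00 00 matches the UTF-16LE row. When sniff_bom returns None, the user agent falls through to the next step and searches the file for explicit encoding information.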