[html5] r967 - /

whatwg at whatwg.org
Thu Jun 28 14:11:36 PDT 2007


Author: ianh
Date: 2007-06-28 14:11:33 -0700 (Thu, 28 Jun 2007)
New Revision: 967

Modified:
   index
   source
Log:
[e] (2) Clarify who is in charge of dropping BOMs. Hint: it's not the air force.

Modified: index
===================================================================
--- index	2007-06-28 08:09:07 UTC (rev 966)
+++ index	2007-06-28 21:11:33 UTC (rev 967)
@@ -30884,7 +30884,8 @@
   <p>Bytes or sequences of bytes that are not valid UTF-8 sequences must be
    interpreted as the U+FFFD REPLACEMENT CHARACTER.
 
-  <p>A leading U+FEFF BYTE ORDER MARK character must be ignored if present.
+  <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if any are
+   present.
 
   <p>The stream must then be parsed by reading everything line by line, in
    blocks separated by blank lines. Comment lines (those starting with the
@@ -33494,13 +33495,15 @@
 
   <p>Given an encoding, the bytes in the input stream must be converted to
    Unicode characters for the tokeniser, as described by the rules for that
-   encoding.
+   encoding, except that leading U+FEFF BYTE ORDER MARK characters must not
+   be stripped by the encoding layer.
 
   <p>Bytes or sequences of bytes in the original byte stream that could not
    be converted to Unicode characters must be converted to U+FFFD REPLACEMENT
    CHARACTER code points.
 
-  <p>A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if present.
+  <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if any are
+   present.
 
   <p>All U+0000 NULL characters in the input must be replaced by U+FFFD
    REPLACEMENT CHARACTERs. Any occurrences of such characters is a <a

Modified: source
===================================================================
--- source	2007-06-28 08:09:07 UTC (rev 966)
+++ source	2007-06-28 21:11:33 UTC (rev 967)
@@ -28318,8 +28318,8 @@
   <p>Bytes or sequences of bytes that are not valid UTF-8 sequences
   must be interpreted as the U+FFFD REPLACEMENT CHARACTER.</p>
 
-  <p>A leading U+FEFF BYTE ORDER MARK character must be ignored if
-  present.</p>
+  <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
+  any are present.</p>
 
   <p>The stream must then be parsed by reading everything line by
   line, in blocks separated by blank lines. Comment lines (those
@@ -31002,14 +31002,15 @@
 
   <p>Given an encoding, the bytes in the input stream must be
   converted to Unicode characters for the tokeniser, as described by
-  the rules for that encoding.</p>
+  the rules for that encoding, except that leading U+FEFF BYTE ORDER
+  MARK characters must not be stripped by the encoding layer.</p>
 
   <p>Bytes or sequences of bytes in the original byte stream that
   could not be converted to Unicode characters must be converted to
   U+FFFD REPLACEMENT CHARACTER code points.</p>
 
-  <p>A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if
-  present.</p>
+  <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
+  any are present.</p>
 
   <p>All U+0000 NULL characters in the input must be replaced by
   U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
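
Illustrative note, not part of the commit: the input-stream steps touched by this revision could be sketched roughly as below. This is only a minimal sketch under stated assumptions. The function name preprocess_input_stream and the constants are hypothetical, and Python's plain "utf-8" codec stands in for an encoding layer that leaves leading U+FEFF characters in place (codecs such as "utf-8-sig" or "utf-16" strip the BOM themselves and would not satisfy that assumption).

```python
REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD
BYTE_ORDER_MARK = "\ufeff"        # U+FEFF


def preprocess_input_stream(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper sketching the steps described in the diff."""
    # Assumption: the chosen codec does not strip a leading BOM itself.
    # Undecodable bytes become U+FFFD via errors="replace".
    text = data.decode(encoding, errors="replace")

    # Ignore exactly one leading U+FEFF if any are present;
    # a second one, if there is one, stays in the stream.
    if text.startswith(BYTE_ORDER_MARK):
        text = text[1:]

    # Replace every U+0000 NULL character with U+FFFD.
    return text.replace("\x00", REPLACEMENT_CHARACTER)
```

With this sketch, an input that begins with two BOMs keeps the second one, which is the behaviour the reworded "One leading U+FEFF ... if any are present" sentence appears to be pinning down.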



