[html5] r967 - /
whatwg at whatwg.org
Thu Jun 28 14:11:36 PDT 2007
Author: ianh
Date: 2007-06-28 14:11:33 -0700 (Thu, 28 Jun 2007)
New Revision: 967
Modified:
index
source
Log:
[e] (2) Clarify who is in charge of dropping BOMs. Hint: it's not the air force.
Modified: index
===================================================================
--- index 2007-06-28 08:09:07 UTC (rev 966)
+++ index 2007-06-28 21:11:33 UTC (rev 967)
@@ -30884,7 +30884,8 @@
<p>Bytes or sequences of bytes that are not valid UTF-8 sequences must be
interpreted as the U+FFFD REPLACEMENT CHARACTER.
- <p>A leading U+FEFF BYTE ORDER MARK character must be ignored if present.
+ <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if any are
+ present.
<p>The stream must then be parsed by reading everything line by line, in
blocks separated by blank lines. Comment lines (those starting with the
@@ -33494,13 +33495,15 @@
<p>Given an encoding, the bytes in the input stream must be converted to
Unicode characters for the tokeniser, as described by the rules for that
- encoding.
+ encoding, except that leading U+FEFF BYTE ORDER MARK characters must not
+ be stripped by the encoding layer.
<p>Bytes or sequences of bytes in the original byte stream that could not
be converted to Unicode characters must be converted to U+FFFD REPLACEMENT
CHARACTER code points.
- <p>A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if present.
+ <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if any are
+ present.
<p>All U+0000 NULL characters in the input must be replaced by U+FFFD
REPLACEMENT CHARACTERs. Any occurrences of such characters is a <a
Modified: source
===================================================================
--- source 2007-06-28 08:09:07 UTC (rev 966)
+++ source 2007-06-28 21:11:33 UTC (rev 967)
@@ -28318,8 +28318,8 @@
<p>Bytes or sequences of bytes that are not valid UTF-8 sequences
must be interpreted as the U+FFFD REPLACEMENT CHARACTER.</p>
- <p>A leading U+FEFF BYTE ORDER MARK character must be ignored if
- present.</p>
+ <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
+ any are present.</p>
<p>The stream must then be parsed by reading everything line by
line, in blocks separated by blank lines. Comment lines (those
@@ -31002,14 +31002,15 @@
<p>Given an encoding, the bytes in the input stream must be
converted to Unicode characters for the tokeniser, as described by
- the rules for that encoding.</p>
+ the rules for that encoding, except that leading U+FEFF BYTE ORDER
+ MARK characters must not be stripped by the encoding layer.</p>
<p>Bytes or sequences of bytes in the original byte stream that
could not be converted to Unicode characters must be converted to
U+FFFD REPLACEMENT CHARACTER code points.</p>
- <p>A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if
- present.</p>
+ <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
+ any are present.</p>
<p>All U+0000 NULL characters in the input must be replaced by
U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
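For readers following the change, the ordering described in the hunks above can be sketched roughly as follows. This is an illustration only, not code from the specification: the function name decode_utf8_stream is hypothetical, and it assumes UTF-8 input, relying on the fact that Python's plain "utf-8" codec leaves a leading U+FEFF in the decoded text (unlike "utf-8-sig", which strips it at the encoding layer).

```python
BOM = "\ufeff"          # U+FEFF BYTE ORDER MARK
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf8_stream(data: bytes) -> str:
    """Hypothetical helper sketching the decode steps for UTF-8 input."""
    # Step 1: decode without BOM stripping at the encoding layer.
    # errors="replace" maps undecodable bytes to U+FFFD.
    text = data.decode("utf-8", errors="replace")

    # Step 2: ignore exactly one leading U+FEFF, if any are present.
    # A second BOM, if there is one, stays in the stream.
    if text.startswith(BOM):
        text = text[1:]

    # Step 3: replace every U+0000 NULL with U+FFFD.
    return text.replace("\x00", REPLACEMENT)
```

Doing the BOM handling after decoding, rather than inside the codec, is what keeps the count at exactly one: if the codec also stripped a BOM, an input with two leading BOMs would lose both.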