[html5] r5666 - [giow] (2) Parser: don't convert 0000 to FFFD in the input stream processor, ins [...]

whatwg at whatwg.org
Mon Nov 1 19:08:54 PDT 2010


Author: ianh
Date: 2010-11-01 19:08:52 -0700 (Mon, 01 Nov 2010)
New Revision: 5666

Modified:
   complete.html
   index
   source
Log:
[giow] (2) Parser: don't convert 0000 to FFFD in the input stream processor, instead do it (mostly) in the tokenizer, so that we can instead swallow 0000s in body.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
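
The tokenizer hunks below follow one pattern. In the data state, a U+0000 is a parse error but is still emitted unchanged, so that the tree constructor can decide what to do with it. In the text-only states (RCDATA, RAWTEXT, script data, PLAINTEXT), it is a parse error and is emitted as U+FFFD. The following is a minimal Python sketch of that split. It is not the spec's tokenizer: it models only those five states, and it ignores "<", "&", and state switches. The function and state names are hypothetical.

```python
REPLACEMENT = "\ufffd"

# Hypothetical state labels; only text-content states are modelled.
PASS_THROUGH_STATES = {"data"}
REPLACING_STATES = {"rcdata", "rawtext", "script data", "plaintext"}


def tokenize_text(text: str, state: str) -> tuple[list[str], int]:
    """Return (character tokens, parse error count) for text consumed in one state.

    Sketch only: markup characters and state transitions are not handled.
    """
    if state not in PASS_THROUGH_STATES | REPLACING_STATES:
        raise ValueError(f"state not modelled: {state}")
    tokens: list[str] = []
    errors = 0
    for ch in text:
        if ch == "\x00":
            errors += 1  # U+0000 is a parse error in every modelled state
            if state in PASS_THROUGH_STATES:
                tokens.append(ch)  # left as-is for the tree constructor
            else:
                tokens.append(REPLACEMENT)  # replaced in the tokenizer itself
        else:
            tokens.append(ch)
    return tokens, errors
```

For example, `tokenize_text("a\x00b", "data")` keeps the NULL, and `tokenize_text("a\x00b", "rcdata")` yields U+FFFD in its place. Both calls report one parse error. The tag-name, attribute, comment, and DOCTYPE hunks apply the same replacement when appending to a token's fields rather than when emitting character tokens.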

Modified: complete.html
===================================================================
--- complete.html	2010-11-02 01:06:04 UTC (rev 5665)
+++ complete.html	2010-11-02 02:08:52 UTC (rev 5666)
@@ -75503,12 +75503,12 @@
   motivated by a desire to increase the resilience of user agents in
   the face of naïve transcoders.</p>
 
-  <p>All U+0000 NULL characters and code points in the range U+D800 to
-  U+DFFF<!-- surrogates not allowed e.g. in UTF-8, and we don't want
-  them to suddenly turn into codepoints when they go through a UTF-16
-  pipe --> in the input must be replaced by U+FFFD REPLACEMENT
-  CHARACTERs. Any occurrences of such characters and code points are
-  <a href=#parse-error title="parse error">parse errors</a>.</p>
+  <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
+  allowed e.g. in UTF-8, and we don't want them to suddenly turn into
+  codepoints when they go through a UTF-16 pipe --> in the input must
+  be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
+  such characters and code points are <a href=#parse-error title="parse error">parse
+  errors</a>.</p>
 
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
   <!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
@@ -76147,6 +76147,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#tag-open-state>tag open state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit the <a href=#current-input-character>current input
+   character</a> as a character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -76178,6 +76182,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#rcdata-less-than-sign-state>RCDATA less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -76205,6 +76213,10 @@
   <dl class=switch><dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#rawtext-less-than-sign-state>RAWTEXT less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -76219,6 +76231,10 @@
   <dl class=switch><dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#script-data-less-than-sign-state>script data less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -76230,7 +76246,11 @@
 
   <p>Consume the <a href=#next-input-character>next input character</a>:</p>
 
-  <dl class=switch><dt>EOF</dt>
+  <dl class=switch><dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
+   <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
    <dt>Anything else</dt>
@@ -76322,6 +76342,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current tag token's tag name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current tag token's tag name.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -76628,6 +76652,10 @@
    <dd><p>Switch to the <a href=#script-data-escaped-less-than-sign-state>script data escaped less-than sign
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -76648,6 +76676,11 @@
    <dd><p>Switch to the <a href=#script-data-escaped-less-than-sign-state>script data escaped less-than sign
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-escaped-state>script data
+   escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -76671,6 +76704,11 @@
    <dd>Switch to the <a href=#script-data-state>script data state</a>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-escaped-state>script data
+   escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -76821,6 +76859,10 @@
    sign state</a>. Emit a U+003C LESS-THAN SIGN character
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -76842,6 +76884,11 @@
    sign state</a>. Emit a U+003C LESS-THAN SIGN character
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-double-escaped-state>script data
+   double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -76867,6 +76914,11 @@
    <dd>Switch to the <a href=#script-data-state>script data state</a>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-double-escaped-state>script data
+   double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -76945,6 +76997,12 @@
    value to the empty string. Switch to the <a href=#attribute-name-state>attribute name
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <a href=#attribute-name-state>attribute name state</a>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -76958,8 +77016,8 @@
 
    <dt>Anything else</dt>
    <dd>Start a new attribute in the current tag token. Set that
-   attribute's name to the <a href=#current-input-character>current input character</a>, and its value to
-   the empty string. Switch to the <a href=#attribute-name-state>attribute name
+   attribute's name to the <a href=#current-input-character>current input character</a>, and
+   its value to the empty string. Switch to the <a href=#attribute-name-state>attribute name
    state</a>.</dd>
 
   </dl><h5 id=attribute-name-state><span class=secno>12.2.4.35 </span><dfn>Attribute name state</dfn></h5>
@@ -76988,6 +77046,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current attribute's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's name.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -77039,6 +77101,12 @@
    and its value to the empty string. Switch to the <a href=#attribute-name-state>attribute
    name state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <a href=#attribute-name-state>attribute name state</a>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -77076,6 +77144,11 @@
    <dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <a href=#attribute-value-(single-quoted)-state>attribute value (single-quoted) state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value. Switch to the
+   <a href=#attribute-value-(unquoted)-state>attribute value (unquoted) state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
    state</a>. Emit the current tag token.</dd>
@@ -77108,6 +77181,10 @@
    state</a>, with the <a href=#additional-allowed-character>additional allowed character</a>
    being U+0022 QUOTATION MARK (").</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -77157,6 +77234,10 @@
    <dd>Switch to the <a href=#data-state>data state</a>. Emit the current tag
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -77235,12 +77316,13 @@
   <p>Consume every character up to and including the first U+003E
   GREATER-THAN SIGN character (>) or the end of the file (EOF),
   whichever comes first. Emit a comment token whose data is the
-  concatenation of all the characters starting from and including
-  the character that caused the state machine to switch into the
-  bogus comment state, up to and including the character immediately
-  before the last consumed character (i.e. up to the character just
-  before the U+003E or EOF character). (If the comment was started
-  by the end of the file (EOF), the token is empty.)</p>
+  concatenation of all the characters starting from and including the
+  character that caused the state machine to switch into the bogus
+  comment state, up to and including the character immediately before
+  the last consumed character (i.e. up to the character just before
+  the U+003E or EOF character), but with any U+0000 NULL characters
+  replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment
+  was started by the end of the file (EOF), the token is empty.)</p>
 
   <p>Switch to the <a href=#data-state>data state</a>.</p>
 
@@ -77280,6 +77362,11 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-start-dash-state>comment start dash state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
    state</a>. Emit the comment token.</dd> <!-- see comment in
@@ -77300,6 +77387,12 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-end-state>comment end state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
    state</a>. Emit the comment token.</dd>
@@ -77321,6 +77414,10 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-end-dash-state>comment end dash state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Emit the comment token. Reconsume the
    EOF character in the <a href=#data-state>data state</a>.</dd> <!-- see comment
@@ -77337,6 +77434,12 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-end-state>comment end state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Emit the comment token. Reconsume the
    EOF character in the <a href=#data-state>data state</a>.</dd> <!-- see comment
@@ -77355,6 +77458,12 @@
    <dd>Switch to the <a href=#data-state>data state</a>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS
+   characters (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>U+0021 EXCLAMATION MARK (!)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#comment-end-bang-state>comment end bang
    state</a>.</dd>
@@ -77390,6 +77499,12 @@
    <dd>Switch to the <a href=#data-state>data state</a>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS
+   characters (-), a U+0021 EXCLAMATION MARK character (!), and a
+   U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
+   Switch to the <a href=#comment-state>comment state</a>.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Emit the comment token. Reconsume
    the EOF character in the <a href=#data-state>data state</a>.</dd> <!-- see
@@ -77438,6 +77553,11 @@
    character's code point). Switch to the <a href=#doctype-name-state>DOCTYPE name
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Set the token's name to a U+FFFD
+   REPLACEMENT CHARACTER character. Switch to the <a href=#doctype-name-state>DOCTYPE name
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Create a new DOCTYPE token. Set its
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -77473,6 +77593,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current DOCTYPE token's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's name.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Emit that DOCTYPE token.
@@ -77602,6 +77726,10 @@
   <dl class=switch><dt>U+0022 QUOTATION MARK (")</dt>
    <dd>Switch to the <a href=#after-doctype-public-identifier-state>after DOCTYPE public identifier state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -77613,8 +77741,8 @@
    Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <a href=#current-input-character>current input character</a> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl><h5 id=doctype-public-identifier-(single-quoted)-state><span class=secno>12.2.4.59 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
 
@@ -77623,6 +77751,10 @@
   <dl class=switch><dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <a href=#after-doctype-public-identifier-state>after DOCTYPE public identifier state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -77634,8 +77766,8 @@
    Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <a href=#current-input-character>current input character</a> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl><h5 id=after-doctype-public-identifier-state><span class=secno>12.2.4.60 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
 
@@ -77789,6 +77921,10 @@
    <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -77811,6 +77947,10 @@
    <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -77873,7 +78013,9 @@
   end of the file (EOF), whichever comes first. Emit a series of
   character tokens consisting of all the characters consumed except
   the matching three character sequence at the end (if one was found
-  before the end of the file).</p>
+  before the end of the file)<!--(not needed; taken care of by the
+  tree constructor), but with any U+0000 NULL characters replaced by
+  U+FFFD REPLACEMENT CHARACTER characters-->.</p>
 
   <p>Switch to the <a href=#data-state>data state</a>.</p>
 
@@ -79068,29 +79210,44 @@
   <p>When the <a href=#insertion-mode>insertion mode</a> is "<a href=#parsing-main-inbody title="insertion
   mode: in body">in body</a>", tokens must be handled as follows:</p>
 
-  <dl class=switch><dt>A character token</dt>
+  <dl class=switch><dt>A character token that is U+0000 NULL</dt>
    <dd>
 
+    <p><a href=#parse-error>Parse error</a>. Ignore the token.</p>
+
+    <!-- The D-Link DSL-G604T ADSL router has a zero byte in its
+         configuration UI before a <frameset>, which is why U+0000 is
+         special-cased here.
+         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
+               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
+    -->
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
+   <dd>
+
     <p><a href=#reconstruct-the-active-formatting-elements>Reconstruct the active formatting elements</a>, if
     any.</p>
 
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), U+0020 SPACE, or U+FFFD REPLACEMENT CHARACTER, then set the
-    <a href=#frameset-ok-flag>frameset-ok flag</a> to "not ok".</p>
+   </dd>
 
-    <!-- U+FFFD REPLACEMENT CHARACTER is in this list because the
-         D-Link DSL-G604T ADSL router has a zero byte in its
-         configuration UI before a <frameset>. Zero bytes get
-         converted to U+FFFD, which (without that character in this
-         list) would mean the <frameset> would be ignored.
-         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
-               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
-    -->
+   <dt>Any other character token</dt>
+   <dd>
 
+    <p><a href=#reconstruct-the-active-formatting-elements>Reconstruct the active formatting elements</a>, if
+    any.</p>
+
+    <p><a href=#insert-a-character title="insert a character">Insert the token's
+    character</a> into the <a href=#current-node>current node</a>.</p>
+
+    <p>Set the <a href=#frameset-ok-flag>frameset-ok flag</a> to "not ok".</p>
+
    </dd>
 
    <dt>A comment token</dt>
@@ -80312,6 +80469,10 @@
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
 
+    <p class=note>This can never be a U+0000 NULL character; the
+    tokenizer converts those to U+FFFD REPLACEMENT CHARACTER
+    characters.</p>
+
    </dd>
 
    <dt>An end-of-file token</dt>
@@ -81108,8 +81269,13 @@
   <p>When the <a href=#insertion-mode>insertion mode</a> is "<a href=#parsing-main-inselect title="insertion
   mode: in select">in select</a>", tokens must be handled as follows:</p>
 
-  <dl class=switch><dt>A character token</dt>
+  <dl class=switch><dt>A character token that is U+0000 NULL</dt>
    <dd>
+    <p><a href=#parse-error>Parse error</a>. Ignore the token.</p>
+   </dd>
+
+   <dt>Any other character token</dt>
+   <dd>
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
    </dd>
@@ -81309,17 +81475,33 @@
 
     </ol></dd>
 
-   <dt>A character token</dt>
+   <dt>A character token that is U+0000 NULL</dt>
    <dd>
 
+    <p><a href=#parse-error>Parse error</a>. <a href=#insert-a-character title="insert a
+    character">Insert a U+FFFD REPLACEMENT CHARACTER character</a>
+    into the <a href=#current-node>current node</a>.</p>
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
+   <dd>
+
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), or U+0020 SPACE, then set the <a href=#frameset-ok-flag>frameset-ok
-    flag</a> to "not ok".</p>
+   </dd>
 
+   <dt>Any other character token</dt>
+   <dd>
+
+    <p><a href=#insert-a-character title="insert a character">Insert the token's
+    character</a> into the <a href=#current-node>current node</a>.</p>
+
+    <p>Set the <a href=#frameset-ok-flag>frameset-ok flag</a> to "not ok".</p>
+
    </dd>
 
    <dt>A comment token</dt>

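The tree-construction hunks above are where the NULLs passed through by the data state end up. In "in body", a U+0000 character token is now a parse error and is ignored. Whitespace characters are inserted without touching the frameset-ok flag, and every other character (now including U+FFFD) sets the flag to "not ok". The D-Link DSL-G604T case cited in the diff therefore no longer depends on U+FFFD being exempted. In "in select", the NULL is likewise ignored.

The following Python sketch models just those two insertion modes under heavy simplifying assumptions. The current node is reduced to a string buffer, and "reconstruct the active formatting elements" is omitted. The class and its fields are hypothetical.

```python
from dataclasses import dataclass, field

# U+0009, U+000A, U+000C, U+000D, U+0020, as listed in the "in body" rules.
WHITESPACE = {"\t", "\n", "\x0c", "\r", " "}


@dataclass
class TreeBuilder:
    """Hypothetical, heavily reduced tree builder for character tokens only."""

    mode: str = "in body"
    frameset_ok: bool = True
    text: list[str] = field(default_factory=list)  # stands in for the current node
    parse_errors: int = 0

    def character(self, ch: str) -> None:
        if self.mode not in ("in body", "in select"):
            raise ValueError(f"insertion mode not modelled: {self.mode}")
        if ch == "\x00":
            self.parse_errors += 1  # parse error; the token is ignored
            return
        self.text.append(ch)  # "insert the token's character"
        # Only "in body" touches the flag, and only for non-whitespace.
        if self.mode == "in body" and ch not in WHITESPACE:
            self.frameset_ok = False
```

Under this model, feeding `"\x00 \n"` in "in body" inserts only the whitespace, records one parse error, and leaves `frameset_ok` true, so a following `<frameset>` would still be honoured. A lone U+FFFD, by contrast, clears the flag.
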
Modified: index
===================================================================
--- index	2010-11-02 01:06:04 UTC (rev 5665)
+++ index	2010-11-02 02:08:52 UTC (rev 5666)
@@ -71417,12 +71417,12 @@
   motivated by a desire to increase the resilience of user agents in
   the face of naïve transcoders.</p>
 
-  <p>All U+0000 NULL characters and code points in the range U+D800 to
-  U+DFFF<!-- surrogates not allowed e.g. in UTF-8, and we don't want
-  them to suddenly turn into codepoints when they go through a UTF-16
-  pipe --> in the input must be replaced by U+FFFD REPLACEMENT
-  CHARACTERs. Any occurrences of such characters and code points are
-  <a href=#parse-error title="parse error">parse errors</a>.</p>
+  <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
+  allowed e.g. in UTF-8, and we don't want them to suddenly turn into
+  codepoints when they go through a UTF-16 pipe --> in the input must
+  be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
+  such characters and code points are <a href=#parse-error title="parse error">parse
+  errors</a>.</p>
 
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
   <!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
@@ -72061,6 +72061,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#tag-open-state>tag open state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit the <a href=#current-input-character>current input
+   character</a> as a character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -72092,6 +72096,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#rcdata-less-than-sign-state>RCDATA less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -72119,6 +72127,10 @@
   <dl class=switch><dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#rawtext-less-than-sign-state>RAWTEXT less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -72133,6 +72145,10 @@
   <dl class=switch><dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <a href=#script-data-less-than-sign-state>script data less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -72144,7 +72160,11 @@
 
   <p>Consume the <a href=#next-input-character>next input character</a>:</p>
 
-  <dl class=switch><dt>EOF</dt>
+  <dl class=switch><dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
+   <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
    <dt>Anything else</dt>
@@ -72236,6 +72256,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current tag token's tag name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current tag token's tag name.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -72542,6 +72566,10 @@
    <dd><p>Switch to the <a href=#script-data-escaped-less-than-sign-state>script data escaped less-than sign
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -72562,6 +72590,11 @@
    <dd><p>Switch to the <a href=#script-data-escaped-less-than-sign-state>script data escaped less-than sign
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-escaped-state>script data
+   escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -72585,6 +72618,11 @@
    <dd>Switch to the <a href=#script-data-state>script data state</a>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-escaped-state>script data
+   escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -72735,6 +72773,10 @@
    sign state</a>. Emit a U+003C LESS-THAN SIGN character
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -72756,6 +72798,11 @@
    sign state</a>. Emit a U+003C LESS-THAN SIGN character
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-double-escaped-state>script data
+   double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -72781,6 +72828,11 @@
    <dd>Switch to the <a href=#script-data-state>script data state</a>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#script-data-double-escaped-state>script data
+   double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -72859,6 +72911,12 @@
    value to the empty string. Switch to the <a href=#attribute-name-state>attribute name
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <a href=#attribute-name-state>attribute name state</a>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -72872,8 +72930,8 @@
 
    <dt>Anything else</dt>
    <dd>Start a new attribute in the current tag token. Set that
-   attribute's name to the <a href=#current-input-character>current input character</a>, and its value to
-   the empty string. Switch to the <a href=#attribute-name-state>attribute name
+   attribute's name to the <a href=#current-input-character>current input character</a>, and
+   its value to the empty string. Switch to the <a href=#attribute-name-state>attribute name
    state</a>.</dd>
 
   </dl><h5 id=attribute-name-state><span class=secno>10.2.4.35 </span><dfn>Attribute name state</dfn></h5>
@@ -72902,6 +72960,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current attribute's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's name.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -72953,6 +73015,12 @@
    and its value to the empty string. Switch to the <a href=#attribute-name-state>attribute
    name state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <a href=#attribute-name-state>attribute name state</a>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -72990,6 +73058,11 @@
    <dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <a href=#attribute-value-(single-quoted)-state>attribute value (single-quoted) state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value. Switch to the
+   <a href=#attribute-value-(unquoted)-state>attribute value (unquoted) state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
    state</a>. Emit the current tag token.</dd>
@@ -73022,6 +73095,10 @@
    state</a>, with the <a href=#additional-allowed-character>additional allowed character</a>
    being U+0022 QUOTATION MARK (").</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
    <a href=#data-state>data state</a>.</dd>
@@ -73071,6 +73148,10 @@
    <dd>Switch to the <a href=#data-state>data state</a>. Emit the current tag
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -73149,12 +73230,13 @@
   <p>Consume every character up to and including the first U+003E
   GREATER-THAN SIGN character (>) or the end of the file (EOF),
   whichever comes first. Emit a comment token whose data is the
-  concatenation of all the characters starting from and including
-  the character that caused the state machine to switch into the
-  bogus comment state, up to and including the character immediately
-  before the last consumed character (i.e. up to the character just
-  before the U+003E or EOF character). (If the comment was started
-  by the end of the file (EOF), the token is empty.)</p>
+  concatenation of all the characters starting from and including the
+  character that caused the state machine to switch into the bogus
+  comment state, up to and including the character immediately before
+  the last consumed character (i.e. up to the character just before
+  the U+003E or EOF character), but with any U+0000 NULL characters
+  replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment
+  was started by the end of the file (EOF), the token is empty.)</p>
 
   <p>Switch to the <a href=#data-state>data state</a>.</p>
 
@@ -73194,6 +73276,11 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-start-dash-state>comment start dash state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
    state</a>. Emit the comment token.</dd> <!-- see comment in
@@ -73214,6 +73301,12 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-end-state>comment end state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
    state</a>. Emit the comment token.</dd>
@@ -73235,6 +73328,10 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-end-dash-state>comment end dash state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Emit the comment token. Reconsume the
    EOF character in the <a href=#data-state>data state</a>.</dd> <!-- see comment
@@ -73251,6 +73348,12 @@
   <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href=#comment-end-state>comment end state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Emit the comment token. Reconsume the
    EOF character in the <a href=#data-state>data state</a>.</dd> <!-- see comment
@@ -73269,6 +73372,12 @@
    <dd>Switch to the <a href=#data-state>data state</a>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS
+   characters (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href=#comment-state>comment
+   state</a>.</dd>
+
    <dt>U+0021 EXCLAMATION MARK (!)</dt>
    <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#comment-end-bang-state>comment end bang
    state</a>.</dd>
@@ -73304,6 +73413,12 @@
    <dd>Switch to the <a href=#data-state>data state</a>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS
+   characters (-), a U+0021 EXCLAMATION MARK character (!), and a
+   U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
+   Switch to the <a href=#comment-state>comment state</a>.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Emit the comment token. Reconsume
    the EOF character in the <a href=#data-state>data state</a>.</dd> <!-- see
@@ -73352,6 +73467,11 @@
    character's code point). Switch to the <a href=#doctype-name-state>DOCTYPE name
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Set the token's name to a U+FFFD
+   REPLACEMENT CHARACTER character. Switch to the <a href=#doctype-name-state>DOCTYPE name
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Create a new DOCTYPE token. Set its
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -73387,6 +73507,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current DOCTYPE token's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's name.</dd>
+
    <dt>EOF</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Emit that DOCTYPE token.
@@ -73516,6 +73640,10 @@
   <dl class=switch><dt>U+0022 QUOTATION MARK (")</dt>
    <dd>Switch to the <a href=#after-doctype-public-identifier-state>after DOCTYPE public identifier state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -73527,8 +73655,8 @@
    Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <a href=#current-input-character>current input character</a> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl><h5 id=doctype-public-identifier-(single-quoted)-state><span class=secno>10.2.4.59 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
 
@@ -73537,6 +73665,10 @@
   <dl class=switch><dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <a href=#after-doctype-public-identifier-state>after DOCTYPE public identifier state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -73548,8 +73680,8 @@
    Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <a href=#current-input-character>current input character</a> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl><h5 id=after-doctype-public-identifier-state><span class=secno>10.2.4.60 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
 
@@ -73703,6 +73835,10 @@
    <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -73725,6 +73861,10 @@
    <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href=#parse-error>Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#data-state>data
@@ -73787,7 +73927,9 @@
   end of the file (EOF), whichever comes first. Emit a series of
   character tokens consisting of all the characters consumed except
   the matching three character sequence at the end (if one was found
-  before the end of the file).</p>
+  before the end of the file)<!--(not needed; taken care of by the
+  tree constructor), but with any U+0000 NULL characters replaced by
+  U+FFFD REPLACEMENT CHARACTER characters-->.</p>
 
   <p>Switch to the <a href=#data-state>data state</a>.</p>
 
@@ -74982,29 +75124,44 @@
   <p>When the <a href=#insertion-mode>insertion mode</a> is "<a href=#parsing-main-inbody title="insertion
   mode: in body">in body</a>", tokens must be handled as follows:</p>
 
-  <dl class=switch><dt>A character token</dt>
+  <dl class=switch><dt>A character token that is U+0000 NULL</dt>
    <dd>
 
+    <p><a href=#parse-error>Parse error</a>. Ignore the token.</p>
+
+    <!-- The D-Link DSL-G604T ADSL router has a zero byte in its
+         configuration UI before a <frameset>, which is why U+0000 is
+         special-cased here.
+         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
+               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
+    -->
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
+   <dd>
+
     <p><a href=#reconstruct-the-active-formatting-elements>Reconstruct the active formatting elements</a>, if
     any.</p>
 
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), U+0020 SPACE, or U+FFFD REPLACEMENT CHARACTER, then set the
-    <a href=#frameset-ok-flag>frameset-ok flag</a> to "not ok".</p>
+   </dd>
 
-    <!-- U+FFFD REPLACEMENT CHARACTER is in this list because the
-         D-Link DSL-G604T ADSL router has a zero byte in its
-         configuration UI before a <frameset>. Zero bytes get
-         converted to U+FFFD, which (without that character in this
-         list) would mean the <frameset> would be ignored.
-         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
-               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
-    -->
+   <dt>Any other character token</dt>
+   <dd>
 
+    <p><a href=#reconstruct-the-active-formatting-elements>Reconstruct the active formatting elements</a>, if
+    any.</p>
+
+    <p><a href=#insert-a-character title="insert a character">Insert the token's
+    character</a> into the <a href=#current-node>current node</a>.</p>
+
+    <p>Set the <a href=#frameset-ok-flag>frameset-ok flag</a> to "not ok".</p>
+
    </dd>
 
    <dt>A comment token</dt>
@@ -76226,6 +76383,10 @@
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
 
+    <p class=note>This can never be a U+0000 NULL character; the
+    tokenizer converts those to U+FFFD REPLACEMENT CHARACTER
+    characters.</p>
+
    </dd>
 
    <dt>An end-of-file token</dt>
@@ -77022,8 +77183,13 @@
   <p>When the <a href=#insertion-mode>insertion mode</a> is "<a href=#parsing-main-inselect title="insertion
   mode: in select">in select</a>", tokens must be handled as follows:</p>
 
-  <dl class=switch><dt>A character token</dt>
+  <dl class=switch><dt>A character token that is U+0000 NULL</dt>
    <dd>
+    <p><a href=#parse-error>Parse error</a>. Ignore the token.</p>
+   </dd>
+
+   <dt>Any other character token</dt>
+   <dd>
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
    </dd>
@@ -77223,17 +77389,33 @@
 
     </ol></dd>
 
-   <dt>A character token</dt>
+   <dt>A character token that is U+0000 NULL</dt>
    <dd>
 
+    <p><a href=#parse-error>Parse error</a>. <a href=#insert-a-character title="insert a
+    character">Insert a U+FFFD REPLACEMENT CHARACTER character</a>
+    into the <a href=#current-node>current node</a>.</p>
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
+   <dd>
+
     <p><a href=#insert-a-character title="insert a character">Insert the token's
     character</a> into the <a href=#current-node>current node</a>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), or U+0020 SPACE, then set the <a href=#frameset-ok-flag>frameset-ok
-    flag</a> to "not ok".</p>
+   </dd>
 
+   <dt>Any other character token</dt>
+   <dd>
+
+    <p><a href=#insert-a-character title="insert a character">Insert the token's
+    character</a> into the <a href=#current-node>current node</a>.</p>
+
+    <p>Set the <a href=#frameset-ok-flag>frameset-ok flag</a> to "not ok".</p>
+
    </dd>
 
    <dt>A comment token</dt>

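After this change, U+0000 is no longer rewritten by the input stream preprocessor, and the decision moves into the individual states. The following Python sketch models only the NULL rules from the hunks above, so it is not a tokenizer or tree builder. The names `tokenize_text`, `InBody` and `COMMENT_NULL_APPEND` are hypothetical and exist only for illustration. It shows three behaviours: the data state passes NULL through unchanged while the text-only states emit U+FFFD; "in body" drops NULL without clearing the frameset-ok flag; and the comment states re-append any pending hyphens before U+FFFD.

```python
from dataclasses import dataclass, field

REPLACEMENT = "\uFFFD"
# Characters that leave the frameset-ok flag alone in "in body".
WHITESPACE = {"\t", "\n", "\x0c", "\r", " "}


def tokenize_text(text, state):
    """Hypothetical: tokenize a run of markup-free text in one state.

    Returns (character tokens, parse error count). Only the U+0000 rule
    of each state is modelled.
    """
    out, errors = [], 0
    for ch in text:
        if ch == "\x00":
            errors += 1  # every NULL is a parse error
            # Data state emits NULL as-is and leaves it to the tree builder.
            # RCDATA, RAWTEXT, script data and PLAINTEXT emit U+FFFD.
            out.append(ch if state == "data" else REPLACEMENT)
        else:
            out.append(ch)
    return out, errors


@dataclass
class InBody:
    """Hypothetical model of the "in body" character-token rules."""
    text: list = field(default_factory=list)
    frameset_ok: bool = True
    errors: int = 0

    def character(self, ch):
        if ch == "\x00":
            self.errors += 1  # parse error; ignore the token entirely
            return
        self.text.append(ch)
        if ch not in WHITESPACE:
            self.frameset_ok = False  # any other character: "not ok"


# Hypothetical table: the text appended to the comment token's data when
# U+0000 is consumed in each comment-related state.
COMMENT_NULL_APPEND = {
    "comment start": REPLACEMENT,
    "comment start dash": "-" + REPLACEMENT,
    "comment": REPLACEMENT,
    "comment end dash": "-" + REPLACEMENT,
    "comment end": "--" + REPLACEMENT,
    "comment end bang": "--!" + REPLACEMENT,
}


if __name__ == "__main__":
    chars, errs = tokenize_text("\x00\n", "data")
    body = InBody()
    for c in chars:
        body.character(c)
    print(body.text, body.frameset_ok, errs + body.errors)
```

This is the D-Link case from the tree-construction comment. A zero byte in the data state reaches "in body" as NULL and is dropped there, so a later `<frameset>` is still honoured. Before this revision, the byte became U+FFFD, and U+FFFD then had to be whitelisted to get the same result.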
Modified: source
===================================================================
--- source	2010-11-02 01:06:04 UTC (rev 5665)
+++ source	2010-11-02 02:08:52 UTC (rev 5666)
@@ -86658,12 +86658,12 @@
   motivated by a desire to increase the resilience of user agents in
   the face of naïve transcoders.</p>
 
-  <p>All U+0000 NULL characters and code points in the range U+D800 to
-  U+DFFF<!-- surrogates not allowed e.g. in UTF-8, and we don't want
-  them to suddenly turn into codepoints when they go through a UTF-16
-  pipe --> in the input must be replaced by U+FFFD REPLACEMENT
-  CHARACTERs. Any occurrences of such characters and code points are
-  <span title="parse error">parse errors</span>.</p>
+  <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
+  allowed e.g. in UTF-8, and we don't want them to suddenly turn into
+  codepoints when they go through a UTF-16 pipe --> in the input must
+  be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
+  such characters and code points are <span title="parse error">parse
+  errors</span>.</p>
 
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
   <!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
@@ -87399,6 +87399,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <span>tag open state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Emit the <span>current input
+   character</span> as a character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -87435,6 +87439,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <span>RCDATA less-than sign state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -87467,6 +87475,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <span>RAWTEXT less-than sign state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -87486,6 +87498,10 @@
    <dt>U+003C LESS-THAN SIGN (<)</dt>
    <dd>Switch to the <span>script data less-than sign state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -87502,6 +87518,10 @@
 
   <dl class="switch">
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -87609,6 +87629,10 @@
    character</span> (add 0x0020 to the character's code point) to the
    current tag token's tag name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current tag token's tag name.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -87975,6 +87999,10 @@
    <dd><p>Switch to the <span>script data escaped less-than sign
    state</span>.</p></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -88000,6 +88028,11 @@
    <dd><p>Switch to the <span>script data escaped less-than sign
    state</span>.</p></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Switch to the <span>script data
+   escaped state</span>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -88028,6 +88061,11 @@
    <dd>Switch to the <span>script data state</span>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Switch to the <span>script data
+   escaped state</span>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -88204,6 +88242,10 @@
    sign state</span>. Emit a U+003C LESS-THAN SIGN character
    token.</p></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -88230,6 +88272,11 @@
    sign state</span>. Emit a U+003C LESS-THAN SIGN character
    token.</p></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Switch to the <span>script data
+   double escaped state</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -88260,6 +88307,11 @@
    <dd>Switch to the <span>script data state</span>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Switch to the <span>script data
+   double escaped state</span>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -88354,6 +88406,12 @@
    value to the empty string. Switch to the <span>attribute name
    state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <span>attribute name state</span>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -88367,8 +88425,8 @@
 
    <dt>Anything else</dt>
    <dd>Start a new attribute in the current tag token. Set that
-   attribute's name to the <span>current input character</span>, and its value to
-   the empty string. Switch to the <span>attribute name
+   attribute's name to the <span>current input character</span>, and
+   its value to the empty string. Switch to the <span>attribute name
    state</span>.</dd>
 
   </dl>
@@ -88402,6 +88460,10 @@
    character</span> (add 0x0020 to the character's code point) to the
    current attribute's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's name.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -88457,6 +88519,12 @@
    and its value to the empty string. Switch to the <span>attribute
    name state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <span>attribute name state</span>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -88499,6 +88567,11 @@
    <dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <span>attribute value (single-quoted) state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value. Switch to the
+   <span>attribute value (unquoted) state</span>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Switch to the <span>data
    state</span>. Emit the current tag token.</dd>
@@ -88536,6 +88609,10 @@
    state</span>, with the <span>additional allowed character</span>
    being U+0022 QUOTATION MARK (").</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Reconsume the EOF character in the
    <span>data state</span>.</dd>
@@ -88595,6 +88672,10 @@
    <dd>Switch to the <span>data state</span>. Emit the current tag
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (<)</dt>
@@ -88686,12 +88767,13 @@
   <p>Consume every character up to and including the first U+003E
   GREATER-THAN SIGN character (>) or the end of the file (EOF),
   whichever comes first. Emit a comment token whose data is the
-  concatenation of all the characters starting from and including
-  the character that caused the state machine to switch into the
-  bogus comment state, up to and including the character immediately
-  before the last consumed character (i.e. up to the character just
-  before the U+003E or EOF character). (If the comment was started
-  by the end of the file (EOF), the token is empty.)</p>
+  concatenation of all the characters starting from and including the
+  character that caused the state machine to switch into the bogus
+  comment state, up to and including the character immediately before
+  the last consumed character (i.e. up to the character just before
+  the U+003E or EOF character), but with any U+0000 NULL characters
+  replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment
+  was started by the end of the file (EOF), the token is empty.)</p>
 
   <p>Switch to the <span>data state</span>.</p>
 
@@ -88734,6 +88816,11 @@
    <dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <span>comment start dash state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data. Switch to the <span>comment
+   state</span>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Switch to the <span>data
    state</span>. Emit the comment token.</dd> <!-- see comment in
@@ -88759,6 +88846,12 @@
    <dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <span>comment end state</span></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <span>comment
+   state</span>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Switch to the <span>data
    state</span>. Emit the comment token.</dd>
@@ -88785,6 +88878,10 @@
    <dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <span>comment end dash state</span></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Emit the comment token. Reconsume the
    EOF character in the <span>data state</span>.</dd> <!-- see comment
@@ -88806,6 +88903,12 @@
    <dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <span>comment end state</span></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <span>comment
+   state</span>.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Emit the comment token. Reconsume the
    EOF character in the <span>data state</span>.</dd> <!-- see comment
@@ -88829,6 +88932,12 @@
    <dd>Switch to the <span>data state</span>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append two U+002D HYPHEN-MINUS
+   characters (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <span>comment
+   state</span>.</dd>
+
    <dt>U+0021 EXCLAMATION MARK (!)</dt>
    <dd><span>Parse error</span>. Switch to the <span>comment end bang
    state</span>.</dd>
@@ -88869,6 +88978,12 @@
    <dd>Switch to the <span>data state</span>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append two U+002D HYPHEN-MINUS
+   characters (-), a U+0021 EXCLAMATION MARK character (!), and a
+   U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
+   Switch to the <span>comment state</span>.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Emit the comment token. Reconsume
    the EOF character in the <span>data state</span>.</dd> <!-- see
@@ -88927,6 +89042,11 @@
    character's code point). Switch to the <span>DOCTYPE name
    state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Set the token's name to a U+FFFD
+   REPLACEMENT CHARACTER character. Switch to the <span>DOCTYPE name
+   state</span>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Create a new DOCTYPE token. Set its
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <span>data
@@ -88967,6 +89087,10 @@
    character</span> (add 0x0020 to the character's code point) to the
    current DOCTYPE token's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's name.</dd>
+
    <dt>EOF</dt>
    <dd><span>Parse error</span>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Emit that DOCTYPE token.
@@ -89116,6 +89240,10 @@
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dd>Switch to the <span>after DOCTYPE public identifier state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <span>data
@@ -89127,8 +89255,8 @@
    Reconsume the EOF character in the <span>data state</span>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <span>current input character</span> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <span>current input character</span> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl>
 
@@ -89142,6 +89270,10 @@
    <dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <span>after DOCTYPE public identifier state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <span>data
@@ -89153,8 +89285,8 @@
    Reconsume the EOF character in the <span>data state</span>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <span>current input character</span> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <span>current input character</span> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl>
 
@@ -89333,6 +89465,10 @@
    <dd>Switch to the <span>after DOCTYPE system identifier
    state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <span>data
@@ -89360,6 +89496,10 @@
    <dd>Switch to the <span>after DOCTYPE system identifier
    state</span>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><span>Parse error</span>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (>)</dt>
    <dd><span>Parse error</span>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <span>data
@@ -89435,7 +89575,9 @@
   end of the file (EOF), whichever comes first. Emit a series of
   character tokens consisting of all the characters consumed except
   the matching three character sequence at the end (if one was found
-  before the end of the file).</p>
+  before the end of the file)<!--(not needed; taken care of by the
+  tree constructor), but with any U+0000 NULL characters replaced by
+  U+FFFD REPLACEMENT CHARACTER characters-->.</p>
 
   <p>Switch to the <span>data state</span>.</p>
 
@@ -90769,29 +90911,44 @@
 
   <dl class="switch">
 
-   <dt>A character token</dt>
+   <dt>A character token that is U+0000 NULL</dt>
    <dd>
 
+    <p><span>Parse error</span>. Ignore the token.</p>
+
+    <!-- The D-Link DSL-G604T ADSL router has a zero byte in its
+         configuration UI before a <frameset>, which is why U+0000 is
+         special-cased here.
+         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
+               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
+    -->
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
+   <dd>
+
     <p><span>Reconstruct the active formatting elements</span>, if
     any.</p>
 
     <p><span title="insert a character">Insert the token's
     character</span> into the <span>current node</span>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), U+0020 SPACE, or U+FFFD REPLACEMENT CHARACTER, then set the
-    <span>frameset-ok flag</span> to "not ok".</p>
+   </dd>
 
-    <!-- U+FFFD REPLACEMENT CHARACTER is in this list because the
-         D-Link DSL-G604T ADSL router has a zero byte in its
-         configuration UI before a <frameset>. Zero bytes get
-         converted to U+FFFD, which (without that character in this
-         list) would mean the <frameset> would be ignored.
-         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
-               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
-    -->
+   <dt>Any other character token</dt>
+   <dd>
 
+    <p><span>Reconstruct the active formatting elements</span>, if
+    any.</p>
+
+    <p><span title="insert a character">Insert the token's
+    character</span> into the <span>current node</span>.</p>
+
+    <p>Set the <span>frameset-ok flag</span> to "not ok".</p>
+
    </dd>
 
    <dt>A comment token</dt>
@@ -92111,6 +92268,10 @@
     <p><span title="insert a character">Insert the token's
     character</span> into the <span>current node</span>.</p>
 
+    <p class="note">This can never be a U+0000 NULL character; the
+    tokenizer converts those to U+FFFD REPLACEMENT CHARACTER
+    characters.</p>
+
    </dd>
 
    <dt>An end-of-file token</dt>
@@ -92986,8 +93147,13 @@
 
   <dl class="switch">
 
-   <dt>A character token</dt>
+   <dt>A character token that is U+0000 NULL</dt>
    <dd>
+    <p><span>Parse error</span>. Ignore the token.</p>
+   </dd>
+
+   <dt>Any other character token</dt>
+   <dd>
     <p><span title="insert a character">Insert the token's
     character</span> into the <span>current node</span>.</p>
    </dd>
@@ -93206,17 +93372,33 @@
 
    </dd>
 
-   <dt>A character token</dt>
+   <dt>A character token that is U+0000 NULL</dt>
    <dd>
 
+    <p><span>Parse error</span>. <span title="insert a
+    character">Insert a U+FFFD REPLACEMENT CHARACTER character</span>
+    into the <span>current node</span>.</p>
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
+   <dd>
+
     <p><span title="insert a character">Insert the token's
     character</span> into the <span>current node</span>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), or U+0020 SPACE, then set the <span>frameset-ok
-    flag</span> to "not ok".</p>
+   </dd>
 
+   <dt>Any other character token</dt>
+   <dd>
+
+    <p><span title="insert a character">Insert the token's
+    character</span> into the <span>current node</span>.</p>
+
+    <p>Set the <span>frameset-ok flag</span> to "not ok".</p>
+
    </dd>
 
    <dt>A comment token</dt>
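Several of the DOCTYPE hunks above repeat one tokenizer-side rule. A U+0000 NULL is a parse error, and it is replaced by U+FFFD in the token being built, rather than being rewritten earlier by the input stream processor. The rule can be illustrated with a minimal Python sketch of the DOCTYPE name state.

Some things here are assumptions and not part of the spec:

- The `DoctypeToken` class, the `consume_doctype_name` helper and the error labels are hypothetical.
- State switching is simplified. The helper just returns the index of the character that ends the name.
- CR is assumed to have been normalized away before tokenization.

```python
from dataclasses import dataclass

REPLACEMENT = "\uFFFD"


@dataclass
class DoctypeToken:
    # Hypothetical token shape: only the fields this sketch touches.
    name: str = ""
    force_quirks: bool = False


def consume_doctype_name(chars: str, token: DoctypeToken, errors: list) -> int:
    """Simplified DOCTYPE name state (hypothetical helper).

    Returns the index of the character that ends the name (whitespace
    or '>'), or len(chars) if the input runs out first.
    """
    for i, c in enumerate(chars):
        if c in "\t\n\f ":
            return i  # would switch to the after DOCTYPE name state
        if c == ">":
            return i  # would emit the token and switch to the data state
        if "A" <= c <= "Z":
            token.name += chr(ord(c) + 0x20)  # lowercase ASCII letters
        elif c == "\x00":
            # The NULL reaches the tokenizer intact; it is a parse error
            # and becomes U+FFFD in the DOCTYPE name.
            errors.append("unexpected-null-character")
            token.name += REPLACEMENT
        else:
            token.name += c
    # EOF branch: parse error, and the token gets its force-quirks flag set.
    errors.append("eof-in-doctype")
    token.force_quirks = True
    return len(chars)
```

For example, feeding `"HT\x00ML>"` leaves the name as `"ht\ufffdml"` with one recorded parse error, and stops at the `>`.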

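On the tree-construction side, two of the hunks split the single "A character token" entry into three cases: U+0000 NULL, whitespace, and any other character. The two variants differ in how they treat a NULL. The first variant drops it, while the variant in the final hunk inserts U+FFFD instead.

The rough Python sketch below makes some assumptions:

- The `Parser` class is hypothetical.
- Its `text` list stands in for the current node.
- Its `frameset_ok` boolean models the frameset-ok flag, with `True` meaning "ok".
- Reconstruction of the active formatting elements is omitted.

```python
# Whitespace characters listed in the new <dt> entries.
WHITESPACE = {"\t", "\n", "\f", "\r", " "}


class Parser:
    """Hypothetical stand-in for a tree builder; only the state that the
    character-token rules touch is modelled."""

    def __init__(self) -> None:
        self.frameset_ok = True  # True models "ok"
        self.text = []           # characters inserted into the current node
        self.errors = []

    def in_body_character(self, c: str) -> None:
        # First variant: a NULL is a parse error and the token is ignored,
        # so it neither inserts anything nor changes the frameset-ok flag.
        if c == "\x00":
            self.errors.append("parse error")
            return
        # (Reconstructing the active formatting elements is omitted.)
        self.text.append(c)
        # Only non-whitespace characters set the flag to "not ok".
        if c not in WHITESPACE:
            self.frameset_ok = False

    def final_hunk_character(self, c: str) -> None:
        # Final hunk's variant: a NULL is a parse error, but U+FFFD is
        # inserted in its place; the flag is left alone.
        if c == "\x00":
            self.errors.append("parse error")
            self.text.append("\uFFFD")
            return
        self.text.append(c)
        if c not in WHITESPACE:
            self.frameset_ok = False
```

Under the first variant, a zero byte followed by a newline leaves `frameset_ok` true. This matches the intent described in the D-Link comment: a NULL before a `<frameset>` should no longer cause that frameset to be ignored.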



More information about the Commit-Watchers mailing list