[html5] r2139 - [ct] (0) Rearchitect how RCDATA/CDATA blocks work so that they don't involve inv [...]

Tue Sep 2 02:43:08 PDT 2008

Author: ianh
Date: 2008-09-02 02:42:45 -0700 (Tue, 02 Sep 2008)
New Revision: 2139

Modified:
   index
   source
Log:
[ct] (0) Rearchitect how RCDATA/CDATA blocks work so that they don't involve invoking the tokeniser in a weird way. (credit: w)
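
For readers skimming the hunks below, the new control flow can be sketched roughly as follows. This is a minimal illustration and not the spec text or any real parser's API: the `Element` and `TreeBuilder` classes, the `(kind, data)` token tuples, and the `ran` list that stands in for "running a script" are all hypothetical names invented for the example. The fragment case and the pending external script handling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    # Hypothetical stand-in for a DOM element node.
    name: str
    text: str = ""
    already_executed: bool = False
    children: list = field(default_factory=list)


class TreeBuilder:
    """Toy tree construction stage with an "in CDATA/RCDATA" insertion mode."""

    def __init__(self):
        self.root = Element("#root")
        self.stack = [self.root]      # stack of open elements
        self.mode = "in body"         # current insertion mode
        self.original_mode = None     # the "original insertion mode"
        self.errors = []              # recorded parse errors
        self.ran = []                 # text of scripts that were "run"

    def process(self, token):
        kind, data = token
        if self.mode == "in CDATA/RCDATA":
            self._in_cdata_rcdata(kind, data)
        else:
            self._other_mode(kind, data)

    def _other_mode(self, kind, data):
        # Only the start tags relevant to this sketch are handled.
        if kind == "start" and data in ("script", "style", "title", "textarea"):
            element = Element(data)
            self.stack[-1].children.append(element)
            self.stack.append(element)
            # Remember where to return to, then switch modes.
            self.original_mode = self.mode
            self.mode = "in CDATA/RCDATA"

    def _in_cdata_rcdata(self, kind, data):
        current = self.stack[-1]
        if kind == "char":
            # Character token: insert into the current node.
            current.text += data
        elif kind == "eof":
            # Parse error; a script cut off by EOF must never run.
            self.errors.append("unexpected end of file")
            if current.name == "script":
                current.already_executed = True
            self.stack.pop()
            self.mode = self.original_mode
            self.process((kind, data))  # reprocess in the original mode
        elif kind == "end":
            self.stack.pop()
            self.mode = self.original_mode
            if data == "script":
                # Stand-in for "run the script".
                self.ran.append(current.text)


builder = TreeBuilder()
for tok in [("start", "script"), ("char", "a"), ("char", "b"), ("end", "script")]:
    builder.process(tok)
```

The point of the sketch is that the tree builder no longer pulls tokens out of the tokeniser itself. It only records the original insertion mode, switches mode, and lets the ordinary token loop deliver the character, end tag, and end-of-file tokens.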

Modified: index
===================================================================

--- index	2008-09-02 07:25:09 UTC (rev 2138)
+++ index	2008-09-02 09:42:45 UTC (rev 2139)
@@ -2071,48 +2071,51 @@
          <li><a href="#parsing-main-inbody"><span class=secno>8.2.5.10.
           </span>The "in body" insertion mode</a>
 
-         <li><a href="#parsing-main-intable"><span class=secno>8.2.5.11.
+         <li><a href="#parsing-main-incdata"><span class=secno>8.2.5.11.
+          </span>The "in CDATA/RCDATA" insertion mode</a>
+
+         <li><a href="#parsing-main-intable"><span class=secno>8.2.5.12.
           </span>The "in table" insertion mode</a>
 
-         <li><a href="#parsing-main-incaption"><span class=secno>8.2.5.12.
+         <li><a href="#parsing-main-incaption"><span class=secno>8.2.5.13.
           </span>The "in caption" insertion mode</a>
 
-         <li><a href="#parsing-main-incolgroup"><span class=secno>8.2.5.13.
+         <li><a href="#parsing-main-incolgroup"><span class=secno>8.2.5.14.
           </span>The "in column group" insertion mode</a>
 
-         <li><a href="#parsing-main-intbody"><span class=secno>8.2.5.14.
+         <li><a href="#parsing-main-intbody"><span class=secno>8.2.5.15.
           </span>The "in table body" insertion mode</a>
 
-         <li><a href="#parsing-main-intr"><span class=secno>8.2.5.15.
+         <li><a href="#parsing-main-intr"><span class=secno>8.2.5.16.
           </span>The "in row" insertion mode</a>
 
-         <li><a href="#parsing-main-intd"><span class=secno>8.2.5.16.
+         <li><a href="#parsing-main-intd"><span class=secno>8.2.5.17.
           </span>The "in cell" insertion mode</a>
 
-         <li><a href="#parsing-main-inselect"><span class=secno>8.2.5.17.
+         <li><a href="#parsing-main-inselect"><span class=secno>8.2.5.18.
           </span>The "in select" insertion mode</a>
 
          <li><a href="#parsing-main-inselectintable"><span
-          class=secno>8.2.5.18. </span>The "in select in table" insertion
+          class=secno>8.2.5.19. </span>The "in select in table" insertion
           mode</a>
 
-         <li><a href="#parsing-main-inforeign"><span class=secno>8.2.5.19.
+         <li><a href="#parsing-main-inforeign"><span class=secno>8.2.5.20.
           </span>The "in foreign content" insertion mode</a>
 
-         <li><a href="#parsing-main-afterbody"><span class=secno>8.2.5.20.
+         <li><a href="#parsing-main-afterbody"><span class=secno>8.2.5.21.
           </span>The "after body" insertion mode</a>
 
-         <li><a href="#parsing-main-inframeset"><span class=secno>8.2.5.21.
+         <li><a href="#parsing-main-inframeset"><span class=secno>8.2.5.22.
           </span>The "in frameset" insertion mode</a>
 
          <li><a href="#parsing-main-afterframeset"><span
-          class=secno>8.2.5.22. </span>The "after frameset" insertion
+          class=secno>8.2.5.23. </span>The "after frameset" insertion
           mode</a>
 
-         <li><a href="#the-after0"><span class=secno>8.2.5.23. </span>The
+         <li><a href="#the-after0"><span class=secno>8.2.5.24. </span>The
           "after after body" insertion mode</a>
 
-         <li><a href="#the-after1"><span class=secno>8.2.5.24. </span>The
+         <li><a href="#the-after1"><span class=secno>8.2.5.25. </span>The
           "after after frameset" insertion mode</a>
         </ul>
 
@@ -26746,9 +26749,25 @@
    encoding</var></dfn>. They are determined when the script is run, based on
    the attributes on the element at that time.
 
+  <p>When an <span>XML parser</span> creates a <code><a
+   href="#script1">script</a></code> element, it must be marked as being <a
+   href="#parser-inserted">"parser-inserted"</a>. When the element's end tag
+   is parsed, the user agent must <a href="#running" title="running a
+   script">run</a> the <code><a href="#script1">script</a></code> element.
+
+  <p class=note>Equivalent requirements exist for the <a href="#html-0">HTML
+   parser</a>, but they are detailed in that section instead.
+
+  <p>When a <code><a href="#script1">script</a></code> element that is marked
+   as neither having <a href="#already">"already executed"</a> nor being <a
+   href="#parser-inserted">"parser-inserted"</a> is <span>inserted into a
+   document</span><!-- XXX xref -->, the user agent must <a href="#running"
+   title="running a script">run</a> the <code><a
+   href="#script1">script</a></code> element.
+
   <p><dfn id=running title="running a script">Running a script</dfn>: When a
-   script block is <span>inserted into a document</span>, the user agent must
-   act as follows:
+   <code><a href="#script1">script</a></code> element is to be run, the user
+   agent must act as follows:
 
   <ol>
    <li>
@@ -26815,10 +26834,8 @@
      or if the user agent does not <a href="#support">support the scripting
      language</a> given by <var><a href="#the-scripts">the script's
      type</a></var> for this <code><a href="#script1">script</a></code>
-     element, or if the <code><a href="#script1">script</a></code> element
-     has its <a href="#already">"already executed"</a> flag set, then the
-     user agent must abort these steps at this point. The script is not
-     executed.</p>
+     element, then the user agent must abort these steps at this point. The
+     script is not executed.</p>
 
    <li>
     <p>The user agent must set the element's <a href="#already">"already
@@ -46921,43 +46938,52 @@
    href="#in-head0" title="insertion mode: in head noscript">in head
    noscript</a>", "<a href="#after9" title="insertion mode: after head">after
    head</a>", "<a href="#in-body" title="insertion mode: in body">in
-   body</a>", "<a href="#in-table" title="insertion mode: in table">in
-   table</a>", "<a href="#in-caption" title="insertion mode: in caption">in
-   caption</a>", "<a href="#in-column" title="insertion mode: in column
-   group">in column group</a>", "<a href="#in-table0" title="insertion mode:
-   in table body">in table body</a>", "<a href="#in-row" title="insertion
-   mode: in row">in row</a>", "<a href="#in-cell" title="insertion mode: in
-   cell">in cell</a>", "<a href="#in-select" title="insertion mode: in
-   select">in select</a>", "<a href="#in-select0" title="insertion mode: in
-   select in table">in select in table</a>", "<a href="#in-foreign"
-   title="insertion mode: in foreign content">in foreign content</a>", "<a
-   href="#after10" title="insertion mode: after body">after body</a>", "<a
-   href="#in-frameset" title="insertion mode: in frameset">in frameset</a>",
-   "<a href="#after11" title="insertion mode: after frameset">after
-   frameset</a>", "<a href="#after12" title="insertion mode: after after
-   body">after after body</a>", and "<a href="#after13" title="insertion
-   mode: after after frameset">after after frameset</a>" during the course of
-   the parsing, as described in the <a href="#tree-construction0">tree
-   construction</a> stage. The insertion mode affects how tokens are
-   processed and whether CDATA sections are supported.
+   body</a>", "<a href="#in-cdatarcdata" title="insertion mode: in
+   CDATA/RCDATA">in CDATA/RCDATA</a>", "<a href="#in-table" title="insertion
+   mode: in table">in table</a>", "<a href="#in-caption" title="insertion
+   mode: in caption">in caption</a>", "<a href="#in-column" title="insertion
+   mode: in column group">in column group</a>", "<a href="#in-table0"
+   title="insertion mode: in table body">in table body</a>", "<a
+   href="#in-row" title="insertion mode: in row">in row</a>", "<a
+   href="#in-cell" title="insertion mode: in cell">in cell</a>", "<a
+   href="#in-select" title="insertion mode: in select">in select</a>", "<a
+   href="#in-select0" title="insertion mode: in select in table">in select in
+   table</a>", "<a href="#in-foreign" title="insertion mode: in foreign
+   content">in foreign content</a>", "<a href="#after10" title="insertion
+   mode: after body">after body</a>", "<a href="#in-frameset"
+   title="insertion mode: in frameset">in frameset</a>", "<a href="#after11"
+   title="insertion mode: after frameset">after frameset</a>", "<a
+   href="#after12" title="insertion mode: after after body">after after
+   body</a>", and "<a href="#after13" title="insertion mode: after after
+   frameset">after after frameset</a>" during the course of the parsing, as
+   described in the <a href="#tree-construction0">tree construction</a>
+   stage. The insertion mode affects how tokens are processed and whether
+   CDATA sections are supported.
 
   <p>Seven of these modes, namely "<a href="#in-head" title="insertion mode:
    in head">in head</a>", "<a href="#in-body" title="insertion mode: in
-   body">in body</a>", "<a href="#in-table" title="insertion mode: in
-   table">in table</a>", "<a href="#in-table0" title="insertion mode: in
-   table body">in table body</a>", "<a href="#in-row" title="insertion mode:
-   in row">in row</a>", "<a href="#in-cell" title="insertion mode: in
-   cell">in cell</a>", and "<a href="#in-select" title="insertion mode: in
-   select">in select</a>", are special, in that the other modes defer to them
-   at various times. When the algorithm below says that the user agent is to
-   do something "<dfn id=using10>using the rules for</dfn> the <var
-   title="">m</var> insertion mode", where <var title="">m</var> is one of
-   these modes, the user agent must use the rules described under the <var
-   title="">m</var> <span>insertion mode</span>'s section, but must leave the
-   <span>insertion mode</span> unchanged unless the rules in <var
-   title="">m</var> themselves switch the <span>insertion mode</span> to a
-   new value.
+   body">in body</a>", "<a href="#in-cdatarcdata" title="insertion mode: in
+   CDATA/RCDATA">in CDATA/RCDATA</a>", "<a href="#in-table" title="insertion
+   mode: in table">in table</a>", "<a href="#in-table0" title="insertion
+   mode: in table body">in table body</a>", "<a href="#in-row"
+   title="insertion mode: in row">in row</a>", "<a href="#in-cell"
+   title="insertion mode: in cell">in cell</a>", and "<a href="#in-select"
+   title="insertion mode: in select">in select</a>", are special, in that the
+   other modes defer to them at various times. When the algorithm below says
+   that the user agent is to do something "<dfn id=using10>using the rules
+   for</dfn> the <var title="">m</var> insertion mode", where <var
+   title="">m</var> is one of these modes, the user agent must use the rules
+   described under the <var title="">m</var> <span>insertion mode</span>'s
+   section, but must leave the <span>insertion mode</span> unchanged unless
+   the rules in <var title="">m</var> themselves switch the <span>insertion
+   mode</span> to a new value.
 
+  <p>When the insertion mode is switched to "<a href="#in-cdatarcdata"
+   title="insertion mode: in CDATA/RCDATA">in CDATA/RCDATA</a>", the <dfn
+   id=original>original insertion mode</dfn> is also set. This is the
+   insertion mode to which the tree construction stage will return when the
+   corresponding end tag is parsed.
+
   <p>When the insertion mode is switched to "<a href="#in-foreign"
    title="insertion mode: in foreign content">in foreign content</a>", the
    <dfn id=secondary1>secondary insertion mode</dfn> is also set. This
@@ -46965,6 +46991,8 @@
    title="insertion mode: in foreign content">in foreign content</a>" mode to
    handle HTML (i.e. not foreign) content.
 
+  <hr>
+
   <p>When the steps below require the UA to <dfn id=reset>reset the insertion
    mode appropriately</dfn>, it means the UA must follow these steps:
 
@@ -49510,13 +49538,9 @@
 
   <ol>
    <li>
-    <p><a href="#create0">Create an element for the token</a> in the <a
-     href="#html-namespace0">HTML namespace</a>.
+    <p><a href="#insert0">Insert an HTML element</a> for the token.
 
    <li>
-    <p>Append the new element to the <a href="#current5">current node</a>.
-
-   <li>
     <p>If the algorithm that was invoked is the <a href="#generic">generic
      CDATA element parsing algorithm</a>, switch the tokeniser's <a
      href="#content4">content model flag</a> to the CDATA state; otherwise
@@ -49525,23 +49549,13 @@
      href="#content4">content model flag</a> to the RCDATA state.
 
    <li>
-    <p>Then, collect all the character tokens that the tokeniser returns
-     until it returns a token that is not a character token, or until it
-     stops tokenizing.
+    <p>Let the <a href="#original">original insertion mode</a> be the current
+     <span>insertion mode</span>.</p>
 
    <li>
-    <p>If this process resulted in a collection of character tokens, append a
-     single <code>Text</code> node, whose contents is the concatenation of
-     all those tokens' characters, to the new element node.
-
-   <li>
-    <p>The tokeniser's <a href="#content4">content model flag</a> will have
-     switched back to the PCDATA state.
-
-   <li>
-    <p>If the next token is an end tag token with the same tag name as the
-     start tag token, ignore it. Otherwise, it's an end-of-file token, and
-     this is a <a href="#parse2">parse error</a>.
+    <p>Then, switch the <span>insertion mode</span> to "<a
+     href="#in-cdatarcdata" title="insertion mode: in CDATA/RCDATA">in
+     CDATA/RCDATA</a>".
   </ol>
 
   <h5 id=closing1><span class=secno>8.2.5.2. </span>Closing elements that
@@ -50157,120 +50171,45 @@
    <dt id=scriptTag>A start tag whose tag name is "script"
 
    <dd>
-    <p><a href="#create0">Create an element for the token</a> in the <a
-     href="#html-namespace0">HTML namespace</a>.</p>
+    <ol>
+     <li>
+      <p><a href="#create0">Create an element for the token</a> in the <a
+       href="#html-namespace0">HTML namespace</a>.
 
-    <p>Mark the element as being <a
-     href="#parser-inserted">"parser-inserted"</a>. This ensures that, if the
-     script is external, any <code title=dom-document-write-HTML><a
-     href="#document.write...">document.write()</a></code> calls in the
-     script will execute in-line, instead of blowing the document away, as
-     would happen in most other cases.</p>
+     <li>
+      <p>Mark the element as being <a
+       href="#parser-inserted">"parser-inserted"</a>.</p>
 
-    <p>Switch the tokeniser's <a href="#content4">content model flag</a> to
-     the CDATA state.</p>
+      <p class=note>This ensures that, if the script is external, any <code
+       title=dom-document-write-HTML><a
+       href="#document.write...">document.write()</a></code> calls in the
+       script will execute in-line, instead of blowing the document away, as
+       would happen in most other cases. It also prevents the script from
+       executing until the end tag is seen.</p>
 
-    <p>Then, collect all the character tokens that the tokeniser returns
-     until it returns a token that is not a character token, or until it
-     stops tokenizing.</p>
+     <li>
+      <p>If the parser was originally created for the <a
+       href="#html-fragment0">HTML fragment parsing algorithm</a>, then mark
+       the <code><a href="#script1">script</a></code> element as <a
+       href="#already">"already executed"</a>. (<a href="#fragment">fragment
+       case</a>)
 
-    <p>If this process resulted in a collection of character tokens, append a
-     single <code>Text</code> node to the <code><a
-     href="#script1">script</a></code> element node whose contents is the
-     concatenation of all those tokens' characters.</p>
+     <li>
+      <p>Append the new element to the <a href="#current5">current node</a>.</p>
 
-    <p>The tokeniser's <a href="#content4">content model flag</a> will have
-     switched back to the PCDATA state.</p>
+     <li>
+      <p>Switch the tokeniser's <a href="#content4">content model flag</a> to
+       the CDATA state.
 
-    <p>If the next token is not an end tag token with the tag name "script",
-     then this is a <a href="#parse2">parse error</a>; mark the <code><a
-     href="#script1">script</a></code> element as <a href="#already">"already
-     executed"</a>. Otherwise, the token is the <code><a
-     href="#script1">script</a></code> element's end tag, so ignore it.</p>
+     <li>
+      <p>Let the <a href="#original">original insertion mode</a> be the
+       current <span>insertion mode</span>.</p>
 
-    <p>If the parser was originally created for the <a
-     href="#html-fragment0">HTML fragment parsing algorithm</a>, then mark
-     the <code><a href="#script1">script</a></code> element as <a
-     href="#already">"already executed"</a>, and skip the rest of the
-     processing described for this token (including the part below where
-     "<span title="pending external script">pending external scripts</span>"
-     are executed). (<a href="#fragment">fragment case</a>)</p>
+     <li>
+      <p>Switch the <span>insertion mode</span> to "<a href="#in-cdatarcdata"
+       title="insertion mode: in CDATA/RCDATA">in CDATA/RCDATA</a>".
+    </ol>
 
-    <p class=note>Marking the <code><a href="#script1">script</a></code>
-     element as "already executed" prevents it from executing when it is
-     inserted into the document a few paragraphs below. Thus, scripts missing
-     their end tags and scripts that were inserted using <code
-     title=dom-innerHTML-HTML><a href="#innerhtml0">innerHTML</a></code>,
-     <code title=dom-outerHTML-HTML><a
-     href="#outerhtml0">outerHTML</a></code>, or <code
-     title=dom-insertAdjacentHTML-HTML><a
-     href="#insertadjacenthtml0">insertAdjacentHTML()</a></code> aren't
-     executed.</p>
-
-    <p>Let the <var title="">old insertion point</var> have the same value as
-     the current <a href="#insertion">insertion point</a>. Let the <a
-     href="#insertion">insertion point</a> be just before the <a
-     href="#next-input">next input character</a>.</p>
-
-    <p>Append the new element to the <a href="#current5">current node</a>. <a
-     href="#running" title="running a script">Special processing occurs when
-     a <code>script</code> element is inserted into a document</a> that might
-     cause some script to execute, which might cause <a
-     href="#document.write..." title=dom-document-write-HTML>new characters
-     to be inserted into the tokeniser</a>.</p>
-
-    <p>Let the <a href="#insertion">insertion point</a> have the value of the
-     <var title="">old insertion point</var>. (In other words, restore the <a
-     href="#insertion">insertion point</a> to the value it had before the
-     previous paragraph. This value might be the "undefined" value.)</p>
-
-    <p id=scriptTagParserResumes>At this stage, if there is a <span>pending
-     external script</span>, then:</p>
-
-    <dl class=switch>
-     <dt>If the tree construction stage is <a href="#nestedParsing">being
-      called reentrantly</a>, say from a call to <code
-      title=dom-document-write-HTML><a
-      href="#document.write...">document.write()</a></code>:
-
-     <dd>
-      <p>Abort the processing of any nested invocations of the tokeniser,
-       yielding control back to the caller. (Tokenization will resume when
-       the caller returns to the "outer" tree construction stage.)
-
-     <dt>Otherwise:
-
-     <dd>
-      <p>Follow these steps:</p>
-
-      <ol>
-       <li>
-        <p>Let <var title="">the script</var> be the <span>pending external
-         script</span>. There is no longer a <span>pending external
-         script</span>.
-
-       <li>
-        <p><a href="#pause">Pause</a> until the script has <a
-         href="#completed">completed loading</a>.
-
-       <li>
-        <p>Let the <a href="#insertion">insertion point</a> be just before
-         the <a href="#next-input">next input character</a>.
-
-       <li>
-        <p><a href="#executing0" title="executing a script block">Execute the
-         script</a>.
-
-       <li>
-        <p>Let the <a href="#insertion">insertion point</a> be undefined
-         again.
-
-       <li>
-        <p>If there is once again a <span>pending external script</span>,
-         then repeat these steps from step 1.
-      </ol>
-    </dl>
-
    <dt>An end tag whose tag name is "head"
 
    <dd>
@@ -51625,7 +51564,127 @@
     </ol>
   </dl>
 
-  <h5 id=parsing-main-intable><span class=secno>8.2.5.11. </span>The "<dfn
+  <h5 id=parsing-main-incdata><span class=secno>8.2.5.11. </span>The "<dfn
+   id=in-cdatarcdata title="insertion mode: in CDATA/RCDATA">in
+   CDATA/RCDATA</dfn>" insertion mode</h5>
+
+  <p>When the <span>insertion mode</span> is "<a href="#in-cdatarcdata"
+   title="insertion mode: in CDATA/RCDATA">in CDATA/RCDATA</a>", tokens must
+   be handled as follows:
+
+  <dl class=switch>
+   <dt>A character token
+
+   <dd>
+    <p><a href="#insert" title="insert a character">Insert the token's
+     character</a> into the <a href="#current5">current node</a>.</p>
+
+   <dt>An end-of-file token
+
+   <dd> <!-- can't be the fragment case -->
+    <p><a href="#parse2">Parse error</a>.</p>
+
+    <p>If the <a href="#current5">current node</a> is a <code><a
+     href="#script1">script</a></code> element, mark the <code><a
+     href="#script1">script</a></code> element as <a href="#already">"already
+     executed"</a>.</p>
+
+    <p>Pop the <a href="#current5">current node</a> off the <a
+     href="#stack">stack of open elements</a>.</p>
+
+    <p>Switch the <span>insertion mode</span> to the <a
+     href="#original">original insertion mode</a> and reprocess the current
+     token.</p>
+
+   <dt>An end tag whose tag name is "script"
+
+   <dd>
+    <p>Let <var title="">script</var> be the <a href="#current5">current
+     node</a> (which will be a <code><a href="#script1">script</a></code>
+     element).</p>
+
+    <p>Pop the <a href="#current5">current node</a> off the <a
+     href="#stack">stack of open elements</a>.</p>
+
+    <p>Switch the <span>insertion mode</span> to the <a
+     href="#original">original insertion mode</a>.</p>
+
+    <p>Let the <var title="">old insertion point</var> have the same value as
+     the current <a href="#insertion">insertion point</a>. Let the <a
+     href="#insertion">insertion point</a> be just before the <a
+     href="#next-input">next input character</a>.</p>
+
+    <p><a href="#running" title="running a script">Run</a> the <var
+     title="">script</var>. This might cause some script to execute, which
+     might cause <a href="#document.write..."
+     title=dom-document-write-HTML>new characters to be inserted into the
+     tokeniser</a>, and might cause the tokeniser to output more tokens,
+     resulting in a <a href="#nestedParsing">reentrant invocation of the
+     parser</a>.</p>
+
+    <p>Let the <a href="#insertion">insertion point</a> have the value of the
+     <var title="">old insertion point</var>. (In other words, restore the <a
+     href="#insertion">insertion point</a> to the value it had before the
+     previous paragraph. This value might be the "undefined" value.)</p>
+
+    <p id=scriptTagParserResumes>At this stage, if there is a <span>pending
+     external script</span>, then:</p>
+
+    <dl class=switch>
+     <dt>If the tree construction stage is <a href="#nestedParsing">being
+      called reentrantly</a>, say from a call to <code
+      title=dom-document-write-HTML><a
+      href="#document.write...">document.write()</a></code>:
+
+     <dd>
+      <p>Abort the processing of any nested invocations of the tokeniser,
+       yielding control back to the caller. (Tokenization will resume when
+       the caller returns to the "outer" tree construction stage.)
+
+     <dt>Otherwise:
+
+     <dd>
+      <p>Follow these steps:</p>
+
+      <ol>
+       <li>
+        <p>Let <var title="">the script</var> be the <span>pending external
+         script</span>. There is no longer a <span>pending external
+         script</span>.
+
+       <li>
+        <p><a href="#pause">Pause</a> until the script has <a
+         href="#completed">completed loading</a>.
+
+       <li>
+        <p>Let the <a href="#insertion">insertion point</a> be just before
+         the <a href="#next-input">next input character</a>.
+
+       <li>
+        <p><a href="#executing0" title="executing a script block">Execute the
+         script</a>.
+
+       <li>
+        <p>Let the <a href="#insertion">insertion point</a> be undefined
+         again.
+
+       <li>
+        <p>If there is once again a <span>pending external script</span>,
+         then repeat these steps from step 1.
+      </ol>
+    </dl>
+
+   <dt>Any other end tag
+
+   <dd>
+    <p>Pop the <a href="#current5">current node</a> off the <a
+     href="#stack">stack of open elements</a>.</p>
+
+    <p>Switch the <span>insertion mode</span> to the <a
+     href="#original">original insertion mode</a>.</p>
+  </dl>
+
+  <h5 id=parsing-main-intable><span class=secno>8.2.5.12. </span>The "<dfn
    id=in-table title="insertion mode: in table">in table</dfn>" insertion
    mode</h5>
 
@@ -51810,7 +51869,7 @@
    href="#html">html</a></code> element after this process is a <a
    href="#fragment">fragment case</a>.
 
-  <h5 id=parsing-main-incaption><span class=secno>8.2.5.12. </span>The "<dfn
+  <h5 id=parsing-main-incaption><span class=secno>8.2.5.13. </span>The "<dfn
    id=in-caption title="insertion mode: in caption">in caption</dfn>"
    insertion mode</h5>
 
@@ -51873,7 +51932,7 @@
      <span>insertion mode</span>.</p>
   </dl>
 
-  <h5 id=parsing-main-incolgroup><span class=secno>8.2.5.13. </span>The "<dfn
+  <h5 id=parsing-main-incolgroup><span class=secno>8.2.5.14. </span>The "<dfn
    id=in-column title="insertion mode: in column group">in column
    group</dfn>" insertion mode</h5>
 
@@ -51958,7 +52017,7 @@
      href="#fragment">fragment case</a>.</p>
   </dl>
 
-  <h5 id=parsing-main-intbody><span class=secno>8.2.5.14. </span>The "<dfn
+  <h5 id=parsing-main-intbody><span class=secno>8.2.5.15. </span>The "<dfn
    id=in-table0 title="insertion mode: in table body">in table body</dfn>"
    insertion mode</h5>
 
@@ -52048,7 +52107,7 @@
    href="#html">html</a></code> element after this process is a <a
    href="#fragment">fragment case</a>.
 
-  <h5 id=parsing-main-intr><span class=secno>8.2.5.15. </span>The "<dfn
+  <h5 id=parsing-main-intr><span class=secno>8.2.5.16. </span>The "<dfn
    id=in-row title="insertion mode: in row">in row</dfn>" insertion mode</h5>
 
   <p>When the <span>insertion mode</span> is "<a href="#in-row"
@@ -52137,7 +52196,7 @@
    href="#html">html</a></code> element after this process is a <a
    href="#fragment">fragment case</a>.
 
-  <h5 id=parsing-main-intd><span class=secno>8.2.5.16. </span>The "<dfn
+  <h5 id=parsing-main-intd><span class=secno>8.2.5.17. </span>The "<dfn
    id=in-cell title="insertion mode: in cell">in cell</dfn>" insertion mode</h5>
 
   <p>When the <span>insertion mode</span> is "<a href="#in-cell"
@@ -52238,7 +52297,7 @@
    neither when the <span>insertion mode</span> is "<a href="#in-cell"
    title="insertion mode: in cell">in cell</a>".
 
-  <h5 id=parsing-main-inselect><span class=secno>8.2.5.17. </span>The "<dfn
+  <h5 id=parsing-main-inselect><span class=secno>8.2.5.18. </span>The "<dfn
    id=in-select title="insertion mode: in select">in select</dfn>" insertion
    mode</h5>
 
@@ -52360,7 +52419,7 @@
     <p><a href="#parse2">Parse error</a>. Ignore the token.</p>
   </dl>
 
-  <h5 id=parsing-main-inselectintable><span class=secno>8.2.5.18. </span>The
+  <h5 id=parsing-main-inselectintable><span class=secno>8.2.5.19. </span>The
    "<dfn id=in-select0 title="insertion mode: in select in table">in select
    in table</dfn>" insertion mode</h5>
 
@@ -52396,7 +52455,7 @@
      <span>insertion mode</span>.</p>
   </dl>
 
-  <h5 id=parsing-main-inforeign><span class=secno>8.2.5.19. </span>The "<dfn
+  <h5 id=parsing-main-inforeign><span class=secno>8.2.5.20. </span>The "<dfn
    id=in-foreign title="insertion mode: in foreign content">in foreign
    content</dfn>" insertion mode</h5>
 
@@ -52578,7 +52637,7 @@
      flag">acknowledge the token's <i>self-closing flag</i></a>.</p>
   </dl>
 
-  <h5 id=parsing-main-afterbody><span class=secno>8.2.5.20. </span>The "<dfn
+  <h5 id=parsing-main-afterbody><span class=secno>8.2.5.21. </span>The "<dfn
    id=after10 title="insertion mode: after body">after body</dfn>" insertion
    mode</h5>
 
@@ -52642,7 +52701,7 @@
      body</a>" and reprocess the token.</p>
   </dl>
 
-  <h5 id=parsing-main-inframeset><span class=secno>8.2.5.21. </span>The "<dfn
+  <h5 id=parsing-main-inframeset><span class=secno>8.2.5.22. </span>The "<dfn
    id=in-frameset title="insertion mode: in frameset">in frameset</dfn>"
    insertion mode</h5>
 
@@ -52737,7 +52796,7 @@
     <p><a href="#parse2">Parse error</a>. Ignore the token.</p>
   </dl>
 
-  <h5 id=parsing-main-afterframeset><span class=secno>8.2.5.22. </span>The
+  <h5 id=parsing-main-afterframeset><span class=secno>8.2.5.23. </span>The
    "<dfn id=after11 title="insertion mode: after frameset">after
    frameset</dfn>" insertion mode</h5>
 
@@ -52802,7 +52861,7 @@
    that do support frames but want to show the NOFRAMES content. Supporting
    the former is easy; supporting the latter is harder.
 
-  <h5 id=the-after0><span class=secno>8.2.5.23. </span>The "<dfn id=after12
+  <h5 id=the-after0><span class=secno>8.2.5.24. </span>The "<dfn id=after12
    title="insertion mode: after after body">after after body</dfn>" insertion
    mode</h5>
 
@@ -52844,7 +52903,7 @@
      body</a>" and reprocess the token.</p>
   </dl>
 
-  <h5 id=the-after1><span class=secno>8.2.5.24. </span>The "<dfn id=after13
+  <h5 id=the-after1><span class=secno>8.2.5.25. </span>The "<dfn id=after13
    title="insertion mode: after after frameset">after after frameset</dfn>"
    insertion mode</h5>
 

Modified: source
===================================================================
--- source	2008-09-02 07:25:09 UTC (rev 2138)
+++ source	2008-09-02 09:42:45 UTC (rev 2139)
@@ -24107,9 +24107,25 @@
   encoding</var></dfn>. They are determined when the script is run,
   based on the attributes on the element at that time.</p>
 
+  <p>When an <span>XML parser</span> creates a <code>script</code>
+  element, it must be marked as being
+  <span>"parser-inserted"</span>. When the element's end tag is
+  parsed, the user agent must <span title="running a
+  script">run</span> the <code>script</code> element.</p>
+
+  <p class="note">Equivalent requirements exist for the <span>HTML
+  parser</span>, but they are detailed in that section instead.</p>
+
+  <p>When a <code>script</code> element that is marked as neither
+  having <span>"already executed"</span> nor being
+  <span>"parser-inserted"</span> is <span>inserted into a
+  document</span><!-- XXX xref -->, the user agent must <span
+  title="running a script">run</span> the <code>script</code>
+  element.</p>
+
   <p><dfn title="running a script">Running a script</dfn>: When a
-  script block is <span>inserted into a document</span>, the user
-  agent must act as follows:</p>
+  <code>script</code> element is to be run, the user agent must act as
+  follows:</p>
 
   <ol>
 
@@ -24179,10 +24195,8 @@
     no need to worry about the HTML case, as the HTML parser handles
     that for us -->, or if the user agent does not <span>support the
     scripting language</span> given by <var>the script's type</var>
-    for this <code>script</code> element, or if the
-    <code>script</code> element has its <span>"already
-    executed"</span> flag set, then the user agent must abort these
-    steps at this point. The script is not executed.</p>
+    for this <code>script</code> element, then the user agent must
+    abort these steps at this point. The script is not executed.</p>
 
    </li>
 
@@ -44313,7 +44327,8 @@
   title="insertion mode: in head noscript">in head noscript</span>",
   "<span title="insertion mode: after head">after head</span>", "<span
   title="insertion mode: in body">in body</span>", "<span
-  title="insertion mode: in table">in table</span>", "<span
+  title="insertion mode: in CDATA/RCDATA">in CDATA/RCDATA</span>",
+  "<span title="insertion mode: in table">in table</span>", "<span
   title="insertion mode: in caption">in caption</span>", "<span
   title="insertion mode: in column group">in column group</span>",
   "<span title="insertion mode: in table body">in table body</span>",
@@ -44335,7 +44350,8 @@
 
   <p>Seven of these modes, namely "<span title="insertion mode: in
   head">in head</span>", "<span title="insertion mode: in body">in
-  body</span>", "<span title="insertion mode: in table">in
+  body</span>", "<span title="insertion mode: in CDATA/RCDATA">in
+  CDATA/RCDATA</span>", "<span title="insertion mode: in table">in
   table</span>", "<span title="insertion mode: in table body">in table
   body</span>", "<span title="insertion mode: in row">in row</span>",
   "<span title="insertion mode: in cell">in cell</span>", and "<span
@@ -44351,12 +44367,19 @@
   to a new value.</p>
 
   <p>When the insertion mode is switched to "<span title="insertion
+  mode: in CDATA/RCDATA">in CDATA/RCDATA</span>", the <dfn>original
+  insertion mode</dfn> is also set. This is the insertion mode to
+  which the tree construction stage will return when the corresponding
+  end tag is parsed.</p>
+
+  <p>When the insertion mode is switched to "<span title="insertion
   mode: in foreign content">in foreign content</span>", the
   <dfn>secondary insertion mode</dfn> is also set. This secondary mode
   is used within the rules for the "<span title="insertion mode: in
   foreign content">in foreign content</span>" mode to handle HTML
   (i.e. not foreign) content.</p>
 
+  <hr>
 
   <p>When the steps below require the UA to <dfn>reset the insertion
   mode appropriately</dfn>, it means the UA must follow these
@@ -46466,12 +46489,8 @@
 
   <ol>
 
-   <li><p><span>Create an element for the token</span> in the
-   <span>HTML namespace</span>.</p></li>
+   <li><p><span>Insert an HTML element</span> for the token.</p></li>
 
-   <li><p>Append the new element to the <span>current
-   node</span>.</p></li>
-
    <li><p>If the algorithm that was invoked is the <span>generic CDATA
    element parsing algorithm</span>, switch the tokeniser's
    <span>content model flag</span> to the CDATA state; otherwise the
@@ -46479,22 +46498,13 @@
    algorithm</span>, switch the tokeniser's <span>content model
    flag</span> to the RCDATA state.</p></li>
 
-   <li><p>Then, collect all the character tokens that the tokeniser
-   returns until it returns a token that is not a character token, or
-   until it stops tokenizing.</p></li>
+   <li><p>Let the <span>original insertion mode</span> be the current
+   <span>insertion mode</span>.</p>
 
-   <li><p>If this process resulted in a collection of character
-   tokens, append a single <code>Text</code> node, whose contents is
-   the concatenation of all those tokens' characters, to the new
-   element node.</p></li>
+   <li><p>Then, switch the <span>insertion mode</span> to "<span
+   title="insertion mode: in CDATA/RCDATA">in
+   CDATA/RCDATA</span>".</p></li>
 
-   <li><p>The tokeniser's <span>content model flag</span> will have
-   switched back to the PCDATA state.</p></li>
-
-   <li><p>If the next token is an end tag token with the same tag name
-   as the start tag token, ignore it. Otherwise, it's an end-of-file
-   token, and this is a <span>parse error</span>.</p></li>
-
   </ol>
 
 
@@ -46985,120 +46995,42 @@
    <dt id="scriptTag">A start tag whose tag name is "script"</dt>
    <dd>
 
-    <p><span>Create an element for the token</span> in the <span>HTML
-    namespace</span>.</p>
+    <ol>
 
-    <p>Mark the element as being
-    <span>"parser-inserted"</span>. This ensures that, if the
-    script is external, any <code
-    title="dom-document-write-HTML">document.write()</code> calls
-    in the script will execute in-line, instead of blowing the
-    document away, as would happen in most other cases.</p>
+     <li><p><span>Create an element for the token</span> in the
+     <span>HTML namespace</span>.</p></li>
 
-    <p>Switch the tokeniser's <span>content model flag</span> to
-    the CDATA state.</p>
+     <li>
 
-    <p>Then, collect all the character tokens that the tokeniser
-    returns until it returns a token that is not a character
-    token, or until it stops tokenizing.</p>
+      <p>Mark the element as being <span>"parser-inserted"</span>.</p>
 
-    <p>If this process resulted in a collection of character
-    tokens, append a single <code>Text</code> node to the
-    <code>script</code> element node whose contents is the
-    concatenation of all those tokens' characters.</p>
+      <p class="note">This ensures that, if the script is external, any
+      <code title="dom-document-write-HTML">document.write()</code>
+      calls in the script will execute in-line, instead of blowing the
+      document away, as would happen in most other cases. It also
+      prevents the script from executing until the end tag is seen.</p>
 
-    <p>The tokeniser's <span>content model flag</span> will have
-    switched back to the PCDATA state.</p>
+     </li>
 
-    <p>If the next token is not an end tag token with the tag name
-    "script", then this is a <span>parse error</span>; mark the
-    <code>script</code> element as <span>"already
-    executed"</span>. Otherwise, the token is the
-    <code>script</code> element's end tag, so ignore it.</p>
+     <li><p>If the parser was originally created for the <span>HTML
+     fragment parsing algorithm</span>, then mark the
+     <code>script</code> element as <span>"already
+     executed"</span>. (<span>fragment case</span>)</p></li>
 
-    <p>If the parser was originally created for the <span>HTML
-    fragment parsing algorithm</span>, then mark the
-    <code>script</code> element as <span>"already executed"</span>,
-    and skip the rest of the processing described for this token
-    (including the part below where "<span title="pending external
-    script">pending external scripts</span>" are
-    executed). (<span>fragment case</span>)</p>
+     <li><p>Append the new element to the <span>current node</span>.</p>
 
-    <p class="note">Marking the <code>script</code> element as
-    "already executed" prevents it from executing when it is inserted
-    into the document a few paragraphs below. Thus, scripts missing
-    their end tags and scripts that were inserted using <code
-    title="dom-innerHTML-HTML">innerHTML</code>, <code
-    title="dom-outerHTML-HTML">outerHTML</code>, or <code
-    title="dom-insertAdjacentHTML-HTML">insertAdjacentHTML()</code>
-    aren't executed.</p>
+     <li><p>Switch the tokeniser's <span>content model flag</span> to
+     the CDATA state.</p></li>
 
-    <p>Let the <var title="">old insertion point</var> have the
-    same value as the current <span>insertion point</span>. Let
-    the <span>insertion point</span> be just before the <span>next
-    input character</span>.</p>
+     <li><p>Let the <span>original insertion mode</span> be the current
+     <span>insertion mode</span>.</p>
 
-    <p>Append the new element to the <span>current node</span>.
-    <span title="running a script">Special processing occurs when
-    a <code>script</code> element is inserted into a
-    document</span> that might cause some script to execute, which
-    might cause <span title="dom-document-write-HTML">new
-    characters to be inserted into the tokeniser</span>.</p>
+     <li><p>Switch the <span>insertion mode</span> to "<span
+     title="insertion mode: in CDATA/RCDATA">in
+     CDATA/RCDATA</span>".</p></li>
 
-    <p>Let the <span>insertion point</span> have the value of the
-    <var title="">old insertion point</var>. (In other words,
-    restore the <span>insertion point</span> to the value it had
-    before the previous paragraph. This value might be the
-    "undefined" value.)</p>
+    </ol>
 
-    <p id="scriptTagParserResumes">At this stage, if there is a
-    <span>pending external script</span>, then:</p>
-
-    <dl class="switch">
-
-     <dt>If the tree construction stage is <a
-     href="#nestedParsing">being called reentrantly</a>, say from
-     a call to <code
-     title="dom-document-write-HTML">document.write()</code>:</dt>
-
-     <dd><p>Abort the processing of any nested invocations of the
-     tokeniser, yielding control back to the caller. (Tokenization
-     will resume when the caller returns to the "outer" tree
-     construction stage.)</p></dd>
-
-     <dt>Otherwise:</dt>
-
-     <dd>
-
-      <p>Follow these steps:</p>
-
-      <ol>
-
-       <li><p>Let <var title="">the script</var> be the <span>pending
-       external script</span>. There is no longer a <span>pending
-       external script</span>.</p></li>
-
-       <li><p><span>Pause</span> until the script has <span>completed
-       loading</span>.</p></li>
-
-       <li><p>Let the <span>insertion point</span> be just before the
-       <span>next input character</span>.</p></li>
-
-       <li><p><span title="executing a script block">Execute the
-       script</span>.</p></li>
-
-       <li><p>Let the <span>insertion point</span> be undefined
-       again.</p></li>
-
-       <li><p>If there is once again a <span>pending external
-       script</span>, then repeat these steps from step 1.</p></li>
-
-      </ol>
-
-     </dd>
-
-    </dl>
-
    </dd>
 
    <dt>An end tag whose tag name is "head"</dt>
@@ -48536,6 +48468,136 @@
   </dl>
 
 
+
+  <h5 id="parsing-main-incdata">The "<dfn title="insertion mode: in CDATA/RCDATA">in CDATA/RCDATA</dfn>" insertion mode</h5>
+
+  <p>When the <span>insertion mode</span> is "<span title="insertion
+  mode: in CDATA/RCDATA">in CDATA/RCDATA</span>", tokens must be
+  handled as follows:</p>
+
+  <dl class="switch">
+
+   <dt>A character token</dt>
+   <dd>
+
+    <p><span title="insert a character">Insert the token's
+    character</span> into the <span>current node</span>.</p>
+
+   </dd>
+
+   <dt>An end-of-file token</dt>
+   <dd>
+
+    <!-- can't be the fragment case -->
+    <p><span>Parse error</span>.</p>
+
+    <p>If the <span>current node</span> is a <code>script</code>
+    element, mark the <code>script</code> element as <span>"already
+    executed"</span>.</p>
+
+    <p>Pop the <span>current node</span> off the <span>stack of open
+    elements</span>.</p>
+
+    <p>Switch the <span>insertion mode</span> to the <span>original
+    insertion mode</span> and reprocess the current token.</p>
+
+   </dd>
+
+   <dt>An end tag whose tag name is "script"</dt>
+   <dd>
+
+    <p>Let <var title="">script</var> be the <span>current node</span>
+    (which will be a <code>script</code> element).</p>
+
+    <p>Pop the <span>current node</span> off the <span>stack of open
+    elements</span>.</p>
+
+    <p>Switch the <span>insertion mode</span> to the <span>original
+    insertion mode</span>.</p>
+
+    <p>Let the <var title="">old insertion point</var> have the
+    same value as the current <span>insertion point</span>. Let
+    the <span>insertion point</span> be just before the <span>next
+    input character</span>.</p>
+
+    <p><span title="running a script">Run</span> the <var
+    title="">script</var>. This might cause some script to execute,
+    which might cause <span title="dom-document-write-HTML">new
+    characters to be inserted into the tokeniser</span>, and might
+    cause the tokeniser to output more tokens, resulting in a <a
+    href="#nestedParsing">reentrant invocation of the parser</a>.</p>
+
+    <p>Let the <span>insertion point</span> have the value of the
+    <var title="">old insertion point</var>. (In other words,
+    restore the <span>insertion point</span> to the value it had
+    before the previous paragraph. This value might be the
+    "undefined" value.)</p>
+
+    <p id="scriptTagParserResumes">At this stage, if there is a
+    <span>pending external script</span>, then:</p>
+
+    <dl class="switch">
+
+     <dt>If the tree construction stage is <a
+     href="#nestedParsing">being called reentrantly</a>, say from a
+     call to <code
+     title="dom-document-write-HTML">document.write()</code>:</dt>
+
+     <dd><p>Abort the processing of any nested invocations of the
+     tokeniser, yielding control back to the caller. (Tokenization
+     will resume when the caller returns to the "outer" tree
+     construction stage.)</p></dd>
+
+
+     <dt>Otherwise:</dt>
+
+     <dd>
+
+      <p>Follow these steps:</p>
+
+      <ol>
+
+       <li><p>Let <var title="">the script</var> be the <span>pending
+       external script</span>. There is no longer a <span>pending
+       external script</span>.</p></li>
+
+       <li><p><span>Pause</span> until the script has <span>completed
+       loading</span>.</p></li>
+
+       <li><p>Let the <span>insertion point</span> be just before the
+       <span>next input character</span>.</p></li>
+
+       <li><p><span title="executing a script block">Execute the
+       script</span>.</p></li>
+
+       <li><p>Let the <span>insertion point</span> be undefined
+       again.</p></li>
+
+       <li><p>If there is once again a <span>pending external
+       script</span>, then repeat these steps from step 1.</p></li>
+
+      </ol>
+
+     </dd>
+
+    </dl>
+
+   </dd>
+
+   <dt>Any other end tag</dt>
+   <dd>
+
+    <p>Pop the <span>current node</span> off the <span>stack of open
+    elements</span>.</p>
+
+    <p>Switch the <span>insertion mode</span> to the <span>original
+    insertion mode</span>.</p>
+
+   </dd>
+
+  </dl>
+
+
   <h5 id="parsing-main-intable">The "<dfn title="insertion mode: in table">in table</dfn>" insertion mode</h5>
 
   <p>When the <span>insertion mode</span> is "<span title="insertion