[html5] r886 - /

whatwg at whatwg.org
Wed Jun 13 15:34:10 PDT 2007


Author: ianh
Date: 2007-06-13 15:34:08 -0700 (Wed, 13 Jun 2007)
New Revision: 886

Modified:
   index
   source
Log:
[t] (0) Support the insane comment stuff in CDATA and RCDATA blocks

Modified: index
===================================================================
--- index	2007-06-13 21:36:19 UTC (rev 885)
+++ index	2007-06-13 22:34:08 UTC (rev 886)
@@ -32306,10 +32306,25 @@
   <p>Void elements can't have any contents (since there's no end tag, no
    content can be put between the start tag and the end tag.)
 
-  <p>CDATA elements can have <a href="#text1" title=syntax-text>text</a>, but
-   the text must not contain the two character sequence "<code></</code>"
-   (U+003C LESS-THAN SIGN, U+002F SOLIDUS).
+  <p>CDATA elements can have <a href="#text1" title=syntax-text>text</a>,
+   but:
 
+  <ul>
+   <li>The text must not contain the two character sequence "<code
+    title=""></</code>" (U+003C LESS-THAN SIGN, U+002F SOLIDUS).
+
+   <li>For every occurrence of the four character sequence "<code
+    title=""><!--</code>" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK,
+    U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), there must be a corresponding
+    three-character sequence "<code title="">--></code>" (U+002D
+    HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN) whose U+003E
+    GREATER-THAN SIGN (>) character occurs later in the text than the
+    U+003C LESS-THAN SIGN (<) character of the first sequence. (This means
+    the hyphens from the "<code title=""><!--</code>" part can overlap
+    those in the "<code title="">--></code>" part, as in "<code
+    title=""><!--></code>".)
+  </ul>
+
   <p>RCDATA elements can have <a href="#text1" title=syntax-text>text</a> and
    <a href="#character0" title=syntax-entities>character entity
    references</a>, but the text must not contain the character U+003C
@@ -33435,7 +33450,10 @@
    id=content2>content model flag</dfn> that is set after certain tokens are
    emitted. The flag has several states: <em title="">PCDATA</em>, <em
    title="">RCDATA</em>, <em title="">CDATA</em>, and <em
-   title="">PLAINTEXT</em>. Initially it is in the PCDATA state.
+   title="">PLAINTEXT</em>. Initially it must be in the PCDATA state. In the
+   RCDATA and CDATA states, a further <dfn id=escape>escape flag</dfn> is
+   used to control the behaviour of the tokeniser. It is either true or
+   false, and initially must be set to the false state.
 
   <p>The output of the tokenisation step is a series of zero or more of the
    following tokens: DOCTYPE, start tag, end tag, comment, character,
@@ -33477,14 +33495,49 @@
 
      <dd>Otherwise: treat it as per the "anything else" entry below.
 
+     <dt>U+002D HYPHEN-MINUS (-)
+
+     <dd>
+      <p>If the <a href="#content2">content model flag</a> is set to either
+       the RCDATA state or the CDATA state, and the <a href="#escape">escape
+       flag</a> is false, and there are at least three characters before this
+       one in the input stream, and the last four characters in the input
+       stream, including this one, are U+003C LESS-THAN SIGN, U+0021
+       EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D HYPHEN-MINUS
+       ("<!--"), then set the <a href="#escape">escape flag</a> to true.</p>
+
+      <p>In any case, emit the input character as a character token. Stay in
+       the <a href="#data-state">data state</a>.</p>
+
      <dt>U+003C LESS-THAN SIGN (<)
 
-     <dd>When the <a href="#content2">content model flag</a> is set to a
-      state other than the PLAINTEXT state: switch to the <a
-      href="#tag-open">tag open state</a>.
+     <dd>When the <a href="#content2">content model flag</a> is set to the
+      PCDATA state: switch to the <a href="#tag-open">tag open state</a>.
 
+     <dd>When the <a href="#content2">content model flag</a> is set to either
+      the RCDATA state or the CDATA state and the <a href="#escape">escape
+      flag</a> is false: switch to the <a href="#tag-open">tag open
+      state</a>.
+
      <dd>Otherwise: treat it as per the "anything else" entry below.
 
+     <dt>U+003E GREATER-THAN SIGN (>)
+
+     <dd>
+      <p>If the <a href="#content2">content model flag</a> is set to either
+       the RCDATA state or the CDATA state, and the <a href="#escape">escape
+       flag</a> is true, and the last three characters in the input stream
+       including this one are U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS,
+       U+003E GREATER-THAN SIGN ("-->"), set the <a href="#escape">escape
+       flag</a> to false.</p>
+      <!-- no need to check
+      that there are enough characters, since you can only run into
+      this if the flag is true in the first place, which requires four
+      characters. -->
+      
+      <p>In any case, emit the input character as a character token. Stay in
+       the <a href="#data-state">data state</a>.</p>
+
      <dt>EOF
 
      <dd>Emit an end-of-file token.
@@ -34795,9 +34848,6 @@
      <ul>
       <li>Comment parsing is different.
 
-      <li>The following is considered one script block (!):
-       <pre><script><!-- document.write('</script>'); --></script></pre>
-
       <li><code title=""></br></code> and <code title=""></p></code> do
        magical things.
 

Modified: source
===================================================================
--- source	2007-06-13 21:36:19 UTC (rev 885)
+++ source	2007-06-13 22:34:08 UTC (rev 886)
@@ -29831,9 +29831,28 @@
   tag.)</p>
 
   <p>CDATA elements can have <span title="syntax-text">text</span>,
-  but the text must not contain the two character sequence
-  "<code></</code>" (U+003C LESS-THAN SIGN, U+002F SOLIDUS).</p>
+  but:</p>
 
+  <ul>
+
+   <li>The text must not contain the two character sequence "<code
+   title=""></</code>" (U+003C LESS-THAN SIGN, U+002F
+   SOLIDUS).</li>
+
+   <li>For every occurrence of the four character sequence "<code
+   title=""><!--</code>" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION
+   MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), there must be a
+   corresponding three-character sequence "<code
+   title="">--></code>" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS,
+   U+003E GREATER-THAN SIGN) whose U+003E GREATER-THAN SIGN (>)
+   character occurs later in the text than the U+003C LESS-THAN SIGN
+   (<) character of the first sequence. (This means the hyphens
+   from the "<code title=""><!--</code>" part can overlap those in
+   the "<code title="">--></code>" part, as in "<code
+   title=""><!--></code>".)</li>
+
+  </ul>
+
   <p>RCDATA elements can have <span title="syntax-text">text</span>
   and <span title="syntax-entities">character entity
   references</span>, but the text must not contain the character
@@ -31026,7 +31045,11 @@
   model flag</dfn> that is set after certain tokens are emitted. The
   flag has several states: <em title="">PCDATA</em>, <em
   title="">RCDATA</em>, <em title="">CDATA</em>, and <em
-  title="">PLAINTEXT</em>. Initially it is in the PCDATA state.</p>
+  title="">PLAINTEXT</em>. Initially it must be in the PCDATA
+  state. In the RCDATA and CDATA states, a further <dfn>escape
+  flag</dfn> is used to control the behaviour of the tokeniser. It is
+  either true or false, and initially must be set to the false
+  state.</p>
 
   <p>The output of the tokenisation step is a series of zero or more
   of the following tokens: DOCTYPE, start tag, end tag, comment,
@@ -31069,13 +31092,50 @@
      state</span>.</dd>
      <dd>Otherwise: treat it as per the "anything else" entry below.</dd>
 
+     <dt>U+002D HYPHEN-MINUS (-)</dt>
+     <dd>
+
+      <p>If the <span>content model flag</span> is set to either the
+      RCDATA state or the CDATA state, and the <span>escape flag</span>
+      is false, and there are at least three characters before this
+      one in the input stream, and the last four characters in the
+      input stream, including this one, are U+003C LESS-THAN SIGN,
+      U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D
+      HYPHEN-MINUS ("<!--"), then set the <span>escape flag</span>
+      to true.</p>
+
+      <p>In any case, emit the input character as a character
+      token. Stay in the <span>data state</span>.</p>
+
+     </dd>
+
      <dt>U+003C LESS-THAN SIGN (<)</dt>
-     <dd>When the <span>content model flag</span> is set to a state
-     other than the PLAINTEXT state: switch to the <span>tag open
-     state</span>.</dd>
+     <dd>When the <span>content model flag</span> is set to the PCDATA
+     state: switch to the <span>tag open state</span>.</dd>
+     <dd>When the <span>content model flag</span> is set to either the
+     RCDATA state or the CDATA state and the <span>escape flag</span>
+     is false: switch to the <span>tag open state</span>.</dd>
      <dd>Otherwise: treat it as per the "anything else" entry
      below.</dd>
 
+     <dt>U+003E GREATER-THAN SIGN (>)</dt>
+     <dd>
+
+      <p>If the <span>content model flag</span> is set to either the
+      RCDATA state or the CDATA state, and the <span>escape
+      flag</span> is true, and the last three characters in the input
+      stream including this one are U+002D HYPHEN-MINUS, U+002D
+      HYPHEN-MINUS, U+003E GREATER-THAN SIGN ("-->"), set the
+      <span>escape flag</span> to false.</p> <!-- no need to check
+      that there are enough characters, since you can only run into
+      this if the flag is true in the first place, which requires four
+      characters. -->
+
+      <p>In any case, emit the input character as a character
+      token. Stay in the <span>data state</span>.</p>
+
+     </dd>
+
      <dt>EOF</dt>
      <dd>Emit an end-of-file token.</dd>
 
@@ -32183,10 +32243,6 @@
 
       <li>Comment parsing is different.</li>
 
-      <li>The following is considered one script block (!):
-       <pre><script><!-- document.write('</script>'); --></script></pre>
-      </li>
-
       <li><code title=""></br></code> and <code title=""></p></code> do magical
       things.</li>
 
