[html5] r886 - /
whatwg at whatwg.org
Wed Jun 13 15:34:10 PDT 2007
Author: ianh
Date: 2007-06-13 15:34:08 -0700 (Wed, 13 Jun 2007)
New Revision: 886
Modified:
index
source
Log:
[t] (0) Support the insane comment stuff in CDATA and RCDATA blocks
Modified: index
===================================================================
--- index 2007-06-13 21:36:19 UTC (rev 885)
+++ index 2007-06-13 22:34:08 UTC (rev 886)
@@ -32306,10 +32306,25 @@
<p>Void elements can't have any contents (since there's no end tag, no
content can be put between the start tag and the end tag.)
- <p>CDATA elements can have <a href="#text1" title=syntax-text>text</a>, but
- the text must not contain the two character sequence "<code></</code>"
- (U+003C LESS-THAN SIGN, U+002F SOLIDUS).
+ <p>CDATA elements can have <a href="#text1" title=syntax-text>text</a>,
+ but:
+ <ul>
+ <li>The text must not contain the two character sequence "<code
+ title=""></</code>" (U+003C LESS-THAN SIGN, U+002F SOLIDUS).
+
+ <li>For every occurrence of the four character sequence "<code
+ title=""><!--</code>" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK,
+ U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), there must be a corresponding
+ three-character sequence "<code title="">--></code>" (U+002D
+ HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN) whose U+003E
+ GREATER-THAN SIGN (>) character occurs later in the text than the
+ U+003C LESS-THAN SIGN (<) character of the first sequence. (This means
+ the hyphens from the "<code title=""><!--</code>" part can overlap
+ those in the "<code title="">--></code>" part, as in "<code
+ title=""><!--></code>".)
+ </ul>
+
<p>RCDATA elements can have <a href="#text1" title=syntax-text>text</a> and
<a href="#character0" title=syntax-entities>character entity
references</a>, but the text must not contain the character U+003C
@@ -33435,7 +33450,10 @@
id=content2>content model flag</dfn> that is set after certain tokens are
emitted. The flag has several states: <em title="">PCDATA</em>, <em
title="">RCDATA</em>, <em title="">CDATA</em>, and <em
- title="">PLAINTEXT</em>. Initially it is in the PCDATA state.
+ title="">PLAINTEXT</em>. Initially it must be in the PCDATA state. In the
+ RCDATA and CDATA states, a further <dfn id=escape>escape flag</dfn> is
+ used to control the behaviour of the tokeniser. It is either true or
+ false, and initially must be set to the false state.
<p>The output of the tokenisation step is a series of zero or more of the
following tokens: DOCTYPE, start tag, end tag, comment, character,
@@ -33477,14 +33495,49 @@
<dd>Otherwise: treat it as per the "anything else" entry below.
+ <dt>U+002D HYPHEN-MINUS (-)
+
+ <dd>
+ <p>If the <a href="#content2">content model flag</a> is set to either
+ the RCDATA state or the CDATA state, and the <a href="#escape">escape
+ flag</a> is false, and there are at least three characters before this
+ one in the input stream, and the last four characters in the input
+ stream, including this one, are U+003C LESS-THAN SIGN, U+0021
+ EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D HYPHEN-MINUS
+ ("<!--"), then set the <a href="#escape">escape flag</a> to true.</p>
+
+ <p>In any case, emit the input character as a character token. Stay in
+ the <a href="#data-state">data state</a>.</p>
+
<dt>U+003C LESS-THAN SIGN (<)
- <dd>When the <a href="#content2">content model flag</a> is set to a
- state other than the PLAINTEXT state: switch to the <a
- href="#tag-open">tag open state</a>.
+ <dd>When the <a href="#content2">content model flag</a> is set to the
+ PCDATA state: switch to the <a href="#tag-open">tag open state</a>.
+ <dd>When the <a href="#content2">content model flag</a> is set to either
+ the RCDATA state or the CDATA state and the <a href="#escape">escape
+ flag</a> is false: switch to the <a href="#tag-open">tag open
+ state</a>.
+
<dd>Otherwise: treat it as per the "anything else" entry below.
+ <dt>U+003E GREATER-THAN SIGN (>)
+
+ <dd>
+ <p>If the <a href="#content2">content model flag</a> is set to either
+ the RCDATA state or the CDATA state, and the <a href="#escape">escape
+ flag</a> is true, and the last three characters in the input stream
+ including this one are U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS,
+ U+003E GREATER-THAN SIGN ("-->"), set the <a href="#escape">escape
+ flag</a> to false.</p>
+ <!-- no need to check
+ that there are enough characters, since you can only run into
+ this if the flag is true in the first place, which requires four
+ characters. -->
+
+ <p>In any case, emit the input character as a character token. Stay in
+ the <a href="#data-state">data state</a>.</p>
+
<dt>EOF
<dd>Emit an end-of-file token.
@@ -34795,9 +34848,6 @@
<ul>
<li>Comment parsing is different.
- <li>The following is considered one script block (!):
- <pre><script><!-- document.write('</script>'); --></script></pre>
-
<li><code title=""></br></code> and <code title=""></p></code> do
magical things.
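
Note on the new CDATA authoring rule in the hunk at -32306 above: it can be read as a small scan over the element's text. The following is a rough sketch of one such reading, not the spec's own algorithm. The function name cdata_text_conforms is hypothetical, and pairing each "<!--" with the next "-->" (rather than some stricter one-to-one matching) is an assumption about what "corresponding" means.

```python
def cdata_text_conforms(text: str) -> bool:
    """Check the two CDATA text constraints under one possible reading.

    Hypothetical helper; the name and the pairing strategy are assumptions.
    """
    # Rule 1: the two character sequence "</" must not appear at all.
    if "</" in text:
        return False

    # Rule 2: every "<!--" needs a later "-->". Hyphens may be shared,
    # so "<!-->" opens and closes in five characters.
    escaped = False
    for i, ch in enumerate(text):
        if ch == "-" and not escaped and text[max(0, i - 3): i + 1] == "<!--":
            escaped = True
        elif ch == ">" and escaped and text[max(0, i - 2): i + 1] == "-->":
            escaped = False

    # Conforming only if no "<!--" is left without its "-->".
    return not escaped
```

Under this reading, "<!-->" and "<!-- x -->" pass, while "<!-- x" and any text containing "</" fail.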
Modified: source
===================================================================
--- source 2007-06-13 21:36:19 UTC (rev 885)
+++ source 2007-06-13 22:34:08 UTC (rev 886)
@@ -29831,9 +29831,28 @@
tag.)</p>
<p>CDATA elements can have <span title="syntax-text">text</span>,
- but the text must not contain the two character sequence
- "<code></</code>" (U+003C LESS-THAN SIGN, U+002F SOLIDUS).</p>
+ but:</p>
+ <ul>
+
+ <li>The text must not contain the two character sequence "<code
+ title=""></</code>" (U+003C LESS-THAN SIGN, U+002F
+ SOLIDUS).</li>
+
+ <li>For every occurrence of the four character sequence "<code
+ title=""><!--</code>" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION
+ MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), there must be a
+ corresponding three-character sequence "<code
+ title="">--></code>" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS,
+ U+003E GREATER-THAN SIGN) whose U+003E GREATER-THAN SIGN (>)
+ character occurs later in the text than the U+003C LESS-THAN SIGN
+ (<) character of the first sequence. (This means the hyphens
+ from the "<code title=""><!--</code>" part can overlap those in
+ the "<code title="">--></code>" part, as in "<code
+ title=""><!--></code>".)</li>
+
+ </ul>
+
<p>RCDATA elements can have <span title="syntax-text">text</span>
and <span title="syntax-entities">character entity
references</span>, but the text must not contain the character
@@ -31026,7 +31045,11 @@
model flag</dfn> that is set after certain tokens are emitted. The
flag has several states: <em title="">PCDATA</em>, <em
title="">RCDATA</em>, <em title="">CDATA</em>, and <em
- title="">PLAINTEXT</em>. Initially it is in the PCDATA state.</p>
+ title="">PLAINTEXT</em>. Initially it must be in the PCDATA
+ state. In the RCDATA and CDATA states, a further <dfn>escape
+ flag</dfn> is used to control the behaviour of the tokeniser. It is
+ either true or false, and initially must be set to the false
+ state.</p>
<p>The output of the tokenisation step is a series of zero or more
of the following tokens: DOCTYPE, start tag, end tag, comment,
@@ -31069,13 +31092,50 @@
state</span>.</dd>
<dd>Otherwise: treat it as per the "anything else" entry below.</dd>
+ <dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>
+
+ <p>If the <span>content model flag</span> is set to either the
+ RCDATA state or the CDATA state, and the <span>escape flag</span>
+ is false, and there are at least three characters before this
+ one in the input stream, and the last four characters in the
+ input stream, including this one, are U+003C LESS-THAN SIGN,
+ U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D
+ HYPHEN-MINUS ("<!--"), then set the <span>escape flag</span>
+ to true.</p>
+
+ <p>In any case, emit the input character as a character
+ token. Stay in the <span>data state</span>.</p>
+
+ </dd>
+
<dt>U+003C LESS-THAN SIGN (<)</dt>
- <dd>When the <span>content model flag</span> is set to a state
- other than the PLAINTEXT state: switch to the <span>tag open
- state</span>.</dd>
+ <dd>When the <span>content model flag</span> is set to the PCDATA
+ state: switch to the <span>tag open state</span>.</dd>
+ <dd>When the <span>content model flag</span> is set to either the
+ RCDATA state or the CDATA state and the <span>escape flag</span>
+ is false: switch to the <span>tag open state</span>.</dd>
<dd>Otherwise: treat it as per the "anything else" entry
below.</dd>
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>
+
+ <p>If the <span>content model flag</span> is set to either the
+ RCDATA state or the CDATA state, and the <span>escape
+ flag</span> is true, and the last three characters in the input
+ stream including this one are U+002D HYPHEN-MINUS, U+002D
+ HYPHEN-MINUS, U+003E GREATER-THAN SIGN ("-->"), set the
+ <span>escape flag</span> to false.</p> <!-- no need to check
+ that there are enough characters, since you can only run into
+ this if the flag is true in the first place, which requires four
+ characters. -->
+
+ <p>In any case, emit the input character as a character
+ token. Stay in the <span>data state</span>.</p>
+
+ </dd>
+
<dt>EOF</dt>
<dd>Emit an end-of-file token.</dd>
@@ -32183,10 +32243,6 @@
<li>Comment parsing is different.</li>
- <li>The following is considered one script block (!):
- <pre><script><!-- document.write('</script>'); --></script></pre>
- </li>
-
<li><code title=""></br></code> and <code title=""></p></code> do magical
things.</li>
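
Note on the data state changes above: the escape flag behaviour can be approximated with a short simulation. This is a minimal sketch under stated assumptions, not the tokeniser itself. It models only the data state, assumes control comes back to the data state for the characters that follow a "<", and reports which "<" characters would switch to the tag open state. ContentModel and tag_open_positions are hypothetical names.

```python
from enum import Enum


class ContentModel(Enum):
    # The four content model flag states named in the diff.
    PCDATA = "PCDATA"
    RCDATA = "RCDATA"
    CDATA = "CDATA"
    PLAINTEXT = "PLAINTEXT"


def tag_open_positions(text: str, model: ContentModel) -> list[int]:
    """Return indices of "<" characters that would leave the data state.

    Hypothetical helper: only the data state rules are modelled here.
    """
    escapable = model in (ContentModel.RCDATA, ContentModel.CDATA)
    escape = False  # the escape flag starts out false
    positions = []
    for i, ch in enumerate(text):
        if ch == "-":
            # At least three characters must precede this one, and the last
            # four characters, including this one, must be "<!--".
            if escapable and not escape and i >= 3 and text[i - 3: i + 1] == "<!--":
                escape = True
        elif ch == "<":
            if model is ContentModel.PCDATA or (escapable and not escape):
                positions.append(i)
            # Otherwise the "<" is treated as an ordinary character.
        elif ch == ">":
            # No length check needed: the flag can only be true after at
            # least four characters have been seen.
            if escapable and escape and text[i - 2: i + 1] == "-->":
                escape = False
    return positions


# The script content from the list item removed in the last hunk,
# followed by the real end tag.
text = "<!-- document.write('</script>'); --></script>"
print(tag_open_positions(text, ContentModel.CDATA))
```

In this model the "</script>" inside the comment is passed over because the escape flag is true at that point, so only the leading "<" and the final "</script>" are candidates for tag handling.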