[html5] r4177 - [ct] (0) Remove the 'content model flag' and expand it into separate states inst [...]
whatwg at whatwg.org
Mon Oct 19 04:00:34 PDT 2009
Author: ianh
Date: 2009-10-19 04:00:31 -0700 (Mon, 19 Oct 2009)
New Revision: 4177
Modified:
complete.html
index
source
Log:
[ct] (0) Remove the 'content model flag' and expand it into separate states instead. This edit *should* have no effect on black-box conformance requirements. Please report any changes you find.
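
For readers skimming the diff below, here is a rough sketch (not part of the commit) of what "separate states instead of a flag" looks like for the five text-like states as they are written in this revision. The names State, TRANSITIONS and step are hypothetical, chosen only for illustration; the target states are plain labels and their own behaviour is not modelled. Leaving the state unchanged on EOF is an assumption of the sketch, since the spec text only says to emit an end-of-file token.

```python
from enum import Enum, auto


class State(Enum):
    # The five text-like states introduced or kept by this revision.
    DATA = auto()
    RCDATA = auto()
    RAWTEXT = auto()
    SCRIPT_DATA = auto()
    PLAINTEXT = auto()
    # States they can switch to (labels only in this sketch).
    CHARACTER_REFERENCE_IN_DATA = auto()
    TAG_OPEN = auto()
    RCDATA_LESS_THAN_SIGN = auto()
    RAWTEXT_LESS_THAN_SIGN = auto()
    SCRIPT_DATA_LESS_THAN_SIGN = auto()


EOF = None  # hypothetical sentinel for end of input

# Per state: the characters that leave the state, and where they lead.
# Anything not listed is emitted as a character token.
TRANSITIONS = {
    State.DATA: {
        "&": State.CHARACTER_REFERENCE_IN_DATA,
        "<": State.TAG_OPEN,
    },
    State.RCDATA: {
        "&": State.CHARACTER_REFERENCE_IN_DATA,
        "<": State.RCDATA_LESS_THAN_SIGN,
    },
    State.RAWTEXT: {"<": State.RAWTEXT_LESS_THAN_SIGN},
    State.SCRIPT_DATA: {"<": State.SCRIPT_DATA_LESS_THAN_SIGN},
    State.PLAINTEXT: {},
}


def step(state, ch):
    """Consume one character; return (next_state, emitted_token_or_None)."""
    if ch is EOF:
        # Assumption: the state is left unchanged after the EOF token.
        return state, ("eof",)
    target = TRANSITIONS[state].get(ch)
    if target is not None:
        # Switching state emits nothing by itself.
        return target, None
    # "Anything else": emit the character and stay in the same state.
    return state, ("character", ch)
```

With the old flag, the data state had to ask "which content model am I in?" for every special character; in this shape the question is answered by which row of the table is active, which is roughly the restructuring the log message describes.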
Modified: complete.html
===================================================================
--- complete.html 2009-10-19 05:52:18 UTC (rev 4176)
+++ complete.html 2009-10-19 11:00:31 UTC (rev 4177)
@@ -1052,47 +1052,65 @@
<li><a href=#tokenization><span class=secno>11.2.4 </span>Tokenization</a>
<ol>
<li><a href=#data-state><span class=secno>11.2.4.1 </span>Data state</a></li>
- <li><a href=#character-reference-in-data-state><span class=secno>11.2.4.2 </span>Character reference in data state</a></li>
- <li><a href=#tag-open-state><span class=secno>11.2.4.3 </span>Tag open state</a></li>
- <li><a href=#close-tag-open-state><span class=secno>11.2.4.4 </span>Close tag open state</a></li>
- <li><a href=#tag-name-state><span class=secno>11.2.4.5 </span>Tag name state</a></li>
- <li><a href=#before-attribute-name-state><span class=secno>11.2.4.6 </span>Before attribute name state</a></li>
- <li><a href=#attribute-name-state><span class=secno>11.2.4.7 </span>Attribute name state</a></li>
- <li><a href=#after-attribute-name-state><span class=secno>11.2.4.8 </span>After attribute name state</a></li>
- <li><a href=#before-attribute-value-state><span class=secno>11.2.4.9 </span>Before attribute value state</a></li>
- <li><a href=#attribute-value-(double-quoted)-state><span class=secno>11.2.4.10 </span>Attribute value (double-quoted) state</a></li>
- <li><a href=#attribute-value-(single-quoted)-state><span class=secno>11.2.4.11 </span>Attribute value (single-quoted) state</a></li>
- <li><a href=#attribute-value-(unquoted)-state><span class=secno>11.2.4.12 </span>Attribute value (unquoted) state</a></li>
- <li><a href=#character-reference-in-attribute-value-state><span class=secno>11.2.4.13 </span>Character reference in attribute value state</a></li>
- <li><a href=#after-attribute-value-(quoted)-state><span class=secno>11.2.4.14 </span>After attribute value (quoted) state</a></li>
- <li><a href=#self-closing-start-tag-state><span class=secno>11.2.4.15 </span>Self-closing start tag state</a></li>
- <li><a href=#bogus-comment-state><span class=secno>11.2.4.16 </span>Bogus comment state</a></li>
- <li><a href=#markup-declaration-open-state><span class=secno>11.2.4.17 </span>Markup declaration open state</a></li>
- <li><a href=#comment-start-state><span class=secno>11.2.4.18 </span>Comment start state</a></li>
- <li><a href=#comment-start-dash-state><span class=secno>11.2.4.19 </span>Comment start dash state</a></li>
- <li><a href=#comment-state><span class=secno>11.2.4.20 </span>Comment state</a></li>
- <li><a href=#comment-end-dash-state><span class=secno>11.2.4.21 </span>Comment end dash state</a></li>
- <li><a href=#comment-end-state><span class=secno>11.2.4.22 </span>Comment end state</a></li>
- <li><a href=#comment-end-bang-state><span class=secno>11.2.4.23 </span>Comment end bang state</a></li>
- <li><a href=#comment-end-space-state><span class=secno>11.2.4.24 </span>Comment end space state</a></li>
- <li><a href=#doctype-state><span class=secno>11.2.4.25 </span>DOCTYPE state</a></li>
- <li><a href=#before-doctype-name-state><span class=secno>11.2.4.26 </span>Before DOCTYPE name state</a></li>
- <li><a href=#doctype-name-state><span class=secno>11.2.4.27 </span>DOCTYPE name state</a></li>
- <li><a href=#after-doctype-name-state><span class=secno>11.2.4.28 </span>After DOCTYPE name state</a></li>
- <li><a href=#after-doctype-public-keyword-state><span class=secno>11.2.4.29 </span>After DOCTYPE public keyword state</a></li>
- <li><a href=#before-doctype-public-identifier-state><span class=secno>11.2.4.30 </span>Before DOCTYPE public identifier state</a></li>
- <li><a href=#doctype-public-identifier-(double-quoted)-state><span class=secno>11.2.4.31 </span>DOCTYPE public identifier (double-quoted) state</a></li>
- <li><a href=#doctype-public-identifier-(single-quoted)-state><span class=secno>11.2.4.32 </span>DOCTYPE public identifier (single-quoted) state</a></li>
- <li><a href=#after-doctype-public-identifier-state><span class=secno>11.2.4.33 </span>After DOCTYPE public identifier state</a></li>
- <li><a href=#between-doctype-public-and-system-identifiers-state><span class=secno>11.2.4.34 </span>Between DOCTYPE public and system identifiers state</a></li>
- <li><a href=#after-doctype-system-keyword-state><span class=secno>11.2.4.35 </span>After DOCTYPE system keyword state</a></li>
- <li><a href=#before-doctype-system-identifier-state><span class=secno>11.2.4.36 </span>Before DOCTYPE system identifier state</a></li>
- <li><a href=#doctype-system-identifier-(double-quoted)-state><span class=secno>11.2.4.37 </span>DOCTYPE system identifier (double-quoted) state</a></li>
- <li><a href=#doctype-system-identifier-(single-quoted)-state><span class=secno>11.2.4.38 </span>DOCTYPE system identifier (single-quoted) state</a></li>
- <li><a href=#after-doctype-system-identifier-state><span class=secno>11.2.4.39 </span>After DOCTYPE system identifier state</a></li>
- <li><a href=#bogus-doctype-state><span class=secno>11.2.4.40 </span>Bogus DOCTYPE state</a></li>
- <li><a href=#cdata-section-state><span class=secno>11.2.4.41 </span>CDATA section state</a></li>
- <li><a href=#tokenizing-character-references><span class=secno>11.2.4.42 </span>Tokenizing character references</a></ol></li>
+ <li><a href=#rcdata-state><span class=secno>11.2.4.2 </span>RCDATA state</a></li>
+ <li><a href=#rawtext-state><span class=secno>11.2.4.3 </span>RAWTEXT state</a></li>
+ <li><a href=#script-data-state><span class=secno>11.2.4.4 </span>Script data state</a></li>
+ <li><a href=#plaintext-state><span class=secno>11.2.4.5 </span>PLAINTEXT state</a></li>
+ <li><a href=#character-reference-in-data-state><span class=secno>11.2.4.6 </span>Character reference in data state</a></li>
+ <li><a href=#tag-open-state><span class=secno>11.2.4.7 </span>Tag open state</a></li>
+ <li><a href=#close-tag-open-state><span class=secno>11.2.4.8 </span>Close tag open state</a></li>
+ <li><a href=#tag-name-state><span class=secno>11.2.4.9 </span>Tag name state</a></li>
+ <li><a href=#rcdata-less-than-sign-state><span class=secno>11.2.4.10 </span>RCDATA less-than sign state</a></li>
+ <li><a href=#rcdata-end-tag-open-state><span class=secno>11.2.4.11 </span>RCDATA end tag open state</a></li>
+ <li><a href=#rcdata-end-tag-name-state><span class=secno>11.2.4.12 </span>RCDATA end tag name state</a></li>
+ <li><a href=#rawtext-less-than-sign-state><span class=secno>11.2.4.13 </span>RAWTEXT less-than sign state</a></li>
+ <li><a href=#rawtext-end-tag-open-state><span class=secno>11.2.4.14 </span>RAWTEXT end tag open state</a></li>
+ <li><a href=#rawtext-end-tag-name-state><span class=secno>11.2.4.15 </span>RAWTEXT end tag name state</a></li>
+ <li><a href=#script-data-less-than-sign-state><span class=secno>11.2.4.16 </span>Script data less-than sign state</a></li>
+ <li><a href=#script-data-end-tag-open-state><span class=secno>11.2.4.17 </span>Script data end tag open state</a></li>
+ <li><a href=#script-data-end-tag-name-state><span class=secno>11.2.4.18 </span>Script data end tag name state</a></li>
+ <li><a href=#script-data-escape-start-state><span class=secno>11.2.4.19 </span>Script data escape start state</a></li>
+ <li><a href=#script-data-escape-start-dash-state><span class=secno>11.2.4.20 </span>Script data escape start dash state</a></li>
+ <li><a href=#script-data-escaped-state><span class=secno>11.2.4.21 </span>Script data escaped state</a></li>
+ <li><a href=#script-data-escaped-dash-state><span class=secno>11.2.4.22 </span>Script data escaped dash state</a></li>
+ <li><a href=#script-data-escaped-dash-dash-state><span class=secno>11.2.4.23 </span>Script data escaped dash dash state</a></li>
+ <li><a href=#before-attribute-name-state><span class=secno>11.2.4.24 </span>Before attribute name state</a></li>
+ <li><a href=#attribute-name-state><span class=secno>11.2.4.25 </span>Attribute name state</a></li>
+ <li><a href=#after-attribute-name-state><span class=secno>11.2.4.26 </span>After attribute name state</a></li>
+ <li><a href=#before-attribute-value-state><span class=secno>11.2.4.27 </span>Before attribute value state</a></li>
+ <li><a href=#attribute-value-(double-quoted)-state><span class=secno>11.2.4.28 </span>Attribute value (double-quoted) state</a></li>
+ <li><a href=#attribute-value-(single-quoted)-state><span class=secno>11.2.4.29 </span>Attribute value (single-quoted) state</a></li>
+ <li><a href=#attribute-value-(unquoted)-state><span class=secno>11.2.4.30 </span>Attribute value (unquoted) state</a></li>
+ <li><a href=#character-reference-in-attribute-value-state><span class=secno>11.2.4.31 </span>Character reference in attribute value state</a></li>
+ <li><a href=#after-attribute-value-(quoted)-state><span class=secno>11.2.4.32 </span>After attribute value (quoted) state</a></li>
+ <li><a href=#self-closing-start-tag-state><span class=secno>11.2.4.33 </span>Self-closing start tag state</a></li>
+ <li><a href=#bogus-comment-state><span class=secno>11.2.4.34 </span>Bogus comment state</a></li>
+ <li><a href=#markup-declaration-open-state><span class=secno>11.2.4.35 </span>Markup declaration open state</a></li>
+ <li><a href=#comment-start-state><span class=secno>11.2.4.36 </span>Comment start state</a></li>
+ <li><a href=#comment-start-dash-state><span class=secno>11.2.4.37 </span>Comment start dash state</a></li>
+ <li><a href=#comment-state><span class=secno>11.2.4.38 </span>Comment state</a></li>
+ <li><a href=#comment-end-dash-state><span class=secno>11.2.4.39 </span>Comment end dash state</a></li>
+ <li><a href=#comment-end-state><span class=secno>11.2.4.40 </span>Comment end state</a></li>
+ <li><a href=#comment-end-bang-state><span class=secno>11.2.4.41 </span>Comment end bang state</a></li>
+ <li><a href=#comment-end-space-state><span class=secno>11.2.4.42 </span>Comment end space state</a></li>
+ <li><a href=#doctype-state><span class=secno>11.2.4.43 </span>DOCTYPE state</a></li>
+ <li><a href=#before-doctype-name-state><span class=secno>11.2.4.44 </span>Before DOCTYPE name state</a></li>
+ <li><a href=#doctype-name-state><span class=secno>11.2.4.45 </span>DOCTYPE name state</a></li>
+ <li><a href=#after-doctype-name-state><span class=secno>11.2.4.46 </span>After DOCTYPE name state</a></li>
+ <li><a href=#after-doctype-public-keyword-state><span class=secno>11.2.4.47 </span>After DOCTYPE public keyword state</a></li>
+ <li><a href=#before-doctype-public-identifier-state><span class=secno>11.2.4.48 </span>Before DOCTYPE public identifier state</a></li>
+ <li><a href=#doctype-public-identifier-(double-quoted)-state><span class=secno>11.2.4.49 </span>DOCTYPE public identifier (double-quoted) state</a></li>
+ <li><a href=#doctype-public-identifier-(single-quoted)-state><span class=secno>11.2.4.50 </span>DOCTYPE public identifier (single-quoted) state</a></li>
+ <li><a href=#after-doctype-public-identifier-state><span class=secno>11.2.4.51 </span>After DOCTYPE public identifier state</a></li>
+ <li><a href=#between-doctype-public-and-system-identifiers-state><span class=secno>11.2.4.52 </span>Between DOCTYPE public and system identifiers state</a></li>
+ <li><a href=#after-doctype-system-keyword-state><span class=secno>11.2.4.53 </span>After DOCTYPE system keyword state</a></li>
+ <li><a href=#before-doctype-system-identifier-state><span class=secno>11.2.4.54 </span>Before DOCTYPE system identifier state</a></li>
+ <li><a href=#doctype-system-identifier-(double-quoted)-state><span class=secno>11.2.4.55 </span>DOCTYPE system identifier (double-quoted) state</a></li>
+ <li><a href=#doctype-system-identifier-(single-quoted)-state><span class=secno>11.2.4.56 </span>DOCTYPE system identifier (single-quoted) state</a></li>
+ <li><a href=#after-doctype-system-identifier-state><span class=secno>11.2.4.57 </span>After DOCTYPE system identifier state</a></li>
+ <li><a href=#bogus-doctype-state><span class=secno>11.2.4.58 </span>Bogus DOCTYPE state</a></li>
+ <li><a href=#cdata-section-state><span class=secno>11.2.4.59 </span>CDATA section state</a></li>
+ <li><a href=#tokenizing-character-references><span class=secno>11.2.4.60 </span>Tokenizing character references</a></ol></li>
<li><a href=#tree-construction><span class=secno>11.2.5 </span>Tree construction</a>
<ol>
<li><a href=#creating-and-inserting-elements><span class=secno>11.2.5.1 </span>Creating and inserting elements</a></li>
@@ -9785,9 +9803,9 @@
<p>If <var title="">type</var> is <em>not</em> now an <a href=#ascii-case-insensitive>ASCII
case-insensitive</a> match for the string
"<code><a href=#text/html>text/html</a></code>", then act as if the tokenizer had emitted
- a start tag token with the tag name "pre", then set the <a href=#html-parser>HTML
- parser</a>'s <a href=#tokenization>tokenization</a> stage's <a href=#content-model-flag>content
- model flag</a> to <i title="">PLAINTEXT</i>.</p>
+ a start tag token with the tag name "pre", then switch the
+ <a href=#html-parser>HTML parser</a>'s tokenizer to the <a href=#plaintext-state>PLAINTEXT
+ state</a>.</p>
<!--
http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E...%3Ciframe%3E%3C%2Fiframe%3E%3Cscript%3Eonload%20%3D%20function%20()%20%7B%20%0D%0A%20%20var%20d%20%3D%20document.getElementsByTagName('iframe')%5B0%5D.contentDocument%3B%0D%0A%20%20d.open('image%2Fsvg%2Bxml')%3B%0D%0A%20%20d.write(%22%3Cinput%20xmlns%3D'http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml'%20value%3D'(x)html'%2F%3E%22)%3B%0D%0A%20%20d.close()%3B%0D%0A%7D%3B%3C%2Fscript%3E
@@ -55758,9 +55776,9 @@
context</a>, the user agent should <a href=#create-a-document-object>create a
<code>Document</code> object</a>, mark it as being an <a href=#html-documents title="HTML documents">HTML document</a>, create an <a href=#html-parser>HTML
parser</a>, associate it with the document, act as if the
- tokenizer had emitted a start tag token with the tag name "pre", set
- the <a href=#tokenization>tokenization</a> stage's <a href=#content-model-flag>content model
- flag</a> to <i title="">PLAINTEXT</i>, and begin to pass the stream of
+ tokenizer had emitted a start tag token with the tag name "pre",
+ switch the <a href=#html-parser>HTML parser</a>'s tokenizer to the
+ <a href=#plaintext-state>PLAINTEXT state</a>, and begin to pass the stream of
characters in the plain text document to that tokenizer.</p>
<p>The rules for how to convert the bytes of the plain text document
@@ -70362,16 +70380,13 @@
switches it to a new state (to consume the next character), or
repeats the same state (to consume the next character). Some states
have more complicated behavior and can consume several characters
- before switching to another state.</p>
+ before switching to another state. In some cases, the tokenizer
+ state is also changed by the tree construction stage.</p>
- <p>The exact behavior of certain states depends on a <dfn id=content-model-flag>content
- model flag</dfn> that is set after certain tokens are emitted. The
- flag has several states: <i title="">PCDATA</i>, <i title="">RCDATA</i>, <i title="">RAWTEXT</i>, and <i title="">PLAINTEXT</i>. Initially, it must be in the PCDATA
- state. In the RCDATA and RAWTEXT states, a further <dfn id=escape-flag>escape
- flag</dfn> is used to control the behavior of the tokenizer. It is
- either true or false, and initially must be set to the false
- state. The <a href=#insertion-mode>insertion mode</a> and the <a href=#stack-of-open-elements>stack of open
- elements</a> also affects tokenization.</p>
+ <p>The exact behavior of certain states depends on the
+ <a href=#insertion-mode>insertion mode</a> and the <a href=#stack-of-open-elements>stack of open
+ elements</a>. Certain states also use a <dfn id=temporary-buffer><var>temporary
+ buffer</var></dfn> to track progress.</p>
<p>The output of the tokenization step is a series of zero or more
of the following tokens: DOCTYPE, start tag, end tag, comment,
@@ -70390,8 +70405,8 @@
<p>When a token is emitted, it must immediately be handled by the
<a href=#tree-construction>tree construction</a> stage. The tree construction stage
- can affect the state of the <a href=#content-model-flag>content model flag</a>, and can
- insert additional characters into the stream. (For example, the
+ can affect the state of the tokenization stage, and can insert
+ additional characters into the stream. (For example, the
<code><a href=#script>script</a></code> element can result in scripts executing and
using the <a href=#dynamic-markup-insertion>dynamic markup insertion</a> APIs to insert
characters into the stream being tokenized.)</p>
@@ -70401,15 +70416,18 @@
self-closing flag">acknowledged</dfn> when it is processed by the
tree construction stage, that is a <a href=#parse-error>parse error</a>.</p>
- <p>When an end tag token is emitted, the <a href=#content-model-flag>content model
- flag</a> must be switched to the PCDATA state.</p>
-
<p>When an end tag token is emitted with attributes, that is a
<a href=#parse-error>parse error</a>.</p>
<p>When an end tag token is emitted with its <i>self-closing
flag</i> set, that is a <a href=#parse-error>parse error</a>.</p>
+ <p>An <dfn id=appropriate-end-tag-token>appropriate end tag token</dfn> is an end tag token whose
+ tag name matches the tag name of the last start tag to have been
+ emitted from this tokenizer, if any. If no start tag has been
+ emitted from this tokenizer, then no end tag token is
+ appropriate.</p>
+
<p>Before each step of the tokenizer, the user agent must first
check the <a href=#parser-pause-flag>parser pause flag</a>. If it is true, then the
tokenizer must abort the processing of any nested invocations of the
@@ -70418,187 +70436,152 @@
<p>The tokenizer state machine consists of the states defined in the
following subsections.</p>
+
<!-- Order of the lists below is supposed to be non-error then
error, by unicode, then EOF, ending with "anything else" -->
+
<h5 id=data-state><span class=secno>11.2.4.1 </span><dfn>Data state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
<dl class=switch><dt>U+0026 AMPERSAND (&)</dt>
- <dd>When the <a href=#content-model-flag>content model flag</a> is set to one of the
- PCDATA or RCDATA states and the <a href=#escape-flag>escape flag</a> is
- false: switch to the <a href=#character-reference-in-data-state>character reference in data
+ <dd>Switch to the <a href=#character-reference-in-data-state>character reference in data
state</a>.</dd>
- <dd>Otherwise: treat it as per the "anything else" entry
- below.</dd>
- <dt>U+002D HYPHEN-MINUS (-)</dt>
- <dd>
+ <dt>U+003C LESS-THAN SIGN (<)</dt>
+ <dd>Switch to the <a href=#tag-open-state>tag open state</a>.</dd>
- <p>If the <a href=#content-model-flag>content model flag</a> is set to either the
- RCDATA state or the RAWTEXT state, and the <a href=#escape-flag>escape flag</a>
- is false, and there are at least three characters before this
- one in the input stream, and the last four characters in the
- input stream, including this one, are U+003C LESS-THAN SIGN,
- U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D
- HYPHEN-MINUS ("<!--"), then set the <a href=#escape-flag>escape flag</a>
- to true.</p>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>In any case, emit the input character as a character
- token. Stay in the <a href=#data-state>data state</a>.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#data-state>data state</a>.</dd>
- </dd>
+ </dl><h5 id=rcdata-state><span class=secno>11.2.4.2 </span><dfn>RCDATA state</dfn></h5>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0026 AMPERSAND (&)</dt>
+ <dd>Switch to the <a href=#character-reference-in-data-state>character reference in data
+ state</a>.</dd>
+
<dt>U+003C LESS-THAN SIGN (<)</dt>
- <dd>When the <a href=#content-model-flag>content model flag</a> is set to the PCDATA
- state: switch to the <a href=#tag-open-state>tag open state</a>.</dd>
- <dd>When the <a href=#content-model-flag>content model flag</a> is set to either the
- RCDATA state or the RAWTEXT state, and the <a href=#escape-flag>escape flag</a>
- is false: switch to the <a href=#tag-open-state>tag open state</a>.</dd>
- <dd>Otherwise: treat it as per the "anything else" entry
- below.</dd>
+ <dd>Switch to the <a href=#rcdata-less-than-sign-state>RCDATA less-than sign state</a>.</dd>
- <dt>U+003E GREATER-THAN SIGN (>)</dt>
- <dd>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>If the <a href=#content-model-flag>content model flag</a> is set to either the
- RCDATA state or the RAWTEXT state, and the <a href=#escape-flag>escape
- flag</a> is true, and the last three characters in the input
- stream including this one are U+002D HYPHEN-MINUS, U+002D
- HYPHEN-MINUS, U+003E GREATER-THAN SIGN ("-->"), set the
- <a href=#escape-flag>escape flag</a> to false.</p> <!-- no need to check
- that there are enough characters, since you can only run into
- this if the flag is true in the first place, which requires four
- characters. -->
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#rcdata-state>RCDATA state</a>.</dd>
- <p>In any case, emit the input character as a character
- token. Stay in the <a href=#data-state>data state</a>.</p>
+ </dl><h5 id=rawtext-state><span class=secno>11.2.4.3 </span><dfn>RAWTEXT state</dfn></h5>
- </dd>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+ <dl class=switch><dt>U+003C LESS-THAN SIGN (<)</dt>
+ <dd>Switch to the <a href=#rawtext-less-than-sign-state>RAWTEXT less-than sign state</a>.</dd>
+
<dt>EOF</dt>
<dd>Emit an end-of-file token.</dd>
<dt>Anything else</dt>
- <dd>Emit the input character as a character token. Stay in the
- <a href=#data-state>data state</a>.</dd>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
- </dl><h5 id=character-reference-in-data-state><span class=secno>11.2.4.2 </span><dfn>Character reference in data state</dfn></h5>
+ </dl><h5 id=script-data-state><span class=secno>11.2.4.4 </span><dfn>Script data state</dfn></h5>
- <p><i>(This cannot happen if the <a href=#content-model-flag>content model flag</a>
- is set to the RAWTEXT state.)</i></p>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <p>Attempt to <a href=#consume-a-character-reference>consume a character reference</a>, with no
- <a href=#additional-allowed-character>additional allowed character</a>.</p>
+ <dl class=switch><dt>U+003C LESS-THAN SIGN (<)</dt>
+ <dd>Switch to the <a href=#script-data-less-than-sign-state>script data less-than sign state</a>.</dd>
- <p>If nothing is returned, emit a U+0026 AMPERSAND character
- token.</p>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>Otherwise, emit the character token that was returned.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#script-data-state>script data state</a>.</dd>
- <p>Finally, switch to the <a href=#data-state>data state</a>.</p>
+ </dl><h5 id=plaintext-state><span class=secno>11.2.4.5 </span><dfn>PLAINTEXT state</dfn></h5>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <h5 id=tag-open-state><span class=secno>11.2.4.3 </span><dfn>Tag open state</dfn></h5>
+ <dl class=switch><dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>The behavior of this state depends on the <a href=#content-model-flag>content model
- flag</a>.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#plaintext-state>PLAINTEXT state</a>.</dd>
- <dl><dt>If the <a href=#content-model-flag>content model flag</a> is set to the RCDATA
- or RAWTEXT states</dt>
+ </dl><h5 id=character-reference-in-data-state><span class=secno>11.2.4.6 </span><dfn>Character reference in data state</dfn></h5>
- <dd>
+ <p>Attempt to <a href=#consume-a-character-reference>consume a character reference</a>, with no
+ <a href=#additional-allowed-character>additional allowed character</a>.</p>
- <p>Consume the <a href=#next-input-character>next input character</a>. If it is a
- U+002F SOLIDUS character (/), switch to the <a href=#close-tag-open-state>close tag open
- state</a>. Otherwise, emit a U+003C LESS-THAN SIGN character
- token and reconsume the <a href=#current-input-character>current input character</a> in the
- <a href=#data-state>data state</a>.</p>
+ <p>If nothing is returned, emit a U+0026 AMPERSAND character
+ token.</p>
- </dd>
+ <p>Otherwise, emit the character token that was returned.</p>
- <dt>If the <a href=#content-model-flag>content model flag</a> is set to the PCDATA
- state</dt>
+ <p>Finally, switch to the <a href=#data-state>data state</a>.</p>
- <dd>
- <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+ <h5 id=tag-open-state><span class=secno>11.2.4.7 </span><dfn>Tag open state</dfn></h5>
- <dl class=switch><dt>U+0021 EXCLAMATION MARK (!)</dt>
- <dd>Switch to the <a href=#markup-declaration-open-state>markup declaration open state</a>.</dd>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <dt>U+002F SOLIDUS (/)</dt>
- <dd>Switch to the <a href=#close-tag-open-state>close tag open state</a>.</dd>
+ <dl class=switch><dt>U+0021 EXCLAMATION MARK (!)</dt>
+ <dd>Switch to the <a href=#markup-declaration-open-state>markup declaration open state</a>.</dd>
- <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Create a new start tag token, set its tag name to the
- lowercase version of the input character (add 0x0020 to the
- character's code point), then switch to the <a href=#tag-name-state>tag name
- state</a>. (Don't emit the token yet; further details will
- be filled in before it is emitted.)</dd>
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>Switch to the <a href=#close-tag-open-state>close tag open state</a>.</dd>
- <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
- <dd>Create a new start tag token, set its tag name to the input
- character, then switch to the <a href=#tag-name-state>tag name
- state</a>. (Don't emit the token yet; further details will
- be filled in before it is emitted.)</dd>
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new start tag token, set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add 0x0020 to the
+ character's code point), then switch to the <a href=#tag-name-state>tag name
+ state</a>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
- <dt>U+003E GREATER-THAN SIGN (>)</dt>
- <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
- character token and a U+003E GREATER-THAN SIGN character
- token. Switch to the <a href=#data-state>data state</a>.</dd>
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new start tag token, set its tag name to the
+ <a href=#current-input-character>current input character</a>, then switch to the <a href=#tag-name-state>tag
+ name state</a>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
- <dt>U+003F QUESTION MARK (?)</dt>
- <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#bogus-comment-state>bogus
- comment state</a>.</dd>
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
+ character token and a U+003E GREATER-THAN SIGN character
+ token. Switch to the <a href=#data-state>data state</a>.</dd>
- <dt>Anything else</dt>
- <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
- character token and reconsume the <a href=#current-input-character>current input character</a> in the
- <a href=#data-state>data state</a>.</dd>
+ <dt>U+003F QUESTION MARK (?)</dt>
+ <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#bogus-comment-state>bogus
+ comment state</a>.</dd>
- </dl></dd>
+ <dt>Anything else</dt>
+ <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
+ character token and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#data-state>data state</a>.</dd>
- </dl><h5 id=close-tag-open-state><span class=secno>11.2.4.4 </span><dfn>Close tag open state</dfn></h5>
+ </dl><h5 id=close-tag-open-state><span class=secno>11.2.4.8 </span><dfn>Close tag open state</dfn></h5>
- <p>If the <a href=#content-model-flag>content model flag</a> is set to the RCDATA or
- RAWTEXT states but no start tag token has ever been emitted by this
- instance of the tokenizer (<a href=#fragment-case>fragment case</a>), or, if the
- <a href=#content-model-flag>content model flag</a> is set to the RCDATA or RAWTEXT states
- and the next few characters do not match the tag name of the last
- start tag token emitted (compared in an <a href=#ascii-case-insensitive>ASCII
- case-insensitive</a> manner), or if they do but they are not
- immediately followed by one of the following characters:</p>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <ul class=brief><li>U+0009 CHARACTER TABULATION</li>
- <li>U+000A LINE FEED (LF)</li>
- <li>U+000C FORM FEED (FF)</li>
- <!--<li>U+000D CARRIAGE RETURN (CR)</li>-->
- <li>U+0020 SPACE</li>
- <li>U+003E GREATER-THAN SIGN (>)</li>
- <li>U+002F SOLIDUS (/)</li>
- <li>EOF</li>
- </ul><p>...then emit a U+003C LESS-THAN SIGN character token, a U+002F
- SOLIDUS character token, and switch to the <a href=#data-state>data state</a>
- to process the <a href=#next-input-character>next input character</a>.</p>
-
- <p>Otherwise, if the <a href=#content-model-flag>content model flag</a> is set to the
- PCDATA state, or if the next few characters <em>do</em> match that tag
- name, consume the <a href=#next-input-character>next input character</a>:</p>
-
<dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Create a new end tag token, set its tag name to the lowercase
- version of the input character (add 0x0020 to the character's
- code point), then switch to the <a href=#tag-name-state>tag name
+ version of the <a href=#current-input-character>current input character</a> (add 0x0020 to
+ the character's code point), then switch to the <a href=#tag-name-state>tag name
state</a>. (Don't emit the token yet; further details will be
filled in before it is emitted.)</dd>
<dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
- <dd>Create a new end tag token, set its tag name to the input
- character, then switch to the <a href=#tag-name-state>tag name state</a>. (Don't
- emit the token yet; further details will be filled in before it
- is emitted.)</dd>
+ <dd>Create a new end tag token, set its tag name to the
+ <a href=#current-input-character>current input character</a>, then switch to the <a href=#tag-name-state>tag
+ name state</a>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
@@ -70613,7 +70596,7 @@
<dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#bogus-comment-state>bogus
comment state</a>.</dd>
- </dl><h5 id=tag-name-state><span class=secno>11.2.4.5 </span><dfn>Tag name state</dfn></h5>
+ </dl><h5 id=tag-name-state><span class=secno>11.2.4.9 </span><dfn>Tag name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70632,27 +70615,372 @@
state</a>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the <a href=#current-input-character>current input character</a>
- (add 0x0020 to the character's code point) to the current tag
- token's tag name. Stay in the <a href=#tag-name-state>tag name state</a>.</dd>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Stay in the <a href=#tag-name-state>tag name
+ state</a>.</dd>
<dt>EOF</dt>
<dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current tag token's
- tag name. Stay in the <a href=#tag-name-state>tag name state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Stay in the <a href=#tag-name-state>tag name state</a>.</dd>
- </dl><h5 id=before-attribute-name-state><span class=secno>11.2.4.6 </span><dfn>Before attribute name state</dfn></h5>
+ </dl><h5 id=rcdata-less-than-sign-state><span class=secno>11.2.4.10 </span><dfn>RCDATA less-than sign state</dfn></h5>
+ <!-- identical to the RAWTEXT less-than sign state, except s/RAWTEXT/RCDATA/g -->
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
+ <dl class=switch><dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var><a href=#temporary-buffer>temporary buffer</a></var> to the empty string. Switch
+ to the <a href=#rcdata-end-tag-open-state>RCDATA end tag open state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <a href=#current-input-character>current input character</a> in the <a href=#rcdata-state>RCDATA
+ state</a>.</dd>
+
+ </dl><h5 id=rcdata-end-tag-open-state><span class=secno>11.2.4.11 </span><dfn>RCDATA end tag open state</dfn></h5>
+ <!-- identical to the RAWTEXT (and Script data) end tag open state, except s/RAWTEXT/RCDATA/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add
+ 0x0020 to the character's code point). Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <a href=#current-input-character>current input character</a>. Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#rcdata-state>RCDATA state</a>.</dd>
+
+ </dl><h5 id=rcdata-end-tag-name-state><span class=secno>11.2.4.12 </span><dfn>RCDATA end tag name state</dfn></h5>
+ <!-- identical to the RAWTEXT (and Script data) end tag name state, except s/RAWTEXT/RCDATA/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
<dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
<dt>U+000A LINE FEED (LF)</dt>
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#before-attribute-name-state>before attribute name
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#self-closing-start-tag-state>self-closing start tag
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then emit the current tag token and switch to the
+ <a href=#data-state>data state</a>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var><a href=#temporary-buffer>temporary buffer</a></var> (in the order they were added to
+ the buffer), and reconsume the <a href=#current-input-character>current input character</a>
+ in the <a href=#rcdata-state>RCDATA state</a>.</dd>
+
+ </dl><h5 id=rawtext-less-than-sign-state><span class=secno>11.2.4.13 </span><dfn>RAWTEXT less-than sign state</dfn></h5>
+ <!-- identical to the RCDATA less-than sign state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var><a href=#temporary-buffer>temporary buffer</a></var> to the empty string. Switch
+ to the <a href=#rawtext-end-tag-open-state>RAWTEXT end tag open state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <a href=#current-input-character>current input character</a> in the <a href=#rawtext-state>RAWTEXT
+ state</a>.</dd>
+
+ </dl><h5 id=rawtext-end-tag-open-state><span class=secno>11.2.4.14 </span><dfn>RAWTEXT end tag open state</dfn></h5>
+ <!-- identical to the RCDATA (and Script data) end tag open state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add
+ 0x0020 to the character's code point). Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <a href=#current-input-character>current input character</a>. Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
+
+ </dl><h5 id=rawtext-end-tag-name-state><span class=secno>11.2.4.15 </span><dfn>RAWTEXT end tag name state</dfn></h5>
+ <!-- identical to the RCDATA (and Script data) end tag name state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#before-attribute-name-state>before attribute name
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#self-closing-start-tag-state>self-closing start tag
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then emit the current tag token and switch to the
+ <a href=#data-state>data state</a>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var><a href=#temporary-buffer>temporary buffer</a></var> (in the order they were added to
+ the buffer), and reconsume the <a href=#current-input-character>current input character</a>
+ in the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
+
+ </dl><h5 id=script-data-less-than-sign-state><span class=secno>11.2.4.16 </span><dfn>Script data less-than sign state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var><a href=#temporary-buffer>temporary buffer</a></var> to the empty string. Switch
+ to the <a href=#script-data-end-tag-open-state>script data end tag open state</a>.</dd>
+
+ <dt>U+0021 EXCLAMATION MARK (!)</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and a U+0021
+ EXCLAMATION MARK character token. Switch to the <a href=#script-data-escape-start-state>script data
+ escape start state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <a href=#current-input-character>current input character</a> in the <a href=#script-data-state>script data
+ state</a>.</dd>
+
+ </dl><h5 id=script-data-end-tag-open-state><span class=secno>11.2.4.17 </span><dfn>Script data end tag open state</dfn></h5>
+ <!-- identical to the RCDATA (and RAWTEXT) end tag open state, except s/RCDATA/Script data/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add
+ 0x0020 to the character's code point). Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#script-data-end-tag-name-state>script data end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <a href=#current-input-character>current input character</a>. Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#script-data-end-tag-name-state>script data end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-end-tag-name-state><span class=secno>11.2.4.18 </span><dfn>Script data end tag name state</dfn></h5>
+ <!-- identical to the RCDATA (and RAWTEXT) end tag name state, except s/RCDATA/Script data/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#before-attribute-name-state>before attribute name
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#self-closing-start-tag-state>self-closing start tag
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then emit the current tag token and switch to the
+ <a href=#data-state>data state</a>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#script-data-end-tag-name-state>script data end tag name state</a>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#script-data-end-tag-name-state>script data end tag name state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var><a href=#temporary-buffer>temporary buffer</a></var> (in the order they were added to
+ the buffer), and reconsume the <a href=#current-input-character>current input character</a>
+ in the <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-escape-start-state><span class=secno>11.2.4.19 </span><dfn>Script data escape start state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escape-start-dash-state>script data escape start dash state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Reconsume the <a href=#current-input-character>current input character</a> in the
+ <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-escape-start-dash-state><span class=secno>11.2.4.20 </span><dfn>Script data escape start dash state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escaped-dash-dash-state>script data escaped dash dash state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Reconsume the <a href=#current-input-character>current input character</a> in the
+ <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-escaped-state><span class=secno>11.2.4.21 </span><dfn>Script data escaped state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escaped-dash-state>script data escaped dash state</a>.</dd>
+
+ <dt>EOF</dt>
+ <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
+ <a href=#data-state>data state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Stay in
+ the <a href=#script-data-escaped-state>script data escaped state</a>.</dd>
+
+ </dl><h5 id=script-data-escaped-dash-state><span class=secno>11.2.4.22 </span><dfn>Script data escaped dash state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escaped-dash-dash-state>script data escaped dash dash state</a>.</dd>
+
+ <dt>EOF</dt>
+ <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
+ <a href=#data-state>data state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Switch
+ to the <a href=#script-data-escaped-state>script data escaped state</a>.</dd>
+
+ </dl><h5 id=script-data-escaped-dash-dash-state><span class=secno>11.2.4.23 </span><dfn>Script data escaped dash dash state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Stay in the
+ <a href=#script-data-escaped-dash-dash-state>script data escaped dash dash state</a>.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>Emit a U+003E GREATER-THAN SIGN character token. Switch to the
+ <a href=#script-data-state>script data state</a>.</dd>
+
+ <dt>EOF</dt>
+ <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
+ <a href=#data-state>data state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Switch
+ to the <a href=#script-data-escaped-state>script data escaped state</a>.</dd>
+
+ </dl><h5 id=before-attribute-name-state><span class=secno>11.2.4.24 </span><dfn>Before attribute name state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
<dd>Stay in the <a href=#before-attribute-name-state>before attribute name state</a>.</dd>
<dt>U+002F SOLIDUS (/)</dt>
@@ -70686,7 +71014,7 @@
the empty string. Switch to the <a href=#attribute-name-state>attribute name
state</a>.</dd>
- </dl><h5 id=attribute-name-state><span class=secno>11.2.4.7 </span><dfn>Attribute name state</dfn></h5>
+ </dl><h5 id=attribute-name-state><span class=secno>11.2.4.25 </span><dfn>Attribute name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70708,9 +71036,9 @@
state</a>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the <a href=#current-input-character>current input character</a>
- (add 0x0020 to the character's code point) to the current
- attribute's name. Stay in the <a href=#attribute-name-state>attribute name
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current attribute's name. Stay in the <a href=#attribute-name-state>attribute name
state</a>.</dd>
<dt>U+0022 QUOTATION MARK (")</dt>
@@ -70724,8 +71052,9 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- name. Stay in the <a href=#attribute-name-state>attribute name state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's name. Stay in the <a href=#attribute-name-state>attribute name
+ state</a>.</dd>
</dl><p>When the user agent leaves the attribute name state (and before
emitting the tag token, if appropriate), the complete attribute's
@@ -70736,7 +71065,7 @@
associated with it (if any).</p>
- <h5 id=after-attribute-name-state><span class=secno>11.2.4.8 </span><dfn>After attribute name state</dfn></h5>
+ <h5 id=after-attribute-name-state><span class=secno>11.2.4.26 </span><dfn>After attribute name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70759,10 +71088,10 @@
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Start a new attribute in the current tag token. Set that
- attribute's name to the lowercase version of the <a href=#current-input-character>current input character</a>
- (add 0x0020 to the character's code point), and its value to
- the empty string. Switch to the <a href=#attribute-name-state>attribute name
- state</a>.</dd>
+ attribute's name to the lowercase version of the <a href=#current-input-character>current
+ input character</a> (add 0x0020 to the character's code point),
+ and its value to the empty string. Switch to the <a href=#attribute-name-state>attribute
+ name state</a>.</dd>
<dt>U+0022 QUOTATION MARK (")</dt>
<dt>U+0027 APOSTROPHE (')</dt>
@@ -70776,11 +71105,11 @@
<dt>Anything else</dt>
<dd>Start a new attribute in the current tag token. Set that
- attribute's name to the <a href=#current-input-character>current input character</a>, and its value to
- the empty string. Switch to the <a href=#attribute-name-state>attribute name
+ attribute's name to the <a href=#current-input-character>current input character</a>, and
+ its value to the empty string. Switch to the <a href=#attribute-name-state>attribute name
state</a>.</dd>
- </dl><h5 id=before-attribute-value-state><span class=secno>11.2.4.9 </span><dfn>Before attribute value state</dfn></h5>
+ </dl><h5 id=before-attribute-value-state><span class=secno>11.2.4.27 </span><dfn>Before attribute value state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70796,7 +71125,7 @@
<dt>U+0026 AMPERSAND (&)</dt>
<dd>Switch to the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted) state</a>
- and reconsume this input character.</dd>
+ and reconsume the <a href=#current-input-character>current input character</a>.</dd>
<dt>U+0027 APOSTROPHE (')</dt>
<dd>Switch to the <a href=#attribute-value-(single-quoted)-state>attribute value (single-quoted) state</a>.</dd>
@@ -70820,7 +71149,7 @@
attribute's value. Switch to the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted)
state</a>.</dd>
- </dl><h5 id=attribute-value-(double-quoted)-state><span class=secno>11.2.4.10 </span><dfn>Attribute value (double-quoted) state</dfn></h5>
+ </dl><h5 id=attribute-value-(double-quoted)-state><span class=secno>11.2.4.28 </span><dfn>Attribute value (double-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70838,11 +71167,11 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- value. Stay in the <a href=#attribute-value-(double-quoted)-state>attribute value (double-quoted)
- state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's value. Stay in the <a href=#attribute-value-(double-quoted)-state>attribute value
+ (double-quoted) state</a>.</dd>
- </dl><h5 id=attribute-value-(single-quoted)-state><span class=secno>11.2.4.11 </span><dfn>Attribute value (single-quoted) state</dfn></h5>
+ </dl><h5 id=attribute-value-(single-quoted)-state><span class=secno>11.2.4.29 </span><dfn>Attribute value (single-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70860,11 +71189,11 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- value. Stay in the <a href=#attribute-value-(single-quoted)-state>attribute value (single-quoted)
- state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's value. Stay in the <a href=#attribute-value-(single-quoted)-state>attribute value
+ (single-quoted) state</a>.</dd>
- </dl><h5 id=attribute-value-(unquoted)-state><span class=secno>11.2.4.12 </span><dfn>Attribute value (unquoted) state</dfn></h5>
+ </dl><h5 id=attribute-value-(unquoted)-state><span class=secno>11.2.4.30 </span><dfn>Attribute value (unquoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70897,11 +71226,11 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- value. Stay in the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted)
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's value. Stay in the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted)
state</a>.</dd>
- </dl><h5 id=character-reference-in-attribute-value-state><span class=secno>11.2.4.13 </span><dfn>Character reference in attribute value state</dfn></h5>
+ </dl><h5 id=character-reference-in-attribute-value-state><span class=secno>11.2.4.31 </span><dfn>Character reference in attribute value state</dfn></h5>
<p>Attempt to <a href=#consume-a-character-reference>consume a character reference</a>.</p>
@@ -70915,7 +71244,7 @@
in when you were switched into this state.</p>
- <h5 id=after-attribute-value-(quoted)-state><span class=secno>11.2.4.14 </span><dfn>After attribute value (quoted) state</dfn></h5>
+ <h5 id=after-attribute-value-(quoted)-state><span class=secno>11.2.4.32 </span><dfn>After attribute value (quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70941,7 +71270,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the character in
the <a href=#before-attribute-name-state>before attribute name state</a>.</dd>
- </dl><h5 id=self-closing-start-tag-state><span class=secno>11.2.4.15 </span><dfn>Self-closing start tag state</dfn></h5>
+ </dl><h5 id=self-closing-start-tag-state><span class=secno>11.2.4.33 </span><dfn>Self-closing start tag state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -70958,11 +71287,8 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the character in
the <a href=#before-attribute-name-state>before attribute name state</a>.</dd>
- </dl><h5 id=bogus-comment-state><span class=secno>11.2.4.16 </span><dfn>Bogus comment state</dfn></h5>
+ </dl><h5 id=bogus-comment-state><span class=secno>11.2.4.34 </span><dfn>Bogus comment state</dfn></h5>
- <p><i>(This can only happen if the <a href=#content-model-flag>content model
- flag</a> is set to the PCDATA state.)</i></p>
-
<p>Consume every character up to and including the first U+003E
GREATER-THAN SIGN character (>) or the end of the file (EOF),
whichever comes first. Emit a comment token whose data is the
@@ -70979,11 +71305,8 @@
character.</p>
- <h5 id=markup-declaration-open-state><span class=secno>11.2.4.17 </span><dfn>Markup declaration open state</dfn></h5>
+ <h5 id=markup-declaration-open-state><span class=secno>11.2.4.35 </span><dfn>Markup declaration open state</dfn></h5>
- <p><i>(This can only happen if the <a href=#content-model-flag>content model
- flag</a> is set to the PCDATA state.)</i></p>
-
<p>If the next two characters are both U+002D HYPHEN-MINUS (-)
characters, consume those two characters, create a comment token
whose data is the empty string, and switch to the <a href=#comment-start-state>comment
@@ -71007,7 +71330,7 @@
comment.</p>
- <h5 id=comment-start-state><span class=secno>11.2.4.18 </span><dfn>Comment start state</dfn></h5>
+ <h5 id=comment-start-state><span class=secno>11.2.4.36 </span><dfn>Comment start state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71024,10 +71347,10 @@
the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's
- data. Switch to the <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment
+ token's data. Switch to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-start-dash-state><span class=secno>11.2.4.19 </span><dfn>Comment start dash state</dfn></h5>
+ </dl><h5 id=comment-start-dash-state><span class=secno>11.2.4.37 </span><dfn>Comment start dash state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71044,11 +71367,11 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append a U+002D HYPHEN-MINUS character (-) and the input
- character to the comment token's data. Switch to the
- <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append a U+002D HYPHEN-MINUS character (-) and the
+ <a href=#current-input-character>current input character</a> to the comment token's
+ data. Switch to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-state><span class=secno>11.2.4.20 </span><dfn id=comment>Comment state</dfn></h5>
+ </dl><h5 id=comment-state><span class=secno>11.2.4.38 </span><dfn id=comment>Comment state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71061,10 +71384,10 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's data. Stay
- in the <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment
+ token's data. Stay in the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-end-dash-state><span class=secno>11.2.4.21 </span><dfn>Comment end dash state</dfn></h5>
+ </dl><h5 id=comment-end-dash-state><span class=secno>11.2.4.39 </span><dfn>Comment end dash state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71077,11 +71400,11 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append a U+002D HYPHEN-MINUS character (-) and the input
- character to the comment token's data. Switch to the
- <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append a U+002D HYPHEN-MINUS character (-) and the
+ <a href=#current-input-character>current input character</a> to the comment token's
+ data. Switch to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-end-state><span class=secno>11.2.4.22 </span><dfn>Comment end state</dfn></h5>
+ </dl><h5 id=comment-end-state><span class=secno>11.2.4.40 </span><dfn>Comment end state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71095,8 +71418,9 @@
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
<dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS (-)
- characters and the input character to the comment token's
- data. Switch to the <a href=#comment-end-space-state>comment end space state</a>.</dd>
+ characters and the <a href=#current-input-character>current input character</a> to the
+ comment token's data. Switch to the <a href=#comment-end-space-state>comment end space
+ state</a>.</dd>
<dt>U+0021 EXCLAMATION MARK (!)</dt>
<dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#comment-end-bang-state>comment end bang
@@ -71117,10 +71441,11 @@
<dt>Anything else</dt>
<dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS (-)
- characters and the input character to the comment token's
- data. Switch to the <a href=#comment-state>comment state</a>.</dd>
+ characters and the <a href=#current-input-character>current input character</a> to the
+ comment token's data. Switch to the <a href=#comment-state>comment
+ state</a>.</dd>
- </dl><h5 id=comment-end-bang-state><span class=secno>11.2.4.23 </span><dfn>Comment end bang state</dfn></h5>
+ </dl><h5 id=comment-end-bang-state><span class=secno>11.2.4.41 </span><dfn>Comment end bang state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71140,11 +71465,11 @@
<dt>Anything else</dt>
<dd>Append two U+002D HYPHEN-MINUS (-) characters, a U+0021
- EXCLAMATION MARK character (!), and the input character to the
- comment token's data. Switch to the <a href=#comment-state>comment
- state</a>.</dd>
+ EXCLAMATION MARK character (!), and the <a href=#current-input-character>current input
+ character</a> to the comment token's data. Switch to the
+ <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-end-space-state><span class=secno>11.2.4.24 </span><dfn>Comment end space state</dfn></h5>
+ </dl><h5 id=comment-end-space-state><span class=secno>11.2.4.42 </span><dfn>Comment end space state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71153,7 +71478,7 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Append the input character to the comment token's data. Stay in
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment token's data. Stay in
the <a href=#comment-end-space-state>comment end space state</a>.</dd>
<dt>U+002D HYPHEN-MINUS (-)</dt>
@@ -71169,10 +71494,10 @@
comment in comment end state -->
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's data. Switch
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment token's data. Switch
to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=doctype-state><span class=secno>11.2.4.25 </span><dfn>DOCTYPE state</dfn></h5>
+ </dl><h5 id=doctype-state><span class=secno>11.2.4.43 </span><dfn>DOCTYPE state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71192,7 +71517,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the current
character in the <a href=#before-doctype-name-state>before DOCTYPE name state</a>.</dd>
- </dl><h5 id=before-doctype-name-state><span class=secno>11.2.4.26 </span><dfn>Before DOCTYPE name state</dfn></h5>
+ </dl><h5 id=before-doctype-name-state><span class=secno>11.2.4.44 </span><dfn>Before DOCTYPE name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71205,7 +71530,7 @@
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Create a new DOCTYPE token. Set the token's name to the
- lowercase version of the input character (add 0x0020 to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add 0x0020 to the
character's code point). Switch to the <a href=#doctype-name-state>DOCTYPE name
state</a>.</dd>
@@ -71224,7 +71549,7 @@
<a href=#current-input-character>current input character</a>. Switch to the <a href=#doctype-name-state>DOCTYPE name
state</a>.</dd>
- </dl><h5 id=doctype-name-state><span class=secno>11.2.4.27 </span><dfn>DOCTYPE name state</dfn></h5>
+ </dl><h5 id=doctype-name-state><span class=secno>11.2.4.45 </span><dfn>DOCTYPE name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71240,9 +71565,10 @@
state</a>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the input character (add 0x0020
- to the character's code point) to the current DOCTYPE token's
- name. Stay in the <a href=#doctype-name-state>DOCTYPE name state</a>.</dd>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current DOCTYPE token's name. Stay in the <a href=#doctype-name-state>DOCTYPE name
+ state</a>.</dd>
<dt>EOF</dt>
<dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
@@ -71250,10 +71576,11 @@
Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
- token's name. Stay in the <a href=#doctype-name-state>DOCTYPE name state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ DOCTYPE token's name. Stay in the <a href=#doctype-name-state>DOCTYPE name
+ state</a>.</dd>
- </dl><h5 id=after-doctype-name-state><span class=secno>11.2.4.28 </span><dfn>After DOCTYPE name state</dfn></h5>
+ </dl><h5 id=after-doctype-name-state><span class=secno>11.2.4.46 </span><dfn>After DOCTYPE name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71293,7 +71620,7 @@
</dd>
- </dl><h5 id=after-doctype-public-keyword-state><span class=secno>11.2.4.29 </span><dfn>After DOCTYPE public keyword state</dfn></h5>
+ </dl><h5 id=after-doctype-public-keyword-state><span class=secno>11.2.4.47 </span><dfn>After DOCTYPE public keyword state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71314,7 +71641,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the current character in
the <a href=#before-doctype-public-identifier-state>before DOCTYPE public identifier state</a>.</dd>
- </dl><h5 id=before-doctype-public-identifier-state><span class=secno>11.2.4.30 </span><dfn>Before DOCTYPE public identifier state</dfn></h5>
+ </dl><h5 id=before-doctype-public-identifier-state><span class=secno>11.2.4.48 </span><dfn>Before DOCTYPE public identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71350,7 +71677,7 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=doctype-public-identifier-(double-quoted)-state><span class=secno>11.2.4.31 </span><dfn>DOCTYPE public identifier (double-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-public-identifier-(double-quoted)-state><span class=secno>11.2.4.49 </span><dfn>DOCTYPE public identifier (double-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71372,7 +71699,7 @@
token's public identifier. Stay in the <a href=#doctype-public-identifier-(double-quoted)-state>DOCTYPE public
identifier (double-quoted) state</a>.</dd>
- </dl><h5 id=doctype-public-identifier-(single-quoted)-state><span class=secno>11.2.4.32 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-public-identifier-(single-quoted)-state><span class=secno>11.2.4.50 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71394,7 +71721,7 @@
token's public identifier. Stay in the <a href=#doctype-public-identifier-(single-quoted)-state>DOCTYPE public
identifier (single-quoted) state</a>.</dd>
- </dl><h5 id=after-doctype-public-identifier-state><span class=secno>11.2.4.33 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
+ </dl><h5 id=after-doctype-public-identifier-state><span class=secno>11.2.4.51 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71403,7 +71730,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Switch to the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system identifiers state</a>.</dd>
+ <dd>Switch to the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system
+ identifiers state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <a href=#data-state>data
@@ -71429,7 +71757,7 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=between-doctype-public-and-system-identifiers-state><span class=secno>11.2.4.34 </span><dfn>Between DOCTYPE public and system identifiers state</dfn></h5>
+ </dl><h5 id=between-doctype-public-and-system-identifiers-state><span class=secno>11.2.4.52 </span><dfn>Between DOCTYPE public and system identifiers state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71438,7 +71766,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Stay in the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system identifiers state</a>.</dd>
+ <dd>Stay in the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system identifiers
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <a href=#data-state>data
@@ -71464,7 +71793,7 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=after-doctype-system-keyword-state><span class=secno>11.2.4.35 </span><dfn>After DOCTYPE system keyword state</dfn></h5>
+ </dl><h5 id=after-doctype-system-keyword-state><span class=secno>11.2.4.53 </span><dfn>After DOCTYPE system keyword state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71485,7 +71814,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the current character in
the <a href=#before-doctype-system-identifier-state>before DOCTYPE system identifier state</a>.</dd>
- </dl><h5 id=before-doctype-system-identifier-state><span class=secno>11.2.4.36 </span><dfn>Before DOCTYPE system identifier state</dfn></h5>
+ </dl><h5 id=before-doctype-system-identifier-state><span class=secno>11.2.4.54 </span><dfn>Before DOCTYPE system identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71521,12 +71850,13 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=doctype-system-identifier-(double-quoted)-state><span class=secno>11.2.4.37 </span><dfn>DOCTYPE system identifier (double-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-system-identifier-(double-quoted)-state><span class=secno>11.2.4.55 </span><dfn>DOCTYPE system identifier (double-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
<dl class=switch><dt>U+0022 QUOTATION MARK (")</dt>
- <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier state</a>.</dd>
+ <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
@@ -71539,16 +71869,17 @@
Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
- token's system identifier. Stay in the <a href=#doctype-system-identifier-(double-quoted)-state>DOCTYPE system
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ DOCTYPE token's system identifier. Stay in the <a href=#doctype-system-identifier-(double-quoted)-state>DOCTYPE system
identifier (double-quoted) state</a>.</dd>
- </dl><h5 id=doctype-system-identifier-(single-quoted)-state><span class=secno>11.2.4.38 </span><dfn>DOCTYPE system identifier (single-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-system-identifier-(single-quoted)-state><span class=secno>11.2.4.56 </span><dfn>DOCTYPE system identifier (single-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
<dl class=switch><dt>U+0027 APOSTROPHE (')</dt>
- <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier state</a>.</dd>
+ <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
@@ -71561,11 +71892,11 @@
Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
- token's system identifier. Stay in the <a href=#doctype-system-identifier-(single-quoted)-state>DOCTYPE system
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ DOCTYPE token's system identifier. Stay in the <a href=#doctype-system-identifier-(single-quoted)-state>DOCTYPE system
identifier (single-quoted) state</a>.</dd>
- </dl><h5 id=after-doctype-system-identifier-state><span class=secno>11.2.4.39 </span><dfn>After DOCTYPE system identifier state</dfn></h5>
+ </dl><h5 id=after-doctype-system-identifier-state><span class=secno>11.2.4.57 </span><dfn>After DOCTYPE system identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71574,7 +71905,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Stay in the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier state</a>.</dd>
+ <dd>Stay in the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <a href=#data-state>data
@@ -71590,7 +71922,7 @@
state</a>. (This does <em>not</em> set the DOCTYPE token's
<i>force-quirks flag</i> to <i>on</i>.)</dd>
- </dl><h5 id=bogus-doctype-state><span class=secno>11.2.4.40 </span><dfn>Bogus DOCTYPE state</dfn></h5>
+ </dl><h5 id=bogus-doctype-state><span class=secno>11.2.4.58 </span><dfn>Bogus DOCTYPE state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -71605,11 +71937,8 @@
<dt>Anything else</dt>
<dd>Stay in the <a href=#bogus-doctype-state>bogus DOCTYPE state</a>.</dd>
- </dl><h5 id=cdata-section-state><span class=secno>11.2.4.41 </span><dfn>CDATA section state</dfn></h5>
+ </dl><h5 id=cdata-section-state><span class=secno>11.2.4.59 </span><dfn>CDATA section state</dfn></h5>
- <p><i>(This can only happen if the <a href=#content-model-flag>content model
- flag</a> is set to the PCDATA state.)</i></p>
-
<p>Consume every character up to the next occurrence of the three
character sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE
BRACKET U+003E GREATER-THAN SIGN (<code title="">]]></code>), or the
@@ -71625,7 +71954,7 @@
- <h5 id=tokenizing-character-references><span class=secno>11.2.4.42 </span>Tokenizing character references</h5>
+ <h5 id=tokenizing-character-references><span class=secno>11.2.4.60 </span>Tokenizing character references</h5>
<p>This section defines how to <dfn id=consume-a-character-reference>consume a character
reference</dfn>. This definition is used when parsing character
@@ -72072,11 +72401,10 @@
<ol><li><p><a href=#insert-an-html-element>Insert an HTML element</a> for the token.</li>
<li><p>If the algorithm that was invoked is the <a href=#generic-raw-text-element-parsing-algorithm>generic raw
- text element parsing algorithm</a>, switch the tokenizer's
- <a href=#content-model-flag>content model flag</a> to the RAWTEXT state; otherwise the
- algorithm invoked was the <a href=#generic-rcdata-element-parsing-algorithm>generic RCDATA element parsing
- algorithm</a>, switch the tokenizer's <a href=#content-model-flag>content model
- flag</a> to the RCDATA state.</li>
+ text element parsing algorithm</a>, switch the tokenizer to the
+ <a href=#rawtext-state>RAWTEXT state</a>; otherwise the algorithm invoked
+ was the <a href=#generic-rcdata-element-parsing-algorithm>generic RCDATA element parsing algorithm</a>,
+ switch the tokenizer to the <a href=#rcdata-state>RCDATA state</a>.</li>
<li><p>Let the <a href=#original-insertion-mode>original insertion mode</a> be the current
<a href=#insertion-mode>insertion mode</a>.</p>
@@ -72590,8 +72918,8 @@
and push it onto the <a href=#stack-of-open-elements>stack of open
elements</a>.</li>
- <li><p>Switch the tokenizer's <a href=#content-model-flag>content model flag</a> to
- the RAWTEXT state.</li>
+ <li><p>Switch the tokenizer to the <a href=#script-data-state>script data
+ state</a>.</li>
<li><p>Let the <a href=#original-insertion-mode>original insertion mode</a> be the current
<a href=#insertion-mode>insertion mode</a>.</p>
@@ -73130,14 +73458,12 @@
<p><a href=#insert-an-html-element>Insert an HTML element</a> for the token.</p>
- <p>Switch the <a href=#content-model-flag>content model flag</a> to the PLAINTEXT
- state.</p>
+ <p>Switch the tokenizer to the <a href=#plaintext-state>PLAINTEXT state</a>.</p>
- <p class=note>Once a start tag with the tag name "plaintext"
- has been seen, that will be the last token ever seen other
- than character tokens (and the end-of-file token), because
- there is no way to switch the <a href=#content-model-flag>content model flag</a>
- out of the PLAINTEXT state.</p>
+ <p class=note>Once a start tag with the tag name "plaintext" has
+ been seen, that will be the last token ever seen other than
+ character tokens (and the end-of-file token), because there is no
+ way to switch out of the <a href=#plaintext-state>PLAINTEXT state</a>.</p>
</dd>
@@ -73733,8 +74059,8 @@
one. (Newlines at the start of <code><a href=#the-textarea-element>textarea</a></code> elements are
ignored as an authoring convenience.)</li>
- <li><p>Switch the tokenizer's <a href=#content-model-flag>content model flag</a> to
- the RCDATA state.</li>
+ <li><p>Switch the tokenizer to the <a href=#rcdata-state>RCDATA
+ state</a>.</li>
<li><p>Let the <a href=#original-insertion-mode>original insertion mode</a> be the
current <a href=#insertion-mode>insertion mode</a>.</p>
@@ -76096,42 +76422,38 @@
<ol><li>
- <p>Set the <a href=#html-parser>HTML parser</a>'s <a href=#tokenization>tokenization</a>
- stage's <a href=#content-model-flag>content model flag</a> according to the <var title="">context</var> element, as follows:</p>
+ <p>Set the state of the <a href=#html-parser>HTML parser</a>'s
+ <a href=#tokenization>tokenization</a> stage as follows:</p>
<dl class=switch><dt>If it is a <code><a href=#the-title-element-0>title</a></code> or <code><a href=#the-textarea-element>textarea</a></code>
element</dt>
- <dd>Set the <a href=#content-model-flag>content model flag</a> to
- the RCDATA state.</dd>
+ <dd>Switch the tokenizer to the <a href=#rcdata-state>RCDATA state</a>.</dd>
<dt>If it is a <code><a href=#the-style-element>style</a></code>, <code><a href=#script>script</a></code>,
<code><a href=#xmp>xmp</a></code>, <code><a href=#the-iframe-element>iframe</a></code>, <code><a href=#noembed>noembed</a></code>, or
<code><a href=#noframes>noframes</a></code> element</dt>
- <dd>Set the <a href=#content-model-flag>content model flag</a> to
- the RAWTEXT state.</dd>
+ <dd>Switch the tokenizer to the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
<dt>If it is a <code><a href=#the-noscript-element>noscript</a></code> element</dt>
- <dd>If the <a href=#scripting-flag>scripting flag</a> is enabled, set the
- <a href=#content-model-flag>content model flag</a> to the RAWTEXT
- state. Otherwise, set the <a href=#content-model-flag>content model flag</a> to the
- PCDATA state.</dd>
+ <dd>If the <a href=#scripting-flag>scripting flag</a> is enabled, switch the
+ tokenizer to the <a href=#rawtext-state>RAWTEXT state</a>. Otherwise,
+ leave the tokenizer in the <a href=#data-state>data state</a>.</dd>
<dt>If it is a <code><a href=#plaintext>plaintext</a></code> element</dt>
- <dd>Set the <a href=#content-model-flag>content model flag</a> to
- PLAINTEXT.</dd>
+ <dd>Switch the tokenizer to the <a href=#plaintext-state>PLAINTEXT
+ state</a>.</dd>
<dt>Otherwise</dt>
- <dd>Leave the <a href=#content-model-flag>content model flag</a> in the PCDATA
- state.</dd>
+ <dd>Leave the tokenizer in the <a href=#data-state>data state</a>.</dd>
</dl></li>
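
Illustration (not part of the commit): the last hunk above replaces the content model flag with a direct choice of initial tokenizer state for the fragment case. The following is a minimal sketch of that selection as the added lines describe it, assuming the context element is identified by its lowercase tag name. The names TokenizerState, RCDATA_ELEMENTS, RAWTEXT_ELEMENTS and initial_tokenizer_state are hypothetical and do not come from the spec or from any real parser library.

```python
from enum import Enum


class TokenizerState(Enum):
    """Hypothetical labels for the four states named in the hunk above."""
    DATA = "data state"
    RCDATA = "RCDATA state"
    RAWTEXT = "RAWTEXT state"
    PLAINTEXT = "PLAINTEXT state"


# Element groupings copied from the revised spec text in this revision.
RCDATA_ELEMENTS = frozenset({"title", "textarea"})
RAWTEXT_ELEMENTS = frozenset(
    {"style", "script", "xmp", "iframe", "noembed", "noframes"}
)


def initial_tokenizer_state(context_name: str, scripting_enabled: bool) -> TokenizerState:
    """Pick the tokenizer's starting state for a given context element.

    context_name is assumed to be a lowercase tag name; scripting_enabled
    stands in for the spec's scripting flag.
    """
    if context_name in RCDATA_ELEMENTS:
        return TokenizerState.RCDATA
    if context_name in RAWTEXT_ELEMENTS:
        return TokenizerState.RAWTEXT
    if context_name == "noscript":
        # RAWTEXT only when scripting is enabled; otherwise stay in data.
        return TokenizerState.RAWTEXT if scripting_enabled else TokenizerState.DATA
    if context_name == "plaintext":
        return TokenizerState.PLAINTEXT
    # "Otherwise": leave the tokenizer in the data state.
    return TokenizerState.DATA
```

With the flag gone, the old "PCDATA state" simply becomes the data state, which is why the noscript and "Otherwise" branches both fall back to TokenizerState.DATA in this sketch.
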
Modified: index
===================================================================
--- index 2009-10-19 05:52:18 UTC (rev 4176)
+++ index 2009-10-19 11:00:31 UTC (rev 4177)
@@ -881,47 +881,65 @@
<li><a href=#tokenization><span class=secno>9.2.4 </span>Tokenization</a>
<ol>
<li><a href=#data-state><span class=secno>9.2.4.1 </span>Data state</a></li>
- <li><a href=#character-reference-in-data-state><span class=secno>9.2.4.2 </span>Character reference in data state</a></li>
- <li><a href=#tag-open-state><span class=secno>9.2.4.3 </span>Tag open state</a></li>
- <li><a href=#close-tag-open-state><span class=secno>9.2.4.4 </span>Close tag open state</a></li>
- <li><a href=#tag-name-state><span class=secno>9.2.4.5 </span>Tag name state</a></li>
- <li><a href=#before-attribute-name-state><span class=secno>9.2.4.6 </span>Before attribute name state</a></li>
- <li><a href=#attribute-name-state><span class=secno>9.2.4.7 </span>Attribute name state</a></li>
- <li><a href=#after-attribute-name-state><span class=secno>9.2.4.8 </span>After attribute name state</a></li>
- <li><a href=#before-attribute-value-state><span class=secno>9.2.4.9 </span>Before attribute value state</a></li>
- <li><a href=#attribute-value-(double-quoted)-state><span class=secno>9.2.4.10 </span>Attribute value (double-quoted) state</a></li>
- <li><a href=#attribute-value-(single-quoted)-state><span class=secno>9.2.4.11 </span>Attribute value (single-quoted) state</a></li>
- <li><a href=#attribute-value-(unquoted)-state><span class=secno>9.2.4.12 </span>Attribute value (unquoted) state</a></li>
- <li><a href=#character-reference-in-attribute-value-state><span class=secno>9.2.4.13 </span>Character reference in attribute value state</a></li>
- <li><a href=#after-attribute-value-(quoted)-state><span class=secno>9.2.4.14 </span>After attribute value (quoted) state</a></li>
- <li><a href=#self-closing-start-tag-state><span class=secno>9.2.4.15 </span>Self-closing start tag state</a></li>
- <li><a href=#bogus-comment-state><span class=secno>9.2.4.16 </span>Bogus comment state</a></li>
- <li><a href=#markup-declaration-open-state><span class=secno>9.2.4.17 </span>Markup declaration open state</a></li>
- <li><a href=#comment-start-state><span class=secno>9.2.4.18 </span>Comment start state</a></li>
- <li><a href=#comment-start-dash-state><span class=secno>9.2.4.19 </span>Comment start dash state</a></li>
- <li><a href=#comment-state><span class=secno>9.2.4.20 </span>Comment state</a></li>
- <li><a href=#comment-end-dash-state><span class=secno>9.2.4.21 </span>Comment end dash state</a></li>
- <li><a href=#comment-end-state><span class=secno>9.2.4.22 </span>Comment end state</a></li>
- <li><a href=#comment-end-bang-state><span class=secno>9.2.4.23 </span>Comment end bang state</a></li>
- <li><a href=#comment-end-space-state><span class=secno>9.2.4.24 </span>Comment end space state</a></li>
- <li><a href=#doctype-state><span class=secno>9.2.4.25 </span>DOCTYPE state</a></li>
- <li><a href=#before-doctype-name-state><span class=secno>9.2.4.26 </span>Before DOCTYPE name state</a></li>
- <li><a href=#doctype-name-state><span class=secno>9.2.4.27 </span>DOCTYPE name state</a></li>
- <li><a href=#after-doctype-name-state><span class=secno>9.2.4.28 </span>After DOCTYPE name state</a></li>
- <li><a href=#after-doctype-public-keyword-state><span class=secno>9.2.4.29 </span>After DOCTYPE public keyword state</a></li>
- <li><a href=#before-doctype-public-identifier-state><span class=secno>9.2.4.30 </span>Before DOCTYPE public identifier state</a></li>
- <li><a href=#doctype-public-identifier-(double-quoted)-state><span class=secno>9.2.4.31 </span>DOCTYPE public identifier (double-quoted) state</a></li>
- <li><a href=#doctype-public-identifier-(single-quoted)-state><span class=secno>9.2.4.32 </span>DOCTYPE public identifier (single-quoted) state</a></li>
- <li><a href=#after-doctype-public-identifier-state><span class=secno>9.2.4.33 </span>After DOCTYPE public identifier state</a></li>
- <li><a href=#between-doctype-public-and-system-identifiers-state><span class=secno>9.2.4.34 </span>Between DOCTYPE public and system identifiers state</a></li>
- <li><a href=#after-doctype-system-keyword-state><span class=secno>9.2.4.35 </span>After DOCTYPE system keyword state</a></li>
- <li><a href=#before-doctype-system-identifier-state><span class=secno>9.2.4.36 </span>Before DOCTYPE system identifier state</a></li>
- <li><a href=#doctype-system-identifier-(double-quoted)-state><span class=secno>9.2.4.37 </span>DOCTYPE system identifier (double-quoted) state</a></li>
- <li><a href=#doctype-system-identifier-(single-quoted)-state><span class=secno>9.2.4.38 </span>DOCTYPE system identifier (single-quoted) state</a></li>
- <li><a href=#after-doctype-system-identifier-state><span class=secno>9.2.4.39 </span>After DOCTYPE system identifier state</a></li>
- <li><a href=#bogus-doctype-state><span class=secno>9.2.4.40 </span>Bogus DOCTYPE state</a></li>
- <li><a href=#cdata-section-state><span class=secno>9.2.4.41 </span>CDATA section state</a></li>
- <li><a href=#tokenizing-character-references><span class=secno>9.2.4.42 </span>Tokenizing character references</a></ol></li>
+ <li><a href=#rcdata-state><span class=secno>9.2.4.2 </span>RCDATA state</a></li>
+ <li><a href=#rawtext-state><span class=secno>9.2.4.3 </span>RAWTEXT state</a></li>
+ <li><a href=#script-data-state><span class=secno>9.2.4.4 </span>Script data state</a></li>
+ <li><a href=#plaintext-state><span class=secno>9.2.4.5 </span>PLAINTEXT state</a></li>
+ <li><a href=#character-reference-in-data-state><span class=secno>9.2.4.6 </span>Character reference in data state</a></li>
+ <li><a href=#tag-open-state><span class=secno>9.2.4.7 </span>Tag open state</a></li>
+ <li><a href=#close-tag-open-state><span class=secno>9.2.4.8 </span>Close tag open state</a></li>
+ <li><a href=#tag-name-state><span class=secno>9.2.4.9 </span>Tag name state</a></li>
+ <li><a href=#rcdata-less-than-sign-state><span class=secno>9.2.4.10 </span>RCDATA less-than sign state</a></li>
+ <li><a href=#rcdata-end-tag-open-state><span class=secno>9.2.4.11 </span>RCDATA end tag open state</a></li>
+ <li><a href=#rcdata-end-tag-name-state><span class=secno>9.2.4.12 </span>RCDATA end tag name state</a></li>
+ <li><a href=#rawtext-less-than-sign-state><span class=secno>9.2.4.13 </span>RAWTEXT less-than sign state</a></li>
+ <li><a href=#rawtext-end-tag-open-state><span class=secno>9.2.4.14 </span>RAWTEXT end tag open state</a></li>
+ <li><a href=#rawtext-end-tag-name-state><span class=secno>9.2.4.15 </span>RAWTEXT end tag name state</a></li>
+ <li><a href=#script-data-less-than-sign-state><span class=secno>9.2.4.16 </span>Script data less-than sign state</a></li>
+ <li><a href=#script-data-end-tag-open-state><span class=secno>9.2.4.17 </span>Script data end tag open state</a></li>
+ <li><a href=#script-data-end-tag-name-state><span class=secno>9.2.4.18 </span>Script data end tag name state</a></li>
+ <li><a href=#script-data-escape-start-state><span class=secno>9.2.4.19 </span>Script data escape start state</a></li>
+ <li><a href=#script-data-escape-start-dash-state><span class=secno>9.2.4.20 </span>Script data escape start dash state</a></li>
+ <li><a href=#script-data-escaped-state><span class=secno>9.2.4.21 </span>Script data escaped state</a></li>
+ <li><a href=#script-data-escaped-dash-state><span class=secno>9.2.4.22 </span>Script data escaped dash state</a></li>
+ <li><a href=#script-data-escaped-dash-dash-state><span class=secno>9.2.4.23 </span>Script data escaped dash dash state</a></li>
+ <li><a href=#before-attribute-name-state><span class=secno>9.2.4.24 </span>Before attribute name state</a></li>
+ <li><a href=#attribute-name-state><span class=secno>9.2.4.25 </span>Attribute name state</a></li>
+ <li><a href=#after-attribute-name-state><span class=secno>9.2.4.26 </span>After attribute name state</a></li>
+ <li><a href=#before-attribute-value-state><span class=secno>9.2.4.27 </span>Before attribute value state</a></li>
+ <li><a href=#attribute-value-(double-quoted)-state><span class=secno>9.2.4.28 </span>Attribute value (double-quoted) state</a></li>
+ <li><a href=#attribute-value-(single-quoted)-state><span class=secno>9.2.4.29 </span>Attribute value (single-quoted) state</a></li>
+ <li><a href=#attribute-value-(unquoted)-state><span class=secno>9.2.4.30 </span>Attribute value (unquoted) state</a></li>
+ <li><a href=#character-reference-in-attribute-value-state><span class=secno>9.2.4.31 </span>Character reference in attribute value state</a></li>
+ <li><a href=#after-attribute-value-(quoted)-state><span class=secno>9.2.4.32 </span>After attribute value (quoted) state</a></li>
+ <li><a href=#self-closing-start-tag-state><span class=secno>9.2.4.33 </span>Self-closing start tag state</a></li>
+ <li><a href=#bogus-comment-state><span class=secno>9.2.4.34 </span>Bogus comment state</a></li>
+ <li><a href=#markup-declaration-open-state><span class=secno>9.2.4.35 </span>Markup declaration open state</a></li>
+ <li><a href=#comment-start-state><span class=secno>9.2.4.36 </span>Comment start state</a></li>
+ <li><a href=#comment-start-dash-state><span class=secno>9.2.4.37 </span>Comment start dash state</a></li>
+ <li><a href=#comment-state><span class=secno>9.2.4.38 </span>Comment state</a></li>
+ <li><a href=#comment-end-dash-state><span class=secno>9.2.4.39 </span>Comment end dash state</a></li>
+ <li><a href=#comment-end-state><span class=secno>9.2.4.40 </span>Comment end state</a></li>
+ <li><a href=#comment-end-bang-state><span class=secno>9.2.4.41 </span>Comment end bang state</a></li>
+ <li><a href=#comment-end-space-state><span class=secno>9.2.4.42 </span>Comment end space state</a></li>
+ <li><a href=#doctype-state><span class=secno>9.2.4.43 </span>DOCTYPE state</a></li>
+ <li><a href=#before-doctype-name-state><span class=secno>9.2.4.44 </span>Before DOCTYPE name state</a></li>
+ <li><a href=#doctype-name-state><span class=secno>9.2.4.45 </span>DOCTYPE name state</a></li>
+ <li><a href=#after-doctype-name-state><span class=secno>9.2.4.46 </span>After DOCTYPE name state</a></li>
+ <li><a href=#after-doctype-public-keyword-state><span class=secno>9.2.4.47 </span>After DOCTYPE public keyword state</a></li>
+ <li><a href=#before-doctype-public-identifier-state><span class=secno>9.2.4.48 </span>Before DOCTYPE public identifier state</a></li>
+ <li><a href=#doctype-public-identifier-(double-quoted)-state><span class=secno>9.2.4.49 </span>DOCTYPE public identifier (double-quoted) state</a></li>
+ <li><a href=#doctype-public-identifier-(single-quoted)-state><span class=secno>9.2.4.50 </span>DOCTYPE public identifier (single-quoted) state</a></li>
+ <li><a href=#after-doctype-public-identifier-state><span class=secno>9.2.4.51 </span>After DOCTYPE public identifier state</a></li>
+ <li><a href=#between-doctype-public-and-system-identifiers-state><span class=secno>9.2.4.52 </span>Between DOCTYPE public and system identifiers state</a></li>
+ <li><a href=#after-doctype-system-keyword-state><span class=secno>9.2.4.53 </span>After DOCTYPE system keyword state</a></li>
+ <li><a href=#before-doctype-system-identifier-state><span class=secno>9.2.4.54 </span>Before DOCTYPE system identifier state</a></li>
+ <li><a href=#doctype-system-identifier-(double-quoted)-state><span class=secno>9.2.4.55 </span>DOCTYPE system identifier (double-quoted) state</a></li>
+ <li><a href=#doctype-system-identifier-(single-quoted)-state><span class=secno>9.2.4.56 </span>DOCTYPE system identifier (single-quoted) state</a></li>
+ <li><a href=#after-doctype-system-identifier-state><span class=secno>9.2.4.57 </span>After DOCTYPE system identifier state</a></li>
+ <li><a href=#bogus-doctype-state><span class=secno>9.2.4.58 </span>Bogus DOCTYPE state</a></li>
+ <li><a href=#cdata-section-state><span class=secno>9.2.4.59 </span>CDATA section state</a></li>
+ <li><a href=#tokenizing-character-references><span class=secno>9.2.4.60 </span>Tokenizing character references</a></ol></li>
<li><a href=#tree-construction><span class=secno>9.2.5 </span>Tree construction</a>
<ol>
<li><a href=#creating-and-inserting-elements><span class=secno>9.2.5.1 </span>Creating and inserting elements</a></li>
@@ -9614,9 +9632,9 @@
<p>If <var title="">type</var> is <em>not</em> now an <a href=#ascii-case-insensitive>ASCII
case-insensitive</a> match for the string
"<code><a href=#text/html>text/html</a></code>", then act as if the tokenizer had emitted
- a start tag token with the tag name "pre", then set the <a href=#html-parser>HTML
- parser</a>'s <a href=#tokenization>tokenization</a> stage's <a href=#content-model-flag>content
- model flag</a> to <i title="">PLAINTEXT</i>.</p>
+ a start tag token with the tag name "pre", then switch the
+ <a href=#html-parser>HTML parser</a>'s tokenizer to the <a href=#plaintext-state>PLAINTEXT
+ state</a>.</p>
<!--
http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E...%3Ciframe%3E%3C%2Fiframe%3E%3Cscript%3Eonload%20%3D%20function%20()%20%7B%20%0D%0A%20%20var%20d%20%3D%20document.getElementsByTagName('iframe')%5B0%5D.contentDocument%3B%0D%0A%20%20d.open('image%2Fsvg%2Bxml')%3B%0D%0A%20%20d.write(%22%3Cinput%20xmlns%3D'http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml'%20value%3D'(x)html'%2F%3E%22)%3B%0D%0A%20%20d.close()%3B%0D%0A%7D%3B%3C%2Fscript%3E
@@ -52917,9 +52935,9 @@
context</a>, the user agent should <a href=#create-a-document-object>create a
<code>Document</code> object</a>, mark it as being an <a href=#html-documents title="HTML documents">HTML document</a>, create an <a href=#html-parser>HTML
parser</a>, associate it with the document, act as if the
- tokenizer had emitted a start tag token with the tag name "pre", set
- the <a href=#tokenization>tokenization</a> stage's <a href=#content-model-flag>content model
- flag</a> to <i title="">PLAINTEXT</i>, and begin to pass the stream of
+ tokenizer had emitted a start tag token with the tag name "pre",
+ switch the <a href=#html-parser>HTML parser</a>'s tokenizer to the
+ <a href=#plaintext-state>PLAINTEXT state</a>, and begin to pass the stream of
characters in the plain text document to that tokenizer.</p>
<p>The rules for how to convert the bytes of the plain text document
@@ -61420,16 +61438,13 @@
switches it to a new state (to consume the next character), or
repeats the same state (to consume the next character). Some states
have more complicated behavior and can consume several characters
- before switching to another state.</p>
+ before switching to another state. In some cases, the tokenizer
+ state is also changed by the tree construction stage.</p>
- <p>The exact behavior of certain states depends on a <dfn id=content-model-flag>content
- model flag</dfn> that is set after certain tokens are emitted. The
- flag has several states: <i title="">PCDATA</i>, <i title="">RCDATA</i>, <i title="">RAWTEXT</i>, and <i title="">PLAINTEXT</i>. Initially, it must be in the PCDATA
- state. In the RCDATA and RAWTEXT states, a further <dfn id=escape-flag>escape
- flag</dfn> is used to control the behavior of the tokenizer. It is
- either true or false, and initially must be set to the false
- state. The <a href=#insertion-mode>insertion mode</a> and the <a href=#stack-of-open-elements>stack of open
- elements</a> also affects tokenization.</p>
+ <p>The exact behavior of certain states depends on the
+ <a href=#insertion-mode>insertion mode</a> and the <a href=#stack-of-open-elements>stack of open
+ elements</a>. Certain states also use a <dfn id=temporary-buffer><var>temporary
+ buffer</var></dfn> to track progress.</p>
<p>The output of the tokenization step is a series of zero or more
of the following tokens: DOCTYPE, start tag, end tag, comment,
@@ -61448,8 +61463,8 @@
<p>When a token is emitted, it must immediately be handled by the
<a href=#tree-construction>tree construction</a> stage. The tree construction stage
- can affect the state of the <a href=#content-model-flag>content model flag</a>, and can
- insert additional characters into the stream. (For example, the
+ can affect the state of the tokenization stage, and can insert
+ additional characters into the stream. (For example, the
<code><a href=#script>script</a></code> element can result in scripts executing and
using the <a href=#dynamic-markup-insertion>dynamic markup insertion</a> APIs to insert
characters into the stream being tokenized.)</p>
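[Illustrative note, not part of the commit.] The hunk above says the tree construction stage can affect the state of the tokenization stage. A minimal sketch of that interaction follows, written in Python. Every name in it (State, Tokenizer, TreeBuilder, SWITCH_AFTER_START_TAG) is hypothetical, and the tag-to-state mapping is an assumption made for illustration rather than something this diff states.

```python
from enum import Enum, auto


class State(Enum):
    """A subset of the tokenizer states introduced by this revision."""
    DATA = auto()
    RCDATA = auto()
    RAWTEXT = auto()
    SCRIPT_DATA = auto()
    PLAINTEXT = auto()


# Assumed mapping (not taken from the diff): the state the tree
# construction stage asks the tokenizer to enter after a start tag.
SWITCH_AFTER_START_TAG = {
    "title": State.RCDATA,
    "textarea": State.RCDATA,
    "style": State.RAWTEXT,
    "script": State.SCRIPT_DATA,
    "plaintext": State.PLAINTEXT,
}


class Tokenizer:
    """Holds only the current state; character handling is omitted."""

    def __init__(self):
        self.state = State.DATA


class TreeBuilder:
    """Receives emitted tokens and may switch the tokenizer's state."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def handle_start_tag(self, name):
        new_state = SWITCH_AFTER_START_TAG.get(name)
        if new_state is not None:
            # The state is changed directly; no separate flag is consulted.
            self.tokenizer.state = new_state


tokenizer = Tokenizer()
builder = TreeBuilder(tokenizer)
builder.handle_start_tag("p")         # no switch requested
state_after_p = tokenizer.state
builder.handle_start_tag("textarea")  # switch requested
state_after_textarea = tokenizer.state
```

In this sketch the tree builder writes the new state straight into the tokenizer, which mirrors the commit's move from a shared 'content model flag' to separate tokenizer states.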
@@ -61459,15 +61474,18 @@
self-closing flag">acknowledged</dfn> when it is processed by the
tree construction stage, that is a <a href=#parse-error>parse error</a>.</p>
- <p>When an end tag token is emitted, the <a href=#content-model-flag>content model
- flag</a> must be switched to the PCDATA state.</p>
-
<p>When an end tag token is emitted with attributes, that is a
<a href=#parse-error>parse error</a>.</p>
<p>When an end tag token is emitted with its <i>self-closing
flag</i> set, that is a <a href=#parse-error>parse error</a>.</p>
+ <p>An <dfn id=appropriate-end-tag-token>appropriate end tag token</dfn> is an end tag token whose
+ tag name matches the tag name of the last start tag to have been
+ emitted from this tokenizer, if any. If no start tag has been
+ emitted from this tokenizer, then no end tag token is
+ appropriate.</p>
+
<p>Before each step of the tokenizer, the user agent must first
check the <a href=#parser-pause-flag>parser pause flag</a>. If it is true, then the
tokenizer must abort the processing of any nested invocations of the
@@ -61476,187 +61494,152 @@
<p>The tokenizer state machine consists of the states defined in the
following subsections.</p>
+
<!-- Order of the lists below is supposed to be non-error then
error, by unicode, then EOF, ending with "anything else" -->
+
<h5 id=data-state><span class=secno>9.2.4.1 </span><dfn>Data state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
<dl class=switch><dt>U+0026 AMPERSAND (&amp;)</dt>
- <dd>When the <a href=#content-model-flag>content model flag</a> is set to one of the
- PCDATA or RCDATA states and the <a href=#escape-flag>escape flag</a> is
- false: switch to the <a href=#character-reference-in-data-state>character reference in data
+ <dd>Switch to the <a href=#character-reference-in-data-state>character reference in data
state</a>.</dd>
- <dd>Otherwise: treat it as per the "anything else" entry
- below.</dd>
- <dt>U+002D HYPHEN-MINUS (-)</dt>
- <dd>
+ <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
+ <dd>Switch to the <a href=#tag-open-state>tag open state</a>.</dd>
- <p>If the <a href=#content-model-flag>content model flag</a> is set to either the
- RCDATA state or the RAWTEXT state, and the <a href=#escape-flag>escape flag</a>
- is false, and there are at least three characters before this
- one in the input stream, and the last four characters in the
- input stream, including this one, are U+003C LESS-THAN SIGN,
- U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D
- HYPHEN-MINUS ("&lt;!--"), then set the <a href=#escape-flag>escape flag</a>
- to true.</p>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>In any case, emit the input character as a character
- token. Stay in the <a href=#data-state>data state</a>.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#data-state>data state</a>.</dd>
- </dd>
+ </dl><h5 id=rcdata-state><span class=secno>9.2.4.2 </span><dfn>RCDATA state</dfn></h5>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0026 AMPERSAND (&amp;)</dt>
+ <dd>Switch to the <a href=#character-reference-in-data-state>character reference in data
+ state</a>.</dd>
+
<dt>U+003C LESS-THAN SIGN (&lt;)</dt>
- <dd>When the <a href=#content-model-flag>content model flag</a> is set to the PCDATA
- state: switch to the <a href=#tag-open-state>tag open state</a>.</dd>
- <dd>When the <a href=#content-model-flag>content model flag</a> is set to either the
- RCDATA state or the RAWTEXT state, and the <a href=#escape-flag>escape flag</a>
- is false: switch to the <a href=#tag-open-state>tag open state</a>.</dd>
- <dd>Otherwise: treat it as per the "anything else" entry
- below.</dd>
+ <dd>Switch to the <a href=#rcdata-less-than-sign-state>RCDATA less-than sign state</a>.</dd>
- <dt>U+003E GREATER-THAN SIGN (>)</dt>
- <dd>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>If the <a href=#content-model-flag>content model flag</a> is set to either the
- RCDATA state or the RAWTEXT state, and the <a href=#escape-flag>escape
- flag</a> is true, and the last three characters in the input
- stream including this one are U+002D HYPHEN-MINUS, U+002D
- HYPHEN-MINUS, U+003E GREATER-THAN SIGN ("-->"), set the
- <a href=#escape-flag>escape flag</a> to false.</p> <!-- no need to check
- that there are enough characters, since you can only run into
- this if the flag is true in the first place, which requires four
- characters. -->
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#rcdata-state>RCDATA state</a>.</dd>
- <p>In any case, emit the input character as a character
- token. Stay in the <a href=#data-state>data state</a>.</p>
+ </dl><h5 id=rawtext-state><span class=secno>9.2.4.3 </span><dfn>RAWTEXT state</dfn></h5>
- </dd>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+ <dl class=switch><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
+ <dd>Switch to the <a href=#rawtext-less-than-sign-state>RAWTEXT less-than sign state</a>.</dd>
+
<dt>EOF</dt>
<dd>Emit an end-of-file token.</dd>
<dt>Anything else</dt>
- <dd>Emit the input character as a character token. Stay in the
- <a href=#data-state>data state</a>.</dd>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
- </dl><h5 id=character-reference-in-data-state><span class=secno>9.2.4.2 </span><dfn>Character reference in data state</dfn></h5>
+ </dl><h5 id=script-data-state><span class=secno>9.2.4.4 </span><dfn>Script data state</dfn></h5>
- <p><i>(This cannot happen if the <a href=#content-model-flag>content model flag</a>
- is set to the RAWTEXT state.)</i></p>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <p>Attempt to <a href=#consume-a-character-reference>consume a character reference</a>, with no
- <a href=#additional-allowed-character>additional allowed character</a>.</p>
+ <dl class=switch><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
+ <dd>Switch to the <a href=#script-data-less-than-sign-state>script data less-than sign state</a>.</dd>
- <p>If nothing is returned, emit a U+0026 AMPERSAND character
- token.</p>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>Otherwise, emit the character token that was returned.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#script-data-state>script data state</a>.</dd>
- <p>Finally, switch to the <a href=#data-state>data state</a>.</p>
+ </dl><h5 id=plaintext-state><span class=secno>9.2.4.5 </span><dfn>PLAINTEXT state</dfn></h5>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <h5 id=tag-open-state><span class=secno>9.2.4.3 </span><dfn>Tag open state</dfn></h5>
+ <dl class=switch><dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>The behavior of this state depends on the <a href=#content-model-flag>content model
- flag</a>.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <a href=#current-input-character>current input character</a> as a character
+ token. Stay in the <a href=#plaintext-state>PLAINTEXT state</a>.</dd>
- <dl><dt>If the <a href=#content-model-flag>content model flag</a> is set to the RCDATA
- or RAWTEXT states</dt>
+ </dl><h5 id=character-reference-in-data-state><span class=secno>9.2.4.6 </span><dfn>Character reference in data state</dfn></h5>
- <dd>
+ <p>Attempt to <a href=#consume-a-character-reference>consume a character reference</a>, with no
+ <a href=#additional-allowed-character>additional allowed character</a>.</p>
- <p>Consume the <a href=#next-input-character>next input character</a>. If it is a
- U+002F SOLIDUS character (/), switch to the <a href=#close-tag-open-state>close tag open
- state</a>. Otherwise, emit a U+003C LESS-THAN SIGN character
- token and reconsume the <a href=#current-input-character>current input character</a> in the
- <a href=#data-state>data state</a>.</p>
+ <p>If nothing is returned, emit a U+0026 AMPERSAND character
+ token.</p>
- </dd>
+ <p>Otherwise, emit the character token that was returned.</p>
- <dt>If the <a href=#content-model-flag>content model flag</a> is set to the PCDATA
- state</dt>
+ <p>Finally, switch to the <a href=#data-state>data state</a>.</p>
- <dd>
- <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+ <h5 id=tag-open-state><span class=secno>9.2.4.7 </span><dfn>Tag open state</dfn></h5>
- <dl class=switch><dt>U+0021 EXCLAMATION MARK (!)</dt>
- <dd>Switch to the <a href=#markup-declaration-open-state>markup declaration open state</a>.</dd>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <dt>U+002F SOLIDUS (/)</dt>
- <dd>Switch to the <a href=#close-tag-open-state>close tag open state</a>.</dd>
+ <dl class=switch><dt>U+0021 EXCLAMATION MARK (!)</dt>
+ <dd>Switch to the <a href=#markup-declaration-open-state>markup declaration open state</a>.</dd>
- <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Create a new start tag token, set its tag name to the
- lowercase version of the input character (add 0x0020 to the
- character's code point), then switch to the <a href=#tag-name-state>tag name
- state</a>. (Don't emit the token yet; further details will
- be filled in before it is emitted.)</dd>
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>Switch to the <a href=#close-tag-open-state>close tag open state</a>.</dd>
- <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
- <dd>Create a new start tag token, set its tag name to the input
- character, then switch to the <a href=#tag-name-state>tag name
- state</a>. (Don't emit the token yet; further details will
- be filled in before it is emitted.)</dd>
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new start tag token, set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add 0x0020 to the
+ character's code point), then switch to the <a href=#tag-name-state>tag name
+ state</a>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
- <dt>U+003E GREATER-THAN SIGN (>)</dt>
- <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
- character token and a U+003E GREATER-THAN SIGN character
- token. Switch to the <a href=#data-state>data state</a>.</dd>
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new start tag token, set its tag name to the
+ <a href=#current-input-character>current input character</a>, then switch to the <a href=#tag-name-state>tag
+ name state</a>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
- <dt>U+003F QUESTION MARK (?)</dt>
- <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#bogus-comment-state>bogus
- comment state</a>.</dd>
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
+ character token and a U+003E GREATER-THAN SIGN character
+ token. Switch to the <a href=#data-state>data state</a>.</dd>
- <dt>Anything else</dt>
- <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
- character token and reconsume the <a href=#current-input-character>current input character</a> in the
- <a href=#data-state>data state</a>.</dd>
+ <dt>U+003F QUESTION MARK (?)</dt>
+ <dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#bogus-comment-state>bogus
+ comment state</a>.</dd>
- </dl></dd>
+ <dt>Anything else</dt>
+ <dd><a href=#parse-error>Parse error</a>. Emit a U+003C LESS-THAN SIGN
+ character token and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#data-state>data state</a>.</dd>
- </dl><h5 id=close-tag-open-state><span class=secno>9.2.4.4 </span><dfn>Close tag open state</dfn></h5>
+ </dl><h5 id=close-tag-open-state><span class=secno>9.2.4.8 </span><dfn>Close tag open state</dfn></h5>
- <p>If the <a href=#content-model-flag>content model flag</a> is set to the RCDATA or
- RAWTEXT states but no start tag token has ever been emitted by this
- instance of the tokenizer (<a href=#fragment-case>fragment case</a>), or, if the
- <a href=#content-model-flag>content model flag</a> is set to the RCDATA or RAWTEXT states
- and the next few characters do not match the tag name of the last
- start tag token emitted (compared in an <a href=#ascii-case-insensitive>ASCII
- case-insensitive</a> manner), or if they do but they are not
- immediately followed by one of the following characters:</p>
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
- <ul class=brief><li>U+0009 CHARACTER TABULATION</li>
- <li>U+000A LINE FEED (LF)</li>
- <li>U+000C FORM FEED (FF)</li>
- <!--<li>U+000D CARRIAGE RETURN (CR)</li>-->
- <li>U+0020 SPACE</li>
- <li>U+003E GREATER-THAN SIGN (>)</li>
- <li>U+002F SOLIDUS (/)</li>
- <li>EOF</li>
- </ul><p>...then emit a U+003C LESS-THAN SIGN character token, a U+002F
- SOLIDUS character token, and switch to the <a href=#data-state>data state</a>
- to process the <a href=#next-input-character>next input character</a>.</p>
-
- <p>Otherwise, if the <a href=#content-model-flag>content model flag</a> is set to the
- PCDATA state, or if the next few characters <em>do</em> match that tag
- name, consume the <a href=#next-input-character>next input character</a>:</p>
-
<dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Create a new end tag token, set its tag name to the lowercase
- version of the input character (add 0x0020 to the character's
- code point), then switch to the <a href=#tag-name-state>tag name
+ version of the <a href=#current-input-character>current input character</a> (add 0x0020 to
+ the character's code point), then switch to the <a href=#tag-name-state>tag name
state</a>. (Don't emit the token yet; further details will be
filled in before it is emitted.)</dd>
<dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
- <dd>Create a new end tag token, set its tag name to the input
- character, then switch to the <a href=#tag-name-state>tag name state</a>. (Don't
- emit the token yet; further details will be filled in before it
- is emitted.)</dd>
+ <dd>Create a new end tag token, set its tag name to the
+ <a href=#current-input-character>current input character</a>, then switch to the <a href=#tag-name-state>tag
+ name state</a>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#data-state>data
@@ -61671,7 +61654,7 @@
<dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#bogus-comment-state>bogus
comment state</a>.</dd>
- </dl><h5 id=tag-name-state><span class=secno>9.2.4.5 </span><dfn>Tag name state</dfn></h5>
+ </dl><h5 id=tag-name-state><span class=secno>9.2.4.9 </span><dfn>Tag name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61690,27 +61673,372 @@
state</a>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the <a href=#current-input-character>current input character</a>
- (add 0x0020 to the character's code point) to the current tag
- token's tag name. Stay in the <a href=#tag-name-state>tag name state</a>.</dd>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Stay in the <a href=#tag-name-state>tag name
+ state</a>.</dd>
<dt>EOF</dt>
<dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current tag token's
- tag name. Stay in the <a href=#tag-name-state>tag name state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Stay in the <a href=#tag-name-state>tag name state</a>.</dd>
- </dl><h5 id=before-attribute-name-state><span class=secno>9.2.4.6 </span><dfn>Before attribute name state</dfn></h5>
+ </dl><h5 id=rcdata-less-than-sign-state><span class=secno>9.2.4.10 </span><dfn>RCDATA less-than sign state</dfn></h5>
+ <!-- identical to the RAWTEXT less-than sign state, except s/RAWTEXT/RCDATA/g -->
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
+ <dl class=switch><dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var><a href=#temporary-buffer>temporary buffer</a></var> to the empty string. Switch
+ to the <a href=#rcdata-end-tag-open-state>RCDATA end tag open state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <a href=#current-input-character>current input character</a> in the <a href=#rcdata-state>RCDATA
+ state</a>.</dd>
+
+ </dl><h5 id=rcdata-end-tag-open-state><span class=secno>9.2.4.11 </span><dfn>RCDATA end tag open state</dfn></h5>
+ <!-- identical to the RAWTEXT (and Script data) end tag open state, except s/RAWTEXT/RCDATA/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add
+ 0x0020 to the character's code point). Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <a href=#current-input-character>current input character</a>. Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#rcdata-state>RCDATA state</a>.</dd>
+
+ </dl><h5 id=rcdata-end-tag-name-state><span class=secno>9.2.4.12 </span><dfn>RCDATA end tag name state</dfn></h5>
+ <!-- identical to the RAWTEXT (and Script data) end tag name state, except s/RAWTEXT/RCDATA/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
<dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
<dt>U+000A LINE FEED (LF)</dt>
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#before-attribute-name-state>before attribute name
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#self-closing-start-tag-state>self-closing start tag
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then emit the current tag token and switch to the
+ <a href=#data-state>data state</a>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rcdata-end-tag-name-state>RCDATA end tag name state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var><a href=#temporary-buffer>temporary buffer</a></var> (in the order they were added to
+ the buffer), and reconsume the <a href=#current-input-character>current input character</a>
+ in the <a href=#rcdata-state>RCDATA state</a>.</dd>
+
+ </dl><h5 id=rawtext-less-than-sign-state><span class=secno>9.2.4.13 </span><dfn>RAWTEXT less-than sign state</dfn></h5>
+ <!-- identical to the RCDATA less-than sign state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var><a href=#temporary-buffer>temporary buffer</a></var> to the empty string. Switch
+ to the <a href=#rawtext-end-tag-open-state>RAWTEXT end tag open state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <a href=#current-input-character>current input character</a> in the <a href=#rawtext-state>RAWTEXT
+ state</a>.</dd>
+
+ </dl><h5 id=rawtext-end-tag-open-state><span class=secno>9.2.4.14 </span><dfn>RAWTEXT end tag open state</dfn></h5>
+ <!-- identical to the RCDATA (and Script data) end tag open state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add
+ 0x0020 to the character's code point). Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <a href=#current-input-character>current input character</a>. Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
+
+ </dl><h5 id=rawtext-end-tag-name-state><span class=secno>9.2.4.15 </span><dfn>RAWTEXT end tag name state</dfn></h5>
+ <!-- identical to the RCDATA (and Script data) end tag name state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#before-attribute-name-state>before attribute name
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#self-closing-start-tag-state>self-closing start tag
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then emit the current tag token and switch to the
+ <a href=#data-state>data state</a>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#rawtext-end-tag-name-state>RAWTEXT end tag name state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var><a href=#temporary-buffer>temporary buffer</a></var> (in the order they were added to
+ the buffer), and reconsume the <a href=#current-input-character>current input character</a>
+ in the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
+
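Reviewer sketch (not part of the patch): the RAWTEXT end tag open and end tag name states above can be modelled as one small function. This is a minimal sketch under stated assumptions. The function name `rawtext_end_tag`, the tuple-shaped tokens, and the returned state strings are hypothetical. It also assumes that an "appropriate end tag token" is one whose tag name equals the name of the last start tag emitted, which is defined outside this excerpt. `None` stands in for EOF.

```python
import string

# The four whitespace characters listed in the end tag name state.
WHITESPACE = {"\t", "\n", "\f", " "}


def rawtext_end_tag(text, pos, last_start_tag):
    """Model the RAWTEXT end tag open and end tag name states.

    Assumes "</" was just consumed and the temporary buffer is empty.
    Returns (emitted_tokens, next_state, next_pos, pending_end_tag_name).
    """

    def peek(i):
        return text[i] if i < len(text) else None  # None models EOF

    # RAWTEXT end tag open state.
    ch = peek(pos)
    if ch is None or ch not in string.ascii_letters:
        # "Anything else": emit "<" and "/", reconsume in the RAWTEXT state.
        return [("char", "<"), ("char", "/")], "rawtext", pos, None
    tag_name = ch.lower()  # for A-Z this equals adding 0x0020 to the code point
    temp_buffer = ch       # the buffer keeps the character as written
    pos += 1

    # RAWTEXT end tag name state.
    while True:
        ch = peek(pos)
        if ch is not None and ch in string.ascii_letters:
            tag_name += ch.lower()
            temp_buffer += ch
            pos += 1
            continue
        if tag_name == last_start_tag:  # assumed "appropriate end tag" test
            if ch in WHITESPACE:
                return [], "before attribute name", pos + 1, tag_name
            if ch == "/":
                return [], "self-closing start tag", pos + 1, tag_name
            if ch == ">":
                return [("end_tag", tag_name)], "data", pos + 1, None
        # "Anything else": flush "<", "/" and the buffered characters as
        # character tokens, then reconsume the current character in RAWTEXT.
        tokens = [("char", "<"), ("char", "/")]
        tokens += [("char", c) for c in temp_buffer]
        return tokens, "rawtext", pos, None
```

For example, `rawtext_end_tag("STYLE>x", 0, "style")` emits a single end tag token and switches to the data state, while `rawtext_end_tag("b>", 0, "style")` falls through to "anything else" and re-emits `</b` as character tokens. The script data variants follow the same shape with the state names substituted.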
+ </dl><h5 id=script-data-less-than-sign-state><span class=secno>9.2.4.16 </span><dfn>Script data less-than sign state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var><a href=#temporary-buffer>temporary buffer</a></var> to the empty string. Switch
+ to the <a href=#script-data-end-tag-open-state>script data end tag open state</a>.</dd>
+
+ <dt>U+0021 EXCLAMATION MARK (!)</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and a U+0021
+ EXCLAMATION MARK character token. Switch to the <a href=#script-data-escape-start-state>script data
+ escape start state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <a href=#current-input-character>current input character</a> in the <a href=#script-data-state>script data
+ state</a>.</dd>
+
+ </dl><h5 id=script-data-end-tag-open-state><span class=secno>9.2.4.17 </span><dfn>Script data end tag open state</dfn></h5>
+ <!-- identical to the RCDATA (and RAWTEXT) end tag open state, except s/RCDATA/Script data/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add
+ 0x0020 to the character's code point). Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#script-data-end-tag-name-state>script data end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <a href=#current-input-character>current input character</a>. Append the <a href=#current-input-character>current
+ input character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Finally,
+ switch to the <a href=#script-data-end-tag-name-state>script data end tag name state</a>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <a href=#current-input-character>current input
+ character</a> in the <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-end-tag-name-state><span class=secno>9.2.4.18 </span><dfn>Script data end tag name state</dfn></h5>
+ <!-- identical to the RCDATA (and RAWTEXT) end tag name state, except s/RCDATA/Script data/g -->
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#before-attribute-name-state>before attribute name
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then switch to the <a href=#self-closing-start-tag-state>self-closing start tag
+ state</a>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <a href=#appropriate-end-tag-token>appropriate end tag
+ token</a>, then emit the current tag token and switch to the
+ <a href=#data-state>data state</a>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#script-data-end-tag-name-state>script data end tag name state</a>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ tag token's tag name. Append the <a href=#current-input-character>current input
+ character</a> to the <var><a href=#temporary-buffer>temporary buffer</a></var>. Stay in the
+ <a href=#script-data-end-tag-name-state>script data end tag name state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var><a href=#temporary-buffer>temporary buffer</a></var> (in the order they were added to
+ the buffer), and reconsume the <a href=#current-input-character>current input character</a>
+ in the <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-escape-start-state><span class=secno>9.2.4.19 </span><dfn>Script data escape start state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escape-start-dash-state>script data escape start dash state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Reconsume the <a href=#current-input-character>current input character</a> in the
+ <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-escape-start-dash-state><span class=secno>9.2.4.20 </span><dfn>Script data escape start dash state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escaped-dash-dash-state>script data escaped dash dash state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Reconsume the <a href=#current-input-character>current input character</a> in the
+ <a href=#script-data-state>script data state</a>.</dd>
+
+ </dl><h5 id=script-data-escaped-state><span class=secno>9.2.4.21 </span><dfn>Script data escaped state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escaped-dash-state>script data escaped dash state</a>.</dd>
+
+ <dt>EOF</dt>
+ <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
+ <a href=#data-state>data state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Stay in
+ the <a href=#script-data-escaped-state>script data escaped state</a>.</dd>
+
+ </dl><h5 id=script-data-escaped-dash-state><span class=secno>9.2.4.22 </span><dfn>Script data escaped dash state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <a href=#script-data-escaped-dash-dash-state>script data escaped dash dash state</a>.</dd>
+
+ <dt>EOF</dt>
+ <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
+ <a href=#data-state>data state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Switch
+ to the <a href=#script-data-escaped-state>script data escaped state</a>.</dd>
+
+ </dl><h5 id=script-data-escaped-dash-dash-state><span class=secno>9.2.4.23 </span><dfn>Script data escaped dash dash state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Stay in the
+ <a href=#script-data-escaped-dash-dash-state>script data escaped dash dash state</a>.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>Emit a U+003E GREATER-THAN SIGN character token. Switch to the
+ <a href=#script-data-state>script data state</a>.</dd>
+
+ <dt>EOF</dt>
+ <dd><a href=#parse-error>Parse error</a>. Reconsume the EOF character in the
+ <a href=#data-state>data state</a>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Switch
+ to the <a href=#script-data-escaped-state>script data escaped state</a>.</dd>
+
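Reviewer sketch (not part of the patch): the five script data escape states above form a small state machine that passes every consumed character through as a character token and only tracks how many hyphens it has just seen. The sketch below is illustrative only. The function name `script_data_escape`, the state strings, and the return shape are hypothetical, and it follows the states exactly as written in this revision. `None` stands in for EOF.

```python
def script_data_escape(text, pos):
    """Model the script data escape start/dash/escaped states.

    Assumes "<!" was just emitted by the script data less-than sign state.
    Returns (emitted_characters, next_state, next_pos, parse_error).
    """
    out = []
    state = "escape start"
    while True:
        ch = text[pos] if pos < len(text) else None  # None models EOF

        if state in ("escape start", "escape start dash"):
            if ch != "-":
                # "Anything else": reconsume in the script data state.
                return "".join(out), "script data", pos, False
            out.append(ch)
            pos += 1
            state = ("escape start dash" if state == "escape start"
                     else "escaped dash dash")
            continue

        # escaped, escaped dash, and escaped dash dash states.
        if ch is None:
            # EOF: parse error, reconsume the EOF in the data state.
            return "".join(out), "data", pos, True
        out.append(ch)  # every branch below emits the character
        pos += 1
        if ch == "-":
            state = "escaped dash" if state == "escaped" else "escaped dash dash"
        elif ch == ">" and state == "escaped dash dash":
            return "".join(out), "script data", pos, False
        else:
            state = "escaped"
```

Calling `script_data_escape("-- a --> b", 0)` consumes through the closing `-->` and returns to the script data state, whereas `script_data_escape("--x", 0)` reaches EOF inside the escaped section and reports a parse error.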
+ </dl><h5 id=before-attribute-name-state><span class=secno>9.2.4.24 </span><dfn>Before attribute name state</dfn></h5>
+
+ <p>Consume the <a href=#next-input-character>next input character</a>:</p>
+
+ <dl class=switch><dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
<dd>Stay in the <a href=#before-attribute-name-state>before attribute name state</a>.</dd>
<dt>U+002F SOLIDUS (/)</dt>
@@ -61744,7 +62072,7 @@
the empty string. Switch to the <a href=#attribute-name-state>attribute name
state</a>.</dd>
- </dl><h5 id=attribute-name-state><span class=secno>9.2.4.7 </span><dfn>Attribute name state</dfn></h5>
+ </dl><h5 id=attribute-name-state><span class=secno>9.2.4.25 </span><dfn>Attribute name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61766,9 +62094,9 @@
state</a>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the <a href=#current-input-character>current input character</a>
- (add 0x0020 to the character's code point) to the current
- attribute's name. Stay in the <a href=#attribute-name-state>attribute name
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current attribute's name. Stay in the <a href=#attribute-name-state>attribute name
state</a>.</dd>
<dt>U+0022 QUOTATION MARK (")</dt>
@@ -61782,8 +62110,9 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- name. Stay in the <a href=#attribute-name-state>attribute name state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's name. Stay in the <a href=#attribute-name-state>attribute name
+ state</a>.</dd>
</dl><p>When the user agent leaves the attribute name state (and before
emitting the tag token, if appropriate), the complete attribute's
@@ -61794,7 +62123,7 @@
associated with it (if any).</p>
- <h5 id=after-attribute-name-state><span class=secno>9.2.4.8 </span><dfn>After attribute name state</dfn></h5>
+ <h5 id=after-attribute-name-state><span class=secno>9.2.4.26 </span><dfn>After attribute name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61817,10 +62146,10 @@
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Start a new attribute in the current tag token. Set that
- attribute's name to the lowercase version of the <a href=#current-input-character>current input character</a>
- (add 0x0020 to the character's code point), and its value to
- the empty string. Switch to the <a href=#attribute-name-state>attribute name
- state</a>.</dd>
+ attribute's name to the lowercase version of the <a href=#current-input-character>current
+ input character</a> (add 0x0020 to the character's code point),
+ and its value to the empty string. Switch to the <a href=#attribute-name-state>attribute
+ name state</a>.</dd>
<dt>U+0022 QUOTATION MARK (")</dt>
<dt>U+0027 APOSTROPHE (')</dt>
@@ -61834,11 +62163,11 @@
<dt>Anything else</dt>
<dd>Start a new attribute in the current tag token. Set that
- attribute's name to the <a href=#current-input-character>current input character</a>, and its value to
- the empty string. Switch to the <a href=#attribute-name-state>attribute name
+ attribute's name to the <a href=#current-input-character>current input character</a>, and
+ its value to the empty string. Switch to the <a href=#attribute-name-state>attribute name
state</a>.</dd>
- </dl><h5 id=before-attribute-value-state><span class=secno>9.2.4.9 </span><dfn>Before attribute value state</dfn></h5>
+ </dl><h5 id=before-attribute-value-state><span class=secno>9.2.4.27 </span><dfn>Before attribute value state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61854,7 +62183,7 @@
<dt>U+0026 AMPERSAND (&)</dt>
<dd>Switch to the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted) state</a>
- and reconsume this input character.</dd>
+ and reconsume the <a href=#current-input-character>current input character</a>.</dd>
<dt>U+0027 APOSTROPHE (')</dt>
<dd>Switch to the <a href=#attribute-value-(single-quoted)-state>attribute value (single-quoted) state</a>.</dd>
@@ -61878,7 +62207,7 @@
attribute's value. Switch to the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted)
state</a>.</dd>
- </dl><h5 id=attribute-value-(double-quoted)-state><span class=secno>9.2.4.10 </span><dfn>Attribute value (double-quoted) state</dfn></h5>
+ </dl><h5 id=attribute-value-(double-quoted)-state><span class=secno>9.2.4.28 </span><dfn>Attribute value (double-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61896,11 +62225,11 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- value. Stay in the <a href=#attribute-value-(double-quoted)-state>attribute value (double-quoted)
- state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's value. Stay in the <a href=#attribute-value-(double-quoted)-state>attribute value
+ (double-quoted) state</a>.</dd>
- </dl><h5 id=attribute-value-(single-quoted)-state><span class=secno>9.2.4.11 </span><dfn>Attribute value (single-quoted) state</dfn></h5>
+ </dl><h5 id=attribute-value-(single-quoted)-state><span class=secno>9.2.4.29 </span><dfn>Attribute value (single-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61918,11 +62247,11 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- value. Stay in the <a href=#attribute-value-(single-quoted)-state>attribute value (single-quoted)
- state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's value. Stay in the <a href=#attribute-value-(single-quoted)-state>attribute value
+ (single-quoted) state</a>.</dd>
- </dl><h5 id=attribute-value-(unquoted)-state><span class=secno>9.2.4.12 </span><dfn>Attribute value (unquoted) state</dfn></h5>
+ </dl><h5 id=attribute-value-(unquoted)-state><span class=secno>9.2.4.30 </span><dfn>Attribute value (unquoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61955,11 +62284,11 @@
<a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current attribute's
- value. Stay in the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted)
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ attribute's value. Stay in the <a href=#attribute-value-(unquoted)-state>attribute value (unquoted)
state</a>.</dd>
- </dl><h5 id=character-reference-in-attribute-value-state><span class=secno>9.2.4.13 </span><dfn>Character reference in attribute value state</dfn></h5>
+ </dl><h5 id=character-reference-in-attribute-value-state><span class=secno>9.2.4.31 </span><dfn>Character reference in attribute value state</dfn></h5>
<p>Attempt to <a href=#consume-a-character-reference>consume a character reference</a>.</p>
@@ -61973,7 +62302,7 @@
 in when you were switched into this state.</p>
- <h5 id=after-attribute-value-(quoted)-state><span class=secno>9.2.4.14 </span><dfn>After attribute value (quoted) state</dfn></h5>
+ <h5 id=after-attribute-value-(quoted)-state><span class=secno>9.2.4.32 </span><dfn>After attribute value (quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -61999,7 +62328,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the character in
the <a href=#before-attribute-name-state>before attribute name state</a>.</dd>
- </dl><h5 id=self-closing-start-tag-state><span class=secno>9.2.4.15 </span><dfn>Self-closing start tag state</dfn></h5>
+ </dl><h5 id=self-closing-start-tag-state><span class=secno>9.2.4.33 </span><dfn>Self-closing start tag state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62016,11 +62345,8 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the character in
the <a href=#before-attribute-name-state>before attribute name state</a>.</dd>
- </dl><h5 id=bogus-comment-state><span class=secno>9.2.4.16 </span><dfn>Bogus comment state</dfn></h5>
+ </dl><h5 id=bogus-comment-state><span class=secno>9.2.4.34 </span><dfn>Bogus comment state</dfn></h5>
- <p><i>(This can only happen if the <a href=#content-model-flag>content model
- flag</a> is set to the PCDATA state.)</i></p>
-
<p>Consume every character up to and including the first U+003E
GREATER-THAN SIGN character (>) or the end of the file (EOF),
whichever comes first. Emit a comment token whose data is the
@@ -62037,11 +62363,8 @@
character.</p>
- <h5 id=markup-declaration-open-state><span class=secno>9.2.4.17 </span><dfn>Markup declaration open state</dfn></h5>
+ <h5 id=markup-declaration-open-state><span class=secno>9.2.4.35 </span><dfn>Markup declaration open state</dfn></h5>
- <p><i>(This can only happen if the <a href=#content-model-flag>content model
- flag</a> is set to the PCDATA state.)</i></p>
-
<p>If the next two characters are both U+002D HYPHEN-MINUS (-)
characters, consume those two characters, create a comment token
whose data is the empty string, and switch to the <a href=#comment-start-state>comment
@@ -62065,7 +62388,7 @@
comment.</p>
- <h5 id=comment-start-state><span class=secno>9.2.4.18 </span><dfn>Comment start state</dfn></h5>
+ <h5 id=comment-start-state><span class=secno>9.2.4.36 </span><dfn>Comment start state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62082,10 +62405,10 @@
the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's
- data. Switch to the <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment
+ token's data. Switch to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-start-dash-state><span class=secno>9.2.4.19 </span><dfn>Comment start dash state</dfn></h5>
+ </dl><h5 id=comment-start-dash-state><span class=secno>9.2.4.37 </span><dfn>Comment start dash state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62102,11 +62425,11 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append a U+002D HYPHEN-MINUS character (-) and the input
- character to the comment token's data. Switch to the
- <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append a U+002D HYPHEN-MINUS character (-) and the
+ <a href=#current-input-character>current input character</a> to the comment token's
+ data. Switch to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-state><span class=secno>9.2.4.20 </span><dfn id=comment>Comment state</dfn></h5>
+ </dl><h5 id=comment-state><span class=secno>9.2.4.38 </span><dfn id=comment>Comment state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62119,10 +62442,10 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's data. Stay
- in the <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment
+ token's data. Stay in the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-end-dash-state><span class=secno>9.2.4.21 </span><dfn>Comment end dash state</dfn></h5>
+ </dl><h5 id=comment-end-dash-state><span class=secno>9.2.4.39 </span><dfn>Comment end dash state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62135,11 +62458,11 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append a U+002D HYPHEN-MINUS character (-) and the input
- character to the comment token's data. Switch to the
- <a href=#comment-state>comment state</a>.</dd>
+ <dd>Append a U+002D HYPHEN-MINUS character (-) and the
+ <a href=#current-input-character>current input character</a> to the comment token's
+ data. Switch to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-end-state><span class=secno>9.2.4.22 </span><dfn>Comment end state</dfn></h5>
+ </dl><h5 id=comment-end-state><span class=secno>9.2.4.40 </span><dfn>Comment end state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62153,8 +62476,9 @@
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
<dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS (-)
- characters and the input character to the comment token's
- data. Switch to the <a href=#comment-end-space-state>comment end space state</a>.</dd>
+ characters and the <a href=#current-input-character>current input character</a> to the
+ comment token's data. Switch to the <a href=#comment-end-space-state>comment end space
+ state</a>.</dd>
<dt>U+0021 EXCLAMATION MARK (!)</dt>
<dd><a href=#parse-error>Parse error</a>. Switch to the <a href=#comment-end-bang-state>comment end bang
@@ -62175,10 +62499,11 @@
<dt>Anything else</dt>
<dd><a href=#parse-error>Parse error</a>. Append two U+002D HYPHEN-MINUS (-)
- characters and the input character to the comment token's
- data. Switch to the <a href=#comment-state>comment state</a>.</dd>
+ characters and the <a href=#current-input-character>current input character</a> to the
+ comment token's data. Switch to the <a href=#comment-state>comment
+ state</a>.</dd>
- </dl><h5 id=comment-end-bang-state><span class=secno>9.2.4.23 </span><dfn>Comment end bang state</dfn></h5>
+ </dl><h5 id=comment-end-bang-state><span class=secno>9.2.4.41 </span><dfn>Comment end bang state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62198,11 +62523,11 @@
<dt>Anything else</dt>
<dd>Append two U+002D HYPHEN-MINUS (-) characters, a U+0021
- EXCLAMATION MARK character (!), and the input character to the
- comment token's data. Switch to the <a href=#comment-state>comment
- state</a>.</dd>
+ EXCLAMATION MARK character (!), and the <a href=#current-input-character>current input
+ character</a> to the comment token's data. Switch to the
+ <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=comment-end-space-state><span class=secno>9.2.4.24 </span><dfn>Comment end space state</dfn></h5>
+ </dl><h5 id=comment-end-space-state><span class=secno>9.2.4.42 </span><dfn>Comment end space state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62211,7 +62536,7 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Append the input character to the comment token's data. Stay in
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment token's data. Stay in
the <a href=#comment-end-space-state>comment end space state</a>.</dd>
<dt>U+002D HYPHEN-MINUS (-)</dt>
@@ -62227,10 +62552,10 @@
comment in comment end state -->
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's data. Switch
+ <dd>Append the <a href=#current-input-character>current input character</a> to the comment token's data. Switch
to the <a href=#comment-state>comment state</a>.</dd>
- </dl><h5 id=doctype-state><span class=secno>9.2.4.25 </span><dfn>DOCTYPE state</dfn></h5>
+ </dl><h5 id=doctype-state><span class=secno>9.2.4.43 </span><dfn>DOCTYPE state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62250,7 +62575,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the current
character in the <a href=#before-doctype-name-state>before DOCTYPE name state</a>.</dd>
- </dl><h5 id=before-doctype-name-state><span class=secno>9.2.4.26 </span><dfn>Before DOCTYPE name state</dfn></h5>
+ </dl><h5 id=before-doctype-name-state><span class=secno>9.2.4.44 </span><dfn>Before DOCTYPE name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62263,7 +62588,7 @@
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Create a new DOCTYPE token. Set the token's name to the
- lowercase version of the input character (add 0x0020 to the
+ lowercase version of the <a href=#current-input-character>current input character</a> (add 0x0020 to the
character's code point). Switch to the <a href=#doctype-name-state>DOCTYPE name
state</a>.</dd>
@@ -62282,7 +62607,7 @@
<a href=#current-input-character>current input character</a>. Switch to the <a href=#doctype-name-state>DOCTYPE name
state</a>.</dd>
- </dl><h5 id=doctype-name-state><span class=secno>9.2.4.27 </span><dfn>DOCTYPE name state</dfn></h5>
+ </dl><h5 id=doctype-name-state><span class=secno>9.2.4.45 </span><dfn>DOCTYPE name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62298,9 +62623,10 @@
state</a>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the input character (add 0x0020
- to the character's code point) to the current DOCTYPE token's
- name. Stay in the <a href=#doctype-name-state>DOCTYPE name state</a>.</dd>
+ <dd>Append the lowercase version of the <a href=#current-input-character>current input
+ character</a> (add 0x0020 to the character's code point) to the
+ current DOCTYPE token's name. Stay in the <a href=#doctype-name-state>DOCTYPE name
+ state</a>.</dd>
<dt>EOF</dt>
<dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
@@ -62308,10 +62634,11 @@
Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
- token's name. Stay in the <a href=#doctype-name-state>DOCTYPE name state</a>.</dd>
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ DOCTYPE token's name. Stay in the <a href=#doctype-name-state>DOCTYPE name
+ state</a>.</dd>
- </dl><h5 id=after-doctype-name-state><span class=secno>9.2.4.28 </span><dfn>After DOCTYPE name state</dfn></h5>
+ </dl><h5 id=after-doctype-name-state><span class=secno>9.2.4.46 </span><dfn>After DOCTYPE name state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62351,7 +62678,7 @@
</dd>
- </dl><h5 id=after-doctype-public-keyword-state><span class=secno>9.2.4.29 </span><dfn>After DOCTYPE public keyword state</dfn></h5>
+ </dl><h5 id=after-doctype-public-keyword-state><span class=secno>9.2.4.47 </span><dfn>After DOCTYPE public keyword state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62372,7 +62699,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the current character in
the <a href=#before-doctype-public-identifier-state>before DOCTYPE public identifier state</a>.</dd>
- </dl><h5 id=before-doctype-public-identifier-state><span class=secno>9.2.4.30 </span><dfn>Before DOCTYPE public identifier state</dfn></h5>
+ </dl><h5 id=before-doctype-public-identifier-state><span class=secno>9.2.4.48 </span><dfn>Before DOCTYPE public identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62408,7 +62735,7 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=doctype-public-identifier-(double-quoted)-state><span class=secno>9.2.4.31 </span><dfn>DOCTYPE public identifier (double-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-public-identifier-(double-quoted)-state><span class=secno>9.2.4.49 </span><dfn>DOCTYPE public identifier (double-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62430,7 +62757,7 @@
token's public identifier. Stay in the <a href=#doctype-public-identifier-(double-quoted)-state>DOCTYPE public
identifier (double-quoted) state</a>.</dd>
- </dl><h5 id=doctype-public-identifier-(single-quoted)-state><span class=secno>9.2.4.32 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-public-identifier-(single-quoted)-state><span class=secno>9.2.4.50 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62452,7 +62779,7 @@
token's public identifier. Stay in the <a href=#doctype-public-identifier-(single-quoted)-state>DOCTYPE public
identifier (single-quoted) state</a>.</dd>
- </dl><h5 id=after-doctype-public-identifier-state><span class=secno>9.2.4.33 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
+ </dl><h5 id=after-doctype-public-identifier-state><span class=secno>9.2.4.51 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62461,7 +62788,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Switch to the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system identifiers state</a>.</dd>
+ <dd>Switch to the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system
+ identifiers state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <a href=#data-state>data
@@ -62487,7 +62815,7 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=between-doctype-public-and-system-identifiers-state><span class=secno>9.2.4.34 </span><dfn>Between DOCTYPE public and system identifiers state</dfn></h5>
+ </dl><h5 id=between-doctype-public-and-system-identifiers-state><span class=secno>9.2.4.52 </span><dfn>Between DOCTYPE public and system identifiers state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62496,7 +62824,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Stay in the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system identifiers state</a>.</dd>
+ <dd>Stay in the <a href=#between-doctype-public-and-system-identifiers-state>between DOCTYPE public and system identifiers
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <a href=#data-state>data
@@ -62522,7 +62851,7 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=after-doctype-system-keyword-state><span class=secno>9.2.4.35 </span><dfn>After DOCTYPE system keyword state</dfn></h5>
+ </dl><h5 id=after-doctype-system-keyword-state><span class=secno>9.2.4.53 </span><dfn>After DOCTYPE system keyword state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62543,7 +62872,7 @@
<dd><a href=#parse-error>Parse error</a>. Reconsume the current character in
the <a href=#before-doctype-system-identifier-state>before DOCTYPE system identifier state</a>.</dd>
- </dl><h5 id=before-doctype-system-identifier-state><span class=secno>9.2.4.36 </span><dfn>Before DOCTYPE system identifier state</dfn></h5>
+ </dl><h5 id=before-doctype-system-identifier-state><span class=secno>9.2.4.54 </span><dfn>Before DOCTYPE system identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62579,12 +62908,13 @@
<i>force-quirks flag</i> to <i>on</i>. Switch to the <a href=#bogus-doctype-state>bogus
DOCTYPE state</a>.</dd>
- </dl><h5 id=doctype-system-identifier-(double-quoted)-state><span class=secno>9.2.4.37 </span><dfn>DOCTYPE system identifier (double-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-system-identifier-(double-quoted)-state><span class=secno>9.2.4.55 </span><dfn>DOCTYPE system identifier (double-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
<dl class=switch><dt>U+0022 QUOTATION MARK (")</dt>
- <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier state</a>.</dd>
+ <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
@@ -62597,16 +62927,17 @@
Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
- token's system identifier. Stay in the <a href=#doctype-system-identifier-(double-quoted)-state>DOCTYPE system
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ DOCTYPE token's system identifier. Stay in the <a href=#doctype-system-identifier-(double-quoted)-state>DOCTYPE system
identifier (double-quoted) state</a>.</dd>
- </dl><h5 id=doctype-system-identifier-(single-quoted)-state><span class=secno>9.2.4.38 </span><dfn>DOCTYPE system identifier (single-quoted) state</dfn></h5>
+ </dl><h5 id=doctype-system-identifier-(single-quoted)-state><span class=secno>9.2.4.56 </span><dfn>DOCTYPE system identifier (single-quoted) state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
<dl class=switch><dt>U+0027 APOSTROPHE (')</dt>
- <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier state</a>.</dd>
+ <dd>Switch to the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><a href=#parse-error>Parse error</a>. Set the DOCTYPE token's
@@ -62619,11 +62950,11 @@
Reconsume the EOF character in the <a href=#data-state>data state</a>.</dd>
<dt>Anything else</dt>
- <dd>Append the <a href=#current-input-character>current input character</a> to the current DOCTYPE
- token's system identifier. Stay in the <a href=#doctype-system-identifier-(single-quoted)-state>DOCTYPE system
+ <dd>Append the <a href=#current-input-character>current input character</a> to the current
+ DOCTYPE token's system identifier. Stay in the <a href=#doctype-system-identifier-(single-quoted)-state>DOCTYPE system
identifier (single-quoted) state</a>.</dd>
- </dl><h5 id=after-doctype-system-identifier-state><span class=secno>9.2.4.39 </span><dfn>After DOCTYPE system identifier state</dfn></h5>
+ </dl><h5 id=after-doctype-system-identifier-state><span class=secno>9.2.4.57 </span><dfn>After DOCTYPE system identifier state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62632,7 +62963,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Stay in the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier state</a>.</dd>
+ <dd>Stay in the <a href=#after-doctype-system-identifier-state>after DOCTYPE system identifier
+ state</a>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <a href=#data-state>data
@@ -62648,7 +62980,7 @@
state</a>. (This does <em>not</em> set the DOCTYPE token's
<i>force-quirks flag</i> to <i>on</i>.)</dd>
- </dl><h5 id=bogus-doctype-state><span class=secno>9.2.4.40 </span><dfn>Bogus DOCTYPE state</dfn></h5>
+ </dl><h5 id=bogus-doctype-state><span class=secno>9.2.4.58 </span><dfn>Bogus DOCTYPE state</dfn></h5>
<p>Consume the <a href=#next-input-character>next input character</a>:</p>
@@ -62663,11 +62995,8 @@
<dt>Anything else</dt>
<dd>Stay in the <a href=#bogus-doctype-state>bogus DOCTYPE state</a>.</dd>
- </dl><h5 id=cdata-section-state><span class=secno>9.2.4.41 </span><dfn>CDATA section state</dfn></h5>
+ </dl><h5 id=cdata-section-state><span class=secno>9.2.4.59 </span><dfn>CDATA section state</dfn></h5>
- <p><i>(This can only happen if the <a href=#content-model-flag>content model
- flag</a> is set to the PCDATA state.)</i></p>
-
<p>Consume every character up to the next occurrence of the three
character sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE
BRACKET U+003E GREATER-THAN SIGN (<code title="">]]></code>), or the
@@ -62683,7 +63012,7 @@
- <h5 id=tokenizing-character-references><span class=secno>9.2.4.42 </span>Tokenizing character references</h5>
+ <h5 id=tokenizing-character-references><span class=secno>9.2.4.60 </span>Tokenizing character references</h5>
<p>This section defines how to <dfn id=consume-a-character-reference>consume a character
reference</dfn>. This definition is used when parsing character
@@ -63130,11 +63459,10 @@
<ol><li><p><a href=#insert-an-html-element>Insert an HTML element</a> for the token.</li>
<li><p>If the algorithm that was invoked is the <a href=#generic-raw-text-element-parsing-algorithm>generic raw
- text element parsing algorithm</a>, switch the tokenizer's
- <a href=#content-model-flag>content model flag</a> to the RAWTEXT state; otherwise the
- algorithm invoked was the <a href=#generic-rcdata-element-parsing-algorithm>generic RCDATA element parsing
- algorithm</a>, switch the tokenizer's <a href=#content-model-flag>content model
- flag</a> to the RCDATA state.</li>
+ text element parsing algorithm</a>, switch the tokenizer to the
+ <a href=#rawtext-state>RAWTEXT state</a>; otherwise the algorithm invoked
+ was the <a href=#generic-rcdata-element-parsing-algorithm>generic RCDATA element parsing algorithm</a>,
+ switch the tokenizer to the <a href=#rcdata-state>RCDATA state</a>.</li>
<li><p>Let the <a href=#original-insertion-mode>original insertion mode</a> be the current
<a href=#insertion-mode>insertion mode</a>.</p>
@@ -63648,8 +63976,8 @@
and push it onto the <a href=#stack-of-open-elements>stack of open
elements</a>.</li>
- <li><p>Switch the tokenizer's <a href=#content-model-flag>content model flag</a> to
- the RAWTEXT state.</li>
+ <li><p>Switch the tokenizer to the <a href=#script-data-state>script data
+ state</a>.</li>
<li><p>Let the <a href=#original-insertion-mode>original insertion mode</a> be the current
<a href=#insertion-mode>insertion mode</a>.</p>
@@ -64188,14 +64516,12 @@
<p><a href=#insert-an-html-element>Insert an HTML element</a> for the token.</p>
- <p>Switch the <a href=#content-model-flag>content model flag</a> to the PLAINTEXT
- state.</p>
+ <p>Switch the tokenizer to the <a href=#plaintext-state>PLAINTEXT state</a>.</p>
- <p class=note>Once a start tag with the tag name "plaintext"
- has been seen, that will be the last token ever seen other
- than character tokens (and the end-of-file token), because
- there is no way to switch the <a href=#content-model-flag>content model flag</a>
- out of the PLAINTEXT state.</p>
+ <p class=note>Once a start tag with the tag name "plaintext" has
+ been seen, that will be the last token ever seen other than
+ character tokens (and the end-of-file token), because there is no
+ way to switch out of the <a href=#plaintext-state>PLAINTEXT state</a>.</p>
</dd>
@@ -64791,8 +65117,8 @@
one. (Newlines at the start of <code><a href=#the-textarea-element>textarea</a></code> elements are
ignored as an authoring convenience.)</li>
- <li><p>Switch the tokenizer's <a href=#content-model-flag>content model flag</a> to
- the RCDATA state.</li>
+ <li><p>Switch the tokenizer to the <a href=#rcdata-state>RCDATA
+ state</a>.</li>
<li><p>Let the <a href=#original-insertion-mode>original insertion mode</a> be the
current <a href=#insertion-mode>insertion mode</a>.</p>
@@ -67154,42 +67480,38 @@
<ol><li>
- <p>Set the <a href=#html-parser>HTML parser</a>'s <a href=#tokenization>tokenization</a>
- stage's <a href=#content-model-flag>content model flag</a> according to the <var title="">context</var> element, as follows:</p>
+ <p>Set the state of the <a href=#html-parser>HTML parser</a>'s
+ <a href=#tokenization>tokenization</a> stage as follows:</p>
<dl class=switch><dt>If it is a <code><a href=#the-title-element-0>title</a></code> or <code><a href=#the-textarea-element>textarea</a></code>
element</dt>
- <dd>Set the <a href=#content-model-flag>content model flag</a> to
- the RCDATA state.</dd>
+ <dd>Switch the tokenizer to the <a href=#rcdata-state>RCDATA state</a>.</dd>
<dt>If it is a <code><a href=#the-style-element>style</a></code>, <code><a href=#script>script</a></code>,
<code><a href=#xmp>xmp</a></code>, <code><a href=#the-iframe-element>iframe</a></code>, <code><a href=#noembed>noembed</a></code>, or
<code><a href=#noframes>noframes</a></code> element</dt>
- <dd>Set the <a href=#content-model-flag>content model flag</a> to
- the RAWTEXT state.</dd>
+ <dd>Switch the tokenizer to the <a href=#rawtext-state>RAWTEXT state</a>.</dd>
<dt>If it is a <code><a href=#the-noscript-element>noscript</a></code> element</dt>
- <dd>If the <a href=#scripting-flag>scripting flag</a> is enabled, set the
- <a href=#content-model-flag>content model flag</a> to the RAWTEXT
- state. Otherwise, set the <a href=#content-model-flag>content model flag</a> to the
- PCDATA state.</dd>
+ <dd>If the <a href=#scripting-flag>scripting flag</a> is enabled, switch the
+ tokenizer to the <a href=#rawtext-state>RAWTEXT state</a>. Otherwise,
+ leave the tokenizer in the <a href=#data-state>data state</a>.</dd>
<dt>If it is a <code><a href=#plaintext>plaintext</a></code> element</dt>
- <dd>Set the <a href=#content-model-flag>content model flag</a> to
- PLAINTEXT.</dd>
+ <dd>Switch the tokenizer to the <a href=#plaintext-state>PLAINTEXT
+ state</a>.</dd>
<dt>Otherwise</dt>
- <dd>Leave the <a href=#content-model-flag>content model flag</a> in the PCDATA
- state.</dd>
+ <dd>Leave the tokenizer in the <a href=#data-state>data state</a>.</dd>
</dl></li>
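Editorial sketch (not part of the commit): the fragment-case switch in the hunk above now selects an initial tokenizer state directly instead of setting a content model flag. A minimal Python illustration follows, assuming the context element is identified by its lowercase local name; `TokenizerState` and `initial_fragment_state` are hypothetical names chosen for this example, not identifiers from the specification.

```python
from enum import Enum


class TokenizerState(Enum):
    """Only the four states the fragment-case switch can select."""
    DATA = "data state"
    RCDATA = "RCDATA state"
    RAWTEXT = "RAWTEXT state"
    PLAINTEXT = "PLAINTEXT state"


# Element groups as listed in the hunk above (revision 4177).
RCDATA_ELEMENTS = frozenset({"title", "textarea"})
RAWTEXT_ELEMENTS = frozenset(
    {"style", "script", "xmp", "iframe", "noembed", "noframes"}
)


def initial_fragment_state(context_name: str, scripting_enabled: bool) -> TokenizerState:
    """Return the tokenizer state to start in for a given context element.

    Assumption: context_name is already the lowercase local name.
    """
    if context_name in RCDATA_ELEMENTS:
        return TokenizerState.RCDATA
    if context_name in RAWTEXT_ELEMENTS:
        return TokenizerState.RAWTEXT
    if context_name == "noscript":
        # RAWTEXT only when the scripting flag is enabled.
        return TokenizerState.RAWTEXT if scripting_enabled else TokenizerState.DATA
    if context_name == "plaintext":
        return TokenizerState.PLAINTEXT
    # "Otherwise": leave the tokenizer in the data state.
    return TokenizerState.DATA
```

For example, `initial_fragment_state("noscript", scripting_enabled=False)` yields the data state, which corresponds to the old "PCDATA" flag value.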
Modified: source
===================================================================
--- source 2009-10-19 05:52:18 UTC (rev 4176)
+++ source 2009-10-19 11:00:31 UTC (rev 4177)
@@ -9981,9 +9981,9 @@
<p>If <var title="">type</var> is <em>not</em> now an <span>ASCII
case-insensitive</span> match for the string
"<code>text/html</code>", then act as if the tokenizer had emitted
- a start tag token with the tag name "pre", then set the <span>HTML
- parser</span>'s <span>tokenization</span> stage's <span>content
- model flag</span> to <i title="">PLAINTEXT</i>.</p>
+ a start tag token with the tag name "pre", then switch the
+ <span>HTML parser</span>'s tokenizer to the <span>PLAINTEXT
+ state</span>.</p>
<!--
http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E...%3Ciframe%3E%3C%2Fiframe%3E%3Cscript%3Eonload%20%3D%20function%20()%20%7B%20%0D%0A%20%20var%20d%20%3D%20document.getElementsByTagName('iframe')%5B0%5D.contentDocument%3B%0D%0A%20%20d.open('image%2Fsvg%2Bxml')%3B%0D%0A%20%20d.write(%22%3Cinput%20xmlns%3D'http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml'%20value%3D'(x)html'%2F%3E%22)%3B%0D%0A%20%20d.close()%3B%0D%0A%7D%3B%3C%2Fscript%3E
@@ -62932,9 +62932,9 @@
<code>Document</code> object</span>, mark it as being an <span
title="HTML documents">HTML document</span>, create an <span>HTML
parser</span>, associate it with the document, act as if the
- tokenizer had emitted a start tag token with the tag name "pre", set
- the <span>tokenization</span> stage's <span>content model
- flag</span> to <i title="">PLAINTEXT</i>, and begin to pass the stream of
+ tokenizer had emitted a start tag token with the tag name "pre",
+ switch the <span>HTML parser</span>'s tokenizer to the
+ <span>PLAINTEXT state</span>, and begin to pass the stream of
characters in the plain text document to that tokenizer.</p>
<p>The rules for how to convert the bytes of the plain text document
@@ -79210,18 +79210,13 @@
switches it to a new state (to consume the next character), or
repeats the same state (to consume the next character). Some states
have more complicated behavior and can consume several characters
- before switching to another state.</p>
+ before switching to another state. In some cases, the tokenizer
+ state is also changed by the tree construction stage.</p>
- <p>The exact behavior of certain states depends on a <dfn>content
- model flag</dfn> that is set after certain tokens are emitted. The
- flag has several states: <i title="">PCDATA</i>, <i
- title="">RCDATA</i>, <i title="">RAWTEXT</i>, and <i
- title="">PLAINTEXT</i>. Initially, it must be in the PCDATA
- state. In the RCDATA and RAWTEXT states, a further <dfn>escape
- flag</dfn> is used to control the behavior of the tokenizer. It is
- either true or false, and initially must be set to the false
- state. The <span>insertion mode</span> and the <span>stack of open
- elements</span> also affects tokenization.</p>
+ <p>The exact behavior of certain states depends on the
+ <span>insertion mode</span> and the <span>stack of open
+ elements</span>. Certain states also use a <dfn><var>temporary
+ buffer</var></dfn> to track progress.</p>
<p>The output of the tokenization step is a series of zero or more
of the following tokens: DOCTYPE, start tag, end tag, comment,
@@ -79240,8 +79235,8 @@
<p>When a token is emitted, it must immediately be handled by the
<span>tree construction</span> stage. The tree construction stage
- can affect the state of the <span>content model flag</span>, and can
- insert additional characters into the stream. (For example, the
+ can affect the state of the tokenization stage, and can insert
+ additional characters into the stream. (For example, the
<code>script</code> element can result in scripts executing and
using the <span>dynamic markup insertion</span> APIs to insert
characters into the stream being tokenized.)</p>
@@ -79251,15 +79246,18 @@
self-closing flag">acknowledged</dfn> when it is processed by the
tree construction stage, that is a <span>parse error</span>.</p>
- <p>When an end tag token is emitted, the <span>content model
- flag</span> must be switched to the PCDATA state.</p>
-
<p>When an end tag token is emitted with attributes, that is a
<span>parse error</span>.</p>
<p>When an end tag token is emitted with its <i>self-closing
flag</i> set, that is a <span>parse error</span>.</p>
+ <p>An <dfn>appropriate end tag token</dfn> is an end tag token whose
+ tag name matches the tag name of the last start tag to have been
+ emitted from this tokenizer, if any. If no start tag has been
+ emitted from this tokenizer, then no end tag token is
+ appropriate.</p>
+
<p>Before each step of the tokenizer, the user agent must first
check the <span>parser pause flag</span>. If it is true, then the
tokenizer must abort the processing of any nested invocations of the
@@ -79268,9 +79266,11 @@
<p>The tokenizer state machine consists of the states defined in the
following subsections.</p>
+
<!-- Order of the lists below is supposed to be non-error then
error, by unicode, then EOF, ending with "anything else" -->
+
<h5><dfn>Data state</dfn></h5>
<p>Consume the <span>next input character</span>:</p>
@@ -79278,196 +79278,172 @@
<dl class="switch">
<dt>U+0026 AMPERSAND (&)</dt>
- <dd>When the <span>content model flag</span> is set to one of the
- PCDATA or RCDATA states and the <span>escape flag</span> is
- false: switch to the <span>character reference in data
+ <dd>Switch to the <span>character reference in data
state</span>.</dd>
- <dd>Otherwise: treat it as per the "anything else" entry
- below.</dd>
- <dt>U+002D HYPHEN-MINUS (-)</dt>
- <dd>
+ <dt>U+003C LESS-THAN SIGN (<)</dt>
+ <dd>Switch to the <span>tag open state</span>.</dd>
- <p>If the <span>content model flag</span> is set to either the
- RCDATA state or the RAWTEXT state, and the <span>escape flag</span>
- is false, and there are at least three characters before this
- one in the input stream, and the last four characters in the
- input stream, including this one, are U+003C LESS-THAN SIGN,
- U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D
- HYPHEN-MINUS ("<!--"), then set the <span>escape flag</span>
- to true.</p>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>In any case, emit the input character as a character
- token. Stay in the <span>data state</span>.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <span>current input character</span> as a character
+ token. Stay in the <span>data state</span>.</dd>
- </dd>
+ </dl>
+
+ <h5><dfn>RCDATA state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+0026 AMPERSAND (&)</dt>
+ <dd>Switch to the <span>character reference in data
+ state</span>.</dd>
+
<dt>U+003C LESS-THAN SIGN (<)</dt>
- <dd>When the <span>content model flag</span> is set to the PCDATA
- state: switch to the <span>tag open state</span>.</dd>
- <dd>When the <span>content model flag</span> is set to either the
- RCDATA state or the RAWTEXT state, and the <span>escape flag</span>
- is false: switch to the <span>tag open state</span>.</dd>
- <dd>Otherwise: treat it as per the "anything else" entry
- below.</dd>
+ <dd>Switch to the <span>RCDATA less-than sign state</span>.</dd>
- <dt>U+003E GREATER-THAN SIGN (>)</dt>
- <dd>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>If the <span>content model flag</span> is set to either the
- RCDATA state or the RAWTEXT state, and the <span>escape
- flag</span> is true, and the last three characters in the input
- stream including this one are U+002D HYPHEN-MINUS, U+002D
- HYPHEN-MINUS, U+003E GREATER-THAN SIGN ("-->"), set the
- <span>escape flag</span> to false.</p> <!-- no need to check
- that there are enough characters, since you can only run into
- this if the flag is true in the first place, which requires four
- characters. -->
+ <dt>Anything else</dt>
+ <dd>Emit the <span>current input character</span> as a character
+ token. Stay in the <span>RCDATA state</span>.</dd>
- <p>In any case, emit the input character as a character
- token. Stay in the <span>data state</span>.</p>
+ </dl>
- </dd>
+ <h5><dfn>RAWTEXT state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+003C LESS-THAN SIGN (<)</dt>
+ <dd>Switch to the <span>RAWTEXT less-than sign state</span>.</dd>
+
<dt>EOF</dt>
<dd>Emit an end-of-file token.</dd>
<dt>Anything else</dt>
- <dd>Emit the input character as a character token. Stay in the
- <span>data state</span>.</dd>
+ <dd>Emit the <span>current input character</span> as a character
+ token. Stay in the <span>RAWTEXT state</span>.</dd>
</dl>
- <h5><dfn>Character reference in data state</dfn></h5>
+ <h5><dfn>Script data state</dfn></h5>
- <p><i>(This cannot happen if the <span>content model flag</span>
- is set to the RAWTEXT state.)</i></p>
+ <p>Consume the <span>next input character</span>:</p>
- <p>Attempt to <span>consume a character reference</span>, with no
- <span>additional allowed character</span>.</p>
+ <dl class="switch">
- <p>If nothing is returned, emit a U+0026 AMPERSAND character
- token.</p>
+ <dt>U+003C LESS-THAN SIGN (<)</dt>
+ <dd>Switch to the <span>script data less-than sign state</span>.</dd>
- <p>Otherwise, emit the character token that was returned.</p>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>Finally, switch to the <span>data state</span>.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <span>current input character</span> as a character
+ token. Stay in the <span>script data state</span>.</dd>
+ </dl>
- <h5><dfn>Tag open state</dfn></h5>
- <p>The behavior of this state depends on the <span>content model
- flag</span>.</p>
+ <h5><dfn>PLAINTEXT state</dfn></h5>
- <dl>
+ <p>Consume the <span>next input character</span>:</p>
- <dt>If the <span>content model flag</span> is set to the RCDATA
- or RAWTEXT states</dt>
+ <dl class="switch">
- <dd>
+ <dt>EOF</dt>
+ <dd>Emit an end-of-file token.</dd>
- <p>Consume the <span>next input character</span>. If it is a
- U+002F SOLIDUS character (/), switch to the <span>close tag open
- state</span>. Otherwise, emit a U+003C LESS-THAN SIGN character
- token and reconsume the <span>current input character</span> in the
- <span>data state</span>.</p>
+ <dt>Anything else</dt>
+ <dd>Emit the <span>current input character</span> as a character
+ token. Stay in the <span>PLAINTEXT state</span>.</dd>
- </dd>
+ </dl>
- <dt>If the <span>content model flag</span> is set to the PCDATA
- state</dt>
- <dd>
+ <h5><dfn>Character reference in data state</dfn></h5>
- <p>Consume the <span>next input character</span>:</p>
+ <p>Attempt to <span>consume a character reference</span>, with no
+ <span>additional allowed character</span>.</p>
- <dl class="switch">
+ <p>If nothing is returned, emit a U+0026 AMPERSAND character
+ token.</p>
- <dt>U+0021 EXCLAMATION MARK (!)</dt>
- <dd>Switch to the <span>markup declaration open state</span>.</dd>
+ <p>Otherwise, emit the character token that was returned.</p>
- <dt>U+002F SOLIDUS (/)</dt>
- <dd>Switch to the <span>close tag open state</span>.</dd>
+ <p>Finally, switch to the <span>data state</span>.</p>
- <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Create a new start tag token, set its tag name to the
- lowercase version of the input character (add 0x0020 to the
- character's code point), then switch to the <span>tag name
- state</span>. (Don't emit the token yet; further details will
- be filled in before it is emitted.)</dd>
- <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
- <dd>Create a new start tag token, set its tag name to the input
- character, then switch to the <span>tag name
- state</span>. (Don't emit the token yet; further details will
- be filled in before it is emitted.)</dd>
+ <h5><dfn>Tag open state</dfn></h5>
- <dt>U+003E GREATER-THAN SIGN (>)</dt>
- <dd><span>Parse error</span>. Emit a U+003C LESS-THAN SIGN
- character token and a U+003E GREATER-THAN SIGN character
- token. Switch to the <span>data state</span>.</dd>
+ <p>Consume the <span>next input character</span>:</p>
- <dt>U+003F QUESTION MARK (?)</dt>
- <dd><span>Parse error</span>. Switch to the <span>bogus
- comment state</span>.</dd>
+ <dl class="switch">
- <dt>Anything else</dt>
- <dd><span>Parse error</span>. Emit a U+003C LESS-THAN SIGN
- character token and reconsume the <span>current input character</span> in the
- <span>data state</span>.</dd>
+ <dt>U+0021 EXCLAMATION MARK (!)</dt>
+ <dd>Switch to the <span>markup declaration open state</span>.</dd>
- </dl>
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>Switch to the <span>close tag open state</span>.</dd>
- </dd>
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new start tag token, set its tag name to the
+ lowercase version of the <span>current input character</span> (add 0x0020 to the
+ character's code point), then switch to the <span>tag name
+ state</span>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
- </dl>
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new start tag token, set its tag name to the
+ <span>current input character</span>, then switch to the <span>tag
+ name state</span>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd><span>Parse error</span>. Emit a U+003C LESS-THAN SIGN
+ character token and a U+003E GREATER-THAN SIGN character
+ token. Switch to the <span>data state</span>.</dd>
- <h5><dfn>Close tag open state</dfn></h5>
+ <dt>U+003F QUESTION MARK (?)</dt>
+ <dd><span>Parse error</span>. Switch to the <span>bogus
+ comment state</span>.</dd>
- <p>If the <span>content model flag</span> is set to the RCDATA or
- RAWTEXT states but no start tag token has ever been emitted by this
- instance of the tokenizer (<span>fragment case</span>), or, if the
- <span>content model flag</span> is set to the RCDATA or RAWTEXT states
- and the next few characters do not match the tag name of the last
- start tag token emitted (compared in an <span>ASCII
- case-insensitive</span> manner), or if they do but they are not
- immediately followed by one of the following characters:</p>
+ <dt>Anything else</dt>
+ <dd><span>Parse error</span>. Emit a U+003C LESS-THAN SIGN
+ character token and reconsume the <span>current input
+ character</span> in the <span>data state</span>.</dd>
- <ul class="brief">
- <li>U+0009 CHARACTER TABULATION</li>
- <li>U+000A LINE FEED (LF)</li>
- <li>U+000C FORM FEED (FF)</li>
- <!--<li>U+000D CARRIAGE RETURN (CR)</li>-->
- <li>U+0020 SPACE</li>
- <li>U+003E GREATER-THAN SIGN (>)</li>
- <li>U+002F SOLIDUS (/)</li>
- <li>EOF</li>
- </ul>
+ </dl>
- <p>...then emit a U+003C LESS-THAN SIGN character token, a U+002F
- SOLIDUS character token, and switch to the <span>data state</span>
- to process the <span>next input character</span>.</p>
- <p>Otherwise, if the <span>content model flag</span> is set to the
- PCDATA state, or if the next few characters <em>do</em> match that tag
- name, consume the <span>next input character</span>:</p>
+ <h5><dfn>Close tag open state</dfn></h5>
+ <p>Consume the <span>next input character</span>:</p>
+
<dl class="switch">
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Create a new end tag token, set its tag name to the lowercase
- version of the input character (add 0x0020 to the character's
- code point), then switch to the <span>tag name
+ version of the <span>current input character</span> (add 0x0020 to
+ the character's code point), then switch to the <span>tag name
state</span>. (Don't emit the token yet; further details will be
filled in before it is emitted.)</dd>
<dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
- <dd>Create a new end tag token, set its tag name to the input
- character, then switch to the <span>tag name state</span>. (Don't
- emit the token yet; further details will be filled in before it
- is emitted.)</dd>
+ <dd>Create a new end tag token, set its tag name to the
+ <span>current input character</span>, then switch to the <span>tag
+ name state</span>. (Don't emit the token yet; further details will
+ be filled in before it is emitted.)</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><span>Parse error</span>. Switch to the <span>data
@@ -79506,21 +79482,436 @@
state</span>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the <span>current input character</span>
- (add 0x0020 to the character's code point) to the current tag
- token's tag name. Stay in the <span>tag name state</span>.</dd>
+ <dd>Append the lowercase version of the <span>current input
+ character</span> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Stay in the <span>tag name
+ state</span>.</dd>
<dt>EOF</dt>
<dd><span>Parse error</span>. Reconsume the EOF character in the
<span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current tag token's
- tag name. Stay in the <span>tag name state</span>.</dd>
+ <dd>Append the <span>current input character</span> to the current
+ tag token's tag name. Stay in the <span>tag name state</span>.</dd>
</dl>
+ <h5><dfn>RCDATA less-than sign state</dfn></h5>
+ <!-- identical to the RAWTEXT less-than sign state, except s/RAWTEXT/RCDATA/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var>temporary buffer</var> to the empty string. Switch
+ to the <span>RCDATA end tag open state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <span>current input character</span> in the <span>RCDATA
+ state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>RCDATA end tag open state</dfn></h5>
+ <!-- identical to the RAWTEXT (and Script data) end tag open state, except s/RAWTEXT/RCDATA/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <span>current input character</span> (add
+ 0x0020 to the character's code point). Append the <span>current
+ input character</span> to the <var>temporary buffer</var>. Finally,
+ switch to the <span>RCDATA end tag name state</span>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <span>current input character</span>. Append the <span>current
+ input character</span> to the <var>temporary buffer</var>. Finally,
+ switch to the <span>RCDATA end tag name state</span>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <span>current input
+ character</span> in the <span>RCDATA state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>RCDATA end tag name state</dfn></h5>
+ <!-- identical to the RAWTEXT (and Script data) end tag name state, except s/RAWTEXT/RCDATA/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then switch to the <span>before attribute name
+ state</span>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then switch to the <span>self-closing start tag
+ state</span>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then emit the current tag token and switch to the
+ <span>data state</span>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <span>current input
+ character</span> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <span>current input
+ character</span> to the <var>temporary buffer</var>. Stay in the
+ <span>RCDATA end tag name state</span>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <span>current input character</span> to the current
+ tag token's tag name. Append the <span>current input
+ character</span> to the <var>temporary buffer</var>. Stay in the
+ <span>RCDATA end tag name state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var>temporary buffer</var> (in the order they were added to
+ the buffer), and reconsume the <span>current input character</span>
+ in the <span>RCDATA state</span>.</dd>
+
+ </dl>
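
Illustrative sketch, not part of the committed text: the three RCDATA states above can be read as a small state machine built around the temporary buffer. The function name `tokenize_rcdata`, the token tuples, and the state labels are hypothetical. The sketch assumes that an "appropriate end tag token" is one whose tag name equals the name of the last start tag emitted, and that the RCDATA state itself only reacts to "<" (character references are not modelled). It stops as soon as the tokenizer would leave the RCDATA family of states.

```python
WHITESPACE = "\t\n\f "


def tokenize_rcdata(text, last_start_tag_name):
    """Return (tokens, state) after running the RCDATA states over text."""
    tokens = []            # emitted tokens as (kind, value) pairs
    state = "rcdata"
    temp_buffer = ""       # raw characters of a tentative end tag name
    tag_name = ""          # lowercased name of the tentative end tag
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1             # "consume the next input character"
        is_letter = ch.isascii() and ch.isalpha()
        if state == "rcdata":
            if ch == "<":
                state = "lt"
            else:
                tokens.append(("char", ch))
        elif state == "lt":
            if ch == "/":
                temp_buffer = ""
                state = "end_tag_open"
            else:
                tokens.append(("char", "<"))
                i -= 1     # reconsume in the RCDATA state
                state = "rcdata"
        elif state == "end_tag_open":
            if is_letter:
                tag_name = ch.lower()
                temp_buffer += ch
                state = "end_tag_name"
            else:
                tokens += [("char", "<"), ("char", "/")]
                i -= 1
                state = "rcdata"
        else:  # end_tag_name
            appropriate = tag_name == last_start_tag_name
            if appropriate and ch in WHITESPACE:
                return tokens, "before attribute name"
            if appropriate and ch == "/":
                return tokens, "self-closing start tag"
            if appropriate and ch == ">":
                tokens.append(("end_tag", tag_name))
                return tokens, "data"
            if is_letter:
                tag_name += ch.lower()
                temp_buffer += ch
            else:
                # "Anything else": the tentative end tag was only text.
                tokens += [("char", "<"), ("char", "/")]
                tokens += [("char", c) for c in temp_buffer]
                i -= 1
                state = "rcdata"
    # At end of input, a pending "<", "</" or "</name" is flushed as text,
    # mirroring the "anything else" entries.
    if state == "lt":
        tokens.append(("char", "<"))
    elif state == "end_tag_open":
        tokens += [("char", "<"), ("char", "/")]
    elif state == "end_tag_name":
        tokens += [("char", "<"), ("char", "/")]
        tokens += [("char", c) for c in temp_buffer]
    return tokens, "rcdata"
```

For example, inside a `title` element the input `a</b</title>` yields character tokens for `a`, `<`, `/`, `b` followed by an end tag token for `title`: the temporary buffer lets `</b` be re-emitted verbatim once it turns out not to be the appropriate end tag.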
+
+
+ <h5><dfn>RAWTEXT less-than sign state</dfn></h5>
+ <!-- identical to the RCDATA less-than sign state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var>temporary buffer</var> to the empty string. Switch
+ to the <span>RAWTEXT end tag open state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <span>current input character</span> in the <span>RAWTEXT
+ state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>RAWTEXT end tag open state</dfn></h5>
+ <!-- identical to the RCDATA (and Script data) end tag open state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <span>current input character</span> (add
+ 0x0020 to the character's code point). Append the <span>current
+ input character</span> to the <var>temporary buffer</var>. Finally,
+ switch to the <span>RAWTEXT end tag name state</span>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <span>current input character</span>. Append the <span>current
+ input character</span> to the <var>temporary buffer</var>. Finally,
+ switch to the <span>RAWTEXT end tag name state</span>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <span>current input
+ character</span> in the <span>RAWTEXT state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>RAWTEXT end tag name state</dfn></h5>
+ <!-- identical to the RCDATA (and Script data) end tag name state, except s/RCDATA/RAWTEXT/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then switch to the <span>before attribute name
+ state</span>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then switch to the <span>self-closing start tag
+ state</span>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then emit the current tag token and switch to the
+ <span>data state</span>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <span>current input
+ character</span> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <span>current input
+ character</span> to the <var>temporary buffer</var>. Stay in the
+ <span>RAWTEXT end tag name state</span>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <span>current input character</span> to the current
+ tag token's tag name. Append the <span>current input
+ character</span> to the <var>temporary buffer</var>. Stay in the
+ <span>RAWTEXT end tag name state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var>temporary buffer</var> (in the order they were added to
+ the buffer), and reconsume the <span>current input character</span>
+ in the <span>RAWTEXT state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data less-than sign state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>Set the <var>temporary buffer</var> to the empty string. Switch
+ to the <span>script data end tag open state</span>.</dd>
+
+ <dt>U+0021 EXCLAMATION MARK (!)</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and a U+0021
+ EXCLAMATION MARK character token. Switch to the <span>script data
+ escape start state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token and reconsume the
+ <span>current input character</span> in the <span>script data
+ state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data end tag open state</dfn></h5>
+ <!-- identical to the RCDATA (and RAWTEXT) end tag open state, except s/RCDATA/Script data/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ lowercase version of the <span>current input character</span> (add
+ 0x0020 to the character's code point). Append the <span>current
+ input character</span> to the <var>temporary buffer</var>. Finally,
+ switch to the <span>script data end tag name state</span>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Create a new end tag token, and set its tag name to the
+ <span>current input character</span>. Append the <span>current
+ input character</span> to the <var>temporary buffer</var>. Finally,
+ switch to the <span>script data end tag name state</span>. (Don't emit
+ the token yet; further details will be filled in before it is
+ emitted.)</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, and reconsume the <span>current input
+ character</span> in the <span>script data state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data end tag name state</dfn></h5>
+ <!-- identical to the RCDATA (and RAWTEXT) end tag name state, except s/RCDATA/Script data/g -->
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+0009 CHARACTER TABULATION</dt>
+ <dt>U+000A LINE FEED (LF)</dt>
+ <dt>U+000C FORM FEED (FF)</dt>
+ <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+ <dt>U+0020 SPACE</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then switch to the <span>before attribute name
+ state</span>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+002F SOLIDUS (/)</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then switch to the <span>self-closing start tag
+ state</span>. Otherwise, treat it as per the "anything else" entry
+ below.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>If the current end tag token is an <span>appropriate end tag
+ token</span>, then emit the current tag token and switch to the
+ <span>data state</span>. Otherwise, treat it as per the "anything
+ else" entry below.</dd>
+
+ <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
+ <dd>Append the lowercase version of the <span>current input
+ character</span> (add 0x0020 to the character's code point) to the
+ current tag token's tag name. Append the <span>current input
+ character</span> to the <var>temporary buffer</var>. Stay in the
+ <span>script data end tag name state</span>.</dd>
+
+ <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
+ <dd>Append the <span>current input character</span> to the current
+ tag token's tag name. Append the <span>current input
+ character</span> to the <var>temporary buffer</var>. Stay in the
+ <span>script data end tag name state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS
+ character token, a character token for each of the characters in
+ the <var>temporary buffer</var> (in the order they were added to
+ the buffer), and reconsume the <span>current input character</span>
+ in the <span>script data state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data escape start state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <span>script data escape start dash state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Reconsume the <span>current input character</span> in the
+ <span>script data state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data escape start dash state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <span>script data escaped dash dash state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Reconsume the <span>current input character</span> in the
+ <span>script data state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data escaped state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <span>script data escaped dash state</span>.</dd>
+
+ <dt>EOF</dt>
+ <dd><span>Parse error</span>. Reconsume the EOF character in the
+ <span>data state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Stay in
+ the <span>script data escaped state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data escaped dash state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Switch to the
+ <span>script data escaped dash dash state</span>.</dd>
+
+ <dt>EOF</dt>
+ <dd><span>Parse error</span>. Reconsume the EOF character in the
+ <span>data state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Switch
+ to the <span>script data escaped state</span>.</dd>
+
+ </dl>
+
+
+ <h5><dfn>Script data escaped dash dash state</dfn></h5>
+
+ <p>Consume the <span>next input character</span>:</p>
+
+ <dl class="switch">
+
+ <dt>U+002D HYPHEN-MINUS (-)</dt>
+ <dd>Emit a U+002D HYPHEN-MINUS character token. Stay in the
+ <span>script data escaped dash dash state</span>.</dd>
+
+ <dt>U+003E GREATER-THAN SIGN (>)</dt>
+ <dd>Emit a U+003E GREATER-THAN SIGN character token. Switch to the
+ <span>script data state</span>.</dd>
+
+ <dt>EOF</dt>
+ <dd><span>Parse error</span>. Reconsume the EOF character in the
+ <span>data state</span>.</dd>
+
+ <dt>Anything else</dt>
+ <dd>Emit the current input character as a character token. Switch
+ to the <span>script data escaped state</span>.</dd>
+
+ </dl>
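
Illustrative sketch, not part of the committed text: the script data escape states above track whether the tokenizer is inside a `<!-- ... -->` run within script data. The function name `run_script_data` and the short state labels are hypothetical. The sketch assumes the script data state switches to the less-than sign state on "<" and emits any other character, does not model end tags (it stops when "</" is seen), and does not model EOF handling.

```python
def run_script_data(text):
    """Return (emitted_text, state) after consuming text in script data."""
    state = "script data"
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1  # consume the next input character
        if state == "script data":
            if ch == "<":
                state = "less-than sign"
            else:
                out.append(ch)
        elif state == "less-than sign":
            if ch == "/":
                # End tag handling is outside this sketch.
                return "".join(out), "end tag open"
            if ch == "!":
                out += ["<", "!"]
                state = "escape start"
            else:
                out.append("<")
                i -= 1  # reconsume in the script data state
                state = "script data"
        elif state == "escape start":
            if ch == "-":
                out.append("-")
                state = "escape start dash"
            else:
                i -= 1
                state = "script data"
        elif state == "escape start dash":
            if ch == "-":
                out.append("-")
                state = "escaped dash dash"
            else:
                i -= 1
                state = "script data"
        elif state == "escaped":
            out.append(ch)
            if ch == "-":
                state = "escaped dash"
        elif state == "escaped dash":
            out.append(ch)
            state = "escaped dash dash" if ch == "-" else "escaped"
        else:  # escaped dash dash
            out.append(ch)
            if ch == ">":
                state = "script data"
            elif ch != "-":
                state = "escaped"
    return "".join(out), state
```

Every character is still emitted as a character token in these states; only the state changes. Feeding `<!--x-->y` through the sketch returns the text unchanged and ends back in the script data state, while `a<!--b` ends in the escaped state.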
+
+
<h5><dfn>Before attribute name state</dfn></h5>
<p>Consume the <span>next input character</span>:</p>
@@ -79592,9 +79983,9 @@
state</span>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the <span>current input character</span>
- (add 0x0020 to the character's code point) to the current
- attribute's name. Stay in the <span>attribute name
+ <dd>Append the lowercase version of the <span>current input
+ character</span> (add 0x0020 to the character's code point) to the
+ current attribute's name. Stay in the <span>attribute name
state</span>.</dd>
<dt>U+0022 QUOTATION MARK (")</dt>
@@ -79608,8 +79999,9 @@
<span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current attribute's
- name. Stay in the <span>attribute name state</span>.</dd>
+ <dd>Append the <span>current input character</span> to the current
+ attribute's name. Stay in the <span>attribute name
+ state</span>.</dd>
</dl>
@@ -79647,10 +80039,10 @@
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Start a new attribute in the current tag token. Set that
- attribute's name to the lowercase version of the <span>current input character</span>
- (add 0x0020 to the character's code point), and its value to
- the empty string. Switch to the <span>attribute name
- state</span>.</dd>
+ attribute's name to the lowercase version of the <span>current
+ input character</span> (add 0x0020 to the character's code point),
+ and its value to the empty string. Switch to the <span>attribute
+ name state</span>.</dd>
<dt>U+0022 QUOTATION MARK (")</dt>
<dt>U+0027 APOSTROPHE (')</dt>
@@ -79664,8 +80056,8 @@
<dt>Anything else</dt>
<dd>Start a new attribute in the current tag token. Set that
- attribute's name to the <span>current input character</span>, and its value to
- the empty string. Switch to the <span>attribute name
+ attribute's name to the <span>current input character</span>, and
+ its value to the empty string. Switch to the <span>attribute name
state</span>.</dd>
</dl>
@@ -79689,7 +80081,7 @@
<dt>U+0026 AMPERSAND (&)</dt>
<dd>Switch to the <span>attribute value (unquoted) state</span>
- and reconsume this input character.</dd>
+ and reconsume the <span>current input character</span>.</dd>
<dt>U+0027 APOSTROPHE (')</dt>
<dd>Switch to the <span>attribute value (single-quoted) state</span>.</dd>
@@ -79736,9 +80128,9 @@
<span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current attribute's
- value. Stay in the <span>attribute value (double-quoted)
- state</span>.</dd>
+ <dd>Append the <span>current input character</span> to the current
+ attribute's value. Stay in the <span>attribute value
+ (double-quoted) state</span>.</dd>
</dl>
@@ -79763,9 +80155,9 @@
<span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current attribute's
- value. Stay in the <span>attribute value (single-quoted)
- state</span>.</dd>
+ <dd>Append the <span>current input character</span> to the current
+ attribute's value. Stay in the <span>attribute value
+ (single-quoted) state</span>.</dd>
</dl>
@@ -79805,8 +80197,8 @@
<span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current attribute's
- value. Stay in the <span>attribute value (unquoted)
+ <dd>Append the <span>current input character</span> to the current
+ attribute's value. Stay in the <span>attribute value (unquoted)
state</span>.</dd>
</dl>
@@ -79881,9 +80273,6 @@
<h5><dfn>Bogus comment state</dfn></h5>
- <p><i>(This can only happen if the <span>content model
- flag</span> is set to the PCDATA state.)</i></p>
-
<p>Consume every character up to and including the first U+003E
GREATER-THAN SIGN character (>) or the end of the file (EOF),
whichever comes first. Emit a comment token whose data is the
@@ -79902,9 +80291,6 @@
<h5><dfn>Markup declaration open state</dfn></h5>
- <p><i>(This can only happen if the <span>content model
- flag</span> is set to the PCDATA state.)</i></p>
-
<p>If the next two characters are both U+002D HYPHEN-MINUS (-)
characters, consume those two characters, create a comment token
whose data is the empty string, and switch to the <span>comment
@@ -79948,8 +80334,8 @@
the EOF character in the <span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's
- data. Switch to the <span>comment state</span>.</dd>
+ <dd>Append the <span>current input character</span> to the comment
+ token's data. Switch to the <span>comment state</span>.</dd>
</dl>
@@ -79973,9 +80359,9 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append a U+002D HYPHEN-MINUS character (-) and the input
- character to the comment token's data. Switch to the
- <span>comment state</span>.</dd>
+ <dd>Append a U+002D HYPHEN-MINUS character (-) and the
+ <span>current input character</span> to the comment token's
+ data. Switch to the <span>comment state</span>.</dd>
</dl>
@@ -79995,8 +80381,8 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's data. Stay
- in the <span>comment state</span>.</dd>
+ <dd>Append the <span>current input character</span> to the comment
+ token's data. Stay in the <span>comment state</span>.</dd>
</dl>
@@ -80016,9 +80402,9 @@
in comment end state -->
<dt>Anything else</dt>
- <dd>Append a U+002D HYPHEN-MINUS character (-) and the input
- character to the comment token's data. Switch to the
- <span>comment state</span>.</dd>
+ <dd>Append a U+002D HYPHEN-MINUS character (-) and the
+ <span>current input character</span> to the comment token's
+ data. Switch to the <span>comment state</span>.</dd>
</dl>
@@ -80039,8 +80425,9 @@
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
<dd><span>Parse error</span>. Append two U+002D HYPHEN-MINUS (-)
- characters and the input character to the comment token's
- data. Switch to the <span>comment end space state</span>.</dd>
+ characters and the <span>current input character</span> to the
+ comment token's data. Switch to the <span>comment end space
+ state</span>.</dd>
<dt>U+0021 EXCLAMATION MARK (!)</dt>
<dd><span>Parse error</span>. Switch to the <span>comment end bang
@@ -80061,8 +80448,9 @@
<dt>Anything else</dt>
<dd><span>Parse error</span>. Append two U+002D HYPHEN-MINUS (-)
- characters and the input character to the comment token's
- data. Switch to the <span>comment state</span>.</dd>
+ characters and the <span>current input character</span> to the
+ comment token's data. Switch to the <span>comment
+ state</span>.</dd>
</dl>
@@ -80089,9 +80477,9 @@
<dt>Anything else</dt>
<dd>Append two U+002D HYPHEN-MINUS (-) characters, a U+0021
- EXCLAMATION MARK character (!), and the input character to the
- comment token's data. Switch to the <span>comment
- state</span>.</dd>
+ EXCLAMATION MARK character (!), and the <span>current input
+ character</span> to the comment token's data. Switch to the
+ <span>comment state</span>.</dd>
</dl>
@@ -80107,7 +80495,7 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Append the input character to the comment token's data. Stay in
+ <dd>Append the <span>current input character</span> to the comment token's data. Stay in
the <span>comment end space state</span>.</dd>
<dt>U+002D HYPHEN-MINUS (-)</dt>
@@ -80123,7 +80511,7 @@
comment in comment end state -->
<dt>Anything else</dt>
- <dd>Append the input character to the comment token's data. Switch
+ <dd>Append the <span>current input character</span> to the comment token's data. Switch
to the <span>comment state</span>.</dd>
</dl>
@@ -80169,7 +80557,7 @@
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
<dd>Create a new DOCTYPE token. Set the token's name to the
- lowercase version of the input character (add 0x0020 to the
+ lowercase version of the <span>current input character</span> (add 0x0020 to the
character's code point). Switch to the <span>DOCTYPE name
state</span>.</dd>
@@ -80209,9 +80597,10 @@
state</span>.</dd>
<dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
- <dd>Append the lowercase version of the input character (add 0x0020
- to the character's code point) to the current DOCTYPE token's
- name. Stay in the <span>DOCTYPE name state</span>.</dd>
+ <dd>Append the lowercase version of the <span>current input
+ character</span> (add 0x0020 to the character's code point) to the
+ current DOCTYPE token's name. Stay in the <span>DOCTYPE name
+ state</span>.</dd>
<dt>EOF</dt>
<dd><span>Parse error</span>. Set the DOCTYPE token's
@@ -80219,8 +80608,9 @@
Reconsume the EOF character in the <span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current DOCTYPE
- token's name. Stay in the <span>DOCTYPE name state</span>.</dd>
+ <dd>Append the <span>current input character</span> to the current
+ DOCTYPE token's name. Stay in the <span>DOCTYPE name
+ state</span>.</dd>
</dl>
@@ -80402,7 +80792,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Switch to the <span>between DOCTYPE public and system identifiers state</span>.</dd>
+ <dd>Switch to the <span>between DOCTYPE public and system
+ identifiers state</span>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <span>data
@@ -80442,7 +80833,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Stay in the <span>between DOCTYPE public and system identifiers state</span>.</dd>
+ <dd>Stay in the <span>between DOCTYPE public and system identifiers
+ state</span>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <span>data
@@ -80545,7 +80937,8 @@
<dl class="switch">
<dt>U+0022 QUOTATION MARK (")</dt>
- <dd>Switch to the <span>after DOCTYPE system identifier state</span>.</dd>
+ <dd>Switch to the <span>after DOCTYPE system identifier
+ state</span>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><span>Parse error</span>. Set the DOCTYPE token's
@@ -80558,8 +80951,8 @@
Reconsume the EOF character in the <span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current DOCTYPE
- token's system identifier. Stay in the <span>DOCTYPE system
+ <dd>Append the <span>current input character</span> to the current
+ DOCTYPE token's system identifier. Stay in the <span>DOCTYPE system
identifier (double-quoted) state</span>.</dd>
</dl>
@@ -80572,7 +80965,8 @@
<dl class="switch">
<dt>U+0027 APOSTROPHE (')</dt>
- <dd>Switch to the <span>after DOCTYPE system identifier state</span>.</dd>
+ <dd>Switch to the <span>after DOCTYPE system identifier
+ state</span>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd><span>Parse error</span>. Set the DOCTYPE token's
@@ -80585,8 +80979,8 @@
Reconsume the EOF character in the <span>data state</span>.</dd>
<dt>Anything else</dt>
- <dd>Append the <span>current input character</span> to the current DOCTYPE
- token's system identifier. Stay in the <span>DOCTYPE system
+ <dd>Append the <span>current input character</span> to the current
+ DOCTYPE token's system identifier. Stay in the <span>DOCTYPE system
identifier (single-quoted) state</span>.</dd>
</dl>
@@ -80603,7 +80997,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd>Stay in the <span>after DOCTYPE system identifier state</span>.</dd>
+ <dd>Stay in the <span>after DOCTYPE system identifier
+ state</span>.</dd>
<dt>U+003E GREATER-THAN SIGN (>)</dt>
<dd>Emit the current DOCTYPE token. Switch to the <span>data
@@ -80644,9 +81039,6 @@
<h5><dfn>CDATA section state</dfn></h5>
- <p><i>(This can only happen if the <span>content model
- flag</span> is set to the PCDATA state.)</i></p>
-
<p>Consume every character up to the next occurrence of the three
character sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE
BRACKET U+003E GREATER-THAN SIGN (<code title="">]]></code>), or the
@@ -81162,11 +81554,10 @@
<li><p><span>Insert an HTML element</span> for the token.</p></li>
<li><p>If the algorithm that was invoked is the <span>generic raw
- text element parsing algorithm</span>, switch the tokenizer's
- <span>content model flag</span> to the RAWTEXT state; otherwise the
- algorithm invoked was the <span>generic RCDATA element parsing
- algorithm</span>, switch the tokenizer's <span>content model
- flag</span> to the RCDATA state.</p></li>
+ text element parsing algorithm</span>, switch the tokenizer to the
+ <span>RAWTEXT state</span>; otherwise the algorithm invoked
+ was the <span>generic RCDATA element parsing algorithm</span>,
+ switch the tokenizer to the <span>RCDATA state</span>.</p></li>
<li><p>Let the <span>original insertion mode</span> be the current
<span>insertion mode</span>.</p>
@@ -81744,8 +82135,8 @@
and push it onto the <span>stack of open
elements</span>.</p></li>
- <li><p>Switch the tokenizer's <span>content model flag</span> to
- the RAWTEXT state.</p></li>
+ <li><p>Switch the tokenizer to the <span>script data
+ state</span>.</p></li>
<li><p>Let the <span>original insertion mode</span> be the current
<span>insertion mode</span>.</p>
@@ -82328,14 +82719,12 @@
<p><span>Insert an HTML element</span> for the token.</p>
- <p>Switch the <span>content model flag</span> to the PLAINTEXT
- state.</p>
+ <p>Switch the tokenizer to the <span>PLAINTEXT state</span>.</p>
- <p class="note">Once a start tag with the tag name "plaintext"
- has been seen, that will be the last token ever seen other
- than character tokens (and the end-of-file token), because
- there is no way to switch the <span>content model flag</span>
- out of the PLAINTEXT state.</p>
+ <p class="note">Once a start tag with the tag name "plaintext" has
+ been seen, that will be the last token ever seen other than
+ character tokens (and the end-of-file token), because there is no
+ way to switch out of the <span>PLAINTEXT state</span>.</p>
</dd>
@@ -82990,8 +83379,8 @@
one. (Newlines at the start of <code>textarea</code> elements are
ignored as an authoring convenience.)</p></li>
- <li><p>Switch the tokenizer's <span>content model flag</span> to
- the RCDATA state.</p></li>
+ <li><p>Switch the tokenizer to the <span>RCDATA
+ state</span>.</p></li>
<li><p>Let the <span>original insertion mode</span> be the
current <span>insertion mode</span>.</p>
@@ -85633,45 +86022,40 @@
<li>
- <p>Set the <span>HTML parser</span>'s <span>tokenization</span>
- stage's <span>content model flag</span> according to the <var
- title="">context</var> element, as follows:</p>
+ <p>Set the state of the <span>HTML parser</span>'s
+ <span>tokenization</span> stage as follows:</p>
<dl class="switch">
<dt>If it is a <code>title</code> or <code>textarea</code>
element</dt>
- <dd>Set the <span>content model flag</span> to
- the RCDATA state.</dd>
+ <dd>Switch the tokenizer to the <span>RCDATA state</span>.</dd>
<dt>If it is a <code>style</code>, <code>script</code>,
<code>xmp</code>, <code>iframe</code>, <code>noembed</code>, or
<code>noframes</code> element</dt>
- <dd>Set the <span>content model flag</span> to
- the RAWTEXT state.</dd>
+ <dd>Switch the tokenizer to the <span>RAWTEXT state</span>.</dd>
<dt>If it is a <code>noscript</code> element</dt>
- <dd>If the <span>scripting flag</span> is enabled, set the
- <span>content model flag</span> to the RAWTEXT
- state. Otherwise, set the <span>content model flag</span> to the
- PCDATA state.</dd>
+ <dd>If the <span>scripting flag</span> is enabled, switch the
+ tokenizer to the <span>RAWTEXT state</span>. Otherwise,
+ leave the tokenizer in the <span>data state</span>.</dd>
<dt>If it is a <code>plaintext</code> element</dt>
- <dd>Set the <span>content model flag</span> to
- PLAINTEXT.</dd>
+ <dd>Switch the tokenizer to the <span>PLAINTEXT
+ state</span>.</dd>
<dt>Otherwise</dt>
- <dd>Leave the <span>content model flag</span> in the PCDATA
- state.</dd>
+ <dd>Leave the tokenizer in the <span>data state</span>.</dd>
</dl>
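
Illustrative sketch, not part of the committed text: with the content model flag gone, the fragment case reduces to choosing an initial tokenizer state from the context element. The function name `initial_tokenizer_state` and the string labels are hypothetical. The mapping follows the added lines of the last hunk.

```python
RCDATA_ELEMENTS = {"title", "textarea"}
RAWTEXT_ELEMENTS = {"style", "script", "xmp", "iframe", "noembed", "noframes"}


def initial_tokenizer_state(context_element, scripting_enabled):
    """Pick the tokenizer's starting state for a given context element."""
    if context_element in RCDATA_ELEMENTS:
        return "RCDATA state"
    if context_element in RAWTEXT_ELEMENTS:
        return "RAWTEXT state"
    if context_element == "noscript":
        # Raw text only when scripting is enabled; otherwise parsed normally.
        return "RAWTEXT state" if scripting_enabled else "data state"
    if context_element == "plaintext":
        return "PLAINTEXT state"
    return "data state"
```

Note that in this hunk `script` is listed with the RAWTEXT elements for the fragment case, whereas the tree construction hunk earlier switches to the script data state when a `script` start tag is processed.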