[html5] r1303 - /

Sun Mar 2 04:18:29 PST 2008

Author: ianh
Date: 2008-03-02 04:18:28 -0800 (Sun, 02 Mar 2008)
New Revision: 1303

Modified:
   index
   source
Log:
[act] (2) Ban attribute names containing single quotes and double quotes, ban unquoted attribute values containing single quotes and double quotes, require spaces between attributes.
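
Illustration (not part of the commit): a minimal Python sketch of the new "after attribute value (quoted)" state and of the character ban on unquoted attribute values, as described in the diff below. The function names, the tuple return shape, and the permitted_slash flag are assumptions made for this sketch. Each call is assumed to receive exactly one input character, and EOF is not modelled.

```python
# Sketch only: names and return shape are assumptions, not taken from the spec.
SPACE_CHARACTERS = "\t\n\x0b\x0c "  # U+0009, U+000A, U+000B, U+000C, U+0020

# CR is added on the assumption that the spec's "space characters" include it.
BANNED_IN_UNQUOTED_VALUE = SPACE_CHARACTERS + "\r" + "\"'=>"


def after_attribute_value_quoted(ch, permitted_slash=False):
    """Return (next_state, parse_error, reconsume) for one input character."""
    if ch in SPACE_CHARACTERS:
        return ("before attribute name", False, False)
    if ch == ">":
        # The current tag token would be emitted here.
        return ("data", False, False)
    if ch == "/":
        # Parse error unless the caller says this is a permitted slash.
        return ("before attribute name", not permitted_slash, False)
    # Anything else: parse error, reconsume in the before attribute name state.
    return ("before attribute name", True, True)


def unquoted_value_has_banned_character(value):
    """True if value contains a character this revision bans in unquoted values."""
    return any(c in BANNED_IN_UNQUOTED_VALUE for c in value)
```

For input such as <input type='checkbox'checked>, the "c" that follows the closing quote reaches the "anything else" branch: it is flagged as a parse error and then reconsumed as the start of the next attribute name. The second helper checks only the characters listed in this revision, not the other requirements on attribute values.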

Modified: index
===================================================================

--- index	2008-03-02 12:02:46 UTC (rev 1302)
+++ index	2008-03-02 12:18:28 UTC (rev 1303)
@@ -37470,10 +37470,11 @@
   <p>Attributes have a name and a value. <dfn id=attribute
    title=syntax-attribute-name>Attribute names</dfn> must consist of one or
    more characters other than the <a href="#space" title="space
-   character">space characters</a>, U+003E GREATER-THAN SIGN (>), U+002F
-   SOLIDUS (/), U+003D EQUALS SIGN (=), the U+0000 NULL character, the
-   control characters, and any characters that are not defined by Unicode. In
-   the HTML syntax, attribute names may be written with any mix of lower- and
+   character">space characters</a>, U+0000 NULL, U+0022 QUOTATION MARK
+   (&#x22;), U+0027 APOSTROPHE (&#x27;), U+003E GREATER-THAN SIGN (>),
+   U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
+   characters, and any characters that are not defined by Unicode. In the
+   HTML syntax, attribute names may be written with any mix of lower- and
    uppercase letters that, when converted to
    all-lowercase<!-- ASCII case-insensitive -->, matches the attribute's
    name; attribute names are case-insensitive.
@@ -37516,11 +37517,11 @@
      character">space characters</a>, followed by the <a href="#attribute0"
      title=syntax-attribute-value>attribute value</a>, which, in addition to
      the requirements given above for attribute values, must not contain any
-     literal <a href="#space" title="space character">space characters</a>,
-     U+003D EQUALS SIGN (=) characters, or U+003E GREATER-THAN SIGN
-     (<code>></code>) characters, and must not, furthermore, start with
-     either a literal U+0022 QUOTATION MARK (<code>&#x22;</code>) character
-     or a literal U+0027 APOSTROPHE (<code>&#x27;</code>) character.</p>
+     literal <a href="#space" title="space character">space characters</a>,
+     U+0022 QUOTATION MARK (<code>&#x22;</code>) characters, U+0027
+     APOSTROPHE (<code>&#x27;</code>) characters, U+003D EQUALS SIGN
+     (<code>=</code>) characters, or U+003E GREATER-THAN SIGN
+     (<code>></code>) characters.</p>
 
     <div class=example>
      <p>In the following example, the <code
@@ -37558,6 +37559,10 @@
      <pre><input <em>type='checkbox'</em>></pre>
     </div>
 
+    <p>If an attribute using the single-quoted attribute syntax is to be
+     followed by another attribute, then there must be a <a
+     href="#space">space character</a> separating the two.</p>
+
    <dt>Double-quoted attribute value syntax
 
    <dd>
@@ -37579,6 +37584,10 @@
 
      <pre><input <em>name="be evil"</em>></pre>
     </div>
+
+    <p>If an attribute using the double-quoted attribute syntax is to be
+     followed by another attribute, then there must be a <a
+     href="#space">space character</a> separating the two.</p>
   </dl>
 
   <h5 id=optional><span class=secno>8.1.2.4. </span>Optional tags</h5>
@@ -38659,7 +38668,10 @@
    be <a href="#executing0" title="executing a script block">executed</a> and
    removed from its list.
 
-  <p>The tokeniser state machine is as follows:
+  <p>The tokeniser state machine is as follows:</p>
+  <!-- XXX should go through these reordering the entries so that
+  they're in some consistent order, like, by Unicode, errors last, or
+  something -->
 
   <dl>
    <dt><dfn id=data-state>Data state</dfn>
@@ -38969,12 +38981,14 @@
       href="#permitted">permitted slash</a>. Stay in the <a
       href="#before">before attribute name state</a>.
 
+     <dt>U+0022 QUOTATION MARK (")
+
+     <dt>U+0027 APOSTROPHE (')
+
      <dt>U+003D EQUALS SIGN (=)
 
-     <dd><a href="#parse0">Parse error</a>. Start a new attribute in the
-      current tag token. Set that attribute's name to the current input
-      character, and its value to the empty string. Switch to the <a
-      href="#attribute1">attribute name state</a>.
+     <dd><a href="#parse0">Parse error</a>. Treat it as per the "anything
+      else" entry below.
 
      <dt>EOF
 
@@ -39030,6 +39044,13 @@
       href="#permitted">permitted slash</a>. Switch to the <a
       href="#before">before attribute name state</a>.
 
+     <dt>U+0022 QUOTATION MARK (")
+
+     <dt>U+0027 APOSTROPHE (')
+
+     <dd><a href="#parse0">Parse error</a>. Treat it as per the "anything
+      else" entry below.
+
      <dt>EOF
 
      <dd><a href="#parse0">Parse error</a>. Emit the current tag token.
@@ -39145,9 +39166,8 @@
 
      <dt>U+003D EQUALS SIGN (=)
 
-     <dd><a href="#parse0">Parse error</a>. Append the current input
-      character to the current attribute's value. Switch to the <a
-      href="#attribute4">attribute value (unquoted) state</a>.
+     <dd><a href="#parse0">Parse error</a>. Treat it as per the "anything
+      else" entry below.
 
      <dt>EOF
 
@@ -39169,7 +39189,8 @@
     <dl class=switch>
      <dt>U+0022 QUOTATION MARK (")
 
-     <dd>Switch to the <a href="#before">before attribute name state</a>.
+     <dd>Switch to the <a href="#after0">after attribute value (quoted)
+      state</a>.
 
      <dt>U+0026 AMPERSAND (&)
 
@@ -39197,7 +39218,8 @@
     <dl class=switch>
      <dt>U+0027 APOSTROPHE (')
 
-     <dd>Switch to the <a href="#before">before attribute name state</a>.
+     <dd>Switch to the <a href="#after0">after attribute value (quoted)
+      state</a>.
 
      <dt>U+0026 AMPERSAND (&)
 
@@ -39247,11 +39269,14 @@
      <dd>Emit the current tag token. Switch to the <a href="#data-state">data
       state</a>.
 
+     <dt>U+0022 QUOTATION MARK (")
+
+     <dt>U+0027 APOSTROPHE (')
+
      <dt>U+003D EQUALS SIGN (=)
 
-     <dd><a href="#parse0">Parse error</a>. Append the current input
-      character to the current attribute's value. Stay in the <a
-      href="#attribute4">attribute value (unquoted) state</a>.
+     <dd><a href="#parse0">Parse error</a>. Treat it as per the "anything
+      else" entry below.
 
      <dt>EOF
 
@@ -39278,6 +39303,42 @@
     <p>Finally, switch back to the attribute value state that you were in
      when you were switched into this state.</p>
 
+   <dt><dfn id=after0>After attribute value (quoted) state</dfn>
+
+   <dd>
+    <p>Consume the <a href="#next-input">next input character</a>:</p>
+
+    <dl class=switch>
+     <dt>U+0009 CHARACTER TABULATION
+
+     <dt>U+000A LINE FEED (LF)
+
+     <dt>U+000B LINE TABULATION
+
+     <dt>U+000C FORM FEED (FF)</dt>
+     <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+
+     <dt>U+0020 SPACE
+
+     <dd>Switch to the <a href="#before">before attribute name state</a>.
+
+     <dt>U+003E GREATER-THAN SIGN (>)
+
+     <dd>Emit the current tag token. Switch to the <a href="#data-state">data
+      state</a>.
+
+     <dt>U+002F SOLIDUS (/)
+
+     <dd><a href="#parse0">Parse error</a> unless this is a <a
+      href="#permitted">permitted slash</a>. Switch to the <a
+      href="#before">before attribute name state</a>.
+
+     <dt>Anything else
+
+     <dd><a href="#parse0">Parse error</a>. Reconsume the character in the <a
+      href="#before">before attribute name state</a>.
+    </dl>
+
    <dt><dfn id=bogus>Bogus comment state</dfn>
 
    <dd>
@@ -39537,7 +39598,7 @@
 
      <dt>U+0020 SPACE
 
-     <dd>Switch to the <a href="#after0">after DOCTYPE name state</a>.
+     <dd>Switch to the <a href="#after1">after DOCTYPE name state</a>.
 
      <dt>U+003E GREATER-THAN SIGN (>)
 
@@ -39557,7 +39618,7 @@
       name. Stay in the <a href="#doctype1">DOCTYPE name state</a>.
     </dl>
 
-   <dt><dfn id=after0>After DOCTYPE name state</dfn>
+   <dt><dfn id=after1>After DOCTYPE name state</dfn>
 
    <dd>
     <p>Consume the <a href="#next-input">next input character</a>:</p>
@@ -39574,7 +39635,7 @@
 
      <dt>U+0020 SPACE
 
-     <dd>Stay in the <a href="#after0">after DOCTYPE name state</a>.
+     <dd>Stay in the <a href="#after1">after DOCTYPE name state</a>.
 
      <dt>U+003E GREATER-THAN SIGN (>)
 
@@ -39664,7 +39725,7 @@
     <dl class=switch>
      <dt>U+0022 QUOTATION MARK (")
 
-     <dd>Switch to the <a href="#after1">after DOCTYPE public identifier
+     <dd>Switch to the <a href="#after2">after DOCTYPE public identifier
       state</a>.
 
      <dt>U+003E GREATER-THAN SIGN (>)
@@ -39695,7 +39756,7 @@
     <dl class=switch>
      <dt>U+0027 APOSTROPHE (')
 
-     <dd>Switch to the <a href="#after1">after DOCTYPE public identifier
+     <dd>Switch to the <a href="#after2">after DOCTYPE public identifier
       state</a>.
 
      <dt>U+003E GREATER-THAN SIGN (>)
@@ -39718,7 +39779,7 @@
       identifier (single-quoted) state</a>.
     </dl>
 
-   <dt><dfn id=after1>After DOCTYPE public identifier state</dfn>
+   <dt><dfn id=after2>After DOCTYPE public identifier state</dfn>
 
    <dd>
     <p>Consume the <a href="#next-input">next input character</a>:</p>
@@ -39735,7 +39796,7 @@
 
      <dt>U+0020 SPACE
 
-     <dd>Stay in the <a href="#after1">after DOCTYPE public identifier
+     <dd>Stay in the <a href="#after2">after DOCTYPE public identifier
       state</a>.
 
      <dt>U+0022 QUOTATION MARK (")
@@ -39827,7 +39888,7 @@
     <dl class=switch>
      <dt>U+0022 QUOTATION MARK (")
 
-     <dd>Switch to the <a href="#after2">after DOCTYPE system identifier
+     <dd>Switch to the <a href="#after3">after DOCTYPE system identifier
       state</a>.
 
      <dt>U+003E GREATER-THAN SIGN (>)
@@ -39858,7 +39919,7 @@
     <dl class=switch>
      <dt>U+0027 APOSTROPHE (')
 
-     <dd>Switch to the <a href="#after2">after DOCTYPE system identifier
+     <dd>Switch to the <a href="#after3">after DOCTYPE system identifier
       state</a>.
 
      <dt>U+003E GREATER-THAN SIGN (>)
@@ -39881,7 +39942,7 @@
       identifier (single-quoted) state</a>.
     </dl>
 
-   <dt><dfn id=after2>After DOCTYPE system identifier state</dfn>
+   <dt><dfn id=after3>After DOCTYPE system identifier state</dfn>
 
    <dd>
     <p>Consume the <a href="#next-input">next input character</a>:</p>
@@ -39898,7 +39959,7 @@
 
      <dt>U+0020 SPACE
 
-     <dd>Stay in the <a href="#after2">after DOCTYPE system identifier
+     <dd>Stay in the <a href="#after3">after DOCTYPE system identifier
       state</a>.
 
      <dt>U+003E GREATER-THAN SIGN (>)
@@ -41137,7 +41198,7 @@
    href="#before4" title="insertion mode: before head">before head</a>". It
    can change to "<a href="#in-head" title="insertion mode: in head">in
    head</a>", "<a href="#in-head0" title="insertion mode: in head
-   noscript">in head noscript</a>", "<a href="#after3" title="insertion mode:
+   noscript">in head noscript</a>", "<a href="#after4" title="insertion mode:
    after head">after head</a>", "<a href="#in-body" title="insertion mode: in
    body">in body</a>", "<a href="#in-table" title="insertion mode: in
    table">in table</a>", "<a href="#in-caption" title="insertion mode: in
@@ -41146,10 +41207,10 @@
    mode: in table body">in table body</a>", "<a href="#in-row"
    title="insertion mode: in row">in row</a>", "<a href="#in-cell"
    title="insertion mode: in cell">in cell</a>", "<a href="#in-select"
-   title="insertion mode: in select">in select</a>", "<a href="#after4"
+   title="insertion mode: in select">in select</a>", "<a href="#after5"
    title="insertion mode: after body">after body</a>", "<a
    href="#in-frameset" title="insertion mode: in frameset">in frameset</a>",
-   and "<a href="#after5" title="insertion mode: after frameset">after
+   and "<a href="#after6" title="insertion mode: after frameset">after
    frameset</a>" during the course of the parsing, as described below. It
    affects how certain tokens are processed.
 
@@ -41247,7 +41308,7 @@
     null, switch the <a href="#insertion0">insertion mode</a> to "<a
     href="#before4" title="insertion mode: before head">before head</a>",
     otherwise, switch the <a href="#insertion0">insertion mode</a> to "<a
-    href="#after3" title="insertion mode: after head">after head</a>". In
+    href="#after4" title="insertion mode: after head">after head</a>". In
     either case, abort these steps. (<a href="#fragment">fragment case</a>)</li>
    <!-- XXX
    can the head element pointer ever be non-null when we're going
@@ -41436,7 +41497,7 @@
 
         <p class=note>This will result in an empty <code><a
          href="#head">head</a></code> element being generated, with the
-         current token being reprocessed in the "<a href="#after3"
+         current token being reprocessed in the "<a href="#after4"
          title="insertion mode: after head">after head</a>" <a
          href="#insertion0">insertion mode</a>.</p>
       </dl>
@@ -41657,7 +41718,7 @@
          href="#stack">stack of open elements</a>.</p>
 
         <p>Change the <a href="#insertion0">insertion mode</a> to "<a
-         href="#after3" title="insertion mode: after head">after head</a>".</p>
+         href="#after4" title="insertion mode: after head">after head</a>".</p>
 
        <dt>An end tag whose tag name is one of: "body", "html", "p", "br"
 
@@ -41733,7 +41794,7 @@
          tag name "noscript" had been seen and reprocess the current token.</p>
       </dl>
 
-     <dt>If the <a href="#insertion0">insertion mode</a> is "<dfn id=after3
+     <dt>If the <a href="#insertion0">insertion mode</a> is "<dfn id=after4
       title="insertion mode: after head">after head</dfn>"
 
      <dd>
@@ -41882,7 +41943,7 @@
          href="#parse0">parse error</a>.
 
         <p>Change the <a href="#insertion0">insertion mode</a> to "<a
-         href="#after4" title="insertion mode: after body">after body</a>".</p>
+         href="#after5" title="insertion mode: after body">after body</a>".</p>
 
        <dt>An end tag whose tag name is "html"
 
@@ -43444,7 +43505,7 @@
       </dl>
 
      <dt id=parsing-main-afterbody>If the <a href="#insertion0">insertion
-      mode</a> is "<dfn id=after4 title="insertion mode: after body">after
+      mode</a> is "<dfn id=after5 title="insertion mode: after body">after
       body</dfn>"
 
      <dd>
@@ -43536,7 +43597,7 @@
          href="#fragment">fragment case</a>), and the <a
          href="#current4">current node</a> is no longer a
          <code>frameset</code> element, then change the <a
-         href="#insertion0">insertion mode</a> to "<a href="#after5"
+         href="#insertion0">insertion mode</a> to "<a href="#after6"
          title="insertion mode: after frameset">after frameset</a>".</p>
 
        <dt>A start tag whose tag name is "frame"
@@ -43560,7 +43621,7 @@
       </dl>
 
      <dt id=parsing-main-afterframeset>If the <a href="#insertion0">insertion
-      mode</a> is "<dfn id=after5 title="insertion mode: after
+      mode</a> is "<dfn id=after6 title="insertion mode: after
       frameset">after frameset</dfn>"
 
      <dd>

Modified: source
===================================================================
--- source	2008-03-02 12:02:46 UTC (rev 1302)
+++ source	2008-03-02 12:18:28 UTC (rev 1303)
@@ -34969,13 +34969,14 @@
   <p>Attributes have a name and a value. <dfn
   title="syntax-attribute-name">Attribute names</dfn> must consist of
   one or more characters other than the <span title="space
-  character">space characters</span>, U+003E GREATER-THAN SIGN (>),
-  U+002F SOLIDUS (/), U+003D EQUALS SIGN (=), the U+0000 NULL
-  character, the control characters, and any characters that are not
-  defined by Unicode. In the HTML syntax, attribute names may be
-  written with any mix of lower- and uppercase letters that, when
-  converted to all-lowercase<!-- ASCII case-insensitive -->, matches
-  the attribute's name; attribute names are case-insensitive.</p>
+  character">space characters</span>, U+0000 NULL, U+0022 QUOTATION
+  MARK (&#x22;), U+0027 APOSTROPHE (&#x27;), U+003E GREATER-THAN SIGN
+  (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters,
+  the control characters, and any characters that are not defined by
+  Unicode. In the HTML syntax, attribute names may be written with any
+  mix of lower- and uppercase letters that, when converted to
+  all-lowercase<!-- ASCII case-insensitive -->, matches the
+  attribute's name; attribute names are case-insensitive.</p>
 
   <p><dfn title="syntax-attribute-value">Attribute values</dfn> are a
   mixture of <span title="syntax-text">text</span> and <span
@@ -35023,11 +35024,10 @@
     title="syntax-attribute-value">attribute value</span>, which, in
     addition to the requirements given above for attribute values,
     must not contain any literal <span title="space character">space
-    characters</span>, U+003D EQUALS SIGN (=) characters, or U+003E
-    GREATER-THAN SIGN (<code>></code>) characters, and must not,
-    furthermore, start with either a literal U+0022 QUOTATION MARK
-    (<code>&#x22;</code>) character or a literal U+0027 APOSTROPHE
-    (<code>&#x27;</code>) character.</p>
+    characters</span>, U+0022 QUOTATION MARK (<code>&#x22;</code>)
+    characters, U+0027 APOSTROPHE (<code>&#x27;</code>) characters,
+    U+003D EQUALS SIGN (<code>=</code>) characters, or U+003E
+    GREATER-THAN SIGN (<code>></code>) characters.</p>
 
     <div class="example">
 
@@ -35073,6 +35073,10 @@
 
     </div>
 
+    <p>If an attribute using the single-quoted attribute syntax is to
+    be followed by another attribute, then there must be a <span>space
+    character</span> separating the two.</p>
+
    </dd>
 
    <dt>Double-quoted attribute value syntax</dt>
@@ -35101,6 +35105,10 @@
 
     </div>
 
+    <p>If an attribute using the double-quoted attribute syntax is to
+    be followed by another attribute, then there must be a <span>space
+    character</span> separating the two.</p>
+
    </dd>
 
   </dl>
@@ -36226,6 +36234,10 @@
 
   <p>The tokeniser state machine is as follows:</p>
 
+  <!-- XXX should go through these reordering the entries so that
+  they're in some consistent order, like, by Unicode, errors last, or
+  something -->
+
   <dl>
 
    <dt><dfn>Data state</dfn></dt>
@@ -36528,11 +36540,11 @@
      slash</span>. Stay in the <span>before attribute name
      state</span>.</dd>
 
+     <dt>U+0022 QUOTATION MARK (")</dt>
+     <dt>U+0027 APOSTROPHE (')</dt>
      <dt>U+003D EQUALS SIGN (=)</dt>
-     <dd><span>Parse error</span>. Start a new attribute in the
-     current tag token. Set that attribute's name to the current input
-     character, and its value to the empty string. Switch to the
-     <span>attribute name state</span>.</dd>
+     <dd><span>Parse error</span>. Treat it as per the "anything else"
+     entry below.</dd>
 
      <dt>EOF</dt>
      <dd><span>Parse error</span>. Emit the current tag
@@ -36583,6 +36595,11 @@
      slash</span>. Switch to the <span>before attribute name
      state</span>.</dd>
 
+     <dt>U+0022 QUOTATION MARK (")</dt>
+     <dt>U+0027 APOSTROPHE (')</dt>
+     <dd><span>Parse error</span>. Treat it as per the "anything else"
+     entry below.</dd>
+
      <dt>EOF</dt>
      <dd><span>Parse error</span>. Emit the current tag
      token. Reconsume the EOF character in the <span>data
@@ -36685,9 +36702,8 @@
      state</span>.</dd>
 
      <dt>U+003D EQUALS SIGN (=)</dt>
-     <dd><span>Parse error</span>. Append the current input character
-     to the current attribute's value. Switch to the <span>attribute
-     value (unquoted) state</span>.</dd>
+     <dd><span>Parse error</span>. Treat it as per the "anything else"
+     entry below.</dd>
 
      <dt>EOF</dt>
      <dd><span>Parse error</span>. Emit the current tag
@@ -36712,7 +36728,8 @@
     <dl class="switch">
 
      <dt>U+0022 QUOTATION MARK (")</dt>
-     <dd>Switch to the <span>before attribute name state</span>.</dd>
+     <dd>Switch to the <span>after attribute value (quoted)
+     state</span>.</dd>
 
      <dt>U+0026 AMPERSAND (&)</dt>
      <dd>Switch to the <span>entity in attribute value state</span>,
@@ -36742,7 +36759,8 @@
     <dl class="switch">
 
      <dt>U+0027 APOSTROPHE (')</dt>
-     <dd>Switch to the <span>before attribute name state</span>.</dd>
+     <dd>Switch to the <span>after attribute value (quoted)
+     state</span>.</dd>
 
      <dt>U+0026 AMPERSAND (&)</dt>
      <dd>Switch to the <span>entity in attribute value state</span>,
@@ -36787,10 +36805,11 @@
      <dd>Emit the current tag token. Switch to the <span>data
      state</span>.</dd>
 
+     <dt>U+0022 QUOTATION MARK (")</dt>
+     <dt>U+0027 APOSTROPHE (')</dt>
      <dt>U+003D EQUALS SIGN (=)</dt>
-     <dd><span>Parse error</span>. Append the current input character
-     to the current attribute's value. Stay in the <span>attribute
-     value (unquoted) state</span>.</dd>
+     <dd><span>Parse error</span>. Treat it as per the "anything else"
+     entry below.</dd>
 
      <dt>EOF</dt>
      <dd><span>Parse error</span>. Emit the current tag
@@ -36823,6 +36842,39 @@
 
    </dd>
 
+   <dt><dfn>After attribute value (quoted) state</dfn></dt>
+
+   <dd>
+
+    <p>Consume the <span>next input character</span>:</p>
+
+    <dl class="switch">
+
+     <dt>U+0009 CHARACTER TABULATION</dt>
+     <dt>U+000A LINE FEED (LF)</dt>
+     <dt>U+000B LINE TABULATION</dt>
+     <dt>U+000C FORM FEED (FF)</dt>
+     <!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
+     <dt>U+0020 SPACE</dt>
+     <dd>Switch to the <span>before attribute name state</span>.</dd>
+
+     <dt>U+003E GREATER-THAN SIGN (>)</dt>
+     <dd>Emit the current tag token. Switch to the <span>data
+     state</span>.</dd>
+
+     <dt>U+002F SOLIDUS (/)</dt>
+     <dd><span>Parse error</span> unless this is a <span>permitted
+     slash</span>. Switch to the <span>before attribute name
+     state</span>.</dd>
+
+     <dt>Anything else</dt>
+     <dd><span>Parse error</span>. Reconsume the character in
+     the <span>before attribute name state</span>.</dd>
+
+    </dl>
+
+   </dd>
+
    <dt><dfn>Bogus comment state</dfn></dt>
 
    <dd>