[html5] r4960 - [c] (0) Allow a few more unescaped &s. Fixing http://www.w3.org/Bugs/Public/show [...]

Fri Apr 2 16:18:06 PDT 2010

Author: ianh
Date: 2010-04-02 16:18:05 -0700 (Fri, 02 Apr 2010)
New Revision: 4960

Modified:
   complete.html
   index
   source
Log:
[c] (0) Allow a few more unescaped &s.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=9352
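
Illustration (not part of the commit): a minimal sketch of the revised "ambiguous ampersand" definition introduced by the diff below, read literally. It assumes Python's standard html.entities.html5 table as a stand-in for the spec's named character references table; that table reflects the current HTML standard and may differ from the r4960 table. The helper name ambiguous_ampersands is hypothetical.

```python
import re
from html.entities import html5  # assumption: stand-in for the spec's table

# "&", then one or more ASCII digits/letters, then ";"
_CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")

def ambiguous_ampersands(text):
    """Return every '&name;' run in text whose name is not in the table."""
    return [
        m.group(0)
        for m in _CANDIDATE.finditer(text)
        if m.group(1) + ";" not in html5
    ]

print(ambiguous_ampersands("?hello=1&world=2"))         # no ';' after the name
print(ambiguous_ampersands("?original=1&amp;copy=2"))   # '&amp;' is a known name
print(ambiguous_ampersands("a &foo; b"))                # unknown name with ';'
```

Note that "&copy=2" is not an ambiguous ampersand under this definition either (there is no semicolon); it is flagged separately, as a named character reference used without its semicolon.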

Modified: complete.html
===================================================================

--- complete.html	2010-04-02 22:39:32 UTC (rev 4959)
+++ complete.html	2010-04-02 23:18:05 UTC (rev 4960)
@@ -2106,14 +2106,14 @@
      <pre class=bad><a href="?original=1&copy=2">Compare</a></pre>
 
      <p>To avoid this problem, all named character references are
-     required to end with a semicolon, and any ampersands followed by
-     letters are required to be escaped.</p>
+     required to end with a semicolon, and uses of named character
+     references without a semicolon are flagged as errors.</p>
 
      <p>Thus, the correct way to express the above cases is as
      follows:</p>
 
-     <pre><a href="?hello=1&amp;world=2">Demo</a></pre>
-     <pre><a href="?original=1&amp;copy=2">Compare</a></pre>
+     <pre><a href="?hello=1&world=2">Demo</a> <!-- &world is ok, since it's not a named character reference --></pre>
+     <pre><a href="?original=1&amp;copy=2">Compare</a> <!-- the & has to be escaped, since &copy <em>is</em> a named character reference --></pre>
 
     </div>
 
@@ -73494,9 +73494,12 @@
 
   <p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous
   ampersand</dfn> is a U+0026 AMPERSAND character (&) that is
-  followed by some <a href=#syntax-text title=syntax-text>text</a> other than a
-  <a href=#space-character>space character</a>, a U+003C LESS-THAN SIGN character
-  (<), or another U+0026 AMPERSAND character (&).</p>
+  followed by one or more characters in the range U+0030 DIGIT ZERO
+  (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A
+  LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A
+  LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON character
+  (;), where these characters do not match any of the names given in
+  the <a href=#named-character-references>named character references</a> section.</p>
 
 
   <h4 id=cdata-sections><span class=secno>12.1.5 </span>CDATA sections</h4>
@@ -76888,13 +76891,15 @@
     column of the <a href=#named-character-references>named character references</a> table (in a
     <a href=#case-sensitive>case-sensitive</a> manner).</p>
 
-    <p>If no match can be made, then this is a <a href=#parse-error>parse
-    error</a>. No characters are consumed, and nothing is
-    returned.</p>
+    <p>If no match can be made, then no characters are consumed, and
+    nothing is returned. In this case, if the characters after the
+    U+0026 AMPERSAND character (&) consist of a sequence of one or
+    more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
+    NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER
+    Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL
+    LETTER Z, followed by a U+003B SEMICOLON character (;), then this
+    is a <a href=#parse-error>parse error</a>.</p>
 
-    <p>If the last character matched is not a U+003B SEMICOLON
-    character (;), there is a <a href=#parse-error>parse error</a>.</p>
-
     <p>If the character reference is being consumed <a href=#character-reference-in-attribute-value-state title="character reference in attribute value state">as part of an
     attribute</a>, and the last character matched is not a U+003B
     SEMICOLON character (;), and the next character is either a U+003D
@@ -76906,19 +76911,23 @@
     (&) must be unconsumed, and nothing is returned.</p>
     <!-- "=" added because of http://www.w3.org/Bugs/Public/show_bug.cgi?id=9207#c5 -->
 
-    <p>Otherwise, return a character token for the character
-    corresponding to the character reference name (as given by the
-    second column of the <a href=#named-character-references>named character references</a>
-    table).</p>
+    <p>Otherwise, a character reference is parsed. If the last
+    character matched is not a U+003B SEMICOLON character (;), there
+    is a <a href=#parse-error>parse error</a>.</p>
 
+    <p>Return a character token for the character corresponding to the
+    character reference name (as given by the second column of the
+    <a href=#named-character-references>named character references</a> table).</p>
+
     <div class=example>
 
-     <p>If the markup contains <code title="">I'm &notit; I tell
-     you</code>, the character reference is parsed as "not", as in,
-     <code title="">I'm ¬it; I tell you</code>. But if the markup
+     <p>If the markup contains (not in an attribute) the string <code title="">I'm &notit; I tell you</code>, the character
+     reference is parsed as "not", as in, <code title="">I'm ¬it;
+     I tell you</code> (and this is a parse error). But if the markup
      was <code title="">I'm &notin; I tell you</code>, the
      character reference would be parsed as "notin;", resulting in
-     <code title="">I'm ∉ I tell you</code>.</p>
+     <code title="">I'm ∉ I tell you</code> (and no parse
+     error).</p>
 
     </div>
 

Modified: index
===================================================================
--- index	2010-04-02 22:39:32 UTC (rev 4959)
+++ index	2010-04-02 23:18:05 UTC (rev 4960)
@@ -2004,14 +2004,14 @@
      <pre class=bad><a href="?original=1&copy=2">Compare</a></pre>
 
      <p>To avoid this problem, all named character references are
-     required to end with a semicolon, and any ampersands followed by
-     letters are required to be escaped.</p>
+     required to end with a semicolon, and uses of named character
+     references without a semicolon are flagged as errors.</p>
 
      <p>Thus, the correct way to express the above cases is as
      follows:</p>
 
-     <pre><a href="?hello=1&amp;world=2">Demo</a></pre>
-     <pre><a href="?original=1&amp;copy=2">Compare</a></pre>
+     <pre><a href="?hello=1&world=2">Demo</a> <!-- &world is ok, since it's not a named character reference --></pre>
+     <pre><a href="?original=1&amp;copy=2">Compare</a> <!-- the & has to be escaped, since &copy <em>is</em> a named character reference --></pre>
 
     </div>
 
@@ -66766,9 +66766,12 @@
 
   <p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous
   ampersand</dfn> is a U+0026 AMPERSAND character (&) that is
-  followed by some <a href=#syntax-text title=syntax-text>text</a> other than a
-  <a href=#space-character>space character</a>, a U+003C LESS-THAN SIGN character
-  (<), or another U+0026 AMPERSAND character (&).</p>
+  followed by one or more characters in the range U+0030 DIGIT ZERO
+  (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A
+  LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A
+  LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON character
+  (;), where these characters do not match any of the names given in
+  the <a href=#named-character-references>named character references</a> section.</p>
 
 
   <h4 id=cdata-sections><span class=secno>10.1.5 </span>CDATA sections</h4>
@@ -70160,13 +70163,15 @@
     column of the <a href=#named-character-references>named character references</a> table (in a
     <a href=#case-sensitive>case-sensitive</a> manner).</p>
 
-    <p>If no match can be made, then this is a <a href=#parse-error>parse
-    error</a>. No characters are consumed, and nothing is
-    returned.</p>
+    <p>If no match can be made, then no characters are consumed, and
+    nothing is returned. In this case, if the characters after the
+    U+0026 AMPERSAND character (&) consist of a sequence of one or
+    more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
+    NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER
+    Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL
+    LETTER Z, followed by a U+003B SEMICOLON character (;), then this
+    is a <a href=#parse-error>parse error</a>.</p>
 
-    <p>If the last character matched is not a U+003B SEMICOLON
-    character (;), there is a <a href=#parse-error>parse error</a>.</p>
-
     <p>If the character reference is being consumed <a href=#character-reference-in-attribute-value-state title="character reference in attribute value state">as part of an
     attribute</a>, and the last character matched is not a U+003B
     SEMICOLON character (;), and the next character is either a U+003D
@@ -70178,19 +70183,23 @@
     (&) must be unconsumed, and nothing is returned.</p>
     <!-- "=" added because of http://www.w3.org/Bugs/Public/show_bug.cgi?id=9207#c5 -->
 
-    <p>Otherwise, return a character token for the character
-    corresponding to the character reference name (as given by the
-    second column of the <a href=#named-character-references>named character references</a>
-    table).</p>
+    <p>Otherwise, a character reference is parsed. If the last
+    character matched is not a U+003B SEMICOLON character (;), there
+    is a <a href=#parse-error>parse error</a>.</p>
 
+    <p>Return a character token for the character corresponding to the
+    character reference name (as given by the second column of the
+    <a href=#named-character-references>named character references</a> table).</p>
+
     <div class=example>
 
-     <p>If the markup contains <code title="">I'm &notit; I tell
-     you</code>, the character reference is parsed as "not", as in,
-     <code title="">I'm ¬it; I tell you</code>. But if the markup
+     <p>If the markup contains (not in an attribute) the string <code title="">I'm &notit; I tell you</code>, the character
+     reference is parsed as "not", as in, <code title="">I'm ¬it;
+     I tell you</code> (and this is a parse error). But if the markup
      was <code title="">I'm &notin; I tell you</code>, the
      character reference would be parsed as "notin;", resulting in
-     <code title="">I'm ∉ I tell you</code>.</p>
+     <code title="">I'm ∉ I tell you</code> (and no parse
+     error).</p>
 
     </div>
 

Modified: source
===================================================================
--- source	2010-04-02 22:39:32 UTC (rev 4959)
+++ source	2010-04-02 23:18:05 UTC (rev 4960)
@@ -937,14 +937,14 @@
      <pre class="bad"><a href="?original=1&copy=2">Compare</a></pre>
 
      <p>To avoid this problem, all named character references are
-     required to end with a semicolon, and any ampersands followed by
-     letters are required to be escaped.</p>
+     required to end with a semicolon, and uses of named character
+     references without a semicolon are flagged as errors.</p>
 
      <p>Thus, the correct way to express the above cases is as
      follows:</p>
 
-     <pre><a href="?hello=1&amp;world=2">Demo</a></pre>
-     <pre><a href="?original=1&amp;copy=2">Compare</a></pre>
+     <pre><a href="?hello=1&world=2">Demo</a> <!-- &world is ok, since it's not a named character reference --></pre>
+     <pre><a href="?original=1&amp;copy=2">Compare</a> <!-- the & has to be escaped, since &copy <em>is</em> a named character reference --></pre>
 
     </div>
 
@@ -83737,9 +83737,12 @@
 
   <p>An <dfn title="syntax-ambiguous-ampersand">ambiguous
   ampersand</dfn> is a U+0026 AMPERSAND character (&) that is
-  followed by some <span title="syntax-text">text</span> other than a
-  <span>space character</span>, a U+003C LESS-THAN SIGN character
-  (<), or another U+0026 AMPERSAND character (&).</p>
+  followed by one or more characters in the range U+0030 DIGIT ZERO
+  (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A
+  LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A
+  LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON character
+  (;), where these characters do not match any of the names given in
+  the <span>named character references</span> section.</p>
 
 
   <h4>CDATA sections</h4>
@@ -87684,13 +87687,15 @@
     column of the <span>named character references</span> table (in a
     <span>case-sensitive</span> manner).</p>
 
-    <p>If no match can be made, then this is a <span>parse
-    error</span>. No characters are consumed, and nothing is
-    returned.</p>
+    <p>If no match can be made, then no characters are consumed, and
+    nothing is returned. In this case, if the characters after the
+    U+0026 AMPERSAND character (&) consist of a sequence of one or
+    more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
+    NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER
+    Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL
+    LETTER Z, followed by a U+003B SEMICOLON character (;), then this
+    is a <span>parse error</span>.</p>
 
-    <p>If the last character matched is not a U+003B SEMICOLON
-    character (;), there is a <span>parse error</span>.</p>
-
     <p>If the character reference is being consumed <span
     title="character reference in attribute value state">as part of an
     attribute</span>, and the last character matched is not a U+003B
@@ -87703,19 +87708,24 @@
     (&) must be unconsumed, and nothing is returned.</p>
     <!-- "=" added because of http://www.w3.org/Bugs/Public/show_bug.cgi?id=9207#c5 -->
 
-    <p>Otherwise, return a character token for the character
-    corresponding to the character reference name (as given by the
-    second column of the <span>named character references</span>
-    table).</p>
+    <p>Otherwise, a character reference is parsed. If the last
+    character matched is not a U+003B SEMICOLON character (;), there
+    is a <span>parse error</span>.</p>
 
+    <p>Return a character token for the character corresponding to the
+    character reference name (as given by the second column of the
+    <span>named character references</span> table).</p>
+
     <div class="example">
 
-     <p>If the markup contains <code title="">I'm &notit; I tell
-     you</code>, the character reference is parsed as "not", as in,
-     <code title="">I'm ¬it; I tell you</code>. But if the markup
+     <p>If the markup contains (not in an attribute) the string <code
+     title="">I'm &notit; I tell you</code>, the character
+     reference is parsed as "not", as in, <code title="">I'm ¬it;
+     I tell you</code> (and this is a parse error). But if the markup
      was <code title="">I'm &notin; I tell you</code>, the
      character reference would be parsed as "notin;", resulting in
-     <code title="">I'm ∉ I tell you</code>.</p>
+     <code title="">I'm ∉ I tell you</code> (and no parse
+     error).</p>
 
     </div>
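
Illustration (not part of the commit): a rough sketch of the revised tokenizer rules above for a named character reference outside an attribute value. It assumes Python's html.entities.html5 table in place of the spec's table (the current table, which may differ from r4960), and it deliberately omits the attribute-value special case. The function name and return shape are hypothetical.

```python
from html.entities import html5  # assumption: stand-in for the spec's table

_ALNUM = set("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
_MAX_NAME = max(len(name) for name in html5)

def consume_named_reference(text, pos):
    """text[pos] must be '&'. Returns (chars or None, consumed, parse_error).

    consumed counts the '&' plus the matched name; 0 means nothing consumed.
    """
    rest = text[pos + 1:pos + 1 + _MAX_NAME]
    # Try the longest possible name first (case-sensitive match).
    for end in range(len(rest), 0, -1):
        name = rest[:end]
        if name in html5:
            # A reference was parsed; a missing ';' is a parse error.
            return html5[name], 1 + end, not name.endswith(";")
    # No match: nothing is consumed. It is a parse error only when the
    # '&' is followed by one or more alphanumerics and then a ';'.
    i = pos + 1
    while i < len(text) and text[i] in _ALNUM:
        i += 1
    error = i > pos + 1 and i < len(text) and text[i] == ";"
    return None, 0, error

for s in ("I'm &notit; I tell you", "I'm &notin; I tell you",
          "?hello=1&world=2", "a &foo; b"):
    print(repr(s), consume_named_reference(s, s.index("&")))
```

With the table assumed here, the two strings from the spec example behave as the example describes: "&notit;" matches "not" and reports a parse error, while "&notin;" matches "notin;" with no error.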