[html5] r869 - /

Tue Jun 5 17:31:37 PDT 2007

Author: ianh
Date: 2007-06-05 17:31:36 -0700 (Tue, 05 Jun 2007)
New Revision: 869

Modified:
   index
   source
Log:
[act] (2) Handle entities in the range 128 to 159 (0x80 to 0x9F) as per legacy requirements.

Modified: index
===================================================================

--- index	2007-06-05 23:30:52 UTC (rev 868)
+++ index	2007-06-06 00:31:36 UTC (rev 869)
@@ -22,7 +22,7 @@
 
    <h1 id=html-5>HTML 5</h1>
 
-   <h2 class="no-num no-toc" id=working>Working Draft — 5 June 2007</h2>
+   <h2 class="no-num no-toc" id=working>Working Draft — 6 June 2007</h2>
 
    <p>You can take part in this work. <a
     href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -32726,9 +32726,9 @@
    <dd>The ampersand must be followed by a U+0023 NUMBER SIGN
     (<code>#</code>) character, followed by one or more digits in the range
     U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing a base-ten integer
-    that itself is a valid Unicode code point that isn't U+0000. The digits
-    must then be followed by a U+003B SEMICOLON character (<code
-    title="">;</code>).
+    that itself is a valid Unicode code point that is neither U+0000 nor a
+    character in the range U+0080 .. U+009F. The digits must then be followed
+    by a U+003B SEMICOLON character (<code title="">;</code>).
 
    <dt>Hexadecimal numeric entities
 
@@ -32739,8 +32739,9 @@
     ZERO .. U+0039 DIGIT NINE, U+0061 LATIN SMALL LETTER A .. U+0066 LATIN
     SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A .. U+0046 LATIN CAPITAL
     LETTER F, representing a base-sixteen integer that itself is a valid
-    Unicode code point that isn't U+0000. The digits must then be followed by
-    a U+003B SEMICOLON character (<code title="">;</code>).
+    Unicode code point that is neither U+0000 nor a character in the range
+    U+0080 .. U+009F. The digits must then be followed by a U+003B SEMICOLON
+    character (<code title="">;</code>).
   </dl>
 
   <h4 id=comments><span class=secno>8.1.5. </span>Comments</h4>
@@ -34370,12 +34371,189 @@
 
     <p>If one or more characters match the range, then take them all and
      interpret the string of characters as a number (either hexadecimal or
-     decimal as appropriate), and return a character token for the Unicode
-     character whose code point is that number. If the number is not a valid
-     Unicode character (e.g. if the number is higher than 1114111), or if the
-     number is zero, then return a character token for the U+FFFD REPLACEMENT
-     CHARACTER character instead.</p>
+     decimal as appropriate).
 
+    <p>If that number is in the range 128 to 159 (0x80 to 0x9F), then this is
+     a <a href="#parse">parse error</a>. In the following table, find the row
+     with that number in the first column, and return a character token for
+     the Unicode character given in the second column of that row.</p>
+
+    <table>
+     <thead>
+      <tr>
+       <th>Number
+
+       <th>Unicode character
+
+     <tbody>
+      <tr>
+       <td>0x80
+
+       <td>U+20AC EURO SIGN ('&#x20AC;')
+
+      <tr>
+       <td>0x81
+
+       <td>U+FFFD REPLACEMENT CHARACTER
+
+      <tr>
+       <td>0x82
+
+       <td>U+201A SINGLE LOW-9 QUOTATION MARK ('&#x201A;')
+
+      <tr>
+       <td>0x83
+
+       <td>U+0192 LATIN SMALL LETTER F WITH HOOK ('&#x0192;')
+
+      <tr>
+       <td>0x84
+
+       <td>U+201E DOUBLE LOW-9 QUOTATION MARK ('&#x201E;')
+
+      <tr>
+       <td>0x85
+
+       <td>U+2026 HORIZONTAL ELLIPSIS ('&#x2026;')
+
+      <tr>
+       <td>0x86
+
+       <td>U+2020 DAGGER ('&#x2020;')
+
+      <tr>
+       <td>0x87
+
+       <td>U+2021 DOUBLE DAGGER ('&#x2021;')
+
+      <tr>
+       <td>0x88
+
+       <td>U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT ('&#x02C6;')
+
+      <tr>
+       <td>0x89
+
+       <td>U+2030 PER MILLE SIGN ('&#x2030;')
+
+      <tr>
+       <td>0x8A
+
+       <td>U+0160 LATIN CAPITAL LETTER S WITH CARON ('&#x0160;')
+
+      <tr>
+       <td>0x8B
+
+       <td>U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('&#x2039;')
+
+      <tr>
+       <td>0x8C
+
+       <td>U+0152 LATIN CAPITAL LIGATURE OE ('&#x0152;')
+
+      <tr>
+       <td>0x8D
+
+       <td>U+FFFD REPLACEMENT CHARACTER
+
+      <tr>
+       <td>0x8E
+
+       <td>U+017D LATIN CAPITAL LETTER Z WITH CARON ('&#x017D;')
+
+      <tr>
+       <td>0x8F
+
+       <td>U+FFFD REPLACEMENT CHARACTER
+
+      <tr>
+       <td>0x90
+
+       <td>U+FFFD REPLACEMENT CHARACTER
+
+      <tr>
+       <td>0x91
+
+       <td>U+2018 LEFT SINGLE QUOTATION MARK ('&#x2018;')
+
+      <tr>
+       <td>0x92
+
+       <td>U+2019 RIGHT SINGLE QUOTATION MARK ('&#x2019;')
+
+      <tr>
+       <td>0x93
+
+       <td>U+201C LEFT DOUBLE QUOTATION MARK ('&#x201C;')
+
+      <tr>
+       <td>0x94
+
+       <td>U+201D RIGHT DOUBLE QUOTATION MARK ('&#x201D;')
+
+      <tr>
+       <td>0x95
+
+       <td>U+2022 BULLET ('&#x2022;')
+
+      <tr>
+       <td>0x96
+
+       <td>U+2013 EN DASH ('&#x2013;')
+
+      <tr>
+       <td>0x97
+
+       <td>U+2014 EM DASH ('&#x2014;')
+
+      <tr>
+       <td>0x98
+
+       <td>U+02DC SMALL TILDE ('&#x02DC;')
+
+      <tr>
+       <td>0x99
+
+       <td>U+2122 TRADE MARK SIGN ('&#x2122;')
+
+      <tr>
+       <td>0x9A
+
+       <td>U+0161 LATIN SMALL LETTER S WITH CARON ('&#x0161;')
+
+      <tr>
+       <td>0x9B
+
+       <td>U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('&#x203A;')
+
+      <tr>
+       <td>0x9C
+
+       <td>U+0153 LATIN SMALL LIGATURE OE ('&#x0153;')
+
+      <tr>
+       <td>0x9D
+
+       <td>U+FFFD REPLACEMENT CHARACTER
+
+      <tr>
+       <td>0x9E
+
+       <td>U+017E LATIN SMALL LETTER Z WITH CARON ('&#x017E;')
+
+      <tr>
+       <td>0x9F
+
+       <td>U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS ('&#x0178;')
+    </table>
+
+    <p>Otherwise, if the number is not a valid Unicode character (e.g. if the
+     number is higher than 1114111), or if the number is zero, then return a
+     character token for the U+FFFD REPLACEMENT CHARACTER character instead.</p>
+
+    <p>Otherwise, return a character token for the Unicode character whose
+     code point is that number.
+
    <dt>Anything else
 
    <dd>

Modified: source
===================================================================
--- source	2007-06-05 23:30:52 UTC (rev 868)
+++ source	2007-06-06 00:31:36 UTC (rev 869)
@@ -30265,9 +30265,10 @@
    <dd>The ampersand must be followed by a U+0023 NUMBER SIGN
    (<code>#</code>) character, followed by one or more digits in the
    range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing a
-   base-ten integer that itself is a valid Unicode code point that
-   isn't U+0000. The digits must then be followed by a U+003B
-   SEMICOLON character (<code title="">;</code>).</dd>
+   base-ten integer that itself is a valid Unicode code point that is
+   neither U+0000 nor a character in the range U+0080 .. U+009F. The
+   digits must then be followed by a U+003B SEMICOLON character (<code
+   title="">;</code>).</dd>
 
 
    <dt>Hexadecimal numeric entities</dt>
@@ -30280,8 +30281,9 @@
    LETTER A .. U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL
    LETTER A .. U+0046 LATIN CAPITAL LETTER F, representing a
    base-sixteen integer that itself is a valid Unicode code point that
-   isn't U+0000. The digits must then be followed by a U+003B
-   SEMICOLON character (<code title="">;</code>).</dd>
+   is neither U+0000 nor a character in the range U+0080
+   .. U+009F. The digits must then be followed by a U+003B SEMICOLON
+   character (<code title="">;</code>).</dd>
 
   </dl>
 
@@ -31935,13 +31937,60 @@
 
     <p>If one or more characters match the range, then take them all
     and interpret the string of characters as a number (either
-    hexadecimal or decimal as appropriate), and return a character
-    token for the Unicode character whose code point is that number. If
-    the number is not a valid Unicode character (e.g. if the number is
-    higher than 1114111), or if the number is zero, then return a
-    character token for the U+FFFD REPLACEMENT CHARACTER character
-    instead.</p>
+    hexadecimal or decimal as appropriate).
 
+    <p>If that number is in the range 128 to 159 (0x80 to 0x9F), then
+    this is a <span>parse error</span>. In the following table, find
+    the row with that number in the first column, and return a
+    character token for the Unicode character given in the second
+    column of that row.</p>
+
+    <table>
+     <thead>
+      <tr><th>Number <th>Unicode character
+     <tbody>
+      <tr><td>0x80 <td>U+20AC EURO SIGN ('&#x20AC;')
+      <tr><td>0x81 <td>U+FFFD REPLACEMENT CHARACTER
+      <tr><td>0x82 <td>U+201A SINGLE LOW-9 QUOTATION MARK ('&#x201A;')
+      <tr><td>0x83 <td>U+0192 LATIN SMALL LETTER F WITH HOOK ('&#x0192;')
+      <tr><td>0x84 <td>U+201E DOUBLE LOW-9 QUOTATION MARK ('&#x201E;')
+      <tr><td>0x85 <td>U+2026 HORIZONTAL ELLIPSIS ('&#x2026;')
+      <tr><td>0x86 <td>U+2020 DAGGER ('&#x2020;')
+      <tr><td>0x87 <td>U+2021 DOUBLE DAGGER ('&#x2021;')
+      <tr><td>0x88 <td>U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT ('&#x02C6;')
+      <tr><td>0x89 <td>U+2030 PER MILLE SIGN ('&#x2030;')
+      <tr><td>0x8A <td>U+0160 LATIN CAPITAL LETTER S WITH CARON ('&#x0160;')
+      <tr><td>0x8B <td>U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('&#x2039;')
+      <tr><td>0x8C <td>U+0152 LATIN CAPITAL LIGATURE OE ('&#x0152;')
+      <tr><td>0x8D <td>U+FFFD REPLACEMENT CHARACTER
+      <tr><td>0x8E <td>U+017D LATIN CAPITAL LETTER Z WITH CARON ('&#x017D;')
+      <tr><td>0x8F <td>U+FFFD REPLACEMENT CHARACTER
+      <tr><td>0x90 <td>U+FFFD REPLACEMENT CHARACTER
+      <tr><td>0x91 <td>U+2018 LEFT SINGLE QUOTATION MARK ('&#x2018;')
+      <tr><td>0x92 <td>U+2019 RIGHT SINGLE QUOTATION MARK ('&#x2019;')
+      <tr><td>0x93 <td>U+201C LEFT DOUBLE QUOTATION MARK ('&#x201C;')
+      <tr><td>0x94 <td>U+201D RIGHT DOUBLE QUOTATION MARK ('&#x201D;')
+      <tr><td>0x95 <td>U+2022 BULLET ('&#x2022;')
+      <tr><td>0x96 <td>U+2013 EN DASH ('&#x2013;')
+      <tr><td>0x97 <td>U+2014 EM DASH ('&#x2014;')
+      <tr><td>0x98 <td>U+02DC SMALL TILDE ('&#x02DC;')
+      <tr><td>0x99 <td>U+2122 TRADE MARK SIGN ('&#x2122;')
+      <tr><td>0x9A <td>U+0161 LATIN SMALL LETTER S WITH CARON ('&#x0161;')
+      <tr><td>0x9B <td>U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('&#x203A;')
+      <tr><td>0x9C <td>U+0153 LATIN SMALL LIGATURE OE ('&#x0153;')
+      <tr><td>0x9D <td>U+FFFD REPLACEMENT CHARACTER
+      <tr><td>0x9E <td>U+017E LATIN SMALL LETTER Z WITH CARON ('&#x017E;')
+      <tr><td>0x9F <td>U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS ('&#x0178;')
+    </table>
+
+    <p>Otherwise, if the number is not a valid Unicode character
+    (e.g. if the number is higher than 1114111), or if the number is
+    zero, then return a character token for the U+FFFD REPLACEMENT
+    CHARACTER character instead.</p>
+
+    <p>Otherwise, return a character token for the Unicode character
+    whose code point is that number.
+
    </dd>
 
    <dt>Anything else</dt>
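
Illustration (not part of the commit): the numeric-entity handling described in the r869 text above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the specification's algorithm verbatim. The names numeric_entity_to_char, C1_TABLE and REPLACEMENT are hypothetical, the function assumes the tokenizer has already collected a non-empty run of valid digits, and it treats only the two cases the text names explicitly (zero, and numbers above 1114111) as invalid code points.

```python
# Hypothetical sketch of the r869 numeric-entity rules; all names are illustrative.
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER

# Second column of the table added in r869, for the numbers 0x80..0x9F in order.
_C1_CODE_POINTS = [
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
]
C1_TABLE = {0x80 + i: chr(cp) for i, cp in enumerate(_C1_CODE_POINTS)}


def numeric_entity_to_char(digits, hexadecimal=False):
    """Return (character, is_parse_error) for the digits of a numeric entity.

    Assumes `digits` is a non-empty string of digits already matched by the
    tokenizer (decimal, or hexadecimal when `hexadecimal` is true).
    """
    number = int(digits, 16 if hexadecimal else 10)

    # Numbers 128..159 are a parse error and are remapped through the table.
    if 0x80 <= number <= 0x9F:
        return C1_TABLE[number], True

    # Zero and numbers above 1114111 (0x10FFFF) yield U+FFFD. The text shown
    # here does not flag this case as a parse error, so neither does the sketch.
    if number == 0 or number > 0x10FFFF:
        return REPLACEMENT, False

    # Otherwise the number is used directly as the code point.
    return chr(number), False
```

For example, numeric_entity_to_char("128") yields the euro sign together with a parse-error flag, while numeric_entity_to_char("65") yields "A" with no error. A dictionary lookup is used here only because the table has 32 fixed rows; any equivalent mapping would do.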