[html5] r869 - /
whatwg at whatwg.org
Tue Jun 5 17:31:37 PDT 2007
Author: ianh
Date: 2007-06-05 17:31:36 -0700 (Tue, 05 Jun 2007)
New Revision: 869
Modified:
index
source
Log:
[act] (2) Handle entities in the range 128 to 159 (0x80 to 0x9F) as per legacy requirements.
Modified: index
===================================================================
--- index 2007-06-05 23:30:52 UTC (rev 868)
+++ index 2007-06-06 00:31:36 UTC (rev 869)
@@ -22,7 +22,7 @@
<h1 id=html-5>HTML 5</h1>
- <h2 class="no-num no-toc" id=working>Working Draft — 5 June 2007</h2>
+ <h2 class="no-num no-toc" id=working>Working Draft — 6 June 2007</h2>
<p>You can take part in this work. <a
href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -32726,9 +32726,9 @@
<dd>The ampersand must be followed by a U+0023 NUMBER SIGN
(<code>#</code>) character, followed by one or more digits in the range
U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing a base-ten integer
- that itself is a valid Unicode code point that isn't U+0000. The digits
- must then be followed by a U+003B SEMICOLON character (<code
- title="">;</code>).
+ that itself is a valid Unicode code point that is neither U+0000 nor a
+ character in the range U+0080 .. U+009F. The digits must then be followed
+ by a U+003B SEMICOLON character (<code title="">;</code>).
<dt>Hexadecimal numeric entities
@@ -32739,8 +32739,9 @@
ZERO .. U+0039 DIGIT NINE, U+0061 LATIN SMALL LETTER A .. U+0066 LATIN
SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A .. U+0046 LATIN CAPITAL
LETTER F, representing a base-sixteen integer that itself is a valid
- Unicode code point that isn't U+0000. The digits must then be followed by
- a U+003B SEMICOLON character (<code title="">;</code>).
+ Unicode code point that is neither U+0000 nor a character in the range
+ U+0080 .. U+009F. The digits must then be followed by a U+003B SEMICOLON
+ character (<code title="">;</code>).
</dl>
<h4 id=comments><span class=secno>8.1.5. </span>Comments</h4>
@@ -34370,12 +34371,189 @@
<p>If one or more characters match the range, then take them all and
interpret the string of characters as a number (either hexadecimal or
- decimal as appropriate), and return a character token for the Unicode
- character whose code point is that number. If the number is not a valid
- Unicode character (e.g. if the number is higher than 1114111), or if the
- number is zero, then return a character token for the U+FFFD REPLACEMENT
- CHARACTER character instead.</p>
+ decimal as appropriate).
+ <p>If that number is in the range 128 to 159 (0x80 to 0x9F), then this is
+ a <a href="#parse">parse error</a>. In the following table, find the row
+ with that number in the first column, and return a character token for
+ the Unicode character given in the second column of that row.</p>
+
+ <table>
+ <thead>
+ <tr>
+ <th>Number
+
+ <th>Unicode character
+
+ <tbody>
+ <tr>
+ <td>0x80
+
+ <td>U+20AC EURO SIGN ('&x20AC')
+
+ <tr>
+ <td>0x81
+
+ <td>U+FFFD REPLACEMENT CHARACTER
+
+ <tr>
+ <td>0x82
+
+ <td>U+201A SINGLE LOW-9 QUOTATION MARK ('&x201A')
+
+ <tr>
+ <td>0x83
+
+ <td>U+0192 LATIN SMALL LETTER F WITH HOOK ('&x0192')
+
+ <tr>
+ <td>0x84
+
+ <td>U+201E DOUBLE LOW-9 QUOTATION MARK ('&x201E')
+
+ <tr>
+ <td>0x85
+
+ <td>U+2026 HORIZONTAL ELLIPSIS ('&x2026')
+
+ <tr>
+ <td>0x86
+
+ <td>U+2020 DAGGER ('&x2020')
+
+ <tr>
+ <td>0x87
+
+ <td>U+2021 DOUBLE DAGGER ('&x2021')
+
+ <tr>
+ <td>0x88
+
+ <td>U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT ('&x02C6')
+
+ <tr>
+ <td>0x89
+
+ <td>U+2030 PER MILLE SIGN ('&x2030')
+
+ <tr>
+ <td>0x8A
+
+ <td>U+0160 LATIN CAPITAL LETTER S WITH CARON ('&x0160')
+
+ <tr>
+ <td>0x8B
+
+ <td>U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('&x2039')
+
+ <tr>
+ <td>0x8C
+
+ <td>U+0152 LATIN CAPITAL LIGATURE OE ('&x0152')
+
+ <tr>
+ <td>0x8D
+
+ <td>U+FFFD REPLACEMENT CHARACTER
+
+ <tr>
+ <td>0x8E
+
+ <td>U+017D LATIN CAPITAL LETTER Z WITH CARON ('&x017D')
+
+ <tr>
+ <td>0x8F
+
+ <td>U+FFFD REPLACEMENT CHARACTER
+
+ <tr>
+ <td>0x90
+
+ <td>U+FFFD REPLACEMENT CHARACTER
+
+ <tr>
+ <td>0x91
+
+ <td>U+2018 LEFT SINGLE QUOTATION MARK ('&x2018')
+
+ <tr>
+ <td>0x92
+
+ <td>U+2019 RIGHT SINGLE QUOTATION MARK ('&x2019')
+
+ <tr>
+ <td>0x93
+
+ <td>U+201C LEFT DOUBLE QUOTATION MARK ('&x201C')
+
+ <tr>
+ <td>0x94
+
+ <td>U+201D RIGHT DOUBLE QUOTATION MARK ('&x201D')
+
+ <tr>
+ <td>0x95
+
+ <td>U+2022 BULLET ('&x2022')
+
+ <tr>
+ <td>0x96
+
+ <td>U+2013 EN DASH ('&x2013')
+
+ <tr>
+ <td>0x97
+
+ <td>U+2014 EM DASH ('&x2014')
+
+ <tr>
+ <td>0x98
+
+ <td>U+02DC SMALL TILDE ('&x02DC')
+
+ <tr>
+ <td>0x99
+
+ <td>U+2122 TRADE MARK SIGN ('&x2122')
+
+ <tr>
+ <td>0x9A
+
+ <td>U+0161 LATIN SMALL LETTER S WITH CARON ('&x0161')
+
+ <tr>
+ <td>0x9B
+
+ <td>U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('&x203A')
+
+ <tr>
+ <td>0x9C
+
+ <td>U+0153 LATIN SMALL LIGATURE OE ('&x0153')
+
+ <tr>
+ <td>0x9D
+
+ <td>U+FFFD REPLACEMENT CHARACTER
+
+ <tr>
+ <td>0x9E
+
+ <td>U+017E LATIN SMALL LETTER Z WITH CARON ('&x017E')
+
+ <tr>
+ <td>0x9F
+
+ <td>U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS ('&x0178')
+ </table>
+
+ <p>Otherwise, if the number is not a valid Unicode character (e.g. if the
+ number is higher than 1114111), or if the number is zero, then return a
+ character token for the U+FFFD REPLACEMENT CHARACTER character instead.</p>
+
+ <p>Otherwise, return a character token for the Unicode character whose
+ code point is that number.
+
<dt>Anything else
<dd>
Modified: source
===================================================================
--- source 2007-06-05 23:30:52 UTC (rev 868)
+++ source 2007-06-06 00:31:36 UTC (rev 869)
@@ -30265,9 +30265,10 @@
<dd>The ampersand must be followed by a U+0023 NUMBER SIGN
(<code>#</code>) character, followed by one or more digits in the
range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing a
- base-ten integer that itself is a valid Unicode code point that
- isn't U+0000. The digits must then be followed by a U+003B
- SEMICOLON character (<code title="">;</code>).</dd>
+ base-ten integer that itself is a valid Unicode code point that is
+ neither U+0000 nor a character in the range U+0080 .. U+009F. The
+ digits must then be followed by a U+003B SEMICOLON character (<code
+ title="">;</code>).</dd>
<dt>Hexadecimal numeric entities</dt>
@@ -30280,8 +30281,9 @@
LETTER A .. U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL
LETTER A .. U+0046 LATIN CAPITAL LETTER F, representing a
base-sixteen integer that itself is a valid Unicode code point that
- isn't U+0000. The digits must then be followed by a U+003B
- SEMICOLON character (<code title="">;</code>).</dd>
+ is neither U+0000 nor a character in the range U+0080
+ .. U+009F. The digits must then be followed by a U+003B SEMICOLON
+ character (<code title="">;</code>).</dd>
</dl>
@@ -31935,13 +31937,60 @@
<p>If one or more characters match the range, then take them all
and interpret the string of characters as a number (either
- hexadecimal or decimal as appropriate), and return a character
- token for the Unicode character whose code point is that number. If
- the number is not a valid Unicode character (e.g. if the number is
- higher than 1114111), or if the number is zero, then return a
- character token for the U+FFFD REPLACEMENT CHARACTER character
- instead.</p>
+ hexadecimal or decimal as appropriate).
+ <p>If that number is in the range 128 to 159 (0x80 to 0x9F), then
+ this is a <span>parse error</span>. In the following table, find
+ the row with that number in the first column, and return a
+ character token for the Unicode character given in the second
+ column of that row.</p>
+
+ <table>
+ <thead>
+ <tr><th>Number <th>Unicode character
+ <tbody>
+ <tr><td>0x80 <td>U+20AC EURO SIGN ('&x20AC')
+ <tr><td>0x81 <td>U+FFFD REPLACEMENT CHARACTER
+ <tr><td>0x82 <td>U+201A SINGLE LOW-9 QUOTATION MARK ('&x201A')
+ <tr><td>0x83 <td>U+0192 LATIN SMALL LETTER F WITH HOOK ('&x0192')
+ <tr><td>0x84 <td>U+201E DOUBLE LOW-9 QUOTATION MARK ('&x201E')
+ <tr><td>0x85 <td>U+2026 HORIZONTAL ELLIPSIS ('&x2026')
+ <tr><td>0x86 <td>U+2020 DAGGER ('&x2020')
+ <tr><td>0x87 <td>U+2021 DOUBLE DAGGER ('&x2021')
+ <tr><td>0x88 <td>U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT ('&x02C6')
+ <tr><td>0x89 <td>U+2030 PER MILLE SIGN ('&x2030')
+ <tr><td>0x8A <td>U+0160 LATIN CAPITAL LETTER S WITH CARON ('&x0160')
+ <tr><td>0x8B <td>U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('&x2039')
+ <tr><td>0x8C <td>U+0152 LATIN CAPITAL LIGATURE OE ('&x0152')
+ <tr><td>0x8D <td>U+FFFD REPLACEMENT CHARACTER
+ <tr><td>0x8E <td>U+017D LATIN CAPITAL LETTER Z WITH CARON ('&x017D')
+ <tr><td>0x8F <td>U+FFFD REPLACEMENT CHARACTER
+ <tr><td>0x90 <td>U+FFFD REPLACEMENT CHARACTER
+ <tr><td>0x91 <td>U+2018 LEFT SINGLE QUOTATION MARK ('&x2018')
+ <tr><td>0x92 <td>U+2019 RIGHT SINGLE QUOTATION MARK ('&x2019')
+ <tr><td>0x93 <td>U+201C LEFT DOUBLE QUOTATION MARK ('&x201C')
+ <tr><td>0x94 <td>U+201D RIGHT DOUBLE QUOTATION MARK ('&x201D')
+ <tr><td>0x95 <td>U+2022 BULLET ('&x2022')
+ <tr><td>0x96 <td>U+2013 EN DASH ('&x2013')
+ <tr><td>0x97 <td>U+2014 EM DASH ('&x2014')
+ <tr><td>0x98 <td>U+02DC SMALL TILDE ('&x02DC')
+ <tr><td>0x99 <td>U+2122 TRADE MARK SIGN ('&x2122')
+ <tr><td>0x9A <td>U+0161 LATIN SMALL LETTER S WITH CARON ('&x0161')
+ <tr><td>0x9B <td>U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('&x203A')
+ <tr><td>0x9C <td>U+0153 LATIN SMALL LIGATURE OE ('&x0153')
+ <tr><td>0x9D <td>U+FFFD REPLACEMENT CHARACTER
+ <tr><td>0x9E <td>U+017E LATIN SMALL LETTER Z WITH CARON ('&x017E')
+ <tr><td>0x9F <td>U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS ('&x0178')
+ </table>
+
+ <p>Otherwise, if the number is not a valid Unicode character
+ (e.g. if the number is higher than 1114111), or if the number is
+ zero, then return a character token for the U+FFFD REPLACEMENT
+ CHARACTER character instead.</p>
+
+ <p>Otherwise, return a character token for the Unicode character
+ whose code point is that number.
+
</dd>
<dt>Anything else</dt>