[html5] r5530 - [giow] (2) Tighten up UTF-8 error handling definitions Fixing http://www.w3.org/ [...]

whatwg at whatwg.org
Tue Sep 28 12:16:17 PDT 2010


Author: ianh
Date: 2010-09-28 12:16:16 -0700 (Tue, 28 Sep 2010)
New Revision: 5530

Modified:
   complete.html
   index
   source
Log:
[giow] (2) Tighten up UTF-8 error handling definitions
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=9663
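
The error handling that this revision adds as section 2.4 (UTF-8) can be sketched in Python as below. This is only an illustration of the listed rules, not part of the commit and not a reference implementation. The names `decode_utf8_with_error_handling`, `sequence_length`, `MIN_CODE_POINT` and `REPLACEMENT` are hypothetical. Surrogate code points are not mentioned in the list, so rejecting them here is an assumption.

```python
REPLACEMENT = "\ufffd"

# Smallest code point that legitimately needs a sequence of each length;
# anything smaller encoded at that length is an overlong form.
MIN_CODE_POINT = {2: 0x80, 3: 0x800, 4: 0x10000}


def sequence_length(lead: int) -> int:
    """Sequence length implied by a lead byte in the range C0..FD."""
    if lead <= 0xDF:
        return 2
    if lead <= 0xEF:
        return 3
    if lead <= 0xF7:
        return 4
    if lead <= 0xFB:
        return 5
    return 6


def decode_utf8_with_error_handling(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Plain ASCII byte.
            out.append(chr(lead))
            i += 1
            continue
        if lead <= 0xBF or lead >= 0xFE:
            # Stray continuation byte (80..BF) or FE/FF:
            # one U+FFFD per byte.
            out.append(REPLACEMENT)
            i += 1
            continue

        # Lead byte in C0..FD: collect the continuation bytes that follow,
        # up to the number the lead byte asks for.
        length = sequence_length(lead)
        j = i + 1
        while j < len(data) and j < i + length and 0x80 <= data[j] <= 0xBF:
            j += 1

        if j - i < length:
            # Lone lead byte or truncated sequence: the bytes consumed so
            # far collapse into a single U+FFFD.
            out.append(REPLACEMENT)
            i = j
            continue

        # Complete sequence: rebuild the code point from the payload bits.
        code_point = lead & (0xFF >> (length + 1))
        for byte in data[i + 1:j]:
            code_point = (code_point << 6) | (byte & 0x3F)

        valid = (
            length <= 4                                # 5- and 6-byte forms are errors
            and code_point >= MIN_CODE_POINT[length]   # rejects overlong forms, incl. C0/C1
            and code_point <= 0x10FFFF                 # rejects F4 9x.. and F5..F7
            and not 0xD800 <= code_point <= 0xDFFF     # assumption: surrogates rejected
        )
        out.append(chr(code_point) if valid else REPLACEMENT)
        i = j
    return "".join(out)


example = bytes.fromhex("41 98 BA 42 E2 98 43 E2 98 BA E2 98")
print(decode_utf8_with_error_handling(example))
```

The `example` bytes are the ones used in the example paragraph of the new section, so the printed string can be compared directly with the result given there.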

Modified: complete.html
===================================================================
--- complete.html	2010-09-28 18:31:45 UTC (rev 5529)
+++ complete.html	2010-09-28 19:16:16 UTC (rev 5530)
@@ -313,61 +313,62 @@
      <li><a href=#dependencies><span class=secno>2.2.1 </span>Dependencies</a></li>
      <li><a href=#extensibility><span class=secno>2.2.2 </span>Extensibility</a></ol></li>
    <li><a href=#case-sensitivity-and-string-comparison><span class=secno>2.3 </span>Case-sensitivity and string comparison</a></li>
-   <li><a href=#common-microsyntaxes><span class=secno>2.4 </span>Common microsyntaxes</a>
+   <li><a href=#utf-8><span class=secno>2.4 </span>UTF-8</a></li>
+   <li><a href=#common-microsyntaxes><span class=secno>2.5 </span>Common microsyntaxes</a>
     <ol>
-     <li><a href=#common-parser-idioms><span class=secno>2.4.1 </span>Common parser idioms</a></li>
-     <li><a href=#boolean-attributes><span class=secno>2.4.2 </span>Boolean attributes</a></li>
-     <li><a href=#keywords-and-enumerated-attributes><span class=secno>2.4.3 </span>Keywords and enumerated attributes</a></li>
-     <li><a href=#numbers><span class=secno>2.4.4 </span>Numbers</a>
+     <li><a href=#common-parser-idioms><span class=secno>2.5.1 </span>Common parser idioms</a></li>
+     <li><a href=#boolean-attributes><span class=secno>2.5.2 </span>Boolean attributes</a></li>
+     <li><a href=#keywords-and-enumerated-attributes><span class=secno>2.5.3 </span>Keywords and enumerated attributes</a></li>
+     <li><a href=#numbers><span class=secno>2.5.4 </span>Numbers</a>
       <ol>
-       <li><a href=#non-negative-integers><span class=secno>2.4.4.1 </span>Non-negative integers</a></li>
-       <li><a href=#signed-integers><span class=secno>2.4.4.2 </span>Signed integers</a></li>
-       <li><a href=#real-numbers><span class=secno>2.4.4.3 </span>Real numbers</a></li>
-       <li><a href=#percentages-and-dimensions><span class=secno>2.4.4.4 </span>Percentages and lengths</a></li>
-       <li><a href=#lists-of-integers><span class=secno>2.4.4.5 </span>Lists of integers</a></li>
-       <li><a href=#lists-of-dimensions><span class=secno>2.4.4.6 </span>Lists of dimensions</a></ol></li>
-     <li><a href=#dates-and-times><span class=secno>2.4.5 </span>Dates and times</a>
+       <li><a href=#non-negative-integers><span class=secno>2.5.4.1 </span>Non-negative integers</a></li>
+       <li><a href=#signed-integers><span class=secno>2.5.4.2 </span>Signed integers</a></li>
+       <li><a href=#real-numbers><span class=secno>2.5.4.3 </span>Real numbers</a></li>
+       <li><a href=#percentages-and-dimensions><span class=secno>2.5.4.4 </span>Percentages and lengths</a></li>
+       <li><a href=#lists-of-integers><span class=secno>2.5.4.5 </span>Lists of integers</a></li>
+       <li><a href=#lists-of-dimensions><span class=secno>2.5.4.6 </span>Lists of dimensions</a></ol></li>
+     <li><a href=#dates-and-times><span class=secno>2.5.5 </span>Dates and times</a>
       <ol>
-       <li><a href=#months><span class=secno>2.4.5.1 </span>Months</a></li>
-       <li><a href=#dates><span class=secno>2.4.5.2 </span>Dates</a></li>
-       <li><a href=#times><span class=secno>2.4.5.3 </span>Times</a></li>
-       <li><a href=#local-dates-and-times><span class=secno>2.4.5.4 </span>Local dates and times</a></li>
-       <li><a href=#global-dates-and-times><span class=secno>2.4.5.5 </span>Global dates and times</a></li>
-       <li><a href=#weeks><span class=secno>2.4.5.6 </span>Weeks</a></li>
-       <li><a href=#vaguer-moments-in-time><span class=secno>2.4.5.7 </span>Vaguer moments in time</a></ol></li>
-     <li><a href=#colors><span class=secno>2.4.6 </span>Colors</a></li>
-     <li><a href=#space-separated-tokens><span class=secno>2.4.7 </span>Space-separated tokens</a></li>
-     <li><a href=#comma-separated-tokens><span class=secno>2.4.8 </span>Comma-separated tokens</a></li>
-     <li><a href=#syntax-references><span class=secno>2.4.9 </span>References</a></li>
-     <li><a href=#mq><span class=secno>2.4.10 </span>Media queries</a></ol></li>
-   <li><a href=#urls><span class=secno>2.5 </span>URLs</a>
+       <li><a href=#months><span class=secno>2.5.5.1 </span>Months</a></li>
+       <li><a href=#dates><span class=secno>2.5.5.2 </span>Dates</a></li>
+       <li><a href=#times><span class=secno>2.5.5.3 </span>Times</a></li>
+       <li><a href=#local-dates-and-times><span class=secno>2.5.5.4 </span>Local dates and times</a></li>
+       <li><a href=#global-dates-and-times><span class=secno>2.5.5.5 </span>Global dates and times</a></li>
+       <li><a href=#weeks><span class=secno>2.5.5.6 </span>Weeks</a></li>
+       <li><a href=#vaguer-moments-in-time><span class=secno>2.5.5.7 </span>Vaguer moments in time</a></ol></li>
+     <li><a href=#colors><span class=secno>2.5.6 </span>Colors</a></li>
+     <li><a href=#space-separated-tokens><span class=secno>2.5.7 </span>Space-separated tokens</a></li>
+     <li><a href=#comma-separated-tokens><span class=secno>2.5.8 </span>Comma-separated tokens</a></li>
+     <li><a href=#syntax-references><span class=secno>2.5.9 </span>References</a></li>
+     <li><a href=#mq><span class=secno>2.5.10 </span>Media queries</a></ol></li>
+   <li><a href=#urls><span class=secno>2.6 </span>URLs</a>
     <ol>
-     <li><a href=#terminology-0><span class=secno>2.5.1 </span>Terminology</a></li>
-     <li><a href=#dynamic-changes-to-base-urls><span class=secno>2.5.2 </span>Dynamic changes to base URLs</a></li>
-     <li><a href=#interfaces-for-url-manipulation><span class=secno>2.5.3 </span>Interfaces for URL manipulation</a></ol></li>
-   <li><a href=#fetching-resources><span class=secno>2.6 </span>Fetching resources</a>
+     <li><a href=#terminology-0><span class=secno>2.6.1 </span>Terminology</a></li>
+     <li><a href=#dynamic-changes-to-base-urls><span class=secno>2.6.2 </span>Dynamic changes to base URLs</a></li>
+     <li><a href=#interfaces-for-url-manipulation><span class=secno>2.6.3 </span>Interfaces for URL manipulation</a></ol></li>
+   <li><a href=#fetching-resources><span class=secno>2.7 </span>Fetching resources</a>
     <ol>
-     <li><a href=#concept-http-equivalent><span class=secno>2.6.1 </span>Protocol concepts</a></li>
-     <li><a href=#encrypted-http-and-related-security-concerns><span class=secno>2.6.2 </span>Encrypted HTTP and related security concerns</a></li>
-     <li><a href=#content-type-sniffing><span class=secno>2.6.3 </span>Determining the type of a resource</a></ol></li>
-   <li><a href=#common-dom-interfaces><span class=secno>2.7 </span>Common DOM interfaces</a>
+     <li><a href=#concept-http-equivalent><span class=secno>2.7.1 </span>Protocol concepts</a></li>
+     <li><a href=#encrypted-http-and-related-security-concerns><span class=secno>2.7.2 </span>Encrypted HTTP and related security concerns</a></li>
+     <li><a href=#content-type-sniffing><span class=secno>2.7.3 </span>Determining the type of a resource</a></ol></li>
+   <li><a href=#common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</a>
     <ol>
-     <li><a href=#reflecting-content-attributes-in-idl-attributes><span class=secno>2.7.1 </span>Reflecting content attributes in IDL attributes</a></li>
-     <li><a href=#collections-0><span class=secno>2.7.2 </span>Collections</a>
+     <li><a href=#reflecting-content-attributes-in-idl-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in IDL attributes</a></li>
+     <li><a href=#collections-0><span class=secno>2.8.2 </span>Collections</a>
       <ol>
-       <li><a href=#htmlcollection-0><span class=secno>2.7.2.1 </span>HTMLCollection</a></li>
-       <li><a href=#htmlallcollection-0><span class=secno>2.7.2.2 </span>HTMLAllCollection</a></li>
-       <li><a href=#htmlformcontrolscollection-0><span class=secno>2.7.2.3 </span>HTMLFormControlsCollection</a></li>
-       <li><a href=#htmloptionscollection-0><span class=secno>2.7.2.4 </span>HTMLOptionsCollection</a></li>
-       <li><a href=#htmlpropertiescollection-0><span class=secno>2.7.2.5 </span>HTMLPropertiesCollection</a></ol></li>
-     <li><a href=#domtokenlist-0><span class=secno>2.7.3 </span>DOMTokenList</a></li>
-     <li><a href=#domsettabletokenlist-0><span class=secno>2.7.4 </span>DOMSettableTokenList</a></li>
-     <li><a href=#safe-passing-of-structured-data><span class=secno>2.7.5 </span>Safe passing of structured data</a></li>
-     <li><a href=#domstringmap-0><span class=secno>2.7.6 </span>DOMStringMap</a></li>
-     <li><a href=#dom-feature-strings><span class=secno>2.7.7 </span>DOM feature strings</a></li>
-     <li><a href=#exceptions><span class=secno>2.7.8 </span>Exceptions</a></li>
-     <li><a href=#garbage-collection><span class=secno>2.7.9 </span>Garbage collection</a></ol></li>
-   <li><a href=#namespaces><span class=secno>2.8 </span>Namespaces</a></ol></li>
+       <li><a href=#htmlcollection-0><span class=secno>2.8.2.1 </span>HTMLCollection</a></li>
+       <li><a href=#htmlallcollection-0><span class=secno>2.8.2.2 </span>HTMLAllCollection</a></li>
+       <li><a href=#htmlformcontrolscollection-0><span class=secno>2.8.2.3 </span>HTMLFormControlsCollection</a></li>
+       <li><a href=#htmloptionscollection-0><span class=secno>2.8.2.4 </span>HTMLOptionsCollection</a></li>
+       <li><a href=#htmlpropertiescollection-0><span class=secno>2.8.2.5 </span>HTMLPropertiesCollection</a></ol></li>
+     <li><a href=#domtokenlist-0><span class=secno>2.8.3 </span>DOMTokenList</a></li>
+     <li><a href=#domsettabletokenlist-0><span class=secno>2.8.4 </span>DOMSettableTokenList</a></li>
+     <li><a href=#safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</a></li>
+     <li><a href=#domstringmap-0><span class=secno>2.8.6 </span>DOMStringMap</a></li>
+     <li><a href=#dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</a></li>
+     <li><a href=#exceptions><span class=secno>2.8.8 </span>Exceptions</a></li>
+     <li><a href=#garbage-collection><span class=secno>2.8.9 </span>Garbage collection</a></ol></li>
+   <li><a href=#namespaces><span class=secno>2.9 </span>Namespaces</a></ol></li>
  <li><a href=#dom><span class=secno>3 </span>Semantics, structure, and APIs of HTML documents</a>
   <ol>
    <li><a href=#documents><span class=secno>3.1 </span>Documents</a>
@@ -3502,8 +3503,58 @@
   two strings as matches of each other.</p>
 
 
-  <h3 id=common-microsyntaxes><span class=secno>2.4 </span>Common microsyntaxes</h3>
+  <h3 id=utf-8><span class=secno>2.4 </span>UTF-8</h3>
 
+  <p>When a user agent is required to <dfn id=decoded-as-utf-8,-with-error-handling title="decoded as UTF-8,
+  with error handling">decode a byte string as UTF-8, with error
+  handling</dfn>, it means that the byte string must be converted to a
+  Unicode string by interpreting it as UTF-8, except that any errors
+  must be handled as described in the following list. Bytes in the
+  following list are represented in hexadecimal. <a href=#refsRFC3629>[RFC3629]</a>
+
+  <dl class=switch><dt>One byte in the range FE to FF</dt>
+
+   <dt>Overlong forms (e.g. F0 80 80 A0)</dt>
+
+   <dt>One byte in the range C0 to C1, followed by one byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F0 to F4, followed by three bytes in the range 80 to BF that represent a code point above U+10FFFF</dt>
+
+   <dt>One byte in the range F5 to F7, followed by three bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range F8 to FB, followed by four bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range FC to FD, followed by five bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range E0 to FD, followed by a byte in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F0 to FD, followed by two bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F5 to FD, followed by three bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range FC to FD, followed by four bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+
+   <dd>The whole sequence must be replaced by a single U+FFFD
+   REPLACEMENT CHARACTER.</dd>
+
+
+   <dt>One byte in the range 80 to BF not preceded by a byte in the range 80 to FD</dt>
+
+   <dt>A sequence of bytes in the range 80 to BF that does not follow a byte in the range C0 to FD</dt>
+
+   <dt>One byte in the range C0 to FD not followed by a byte in the range 80 to BF</dt>
+
+
+   <dd>Each byte must be replaced with a U+FFFD REPLACEMENT CHARACTER.</dd>
+
+  </dl><p class=example>For example, the byte string "41 98 BA 42 E2 98
+  43 E2 98 BA E2 98" would be converted to the string
+  "A��B�C☺�".</p>
+
+
+  <h3 id=common-microsyntaxes><span class=secno>2.5 </span>Common microsyntaxes</h3>
+
   <p>There are various places in HTML that accept particular data
   types, such as dates or numbers. This section describes what the
   conformance criteria for content in those formats is, and how to
@@ -3525,7 +3576,7 @@
 
   <div class=impl>
 
-  <h4 id=common-parser-idioms><span class=secno>2.4.1 </span>Common parser idioms</h4>
+  <h4 id=common-parser-idioms><span class=secno>2.5.1 </span>Common parser idioms</h4>
 
   <p>The <dfn id=space-character title="space character">space characters</dfn>, for the
   purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER
@@ -3587,7 +3638,7 @@
 
 
 
-  <h4 id=boolean-attributes><span class=secno>2.4.2 </span>Boolean attributes</h4>
+  <h4 id=boolean-attributes><span class=secno>2.5.2 </span>Boolean attributes</h4>
 
   <p>A number of attributes are <dfn id=boolean-attribute title="boolean attribute">boolean
   attributes</dfn>. The presence of a boolean attribute on an element
@@ -3623,7 +3674,7 @@
 
 
 
-  <h4 id=keywords-and-enumerated-attributes><span class=secno>2.4.3 </span>Keywords and enumerated attributes</h4>
+  <h4 id=keywords-and-enumerated-attributes><span class=secno>2.5.3 </span>Keywords and enumerated attributes</h4>
 
   <p>Some attributes are defined as taking one of a finite set of
   keywords. Such attributes are called <dfn id=enumerated-attribute title="enumerated
@@ -3660,9 +3711,9 @@
   <p class=note>The empty string can be a valid keyword.</p>
 
 
-  <h4 id=numbers><span class=secno>2.4.4 </span>Numbers</h4>
+  <h4 id=numbers><span class=secno>2.5.4 </span>Numbers</h4>
 
-  <h5 id=non-negative-integers><span class=secno>2.4.4.1 </span>Non-negative integers</h5>
+  <h5 id=non-negative-integers><span class=secno>2.5.4.1 </span>Non-negative integers</h5>
 
   <p>A string is a <dfn id=valid-non-negative-integer>valid non-negative integer</dfn> if it
   consists of one or more characters in the range U+0030 DIGIT ZERO
@@ -3712,7 +3763,7 @@
   </ol></div>
 
 
-  <h5 id=signed-integers><span class=secno>2.4.4.2 </span>Signed integers</h5>
+  <h5 id=signed-integers><span class=secno>2.5.4.2 </span>Signed integers</h5>
 
   <p>A string is a <dfn id=valid-integer>valid integer</dfn> if it consists of one or
   more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
@@ -3791,7 +3842,7 @@
   </ol></div>
 
 
-  <h5 id=real-numbers><span class=secno>2.4.4.3 </span>Real numbers</h5>
+  <h5 id=real-numbers><span class=secno>2.5.4.3 </span>Real numbers</h5>
 
   <p>A string is a <dfn id=valid-floating-point-number>valid floating point number</dfn> if it
   consists of:</p>
@@ -3996,7 +4047,7 @@
 
 
 <div class=impl>
-  <h5 id=percentages-and-dimensions><span class=secno>2.4.4.4 </span>Percentages and lengths</h5>
+  <h5 id=percentages-and-dimensions><span class=secno>2.5.4.4 </span>Percentages and lengths</h5>
 <!--(percentages are not used in valid html anymore)
   <p>A string is a <dfn>valid dimension value</dfn> if it consists of
   a character in the range U+0031 DIGIT ONE (1) to U+0039 DIGIT NINE
@@ -4099,7 +4150,7 @@
   </ol></div>
 
 
-  <h5 id=lists-of-integers><span class=secno>2.4.4.5 </span>Lists of integers</h5>
+  <h5 id=lists-of-integers><span class=secno>2.5.4.5 </span>Lists of integers</h5>
 
   <p>A <dfn id=valid-list-of-integers>valid list of integers</dfn> is a number of <a href=#valid-integer title="valid integer">valid integers</a> separated by U+002C
   COMMA characters, with no other characters (e.g. no <a href=#space-character title="space character">space characters</a>). In addition, there
@@ -4370,7 +4421,7 @@
 
   <div class=impl>
 
-  <h5 id=lists-of-dimensions><span class=secno>2.4.4.6 </span>Lists of dimensions</h5>
+  <h5 id=lists-of-dimensions><span class=secno>2.5.4.6 </span>Lists of dimensions</h5>
 
   <!-- no definition of a type since no conforming feature uses this
   syntax (it's only used in cols="" and rows="" on <frameset> -->
@@ -4474,7 +4525,7 @@
   </ol></div>
 
 
-  <h4 id=dates-and-times><span class=secno>2.4.5 </span>Dates and times</h4>
+  <h4 id=dates-and-times><span class=secno>2.5.5 </span>Dates and times</h4>
 
   <p>In the algorithms below, the <dfn id=number-of-days-in-month-month-of-year-year>number of days in month <var title="">month</var> of year <var title="">year</var></dfn> is:
   <em>31</em> if <var title="">month</var> is 1, 3, 5, 7, 8, 10, or
@@ -4501,7 +4552,7 @@
   </div>
 
 
-  <h5 id=months><span class=secno>2.4.5.1 </span>Months</h5>
+  <h5 id=months><span class=secno>2.5.5.1 </span>Months</h5>
 
   <p>A <dfn id=concept-month title=concept-month>month</dfn> consists of a specific
   proleptic Gregorian date with no time-zone information and no date
@@ -4573,7 +4624,7 @@
   </ol></div>
 
 
-  <h5 id=dates><span class=secno>2.4.5.2 </span>Dates</h5>
+  <h5 id=dates><span class=secno>2.5.5.2 </span>Dates</h5>
 
   <p>A <dfn id=concept-date title=concept-date>date</dfn> consists of a specific
   proleptic Gregorian date with no time-zone information, consisting
@@ -4645,7 +4696,7 @@
   </ol></div>
 
 
-  <h5 id=times><span class=secno>2.4.5.3 </span>Times</h5>
+  <h5 id=times><span class=secno>2.5.5.3 </span>Times</h5>
 
   <p>A <dfn id=concept-time title=concept-time>time</dfn> consists of a specific
   time with no time-zone information, consisting of an hour, a minute,
@@ -4782,7 +4833,7 @@
   </ol></div>
 
 
-  <h5 id=local-dates-and-times><span class=secno>2.4.5.4 </span>Local dates and times</h5>
+  <h5 id=local-dates-and-times><span class=secno>2.5.5.4 </span>Local dates and times</h5>
 
   <p>A <dfn id=concept-datetime-local title=concept-datetime-local>local date and time</dfn>
   consists of a specific proleptic Gregorian date, consisting of a
@@ -4834,7 +4885,7 @@
 
 
 
-  <h5 id=global-dates-and-times><span class=secno>2.4.5.5 </span>Global dates and times</h5>
+  <h5 id=global-dates-and-times><span class=secno>2.5.5.5 </span>Global dates and times</h5>
 
   <p>A <dfn id=concept-datetime title=concept-datetime>global date and time</dfn>
   consists of a specific proleptic Gregorian date, consisting of a
@@ -5050,7 +5101,7 @@
   </ol></div>
 
 
-  <h5 id=weeks><span class=secno>2.4.5.6 </span>Weeks</h5>
+  <h5 id=weeks><span class=secno>2.5.5.6 </span>Weeks</h5>
 
   <p>A <dfn id=concept-week title=concept-week>week</dfn> consists of a week-year
   number and a week number representing a seven-day period starting on
@@ -5145,7 +5196,7 @@
   </ol></div>
 
 
-  <h5 id=vaguer-moments-in-time><span class=secno>2.4.5.7 </span>Vaguer moments in time</h5>
+  <h5 id=vaguer-moments-in-time><span class=secno>2.5.5.7 </span>Vaguer moments in time</h5>
 
   <p>A string is a <dfn id=valid-date-or-time-string>valid date or time string</dfn> if it is also
   one of the following:</p>
@@ -5255,7 +5306,7 @@
   </ol></div>
 
 
-  <h4 id=colors><span class=secno>2.4.6 </span>Colors</h4>
+  <h4 id=colors><span class=secno>2.5.6 </span>Colors</h4>
 
   <p>A <dfn id=simple-color>simple color</dfn> consists of three 8-bit numbers in the
   range 0..255, representing the red, green, and blue components of
@@ -5459,7 +5510,7 @@
   <!--2DCANVAS-->
 
 
-  <h4 id=space-separated-tokens><span class=secno>2.4.7 </span>Space-separated tokens</h4>
+  <h4 id=space-separated-tokens><span class=secno>2.5.7 </span>Space-separated tokens</h4>
 
   <p>A <dfn id=set-of-space-separated-tokens>set of space-separated tokens</dfn> is a string containing
   zero or more words separated by one or more <a href=#space-character title="space
@@ -5579,7 +5630,7 @@
 
 
 
-  <h4 id=comma-separated-tokens><span class=secno>2.4.8 </span>Comma-separated tokens</h4>
+  <h4 id=comma-separated-tokens><span class=secno>2.5.8 </span>Comma-separated tokens</h4>
 
   <p>A <dfn id=set-of-comma-separated-tokens>set of comma-separated tokens</dfn> is a string containing
   zero or more tokens each separated from the next by a single U+002C
@@ -5636,7 +5687,7 @@
 
 
 
-  <h4 id=syntax-references><span class=secno>2.4.9 </span>References</h4>
+  <h4 id=syntax-references><span class=secno>2.5.9 </span>References</h4>
 
   <p>A <dfn id=valid-hash-name-reference>valid hash-name reference</dfn> to an element of type <var title="">type</var> is a string consisting of a U+0023 NUMBER SIGN
   character (#) followed by a string which exactly matches the value
@@ -5675,7 +5726,7 @@
   </ol></div>
 
 
-  <h4 id=mq><span class=secno>2.4.10 </span>Media queries</h4>
+  <h4 id=mq><span class=secno>2.5.10 </span>Media queries</h4>
 
   <p>A string is a <dfn id=valid-media-query>valid media query</dfn> if it matches the
   <code title="">media_query_list</code> production of the Media
@@ -5690,9 +5741,9 @@
 
 
 
-  <h3 id=urls><span class=secno>2.5 </span>URLs</h3>
+  <h3 id=urls><span class=secno>2.6 </span>URLs</h3>
 
-  <h4 id=terminology-0><span class=secno>2.5.1 </span>Terminology</h4>
+  <h4 id=terminology-0><span class=secno>2.6.1 </span>Terminology</h4>
 
   <!-- see also: svn diff -r3244:3245 source -->
 
@@ -5882,7 +5933,7 @@
 
   <div class=impl>
 
-  <h4 id=dynamic-changes-to-base-urls><span class=secno>2.5.2 </span>Dynamic changes to base URLs</h4>
+  <h4 id=dynamic-changes-to-base-urls><span class=secno>2.6.2 </span>Dynamic changes to base URLs</h4>
 
   <p>When an <code title=attr-xml-base><a href=#the-xml:base-attribute-(xml-only)>xml:base</a></code> attribute
   changes, the attribute's element, and all descendant elements, are
@@ -5955,7 +6006,7 @@
 
 
 
-  <h4 id=interfaces-for-url-manipulation><span class=secno>2.5.3 </span>Interfaces for URL manipulation</h4>
+  <h4 id=interfaces-for-url-manipulation><span class=secno>2.6.3 </span>Interfaces for URL manipulation</h4>
 
   <p>An interface that has a complement of <dfn id=url-decomposition-idl-attributes>URL decomposition IDL
   attributes</dfn> will have seven attributes with the following
@@ -6156,7 +6207,7 @@
 
   <div class=impl>
 
-  <h3 id=fetching-resources><span class=secno>2.6 </span>Fetching resources</h3>
+  <h3 id=fetching-resources><span class=secno>2.7 </span>Fetching resources</h3>
 
   <p>When a user agent is to <dfn id=fetch>fetch</dfn> a resource or
   <a href=#url>URL</a>, optionally from an origin <i title="">origin</i>,
@@ -6369,7 +6420,7 @@
   applicable.</p>
 
 
-  <h4 id=concept-http-equivalent><span class=secno>2.6.1 </span>Protocol concepts</h4>
+  <h4 id=concept-http-equivalent><span class=secno>2.7.1 </span>Protocol concepts</h4>
 
   <p>User agents can implement a variety of transfer protocols, but
   this specification mostly defines behavior in terms of HTTP. <a href=#refsHTTP>[HTTP]</a></p>
@@ -6392,7 +6443,7 @@
   protocol.</p>
 
 
-  <h4 id=encrypted-http-and-related-security-concerns><span class=secno>2.6.2 </span>Encrypted HTTP and related security concerns</h4>
+  <h4 id=encrypted-http-and-related-security-concerns><span class=secno>2.7.2 </span>Encrypted HTTP and related security concerns</h4>
 
   <p>Anything in this specification that refers to HTTP also applies
   to HTTP-over-TLS, as represented by <a href=#url title=url>URLs</a>
@@ -6438,7 +6489,7 @@
   </div>
 
 
-  <h4 id=content-type-sniffing><span class=secno>2.6.3 </span>Determining the type of a resource</h4>
+  <h4 id=content-type-sniffing><span class=secno>2.7.3 </span>Determining the type of a resource</h4>
 
   <p>The <dfn id=content-type title=Content-Type>Content-Type metadata</dfn> of a
   resource must be obtained and interpreted in a manner consistent
@@ -6513,9 +6564,9 @@
 
 
 
-  <h3 id=common-dom-interfaces><span class=secno>2.7 </span>Common DOM interfaces</h3>
+  <h3 id=common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</h3>
 
-  <h4 id=reflecting-content-attributes-in-idl-attributes><span class=secno>2.7.1 </span>Reflecting content attributes in IDL attributes</h4>
+  <h4 id=reflecting-content-attributes-in-idl-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in IDL attributes</h4>
 
   <p>Some IDL attributes are defined to <dfn id=reflect>reflect</dfn> a
   particular content attribute. This means that on getting, the IDL
@@ -6717,7 +6768,7 @@
   </div>
 
 
-  <h4 id=collections-0><span class=secno>2.7.2 </span>Collections</h4>
+  <h4 id=collections-0><span class=secno>2.8.2 </span>Collections</h4>
 
   <p>The <code><a href=#htmlcollection>HTMLCollection</a></code>, <code><a href=#htmlallcollection>HTMLAllCollection</a></code>,
   <code><a href=#htmlformcontrolscollection>HTMLFormControlsCollection</a></code>,
@@ -6753,7 +6804,7 @@
   </div>
 
 
-  <h5 id=htmlcollection-0><span class=secno>2.7.2.1 </span>HTMLCollection</h5>
+  <h5 id=htmlcollection-0><span class=secno>2.8.2.1 </span>HTMLCollection</h5>
 
   <p>The <code><a href=#htmlcollection>HTMLCollection</a></code> interface represents a generic
   <a href=#collections title=collections>collection</a> of elements.</p>
@@ -6833,7 +6884,7 @@
   </div>
 
 
-  <h5 id=htmlallcollection-0><span class=secno>2.7.2.2 </span>HTMLAllCollection</h5>
+  <h5 id=htmlallcollection-0><span class=secno>2.8.2.2 </span>HTMLAllCollection</h5>
 
   <p>The <code><a href=#htmlallcollection>HTMLAllCollection</a></code> interface represents a generic
   <a href=#collections title=collections>collection</a> of elements just like
@@ -6932,7 +6983,7 @@
   </div>
 
 
-  <h5 id=htmlformcontrolscollection-0><span class=secno>2.7.2.3 </span>HTMLFormControlsCollection</h5>
+  <h5 id=htmlformcontrolscollection-0><span class=secno>2.8.2.3 </span>HTMLFormControlsCollection</h5>
 
   <p>The <code><a href=#htmlformcontrolscollection>HTMLFormControlsCollection</a></code> interface represents
   a <a href=#collections title=collections>collection</a> of <a href=#category-listed title=category-listed>listed elements</a> in <code><a href=#the-form-element>form</a></code>
@@ -7049,7 +7100,7 @@
 --></div>
 
 
-  <h5 id=htmloptionscollection-0><span class=secno>2.7.2.4 </span>HTMLOptionsCollection</h5>
+  <h5 id=htmloptionscollection-0><span class=secno>2.8.2.4 </span>HTMLOptionsCollection</h5>
 
   <p>The <code><a href=#htmloptionscollection>HTMLOptionsCollection</a></code> interface represents a
   list of <code><a href=#the-option-element>option</a></code> elements. It is always rooted on a
@@ -7228,7 +7279,7 @@
 <!--MD-->
   <div data-component="HTML Microdata (editor: Ian Hickson)">
 
-  <h5 id=htmlpropertiescollection-0><span class=secno>2.7.2.5 </span>HTMLPropertiesCollection</h5>
+  <h5 id=htmlpropertiescollection-0><span class=secno>2.8.2.5 </span>HTMLPropertiesCollection</h5>
 
   <p>The <code><a href=#htmlpropertiescollection>HTMLPropertiesCollection</a></code> interface represents a
   <a href=#collections title=collections>collection</a> of elements that add
@@ -7319,7 +7370,7 @@
 <!--MD-->
 
 
-  <h4 id=domtokenlist-0><span class=secno>2.7.3 </span>DOMTokenList</h4>
+  <h4 id=domtokenlist-0><span class=secno>2.8.3 </span>DOMTokenList</h4>
 
   <p>The <code><a href=#domtokenlist>DOMTokenList</a></code> interface represents an interface
   to an underlying string that consists of a <a href=#set-of-space-separated-tokens>set of
@@ -7505,7 +7556,7 @@
   </div>
 
 
-  <h4 id=domsettabletokenlist-0><span class=secno>2.7.4 </span>DOMSettableTokenList</h4>
+  <h4 id=domsettabletokenlist-0><span class=secno>2.8.4 </span>DOMSettableTokenList</h4>
 
   <p>The <code><a href=#domsettabletokenlist>DOMSettableTokenList</a></code> interface is the same as the
   <code><a href=#domtokenlist>DOMTokenList</a></code> interface, except that it allows the
@@ -7537,7 +7588,7 @@
 
   <div class=impl>
 
-  <h4 id=safe-passing-of-structured-data><span class=secno>2.7.5 </span>Safe passing of structured data</h4>
+  <h4 id=safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</h4>
 
   <p>When a user agent is required to obtain a <dfn id=structured-clone>structured
   clone</dfn> of an object, it must run the following algorithm, which
@@ -7662,7 +7713,7 @@
   </dl></div>
 
 
-  <h4 id=domstringmap-0><span class=secno>2.7.6 </span>DOMStringMap</h4>
+  <h4 id=domstringmap-0><span class=secno>2.8.6 </span>DOMStringMap</h4>
 
   <p>The <code><a href=#domstringmap>DOMStringMap</a></code> interface represents a set of
   name-value pairs. It exposes these using the scripting language's
@@ -7745,7 +7796,7 @@
   </div>
 
 
-  <h4 id=dom-feature-strings><span class=secno>2.7.7 </span>DOM feature strings</h4>
+  <h4 id=dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</h4>
 
   <p>DOM3 Core defines mechanisms for checking for interface support,
   and for obtaining implementations of interfaces, using <a href=http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures>feature
@@ -7767,7 +7818,7 @@
   </div>
 
 
-  <h4 id=exceptions><span class=secno>2.7.8 </span>Exceptions</h4>
+  <h4 id=exceptions><span class=secno>2.8.8 </span>Exceptions</h4>
 
   <p>The following are <code><a href=#domexception>DOMException</a></code> codes. <a href=#refsDOMCORE>[DOMCORE]</a></p>
 
@@ -7815,7 +7866,7 @@
 
   <div class=impl>
 
-  <h4 id=garbage-collection><span class=secno>2.7.9 </span>Garbage collection</h4>
+  <h4 id=garbage-collection><span class=secno>2.8.9 </span>Garbage collection</h4>
 
   <p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any IDL
   attribute that returns a pre-existing object to that object.</p>
@@ -7834,7 +7885,7 @@
   </div>
 
 
-  <h3 id=namespaces><span class=secno>2.8 </span>Namespaces</h3>
+  <h3 id=namespaces><span class=secno>2.9 </span>Namespaces</h3>
 
   <p>The <dfn id=html-namespace-0>HTML namespace</dfn> is: <code>http://www.w3.org/1999/xhtml</code></p>
 
@@ -8127,9 +8178,8 @@
   <code><a href=#security_err>SECURITY_ERR</a></code> exception. Otherwise, the user agent must
   first <a href=#obtain-the-storage-mutex>obtain the storage mutex</a> and then return the
   cookie-string for <a href="#the-document's-address">the document's address</a> for a
-  "non-HTTP" API, decoded as UTF-8, with bytes or sequences of bytes
-  that are not valid UTF-8 sequences interpreted as U+FFFD REPLACEMENT
-  CHARACTERs. <a href=#refsCOOKIES>[COOKIES]</a> <a href=#refsRFC3629>[RFC3629]</a></p>
+  "non-HTTP" API, <a href=#decoded-as-utf-8,-with-error-handling>decoded as UTF-8, with error handling</a>.
+  <a href=#refsCOOKIES>[COOKIES]</a></p>
 
   <p>On setting, if the document is a <a href=#cookie-free-document-object>cookie-free
   <code>Document</code> object</a>, then the user agent must do
@@ -28668,10 +28718,10 @@
   <ul class=brief><li><code><a href=#text/srt>text/srt</a></code></li>
   </ul><!--<p class="note">Not all of these MIME types are valid registered
   types.</p>--><p>When converting the bytes into Unicode characters, if the
-  encoding used is UTF-8, bytes or sequences of bytes that are not
-  valid UTF-8 sequences must be interpreted as a U+FFFD REPLACEMENT
-  CHARACTER, and all U+0000 NULL characters must be replaced by U+FFFD
-  REPLACEMENT CHARACTERs.</p>
+  encoding used is UTF-8, the bytes must be <a href=#decoded-as-utf-8,-with-error-handling title="decoded as
+  UTF-8, with error handling">decoded with the error handling</a>
+  defined in this specification, and all U+0000 NULL characters must
+  be replaced by U+FFFD REPLACEMENT CHARACTERs.</p>
 
   <p>The <dfn id=websrt-parser-algorithm>WebSRT parser algorithm</dfn> is as follows:</p>
 
@@ -62700,13 +62750,13 @@
   <p>When a user agent is to <dfn id=parse-a-manifest>parse a manifest</dfn>, it means
   that the user agent must run the following steps:</p>
 
-  <ol><li><p>The user agent must decode the byte stream corresponding with
-   the manifest to be parsed, treating it as UTF-8. Bytes or sequences
-   of bytes that are not valid UTF-8 sequences must be interpreted as
-   a U+FFFD REPLACEMENT CHARACTER. <!--All U+0000 NULL characters must
-   be replaced by U+FFFD REPLACEMENT CHARACTERs. (this isn't black-box
-   testable since neither U+0000 nor U+FFFD are valid anywhere in the
-   syntax and thus both will be treated the same anyway)--> <a href=#refsRFC3629>[RFC3629]</a></li>
+  <ol><li><p>The user agent must decode the byte stream corresponding
+   with the manifest to be parsed <a href=#decoded-as-utf-8,-with-error-handling title="decoded as UTF-8, with
+   error handling">as UTF-8, with error handling</a>. <!--All
+   U+0000 NULL characters must be replaced by U+FFFD REPLACEMENT
+   CHARACTERs. (this isn't black-box testable since neither U+0000 nor
+   U+FFFD are valid anywhere in the syntax and thus both will be
+   treated the same anyway)--></li>
 
    <li><p>Let <var title="">base URL</var> be the <a href=#absolute-url>absolute
    URL</a> representing the manifest.</li>
@@ -71724,8 +71774,10 @@
     <a href=#fire-a-simple-event>fire a simple event</a> named <code title=event-error>error</code> at that object. Abort these
     steps.</p>
 
-    <p>If the attempt succeeds, then convert the script resource to
-    Unicode by assuming it was encoded as UTF-8, to obtain its <var title="">source</var>. <a href=#refsRFC3629>[RFC3629]</a></p>
+    <p>If the attempt succeeds, then let <var title="">source</var> be
+    the script resource <a href=#decoded-as-utf-8,-with-error-handling>decoded as UTF-8, with error
+    handling</a>.
+    </p>
 
     <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -72469,8 +72521,10 @@
       <code><a href=#network_err>NETWORK_ERR</a></code> exception and abort all these
       steps.</p>
 
-      <p>If the attempt succeeds, then convert the script resource to
-      Unicode by assuming it was encoded as UTF-8, to obtain its <var title="">source</var>. <a href=#refsRFC3629>[RFC3629]</a></p>
+      <p>If the attempt succeeds, then let <var title="">source</var> be
+      the script resource <a href=#decoded-as-utf-8,-with-error-handling>decoded as UTF-8, with error
+      handling</a>.
+      </p>
 
       <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -73060,9 +73114,9 @@
 
   <h4 id=event-stream-interpretation><span class=secno>10.2.5 </span>Interpreting an event stream</h4>
 
-  <p>Streams must be decoded as UTF-8 text. Bytes or sequences of
-  bytes that are not valid UTF-8 sequences must be interpreted as the
-  U+FFFD REPLACEMENT CHARACTER. <a href=#refsRFC3629>[RFC3629]</a></p>
+  <p>Streams must be <a href=#decoded-as-utf-8,-with-error-handling>decoded as UTF-8, with error
+  handling</a>.
+  </p>
 
   <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
   any are present.</p>
@@ -77002,7 +77056,10 @@
 
   <p>Bytes or sequences of bytes in the original byte stream that
   could not be converted to Unicode code points must be converted to
-  U+FFFD REPLACEMENT CHARACTERs.</p>
+  U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is
+  UTF-8, the bytes must be <a href=#decoded-as-utf-8,-with-error-handling title="decoded as UTF-8, with error
+  handling">decoded with the error handling</a> defined in this
+  specification.</p>
 
   <p class=note>Bytes or sequences of bytes in the original byte
   stream that did not conform to the encoding specification

Modified: index
===================================================================
--- index	2010-09-28 18:31:45 UTC (rev 5529)
+++ index	2010-09-28 19:16:16 UTC (rev 5530)
@@ -320,61 +320,62 @@
      <li><a href=#dependencies><span class=secno>2.2.1 </span>Dependencies</a></li>
      <li><a href=#extensibility><span class=secno>2.2.2 </span>Extensibility</a></ol></li>
    <li><a href=#case-sensitivity-and-string-comparison><span class=secno>2.3 </span>Case-sensitivity and string comparison</a></li>
-   <li><a href=#common-microsyntaxes><span class=secno>2.4 </span>Common microsyntaxes</a>
+   <li><a href=#utf-8><span class=secno>2.4 </span>UTF-8</a></li>
+   <li><a href=#common-microsyntaxes><span class=secno>2.5 </span>Common microsyntaxes</a>
     <ol>
-     <li><a href=#common-parser-idioms><span class=secno>2.4.1 </span>Common parser idioms</a></li>
-     <li><a href=#boolean-attributes><span class=secno>2.4.2 </span>Boolean attributes</a></li>
-     <li><a href=#keywords-and-enumerated-attributes><span class=secno>2.4.3 </span>Keywords and enumerated attributes</a></li>
-     <li><a href=#numbers><span class=secno>2.4.4 </span>Numbers</a>
+     <li><a href=#common-parser-idioms><span class=secno>2.5.1 </span>Common parser idioms</a></li>
+     <li><a href=#boolean-attributes><span class=secno>2.5.2 </span>Boolean attributes</a></li>
+     <li><a href=#keywords-and-enumerated-attributes><span class=secno>2.5.3 </span>Keywords and enumerated attributes</a></li>
+     <li><a href=#numbers><span class=secno>2.5.4 </span>Numbers</a>
       <ol>
-       <li><a href=#non-negative-integers><span class=secno>2.4.4.1 </span>Non-negative integers</a></li>
-       <li><a href=#signed-integers><span class=secno>2.4.4.2 </span>Signed integers</a></li>
-       <li><a href=#real-numbers><span class=secno>2.4.4.3 </span>Real numbers</a></li>
-       <li><a href=#percentages-and-dimensions><span class=secno>2.4.4.4 </span>Percentages and lengths</a></li>
-       <li><a href=#lists-of-integers><span class=secno>2.4.4.5 </span>Lists of integers</a></li>
-       <li><a href=#lists-of-dimensions><span class=secno>2.4.4.6 </span>Lists of dimensions</a></ol></li>
-     <li><a href=#dates-and-times><span class=secno>2.4.5 </span>Dates and times</a>
+       <li><a href=#non-negative-integers><span class=secno>2.5.4.1 </span>Non-negative integers</a></li>
+       <li><a href=#signed-integers><span class=secno>2.5.4.2 </span>Signed integers</a></li>
+       <li><a href=#real-numbers><span class=secno>2.5.4.3 </span>Real numbers</a></li>
+       <li><a href=#percentages-and-dimensions><span class=secno>2.5.4.4 </span>Percentages and lengths</a></li>
+       <li><a href=#lists-of-integers><span class=secno>2.5.4.5 </span>Lists of integers</a></li>
+       <li><a href=#lists-of-dimensions><span class=secno>2.5.4.6 </span>Lists of dimensions</a></ol></li>
+     <li><a href=#dates-and-times><span class=secno>2.5.5 </span>Dates and times</a>
       <ol>
-       <li><a href=#months><span class=secno>2.4.5.1 </span>Months</a></li>
-       <li><a href=#dates><span class=secno>2.4.5.2 </span>Dates</a></li>
-       <li><a href=#times><span class=secno>2.4.5.3 </span>Times</a></li>
-       <li><a href=#local-dates-and-times><span class=secno>2.4.5.4 </span>Local dates and times</a></li>
-       <li><a href=#global-dates-and-times><span class=secno>2.4.5.5 </span>Global dates and times</a></li>
-       <li><a href=#weeks><span class=secno>2.4.5.6 </span>Weeks</a></li>
-       <li><a href=#vaguer-moments-in-time><span class=secno>2.4.5.7 </span>Vaguer moments in time</a></ol></li>
-     <li><a href=#colors><span class=secno>2.4.6 </span>Colors</a></li>
-     <li><a href=#space-separated-tokens><span class=secno>2.4.7 </span>Space-separated tokens</a></li>
-     <li><a href=#comma-separated-tokens><span class=secno>2.4.8 </span>Comma-separated tokens</a></li>
-     <li><a href=#syntax-references><span class=secno>2.4.9 </span>References</a></li>
-     <li><a href=#mq><span class=secno>2.4.10 </span>Media queries</a></ol></li>
-   <li><a href=#urls><span class=secno>2.5 </span>URLs</a>
+       <li><a href=#months><span class=secno>2.5.5.1 </span>Months</a></li>
+       <li><a href=#dates><span class=secno>2.5.5.2 </span>Dates</a></li>
+       <li><a href=#times><span class=secno>2.5.5.3 </span>Times</a></li>
+       <li><a href=#local-dates-and-times><span class=secno>2.5.5.4 </span>Local dates and times</a></li>
+       <li><a href=#global-dates-and-times><span class=secno>2.5.5.5 </span>Global dates and times</a></li>
+       <li><a href=#weeks><span class=secno>2.5.5.6 </span>Weeks</a></li>
+       <li><a href=#vaguer-moments-in-time><span class=secno>2.5.5.7 </span>Vaguer moments in time</a></ol></li>
+     <li><a href=#colors><span class=secno>2.5.6 </span>Colors</a></li>
+     <li><a href=#space-separated-tokens><span class=secno>2.5.7 </span>Space-separated tokens</a></li>
+     <li><a href=#comma-separated-tokens><span class=secno>2.5.8 </span>Comma-separated tokens</a></li>
+     <li><a href=#syntax-references><span class=secno>2.5.9 </span>References</a></li>
+     <li><a href=#mq><span class=secno>2.5.10 </span>Media queries</a></ol></li>
+   <li><a href=#urls><span class=secno>2.6 </span>URLs</a>
     <ol>
-     <li><a href=#terminology-0><span class=secno>2.5.1 </span>Terminology</a></li>
-     <li><a href=#dynamic-changes-to-base-urls><span class=secno>2.5.2 </span>Dynamic changes to base URLs</a></li>
-     <li><a href=#interfaces-for-url-manipulation><span class=secno>2.5.3 </span>Interfaces for URL manipulation</a></ol></li>
-   <li><a href=#fetching-resources><span class=secno>2.6 </span>Fetching resources</a>
+     <li><a href=#terminology-0><span class=secno>2.6.1 </span>Terminology</a></li>
+     <li><a href=#dynamic-changes-to-base-urls><span class=secno>2.6.2 </span>Dynamic changes to base URLs</a></li>
+     <li><a href=#interfaces-for-url-manipulation><span class=secno>2.6.3 </span>Interfaces for URL manipulation</a></ol></li>
+   <li><a href=#fetching-resources><span class=secno>2.7 </span>Fetching resources</a>
     <ol>
-     <li><a href=#concept-http-equivalent><span class=secno>2.6.1 </span>Protocol concepts</a></li>
-     <li><a href=#encrypted-http-and-related-security-concerns><span class=secno>2.6.2 </span>Encrypted HTTP and related security concerns</a></li>
-     <li><a href=#content-type-sniffing><span class=secno>2.6.3 </span>Determining the type of a resource</a></ol></li>
-   <li><a href=#common-dom-interfaces><span class=secno>2.7 </span>Common DOM interfaces</a>
+     <li><a href=#concept-http-equivalent><span class=secno>2.7.1 </span>Protocol concepts</a></li>
+     <li><a href=#encrypted-http-and-related-security-concerns><span class=secno>2.7.2 </span>Encrypted HTTP and related security concerns</a></li>
+     <li><a href=#content-type-sniffing><span class=secno>2.7.3 </span>Determining the type of a resource</a></ol></li>
+   <li><a href=#common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</a>
     <ol>
-     <li><a href=#reflecting-content-attributes-in-idl-attributes><span class=secno>2.7.1 </span>Reflecting content attributes in IDL attributes</a></li>
-     <li><a href=#collections-0><span class=secno>2.7.2 </span>Collections</a>
+     <li><a href=#reflecting-content-attributes-in-idl-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in IDL attributes</a></li>
+     <li><a href=#collections-0><span class=secno>2.8.2 </span>Collections</a>
       <ol>
-       <li><a href=#htmlcollection-0><span class=secno>2.7.2.1 </span>HTMLCollection</a></li>
-       <li><a href=#htmlallcollection-0><span class=secno>2.7.2.2 </span>HTMLAllCollection</a></li>
-       <li><a href=#htmlformcontrolscollection-0><span class=secno>2.7.2.3 </span>HTMLFormControlsCollection</a></li>
-       <li><a href=#htmloptionscollection-0><span class=secno>2.7.2.4 </span>HTMLOptionsCollection</a></li>
-       <li><a href=#htmlpropertiescollection-0><span class=secno>2.7.2.5 </span>HTMLPropertiesCollection</a></ol></li>
-     <li><a href=#domtokenlist-0><span class=secno>2.7.3 </span>DOMTokenList</a></li>
-     <li><a href=#domsettabletokenlist-0><span class=secno>2.7.4 </span>DOMSettableTokenList</a></li>
-     <li><a href=#safe-passing-of-structured-data><span class=secno>2.7.5 </span>Safe passing of structured data</a></li>
-     <li><a href=#domstringmap-0><span class=secno>2.7.6 </span>DOMStringMap</a></li>
-     <li><a href=#dom-feature-strings><span class=secno>2.7.7 </span>DOM feature strings</a></li>
-     <li><a href=#exceptions><span class=secno>2.7.8 </span>Exceptions</a></li>
-     <li><a href=#garbage-collection><span class=secno>2.7.9 </span>Garbage collection</a></ol></li>
-   <li><a href=#namespaces><span class=secno>2.8 </span>Namespaces</a></ol></li>
+       <li><a href=#htmlcollection-0><span class=secno>2.8.2.1 </span>HTMLCollection</a></li>
+       <li><a href=#htmlallcollection-0><span class=secno>2.8.2.2 </span>HTMLAllCollection</a></li>
+       <li><a href=#htmlformcontrolscollection-0><span class=secno>2.8.2.3 </span>HTMLFormControlsCollection</a></li>
+       <li><a href=#htmloptionscollection-0><span class=secno>2.8.2.4 </span>HTMLOptionsCollection</a></li>
+       <li><a href=#htmlpropertiescollection-0><span class=secno>2.8.2.5 </span>HTMLPropertiesCollection</a></ol></li>
+     <li><a href=#domtokenlist-0><span class=secno>2.8.3 </span>DOMTokenList</a></li>
+     <li><a href=#domsettabletokenlist-0><span class=secno>2.8.4 </span>DOMSettableTokenList</a></li>
+     <li><a href=#safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</a></li>
+     <li><a href=#domstringmap-0><span class=secno>2.8.6 </span>DOMStringMap</a></li>
+     <li><a href=#dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</a></li>
+     <li><a href=#exceptions><span class=secno>2.8.8 </span>Exceptions</a></li>
+     <li><a href=#garbage-collection><span class=secno>2.8.9 </span>Garbage collection</a></ol></li>
+   <li><a href=#namespaces><span class=secno>2.9 </span>Namespaces</a></ol></li>
  <li><a href=#dom><span class=secno>3 </span>Semantics, structure, and APIs of HTML documents</a>
   <ol>
    <li><a href=#documents><span class=secno>3.1 </span>Documents</a>
@@ -3479,8 +3480,58 @@
   two strings as matches of each other.</p>
 
 
-  <h3 id=common-microsyntaxes><span class=secno>2.4 </span>Common microsyntaxes</h3>
+  <h3 id=utf-8><span class=secno>2.4 </span>UTF-8</h3>
 
+  <p>When a user agent is required to <dfn id=decoded-as-utf-8,-with-error-handling title="decoded as UTF-8,
+  with error handling">decode a byte string as UTF-8, with error
+  handling</dfn>, it means that the byte stream must be converted to a
+  Unicode string by interpreting it as UTF-8, except that any errors
+  must be handled as described in the following list. Bytes in the
+  following list are represented in hexadecimal. <a href=#refsRFC3629>[RFC3629]</a>
+
+  <dl class=switch><dt>One byte in the range FE to FF</dt>
+
+   <dt>Overlong forms (e.g. F0 80 80 A0)</dt>
+
+   <dt>One byte in the range C0 to C1, followed by one byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F0 to F4, followed by three bytes in the range 80 to BF that represent a code point above U+10FFFF</dt>
+
+   <dt>One byte in the range F5 to F7, followed by three bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range F8 to FB, followed by four bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range FC to FD, followed by five bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range E0 to FD, followed by a byte in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F0 to FD, followed by two bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F5 to FD, followed by three bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range FC to FD, followed by four bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+
+   <dd>The whole sequence must be replaced by a single U+FFFD
+   REPLACEMENT CHARACTER.</dd>
+
+
+   <dt>One byte in the range 80 to BF not preceded by a byte in the range 80 to FD</dt>
+
+   <dt>A sequence of bytes in the range 80 to BF that does not follow a byte in the range C0 to FD</dt>
+
+   <dt>One byte in the range C0 to FD not followed by a byte in the range 80 to BF</dt>
+
+
+   <dd>Each byte must be replaced with a U+FFFD REPLACEMENT CHARACTER.</dd>
+
+  </dl><p class=example>For example, the byte string "41 98 BA 42 E2 98
+  43 E2 98 BA E2 98" would be converted to the string
+  "A��B�C☺�".</p>
+
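The worked example in the added section can be checked with a short sketch. It assumes Python 3 and uses the built-in `bytes.decode` with `errors="replace"` as a stand-in decoder; this is not the algorithm the specification defines. It happens to agree with the rules for this particular byte string, but it is not guaranteed to agree for every case in the list. In particular, overlong forms such as F0 80 80 A0 may come out as more than one U+FFFD. The variable names `raw` and `decoded` are illustrative only.

```python
# Bytes from the example in the added section, written in hexadecimal.
raw = bytes.fromhex("41 98 BA 42 E2 98 43 E2 98 BA E2 98")

# errors="replace" substitutes U+FFFD for malformed input instead of raising.
# Assumption: Python's built-in handler is used here only for illustration;
# it is not the error handling defined by the specification text.
decoded = raw.decode("utf-8", errors="replace")

# Show the result both as text and as a list of code points.
print(decoded)
print([f"U+{ord(ch):04X}" for ch in decoded])
```

Reading the code points makes the grouping visible: the stray bytes 98 and BA each become their own U+FFFD, while each truncated "E2 98" sequence collapses into a single U+FFFD.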
+
+  <h3 id=common-microsyntaxes><span class=secno>2.5 </span>Common microsyntaxes</h3>
+
   <p>There are various places in HTML that accept particular data
   types, such as dates or numbers. This section describes what the
   conformance criteria for content in those formats is, and how to
@@ -3502,7 +3553,7 @@
 
   <div class=impl>
 
-  <h4 id=common-parser-idioms><span class=secno>2.4.1 </span>Common parser idioms</h4>
+  <h4 id=common-parser-idioms><span class=secno>2.5.1 </span>Common parser idioms</h4>
 
   <p>The <dfn id=space-character title="space character">space characters</dfn>, for the
   purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER
@@ -3564,7 +3615,7 @@
 
 
 
-  <h4 id=boolean-attributes><span class=secno>2.4.2 </span>Boolean attributes</h4>
+  <h4 id=boolean-attributes><span class=secno>2.5.2 </span>Boolean attributes</h4>
 
   <p>A number of attributes are <dfn id=boolean-attribute title="boolean attribute">boolean
   attributes</dfn>. The presence of a boolean attribute on an element
@@ -3600,7 +3651,7 @@
 
 
 
-  <h4 id=keywords-and-enumerated-attributes><span class=secno>2.4.3 </span>Keywords and enumerated attributes</h4>
+  <h4 id=keywords-and-enumerated-attributes><span class=secno>2.5.3 </span>Keywords and enumerated attributes</h4>
 
   <p>Some attributes are defined as taking one of a finite set of
   keywords. Such attributes are called <dfn id=enumerated-attribute title="enumerated
@@ -3637,9 +3688,9 @@
   <p class=note>The empty string can be a valid keyword.</p>
 
 
-  <h4 id=numbers><span class=secno>2.4.4 </span>Numbers</h4>
+  <h4 id=numbers><span class=secno>2.5.4 </span>Numbers</h4>
 
-  <h5 id=non-negative-integers><span class=secno>2.4.4.1 </span>Non-negative integers</h5>
+  <h5 id=non-negative-integers><span class=secno>2.5.4.1 </span>Non-negative integers</h5>
 
   <p>A string is a <dfn id=valid-non-negative-integer>valid non-negative integer</dfn> if it
   consists of one or more characters in the range U+0030 DIGIT ZERO
@@ -3689,7 +3740,7 @@
   </ol></div>
 
 
-  <h5 id=signed-integers><span class=secno>2.4.4.2 </span>Signed integers</h5>
+  <h5 id=signed-integers><span class=secno>2.5.4.2 </span>Signed integers</h5>
 
   <p>A string is a <dfn id=valid-integer>valid integer</dfn> if it consists of one or
   more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
@@ -3768,7 +3819,7 @@
   </ol></div>
 
 
-  <h5 id=real-numbers><span class=secno>2.4.4.3 </span>Real numbers</h5>
+  <h5 id=real-numbers><span class=secno>2.5.4.3 </span>Real numbers</h5>
 
   <p>A string is a <dfn id=valid-floating-point-number>valid floating point number</dfn> if it
   consists of:</p>
@@ -3973,7 +4024,7 @@
 
 
 <div class=impl>
-  <h5 id=percentages-and-dimensions><span class=secno>2.4.4.4 </span>Percentages and lengths</h5>
+  <h5 id=percentages-and-dimensions><span class=secno>2.5.4.4 </span>Percentages and lengths</h5>
 <!--(percentages are not used in valid html anymore)
   <p>A string is a <dfn>valid dimension value</dfn> if it consists of
   a character in the range U+0031 DIGIT ONE (1) to U+0039 DIGIT NINE
@@ -4076,7 +4127,7 @@
   </ol></div>
 
 
-  <h5 id=lists-of-integers><span class=secno>2.4.4.5 </span>Lists of integers</h5>
+  <h5 id=lists-of-integers><span class=secno>2.5.4.5 </span>Lists of integers</h5>
 
   <p>A <dfn id=valid-list-of-integers>valid list of integers</dfn> is a number of <a href=#valid-integer title="valid integer">valid integers</a> separated by U+002C
   COMMA characters, with no other characters (e.g. no <a href=#space-character title="space character">space characters</a>). In addition, there
@@ -4347,7 +4398,7 @@
 
   <div class=impl>
 
-  <h5 id=lists-of-dimensions><span class=secno>2.4.4.6 </span>Lists of dimensions</h5>
+  <h5 id=lists-of-dimensions><span class=secno>2.5.4.6 </span>Lists of dimensions</h5>
 
   <!-- no definition of a type since no conforming feature uses this
   syntax (it's only used in cols="" and rows="" on <frameset> -->
@@ -4451,7 +4502,7 @@
   </ol></div>
 
 
-  <h4 id=dates-and-times><span class=secno>2.4.5 </span>Dates and times</h4>
+  <h4 id=dates-and-times><span class=secno>2.5.5 </span>Dates and times</h4>
 
   <p>In the algorithms below, the <dfn id=number-of-days-in-month-month-of-year-year>number of days in month <var title="">month</var> of year <var title="">year</var></dfn> is:
   <em>31</em> if <var title="">month</var> is 1, 3, 5, 7, 8, 10, or
@@ -4478,7 +4529,7 @@
   </div>
 
 
-  <h5 id=months><span class=secno>2.4.5.1 </span>Months</h5>
+  <h5 id=months><span class=secno>2.5.5.1 </span>Months</h5>
 
   <p>A <dfn id=concept-month title=concept-month>month</dfn> consists of a specific
   proleptic Gregorian date with no time-zone information and no date
@@ -4550,7 +4601,7 @@
   </ol></div>
 
 
-  <h5 id=dates><span class=secno>2.4.5.2 </span>Dates</h5>
+  <h5 id=dates><span class=secno>2.5.5.2 </span>Dates</h5>
 
   <p>A <dfn id=concept-date title=concept-date>date</dfn> consists of a specific
   proleptic Gregorian date with no time-zone information, consisting
@@ -4622,7 +4673,7 @@
   </ol></div>
 
 
-  <h5 id=times><span class=secno>2.4.5.3 </span>Times</h5>
+  <h5 id=times><span class=secno>2.5.5.3 </span>Times</h5>
 
   <p>A <dfn id=concept-time title=concept-time>time</dfn> consists of a specific
   time with no time-zone information, consisting of an hour, a minute,
@@ -4759,7 +4810,7 @@
   </ol></div>
 
 
-  <h5 id=local-dates-and-times><span class=secno>2.4.5.4 </span>Local dates and times</h5>
+  <h5 id=local-dates-and-times><span class=secno>2.5.5.4 </span>Local dates and times</h5>
 
   <p>A <dfn id=concept-datetime-local title=concept-datetime-local>local date and time</dfn>
   consists of a specific proleptic Gregorian date, consisting of a
@@ -4811,7 +4862,7 @@
 
 
 
-  <h5 id=global-dates-and-times><span class=secno>2.4.5.5 </span>Global dates and times</h5>
+  <h5 id=global-dates-and-times><span class=secno>2.5.5.5 </span>Global dates and times</h5>
 
   <p>A <dfn id=concept-datetime title=concept-datetime>global date and time</dfn>
   consists of a specific proleptic Gregorian date, consisting of a
@@ -5027,7 +5078,7 @@
   </ol></div>
 
 
-  <h5 id=weeks><span class=secno>2.4.5.6 </span>Weeks</h5>
+  <h5 id=weeks><span class=secno>2.5.5.6 </span>Weeks</h5>
 
   <p>A <dfn id=concept-week title=concept-week>week</dfn> consists of a week-year
   number and a week number representing a seven-day period starting on
@@ -5122,7 +5173,7 @@
   </ol></div>
 
 
-  <h5 id=vaguer-moments-in-time><span class=secno>2.4.5.7 </span>Vaguer moments in time</h5>
+  <h5 id=vaguer-moments-in-time><span class=secno>2.5.5.7 </span>Vaguer moments in time</h5>
 
   <p>A string is a <dfn id=valid-date-or-time-string>valid date or time string</dfn> if it is also
   one of the following:</p>
@@ -5232,7 +5283,7 @@
   </ol></div>
 
 
-  <h4 id=colors><span class=secno>2.4.6 </span>Colors</h4>
+  <h4 id=colors><span class=secno>2.5.6 </span>Colors</h4>
 
   <p>A <dfn id=simple-color>simple color</dfn> consists of three 8-bit numbers in the
   range 0..255, representing the red, green, and blue components of
@@ -5436,7 +5487,7 @@
   <!--2DCANVAS-->
 
 
-  <h4 id=space-separated-tokens><span class=secno>2.4.7 </span>Space-separated tokens</h4>
+  <h4 id=space-separated-tokens><span class=secno>2.5.7 </span>Space-separated tokens</h4>
 
   <p>A <dfn id=set-of-space-separated-tokens>set of space-separated tokens</dfn> is a string containing
   zero or more words separated by one or more <a href=#space-character title="space
@@ -5556,7 +5607,7 @@
 
 
 
-  <h4 id=comma-separated-tokens><span class=secno>2.4.8 </span>Comma-separated tokens</h4>
+  <h4 id=comma-separated-tokens><span class=secno>2.5.8 </span>Comma-separated tokens</h4>
 
   <p>A <dfn id=set-of-comma-separated-tokens>set of comma-separated tokens</dfn> is a string containing
   zero or more tokens each separated from the next by a single U+002C
@@ -5613,7 +5664,7 @@
 
 
 
-  <h4 id=syntax-references><span class=secno>2.4.9 </span>References</h4>
+  <h4 id=syntax-references><span class=secno>2.5.9 </span>References</h4>
 
   <p>A <dfn id=valid-hash-name-reference>valid hash-name reference</dfn> to an element of type <var title="">type</var> is a string consisting of a U+0023 NUMBER SIGN
   character (#) followed by a string which exactly matches the value
@@ -5652,7 +5703,7 @@
   </ol></div>
 
 
-  <h4 id=mq><span class=secno>2.4.10 </span>Media queries</h4>
+  <h4 id=mq><span class=secno>2.5.10 </span>Media queries</h4>
 
   <p>A string is a <dfn id=valid-media-query>valid media query</dfn> if it matches the
   <code title="">media_query_list</code> production of the Media
@@ -5667,9 +5718,9 @@
 
 
 
-  <h3 id=urls><span class=secno>2.5 </span>URLs</h3>
+  <h3 id=urls><span class=secno>2.6 </span>URLs</h3>
 
-  <h4 id=terminology-0><span class=secno>2.5.1 </span>Terminology</h4>
+  <h4 id=terminology-0><span class=secno>2.6.1 </span>Terminology</h4>
 
   <!-- see also: svn diff -r3244:3245 source -->
 
@@ -5859,7 +5910,7 @@
 
   <div class=impl>
 
-  <h4 id=dynamic-changes-to-base-urls><span class=secno>2.5.2 </span>Dynamic changes to base URLs</h4>
+  <h4 id=dynamic-changes-to-base-urls><span class=secno>2.6.2 </span>Dynamic changes to base URLs</h4>
 
   <p>When an <code title=attr-xml-base><a href=#the-xml:base-attribute-(xml-only)>xml:base</a></code> attribute
   changes, the attribute's element, and all descendant elements, are
@@ -5932,7 +5983,7 @@
 
 
 
-  <h4 id=interfaces-for-url-manipulation><span class=secno>2.5.3 </span>Interfaces for URL manipulation</h4>
+  <h4 id=interfaces-for-url-manipulation><span class=secno>2.6.3 </span>Interfaces for URL manipulation</h4>
 
   <p>An interface that has a complement of <dfn id=url-decomposition-idl-attributes>URL decomposition IDL
   attributes</dfn> will have seven attributes with the following
@@ -6133,7 +6184,7 @@
 
   <div class=impl>
 
-  <h3 id=fetching-resources><span class=secno>2.6 </span>Fetching resources</h3>
+  <h3 id=fetching-resources><span class=secno>2.7 </span>Fetching resources</h3>
 
   <p>When a user agent is to <dfn id=fetch>fetch</dfn> a resource or
   <a href=#url>URL</a>, optionally from an origin <i title="">origin</i>,
@@ -6346,7 +6397,7 @@
   applicable.</p>
 
 
-  <h4 id=concept-http-equivalent><span class=secno>2.6.1 </span>Protocol concepts</h4>
+  <h4 id=concept-http-equivalent><span class=secno>2.7.1 </span>Protocol concepts</h4>
 
   <p>User agents can implement a variety of transfer protocols, but
   this specification mostly defines behavior in terms of HTTP. <a href=#refsHTTP>[HTTP]</a></p>
@@ -6369,7 +6420,7 @@
   protocol.</p>
 
 
-  <h4 id=encrypted-http-and-related-security-concerns><span class=secno>2.6.2 </span>Encrypted HTTP and related security concerns</h4>
+  <h4 id=encrypted-http-and-related-security-concerns><span class=secno>2.7.2 </span>Encrypted HTTP and related security concerns</h4>
 
   <p>Anything in this specification that refers to HTTP also applies
   to HTTP-over-TLS, as represented by <a href=#url title=url>URLs</a>
@@ -6415,7 +6466,7 @@
   </div>
 
 
-  <h4 id=content-type-sniffing><span class=secno>2.6.3 </span>Determining the type of a resource</h4>
+  <h4 id=content-type-sniffing><span class=secno>2.7.3 </span>Determining the type of a resource</h4>
 
   <p>The <dfn id=content-type title=Content-Type>Content-Type metadata</dfn> of a
   resource must be obtained and interpreted in a manner consistent
@@ -6490,9 +6541,9 @@
 
 
 
-  <h3 id=common-dom-interfaces><span class=secno>2.7 </span>Common DOM interfaces</h3>
+  <h3 id=common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</h3>
 
-  <h4 id=reflecting-content-attributes-in-idl-attributes><span class=secno>2.7.1 </span>Reflecting content attributes in IDL attributes</h4>
+  <h4 id=reflecting-content-attributes-in-idl-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in IDL attributes</h4>
 
   <p>Some IDL attributes are defined to <dfn id=reflect>reflect</dfn> a
   particular content attribute. This means that on getting, the IDL
@@ -6694,7 +6745,7 @@
   </div>
 
 
-  <h4 id=collections-0><span class=secno>2.7.2 </span>Collections</h4>
+  <h4 id=collections-0><span class=secno>2.8.2 </span>Collections</h4>
 
   <p>The <code><a href=#htmlcollection>HTMLCollection</a></code>, <code><a href=#htmlallcollection>HTMLAllCollection</a></code>,
   <code><a href=#htmlformcontrolscollection>HTMLFormControlsCollection</a></code>,
@@ -6730,7 +6781,7 @@
   </div>
 
 
-  <h5 id=htmlcollection-0><span class=secno>2.7.2.1 </span>HTMLCollection</h5>
+  <h5 id=htmlcollection-0><span class=secno>2.8.2.1 </span>HTMLCollection</h5>
 
   <p>The <code><a href=#htmlcollection>HTMLCollection</a></code> interface represents a generic
   <a href=#collections title=collections>collection</a> of elements.</p>
@@ -6810,7 +6861,7 @@
   </div>
 
 
-  <h5 id=htmlallcollection-0><span class=secno>2.7.2.2 </span>HTMLAllCollection</h5>
+  <h5 id=htmlallcollection-0><span class=secno>2.8.2.2 </span>HTMLAllCollection</h5>
 
   <p>The <code><a href=#htmlallcollection>HTMLAllCollection</a></code> interface represents a generic
   <a href=#collections title=collections>collection</a> of elements just like
@@ -6909,7 +6960,7 @@
   </div>
 
 
-  <h5 id=htmlformcontrolscollection-0><span class=secno>2.7.2.3 </span>HTMLFormControlsCollection</h5>
+  <h5 id=htmlformcontrolscollection-0><span class=secno>2.8.2.3 </span>HTMLFormControlsCollection</h5>
 
   <p>The <code><a href=#htmlformcontrolscollection>HTMLFormControlsCollection</a></code> interface represents
   a <a href=#collections title=collections>collection</a> of <a href=#category-listed title=category-listed>listed elements</a> in <code><a href=#the-form-element>form</a></code>
@@ -7026,7 +7077,7 @@
 --></div>
 
 
-  <h5 id=htmloptionscollection-0><span class=secno>2.7.2.4 </span>HTMLOptionsCollection</h5>
+  <h5 id=htmloptionscollection-0><span class=secno>2.8.2.4 </span>HTMLOptionsCollection</h5>
 
   <p>The <code><a href=#htmloptionscollection>HTMLOptionsCollection</a></code> interface represents a
   list of <code><a href=#the-option-element>option</a></code> elements. It is always rooted on a
@@ -7205,7 +7256,7 @@
 <!--MD-->
   <div data-component="HTML Microdata (editor: Ian Hickson)">
 
-  <h5 id=htmlpropertiescollection-0><span class=secno>2.7.2.5 </span>HTMLPropertiesCollection</h5>
+  <h5 id=htmlpropertiescollection-0><span class=secno>2.8.2.5 </span>HTMLPropertiesCollection</h5>
 
   <p>The <code><a href=#htmlpropertiescollection>HTMLPropertiesCollection</a></code> interface represents a
   <a href=#collections title=collections>collection</a> of elements that add
@@ -7296,7 +7347,7 @@
 <!--MD-->
 
 
-  <h4 id=domtokenlist-0><span class=secno>2.7.3 </span>DOMTokenList</h4>
+  <h4 id=domtokenlist-0><span class=secno>2.8.3 </span>DOMTokenList</h4>
 
   <p>The <code><a href=#domtokenlist>DOMTokenList</a></code> interface represents an interface
   to an underlying string that consists of a <a href=#set-of-space-separated-tokens>set of
@@ -7482,7 +7533,7 @@
   </div>
 
 
-  <h4 id=domsettabletokenlist-0><span class=secno>2.7.4 </span>DOMSettableTokenList</h4>
+  <h4 id=domsettabletokenlist-0><span class=secno>2.8.4 </span>DOMSettableTokenList</h4>
 
   <p>The <code><a href=#domsettabletokenlist>DOMSettableTokenList</a></code> interface is the same as the
   <code><a href=#domtokenlist>DOMTokenList</a></code> interface, except that it allows the
@@ -7514,7 +7565,7 @@
 
   <div class=impl>
 
-  <h4 id=safe-passing-of-structured-data><span class=secno>2.7.5 </span>Safe passing of structured data</h4>
+  <h4 id=safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</h4>
 
   <p>When a user agent is required to obtain a <dfn id=structured-clone>structured
   clone</dfn> of an object, it must run the following algorithm, which
@@ -7639,7 +7690,7 @@
   </dl></div>
 
 
-  <h4 id=domstringmap-0><span class=secno>2.7.6 </span>DOMStringMap</h4>
+  <h4 id=domstringmap-0><span class=secno>2.8.6 </span>DOMStringMap</h4>
 
   <p>The <code><a href=#domstringmap>DOMStringMap</a></code> interface represents a set of
   name-value pairs. It exposes these using the scripting language's
@@ -7722,7 +7773,7 @@
   </div>
 
 
-  <h4 id=dom-feature-strings><span class=secno>2.7.7 </span>DOM feature strings</h4>
+  <h4 id=dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</h4>
 
   <p>DOM3 Core defines mechanisms for checking for interface support,
   and for obtaining implementations of interfaces, using <a href=http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures>feature
@@ -7744,7 +7795,7 @@
   </div>
 
 
-  <h4 id=exceptions><span class=secno>2.7.8 </span>Exceptions</h4>
+  <h4 id=exceptions><span class=secno>2.8.8 </span>Exceptions</h4>
 
   <p>The following are <code><a href=#domexception>DOMException</a></code> codes. <a href=#refsDOMCORE>[DOMCORE]</a></p>
 
@@ -7792,7 +7843,7 @@
 
   <div class=impl>
 
-  <h4 id=garbage-collection><span class=secno>2.7.9 </span>Garbage collection</h4>
+  <h4 id=garbage-collection><span class=secno>2.8.9 </span>Garbage collection</h4>
 
   <p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any IDL
   attribute that returns a pre-existing object to that object.</p>
@@ -7811,7 +7862,7 @@
   </div>
 
 
-  <h3 id=namespaces><span class=secno>2.8 </span>Namespaces</h3>
+  <h3 id=namespaces><span class=secno>2.9 </span>Namespaces</h3>
 
   <p>The <dfn id=html-namespace-0>HTML namespace</dfn> is: <code>http://www.w3.org/1999/xhtml</code></p>
 
@@ -8104,9 +8155,8 @@
   <code><a href=#security_err>SECURITY_ERR</a></code> exception. Otherwise, the user agent must
   first <a href=#obtain-the-storage-mutex>obtain the storage mutex</a> and then return the
   cookie-string for <a href="#the-document's-address">the document's address</a> for a
-  "non-HTTP" API, decoded as UTF-8, with bytes or sequences of bytes
-  that are not valid UTF-8 sequences interpreted as U+FFFD REPLACEMENT
-  CHARACTERs. <a href=#refsCOOKIES>[COOKIES]</a> <a href=#refsRFC3629>[RFC3629]</a></p>
+  "non-HTTP" API, <a href=#decoded-as-utf-8,-with-error-handling>decoded as UTF-8, with error handling</a>.
+  <a href=#refsCOOKIES>[COOKIES]</a></p>
 
   <p>On setting, if the document is a <a href=#cookie-free-document-object>cookie-free
   <code>Document</code> object</a>, then the user agent must do
@@ -28648,10 +28698,10 @@
   <ul class=brief><li><code><a href=#text/srt>text/srt</a></code></li>
   </ul><!--<p class="note">Not all of these MIME types are valid registered
   types.</p>--><p>When converting the bytes into Unicode characters, if the
-  encoding used is UTF-8, bytes or sequences of bytes that are not
-  valid UTF-8 sequences must be interpreted as a U+FFFD REPLACEMENT
-  CHARACTER, and all U+0000 NULL characters must be replaced by U+FFFD
-  REPLACEMENT CHARACTERs.</p>
+  encoding used is UTF-8, the bytes must be <a href=#decoded-as-utf-8,-with-error-handling title="decoded as
+  UTF-8, with error handling">decoded with the error handling</a>
+  defined in this specification, and all U+0000 NULL characters must
+  be replaced by U+FFFD REPLACEMENT CHARACTERs.</p>
 
   <p>The <dfn id=websrt-parser-algorithm>WebSRT parser algorithm</dfn> is as follows:</p>
 
@@ -62689,13 +62739,13 @@
   <p>When a user agent is to <dfn id=parse-a-manifest>parse a manifest</dfn>, it means
   that the user agent must run the following steps:</p>
 
-  <ol><li><p>The user agent must decode the byte stream corresponding with
-   the manifest to be parsed, treating it as UTF-8. Bytes or sequences
-   of bytes that are not valid UTF-8 sequences must be interpreted as
-   a U+FFFD REPLACEMENT CHARACTER. <!--All U+0000 NULL characters must
-   be replaced by U+FFFD REPLACEMENT CHARACTERs. (this isn't black-box
-   testable since neither U+0000 nor U+FFFD are valid anywhere in the
-   syntax and thus both will be treated the same anyway)--> <a href=#refsRFC3629>[RFC3629]</a></li>
+  <ol><li><p>The user agent must decode the byte stream corresponding
+   with the manifest to be parsed <a href=#decoded-as-utf-8,-with-error-handling title="decoded as UTF-8, with
+   error handling">as UTF-8, with error handling</a>. <!--All
+   U+0000 NULL characters must be replaced by U+FFFD REPLACEMENT
+   CHARACTERs. (this isn't black-box testable since neither U+0000 nor
+   U+FFFD are valid anywhere in the syntax and thus both will be
+   treated the same anyway)--></li>
 
    <li><p>Let <var title="">base URL</var> be the <a href=#absolute-url>absolute
    URL</a> representing the manifest.</li>
@@ -72920,7 +72970,10 @@
 
   <p>Bytes or sequences of bytes in the original byte stream that
   could not be converted to Unicode code points must be converted to
-  U+FFFD REPLACEMENT CHARACTERs.</p>
+  U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is
+  UTF-8, the bytes must be <a href=#decoded-as-utf-8,-with-error-handling title="decoded as UTF-8, with error
+  handling">decoded with the error handling</a> defined in this
+  specification.</p>
 
   <p class=note>Bytes or sequences of bytes in the original byte
   stream that did not conform to the encoding specification

Modified: source
===================================================================
--- source	2010-09-28 18:31:45 UTC (rev 5529)
+++ source	2010-09-28 19:16:16 UTC (rev 5530)
@@ -2496,6 +2496,61 @@
   two strings as matches of each other.</p>
 
 
+  <h3>UTF-8</h3>
+
+  <p>When a user agent is required to <dfn title="decoded as UTF-8,
+  with error handling">decode a byte string as UTF-8, with error
+  handling</dfn>, it means that the byte stream must be converted to a
+  Unicode string by interpreting it as UTF-8, except that any errors
+  must be handled as described in the following list. Bytes in the
+  following list are represented in hexadecimal. <a
+  href="#refsRFC3629">[RFC3629]</a>
+
+  <dl class="switch">
+
+   <dt>One byte in the range FE to FF</dt>
+
+   <dt>Overlong forms (e.g. F0 80 80 A0)</dt>
+
+   <dt>One byte in the range C0 to C1, followed by one byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F0 to F4, followed by three bytes in the range 80 to BF that represent a code point above U+10FFFF</dt>
+
+   <dt>One byte in the range F5 to F7, followed by three bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range F8 to FB, followed by four bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range FC to FD, followed by five bytes in the range 80 to BF</dt>
+
+   <dt>One byte in the range E0 to FD, followed by a byte in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F0 to FD, followed by two bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range F5 to FD, followed by three bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+   <dt>One byte in the range FC to FD, followed by four bytes in the range 80 to BF, not followed by a byte in the range 80 to BF</dt>
+
+
+   <dd>The whole sequence must be replaced by a single U+FFFD
+   REPLACEMENT CHARACTER.</dd>
+
+
+   <dt>One byte in the range 80 to BF not preceded by a byte in the range 80 to FD</dt>
+
+   <dt>A sequence of bytes in the range 80 to BF that does not follow a byte in the range C0 to FD</dt>
+
+   <dt>One byte in the range C0 to FD not followed by a byte in the range 80 to BF</dt>
+
+
+   <dd>Each byte must be replaced with a U+FFFD REPLACEMENT CHARACTER.</dd>
+
+  </dl>
+
+  <p class="example">For example, the byte string "41 98 BA 42 E2 98
+  43 E2 98 BA E2 98" would be converted to the string
+  "A&#xFFFD;&#xFFFD;B&#xFFFD;C&#x263A;&#xFFFD;".</p>
+
+
   <h3>Common microsyntaxes</h3>
 
   <p>There are various places in HTML that accept particular data
@@ -8021,10 +8076,8 @@
   <code>SECURITY_ERR</code> exception. Otherwise, the user agent must
   first <span>obtain the storage mutex</span> and then return the
   cookie-string for <span>the document's address</span> for a
-  "non-HTTP" API, decoded as UTF-8, with bytes or sequences of bytes
-  that are not valid UTF-8 sequences interpreted as U+FFFD REPLACEMENT
-  CHARACTERs. <a href="#refsCOOKIES">[COOKIES]</a> <a
-  href="#refsRFC3629">[RFC3629]</a></p>
+  "non-HTTP" API, <span>decoded as UTF-8, with error handling</span>.
+  <a href="#refsCOOKIES">[COOKIES]</a></p>
 
   <p>On setting, if the document is a <span>cookie-free
   <code>Document</code> object</span>, then the user agent must do
@@ -31255,10 +31308,10 @@
   types.</p>-->
 
   <p>When converting the bytes into Unicode characters, if the
-  encoding used is UTF-8, bytes or sequences of bytes that are not
-  valid UTF-8 sequences must be interpreted as a U+FFFD REPLACEMENT
-  CHARACTER, and all U+0000 NULL characters must be replaced by U+FFFD
-  REPLACEMENT CHARACTERs.</p>
+  encoding used is UTF-8, the bytes must be <span title="decoded as
+  UTF-8, with error handling">decoded with the error handling</span>
+  defined in this specification, and all U+0000 NULL characters must
+  be replaced by U+FFFD REPLACEMENT CHARACTERs.</p>
 
   <p>The <dfn>WebSRT parser algorithm</dfn> is as follows:</p>
 
@@ -70902,13 +70955,13 @@
 
   <ol>
 
-   <li><p>The user agent must decode the byte stream corresponding with
-   the manifest to be parsed, treating it as UTF-8. Bytes or sequences
-   of bytes that are not valid UTF-8 sequences must be interpreted as
-   a U+FFFD REPLACEMENT CHARACTER. <!--All U+0000 NULL characters must
-   be replaced by U+FFFD REPLACEMENT CHARACTERs. (this isn't black-box
-   testable since neither U+0000 nor U+FFFD are valid anywhere in the
-   syntax and thus both will be treated the same anyway)--> <a href="#refsRFC3629">[RFC3629]</a></p></li>
+   <li><p>The user agent must decode the byte stream corresponding
+   with the manifest to be parsed <span title="decoded as UTF-8, with
+   error handling">as UTF-8, with error handling</span>. <!--All
+   U+0000 NULL characters must be replaced by U+FFFD REPLACEMENT
+   CHARACTERs. (this isn't black-box testable since neither U+0000 nor
+   U+FFFD are valid anywhere in the syntax and thus both will be
+   treated the same anyway)--></p></li>
 
    <li><p>Let <var title="">base URL</var> be the <span>absolute
    URL</span> representing the manifest.</p></li>
@@ -80690,9 +80743,13 @@
     title="event-error">error</code> at that object. Abort these
     steps.</p>
 
-    <p>If the attempt succeeds, then convert the script resource to
-    Unicode by assuming it was encoded as UTF-8, to obtain its <var
-    title="">source</var>. <a href="#refsRFC3629">[RFC3629]</a></p>
+    <p>If the attempt succeeds, then let <var title="">source</var> be
+    the script resource <span>decoded as UTF-8, with error
+    handling</span>.
+    <!--END complete--><!--END epub-->
+    <a href="#refsHTML">[HTML]</a>
+    <!--START complete--><!--START epub-->
+    </p>
 
     <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -81563,9 +81620,13 @@
       <code>NETWORK_ERR</code> exception and abort all these
       steps.</p>
 
-      <p>If the attempt succeeds, then convert the script resource to
-      Unicode by assuming it was encoded as UTF-8, to obtain its <var
-      title="">source</var>. <a href="#refsRFC3629">[RFC3629]</a></p>
+      <p>If the attempt succeeds, then let <var title="">source</var> be
+      the script resource <span>decoded as UTF-8, with error
+      handling</span>.
+      <!--END complete--><!--END epub-->
+      <a href="#refsHTML">[HTML]</a>
+      <!--START complete--><!--START epub-->
+      </p>
 
       <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -82255,9 +82316,12 @@
 
   <h4 id="event-stream-interpretation">Interpreting an event stream</h4>
 
-  <p>Streams must be decoded as UTF-8 text. Bytes or sequences of
-  bytes that are not valid UTF-8 sequences must be interpreted as the
-  U+FFFD REPLACEMENT CHARACTER. <a href="#refsRFC3629">[RFC3629]</a></p>
+  <p>Streams must be <span>decoded as UTF-8, with error
+  handling</span>.
+  <!--END complete--><!--END epub-->
+  <a href="#refsHTML">[HTML]</a>
+  <!--START complete--><!--START epub-->
+  </p>
 
   <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
   any are present.</p>
@@ -87960,7 +88024,10 @@
 
   <p>Bytes or sequences of bytes in the original byte stream that
   could not be converted to Unicode code points must be converted to
-  U+FFFD REPLACEMENT CHARACTERs.</p>
+  U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is
+  UTF-8, the bytes must be <span title="decoded as UTF-8, with error
+  handling">decoded with the error handling</span> defined in this
+  specification.</p>
 
   <p class="note">Bytes or sequences of bytes in the original byte
   stream that did not conform to the encoding specification



