[html5] r3234 - [e] (0) Reference abarth's draft and remove the duplicate content from the HTML5 [...]

whatwg at whatwg.org
Fri Jun 12 14:58:18 PDT 2009


Author: ianh
Date: 2009-06-12 14:58:16 -0700 (Fri, 12 Jun 2009)
New Revision: 3234

Modified:
   index
   source
Log:
[e] (0) Reference abarth's draft and remove the duplicate content from the HTML5 spec.
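
For readers of this revision: the "algorithm for extracting an encoding from a Content-Type" that the diff removes from the spec (and now delegates to the Content-Type Processing Model draft) can be sketched roughly as follows. This is only an illustrative reading of the removed steps, not text from either specification; the names `extract_encoding`, `ascii_lower`, and `WHITESPACE` are hypothetical.

```python
# Illustrative sketch of the removed charset-extraction steps.
# All names here are hypothetical; they appear in neither specification.

# U+0009, U+000A, U+000C, U+000D, U+0020
WHITESPACE = "\t\n\f\r "


def ascii_lower(s):
    """Lowercase only A-Z so that string positions are preserved."""
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in s)


def extract_encoding(s):
    """Return the encoding label found in s, or None for "nothing"."""
    # Step 1: first ASCII case-insensitive match for "charset".
    pos = ascii_lower(s).find("charset")
    if pos == -1:
        return None
    pos += len("charset")

    # Step 2: skip whitespace after the word.
    while pos < len(s) and s[pos] in WHITESPACE:
        pos += 1

    # Step 3: the next character must be "=".
    if pos >= len(s) or s[pos] != "=":
        return None
    pos += 1

    # Step 4: skip whitespace after the equals sign.
    while pos < len(s) and s[pos] in WHITESPACE:
        pos += 1

    # Step 5: quoted value, unmatched quote / no character, or bare value.
    if pos >= len(s):
        return None
    ch = s[pos]
    if ch in "\"'":
        end = s.find(ch, pos + 1)
        if end == -1:
            return None  # unmatched quote
        return s[pos + 1:end]

    end = pos
    while end < len(s) and s[end] not in WHITESPACE + ";":
        end += 1
    return s[pos:end]
```

Under this reading, a header value such as `text/html; charset=UTF-8` yields the label `UTF-8`, while a value with no `charset` parameter, or with an unterminated quote, yields nothing.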

Modified: index
===================================================================
--- index	2009-06-12 19:46:29 UTC (rev 3233)
+++ index	2009-06-12 21:58:16 UTC (rev 3234)
@@ -278,32 +278,25 @@
    <li><a href=#fetching-resources><span class=secno>2.6 </span>Fetching resources</a>
     <ol>
      <li><a href=#concept-http-equivalent><span class=secno>2.6.1 </span>Protocol concepts</a></li>
-     <li><a href=#encrypted-http-and-related-security-concerns><span class=secno>2.6.2 </span>Encrypted HTTP and related security concerns</a></ol></li>
-   <li><a href=#content-type-sniffing><span class=secno>2.7 </span>Determining the type of a resource</a>
+     <li><a href=#encrypted-http-and-related-security-concerns><span class=secno>2.6.2 </span>Encrypted HTTP and related security concerns</a></li>
+     <li><a href=#content-type-sniffing><span class=secno>2.6.3 </span>Determining the type of a resource</a></ol></li>
+   <li><a href=#character-encodings-0><span class=secno>2.7 </span>Character encodings</a></li>
+   <li><a href=#common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</a>
     <ol>
-     <li><a href=#content-type><span class=secno>2.7.1 </span>Content-Type metadata</a></li>
-     <li><a href=#content-type-sniffing:-web-pages><span class=secno>2.7.2 </span>Content-Type sniffing: Web pages</a></li>
-     <li><a href=#content-type-sniffing:-text-or-binary><span class=secno>2.7.3 </span>Content-Type sniffing: text or binary</a></li>
-     <li><a href=#content-type-sniffing:-unknown-type><span class=secno>2.7.4 </span>Content-Type sniffing: unknown type</a></li>
-     <li><a href=#content-type-sniffing:-image><span class=secno>2.7.5 </span>Content-Type sniffing: image</a></li>
-     <li><a href=#content-type-sniffing:-feed-or-html><span class=secno>2.7.6 </span>Content-Type sniffing: feed or HTML</a></ol></li>
-   <li><a href=#character-encodings-0><span class=secno>2.8 </span>Character encodings</a></li>
-   <li><a href=#common-dom-interfaces><span class=secno>2.9 </span>Common DOM interfaces</a>
-    <ol>
-     <li><a href=#reflecting-content-attributes-in-dom-attributes><span class=secno>2.9.1 </span>Reflecting content attributes in DOM attributes</a></li>
-     <li><a href=#collections><span class=secno>2.9.2 </span>Collections</a>
+     <li><a href=#reflecting-content-attributes-in-dom-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in DOM attributes</a></li>
+     <li><a href=#collections><span class=secno>2.8.2 </span>Collections</a>
       <ol>
-       <li><a href=#htmlcollection><span class=secno>2.9.2.1 </span>HTMLCollection</a></li>
-       <li><a href=#htmlformcontrolscollection><span class=secno>2.9.2.2 </span>HTMLFormControlsCollection</a></li>
-       <li><a href=#htmloptionscollection><span class=secno>2.9.2.3 </span>HTMLOptionsCollection</a></li>
-       <li><a href=#htmlpropertycollection><span class=secno>2.9.2.4 </span>HTMLPropertyCollection</a></ol></li>
-     <li><a href=#domtokenlist><span class=secno>2.9.3 </span>DOMTokenList</a></li>
-     <li><a href=#domsettabletokenlist><span class=secno>2.9.4 </span>DOMSettableTokenList</a></li>
-     <li><a href=#safe-passing-of-structured-data><span class=secno>2.9.5 </span>Safe passing of structured data</a></li>
-     <li><a href=#domstringmap><span class=secno>2.9.6 </span>DOMStringMap</a></li>
-     <li><a href=#dom-feature-strings><span class=secno>2.9.7 </span>DOM feature strings</a></li>
-     <li><a href=#exceptions><span class=secno>2.9.8 </span>Exceptions</a></li>
-     <li><a href=#garbage-collection><span class=secno>2.9.9 </span>Garbage collection</a></ol></ol></li>
+       <li><a href=#htmlcollection><span class=secno>2.8.2.1 </span>HTMLCollection</a></li>
+       <li><a href=#htmlformcontrolscollection><span class=secno>2.8.2.2 </span>HTMLFormControlsCollection</a></li>
+       <li><a href=#htmloptionscollection><span class=secno>2.8.2.3 </span>HTMLOptionsCollection</a></li>
+       <li><a href=#htmlpropertycollection><span class=secno>2.8.2.4 </span>HTMLPropertyCollection</a></ol></li>
+     <li><a href=#domtokenlist><span class=secno>2.8.3 </span>DOMTokenList</a></li>
+     <li><a href=#domsettabletokenlist><span class=secno>2.8.4 </span>DOMSettableTokenList</a></li>
+     <li><a href=#safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</a></li>
+     <li><a href=#domstringmap><span class=secno>2.8.6 </span>DOMStringMap</a></li>
+     <li><a href=#dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</a></li>
+     <li><a href=#exceptions><span class=secno>2.8.8 </span>Exceptions</a></li>
+     <li><a href=#garbage-collection><span class=secno>2.8.9 </span>Garbage collection</a></ol></ol></li>
  <li><a href=#dom><span class=secno>3 </span>Semantics, structure, and APIs of HTML documents</a>
   <ol>
    <li><a href=#semantics-intro><span class=secno>3.1 </span>Introduction</a></li>
@@ -5285,637 +5278,43 @@
 
   </div>
 
-  </div>
 
+  <h4 id=content-type-sniffing><span class=secno>2.6.3 </span>Determining the type of a resource</h4>
 
-  <div class=impl>
+  <!-- MIMESNIFF = http://tools.ietf.org/html/draft-abarth-mime-sniff -->
 
-  <h3 id=content-type-sniffing><span class=secno>2.7 </span>Determining the type of a resource</h3>
+  <p>The <dfn id=content-type title=Content-Type>Content-Type metadata</dfn> of a
+  resource must be obtained and interpreted in a manner consistent
+  with the requirements of the Content-Type Processing Model
+  specification. <a href=#refsMIMESNIFF>[MIMESNIFF]</a></p>
 
-  <p class=warning>It is imperative that the rules in this section
-  be followed exactly. When a user agent uses different heuristics for
-  content type detection than the server expects, security problems
-  can occur. For example, if a server believes that the client will
-  treat a contributed file as an image (and thus treat it as benign),
-  but a Web browser believes the content to be HTML (and thus execute
-  any scripts contained therein), the end user can be exposed to
-  malicious content, making the user vulnerable to cookie theft
-  attacks and other cross-site scripting attacks.</p>
-
-
-  <h4 id=content-type><span class=secno>2.7.1 </span>Content-Type metadata</h4>
-
-  <p>What explicit <dfn id=content-type-0 title=Content-Type>Content-Type
-  metadata</dfn> is associated with the resource (the resource's type
-  information) depends on the protocol that was used to
-  <a href=#fetch>fetch</a> the resource.</p>
-
-  <p>For HTTP resources, only the first Content-Type HTTP header, if
-  any, contributes any type information; the explicit type of the
-  resource is then the value of that header, interpreted as described
-  by the HTTP specifications. If the Content-Type HTTP header is
-  present but the value of the first such header cannot be interpreted
-  as described by the HTTP specifications (e.g. because its value
-  doesn't contain a U+002F SOLIDUS ('/') character), then the resource
-  has no type information (even if there are multiple Content-Type
-  HTTP headers and one of the other ones is syntactically correct). <a href=#refsHTTP>[HTTP]</a></p>
-
-  <p>For resources fetched from the file system, user agents should use
-  platform-specific conventions, e.g. operating system extension/type
-  mappings.</p>
-
-  <p>Extensions must not be used for determining resource types for
-  resources fetched over HTTP.</p>
-
-  <p>For resources fetched over most other protocols, e.g. FTP, there
-  is no type information.</p>
-
-
   <p>The <dfn id=algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding from a
-  Content-Type</dfn>, given a string <var title="">s</var>, is as
-  follows. It either returns an encoding or nothing.</p>
+  Content-Type</dfn>, given a string <var title="">s</var>, is given
+  in the Content-Type Processing Model specification. It either
+  returns an encoding or nothing. <a href=#refsMIMESNIFF>[MIMESNIFF]</a></p>
 
-  <ol><li><p>Find the first seven characters in <var title="">s</var>
-   that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the word
-   "charset". If no such match is found, return nothing.</p>
-
-   <li><p>Skip any U+0009, U+000A, U+000C, U+000D, or U+0020
-   characters that immediately follow the word 'charset' (there might
-   not be any).</li>
-
-   <li><p>If the next character is not a U+003D EQUALS SIGN ('='),
-   return nothing.</li>
-
-   <li><p>Skip any U+0009, U+000A, U+000C, U+000D, or U+0020
-   characters that immediately follow the equals sign (there might not
-   be any).</li>
-
-   <li><p>Process the next character as follows:</p>
-
-    <dl class=switch><dt>If it is a U+0022 QUOTATION MARK ('"') and there is a later
-     U+0022 QUOTATION MARK ('"') in <var title="">s</var></dt>
-
-     <dt>If it is a U+0027 APOSTROPHE ("'") and there is a later
-     U+0027 APOSTROPHE ("'") in  <var title="">s</var></dt>
-
-     <dd><p>Return the string between this character and the next
-     earliest occurrence of this character.</dd>
-
-
-     <dt>If it is an unmatched U+0022 QUOTATION MARK ('"')</dt>
-     <dt>If it is an unmatched U+0027 APOSTROPHE ("'")</dt>
-     <dt>If there is no next character</dt>
-
-     <dd><p>Return nothing.</dd>
-
-
-     <dt>Otherwise</dt>
-
-     <dd><p>Return the string from this character to the first U+0009,
-     U+000A, U+000C, U+000D, U+0020, or U+003B character or the end of
-     <var title="">s</var>, whichever comes first.</dd>
-
-    </dl></li>
-
-  </ol><p class=note>The above algorithm is a <a href=#willful-violation>willful
-  violation</a> of the HTTP specification, which requires that the
-  Content-Type headers be honored, despite implementation experience
-  showing that this is not practical in many cases. <a href=#refsHTTP>[HTTP]</a></p>
-
-
-  <h4 id=content-type-sniffing:-web-pages><span class=secno>2.7.2 </span>Content-Type sniffing: Web pages</h4>
-
   <p>The <dfn id=content-type-sniffing-0 title="Content-Type sniffing">sniffed type of a
-  resource</dfn> must be found as follows:</p>
+  resource</dfn> must be found in a manner consistent with the
+  requirements given in the Content-Type Processing Model
+  specification for finding that <i>sniffed type</i>. <a href=#refsMIMESNIFF>[MIMESNIFF]</a></p>
 
-  <ol><li><p>If the user agent is configured to strictly obey
-   Content-Type headers for this resource, then jump to the last step
-   in this set of steps.</li>
+  <p>The <dfn id=content-type-sniffing:-image title="Content-Type sniffing: image">rules for sniffing
+  images specifically</dfn> are also defined in the Content-Type
+  Processing Model specification. <a href=#refsMIMESNIFF>[MIMESNIFF]</a></p>
 
-   <li><p>If the resource was fetched over an HTTP protocol and there
-   is an HTTP Content-Type header and the value of the first such
-   header has bytes that exactly match one of the following lines:</p>
+  <p class=warning>It is imperative that the rules in the
+  Content-Type Processing Model specification be followed
+  exactly. When a user agent uses different heuristics for content
+  type detection than the server expects, security problems can
+  occur. For more details, see the Content-Type Processing Model
+  specification. <a href=#refsMIMESNIFF>[MIMESNIFF]</a></p>
 
-    <table><thead><tr><th>Bytes in Hexadecimal
-       <th>Textual representation
-     <tbody><tr><!-- Very old Apache default --><td>74 65 78 74 2f 70 6c 61 69 6e
-       <td><code title="">text/plain</code>
-      <tr><!-- Old Apache default --><td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 49 53 4f 2d 38 38 35 39 2d 31
-       <td><code title="">text/plain; charset=ISO-8859-1</code>
-      <tr><!-- Debian's arbitrarily different Apache default --><td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 69 73 6f 2d 38 38 35 39 2d 31
-       <td><code title="">text/plain; charset=iso-8859-1</code>
-      <tr><!-- Someone else's arbitrarily different Apache default (who?) --><td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 55 54 46 2d 38
-       <td><code title="">text/plain; charset=UTF-8</code>
-    </table><p>...then jump to the <i title="content-type sniffing: text or
-    binary"><a href=#content-type-sniffing:-text-or-binary>text or binary</a></i> section below.</p>
-
-    <!-- while IE sniffs all text/plain, this will continue to grow as
-    people add new defaults. Hopefully IE will stop the madness in due
-    course and stop sniffing anything but the above... -->
-
-   </li>
-
-   <li><p>Let <var title="">official type</var> be the type given by
-   the <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> for the
-   resource, ignoring parameters. If there is no such type, jump to
-   the <i title="content-type sniffing: unknown type"><a href=#content-type-sniffing:-unknown-type>unknown type</a></i>
-   step below. Comparisons with this type, as defined by MIME
-   specifications, are done in an <a href=#ascii-case-insensitive>ASCII case-insensitive</a>
-   manner. <a href=#refsRFC2046>[RFC2046]</a></li>
-
-   <li><p>If <var title="">official type</var> is "unknown/unknown" or
-   "application/unknown", jump to the <i title="content-type sniffing:
-   unknown type"><a href=#content-type-sniffing:-unknown-type>unknown type</a></i> step below.</p> <!-- In a study
-   looking at many billions of pages whose first five characters were
-   "<HTML", "unknown/unknown" was used to label documents about once
-   for every 5000 pages labeled "text/html", and "application/unknown"
-   was used about once for every 35000 pages labeled
-   "text/html". --></li>
-
-   <li><p>If <var title="">official type</var> ends in "+xml", or if
-   it is either "text/xml" or "application/xml", then the sniffed
-   type of the resource is <var title="">official type</var>; return
-   that and abort these steps.</li>
-
-   <li><p>If <var title="">official type</var> is an image type
-   supported by the user agent (e.g. "image/png", "image/gif",
-   "image/jpeg", etc), then jump to the <i title="content-type
-   sniffing: image"><a href=#content-type-sniffing:-image>images</a></i> section below, passing it the <var title="">official type</var>.</li>
-
-   <li><p>If <var title="">official type</var> is "text/html", then
-   jump to the <i title="content-type sniffing: feed or html"><a href=#content-type-sniffing:-feed-or-html>feed or
-   HTML</a></i> section below.</li>
-
-   <li><p>The sniffed type of the resource is <var title="">official
-   type</var>.</li>
-
-  </ol><h4 id=content-type-sniffing:-text-or-binary><span class=secno>2.7.3 </span><dfn>Content-Type sniffing: text or binary</dfn></h4>
-
-  <ol><li><p>The user agent may wait for 512 or more bytes of the resource
-   to be available.</li>
-
-   <li><p>Let <var title="">n</var> be the smaller of either 512 or
-   the number of bytes already available.</li>
-
-   <li>
-
-    <p>If <var title="">n</var> is 4 or more, and the first bytes of
-    the resource match one of the following byte sets:</p>
-
-    <!-- this table is present in several forms in this file; keep them in sync -->
-    <table><thead><tr><th>Bytes in Hexadecimal
-       <th>Description
-     <tbody><tr><td>FE FF
-       <td>UTF-16BE BOM <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
-      <tr><td>FF FE
-       <td>UTF-16LE BOM <!-- followed by a character -->
-<!-- nobody uses this
-      <tr>
-       <td>00 00 FE FF
-       <td>UTF-32BE BOM
--->
-<!-- this one is redundant with the one above
-      <tr>
-       <td>FF FE 00 00
-       <td>UTF-32LE BOM
--->
-      <tr><td>EF BB BF
-       <td>UTF-8 BOM <!-- followed by a character, or the first byte of a multiple character sequence -->
-<!-- nobody uses this
-      <tr>
-       <td>DD 73 66 73
-       <td>UTF-EBCDIC BOM
--->
-    </table><p>...then the sniffed type of the resource is "text/plain". Abort
-    these steps.</p>
-
-   </li>
-
-   <li><p>If none of the first <var title="">n</var> bytes of the
-   resource are <a href=#binary-data-bytes>binary data bytes</a> then the sniffed type
-   of the resource is "text/plain". Abort these steps.</li>
-
-   <li>
-
-    <p>If the first bytes of the resource match one of the byte
-    sequences in the "pattern" column of the table in the <i title="content-type sniffing: unknown type"><a href=#content-type-sniffing:-unknown-type>unknown type</a></i>
-    section below, ignoring any rows whose cell in the "security"
-    column says "scriptable" (or "n/a"), then the sniffed type of the
-    resource is the type given in the corresponding cell in the
-    "sniffed type" column on that row; abort these steps.</p>
-
-    <p class=warning>It is critical that this step not ever return a
-    scriptable type (e.g. text/html), as otherwise that would allow a
-    privilege escalation attack.</p>
-
-   </li>
-
-   <li><p>Otherwise, the sniffed type of the resource is
-   "application/octet-stream".</li>
-
-  </ol><p>Bytes covered by the following ranges are <dfn id=binary-data-bytes>binary data
-  bytes</dfn>:</p>
-
-  <!-- This byte list is based on RFC 2046 Section 4.1.2. Characters
-  in the range 0x00-0x1F, with the exception of 0x09, 0x0A, 0x0C, 0x0D
-  (ASCII for TAB, LF, FF, and CR), and character 0x1B (reportedly used
-  by some encodings as a shift escape), are invalid. Thus, if we see
-  them, we assume it's not text. -->
-
-  <ul class=brief><li> 0x00 - 0x08 </li>
-   <li> 0x0B </li>
-   <li> 0x0E - 0x1A </li>
-   <li> 0x1C - 0x1F </li>
-  </ul><h4 id=content-type-sniffing:-unknown-type><span class=secno>2.7.4 </span><dfn>Content-Type sniffing: unknown type</dfn></h4>
-
-  <ol><li><p>The user agent may wait for 512 or more bytes of the
-   resource to be available.</li>
-
-   <li><p>Let <var title="">stream length</var> be the smaller of
-   either 512 or the number of bytes already available.</li>
-
-   <li><p>For each row in the table below:</p>
-
-    <dl class=switch><dt>If the row has no "<em>WS</em>" bytes:</dt>
-
-     <dd>
-
-      <ol><li>Let <var title="">pattern length</var> be the length of the
-       pattern (number of bytes described by the cell in the second
-       column of the row).</li>
-
-       <li>If <var title="">stream length</var> is smaller than <var title="">pattern length</var> then skip this row.</li>
-
-       <li>Apply the "and" operator to the first <var title="">pattern
-       length</var> bytes of the resource and the given mask (the
-       bytes in the cell of first column of that row), and let the
-       result be the <var title="">data</var>.</li>
-
-       <li>If the bytes of the <var title="">data</var> matches the
-       given pattern bytes exactly, then the sniffed type of the
-       resource is the type given in the cell of the third column in
-       that row; abort these steps.</li>
-
-      </ol></dd>
-
-     <dt>If the row has a "<em>WS</em>" byte:</dt>
-
-     <dd>
-
-      <ol><li><p>Let <var title="">index<sub>pattern</sub></var> be an
-       index into the mask and pattern byte strings of the
-       row.</li>
-
-       <li><p>Let <var title="">index<sub>stream</sub></var> be an
-       index into the byte stream being examined.</li>
-
-       <li><p><em>Loop</em>: If <var title="">index<sub>stream</sub></var> points beyond the end of
-       the byte stream, then this row doesn't match, skip this
-       row.</li>
-
-       <li>
-
-        <p>Examine the <var title="">index<sub>stream</sub></var>th
-        byte of the byte stream as follows:</p>
-
-        <dl class=switch><dt>If the <var title="">index<sub>pattern</sub></var>th byte
-         of the pattern is a normal hexadecimal byte and not a "<em>WS</em>"
-         byte:</dt>
-
-         <dd>
-
-          <p>If the "and" operator, applied to the <var title="">index<sub>stream</sub></var>th byte of the stream
-          and the <var title="">index<sub>pattern</sub></var>th byte
-          of the mask, yields a value different from the <var title="">index<sub>pattern</sub></var>th byte of the
-          pattern, then skip this row.</p>
-
-          <p>Otherwise, increment <var title="">index<sub>pattern</sub></var> to the next byte in
-          the mask and pattern and <var title="">index<sub>stream</sub></var> to the next byte in
-          the byte stream.</p>
-
-         </dd>
-
-         <dt>Otherwise, if the <var title="">index<sub>pattern</sub></var>th byte of the pattern
-         is a "<em>WS</em>" byte:</dt>
-
-         <dd>
-
-          <p>"<em>WS</em>" means "whitespace", and allows insignificant
-          whitespace to be skipped when sniffing for a type
-          signature.</p>
-
-          <p>If the <var title="">index<sub>stream</sub></var>th byte
-          of the stream is one of 0x09 (ASCII TAB), 0x0A (ASCII LF),
-          0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space),
-          then increment only the <var title="">index<sub>stream</sub></var> to the next byte in
-          the byte stream.</p>
-
-          <p>Otherwise, increment only the <var title="">index<sub>pattern</sub></var> to the next byte in
-          the mask and pattern.</p>
-
-         </dd>
-
-        </dl></li>
-
-       <li><p>If <var title="">index<sub>pattern</sub></var> does not
-       point beyond the end of the mask and pattern byte strings, then
-       jump back to the <em>loop</em> step in this algorithm.</li>
-
-       <li><p>Otherwise, the sniffed type of the resource is the type
-       given in the cell of the third column in that row; abort these
-       steps.</li>
-
-      </ol></dd>
-
-    </dl></li>
-
-   <li><p>If none of the first <var title="">n</var> bytes of the
-   resource are <a href=#binary-data-bytes>binary data bytes</a> then the sniffed type
-   of the resource is "text/plain". Abort these steps.</li>
-
-   <li><p>Otherwise, the sniffed type of the resource is
-   "application/octet-stream".</li>
-
-  </ol><p>The table used by the above algorithm is:</p>
-
-  <table><thead><tr><th colspan=2>Bytes in Hexadecimal
-     <th rowspan=2>Sniffed type
-     <th rowspan=2>Security
-     <th rowspan=2>Comment
-    <tr><th>Mask
-     <th>Pattern
-   <tbody><tr><td>FF FF DF DF DF DF DF DF DF FF DF DF DF DF
-     <td>3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C <!-- "<!DOCTYPE HTML" --> <!-- common in static data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title=""><!DOCTYPE HTML</code>" in US-ASCII or compatible encodings, case-insensitively.
-    <tr><td>FF FF DF DF DF DF
-     <td><em>WS</em> 3C 48 54 4D 4C <!-- "<HTML" --> <!-- common in static data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title=""><HTML</code>" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
-    <tr><td>FF FF DF DF DF DF
-     <td><em>WS</em> 3C 48 45 41 44 <!-- "<HEAD" --> <!-- common in static data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title=""><HEAD</code>" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
-    <tr><td>FF FF DF DF DF DF DF DF
-     <td><em>WS</em> 3C 53 43 52 49 50 54 <!-- "<SCRIPT" --> <!-- common in dynamic data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title=""><SCRIPT</code>" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
-    <tr><td>FF FF FF FF FF
-     <td>25 50 44 46 2D <!-- "%PDF-" (from http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#321) -->
-     <td>application/pdf
-     <td>Scriptable
-     <td>The string "<code title="">%PDF-</code>", the PDF signature.
-    <tr><td>FF FF FF FF FF FF FF FF FF FF FF
-     <td>25 21 50 53 2D 41 64 6F 62 65 2D <!-- "%!PS-Adobe-" (from http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#321) -->
-     <td>application/postscript
-     <td>Safe
-     <td>The string "<code title="">%!PS-Adobe-</code>", the PostScript signature.
-
-   <!-- copied from the text or binary section above -->
-   <tbody><tr><td>FF FF 00 00
-     <td>FE FF 00 00
-     <td>text/plain
-     <td>n/a
-     <td>UTF-16BE BOM <!-- followed by at least one character -->
-    <tr><td>FF FF 00 00
-     <td>FF FE 00 00
-     <td>text/plain
-     <td>n/a
-     <td>UTF-16LE BOM <!-- followed by at least one character -->
-    <tr><td>FF FF FF 00
-     <td>EF BB BF 00
-     <td>text/plain
-     <td>n/a
-     <td>UTF-8 BOM <!-- followed by at least one character -->
-
-   <!-- based on the table in the image section below -->
-   <tbody><tr><td>FF FF FF FF FF FF
-     <td>47 49 46 38 37 61 <!-- GIF87a -->
-     <td>image/gif
-     <td>Safe
-     <td>The string "<code title="">GIF87a</code>", a GIF signature.
-    <tr><td>FF FF FF FF FF FF
-     <td>47 49 46 38 39 61 <!-- GIF89a -->
-     <td>image/gif
-     <td>Safe
-     <td>The string "<code title="">GIF89a</code>", a GIF signature.
-    <tr><td>FF FF FF FF FF FF FF FF
-     <td>89 50 4E 47 0D 0A 1A 0A <!-- [TAB]PNG[CR][LF][EOF][LF]; 137 80 78 71 13 10 26 10 -->
-     <td>image/png
-     <td>Safe
-     <td>The PNG signature.
-    <tr><td>FF FF FF
-     <td>FF D8 FF <!-- SOI marker followed by the first byte of another marker -->
-     <td>image/jpeg
-     <td>Safe
-     <td>A JPEG SOI marker followed by the first byte of another marker.
-    <tr><td>FF FF
-     <td>42 4D
-     <td>image/bmp
-     <td>Safe
-     <td>The string "<code title="">BM</code>", a BMP signature.
-    <tr><td>FF FF FF FF
-     <td>00 00 01 00
-     <td>image/vnd.microsoft.icon
-     <td>Safe
-     <td>A 0 word followed by a 1 word, a Windows Icon file format signature.
-
-  </table><p class=XXX>I'd like to add types like MPEG, AVI, Flash,
-  Java, etc, to the above table.</p>
-
-  <p>User agents may support further types if desired, by implicitly
-  adding to the above table. However, user agents should not use any
-  other patterns for types already mentioned in the table above, as
-  this could then be used for privilege escalation (where, e.g., a
-  server uses the above table to determine that content is not HTML
-  and thus safe from XSS attacks, but then a user agent detects it as
-  HTML anyway and allows script to execute).</p>
-
-  <p>The column marked "security" is used by the algorithm in the
-  "text or binary" section, to avoid sniffing <code title="">text/plain</code> content as a type that can be used for a
-  privilege escalation attack.</p>
-
-
-  <h4 id=content-type-sniffing:-image><span class=secno>2.7.5 </span><dfn>Content-Type sniffing: image</dfn></h4>
-
-  <p>If the resource's <var title="">official type</var> is
-  "image/svg+xml", then the sniffed type of the resource is its <var title="">official type</var> (an XML type).</p>
-
-  <p>Otherwise, if the first bytes of the resource match one of the
-  byte sequences in the first column of the following table, then the
-  sniffed type of the resource is the type given in the corresponding
-  cell in the second column on the same row:</p>
-
-  <table><thead><tr><th>Bytes in Hexadecimal
-     <th>Sniffed type
-     <th>Comment
-
-   <!-- update the table above if you change this! -->
-   <tbody><tr><td>47 49 46 38 37 61 <!-- GIF87a -->
-     <td>image/gif
-     <td>The string "<code title="">GIF87a</code>", a GIF signature.
-    <tr><td>47 49 46 38 39 61 <!-- GIF89a -->
-     <td>image/gif
-     <td>The string "<code title="">GIF89a</code>", a GIF signature.
-    <tr><td>89 50 4E 47 0D 0A 1A 0A <!-- [TAB]PNG[CR][LF][EOF][LF]; 137 80 78 71 13 10 26 10 -->
-     <td>image/png
-     <td>The PNG signature.
-    <tr><td>FF D8 FF <!-- SOI marker followed by the first byte of another marker -->
-     <td>image/jpeg
-     <td>A JPEG SOI marker followed by the first byte of another marker.
-    <tr><td>42 4D
-     <td>image/bmp
-     <td>The string "<code title="">BM</code>", a BMP signature.
-    <tr><td>00 00 01 00
-     <td>image/vnd.microsoft.icon
-     <td>A 0 word followed by a 1 word, a Windows Icon file format signature.
-    <!-- XXX Mozilla also detects ART (AOL proprietary format) and Windows Cursor files -->
-  </table><p>Otherwise, the sniffed type of the resource is the same as
-  its <var title="">official type</var>.</p>
-
-
-  <h4 id=content-type-sniffing:-feed-or-html><span class=secno>2.7.6 </span><dfn>Content-Type sniffing: feed or HTML</dfn></h4>
-  <!-- mostly based on:
-   http://blogs.msdn.com/rssteam/articles/PublishersGuide.aspx
-   http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#192
-   http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#127
-  -->
-
-  <ol><li><p>The user agent may wait for 512 or more bytes of the
-   resource to be available.</li>
-
-   <li><p>Let <var title="">s</var> be the stream of bytes, and let
-   <span title=""><var title="">s</var>[<var title="">i</var>]</span>
-   represent the byte in <var title="">s</var> with position <var title="">i</var>, treating <var title="">s</var> as zero-indexed
-   (so the first byte is at <span title=""><var title="">i</var>=0</span>).</li>
-
-   <li><p>If at any point this algorithm requires the user agent to
-   determine the value of a byte in <var title="">s</var> which is not
-   yet available, or which is past the first 512 bytes of the
-   resource, or which is beyond the end of the resource, the user
-   agent must stop this algorithm, and assume that the sniffed type of
-   the resource is "text/html".</p>
-
-   <p class=note>User agents are allowed, by the first step of this
-   algorithm, to wait until the first 512 bytes of the resource are
-   available.</li>
-
-   <li><p>Initialize <var title="">pos</var> to 0.</li>
-
-   <li><p>If <span title=""><var title="">s</var>[0]</span> is 0xEF,
-   <span title=""><var title="">s</var>[1]</span> is 0xBB, and <span title=""><var title="">s</var>[2]</span> is 0xBF, then set <var title="">pos</var> to 3. (This skips over a leading UTF-8 BOM, if
-   any.)</li>
-
-   <li><p><i>Loop start:</i> Examine <span title=""><var title="">s</var>[<var title="">pos</var>]</span>.</p>
-
-   <dl class=switch><!-- skip whitespace (S token as defined in XML 1.0 section 2.3; production [3] --><dt>If it is 0x09 (ASCII tab), 0x20 (ASCII space), 0x0A (ASCII LF), or 0x0D (ASCII CR)</dt>
-    <dd>Increase <var title="">pos</var> by 1 and repeat this step.</dd>
-
-    <dt>If it is 0x3C (ASCII "<code title=""><</code>")</dt>
-    <dd>Increase <var title="">pos</var> by 1 and go to the next step.</dd>
-
-    <dt>If it is anything else</dt>
-    <dd>The sniffed type of the resource is "text/html". Abort these
-    steps.</dd>
-
-   </dl></li>
-
-   <li><p>If the bytes with positions <var title="">pos</var> to
-   <span title=""><var title="">pos</var>+2</span> in <var title="">s</var> are
-   exactly equal to 0x21, 0x2D, 0x2D respectively (ASCII for "<code title="">!--</code>"), then:</p>
-
-    <ol><li>Increase <var title="">pos</var> by 3.</li> <!-- skips past the " ! - - " -->
-
-     <li>If the bytes with positions <span title=""><var title="">pos</var></span> to <span title=""><var title="">pos</var>+2</span> in <var title="">s</var> are exactly
-     equal to 0x2D, 0x2D, 0x3E respectively (ASCII for "<code title="">--></code>"), then increase <var title="">pos</var>
-     by 3 and jump back to the previous step (the step labeled
-     <i>loop start</i>) in the overall algorithm in this section.</li>
-
-     <li>Otherwise, increase <var title="">pos</var> by 1.</li>
-
-     <li>Return to step 2 in these substeps.</li>
-
-    </ol></li>
-
-   <li><p>If <span title=""><var title="">s</var>[<var title="">pos</var>]</span> is 0x21 (ASCII "<code title="">!</code>"):</p>
-
-    <!-- this skips past a DOCTYPE if there is one. It is brain-dead
-    because we don't have to be clever to parse the Atom and RSS x.y
-    DOCTYPEs, as they don't do anything clever like have internal
-    subsets or quoted ">" characters. If this fails, then that's ok,
-    we'll treat it as HTML which is fine since we know it's not a feed
-    in that case. -->
-
-    <ol><li>Increase <var title="">pos</var> by 1.</li>
-
-     <li>If <span title=""><var title="">s</var>[<var title="">pos</var>]</span> equals 0x3E, then increase <var title="">pos</var> by 1 and jump back to the step labeled
-     <i>loop start</i> in the overall algorithm in this section.</li>
-
-     <li>Otherwise, return to step 1 in these substeps.</li>
-
-    </ol></li>
-
-   <li><p>If <span title=""><var title="">s</var>[<var title="">pos</var>]</span> is 0x3F (ASCII "<code title="">?</code>"):</p>
-
-    <ol><li>Increase <var title="">pos</var> by 1.</li>
-
-     <li>If <span title=""><var title="">s</var>[<var title="">pos</var>]</span> and <span title=""><var title="">s</var>[<var title="">pos</var>+1]</span> equal 0x3F and
-     0x3E respectively, then increase <var title="">pos</var> by 1 and
-     jump back to the step labeled <i>loop start</i> in the overall
-     algorithm in this section.</li>
-
-     <li>Otherwise, return to step 1 in these substeps.</li>
-
-    </ol></li>
-
-   <li><p>Otherwise, if the bytes in <var title="">s</var> starting at
-   <var title="">pos</var> match any of the sequences of bytes in the
-   first column of the following table, then the user agent must
-   follow the steps given in the corresponding cell in the second
-   column of the same row.</p>
-
-    <table><thead><tr><th>Bytes in Hexadecimal
-       <th>Requirement
-       <th>Comment
-
-     <tbody><tr><td>72 73 73
-       <td>The sniffed type of the resource is "application/rss+xml"; abort these steps
-       <td>The three ASCII characters "<code title="">rss</code>"
-      <tr><td>66 65 65 64
-       <td>The sniffed type of the resource is "application/atom+xml"; abort these steps
-       <td>The four ASCII characters "<code title="">feed</code>"
-      <tr><td>72 64 66 3A 52 44 46
-       <td>Continue to the next step in this algorithm
-       <td>The ASCII characters "<code title="">rdf:RDF</code>"
-    </table><p>If none of the byte sequences above match the bytes in <var title="">s</var> starting at <var title="">pos</var>, then the
-    sniffed type of the resource is "text/html". Abort these
-    steps.</p>
-
-   </li>
-
-   <li><p class=XXX>If, before the next ">", you find two
-   xmlns* attributes with http://www.w3.org/1999/02/22-rdf-syntax-ns#
-   and http://purl.org/rss/1.0/ as the namespaces, then the sniffed
-   type of the resource is "application/rss+xml", abort these
-   steps. (maybe we only need to check for http://purl.org/rss/1.0/
-   actually)</li>
-
-   <li><p>Otherwise, the sniffed type of the resource is
-   "text/html".</li>
-
-  </ol><p class=note>For efficiency reasons, implementations may wish to
-  implement this algorithm and the algorithm for detecting the
-  character encoding of HTML documents in parallel.</p>
-
   </div>
 
+
   <div class=impl>
 
-  <h3 id=character-encodings-0><span class=secno>2.8 </span>Character encodings</h3>
+  <h3 id=character-encodings-0><span class=secno>2.7 </span>Character encodings</h3>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252
   encodings, but may support more.</p>
@@ -6000,9 +5399,9 @@
   </div>
 
 
-  <h3 id=common-dom-interfaces><span class=secno>2.9 </span>Common DOM interfaces</h3>
+  <h3 id=common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</h3>
 
-  <h4 id=reflecting-content-attributes-in-dom-attributes><span class=secno>2.9.1 </span>Reflecting content attributes in DOM attributes</h4>
+  <h4 id=reflecting-content-attributes-in-dom-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in DOM attributes</h4>
 
   <p>Some <span title="DOM attribute">DOM attributes</span> are
   defined to <dfn id=reflect>reflect</dfn> a particular <span>content
@@ -6171,7 +5570,7 @@
   </div>
 
 
-  <h4 id=collections><span class=secno>2.9.2 </span>Collections</h4>
+  <h4 id=collections><span class=secno>2.8.2 </span>Collections</h4>
 
   <p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code>,
   <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code>,
@@ -6207,7 +5606,7 @@
   </div>
 
 
-  <h5 id=htmlcollection><span class=secno>2.9.2.1 </span>HTMLCollection</h5>
+  <h5 id=htmlcollection><span class=secno>2.8.2.1 </span>HTMLCollection</h5>
 
   <p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code> interface represents a generic
   <a href=#collections-0 title=collections>collection</a> of elements.</p>
@@ -6302,7 +5701,7 @@
   </div>
 
 
-  <h5 id=htmlformcontrolscollection><span class=secno>2.9.2.2 </span>HTMLFormControlsCollection</h5>
+  <h5 id=htmlformcontrolscollection><span class=secno>2.8.2.2 </span>HTMLFormControlsCollection</h5>
 
   <p>The <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code> interface represents
   a <a href=#collections-0 title=collections>collection</a> of <a href=#category-listed title=category-listed>listed</a> elements in <code><a href=#the-form-element>form</a></code>
@@ -6435,7 +5834,7 @@
 --></div>
 
 
-  <h5 id=htmloptionscollection><span class=secno>2.9.2.3 </span>HTMLOptionsCollection</h5>
+  <h5 id=htmloptionscollection><span class=secno>2.8.2.3 </span>HTMLOptionsCollection</h5>
 
   <p>The <code><a href=#htmloptionscollection-0>HTMLOptionsCollection</a></code> interface represents a
   list of <code><a href=#the-option-element>option</a></code> elements. It is always rooted on a
@@ -6597,7 +5996,7 @@
   </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --></div>
 
 
-  <h5 id=htmlpropertycollection><span class=secno>2.9.2.4 </span>HTMLPropertyCollection</h5>
+  <h5 id=htmlpropertycollection><span class=secno>2.8.2.4 </span>HTMLPropertyCollection</h5>
 
   <p>The <code><a href=#htmlpropertycollection-0>HTMLPropertyCollection</a></code> interface represents a
   <a href=#collections-0 title=collections>collection</a> of elements that add
@@ -6702,7 +6101,7 @@
   </div>
 
 
-  <h4 id=domtokenlist><span class=secno>2.9.3 </span>DOMTokenList</h4>
+  <h4 id=domtokenlist><span class=secno>2.8.3 </span>DOMTokenList</h4>
 
   <p>The <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface represents an interface
   to an underlying string that consists of an <a href=#unordered-set-of-unique-space-separated-tokens>unordered set of
@@ -6867,7 +6266,7 @@
   </div>
 
 
-  <h4 id=domsettabletokenlist><span class=secno>2.9.4 </span>DOMSettableTokenList</h4>
+  <h4 id=domsettabletokenlist><span class=secno>2.8.4 </span>DOMSettableTokenList</h4>
 
   <p>The <code><a href=#domsettabletokenlist-0>DOMSettableTokenList</a></code> interface is the same as the
   <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface, except that it allows the
@@ -6899,7 +6298,7 @@
 
   <div class=impl>
 
-  <h4 id=safe-passing-of-structured-data><span class=secno>2.9.5 </span>Safe passing of structured data</h4>
+  <h4 id=safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</h4>
 
   <p>When a user agent is required to obtain a <dfn id=structured-clone>structured
   clone</dfn> of an object, it must run the following algorithm, which
@@ -7014,7 +6413,7 @@
   </dl></div>
 
 
-  <h4 id=domstringmap><span class=secno>2.9.6 </span>DOMStringMap</h4>
+  <h4 id=domstringmap><span class=secno>2.8.6 </span>DOMStringMap</h4>
 
   <p>The <code><a href=#domstringmap-0>DOMStringMap</a></code> interface represents a set of
   name-value pairs. It exposes these using the scripting language's
@@ -7093,7 +6492,7 @@
   </div>
 
 
-  <h4 id=dom-feature-strings><span class=secno>2.9.7 </span>DOM feature strings</h4>
+  <h4 id=dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</h4>
 
   <p>DOM3 Core defines mechanisms for checking for interface support,
   and for obtaining implementations of interfaces, using <a href=http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures>feature
@@ -7123,7 +6522,7 @@
   not guaranteed that an implementation that supports "<code title="">HTML</code>" "<code>5.0</code>" also supports "<code title="">HTML</code>" "<code>2.0</code>".</p>
 
 
-  <h4 id=exceptions><span class=secno>2.9.8 </span>Exceptions</h4>
+  <h4 id=exceptions><span class=secno>2.8.8 </span>Exceptions</h4>
 
   <p>The following <code>DOMException</code> codes are defined in DOM
   Core. <a href=#refsDOMCORE>[DOMCORE]</a></p>
@@ -7157,7 +6556,7 @@
    <li value=82><dfn id=serialise_err><code>SERIALISE_ERR</code></dfn></li> <!-- actually defined in dom3ls -->
   </ol><div class=impl>
 
-  <h4 id=garbage-collection><span class=secno>2.9.9 </span>Garbage collection</h4>
+  <h4 id=garbage-collection><span class=secno>2.8.9 </span>Garbage collection</h4>
 
   <p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any DOM
   attribute that returns a pre-existing object to that object.</p>
@@ -10641,21 +10040,21 @@
   aforementioned assumed type.</p>
 
   <p id=concept-link-type-sniffing>If the external resource link
-  type defines rules for processing the resource's <a href=#content-type-0 title=Content-Type>Content-Type metadata</a>, then those rules
+  type defines rules for processing the resource's <a href=#content-type title=Content-Type>Content-Type metadata</a>, then those rules
   apply. Otherwise, if the resource is expected to be an image, user
   agents may apply the <a href=#content-type-sniffing:-image title="Content-Type sniffing:
   image">image sniffing rules</a>, with the <var title="">official
-  type</var> being the type determined from the resource's <a href=#content-type-0 title=Content-Type>Content-Type metadata</a>, and use the
+  type</var> being the type determined from the resource's <a href=#content-type title=Content-Type>Content-Type metadata</a>, and use the
   resulting sniffed type of the resource as if it was the actual
   type. Otherwise, if neither of these conditions apply or if the user
   agent opts not to apply the image sniffing rules, then the user
-  agent must use the resource's <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> to determine the
+  agent must use the resource's <a href=#content-type title=Content-Type>Content-Type metadata</a> to determine the
   type of the resource. If there is no type metadata, but the external
   resource link type has a default type defined, then the user agent
   must assume that the resource is of that type.</p>
 
   <p class=note>The <code title=link-type-stylesheet>stylesheet</code> link type defines
-  rules for processing the resource's <a href=#content-type-0 title=Content-Type>Content-Type metadata</a>.</p>
+  rules for processing the resource's <a href=#content-type title=Content-Type>Content-Type metadata</a>.</p>
 
   <p>Once the user agent has established the type of the resource, the
   user agent must apply the resource if it is of a supported type and
@@ -10681,7 +10080,7 @@
    <code>text/plain</code>, or any other type, it would not.</p>
 
    <p>If one of the two files was returned without a
-   <a href=#content-type-0>Content-Type</a> metadata, or with a syntactically
+   <a href=#content-type>Content-Type</a> metadata, or with a syntactically
    incorrect type like <code title="">Content-Type: "null"</code>, then the default type
    for <code title=rel-stylesheet><a href=#link-type-stylesheet>stylesheet</a></code> links would kick
    in. Since that default type is <code title="">text/css</code>, the
@@ -11361,7 +10760,7 @@
 
   </ul><p>If an <a href=#html-documents title="HTML documents">HTML document</a> does not
   start with a BOM, and if its encoding is not explicitly given by
-  <a href=#content-type-0 title=Content-Type>Content-Type metadata</a>, then the
+  <a href=#content-type title=Content-Type>Content-Type metadata</a>, then the
   character encoding used must be an <a href=#ascii-compatible-character-encoding>ASCII-compatible character
   encoding</a>, and, in addition, if that encoding isn't US-ASCII
   itself, then the encoding must be specified using a
@@ -11573,7 +10972,7 @@
    type. For <code><a href=#the-style-element>style</a></code> elements, this is the same as the
    <code title=attr-style-type><a href=#attr-style-type>type</a></code> content attribute's
    value, or <code title="">text/css</code> if that is omitted. For
-   <code><a href=#the-link-element>link</a></code> elements, this is the <a href=#content-type-0 title=Content-Type>Content-Type metadata of the specified
+   <code><a href=#the-link-element>link</a></code> elements, this is the <a href=#content-type title=Content-Type>Content-Type metadata of the specified
    resource</a>.</dd>
 
    <dt>The location (<code title=dom-stylesheet-href>href</code> DOM attribute)</dt>
@@ -11713,7 +11112,7 @@
   attribute is set, its value must be a valid character encoding name,
   must be the preferred name for that encoding, and must match the
   encoding given in the <code title="">charset</code> parameter of the
-  <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> of the
+  <a href=#content-type title=Content-Type>Content-Type metadata</a> of the
   external file, if any. <a href=#refsIANACHARSET>[IANACHARSET]</a></p>
 
   <p>The <dfn id=attr-script-async title=attr-script-async><code>async</code></dfn> and
@@ -11862,7 +11261,7 @@
     agent must act as if it had received an empty HTTP 400
     response.</p>
 
-    <p>Once the resource's <a href=#content-type-0 title=Content-Type>Content Type
+    <p>Once the resource's <a href=#content-type title=Content-Type>Content Type
     metadata</a> is available, if it ever is, apply the
     <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding from a
     Content-Type</a> to it. If this returns an encoding, and the
@@ -17552,10 +16951,10 @@
 
   <p>The user agents should apply the <a href=#content-type-sniffing:-image title="Content-Type
   sniffing: image">image sniffing rules</a> to determine the type
-  of the image, with the image's <a href=#content-type-0 title=Content-Type>associated
+  of the image, with the image's <a href=#content-type title=Content-Type>associated
   Content-Type headers</a> giving the <var title="">official
   type</var>. If these rules are not applied, then the type of the
-  image must be the type given by the image's <a href=#content-type-0 title=Content-Type>associated Content-Type headers</a>.</p>
+  image must be the type given by the image's <a href=#content-type title=Content-Type>associated Content-Type headers</a>.</p>
 
   <p>User agents must not support non-image resources with the
   <code><a href=#the-img-element>img</a></code> element (e.g. XML files whose root element is an
@@ -19269,7 +18668,7 @@
   parameters. If both the <code title=attr-embed-type><a href=#attr-embed-type>type</a></code>
   attribute and the <code title=attr-embed-src><a href=#attr-embed-src>src</a></code> attribute
   are present, then the <code title=attr-embed-type><a href=#attr-embed-type>type</a></code>
-  attribute must specify the same type as the <a href=#content-type-0 title=Content-Type>explicit Content-Type metadata</a> of the
+  attribute must specify the same type as the <a href=#content-type title=Content-Type>explicit Content-Type metadata</a> of the
   resource given by the <code title=attr-embed-src><a href=#attr-embed-src>src</a></code>
   attribute. <a href=#refsRFC2046>[RFC2046]</a></p>
 
@@ -19357,7 +18756,7 @@
 
    </li>
 
-   <li><p>Otherwise, if the specified resource has <a href=#content-type-0 title=Content-Type>explicit Content-Type metadata</a>, then
+   <li><p>Otherwise, if the specified resource has <a href=#content-type title=Content-Type>explicit Content-Type metadata</a>, then
    that is the <span>content's type</span>.</li>
 
    <li><p>Otherwise, the content has no type and there can be no
@@ -19594,8 +18993,8 @@
 
        <li>
 
-        <p>If the resource has <a href=#content-type-0 title=Content-Type>associated
-        Content-Type metadata</a>, then let the <var title="">resource type</var> be the type specified in <a href=#content-type-0 title=Content-Type>the resource's Content-Type
+        <p>If the resource has <a href=#content-type title=Content-Type>associated
+        Content-Type metadata</a>, then let the <var title="">resource type</var> be the type specified in <a href=#content-type title=Content-Type>the resource's Content-Type
         metadata</a>.</p>
 
        </li>
@@ -21217,7 +20616,7 @@
      to network errors, causing the user agent to give up trying to
      fetch the resource</dt>
 
-     <dt>If the <a href=#media-resource>media resource</a> is found to have <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> that, when
+     <dt>If the <a href=#media-resource>media resource</a> is found to have <a href=#content-type title=Content-Type>Content-Type metadata</a> that, when
      parsed as a MIME type (including any codecs described by the
      <code title="">codec</code> parameter), represents <a href=#a-type-that-the-user-agent-knows-it-cannot-render>a type
      that the user agent knows it cannot render</a> (even if the
@@ -32634,10 +32033,10 @@
 
   <p>The user agents should apply the <a href=#content-type-sniffing:-image title="Content-Type
   sniffing: image">image sniffing rules</a> to determine the type
-  of the image, with the image's <a href=#content-type-0 title=Content-Type>associated
+  of the image, with the image's <a href=#content-type title=Content-Type>associated
   Content-Type headers</a> giving the <var title="">official
   type</var>. If these rules are not applied, then the type of the
-  image must be the type given by the image's <a href=#content-type-0 title=Content-Type>associated Content-Type headers</a>.</p>
+  image must be the type given by the image's <a href=#content-type title=Content-Type>associated Content-Type headers</a>.</p>
 
   <p>User agents must not support non-image resources with the
   <code><a href=#the-input-element>input</a></code> element. User agents must not run executable code
@@ -47919,11 +47318,11 @@
     to an HTTP resource with an HTTP 204 No Content response.</p>
 
     <p>Otherwise, the URL must be treated in a manner equivalent to an
-    HTTP resource with a 200 OK response whose <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> is <code title="">text/html</code> and whose response body is the return
+    HTTP resource with a 200 OK response whose <a href=#content-type title=Content-Type>Content-Type metadata</a> is <code title="">text/html</code> and whose response body is the return
     value converted to a string value.</p>
 
     <p class=note>Certain contexts, in particular <code><a href=#the-img-element>img</a></code>
-    elements, ignore the <a href=#content-type-0 title=Content-Type>Content-Type
+    elements, ignore the <a href=#content-type title=Content-Type>Content-Type
     metadata</a>.</p>
 
    </li>
@@ -51960,7 +51359,7 @@
    </li>
 
    <li><p>If the document's out-of-band metadata (e.g. HTTP headers),
-   not counting any <a href=#content-type-0 title=Content-Type>type information</a>
+   not counting any <a href=#content-type title=Content-Type>type information</a>
    (such as the Content-Type HTTP header), requires some sort of
    processing that will not affect the browsing context, then perform
    that processing and abort these steps.</p>
@@ -52162,7 +51561,7 @@
 
   <p class=note>The <a href=#the-input-stream>input stream</a> converts bytes into
   characters for use in the <a href=#tokenization title=tokenization>tokenizer</a>. This process relies, in part,
-  on character encoding information found in the real <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> of the resource;
+  on character encoding information found in the real <a href=#content-type title=Content-Type>Content-Type metadata</a> of the resource;
   the "sniffed type" is not used for this purpose.</p>
 
   <!-- next two paragraphs are nearly identical to the navigate-text
@@ -53638,7 +53037,7 @@
   <div class=impl>
 
   <p><strong>Quirk:</strong> If the document has been set to
-  <a href=#quirks-mode>quirks mode</a> and the <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> of the external
+  <a href=#quirks-mode>quirks mode</a> and the <a href=#content-type title=Content-Type>Content-Type metadata</a> of the external
   resource is not a supported style sheet type, the user agent must
   instead assume it to be <code title="">text/css</code>.</p>
 
@@ -59399,7 +58798,7 @@
   algorithm (the <dfn id=encoding-sniffing-algorithm>encoding sniffing algorithm</dfn>) to determine
   the character encoding to use when decoding a document in the first
   pass. This algorithm takes as input any out-of-band metadata
-  available to the user agent (e.g. the <a href=#content-type-0 title=Content-Type>Content-Type metadata</a> of the document)
+  available to the user agent (e.g. the <a href=#content-type title=Content-Type>Content-Type metadata</a> of the document)
   and all the bytes available so far, and returns an encoding and a
   <dfn id=concept-encoding-confidence title=concept-encoding-confidence>confidence</dfn>. The
   confidence is either <i>tentative</i>, <i>certain</i>, or
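
The next hunk removes the inline steps of the "algorithm for extracting an encoding from a Content-Type" from the spec source and defers them to the Content-Type Processing Model draft. For readers who want to see that behaviour concretely, here is a minimal Python sketch of the removed steps. The function name `extract_encoding` and the use of `None` for "return nothing" are assumptions of this sketch, not names taken from the spec.

```python
import re

# ASCII case-insensitive match for the word "charset" (step 1).
_CHARSET = re.compile("charset", re.IGNORECASE | re.ASCII)
# U+0009, U+000A, U+000C, U+000D, U+0020: the characters the steps skip.
_SPACE = "\t\n\x0c\r "


def extract_encoding(s):
    """Return the encoding label found in s, or None for "nothing"."""
    match = _CHARSET.search(s)
    if match is None:
        return None
    pos = match.end()

    # Step 2: skip whitespace that immediately follows "charset".
    while pos < len(s) and s[pos] in _SPACE:
        pos += 1

    # Step 3: the next character must be an equals sign.
    if pos >= len(s) or s[pos] != "=":
        return None
    pos += 1

    # Step 4: skip whitespace that immediately follows the equals sign.
    while pos < len(s) and s[pos] in _SPACE:
        pos += 1

    # Step 5: quoted value, unmatched quote / end of string, or bare token.
    if pos >= len(s):
        return None
    ch = s[pos]
    if ch in "\"'":
        end = s.find(ch, pos + 1)
        if end == -1:
            return None  # unmatched quotation mark or apostrophe
        return s[pos + 1:end]

    end = pos
    while end < len(s) and s[end] not in _SPACE + ";":
        end += 1
    return s[pos:end]
```

Only the first occurrence of "charset" is considered, as in the removed text, so a header such as `charsetx; charset=utf-8` yields nothing under this sketch.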

Modified: source
===================================================================
--- source	2009-06-12 19:46:29 UTC (rev 3233)
+++ source	2009-06-12 21:58:16 UTC (rev 3234)
@@ -4969,794 +4969,43 @@
 
   </div>
 
-  </div>
 
+  <h4 id="content-type-sniffing">Determining the type of a resource</h4>
 
-  <div class="impl">
+  <!-- MIMESNIFF = http://tools.ietf.org/html/draft-abarth-mime-sniff -->
 
-  <h3 id="content-type-sniffing">Determining the type of a resource</h3>
+  <p>The <dfn title="Content-Type">Content-Type metadata</dfn> of a
+  resource must be obtained and interpreted in a manner consistent
+  with the requirements of the Content-Type Processing Model
+  specification. <a href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
-  <p class="warning">It is imperative that the rules in this section
-  be followed exactly. When a user agent uses different heuristics for
-  content type detection than the server expects, security problems
-  can occur. For example, if a server believes that the client will
-  treat a contributed file as an image (and thus treat it as benign),
-  but a Web browser believes the content to be HTML (and thus execute
-  any scripts contained therein), the end user can be exposed to
-  malicious content, making the user vulnerable to cookie theft
-  attacks and other cross-site scripting attacks.</p>
-
-
-  <h4 id="content-type">Content-Type metadata</h4>
-
-  <p>What explicit <dfn title="Content-Type">Content-Type
-  metadata</dfn> is associated with the resource (the resource's type
-  information) depends on the protocol that was used to
-  <span>fetch</span> the resource.</p>
-
-  <p>For HTTP resources, only the first Content-Type HTTP header, if
-  any, contributes any type information; the explicit type of the
-  resource is then the value of that header, interpreted as described
-  by the HTTP specifications. If the Content-Type HTTP header is
-  present but the value of the first such header cannot be interpreted
-  as described by the HTTP specifications (e.g. because its value
-  doesn't contain a U+002F SOLIDUS ('/') character), then the resource
-  has no type information (even if there are multiple Content-Type
-  HTTP headers and one of the other ones is syntactically correct). <a
-  href="#refsHTTP">[HTTP]</a></p>
-
-  <p>For resources fetched from the file system, user agents should use
-  platform-specific conventions, e.g. operating system extension/type
-  mappings.</p>
-
-  <p>Extensions must not be used for determining resource types for
-  resources fetched over HTTP.</p>
-
-  <p>For resources fetched over most other protocols, e.g. FTP, there
-  is no type information.</p>
-
-
   <p>The <dfn>algorithm for extracting an encoding from a
-  Content-Type</dfn>, given a string <var title="">s</var>, is as
-  follows. It either returns an encoding or nothing.</p>
+  Content-Type</dfn>, given a string <var title="">s</var>, is given
+  in the Content-Type Processing Model specification. It either
+  returns an encoding or nothing. <a
+  href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
-  <ol>
-
-   <li><p>Find the first seven characters in <var title="">s</var>
-   that are an <span>ASCII case-insensitive</span> match for the word
-   "charset". If no such match is found, return nothing.</p>
-
-   <li><p>Skip any U+0009, U+000A, U+000C, U+000D, or U+0020
-   characters that immediately follow the word 'charset' (there might
-   not be any).</p></li>
-
-   <li><p>If the next character is not a U+003D EQUALS SIGN ('='),
-   return nothing.</p></li>
-
-   <li><p>Skip any U+0009, U+000A, U+000C, U+000D, or U+0020
-   characters that immediately follow the equals sign (there might not
-   be any).</p></li>
-
-   <li><p>Process the next character as follows:</p>
-
-    <dl class="switch">
-
-     <dt>If it is a U+0022 QUOTATION MARK ('"') and there is a later
-     U+0022 QUOTATION MARK ('"') in <var title="">s</var></dt>
-
-     <dt>If it is a U+0027 APOSTROPHE ("'") and there is a later
-     U+0027 APOSTROPHE ("'") in  <var title="">s</var></dt>
-
-     <dd><p>Return the string between this character and the next
-     earliest occurrence of this character.</dd>
-
-
-     <dt>If it is an unmatched U+0022 QUOTATION MARK ('"')</dt>
-     <dt>If it is an unmatched U+0027 APOSTROPHE ("'")</dt>
-     <dt>If there is no next character</dt>
-
-     <dd><p>Return nothing.</dd>
-
-
-     <dt>Otherwise</dt>
-
-     <dd><p>Return the string from this character to the first U+0009,
-     U+000A, U+000C, U+000D, U+0020, or U+003B character or the end of
-     <var title="">s</var>, whichever comes first.</dd>
-
-    </dl>
-
-   </li>
-
-  </ol>
-
-  <p class="note">The above algorithm is a <span>willful
-  violation</span> of the HTTP specification, which requires that the
-  Content-Type headers be honored, despite implementation experience
-  showing that this is not practical in many cases. <a
-  href="#refsHTTP">[HTTP]</a></p>
-
-
-  <h4>Content-Type sniffing: Web pages</h4>
-
   <p>The <dfn title="Content-Type sniffing">sniffed type of a
-  resource</dfn> must be found as follows:</p>
+  resource</dfn> must be found in a manner consistent with the
+  requirements given in the Content-Type Processing Model
+  specification for finding that <i>sniffed type</i>. <a
+  href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
-  <ol>
+  <p>The <dfn title="Content-Type sniffing: image">rules for sniffing
+  images specifically</dfn> are also defined in the Content-Type
+  Processing Model specification. <a
+  href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
-   <li><p>If the user agent is configured to strictly obey
-   Content-Type headers for this resource, then jump to the last step
-   in this set of steps.</p></li>
+  <p class="warning">It is imperative that the rules in the
+  Content-Type Processing Model specification be followed
+  exactly. When a user agent uses different heuristics for content
+  type detection than the server expects, security problems can
+  occur. For more details, see the Content-Type Processing Model
+  specification. <a href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
-   <li><p>If the resource was fetched over an HTTP protocol and there
-   is an HTTP Content-Type header and the value of the first such
-   header has bytes that exactly match one of the following lines:</p>
-
-    <table>
-     <thead>
-      <tr>
-       <th>Bytes in Hexadecimal
-       <th>Textual representation
-     <tbody>
-      <tr> <!-- Very old Apache default -->
-       <td>74 65 78 74 2f 70 6c 61 69 6e
-       <td><code title="">text/plain</code>
-      <tr> <!-- Old Apache default -->
-       <td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 49 53 4f 2d 38 38 35 39 2d 31
-       <td><code title="">text/plain; charset=ISO-8859-1</code>
-      <tr> <!-- Debian's arbitrarily different Apache default -->
-       <td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 69 73 6f 2d 38 38 35 39 2d 31
-       <td><code title="">text/plain; charset=iso-8859-1</code>
-      <tr> <!-- Someone else's arbitrarily different Apache default (who?) -->
-       <td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 55 54 46 2d 38
-       <td><code title="">text/plain; charset=UTF-8</code>
-    </table>
-
-    <p>...then jump to the <i title="content-type sniffing: text or
-    binary">text or binary</i> section below.</p>
-
-    <!-- while IE sniffs all text/plain, this will continue to grow as
-    people add new defaults. Hopefully IE will stop the madness in due
-    course and stop sniffing anything but the above... -->
-
-   </li>
-
-   <li><p>Let <var title="">official type</var> be the type given by
-   the <span title="Content-Type">Content-Type metadata</span> for the
-   resource, ignoring parameters. If there is no such type, jump to
-   the <i title="content-type sniffing: unknown type">unknown type</i>
-   step below. Comparisons with this type, as defined by MIME
-   specifications, are done in an <span>ASCII case-insensitive</span>
-   manner. <a href="#refsRFC2046">[RFC2046]</a></p></li>
-
-   <li><p>If <var title="">official type</var> is "unknown/unknown" or
-   "application/unknown", jump to the <i title="content-type sniffing:
-   unknown type">unknown type</i> step below.</p> <!-- In a study
-   looking at many billions of pages whose first five characters were
-   "<HTML", "unknown/unknown" was used to label documents about once
-   for every 5000 pages labeled "text/html", and "application/unknown"
-   was used about once for every 35000 pages labeled
-   "text/html". --></li>
-
-   <li><p>If <var title="">official type</var> ends in "+xml", or if
-   it is either "text/xml" or "application/xml", then the sniffed
-   type of the resource is <var title="">official type</var>; return
-   that and abort these steps.</p></li>
-
-   <li><p>If <var title="">official type</var> is an image type
-   supported by the user agent (e.g. "image/png", "image/gif",
-   "image/jpeg", etc), then jump to the <i title="content-type
-   sniffing: image">images</i> section below, passing it the <var
-   title="">official type</var>.</p></li>
-
-   <li><p>If <var title="">official type</var> is "text/html", then
-   jump to the <i title="content-type sniffing: feed or html">feed or
-   HTML</i> section below.</p></li>
-
-   <li><p>The sniffed type of the resource is <var title="">official
-   type</var>.</p></li>
-
-  </ol>
-
-
-  <h4><dfn>Content-Type sniffing: text or binary</dfn></h4>
-
-  <ol>
-
-   <li><p>The user agent may wait for 512 or more bytes of the resource
-   to be available.</p></li>
-
-   <li><p>Let <var title="">n</var> be the smaller of either 512 or
-   the number of bytes already available.</p></li>
-
-   <li>
-
-    <p>If <var title="">n</var> is 4 or more, and the first bytes of
-    the resource match one of the following byte sets:</p>
-
-    <!-- this table is present in several forms in this file; keep them in sync -->
-    <table>
-     <thead>
-      <tr>
-       <th>Bytes in Hexadecimal
-       <th>Description
-     <tbody>
-      <tr>
-       <td>FE FF
-       <td>UTF-16BE BOM <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
-      <tr>
-       <td>FF FE
-       <td>UTF-16LE BOM <!-- followed by a character -->
-<!-- nobody uses this
-      <tr>
-       <td>00 00 FE FF
-       <td>UTF-32BE BOM
--->
-<!-- this one is redundant with the one above
-      <tr>
-       <td>FF FE 00 00
-       <td>UTF-32LE BOM
--->
-      <tr>
-       <td>EF BB BF
-       <td>UTF-8 BOM <!-- followed by a character, or the first byte of a multiple character sequence -->
-<!-- nobody uses this
-      <tr>
-       <td>DD 73 66 73
-       <td>UTF-EBCDIC BOM
--->
-    </table>
-
-    <p>...then the sniffed type of the resource is "text/plain". Abort
-    these steps.</p>
-
-   </li>
-
-   <li><p>If none of the first <var title="">n</var> bytes of the
-   resource are <span>binary data bytes</span> then the sniffed type
-   of the resource is "text/plain". Abort these steps.</p></li>
-
-   <li>
-
-    <p>If the first bytes of the resource match one of the byte
-    sequences in the "pattern" column of the table in the <i
-    title="content-type sniffing: unknown type">unknown type</i>
-    section below, ignoring any rows whose cell in the "security"
-    column says "scriptable" (or "n/a"), then the sniffed type of the
-    resource is the type given in the corresponding cell in the
-    "sniffed type" column on that row; abort these steps.</p>
-
-    <p class="warning">It is critical that this step never return a
-    scriptable type (e.g. text/html), as that would allow a
-    privilege escalation attack.</p>
-
-   </li>
-
-   <li><p>Otherwise, the sniffed type of the resource is
-   "application/octet-stream".</p></li>
-
-  </ol>
-
-  <p>Bytes covered by the following ranges are <dfn>binary data
-  bytes</dfn>:</p>
-
-  <!-- This byte list is based on RFC 2046 Section 4.1.2. Characters
-  in the range 0x00-0x1F, with the exception of 0x09, 0x0A, 0x0C, 0x0D
-  (ASCII for TAB, LF, FF, and CR), and character 0x1B (reportedly used
-  by some encodings as a shift escape), are invalid. Thus, if we see
-  them, we assume it's not text. -->
-
-  <ul class="brief">
-   <li> 0x00 - 0x08 </li>
-   <li> 0x0B </li>
-   <li> 0x0E - 0x1A </li>
-   <li> 0x1C - 0x1F </li>
-  </ul>
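
The removed "text or binary" steps can be sketched roughly as follows. This is an illustrative Python reading of the text above, not code from the spec or from any user agent; the function and constant names are hypothetical, and the signature list is an assumed transcription of the rows marked "Safe" in the unknown type table.

```python
# Hypothetical names; only the byte values are taken from the removed text.
BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")  # UTF-16BE, UTF-16LE, UTF-8

# Binary data bytes: 0x00-0x08, 0x0B, 0x0E-0x1A, 0x1C-0x1F.
BINARY_DATA_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B]
    + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)

# Only the rows whose "security" cell says "Safe"; scriptable and n/a rows
# are deliberately left out so this step can never return text/html.
SAFE_SIGNATURES = (
    (b"%!PS-Adobe-", "application/postscript"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
    (b"\x00\x00\x01\x00", "image/vnd.microsoft.icon"),
)


def sniff_text_or_binary(resource: bytes) -> str:
    head = resource[:512]          # n is the smaller of 512 or what is available
    n = len(head)
    if n >= 4 and head.startswith(BOMS):
        return "text/plain"
    if not any(byte in BINARY_DATA_BYTES for byte in head):
        return "text/plain"
    for signature, mime_type in SAFE_SIGNATURES:
        if head.startswith(signature):
            return mime_type
    return "application/octet-stream"
```

Because every "Safe" row uses an all-FF mask, a plain prefix comparison is enough here; the masked comparison is only needed for the scriptable rows that this step ignores.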
-
-
-
-  <h4><dfn>Content-Type sniffing: unknown type</dfn></h4>
-
-  <ol>
-
-   <li><p>The user agent may wait for 512 or more bytes of the
-   resource to be available.</p></li>
-
-   <li><p>Let <var title="">stream length</var> be the smaller of
-   either 512 or the number of bytes already available.</p></li>
-
-   <li><p>For each row in the table below:</p>
-
-    <dl class="switch">
-
-     <dt>If the row has no "<em>WS</em>" bytes:</dt>
-
-     <dd>
-
-      <ol>
-
-       <li>Let <var title="">pattern length</var> be the length of the
-       pattern (number of bytes described by the cell in the second
-       column of the row).</li>
-
-       <li>If <var title="">stream length</var> is smaller than <var
-       title="">pattern length</var> then skip this row.</li>
-
-       <li>Apply the "and" operator to the first <var title="">pattern
-       length</var> bytes of the resource and the given mask (the
-       bytes in the cell of first column of that row), and let the
-       result be the <var title="">data</var>.</li>
-
-       <li>If the bytes of the <var title="">data</var> matches the
-       given pattern bytes exactly, then the sniffed type of the
-       resource is the type given in the cell of the third column in
-       that row; abort these steps.</li>
-
-      </ol>
-
-     </dd>
-
-     <dt>If the row has a "<em>WS</em>" byte:</dt>
-
-     <dd>
-
-      <ol>
-
-       <li><p>Let <var title="">index<sub>pattern</sub></var> be an
-       index into the mask and pattern byte strings of the
-       row.</p></li>
-
-       <li><p>Let <var title="">index<sub>stream</sub></var> be an
-       index into the byte stream being examined.</p></li>
-
-       <li><p><em>Loop</em>: If <var
-       title="">index<sub>stream</sub></var> points beyond the end of
-       the byte stream, then this row doesn't match, skip this
-       row.</p></li>
-
-       <li>
-
-        <p>Examine the <var title="">index<sub>stream</sub></var>th
-        byte of the byte stream as follows:</p>
-
-        <dl class="switch">
-
-         <dt>If the <var title="">index<sub>pattern</sub></var>th byte
-         of the pattern is a normal hexadecimal byte and not a "<em>WS</em>"
-         byte:</dt>
-
-         <dd>
-
-          <p>If the "and" operator, applied to the <var
-          title="">index<sub>stream</sub></var>th byte of the stream
-          and the <var title="">index<sub>pattern</sub></var>th byte
-          of the mask, yields a value different from the <var
-          title="">index<sub>pattern</sub></var>th byte of the
-          pattern, then skip this row.</p>
-
-          <p>Otherwise, increment <var
-          title="">index<sub>pattern</sub></var> to the next byte in
-          the mask and pattern and <var
-          title="">index<sub>stream</sub></var> to the next byte in
-          the byte stream.</p>
-
-         </dd>
-
-         <dt>Otherwise, if the <var
-         title="">index<sub>pattern</sub></var>th byte of the pattern
-         is a "<em>WS</em>" byte:</dt>
-
-         <dd>
-
-          <p>"<em>WS</em>" means "whitespace", and allows insignificant
-          whitespace to be skipped when sniffing for a type
-          signature.</p>
-
-          <p>If the <var title="">index<sub>stream</sub></var>th byte
-          of the stream is one of 0x09 (ASCII TAB), 0x0A (ASCII LF),
-          0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space),
-          then increment only the <var
-          title="">index<sub>stream</sub></var> to the next byte in
-          the byte stream.</p>
-
-          <p>Otherwise, increment only the <var
-          title="">index<sub>pattern</sub></var> to the next byte in
-          the mask and pattern.</p>
-
-         </dd>
-
-        </dl>
-
-       </li>
-
-       <li><p>If <var title="">index<sub>pattern</sub></var> does not
-       point beyond the end of the mask and pattern byte strings, then
-       jump back to the <em>loop</em> step in this algorithm.</p></li>
-
-       <li><p>Otherwise, the sniffed type of the resource is the type
-       given in the cell of the third column in that row; abort these
-       steps.</p></li>
-
-      </ol>
-
-     </dd>
-
-    </dl>
-
-   </li>
-
-   <li><p>If none of the first <var title="">stream length</var>
-   bytes of the resource are <span>binary data bytes</span> then the
-   sniffed type of the resource is "text/plain". Abort these steps.</p></li>
-
-   <li><p>Otherwise, the sniffed type of the resource is
-   "application/octet-stream".</p></li>
-
-  </ol>
-
-  <p>The table used by the above algorithm is:</p>
-
-  <table>
-   <thead>
-    <tr>
-     <th colspan="2">Bytes in Hexadecimal
-     <th rowspan="2">Sniffed type
-     <th rowspan="2">Security
-     <th rowspan="2">Comment
-    <tr>
-     <th>Mask
-     <th>Pattern
-   <tbody>
-    <tr>
-     <td>FF FF DF DF DF DF DF DF DF FF DF DF DF DF
-     <td>3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C <!-- "<!DOCTYPE HTML" --> <!-- common in static data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title="">&lt;!DOCTYPE HTML</code>" in US-ASCII or compatible encodings, case-insensitively.
-    <tr>
-     <td>FF FF DF DF DF DF
-     <td><em>WS</em> 3C 48 54 4D 4C <!-- "<HTML" --> <!-- common in static data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title="">&lt;HTML</code>" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
-    <tr>
-     <td>FF FF DF DF DF DF
-     <td><em>WS</em> 3C 48 45 41 44 <!-- "<HEAD" --> <!-- common in static data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title="">&lt;HEAD</code>" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
-    <tr>
-     <td>FF FF DF DF DF DF DF DF
-     <td><em>WS</em> 3C 53 43 52 49 50 54 <!-- "<SCRIPT" --> <!-- common in dynamic data -->
-     <td>text/html
-     <td>Scriptable
-     <td>The string "<code title="">&lt;SCRIPT</code>" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
-    <tr>
-     <td>FF FF FF FF FF
-     <td>25 50 44 46 2D <!-- "%PDF-" (from http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#321) -->
-     <td>application/pdf
-     <td>Scriptable
-     <td>The string "<code title="">%PDF-</code>", the PDF signature.
-    <tr>
-     <td>FF FF FF FF FF FF FF FF FF FF FF
-     <td>25 21 50 53 2D 41 64 6F 62 65 2D <!-- "%!PS-Adobe-" (from http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#321) -->
-     <td>application/postscript
-     <td>Safe
-     <td>The string "<code title="">%!PS-Adobe-</code>", the PostScript signature.
-
-   <!-- copied from the text or binary section above -->
-   <tbody>
-    <tr>
-     <td>FF FF 00 00
-     <td>FE FF 00 00
-     <td>text/plain
-     <td>n/a
-     <td>UTF-16BE BOM <!-- followed by at least one character -->
-    <tr>
-     <td>FF FF 00 00
-     <td>FF FE 00 00
-     <td>text/plain
-     <td>n/a
-     <td>UTF-16LE BOM <!-- followed by at least one character -->
-    <tr>
-     <td>FF FF FF 00
-     <td>EF BB BF 00
-     <td>text/plain
-     <td>n/a
-     <td>UTF-8 BOM <!-- followed by at least one character -->
-
-   <!-- based on the table in the image section below -->
-   <tbody>
-    <tr>
-     <td>FF FF FF FF FF FF
-     <td>47 49 46 38 37 61 <!-- GIF87a -->
-     <td>image/gif
-     <td>Safe
-     <td>The string "<code title="">GIF87a</code>", a GIF signature.
-    <tr>
-     <td>FF FF FF FF FF FF
-     <td>47 49 46 38 39 61 <!-- GIF89a -->
-     <td>image/gif
-     <td>Safe
-     <td>The string "<code title="">GIF89a</code>", a GIF signature.
-    <tr>
-     <td>FF FF FF FF FF FF FF FF
-     <td>89 50 4E 47 0D 0A 1A 0A <!-- [TAB]PNG[CR][LF][EOF][LF]; 137 80 78 71 13 10 26 10 -->
-     <td>image/png
-     <td>Safe
-     <td>The PNG signature.
-    <tr>
-     <td>FF FF FF
-     <td>FF D8 FF <!-- SOI marker followed by the first byte of another marker -->
-     <td>image/jpeg
-     <td>Safe
-     <td>A JPEG SOI marker followed by the first byte of another marker.
-    <tr>
-     <td>FF FF
-     <td>42 4D
-     <td>image/bmp
-     <td>Safe
-     <td>The string "<code title="">BM</code>", a BMP signature.
-    <tr>
-     <td>FF FF FF FF
-     <td>00 00 01 00
-     <td>image/vnd.microsoft.icon
-     <td>Safe
-     <td>A 0 word followed by a 1 word, a Windows Icon file format signature.
-
-  </table>
-
-  <p class="XXX">I'd like to add types like MPEG, AVI, Flash,
-  Java, etc, to the above table.</p>
-
-  <p>User agents may support further types if desired, by implicitly
-  adding to the above table. However, user agents should not use any
-  other patterns for types already mentioned in the table above, as
-  this could then be used for privilege escalation (where, e.g., a
-  server uses the above table to determine that content is not HTML
-  and thus safe from XSS attacks, but then a user agent detects it as
-  HTML anyway and allows script to execute).</p>
-
-  <p>The column marked "security" is used by the algorithm in the
-  "text or binary" section, to avoid sniffing <code
-  title="">text/plain</code> content as a type that can be used for a
-  privilege escalation attack.</p>
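
The mask-and-pattern matching described in the removed "unknown type" steps, including the "WS" handling, might look like this in Python. It is a sketch under stated assumptions: `WS`, `row_matches`, `ROWS`, and `sniff_rows` are hypothetical names, only three rows of the table are transcribed, and the final text/plain versus application/octet-stream fallback is left out.

```python
WS = None  # hypothetical stand-in for a "WS" entry in the pattern column
WHITESPACE = frozenset((0x09, 0x0A, 0x0C, 0x0D, 0x20))  # TAB, LF, FF, CR, space


def row_matches(stream: bytes, mask: bytes, pattern: list) -> bool:
    """Return True if the start of `stream` matches one table row."""
    if WS not in pattern:
        if len(stream) < len(pattern):
            return False  # stream shorter than the pattern: skip this row
        data = bytes(b & m for b, m in zip(stream, mask))
        return data == bytes(pattern)

    index_pattern = index_stream = 0
    while index_pattern < len(pattern):
        if index_stream >= len(stream):
            return False  # ran out of bytes: this row does not match
        if pattern[index_pattern] is WS:
            if stream[index_stream] in WHITESPACE:
                index_stream += 1   # skip insignificant whitespace
            else:
                index_pattern += 1  # whitespace run is over
        else:
            if (stream[index_stream] & mask[index_pattern]) != pattern[index_pattern]:
                return False
            index_pattern += 1
            index_stream += 1
    return True


# Three rows copied from the table above; the rest are omitted for brevity.
ROWS = (
    (bytes.fromhex("FF FF DF DF DF DF DF DF DF FF DF DF DF DF"),
     list(b"<!DOCTYPE HTML"), "text/html"),
    (bytes.fromhex("FF FF DF DF DF DF"), [WS] + list(b"<HTML"), "text/html"),
    (bytes.fromhex("FF FF FF FF FF"), list(b"%PDF-"), "application/pdf"),
)


def sniff_rows(resource: bytes):
    head = resource[:512]  # stream length is at most 512 bytes
    for mask, pattern, mime_type in ROWS:
        if row_matches(head, mask, pattern):
            return mime_type
    return None  # caller would fall back to text/plain or octet-stream
```

The DF mask bytes clear bit 0x20, which is what makes the ASCII letter comparison case-insensitive: `sniff_rows(b"  \n<html>")` and `sniff_rows(b"<HTML>")` both reach the second row.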
-
-
-  <h4><dfn>Content-Type sniffing: image</dfn></h4>
-
-  <p>If the resource's <var title="">official type</var> is
-  "image/svg+xml", then the sniffed type of the resource is its <var
-  title="">official type</var> (an XML type).</p>
-
-  <p>Otherwise, if the first bytes of the resource match one of the
-  byte sequences in the first column of the following table, then the
-  sniffed type of the resource is the type given in the corresponding
-  cell in the second column on the same row:</p>
-
-  <table>
-   <thead>
-    <tr>
-     <th>Bytes in Hexadecimal
-     <th>Sniffed type
-     <th>Comment
-
-   <!-- update the table above if you change this! -->
-   <tbody>
-    <tr>
-     <td>47 49 46 38 37 61 <!-- GIF87a -->
-     <td>image/gif
-     <td>The string "<code title="">GIF87a</code>", a GIF signature.
-    <tr>
-     <td>47 49 46 38 39 61 <!-- GIF89a -->
-     <td>image/gif
-     <td>The string "<code title="">GIF89a</code>", a GIF signature.
-    <tr>
-     <td>89 50 4E 47 0D 0A 1A 0A <!-- [TAB]PNG[CR][LF][EOF][LF]; 137 80 78 71 13 10 26 10 -->
-     <td>image/png
-     <td>The PNG signature.
-    <tr>
-     <td>FF D8 FF <!-- SOI marker followed by the first byte of another marker -->
-     <td>image/jpeg
-     <td>A JPEG SOI marker followed by the first byte of another marker.
-    <tr>
-     <td>42 4D
-     <td>image/bmp
-     <td>The string "<code title="">BM</code>", a BMP signature.
-    <tr>
-     <td>00 00 01 00
-     <td>image/vnd.microsoft.icon
-     <td>A 0 word followed by a 1 word, a Windows Icon file format signature.
-    <!-- XXX Mozilla also detects ART (AOL proprietary format) and Windows Cursor files -->
-  </table>
-
-  <p>Otherwise, the sniffed type of the resource is the same as
-  its <var title="">official type</var>.</p>
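
As a rough illustration of the removed image rules, the lookup reduces to a prefix comparison against the table, with the official type used as the fallback. The names below are hypothetical and the sketch assumes the caller has already decided that the official type is a supported image type.

```python
# Signatures transcribed from the image table above (hypothetical constant name).
IMAGE_SIGNATURES = (
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
    (b"\x00\x00\x01\x00", "image/vnd.microsoft.icon"),
)


def sniff_image(official_type: str, resource: bytes) -> str:
    if official_type == "image/svg+xml":
        return official_type  # XML types are never re-sniffed
    for signature, mime_type in IMAGE_SIGNATURES:
        if resource.startswith(signature):
            return mime_type
    return official_type  # no signature matched: trust the label
```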
-
-
-  <h4><dfn>Content-Type sniffing: feed or HTML</dfn></h4>
-  <!-- mostly based on:
-   http://blogs.msdn.com/rssteam/articles/PublishersGuide.aspx
-   http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#192
-   http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#127
-  -->
-
-  <ol>
-
-   <li><p>The user agent may wait for 512 or more bytes of the
-   resource to be available.</p></li>
-
-   <li><p>Let <var title="">s</var> be the stream of bytes, and let
-   <span title=""><var title="">s</var>[<var title="">i</var>]</span>
-   represent the byte in <var title="">s</var> with position <var
-   title="">i</var>, treating <var title="">s</var> as zero-indexed
-   (so the first byte is at <span title=""><var
-   title="">i</var>=0</span>).</p></li>
-
-   <li><p>If at any point this algorithm requires the user agent to
-   determine the value of a byte in <var title="">s</var> which is not
-   yet available, or which is past the first 512 bytes of the
-   resource, or which is beyond the end of the resource, the user
-   agent must stop this algorithm, and assume that the sniffed type of
-   the resource is "text/html".</p>
-
-   <p class="note">User agents are allowed, by the first step of this
-   algorithm, to wait until the first 512 bytes of the resource are
-   available.</p></li>
-
-   <li><p>Initialize <var title="">pos</var> to 0.</p></li>
-
-   <li><p>If <span title=""><var title="">s</var>[0]</span> is 0xEF,
-   <span title=""><var title="">s</var>[1]</span> is 0xBB, and <span title=""><var
-   title="">s</var>[2]</span> is 0xBF, then set <var
-   title="">pos</var> to 3. (This skips over a leading UTF-8 BOM, if
-   any.)</p></li>
-
-   <li><p><i>Loop start:</i> Examine <span title=""><var title="">s</var>[<var
-   title="">pos</var>]</span>.</p>
-
-   <dl class="switch">
-
-    <!-- skip whitespace (S token as defined in XML 1.0 section 2.3; production [3] -->
-    <dt>If it is 0x09 (ASCII tab), 0x20 (ASCII space), 0x0A (ASCII LF), or 0x0D (ASCII CR)</dt>
-    <dd>Increase <var title="">pos</var> by 1 and repeat this step.</dd>
-
-    <dt>If it is 0x3C (ASCII "<code title="">&lt;</code>")</dt>
-    <dd>Increase <var title="">pos</var> by 1 and go to the next step.</dd>
-
-    <dt>If it is anything else</dt>
-    <dd>The sniffed type of the resource is "text/html". Abort these
-    steps.</dd>
-
-   </dl>
-
-   </li>
-
-   <li><p>If the bytes with positions <var title="">pos</var> to
-   <span title=""><var title="">pos</var>+2</span> in <var title="">s</var> are
-   exactly equal to 0x21, 0x2D, 0x2D respectively (ASCII for "<code
-   title="">!--</code>"), then:</p>
-
-    <ol>
-
-     <li>Increase <var title="">pos</var> by 3.</li> <!-- skips past the " ! - - " -->
-
-     <li>If the bytes with positions <span title=""><var
-     title="">pos</var></span> to <span title=""><var
-     title="">pos</var>+2</span> in <var title="">s</var> are exactly
-     equal to 0x2D, 0x2D, 0x3E respectively (ASCII for "<code
-     title="">--&gt;</code>"), then increase <var title="">pos</var>
-     by 3 and jump back to the previous step (the step labeled
-     <i>loop start</i>) in the overall algorithm in this section.</li>
-
-     <li>Otherwise, increase <var title="">pos</var> by 1.</li>
-
-     <li>Return to step 2 in these substeps.</li>
-
-    </ol>
-
-   </li>
-
-   <li><p>If <span title=""><var title="">s</var>[<var
-   title="">pos</var>]</span> is 0x21 (ASCII "<code
-   title="">!</code>"):</p>
-
-    <!-- this skips past a DOCTYPE if there is one. It is brain-dead
-    because we don't have to be clever to parse the Atom and RSS x.y
-    DOCTYPEs, as they don't do anything clever like have internal
-    subsets or quoted ">" characters. If this fails, then that's ok,
-    we'll treat it as HTML which is fine since we know it's not a feed
-    in that case. -->
-
-    <ol>
-
-     <li>Increase <var title="">pos</var> by 1.</li>
-
-     <li>If <span title=""><var title="">s</var>[<var
-     title="">pos</var>]</span> equals 0x3E, then increase <var
-     title="">pos</var> by 1 and jump back to the step labeled
-     <i>loop start</i> in the overall algorithm in this section.</li>
-
-     <li>Otherwise, return to step 1 in these substeps.</li>
-
-    </ol>
-
-   </li>
-
-   <li><p>If <span title=""><var title="">s</var>[<var
-   title="">pos</var>]</span> is 0x3F (ASCII "<code
-   title="">?</code>"):</p>
-
-    <ol>
-
-     <li>Increase <var title="">pos</var> by 1.</li>
-
-     <li>If <span title=""><var title="">s</var>[<var
-     title="">pos</var>]</span> and <span title=""><var
-     title="">s</var>[<var title="">pos</var>+1]</span> equal 0x3F and
-     0x3E respectively, then increase <var title="">pos</var> by 2 and
-     jump back to the step labeled <i>loop start</i> in the overall
-     algorithm in this section.</li>
-
-     <li>Otherwise, return to step 1 in these substeps.</li>
-
-    </ol>
-
-   </li>
-
-   <li><p>Otherwise, if the bytes in <var title="">s</var> starting at
-   <var title="">pos</var> match any of the sequences of bytes in the
-   first column of the following table, then the user agent must
-   follow the steps given in the corresponding cell in the second
-   column of the same row.</p>
-
-    <table>
-     <thead>
-      <tr>
-       <th>Bytes in Hexadecimal
-       <th>Requirement
-       <th>Comment
-
-     <tbody>
-      <tr>
-       <td>72 73 73
-       <td>The sniffed type of the resource is "application/rss+xml"; abort these steps
-       <td>The three ASCII characters "<code title="">rss</code>"
-      <tr>
-       <td>66 65 65 64
-       <td>The sniffed type of the resource is "application/atom+xml"; abort these steps
-       <td>The four ASCII characters "<code title="">feed</code>"
-      <tr>
-       <td>72 64 66 3A 52 44 46
-       <td>Continue to the next step in this algorithm
-       <td>The ASCII characters "<code title="">rdf:RDF</code>"
-    </table>
-
-    <p>If none of the byte sequences above match the bytes in <var
-    title="">s</var> starting at <var title="">pos</var>, then the
-    sniffed type of the resource is "text/html". Abort these
-    steps.</p>
-
-   </li>
-
-   <li><p class="XXX">If, before the next ">", you find two
-   xmlns* attributes with http://www.w3.org/1999/02/22-rdf-syntax-ns#
-   and http://purl.org/rss/1.0/ as the namespaces, then the sniffed
-   type of the resource is "application/rss+xml", abort these
-   steps. (maybe we only need to check for http://purl.org/rss/1.0/
-   actually)</p></li>
-
-   <li><p>Otherwise, the sniffed type of the resource is
-   "text/html".</p></li>
-
-  </ol>
-
-  <p class="note">For efficiency reasons, implementations may wish to
-  implement this algorithm and the algorithm for detecting the
-  character encoding of HTML documents in parallel.</p>
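
The removed "feed or HTML" steps could be sketched as below. This is one possible Python reading, with several assumptions: all names are hypothetical; any attempt to read past the first 512 bytes is modelled as an `IndexError` that yields "text/html"; the scan advances two bytes past a closing `?>` so that it resumes after the processing instruction; and the rdf:RDF step, which the text marks as unresolved, is approximated by a crude substring check for both namespace URIs before the next ">".

```python
RDF_NS = b"http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = b"http://purl.org/rss/1.0/"


def sniff_feed_or_html(resource: bytes) -> str:
    try:
        return _scan(resource[:512])
    except IndexError:
        # A needed byte was unavailable or beyond the first 512 bytes.
        return "text/html"


def _scan(s: bytes) -> str:
    pos = 3 if s.startswith(b"\xef\xbb\xbf") else 0  # skip a leading UTF-8 BOM
    while True:
        # Loop start: skip XML whitespace, then require "<".
        while s[pos] in b"\t \n\r":
            pos += 1
        if s[pos] != 0x3C:
            return "text/html"
        pos += 1

        if s[pos:pos + 3] == b"!--":  # comment: scan for "-->"
            pos += 3
            while not (s[pos] == 0x2D and s[pos + 1] == 0x2D and s[pos + 2] == 0x3E):
                pos += 1
            pos += 3
            continue

        if s[pos] == 0x21:  # "!": skip a DOCTYPE up to the next ">"
            pos += 1
            while s[pos] != 0x3E:
                pos += 1
            pos += 1
            continue

        if s[pos] == 0x3F:  # "?": skip a processing instruction up to "?>"
            pos += 1
            while not (s[pos] == 0x3F and s[pos + 1] == 0x3E):
                pos += 1
            pos += 2  # assumption: step past both "?" and ">"
            continue

        if s.startswith(b"rss", pos):
            return "application/rss+xml"
        if s.startswith(b"feed", pos):
            return "application/atom+xml"
        if s.startswith(b"rdf:RDF", pos):
            end = s.find(b">", pos)
            if end != -1:
                tag = s[pos:end]
                # Crude stand-in for "two xmlns* attributes with these namespaces".
                if RDF_NS in tag and RSS_NS in tag:
                    return "application/rss+xml"
        return "text/html"
```

Defaulting to "text/html" whenever the scan runs out of bytes mirrors the rule that the algorithm must stop and assume HTML when it cannot determine a byte's value.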
-
   </div>
 
+
   <div class="impl">
 
   <h3>Character encodings</h3>



