[html5] r3882 - [e] (0) Mention and encourage UTF-8 detection specifically.

whatwg at whatwg.org
Thu Sep 17 15:42:13 PDT 2009


Author: ianh
Date: 2009-09-17 15:42:12 -0700 (Thu, 17 Sep 2009)
New Revision: 3882

Modified:
   entities-unicode.inc
   index
   source
Log:
[e] (0) Mention and encourage UTF-8 detection specifically.

/home/ianh/svn/webapps/hooks/commit-email.pl: `/usr/bin/svnlook diff /home/ianh/svn/webapps -r 3882' failed with this output:
Modified: entities-unicode.inc
===================================================================
--- entities-unicode.inc	2009-09-17 10:38:01 UTC (rev 3881)
+++ entities-unicode.inc	2009-09-17 22:42:12 UTC (rev 3882)
@@ -1661,7 +1661,6 @@
      <tr> <td> <code title="">rtriltri;</code> </td> <td> U+029CE </td> </tr>
      <tr> <td> <code title="">LeftTriangleBar;</code> </td> <td> U+029CF </td> </tr>
      <tr> <td> <code title="">RightTriangleBar;</code> </td> <td> U+029D0 </td> </tr>
-     <tr> <td> <code title="">race;</code> </td> <td> U+029DA </td> </tr>
      <tr> <td> <code title="">iinfin;</code> </td> <td> U+029DC </td> </tr>
      <tr> <td> <code title="">infintie;</code> </td> <td> U+029DD </td> </tr>
      <tr> <td> <code title="">nvinfin;</code> </td> <td> U+029DE </td> </tr>

Modified: index
===================================================================
--- index	2009-09-17 10:38:01 UTC (rev 3881)
+++ index	2009-09-17 22:42:12 UTC (rev 3882)
@@ -7348,6 +7348,7 @@
   purpose. Authors must not use elements, attributes, and attribute
   values that are not permitted by this specification or other
   applicable specifications.</p>
+  <!-- http://www.w3.org/mid/17E341CD-E790-422C-9F9A-69347EE01CEB@iki.fi -->
 
   <div class=example>
    <p>For example, the following document is non-conforming, despite
@@ -62031,12 +62032,23 @@
    visited, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
    <i>tentative</i>, and abort these steps.</li>
 
-   <li><p>The user agent may attempt to autodetect the character
-   encoding from applying frequency analysis or other algorithms to
-   the data stream. If autodetection succeeds in determining a
-   character encoding, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-   <i>tentative</i>, and abort these steps. <a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></li>
+   <li>
 
+    <p>The user agent may attempt to autodetect the character encoding
+    from applying frequency analysis or other algorithms to the data
+    stream. If autodetection succeeds in determining a character
+    encoding, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+    <i>tentative</i>, and abort these steps. <a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>
+
+    <p class=note>The UTF-8 encoding has a highly detectable bit
+    pattern. Documents that contain bytes with values greater than
+    0x7F which match the UTF-8 pattern are very likely to be UTF-8,
+    while documents with byte sequences that do not match it are very
+    likely not. User-agents are therefore encouraged to search for
+    this common encoding.</p>
+
+   </li>
+
    <li><p>Otherwise, return an implementation-defined or
    user-specified default character encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
    <i>tentative</i>. In non-legacy environments, the more
@@ -70270,7 +70282,6 @@
      <tr> <td> <code title="">rAtail;</code> </td> <td> U+0291C </td> </tr>
      <tr> <td> <code title="">rBarr;</code> </td> <td> U+0290F </td> </tr>
      <tr> <td> <code title="">rHar;</code> </td> <td> U+02964 </td> </tr>
-     <tr> <td> <code title="">race;</code> </td> <td> U+029DA </td> </tr>
      <tr> <td> <code title="">racute;</code> </td> <td> U+00155 </td> </tr>
      <tr> <td> <code title="">radic;</code> </td> <td> U+0221A </td> </tr>
      <tr> <td> <code title="">raemptyv;</code> </td> <td> U+029B3 </td> </tr>

Modified: source
===================================================================
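
The note added in this revision says that bytes above 0x7F which match the UTF-8 pattern are strong evidence of UTF-8, and that a mismatch is strong evidence against it. The check it describes can be sketched roughly as below. This is only an illustrative sketch, not the algorithm any user agent actually uses: the function name looks_like_utf8 is hypothetical, and it assumes the whole byte buffer is available (a buffer truncated in the middle of a multi-byte sequence would be reported as not UTF-8).

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data contains non-ASCII bytes and is valid UTF-8.

    Hypothetical helper for illustration only; it is not taken from the
    specification or from any browser implementation.
    """
    # Pure ASCII gives no evidence either way, so do not claim UTF-8.
    if not any(b > 0x7F for b in data):
        return False
    try:
        # Strict decoding rejects any byte sequence that does not match
        # the UTF-8 bit pattern (bad lead bytes, missing continuations,
        # overlong forms, surrogates).
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("café".encode("utf-8")))
    print(looks_like_utf8("café".encode("latin-1")))
    print(looks_like_utf8(b"plain ascii"))
```

A positive result would still be returned only with tentative confidence, consistent with the autodetection step in the diff above.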


