[html5] r8722 - [e] (0) Adjust notes on encoding detection Fixing https://www.w3.org/Bugs/Public [...]
whatwg at whatwg.org
Wed Aug 27 16:12:40 PDT 2014
Author: ianh
Date: 2014-08-27 16:12:36 -0700 (Wed, 27 Aug 2014)
New Revision: 8722
Modified:
complete.html
index
source
Log:
[e] (0) Adjust notes on encoding detection
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=25534
Affected topics: HTML Syntax and Parsing
Modified: complete.html
===================================================================
--- complete.html 2014-08-27 00:03:22 UTC (rev 8721)
+++ complete.html 2014-08-27 23:12:36 UTC (rev 8722)
@@ -291,7 +291,7 @@
</style><link rel=stylesheet href=status.css><body onload=init()>
<header id=head class="head with-buttons">
<p><a href=//www.whatwg.org/ class=logo><img src=/images/logo width=101 alt=WHATWG height=101></a></p>
- <hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>26 August 2014</span></h2></hgroup>
+ <hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>27 August 2014</span></h2></hgroup>
<nav>
<div>
@@ -71079,11 +71079,18 @@
encoding, then return that encoding, with the <a href=#concept-encoding-confidence id=determining-the-character-encoding:concept-encoding-confidence-8>confidence</a> <i>tentative</i>, and abort these steps.
<a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>
- <p class=note>The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
- bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
- while documents with byte sequences that do not match it are very likely not. User-agents are
- therefore encouraged to search for this common encoding. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
+ <p class=note>User agents are generally discouraged from attempting to autodetect encodings
+ for resources obtained over the network, since doing so involves inherently non-interoperable
+ heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
+ tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
+ with a lot of markup rather than with text content.</p>
+ <p class=note>The UTF-8 encoding has a highly detectable bit pattern. Files from the local
+ file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
+ very likely to be UTF-8, while documents with byte sequences that do not match it are very
+ likely not. When a user agent can examine the whole file, rather than just the preamble,
+ detecting for UTF-8 specifically can be especially effective. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
+
<li>
<p>Otherwise, return an implementation-defined or user-specified default character encoding,
Modified: index
===================================================================
--- index 2014-08-27 00:03:22 UTC (rev 8721)
+++ index 2014-08-27 23:12:36 UTC (rev 8722)
@@ -291,7 +291,7 @@
</style><link rel=stylesheet href=status.css><body onload=init()>
<header id=head class="head with-buttons">
<p><a href=//www.whatwg.org/ class=logo><img src=/images/logo width=101 alt=WHATWG height=101></a></p>
- <hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>26 August 2014</span></h2></hgroup>
+ <hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>27 August 2014</span></h2></hgroup>
<nav>
<div>
@@ -71079,11 +71079,18 @@
encoding, then return that encoding, with the <a href=#concept-encoding-confidence id=determining-the-character-encoding:concept-encoding-confidence-8>confidence</a> <i>tentative</i>, and abort these steps.
<a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>
- <p class=note>The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
- bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
- while documents with byte sequences that do not match it are very likely not. User-agents are
- therefore encouraged to search for this common encoding. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
+ <p class=note>User agents are generally discouraged from attempting to autodetect encodings
+ for resources obtained over the network, since doing so involves inherently non-interoperable
+ heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
+ tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
+ with a lot of markup rather than with text content.</p>
+ <p class=note>The UTF-8 encoding has a highly detectable bit pattern. Files from the local
+ file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
+ very likely to be UTF-8, while documents with byte sequences that do not match it are very
+ likely not. When a user agent can examine the whole file, rather than just the preamble,
+ detecting for UTF-8 specifically can be especially effective. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
+
<li>
<p>Otherwise, return an implementation-defined or user-specified default character encoding,
Modified: source
===================================================================
--- source 2014-08-27 00:03:22 UTC (rev 8721)
+++ source 2014-08-27 23:12:36 UTC (rev 8722)
@@ -95700,11 +95700,19 @@
data-x="concept-encoding-confidence">confidence</span> <i>tentative</i>, and abort these steps.
<ref spec=UNIVCHARDET></p>
- <p class="note">The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
- bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
- while documents with byte sequences that do not match it are very likely not. User-agents are
- therefore encouraged to search for this common encoding. <ref spec=PPUTF8> <ref spec=UTF8DET></p>
+ <p class="note">User agents are generally discouraged from attempting to autodetect encodings
+ for resources obtained over the network, since doing so involves inherently non-interoperable
+ heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
+ tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
+ with a lot of markup rather than with text content.</p>
+ <p class="note">The UTF-8 encoding has a highly detectable bit pattern. Files from the local
+ file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
+ very likely to be UTF-8, while documents with byte sequences that do not match it are very
+ likely not. When a user agent can examine the whole file, rather than just the preamble,
+ detecting for UTF-8 specifically can be especially effective. <ref spec=PPUTF8> <ref
+ spec=UTF8DET></p>
+
</li>
<li>
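
The revised note says that bytes above 0x7F which match the UTF-8 pattern are strong evidence of UTF-8, particularly when the whole file can be examined. A minimal sketch of such a check follows. It is not part of this commit or of the spec's algorithm: the function name looks_like_utf8 is hypothetical, and treating pure-ASCII input as "no evidence" is an assumption made for illustration. It relies on Python's strict UTF-8 decoder to validate the byte pattern.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data has non-ASCII bytes and all of it is valid UTF-8.

    Hypothetical helper for illustration only; a sketch of the idea in the
    note, not the spec's encoding sniffing algorithm.
    """
    # Assumption: pure-ASCII input carries no evidence either way, since
    # ASCII bytes are identical in UTF-8 and in most legacy encodings.
    if all(byte <= 0x7F for byte in data):
        return False
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, overlong forms, and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("café".encode("utf-8")))    # valid multi-byte sequence
    print(looks_like_utf8("café".encode("latin-1")))  # lone 0xE9 byte, not UTF-8
    print(looks_like_utf8(b"<!DOCTYPE html><html>"))  # ASCII-only markup
```

The ASCII-only case shows why the note calls preamble-based detection tricky: a prefix consisting only of markup contains no bytes above 0x7F, so a check like this has nothing to go on until it reaches text content.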