[html5] r2861 - [c] (0) Reword how we require that XML documents that use <meta charset> must us [...]

whatwg at whatwg.org
Mon Feb 23 04:57:55 PST 2009


Author: ianh
Date: 2009-02-23 04:57:55 -0800 (Mon, 23 Feb 2009)
New Revision: 2861

Modified:
   index
   source
Log:
[c] (0) Reword how we require that XML documents that use <meta charset> must use UTF-8. Also require it in the first 512 bytes.
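
For readers who want to see the two requirements named in this log message in concrete form, the following is a minimal sketch of a checker, not code from the specification or from any conformance tool. The function name check_meta_charset, the regular expression, and the message strings are all hypothetical. It assumes a rough byte-level scan is good enough for illustration; a real checker would use an HTML or XML parser.

```python
import re

# Hypothetical, simplified pattern for <meta ... charset=VALUE ...>.
# Caveat: it also matches "charset=" inside a content="" attribute, so it
# does not distinguish the charset attribute from the http-equiv form.
META_CHARSET = re.compile(
    rb"""<meta\b[^>]*?\bcharset\s*=\s*["']?([A-Za-z0-9_.:-]+)["']?[^>]*>""",
    re.IGNORECASE,
)


def check_meta_charset(data: bytes, is_xml: bool) -> list[str]:
    """Return a list of problems found (empty if none)."""
    problems = []
    matches = list(META_CHARSET.finditer(data))

    # "There must not be more than one meta element with a charset
    # attribute per document."
    if len(matches) > 1:
        problems.append("more than one meta charset")

    for m in matches:
        # The declaration must be serialised completely within the
        # first 512 bytes of the document.
        if m.end() > 512:
            problems.append("declaration not within first 512 bytes")

        # In an XML document the value must be an ASCII case-insensitive
        # match for "UTF-8". bytes.lower() only folds ASCII letters.
        if is_xml and m.group(1).lower() != b"utf-8":
            problems.append("XML document: charset must be UTF-8")

    return problems
```

Calling check_meta_charset(data, is_xml=True) on the raw bytes of an XHTML file returns an empty list when the declaration satisfies both rules. The sketch does not verify that the bytes are actually valid UTF-8, which the reworded text also implies.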

Modified: index
===================================================================
--- index	2009-02-23 12:26:47 UTC (rev 2860)
+++ index	2009-02-23 12:57:55 UTC (rev 2861)
@@ -9145,14 +9145,18 @@
   also be specified. Otherwise, it must be omitted.</p>
 
   <p>The <dfn id=attr-meta-charset title=attr-meta-charset><code>charset</code></dfn>
-  attribute specifies the character encoding used by the document. In
-  <a href=#html5 title=HTML5>HTML documents</a> this is a <a href=#character-encoding-declaration>character
-  encoding declaration</a>. If the attribute is present in an <a href=#xhtml5 title=XHTML>XML document</a>, its value must be an <a href=#ascii-case-insensitive>ASCII
-  case-insensitive</a> match for the string "<code title="">UTF-8</code>", and the resource must be encoded using the
-  UTF-8 character encoding. (The element has no effect in XML
-  documents, and is only allowed to facilitate migration to and from
-  XHTML.)</p>
+  attribute specifies the character encoding used by the
+  document. This is a <a href=#character-encoding-declaration>character encoding declaration</a>. If
+  the attribute is present in an <a href=#xhtml5 title=XHTML>XML
+  document</a>, its value must be an <a href=#ascii-case-insensitive>ASCII
+  case-insensitive</a> match for the string "<code title="">UTF-8</code>" (and the document is therefore required to
+  use UTF-8 as its encoding).</p>
 
+  <p class=note>The <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code>
+  attribute on the <code><a href=#meta>meta</a></code> element has no effect in XML
+  documents, and is only allowed in order to facilitate migration to
+  and from XHTML.</p>
+
   <p>There must not be more than one <code><a href=#meta>meta</a></code> element with a
   <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute per
   document.</p>
@@ -9645,7 +9649,9 @@
 
   <!-- XXX maybe the rest should move to "writing html" section,
   though if we do then we have to duplicate the requirements in the
-  parsing section for conformance checkers -->
+  parsing section for conformance checkers, and we have to make sure
+  that the requirements for charset="" apply even in XML, for the
+  <meta charset=""> polyglot hack -->
 
   <p>A <dfn id=character-encoding-declaration>character encoding declaration</dfn> is a mechanism by
   which the character encoding used to store or transmit a document is
@@ -9669,16 +9675,18 @@
    declaration must be serialised completely within the first 512
    bytes of the document.</li>
 
-  </ul><p>If the document does not start with a BOM, and if its encoding is
-  not explicitly given by <a href=#content-type-0 title=Content-Type>Content-Type
-  metadata</a>, then the character encoding used must be an
-  <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>, and, in addition,
-  if that encoding isn't US-ASCII itself, then the encoding must be
-  specified using a <code><a href=#meta>meta</a></code> element with a <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute or a
+  </ul><p>If an <a href=#html-documents title="HTML documents">HTML document</a> does not
+  start with a BOM, and if its encoding is not explicitly given by
+  <a href=#content-type-0 title=Content-Type>Content-Type metadata</a>, then the
+  character encoding used must be an <a href=#ascii-compatible-character-encoding>ASCII-compatible character
+  encoding</a>, and, in addition, if that encoding isn't US-ASCII
+  itself, then the encoding must be specified using a
+  <code><a href=#meta>meta</a></code> element with a <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute or a
   <code><a href=#meta>meta</a></code> element in the <a href=#attr-meta-http-equiv-content-type title=attr-meta-http-equiv-content-type>Encoding declaration
   state</a>.</p>
 
-  <p>If the document contains a <code><a href=#meta>meta</a></code> element with a <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute or a
+  <p>If an <a href=#html-documents title="HTML documents">HTML document</a> contains
+  a <code><a href=#meta>meta</a></code> element with a <code title=attr-meta-charset><a href=#attr-meta-charset>charset</a></code> attribute or a
   <code><a href=#meta>meta</a></code> element in the <a href=#attr-meta-http-equiv-content-type title=attr-meta-http-equiv-content-type>Encoding declaration
   state</a>, then the character encoding used must be an
   <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>.</p>

Modified: source
===================================================================
--- source	2009-02-23 12:26:47 UTC (rev 2860)
+++ source	2009-02-23 12:57:55 UTC (rev 2861)
@@ -9488,16 +9488,19 @@
   also be specified. Otherwise, it must be omitted.</p>
 
   <p>The <dfn title="attr-meta-charset"><code>charset</code></dfn>
-  attribute specifies the character encoding used by the document. In
-  <span title="HTML5">HTML documents</span> this is a <span>character
-  encoding declaration</span>. If the attribute is present in an <span
-  title="XHTML">XML document</span>, its value must be an <span>ASCII
+  attribute specifies the character encoding used by the
+  document. This is a <span>character encoding declaration</span>. If
+  the attribute is present in an <span title="XHTML">XML
+  document</span>, its value must be an <span>ASCII
   case-insensitive</span> match for the string "<code
-  title="">UTF-8</code>", and the resource must be encoded using the
-  UTF-8 character encoding. (The element has no effect in XML
-  documents, and is only allowed to facilitate migration to and from
-  XHTML.)</p>
+  title="">UTF-8</code>" (and the document is therefore required to
+  use UTF-8 as its encoding).</p>
 
+  <p class="note">The <code title="attr-meta-charset">charset</code>
+  attribute on the <code>meta</code> element has no effect in XML
+  documents, and is only allowed in order to facilitate migration to
+  and from XHTML.</p>
+
   <p>There must not be more than one <code>meta</code> element with a
   <code title="attr-meta-charset">charset</code> attribute per
   document.</p>
@@ -10081,7 +10084,9 @@
 
   <!-- XXX maybe the rest should move to "writing html" section,
   though if we do then we have to duplicate the requirements in the
-  parsing section for conformance checkers -->
+  parsing section for conformance checkers, and we have to make sure
+  that the requirements for charset="" apply even in XML, for the
+  <meta charset=""> polyglot hack -->
 
   <p>A <dfn>character encoding declaration</dfn> is a mechanism by
   which the character encoding used to store or transmit a document is
@@ -10110,18 +10115,20 @@
 
   </ul>
 
-  <p>If the document does not start with a BOM, and if its encoding is
-  not explicitly given by <span title="Content-Type">Content-Type
-  metadata</span>, then the character encoding used must be an
-  <span>ASCII-compatible character encoding</span>, and, in addition,
-  if that encoding isn't US-ASCII itself, then the encoding must be
-  specified using a <code>meta</code> element with a <code
+  <p>If an <span title="HTML documents">HTML document</span> does not
+  start with a BOM, and if its encoding is not explicitly given by
+  <span title="Content-Type">Content-Type metadata</span>, then the
+  character encoding used must be an <span>ASCII-compatible character
+  encoding</span>, and, in addition, if that encoding isn't US-ASCII
+  itself, then the encoding must be specified using a
+  <code>meta</code> element with a <code
   title="attr-meta-charset">charset</code> attribute or a
   <code>meta</code> element in the <span
   title="attr-meta-http-equiv-content-type">Encoding declaration
   state</span>.</p>
 
-  <p>If the document contains a <code>meta</code> element with a <code
+  <p>If an <span title="HTML documents">HTML document</span> contains
+  a <code>meta</code> element with a <code
   title="attr-meta-charset">charset</code> attribute or a
   <code>meta</code> element in the <span
   title="attr-meta-http-equiv-content-type">Encoding declaration



