[html5] r1460 - /

whatwg at whatwg.org
Fri Apr 18 15:34:06 PDT 2008


Author: ianh
Date: 2008-04-18 15:34:05 -0700 (Fri, 18 Apr 2008)
New Revision: 1460

Modified:
   index
   source
Log:
[] (0) Define document.charset, .characterSet, .defaultCharset
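
A rough model of the getter and setter behaviour this revision defines, written in Python for illustration only. DocumentModel, ALIASES, character_set and default_charset are hypothetical names. The alias table is a small hand-picked subset rather than the IANA registry, and matching aliases case-insensitively is an assumption the diff does not state.

```python
# Hypothetical subset of alias -> preferred MIME name, for illustration.
# A real user agent would consult the full IANA character-sets registry.
ALIASES = {
    "utf-8": "UTF-8",
    "csutf8": "UTF-8",
    "utf-16": "UTF-16",
    "csutf16": "UTF-16",
    "iso-8859-1": "ISO-8859-1",
    "latin1": "ISO-8859-1",
    "csisolatin1": "ISO-8859-1",
}


class DocumentModel:
    """Toy stand-in for a Document, tracking only its character encoding."""

    def __init__(self, default_charset="UTF-8"):
        # Per the diff, a new Document starts out as UTF-16.
        self._encoding = "UTF-16"
        # defaultCharset may be any encoding name; this value is a placeholder.
        self._default = default_charset

    @property
    def charset(self):
        # Getter: preferred MIME name of the document's character encoding.
        return self._encoding

    @charset.setter
    def charset(self, value):
        preferred = ALIASES.get(value.lower())  # assumed case-insensitive
        if preferred is not None:
            self._encoding = preferred
        # Otherwise, nothing happens.

    @property
    def character_set(self):
        # Read-only view of the same value as charset.
        return self._encoding

    @property
    def default_charset(self):
        return self._default
```

Setting doc.charset to a name missing from the table leaves the encoding untouched, which mirrors the "(Otherwise, nothing happens.)" wording in the new text.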

Modified: index
===================================================================
--- index	2008-04-17 23:50:22 UTC (rev 1459)
+++ index	2008-04-18 22:34:05 UTC (rev 1460)
@@ -24,7 +24,7 @@
 
    <h1 id=html-5>HTML 5</h1>
 
-   <h2 class="no-num no-toc" id=working>Working Draft — 17 April 2008</h2>
+   <h2 class="no-num no-toc" id=working>Working Draft — 18 April 2008</h2>
 
    <p>You can take part in this work. <a
     href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -2598,6 +2598,9 @@
            attribute DOMString <a href="#cookie0" title=dom-document-cookie>cookie</a>;
   readonly attribute DOMString <a href="#lastmodified" title=dom-document-lastModified>lastModified</a>;
   readonly attribute DOMString <a href="#compatmode" title=dom-document-compatMode>compatMode</a>;
+           attribute DOMString <a href="#charset0" title=dom-document-charset>charset</a>;
+  readonly attribute DOMString <a href="#characterset" title=dom-document-characterSet>characterSet</a>;
+  readonly attribute DOMString <a href="#defaultcharset" title=dom-document-defaultCharset>defaultCharset</a>;
 
   // <a href="#dom-tree0">DOM tree accessors</a>
            attribute DOMString <a href="#document.title" title=dom-document-title>title</a>;
@@ -2642,9 +2645,6 @@
   DOMString <a href="#querycommandvalue" title=dom-document-queryCommandValue>queryCommandValue</a>(in DOMString commandId);
   <a href="#selection1">Selection</a> <a href="#getselection0" title=dom-document-getSelection>getSelection</a>();
 <!-- XXX we're not done here.
-          attribute DOMString charset;
- readonly attribute DOMString defaultCharset;
- readonly attribute DOMString characterSet;
  readonly attribute DOMString readyState;
  readonly attribute HTMLCollection scripts;
 -->
@@ -2806,6 +2806,35 @@
    </ul>
   </div>
 
+  <p>Documents have an associated <dfn id=character1 title="document's
+   character encoding">character encoding</dfn>. When a <code>Document</code>
+   object is created, the <a href="#character1">document's character
+   encoding</a> must be initialised to UTF-16. Various algorithms during page
+   loading affect this value, as does the <code title=dom-document-charset><a
+   href="#charset0">charset</a></code> setter. <a
+   href="#refsIANACHARSET">[IANACHARSET]</a> <!-- XXX
+  http://www.iana.org/assignments/character-sets -->
+
+  <p>The <dfn id=charset0
+   title=dom-document-charset><code>charset</code></dfn> DOM attribute must,
+   on getting, return the preferred MIME name of the <a
+   href="#character1">document's character encoding</a>. On setting, if the
+   new value is an IANA-registered alias for a character encoding, the <a
+   href="#character1">document's character encoding</a> must be set to that
+   character encoding. (Otherwise, nothing happens.)
+
+  <p>The <dfn id=characterset
+   title=dom-document-characterSet><code>characterSet</code></dfn> DOM
+   attribute must, on getting, return the preferred MIME name of the <a
+   href="#character1">document's character encoding</a>.
+
+  <p>The <dfn id=defaultcharset
+   title=dom-document-defaultCharset><code>defaultCharset</code></dfn> DOM
+   attribute must, on getting, return the preferred MIME name of a character
+   encoding, possibly the user's default encoding, or an encoding associated
+   with the user's current geographical location, or any arbitrary encoding
+   name.
+
   <h3 id=elements><span class=secno>2.2 </span>Elements</h3>
 
   <p>The nodes representing <a href="#html-elements">HTML elements</a> in the
@@ -7536,7 +7565,7 @@
    <dt>Contexts in which this element may be used:
 
    <dd>If the <code title=attr-meta-charset><a
-    href="#charset0">charset</a></code> attribute is present, or if the
+    href="#charset1">charset</a></code> attribute is present, or if the
     element is in the <a href="#encoding"
     title=attr-meta-http-equiv-content-type>Encoding declaraton state</a>: as
     the first element in a <code><a href="#head">head</a></code> element.
@@ -7571,7 +7600,7 @@
 
    <dd><code title=attr-meta-content><a href="#content0">content</a></code>
 
-   <dd><code title=attr-meta-charset><a href="#charset0">charset</a></code>
+   <dd><code title=attr-meta-charset><a href="#charset1">charset</a></code>
     (<a href="#html-" title="HTML documents">HTML</a> only)
 
    <dt>DOM interface:
@@ -7596,15 +7625,15 @@
    document-level metadata with the <code title=attr-meta-name><a
    href="#name">name</a></code> attribute, pragma directives with the <code
    title=attr-meta-http-equiv><a href="#http-equiv0">http-equiv</a></code>
-   attribute, and the file's <a href="#character1">character encoding
+   attribute, and the file's <a href="#character2">character encoding
    declaration</a> when an HTML document is serialised to string form (e.g.
    for transmission over the network or for disk storage) with the <code
-   title=attr-meta-charset><a href="#charset0">charset</a></code> attribute.
+   title=attr-meta-charset><a href="#charset1">charset</a></code> attribute.
 
   <p>Exactly one of the <code title=attr-meta-name><a
    href="#name">name</a></code>, <code title=attr-meta-http-equiv><a
    href="#http-equiv0">http-equiv</a></code>, and <code
-   title=attr-meta-charset><a href="#charset0">charset</a></code> attributes
+   title=attr-meta-charset><a href="#charset1">charset</a></code> attributes
    must be specified.
 
   <p>If either <code title=attr-meta-name><a href="#name">name</a></code> or
@@ -7613,15 +7642,15 @@
    title=attr-meta-content><a href="#content0">content</a></code> attribute
    must also be specified. Otherwise, it must be omitted.
 
-  <p>The <dfn id=charset0 title=attr-meta-charset><code>charset</code></dfn>
+  <p>The <dfn id=charset1 title=attr-meta-charset><code>charset</code></dfn>
    attribute specifies the character encoding used by the document. This is
-   called a <a href="#character1">character encoding declaration</a>.
+   called a <a href="#character2">character encoding declaration</a>.
 
-  <p>The <code title=attr-meta-charset><a href="#charset0">charset</a></code>
+  <p>The <code title=attr-meta-charset><a href="#charset1">charset</a></code>
    attribute may be specified in <a href="#html5" title=HTML5>HTML
    documents</a> only, it must not be used in <a href="#xhtml5"
    title=XHTML>XML documents</a>. If the <code title=attr-meta-charset><a
-   href="#charset0">charset</a></code> attribute is specified, the element
+   href="#charset1">charset</a></code> attribute is specified, the element
    must be the first element in <a href="#the-head0">the <code>head</code>
    element</a> of the file.
 
@@ -7892,7 +7921,7 @@
      user agent requirements are all handled by the parsing section of the
      specification. The state is just an alternative form of setting the
      <code title=meta-charset>charset</code> attribute: it is a <a
-     href="#character1">character encoding declaration</a>.</p>
+     href="#character2">character encoding declaration</a>.</p>
 
     <p>For <code><a href="#meta0">meta</a></code> elements in the <a
      href="#encoding" title=attr-meta-http-equiv-content-type>Encoding
@@ -7912,7 +7941,7 @@
      then that element must be the first element in the document's <code><a
      href="#head">head</a></code> element, and the document must not contain
      a <code><a href="#meta0">meta</a></code> element with the <code
-     title=attr-meta-charset><a href="#charset0">charset</a></code> attribute
+     title=attr-meta-charset><a href="#charset1">charset</a></code> attribute
      present.</p>
 
     <p>The <a href="#encoding"
@@ -8096,7 +8125,7 @@
   though if we do then we have to duplicate the requirements in the
   parsing section for conformance checkers -->
 
-  <p>A <dfn id=character1>character encoding declaration</dfn> is a mechanism
+  <p>A <dfn id=character2>character encoding declaration</dfn> is a mechanism
    by which the character encoding used to store or transmit a document is
    specified.
 
@@ -8127,7 +8156,7 @@
    and, in addition, if that encoding isn't US-ASCII itself, then the
    encoding must be specified using a <code><a href="#meta0">meta</a></code>
    element with a <code title=attr-meta-charset><a
-   href="#charset0">charset</a></code> attribute or a <code><a
+   href="#charset1">charset</a></code> attribute or a <code><a
    href="#meta0">meta</a></code> element in the <a href="#encoding"
    title=attr-meta-http-equiv-content-type>Encoding declaraton state</a>.
 
@@ -30279,7 +30308,9 @@
   <p>The actual HTTP headers and other metadata, not the headers as mutated
    or implied by the algorithms given in this specification, are the ones
    that must be used when determining the character encoding according to the
-   rules given in the above specifications.
+   rules given in the above specifications. Once the character encoding is
+   established, the <a href="#character1">document's character encoding</a>
+   must be set to that character encoding.
 
   <p>If the root element, as parsed according to the XML specifications cited
    above, is found to be an <code><a href="#html">html</a></code> element
@@ -30339,6 +30370,9 @@
    versions thereof. <a href="#refsRFC2046">[RFC2046]</a> <a
    href="#refsRFC2046">[RFC2646]</a>
 
+  <p>The <a href="#character1">document's character encoding</a> must be set
+   to the character encoding used to decode the document.
+
   <p>Upon creation of the <code>Document</code> object, the user agent must
    run the <a href="#application3"
    title=concept-appcache-init-no-attribute>application cache selection
@@ -38322,7 +38356,7 @@
    described below.
 
   <p>RCDATA elements can have <a href="#text1" title=syntax-text>text</a> and
-   <a href="#character2" title=syntax-entities>character entity
+   <a href="#character3" title=syntax-entities>character entity
    references</a>, but the text must not contain an <a href="#ambiguous"
    title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. There are also
    <a href="#cdata-rcdata-restrictions">further restrictions</a> described
@@ -38332,7 +38366,7 @@
    any contents (since, again, as there's no end tag, no content can be put
    between the start tag and the end tag). Foreign elements whose start tag
    is <em>not</em> marked as self-closing can have <a href="#text1"
-   title=syntax-text>text</a>, <a href="#character2"
+   title=syntax-text>text</a>, <a href="#character3"
    title=syntax-entities>character entity references</a>, <a href="#cdata0"
    title=syntax-cdata>CDATA blocks</a>, other <a href="#elements2"
    title=syntax-elements>elements</a>, and <a href="#comments0"
@@ -38342,7 +38376,7 @@
    ampersand</a>.
 
   <p>Normal elements can have <a href="#text1" title=syntax-text>text</a>, <a
-   href="#character2" title=syntax-entities>character entity references</a>,
+   href="#character3" title=syntax-entities>character entity references</a>,
    other <a href="#elements2" title=syntax-elements>elements</a>, and <a
    href="#comments0" title=syntax-comments>comments</a>, but the text must
    not contain the character U+003C LESS-THAN SIGN (<code>&lt;</code>) or an
@@ -38438,7 +38472,7 @@
 
   <p><dfn id=attribute0 title=syntax-attribute-value>Attribute values</dfn>
    are a mixture of <a href="#text1" title=syntax-text>text</a> and <a
-   href="#character2" title=syntax-entities>character entity references</a>,
+   href="#character3" title=syntax-entities>character entity references</a>,
    except with the additional restriction that the text cannot contain an <a
    href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous
    ampersand</a>.
@@ -38818,7 +38852,7 @@
   <h4 id=character><span class=secno>8.1.4 </span>Character entity references</h4>
 
   <p>In certain cases described in other sections, <a href="#text1"
-   title=syntax-text>text</a> may be mixed with <dfn id=character2
+   title=syntax-text>text</a> may be mixed with <dfn id=character3
    title=syntax-entities>character entity references</dfn>. These can be used
    to escape characters that couldn't otherwise legally be included in <a
    href="#text1" title=syntax-text>text</a>.
@@ -39435,6 +39469,11 @@
      heuristically decide which to use as a default.
   </ol>
 
+  <p>The <a href="#character1">document's character encoding</a> must
+   immediately be set to the value returned from this algorithm, at the same
+   time as the user agent uses the returned value to select the decoder to
+   use for the input stream.
+
   <h5 id=character0><span class=secno>8.2.2.2. </span>Character encoding
    requirements</h5>
 
@@ -39566,9 +39605,11 @@
     have the same Unicode interpretations in both the current encoding and
     the new encoding, and if the user agent supports changing the converter
     on the fly, then the user agent may change to the new converter for the
-    encoding on the fly. Set the encoding to the new encoding, set the <a
-    href="#confidence" title=concept-encoding-confidence>confidence</a> to
-    <i>confident</i>, and abort these steps.
+    encoding on the fly. Set the <a href="#character1">document's character
+    encoding</a> and the encoding used to convert the input stream to the new
+    encoding, set the <a href="#confidence"
+    title=concept-encoding-confidence>confidence</a> to <i>confident</i>, and
+    abort these steps.
 
    <li>Otherwise, <a href="#navigate">navigate</a> to the document again,
     with <a href="#replacement">replacement enabled</a>, but this time skip
@@ -42752,16 +42793,16 @@
      set.</p>
 
     <p id=meta-charset-during-parse>If the element has a <code
-     title=attr-meta-charset><a href="#charset0">charset</a></code>
+     title=attr-meta-charset><a href="#charset1">charset</a></code>
      attribute, and its value is a supported encoding, and the <a
      href="#confidence" title=concept-encoding-confidence>confidence</a> is
      currently <i>tentative</i>, then <a href="#change">change the
      encoding</a> to the encoding given by the value of the <code
-     title=attr-meta-charset><a href="#charset0">charset</a></code>
+     title=attr-meta-charset><a href="#charset1">charset</a></code>
      attribute.</p>
 
     <p>Otherwise, if the element has a <code title=attr-meta-charset><a
-     href="#charset0">content</a></code> attribute, and applying the <a
+     href="#charset1">content</a></code> attribute, and applying the <a
      href="#algorithm4">algorithm for extracting an encoding from a
      Content-Type</a> to its value returns a supported encoding <var
      title="">encoding</var>, and the <a href="#confidence"
@@ -50029,7 +50070,6 @@
 
 
    Interaction with document.open/write/close is undefined
-   How to determine the character encoding
    Integration with quirks mode problems
    <style> parsing needs tweaking if we want to exactly match IE
    <base> parsing needs tweaking to handle multiple <base>s

Modified: source
===================================================================
--- source	2008-04-17 23:50:22 UTC (rev 1459)
+++ source	2008-04-18 22:34:05 UTC (rev 1460)
@@ -904,6 +904,9 @@
            attribute DOMString <span title="dom-document-cookie">cookie</span>;
   readonly attribute DOMString <span title="dom-document-lastModified">lastModified</span>;
   readonly attribute DOMString <span title="dom-document-compatMode">compatMode</span>;
+           attribute DOMString <span title="dom-document-charset">charset</span>;
+  readonly attribute DOMString <span title="dom-document-characterSet">characterSet</span>;
+  readonly attribute DOMString <span title="dom-document-defaultCharset">defaultCharset</span>;
 
   // <span>DOM tree accessors</span>
            attribute DOMString <span title="dom-document-title">title</span>;
@@ -948,9 +951,6 @@
   DOMString <span title="dom-document-queryCommandValue">queryCommandValue</span>(in DOMString commandId);
   <span>Selection</span> <span title="dom-document-getSelection">getSelection</span>();
 <!-- XXX we're not done here.
-          attribute DOMString charset;
- readonly attribute DOMString defaultCharset;
- readonly attribute DOMString characterSet;
  readonly attribute DOMString readyState;
  readonly attribute HTMLCollection scripts;
 -->
@@ -1127,6 +1127,36 @@
 
 
 
+  <p>Documents have an associated <dfn title="document's character
+  encoding">character encoding</dfn>. When a <code>Document</code>
+  object is created, the <span>document's character encoding</span>
+  must be initialised to UTF-16. Various algorithms during page
+  loading affect this value, as does the <code
+  title="dom-document-charset">charset</code> setter.  <a
+  href="#refsIANACHARSET">[IANACHARSET]</a> <!-- XXX
+  http://www.iana.org/assignments/character-sets --></p>
+
+  <p>The <dfn title="dom-document-charset"><code>charset</code></dfn>
+  DOM attribute must, on getting, return the preferred MIME name of
+  the <span>document's character encoding</span>. On setting, if the
+  new value is an IANA-registered alias for a character encoding, the
+  <span>document's character encoding</span> must be set to that
+  character encoding. (Otherwise, nothing happens.)</p>
+
+  <p>The <dfn
+  title="dom-document-characterSet"><code>characterSet</code></dfn>
+  DOM attribute must, on getting, return the preferred MIME name of
+  the <span>document's character encoding</span>.</p>
+
+  <p>The <dfn
+  title="dom-document-defaultCharset"><code>defaultCharset</code></dfn>
+  DOM attribute must, on getting, return the preferred MIME name of a
+  character encoding, possibly the user's default encoding, or an
+  encoding associated with the user's current geographical location,
+  or any arbitrary encoding name.</p>
+
+
+
   <h3>Elements</h3>
 
   <p>The nodes representing <span>HTML elements</span> in the DOM must
@@ -28042,7 +28072,9 @@
   mutated or implied by the algorithms given in this specification,
   are the ones that must be used when determining the character
   encoding according to the rules given in the above
-  specifications.</p>
+  specifications. Once the character encoding is established, the
+  <span>document's character encoding</span> must be set to that
+  character encoding.</p>
 
   <p>If the root element, as parsed according to the XML
   specifications cited above, is found to be an <code>html</code>
@@ -28103,6 +28135,9 @@
   subsequent versions thereof. <a href="#refsRFC2046">[RFC2046]</a> <a
   href="#refsRFC2046">[RFC2646]</a></p>
 
+  <p>The <span>document's character encoding</span> must be set to the
+  character encoding used to decode the document.</p>
+
   <p>Upon creation of the <code>Document</code> object, the user agent
   must run the <span
   title="concept-appcache-init-no-attribute">application cache
@@ -36975,7 +37010,12 @@
 
   </ol>
 
+  <p>The <span>document's character encoding</span> must immediately
+  be set to the value returned from this algorithm, at the same time
+  as the user agent uses the returned value to select the decoder to
+  use for the input stream.</p>
 
+
   <h5>Character encoding requirements</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252
@@ -37108,8 +37148,9 @@
    decoder have the same Unicode interpretations in both the current
    encoding and the new encoding, and if the user agent supports
    changing the converter on the fly, then the user agent may change
-   to the new converter for the encoding on the fly. Set the encoding
-   to the new encoding, set the <span
+   to the new converter for the encoding on the fly. Set the
+   <span>document's character encoding</span> and the encoding used to
+   convert the input stream to the new encoding, set the <span
    title="concept-encoding-confidence">confidence</span> to
    <i>confident</i>, and abort these steps.</li>
 
@@ -45190,7 +45231,6 @@
 
 
    Interaction with document.open/write/close is undefined
-   How to determine the character encoding
    Integration with quirks mode problems
    <style> parsing needs tweaking if we want to exactly match IE
    <base> parsing needs tweaking to handle multiple <base>s
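
The revised "change the encoding" step now updates the document's character encoding together with the decoder's encoding. Below is a minimal Python sketch covering only the two steps visible in that hunk. ParserState, change_the_encoding and same_interpretation are hypothetical names, the earlier steps of the algorithm are not modelled, and comparing the decoded bytes with Python's built-in codecs is just one assumed way to test for "the same Unicode interpretations".

```python
from dataclasses import dataclass


@dataclass
class ParserState:
    document_encoding: str          # the document's character encoding
    decoder_encoding: str           # encoding used to convert the input stream
    confidence: str = "tentative"   # "tentative" or "confident"


def same_interpretation(seen: bytes, current: str, new: str) -> bool:
    """True if the bytes seen so far decode identically in both encodings."""
    try:
        return seen.decode(current) == seen.decode(new)
    except (UnicodeDecodeError, LookupError):
        return False


def change_the_encoding(state: ParserState, seen: bytes, new: str,
                        supports_on_the_fly: bool = True) -> str:
    """Return "switched" if the converter was changed in place,
    or "renavigate" if the document would have to be loaded again."""
    if supports_on_the_fly and same_interpretation(
            seen, state.decoder_encoding, new):
        # Both values are set together, as the revised text requires.
        state.document_encoding = new
        state.decoder_encoding = new
        state.confidence = "confident"
        return "switched"
    # Otherwise the spec navigates to the document again; not modelled here.
    return "renavigate"
```

For example, with only ASCII bytes consumed so far, a switch from windows-1252 to utf-8 can happen in place, while a stray 0xE9 byte forces the re-navigation path and leaves the state unchanged.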



