[html5] r1460 - /
whatwg at whatwg.org
Fri Apr 18 15:34:06 PDT 2008
Author: ianh
Date: 2008-04-18 15:34:05 -0700 (Fri, 18 Apr 2008)
New Revision: 1460
Modified:
index
source
Log:
[] (0) Define document.charset, .characterSet, .defaultCharset
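The log entry describes three new `Document` attributes. The sketch below is a rough TypeScript model of how they relate, as they are specified in the diff that follows. It is not the spec's own text or any browser's code. Several parts are assumptions: the class name `DocumentEncodingModel`, the helper `preferredMimeName`, and the lookup table `IANA_ALIASES`, which is a deliberately tiny hypothetical subset of the IANA character-set registry matched case-insensitively. The `windows-1252` value returned by `defaultCharset` is an arbitrary choice, which the new text explicitly allows.

```typescript
// Hypothetical, deliberately tiny subset of the IANA character-set registry.
// Keys are registered names or aliases in lower case; values are the
// preferred MIME name that the getters report.
const IANA_ALIASES: ReadonlyMap<string, string> = new Map<string, string>([
  ["utf-8", "UTF-8"],
  ["utf-16", "UTF-16"],
  ["iso-8859-1", "ISO-8859-1"],
  ["iso_8859-1", "ISO-8859-1"],
  ["latin1", "ISO-8859-1"],
  ["l1", "ISO-8859-1"],
  ["us-ascii", "US-ASCII"],
  ["ascii", "US-ASCII"],
  ["windows-1252", "windows-1252"],
]);

// Look up a label case-insensitively; undefined means "not a known alias".
function preferredMimeName(label: string): string | undefined {
  return IANA_ALIASES.get(label.toLowerCase());
}

class DocumentEncodingModel {
  // The document's character encoding, held as its preferred MIME name.
  // It starts as UTF-16 when the Document object is created.
  private encoding: string = "UTF-16";
  // Value reported by defaultCharset; the spec allows any encoding name.
  private readonly uaDefault: string;

  constructor(uaDefault: string = "windows-1252") {
    this.uaDefault = uaDefault;
  }

  // document.charset getter: preferred MIME name of the current encoding.
  get charset(): string {
    return this.encoding;
  }

  // document.charset setter: switch only for a recognised alias;
  // any other value is ignored ("Otherwise, nothing happens").
  set charset(value: string) {
    const name = preferredMimeName(value);
    if (name !== undefined) {
      this.encoding = name;
    }
  }

  // document.characterSet: read-only, same value as the charset getter.
  get characterSet(): string {
    return this.encoding;
  }

  // document.defaultCharset: read-only, UA-chosen encoding name.
  get defaultCharset(): string {
    return this.uaDefault;
  }
}
```

In this model, assigning `"latin1"` to `charset` makes both `charset` and `characterSet` report `ISO-8859-1`. An unrecognised label leaves both values unchanged.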
Modified: index
===================================================================
--- index 2008-04-17 23:50:22 UTC (rev 1459)
+++ index 2008-04-18 22:34:05 UTC (rev 1460)
@@ -24,7 +24,7 @@
<h1 id=html-5>HTML 5</h1>
- <h2 class="no-num no-toc" id=working>Working Draft — 17 April 2008</h2>
+ <h2 class="no-num no-toc" id=working>Working Draft — 18 April 2008</h2>
<p>You can take part in this work. <a
href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -2598,6 +2598,9 @@
attribute DOMString <a href="#cookie0" title=dom-document-cookie>cookie</a>;
readonly attribute DOMString <a href="#lastmodified" title=dom-document-lastModified>lastModified</a>;
readonly attribute DOMString <a href="#compatmode" title=dom-document-compatMode>compatMode</a>;
+ attribute DOMString <a href="#charset0" title=dom-document-charset>charset</a>;
+ readonly attribute DOMString <a href="#characterset" title=dom-document-characterSet>characterSet</a>;
+ readonly attribute DOMString <a href="#defaultcharset" title=dom-document-defaultCharset>defaultCharset</a>;
// <a href="#dom-tree0">DOM tree accessors</a>
attribute DOMString <a href="#document.title" title=dom-document-title>title</a>;
@@ -2642,9 +2645,6 @@
DOMString <a href="#querycommandvalue" title=dom-document-queryCommandValue>queryCommandValue</a>(in DOMString commandId);
<a href="#selection1">Selection</a> <a href="#getselection0" title=dom-document-getSelection>getSelection</a>();
<!-- XXX we're not done here.
- attribute DOMString charset;
- readonly attribute DOMString defaultCharset;
- readonly attribute DOMString characterSet;
readonly attribute DOMString readyState;
readonly attribute HTMLCollection scripts;
-->
@@ -2806,6 +2806,35 @@
</ul>
</div>
+ <p>Documents have an associated <dfn id=character1 title="document's
+ character encoding">character encoding</dfn>. When a <code>Document</code>
+ object is created, the <a href="#character1">document's character
+ encoding</a> must be initialised to UTF-16. Various algorithms during page
+ loading affect this value, as does the <code title=dom-document-charset><a
+ href="#charset0">charset</a></code> setter. <a
+ href="#refsIANACHARSET">[IANACHARSET]</a> <!-- XXX
+ http://www.iana.org/assignments/character-sets -->
+
+ <p>The <dfn id=charset0
+ title=dom-document-charset><code>charset</code></dfn> DOM attribute must,
+ on getting, return the preferred MIME name of the <a
+ href="#character1">document's character encoding</a>. On setting, if the
+ new value is an IANA-registered alias for a character encoding, the <a
+ href="#character1">document's character encoding</a> must be set to that
+ character encoding. (Otherwise, nothing happens.)
+
+ <p>The <dfn id=characterset
+ title=dom-document-characterSet><code>characterSet</code></dfn> DOM
+ attribute must, on getting, return the preferred MIME name of the <a
+ href="#character1">document's character encoding</a>.
+
+ <p>The <dfn id=defaultcharset
+ title=dom-document-defaultCharset><code>defaultCharset</code></dfn> DOM
+ attribute must, on getting, return the preferred MIME name of a character
+ encoding, possibly the user's default encoding, or an encoding associated
+ with the user's current geographical location, or any arbitrary encoding
+ name.
+
<h3 id=elements><span class=secno>2.2 </span>Elements</h3>
<p>The nodes representing <a href="#html-elements">HTML elements</a> in the
@@ -7536,7 +7565,7 @@
<dt>Contexts in which this element may be used:
<dd>If the <code title=attr-meta-charset><a
- href="#charset0">charset</a></code> attribute is present, or if the
+ href="#charset1">charset</a></code> attribute is present, or if the
element is in the <a href="#encoding"
title=attr-meta-http-equiv-content-type>Encoding declaraton state</a>: as
the first element in a <code><a href="#head">head</a></code> element.
@@ -7571,7 +7600,7 @@
<dd><code title=attr-meta-content><a href="#content0">content</a></code>
- <dd><code title=attr-meta-charset><a href="#charset0">charset</a></code>
+ <dd><code title=attr-meta-charset><a href="#charset1">charset</a></code>
(<a href="#html-" title="HTML documents">HTML</a> only)
<dt>DOM interface:
@@ -7596,15 +7625,15 @@
document-level metadata with the <code title=attr-meta-name><a
href="#name">name</a></code> attribute, pragma directives with the <code
title=attr-meta-http-equiv><a href="#http-equiv0">http-equiv</a></code>
- attribute, and the file's <a href="#character1">character encoding
+ attribute, and the file's <a href="#character2">character encoding
declaration</a> when an HTML document is serialised to string form (e.g.
for transmission over the network or for disk storage) with the <code
- title=attr-meta-charset><a href="#charset0">charset</a></code> attribute.
+ title=attr-meta-charset><a href="#charset1">charset</a></code> attribute.
<p>Exactly one of the <code title=attr-meta-name><a
href="#name">name</a></code>, <code title=attr-meta-http-equiv><a
href="#http-equiv0">http-equiv</a></code>, and <code
- title=attr-meta-charset><a href="#charset0">charset</a></code> attributes
+ title=attr-meta-charset><a href="#charset1">charset</a></code> attributes
must be specified.
<p>If either <code title=attr-meta-name><a href="#name">name</a></code> or
@@ -7613,15 +7642,15 @@
title=attr-meta-content><a href="#content0">content</a></code> attribute
must also be specified. Otherwise, it must be omitted.
- <p>The <dfn id=charset0 title=attr-meta-charset><code>charset</code></dfn>
+ <p>The <dfn id=charset1 title=attr-meta-charset><code>charset</code></dfn>
attribute specifies the character encoding used by the document. This is
- called a <a href="#character1">character encoding declaration</a>.
+ called a <a href="#character2">character encoding declaration</a>.
- <p>The <code title=attr-meta-charset><a href="#charset0">charset</a></code>
+ <p>The <code title=attr-meta-charset><a href="#charset1">charset</a></code>
attribute may be specified in <a href="#html5" title=HTML5>HTML
documents</a> only, it must not be used in <a href="#xhtml5"
title=XHTML>XML documents</a>. If the <code title=attr-meta-charset><a
- href="#charset0">charset</a></code> attribute is specified, the element
+ href="#charset1">charset</a></code> attribute is specified, the element
must be the first element in <a href="#the-head0">the <code>head</code>
element</a> of the file.
@@ -7892,7 +7921,7 @@
user agent requirements are all handled by the parsing section of the
specification. The state is just an alternative form of setting the
<code title=meta-charset>charset</code> attribute: it is a <a
- href="#character1">character encoding declaration</a>.</p>
+ href="#character2">character encoding declaration</a>.</p>
<p>For <code><a href="#meta0">meta</a></code> elements in the <a
href="#encoding" title=attr-meta-http-equiv-content-type>Encoding
@@ -7912,7 +7941,7 @@
then that element must be the first element in the document's <code><a
href="#head">head</a></code> element, and the document must not contain
a <code><a href="#meta0">meta</a></code> element with the <code
- title=attr-meta-charset><a href="#charset0">charset</a></code> attribute
+ title=attr-meta-charset><a href="#charset1">charset</a></code> attribute
present.</p>
<p>The <a href="#encoding"
@@ -8096,7 +8125,7 @@
though if we do then we have to duplicate the requirements in the
parsing section for conformance checkers -->
- <p>A <dfn id=character1>character encoding declaration</dfn> is a mechanism
+ <p>A <dfn id=character2>character encoding declaration</dfn> is a mechanism
by which the character encoding used to store or transmit a document is
specified.
@@ -8127,7 +8156,7 @@
and, in addition, if that encoding isn't US-ASCII itself, then the
encoding must be specified using a <code><a href="#meta0">meta</a></code>
element with a <code title=attr-meta-charset><a
- href="#charset0">charset</a></code> attribute or a <code><a
+ href="#charset1">charset</a></code> attribute or a <code><a
href="#meta0">meta</a></code> element in the <a href="#encoding"
title=attr-meta-http-equiv-content-type>Encoding declaraton state</a>.
@@ -30279,7 +30308,9 @@
<p>The actual HTTP headers and other metadata, not the headers as mutated
or implied by the algorithms given in this specification, are the ones
that must be used when determining the character encoding according to the
- rules given in the above specifications.
+ rules given in the above specifications. Once the character encoding is
+ established, the <a href="#character1">document's character encoding</a>
+ must be set to that character encoding.
<p>If the root element, as parsed according to the XML specifications cited
above, is found to be an <code><a href="#html">html</a></code> element
@@ -30339,6 +30370,9 @@
versions thereof. <a href="#refsRFC2046">[RFC2046]</a> <a
href="#refsRFC2046">[RFC2646]</a>
+ <p>The <a href="#character1">document's character encoding</a> must be set
+ to the character encoding used to decode the document.
+
<p>Upon creation of the <code>Document</code> object, the user agent must
run the <a href="#application3"
title=concept-appcache-init-no-attribute>application cache selection
@@ -38322,7 +38356,7 @@
described below.
<p>RCDATA elements can have <a href="#text1" title=syntax-text>text</a> and
- <a href="#character2" title=syntax-entities>character entity
+ <a href="#character3" title=syntax-entities>character entity
references</a>, but the text must not contain an <a href="#ambiguous"
title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. There are also
<a href="#cdata-rcdata-restrictions">further restrictions</a> described
@@ -38332,7 +38366,7 @@
any contents (since, again, as there's no end tag, no content can be put
between the start tag and the end tag). Foreign elements whose start tag
is <em>not</em> marked as self-closing can have <a href="#text1"
- title=syntax-text>text</a>, <a href="#character2"
+ title=syntax-text>text</a>, <a href="#character3"
title=syntax-entities>character entity references</a>, <a href="#cdata0"
title=syntax-cdata>CDATA blocks</a>, other <a href="#elements2"
title=syntax-elements>elements</a>, and <a href="#comments0"
@@ -38342,7 +38376,7 @@
ampersand</a>.
<p>Normal elements can have <a href="#text1" title=syntax-text>text</a>, <a
- href="#character2" title=syntax-entities>character entity references</a>,
+ href="#character3" title=syntax-entities>character entity references</a>,
other <a href="#elements2" title=syntax-elements>elements</a>, and <a
href="#comments0" title=syntax-comments>comments</a>, but the text must
not contain the character U+003C LESS-THAN SIGN (<code><</code>) or an
@@ -38438,7 +38472,7 @@
<p><dfn id=attribute0 title=syntax-attribute-value>Attribute values</dfn>
are a mixture of <a href="#text1" title=syntax-text>text</a> and <a
- href="#character2" title=syntax-entities>character entity references</a>,
+ href="#character3" title=syntax-entities>character entity references</a>,
except with the additional restriction that the text cannot contain an <a
href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous
ampersand</a>.
@@ -38818,7 +38852,7 @@
<h4 id=character><span class=secno>8.1.4 </span>Character entity references</h4>
<p>In certain cases described in other sections, <a href="#text1"
- title=syntax-text>text</a> may be mixed with <dfn id=character2
+ title=syntax-text>text</a> may be mixed with <dfn id=character3
title=syntax-entities>character entity references</dfn>. These can be used
to escape characters that couldn't otherwise legally be included in <a
href="#text1" title=syntax-text>text</a>.
@@ -39435,6 +39469,11 @@
heuristically decide which to use as a default.
</ol>
+ <p>The <a href="#character1">document's character encoding</a> must
+ immediately be set to the value returned from this algorithm, at the same
+ time as the user agent uses the returned value to select the decoder to
+ use for the input stream.
+
<h5 id=character0><span class=secno>8.2.2.2. </span>Character encoding
requirements</h5>
@@ -39566,9 +39605,11 @@
have the same Unicode interpretations in both the current encoding and
the new encoding, and if the user agent supports changing the converter
on the fly, then the user agent may change to the new converter for the
- encoding on the fly. Set the encoding to the new encoding, set the <a
- href="#confidence" title=concept-encoding-confidence>confidence</a> to
- <i>confident</i>, and abort these steps.
+ encoding on the fly. Set the <a href="#character1">document's character
+ encoding</a> and the encoding used to convert the input stream to the new
+ encoding, set the <a href="#confidence"
+ title=concept-encoding-confidence>confidence</a> to <i>confident</i>, and
+ abort these steps.
<li>Otherwise, <a href="#navigate">navigate</a> to the document again,
with <a href="#replacement">replacement enabled</a>, but this time skip
@@ -42752,16 +42793,16 @@
set.</p>
<p id=meta-charset-during-parse>If the element has a <code
- title=attr-meta-charset><a href="#charset0">charset</a></code>
+ title=attr-meta-charset><a href="#charset1">charset</a></code>
attribute, and its value is a supported encoding, and the <a
href="#confidence" title=concept-encoding-confidence>confidence</a> is
currently <i>tentative</i>, then <a href="#change">change the
encoding</a> to the encoding given by the value of the <code
- title=attr-meta-charset><a href="#charset0">charset</a></code>
+ title=attr-meta-charset><a href="#charset1">charset</a></code>
attribute.</p>
<p>Otherwise, if the element has a <code title=attr-meta-charset><a
- href="#charset0">content</a></code> attribute, and applying the <a
+ href="#charset1">content</a></code> attribute, and applying the <a
href="#algorithm4">algorithm for extracting an encoding from a
Content-Type</a> to its value returns a supported encoding <var
title="">encoding</var>, and the <a href="#confidence"
@@ -50029,7 +50070,6 @@
Interaction with document.open/write/close is undefined
- How to determine the character encoding
Integration with quirks mode problems
<style> parsing needs tweaking if we want to exactly match IE
<base> parsing needs tweaking to handle multiple <base>s
Modified: source
===================================================================
--- source 2008-04-17 23:50:22 UTC (rev 1459)
+++ source 2008-04-18 22:34:05 UTC (rev 1460)
@@ -904,6 +904,9 @@
attribute DOMString <span title="dom-document-cookie">cookie</span>;
readonly attribute DOMString <span title="dom-document-lastModified">lastModified</span>;
readonly attribute DOMString <span title="dom-document-compatMode">compatMode</span>;
+ attribute DOMString <span title="dom-document-charset">charset</span>;
+ readonly attribute DOMString <span title="dom-document-characterSet">characterSet</span>;
+ readonly attribute DOMString <span title="dom-document-defaultCharset">defaultCharset</span>;
// <span>DOM tree accessors</span>
attribute DOMString <span title="dom-document-title">title</span>;
@@ -948,9 +951,6 @@
DOMString <span title="dom-document-queryCommandValue">queryCommandValue</span>(in DOMString commandId);
<span>Selection</span> <span title="dom-document-getSelection">getSelection</span>();
<!-- XXX we're not done here.
- attribute DOMString charset;
- readonly attribute DOMString defaultCharset;
- readonly attribute DOMString characterSet;
readonly attribute DOMString readyState;
readonly attribute HTMLCollection scripts;
-->
@@ -1127,6 +1127,36 @@
+ <p>Documents have an associated <dfn title="document's character
+ encoding">character encoding</dfn>. When a <code>Document</code>
+ object is created, the <span>document's character encoding</span>
+ must be initialised to UTF-16. Various algorithms during page
+ loading affect this value, as does the <code
+ title="dom-document-charset">charset</code> setter. <a
+ href="#refsIANACHARSET">[IANACHARSET]</a> <!-- XXX
+ http://www.iana.org/assignments/character-sets --></p>
+
+ <p>The <dfn title="dom-document-charset"><code>charset</code></dfn>
+ DOM attribute must, on getting, return the preferred MIME name of
+ the <span>document's character encoding</span>. On setting, if the
+ new value is an IANA-registered alias for a character encoding, the
+ <span>document's character encoding</span> must be set to that
+ character encoding. (Otherwise, nothing happens.)</p>
+
+ <p>The <dfn
+ title="dom-document-characterSet"><code>characterSet</code></dfn>
+ DOM attribute must, on getting, return the preferred MIME name of
+ the <span>document's character encoding</span>.</p>
+
+ <p>The <dfn
+ title="dom-document-defaultCharset"><code>defaultCharset</code></dfn>
+ DOM attribute must, on getting, return the preferred MIME name of a
+ character encoding, possibly the user's default encoding, or an
+ encoding associated with the user's current geographical location,
+ or any arbitrary encoding name.</p>
+
+
+
<h3>Elements</h3>
<p>The nodes representing <span>HTML elements</span> in the DOM must
@@ -28042,7 +28072,9 @@
mutated or implied by the algorithms given in this specification,
are the ones that must be used when determining the character
encoding according to the rules given in the above
- specifications.</p>
+ specifications. Once the character encoding is established, the
+ <span>document's character encoding</span> must be set to that
+ character encoding.</p>
<p>If the root element, as parsed according to the XML
specifications cited above, is found to be an <code>html</code>
@@ -28103,6 +28135,9 @@
subsequent versions thereof. <a href="#refsRFC2046">[RFC2046]</a> <a
href="#refsRFC2046">[RFC2646]</a></p>
+ <p>The <span>document's character encoding</span> must be set to the
+ character encoding used to decode the document.</p>
+
<p>Upon creation of the <code>Document</code> object, the user agent
must run the <span
title="concept-appcache-init-no-attribute">application cache
@@ -36975,7 +37010,12 @@
</ol>
+ <p>The <span>document's character encoding</span> must immediately
+ be set to the value returned from this algorithm, at the same time
+ as the user agent uses the returned value to select the decoder to
+ use for the input stream.</p>
+
<h5>Character encoding requirements</h5>
<p>User agents must at a minimum support the UTF-8 and Windows-1252
@@ -37108,8 +37148,9 @@
decoder have the same Unicode interpretations in both the current
encoding and the new encoding, and if the user agent supports
changing the converter on the fly, then the user agent may change
- to the new converter for the encoding on the fly. Set the encoding
- to the new encoding, set the <span
+ to the new converter for the encoding on the fly. Set the
+ <span>document's character encoding</span> and the encoding used to
+ convert the input stream to the new encoding, set the <span
title="concept-encoding-confidence">confidence</span> to
<i>confident</i>, and abort these steps.</li>
@@ -45190,7 +45231,6 @@
Interaction with document.open/write/close is undefined
- How to determine the character encoding
Integration with quirks mode problems
<style> parsing needs tweaking if we want to exactly match IE
<base> parsing needs tweaking to handle multiple <base>s
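The parser-side changes in both files come down to one rule: keep the document's character encoding in step with the decoder. It is set when the sniffing algorithm's result selects the decoder, and it is updated together with the converter when "change the encoding" switches on the fly. The sketch below models only that bookkeeping, under stated assumptions. The names `EncodingState`, `handleMetaCharset`, `prefixCompatible`, `canSwapConverter` and the `supported` set are hypothetical. Any earlier steps of "change the encoding" that this diff does not show are omitted. The re-navigation branch is only reported, not performed. The confidence values mirror the words used in the diff text.

```typescript
type Confidence = "tentative" | "confident";

// Outcome of the on-the-fly branch of "change the encoding" modelled here.
type ChangeResult = "switched-on-the-fly" | "renavigate";

class EncodingState {
  // Document's character encoding: UTF-16 until the parser sets it.
  documentEncoding: string = "UTF-16";
  // Encoding used to convert the input stream (null before sniffing).
  streamEncoding: string | null = null;
  confidence: Confidence = "tentative";

  // Sniffing result: the document's encoding is set at the same moment
  // the decoder is selected, so the two never disagree.
  beginDecoding(sniffed: string, confidence: Confidence): void {
    this.streamEncoding = sniffed;
    this.documentEncoding = sniffed;
    this.confidence = confidence;
  }

  // prefixCompatible: every byte converted so far decodes identically in
  // both encodings. canSwapConverter: the UA supports swapping converters.
  changeEncoding(
    newEncoding: string,
    prefixCompatible: boolean,
    canSwapConverter: boolean,
  ): ChangeResult {
    if (prefixCompatible && canSwapConverter) {
      // Update both the document's encoding and the stream converter.
      this.streamEncoding = newEncoding;
      this.documentEncoding = newEncoding;
      this.confidence = "confident";
      return "switched-on-the-fly";
    }
    // Otherwise the UA navigates again with replacement enabled;
    // this model only reports that outcome.
    return "renavigate";
  }
}

// <meta charset> seen during parsing: acts only when the value is a
// supported encoding and the confidence is still tentative.
function handleMetaCharset(
  state: EncodingState,
  value: string,
  supported: ReadonlySet<string>,
  prefixCompatible: boolean,
  canSwapConverter: boolean,
): ChangeResult | "ignored" {
  if (!supported.has(value) || state.confidence !== "tentative") {
    return "ignored";
  }
  return state.changeEncoding(value, prefixCompatible, canSwapConverter);
}
```

Once the confidence becomes `confident`, later `meta` elements are ignored. This mirrors the "confidence is currently tentative" guard in the tree-construction text.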