[html5] r2172 - [] (0) WF2: <form accept-charset> definition (but not the processing model yet).
whatwg at whatwg.org
Fri Sep 12 16:25:43 PDT 2008
Author: ianh
Date: 2008-09-12 16:25:42 -0700 (Fri, 12 Sep 2008)
New Revision: 2172
Modified:
index
source
Log:
[] (0) WF2: <form accept-charset> definition (but not the processing model yet).
Modified: index
===================================================================
--- index 2008-09-12 10:07:56 UTC (rev 2171)
+++ index 2008-09-12 23:25:42 UTC (rev 2172)
@@ -293,6 +293,9 @@
<li><a href="#plugins"><span class=secno>2.1.4 </span>Plugins</a>
+
+ <li><a href="#character"><span class=secno>2.1.5 </span>Character
+ encodings</a>
</ul>
<li><a href="#conformance"><span class=secno>2.2 </span>Conformance
@@ -1889,7 +1892,7 @@
</span>Newlines</a>
</ul>
- <li><a href="#character"><span class=secno>8.1.4 </span>Character
+ <li><a href="#character0"><span class=secno>8.1.4 </span>Character
references</a>
<li><a href="#cdata"><span class=secno>8.1.5 </span>CDATA sections</a>
@@ -1910,7 +1913,7 @@
<li><a href="#determining"><span class=secno>8.2.2.1.
</span>Determining the character encoding</a>
- <li><a href="#character0"><span class=secno>8.2.2.2.
+ <li><a href="#character1"><span class=secno>8.2.2.2.
</span>Character encoding requirements</a>
<li><a href="#preprocessing"><span class=secno>8.2.2.3.
@@ -1944,7 +1947,7 @@
<li><a href="#data-state"><span class=secno>8.2.4.1. </span>Data
state</a>
- <li><a href="#character1"><span class=secno>8.2.4.2.
+ <li><a href="#character2"><span class=secno>8.2.4.2.
</span>Character reference data state</a>
<li><a href="#tag-open"><span class=secno>8.2.4.3. </span>Tag open
@@ -1977,7 +1980,7 @@
<li><a href="#attribute2"><span class=secno>8.2.4.12.
</span>Attribute value (unquoted) state</a>
- <li><a href="#character2"><span class=secno>8.2.4.13.
+ <li><a href="#character3"><span class=secno>8.2.4.13.
</span>Character reference in attribute value state</a>
<li><a href="#after0"><span class=secno>8.2.4.14. </span>After
@@ -2678,6 +2681,16 @@
agent itself, vulnerabilities in the third-party software become as
dangerous as those in the user agent.
+ <h4 id=character><span class=secno>2.1.5 </span>Character encodings</h4>
+
+ <p>An <dfn id=ascii-compatible>ASCII-compatible character encoding</dfn> is
+ one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for
+ bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C -
+ 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any
+ character sets we want to support do things outside that range?
+ -->.
+ <!-- XXX #refs RFC1345 ? -->
+
<h3 id=conformance><span class=secno>2.2 </span>Conformance requirements</h3>
<p>All diagrams, examples, and notes in this specification are
@@ -4864,7 +4877,7 @@
<li>
<p>The <a href="#url">URL</a> is a valid IRI reference and the <a
- href="#character3" title="document's character encoding">character
+ href="#character4" title="document's character encoding">character
encoding</a> of the URL's <code>Document</code> is UTF-8 or UTF-16. <a
href="#refsRFC3987">[RFC3987]</a>
</ul>
@@ -5079,7 +5092,7 @@
href="#urldoc">associated with</a> <var title="">url</var>.
<li>
- <p>Let <var title="">encoding</var> be the <a href="#character3"
+ <p>Let <var title="">encoding</var> be the <a href="#character4"
title="document's character encoding">character encoding</a> of <var
title="">document</var>.
@@ -7335,9 +7348,9 @@
</ul>
</div>
- <p>Documents have an associated <dfn id=character3 title="document's
+ <p>Documents have an associated <dfn id=character4 title="document's
character encoding">character encoding</dfn>. When a <code>Document</code>
- object is created, the <a href="#character3">document's character
+ object is created, the <a href="#character4">document's character
encoding</a> must be initialized to UTF-16. Various algorithms during page
loading affect this value, as does the <code title=dom-document-charset><a
href="#charset0">charset</a></code> setter. <a
@@ -7347,15 +7360,15 @@
<p>The <dfn id=charset0
title=dom-document-charset><code>charset</code></dfn> DOM attribute must,
on getting, return the preferred MIME name of the <a
- href="#character3">document's character encoding</a>. On setting, if the
+ href="#character4">document's character encoding</a>. On setting, if the
new value is an IANA-registered alias for a character encoding, the <a
- href="#character3">document's character encoding</a> must be set to that
+ href="#character4">document's character encoding</a> must be set to that
character encoding. (Otherwise, nothing happens.)
<p>The <dfn id=characterset
title=dom-document-characterSet><code>characterSet</code></dfn> DOM
attribute must, on getting, return the preferred MIME name of the <a
- href="#character3">document's character encoding</a>.
+ href="#character4">document's character encoding</a>.
<p>The <dfn id=defaultcharset
title=dom-document-defaultCharset><code>defaultCharset</code></dfn> DOM
@@ -8979,7 +8992,7 @@
<p>Remove all child nodes of the document.
<li>
- <p>Change the <a href="#character3">document's character encoding</a> to
+ <p>Change the <a href="#character4">document's character encoding</a> to
UTF-16.
<li>
@@ -10150,7 +10163,7 @@
document-level metadata with the <code title=attr-meta-name><a
href="#name">name</a></code> attribute, pragma directives with the <code
title=attr-meta-http-equiv><a href="#http-equiv">http-equiv</a></code>
- attribute, and the file's <a href="#character4">character encoding
+ attribute, and the file's <a href="#character5">character encoding
declaration</a> when an HTML document is serialized to string form (e.g.
for transmission over the network or for disk storage) with the <code
title=attr-meta-charset><a href="#charset1">charset</a></code> attribute.
@@ -10169,7 +10182,7 @@
<p>The <dfn id=charset1 title=attr-meta-charset><code>charset</code></dfn>
attribute specifies the character encoding used by the document. This is
- called a <a href="#character4">character encoding declaration</a>.
+ called a <a href="#character5">character encoding declaration</a>.
<p>The <code title=attr-meta-charset><a href="#charset1">charset</a></code>
attribute may be specified in <a href="#html5" title=HTML5>HTML
@@ -10508,7 +10521,7 @@
user agent requirements are all handled by the parsing section of the
specification. The state is just an alternative form of setting the
<code title=meta-charset>charset</code> attribute: it is a <a
- href="#character4">character encoding declaration</a>.</p>
+ href="#character5">character encoding declaration</a>.</p>
<p>For <code><a href="#meta0">meta</a></code> elements in the <a
href="#encoding" title=attr-meta-http-equiv-content-type>Encoding
@@ -10717,7 +10730,7 @@
though if we do then we have to duplicate the requirements in the
parsing section for conformance checkers -->
- <p>A <dfn id=character4>character encoding declaration</dfn> is a mechanism
+ <p>A <dfn id=character5>character encoding declaration</dfn> is a mechanism
by which the character encoding used to store or transmit a document is
specified.
@@ -10733,7 +10746,7 @@
http://www.iana.org/assignments/character-sets -->
<li>The character encoding declaration must be serialized without the use
- of <a href="#character5" title=syntax-charref>character references</a> or
+ of <a href="#character6" title=syntax-charref>character references</a> or
character escapes of any kind.
</ul>
@@ -10757,14 +10770,6 @@
then the character encoding used must be an <a
href="#ascii-compatible">ASCII-compatible character encoding</a>.
- <p>An <dfn id=ascii-compatible>ASCII-compatible character encoding</dfn> is
- one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for
- bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C -
- 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any
- character sets we want to support do things outside that range?
- -->.
- <!-- XXX #refs RFC1345 ? -->
-
<p>Authors should not use JIS_X0212-1990, x-JIS0208, and encodings based on
EBCDIC. Authors should not use UTF-32. Authors must not use the CESU-8,
UTF-7, BOCU-1 and SCSU encodings. <a href="#refsCESU8">[CESU8]</a> <a
@@ -26569,7 +26574,8 @@
<dt>Element-specific attributes:
- <dd><code title=attr-form-accept-charset>accept-charset</code>
+ <dd><code title=attr-form-accept-charset><a
+ href="#accept-charset">accept-charset</a></code>
<dd><code title=attr-form-action>action</code>
@@ -26586,7 +26592,7 @@
<dd>
<pre
class=idl>interface <dfn id=htmlformelement>HTMLFormElement</dfn> : <a href="#htmlelement">HTMLElement</a> {
- attribute DOMString <span title=dom-form-accept-charset>accept-charset</span>;
+ attribute DOMString <a href="#accept-charset0" title=dom-form-accept-charset>accept-charset</a>;
attribute DOMString <span title=dom-form-action>action</span>;
attribute DOMString <span title=dom-form-enctype>enctype</span>;
attribute DOMString <span title=dom-form-method>method</span>;
@@ -26607,8 +26613,25 @@
};</pre>
</dl>
+ <p>The <code><a href="#form">form</a></code> element represents a
+ collection of <a href="#field" title=category-field>data fields</a> that
+ can be submitted to a server for processing.
+
+ <p>The <dfn id=accept-charset
+ title=attr-form-accept-charset><code>accept-charset</code></dfn> attribute
+ gives the character encodings that are to be used for the submission. If
+ specified, the value must be an <span>ordered set of space-separated
+ tokens</span>, and each token must be the preferred name of an <a
+ href="#ascii-compatible">ASCII-compatible character encoding</a>. <a
+ href="#refsIANACHARSET">[IANACHARSET]</a>
+
<p class=big-issue>...
+ <p>The <dfn id=accept-charset0
+ title=dom-form-accept-charset><code>accept-charset</code></dfn> DOM
+ attribute must <a href="#reflect">reflect</a> the content attribute of the
+ same name.
+
<p>The <dfn id=elements3
title=dom-form-elements><code>elements</code></dfn> DOM attribute must
return an <code><a
@@ -28347,7 +28370,7 @@
<p>Otherwise, let <var><a href="#the-scripts0">the script's character
encoding</a></var> for this <code><a href="#script1">script</a></code>
- element be the same as <a href="#character3" title="document's character
+ element be the same as <a href="#character4" title="document's character
encoding">the encoding of the document itself</a>.</p>
<li>
@@ -33503,7 +33526,7 @@
XXXDOCURL -->
is <code><a href="#aboutblank">about:blank</a></code><!-- XXX xref -->,
which is marked as being an <a href="#html-" title="HTML documents">HTML
- document</a>, and whose <a href="#character3" title="document's character
+ document</a>, and whose <a href="#character4" title="document's character
encoding">character encoding</a> is UTF-8. The <code>Document</code> must
have a single child <code><a href="#html">html</a></code> node, which
itself has a single child <code><a href="#body0">body</a></code> node. If
@@ -38671,7 +38694,7 @@
or implied by the algorithms given in this specification, are the ones
that must be used when determining the character encoding according to the
rules given in the above specifications. Once the character encoding is
- established, the <a href="#character3">document's character encoding</a>
+ established, the <a href="#character4">document's character encoding</a>
must be set to that character encoding.
<p>If the root element, as parsed according to the XML specifications cited
@@ -38737,7 +38760,7 @@
versions thereof. <a href="#refsRFC2046">[RFC2046]</a> <a
href="#refsRFC2646">[RFC2646]</a>
- <p>The <a href="#character3">document's character encoding</a> must be set
+ <p>The <a href="#character4">document's character encoding</a> must be set
to the character encoding used to decode the document.
<p>Upon creation of the <code>Document</code> object, the user agent must
@@ -47095,7 +47118,7 @@
described below.
<p>RCDATA elements can have <a href="#text2" title=syntax-text>text</a> and
- <a href="#character5" title=syntax-charref>character references</a>, but
+ <a href="#character6" title=syntax-charref>character references</a>, but
the text must not contain an <a href="#ambiguous"
title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. There are also
<a href="#cdata-rcdata-restrictions">further restrictions</a> described
@@ -47105,7 +47128,7 @@
any contents (since, again, as there's no end tag, no content can be put
between the start tag and the end tag). Foreign elements whose start tag
is <em>not</em> marked as self-closing can have <a href="#text2"
- title=syntax-text>text</a>, <a href="#character5"
+ title=syntax-text>text</a>, <a href="#character6"
title=syntax-charref>character references</a>, <a href="#cdata1"
title=syntax-cdata>CDATA sections</a>, other <a href="#elements5"
title=syntax-elements>elements</a>, and <a href="#comments0"
@@ -47115,7 +47138,7 @@
ampersand</a>.
<p>Normal elements can have <a href="#text2" title=syntax-text>text</a>, <a
- href="#character5" title=syntax-charref>character references</a>, other <a
+ href="#character6" title=syntax-charref>character references</a>, other <a
href="#elements5" title=syntax-elements>elements</a>, and <a
href="#comments0" title=syntax-comments>comments</a>, but the text must
not contain the character U+003C LESS-THAN SIGN (<code>&lt;</code>) or an
@@ -47211,7 +47234,7 @@
<p><dfn id=attribute4 title=syntax-attribute-value>Attribute values</dfn>
are a mixture of <a href="#text2" title=syntax-text>text</a> and <a
- href="#character5" title=syntax-charref>character references</a>, except
+ href="#character6" title=syntax-charref>character references</a>, except
with the additional restriction that the text cannot contain an <a
href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous
ampersand</a>.
@@ -47602,7 +47625,7 @@
that is not itself in an <a href="#escaping" title=syntax-escape>escaping
text span</a>, and ends at the next <a href="#escaping1"
title=syntax-escape-end>escaping text span end</a>. There cannot be any <a
- href="#character5" title=syntax-charref>character references</a> inside an
+ href="#character6" title=syntax-charref>character references</a> inside an
<a href="#escaping" title=syntax-escape>escaping text span</a>.
<p>An <dfn id=escaping0 title=syntax-escape-start>escaping text span
@@ -47644,10 +47667,10 @@
FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE
FEED (LF) characters in that order.
- <h4 id=character><span class=secno>8.1.4 </span>Character references</h4>
+ <h4 id=character0><span class=secno>8.1.4 </span>Character references</h4>
<p>In certain cases described in other sections, <a href="#text2"
- title=syntax-text>text</a> may be mixed with <dfn id=character5
+ title=syntax-text>text</a> may be mixed with <dfn id=character6
title=syntax-charref>character references</dfn>. These can be used to
escape characters that couldn't otherwise legally be included in <a
href="#text2" title=syntax-text>text</a>.
@@ -48258,12 +48281,12 @@
heuristically decide which to use as a default.
</ol>
- <p>The <a href="#character3">document's character encoding</a> must
+ <p>The <a href="#character4">document's character encoding</a> must
immediately be set to the value returned from this algorithm, at the same
time as the user agent uses the returned value to select the decoder to
use for the input stream.
- <h5 id=character0><span class=secno>8.2.2.2. </span>Character encoding
+ <h5 id=character1><span class=secno>8.2.2.2. </span>Character encoding
requirements</h5>
<p>User agents must at a minimum support the UTF-8 and Windows-1252
@@ -48275,7 +48298,11 @@
<p>User agents must support the preferred MIME name of every character
encoding they support that has a preferred MIME name, and should support
all the IANA-registered aliases. <a
- href="#refsIANACHARSET">[IANACHARSET]</a>
+ href="#refsIANACHARSET">[IANACHARSET]</a></p>
+ <!-- XXX should all this be abstracted out so it can be used for
+ <script charset=""> and <form accept-charset="">? Maybe move this
+ stuff and the 'character encodings' section of the terminology
+ section into its own infrastructure subsection? -->
<p>When comparing a string specifying a character encoding with the name or
alias of a character encoding to determine if they are equal, user agents
@@ -48526,7 +48553,7 @@
have the same Unicode interpretations in both the current encoding and
the new encoding, and if the user agent supports changing the converter
on the fly, then the user agent may change to the new converter for the
- encoding on the fly. Set the <a href="#character3">document's character
+ encoding on the fly. Set the <a href="#character4">document's character
encoding</a> and the encoding used to convert the input stream to the new
encoding, set the <a href="#confidence"
title=concept-encoding-confidence>confidence</a> to <i>confident</i>, and
@@ -49133,7 +49160,7 @@
<dd>When the <a href="#content4">content model flag</a> is set to one of
the PCDATA or RCDATA states and the <a href="#escape">escape flag</a> is
- false: switch to the <a href="#character6">character reference data
+ false: switch to the <a href="#character7">character reference data
state</a>.
<dd>Otherwise: treat it as per the "anything else" entry below.
@@ -49190,8 +49217,8 @@
href="#data-state0">data state</a>.
</dl>
- <h5 id=character1><span class=secno>8.2.4.2. </span><dfn
- id=character6>Character reference data state</dfn></h5>
+ <h5 id=character2><span class=secno>8.2.4.2. </span><dfn
+ id=character7>Character reference data state</dfn></h5>
<p><em>(This cannot happen if the <a href="#content4">content model
flag</a> is set to the CDATA state.)</em>
@@ -49624,7 +49651,7 @@
<dt>U+0026 AMPERSAND (&)
- <dd>Switch to the <a href="#character7">character reference in attribute
+ <dd>Switch to the <a href="#character8">character reference in attribute
value state</a>, with the <a href="#additional">additional allowed
character</a> being U+0022 QUOTATION MARK (").
@@ -49653,7 +49680,7 @@
<dt>U+0026 AMPERSAND (&)
- <dd>Switch to the <a href="#character7">character reference in attribute
+ <dd>Switch to the <a href="#character8">character reference in attribute
value state</a>, with the <a href="#additional">additional allowed
character</a> being U+0027 APOSTROPHE (').
@@ -49688,7 +49715,7 @@
<dt>U+0026 AMPERSAND (&)
- <dd>Switch to the <a href="#character7">character reference in attribute
+ <dd>Switch to the <a href="#character8">character reference in attribute
value state</a>, with no <a href="#additional">additional allowed
character</a>.
@@ -49717,8 +49744,8 @@
Stay in the <a href="#attribute8">attribute value (unquoted) state</a>.
</dl>
- <h5 id=character2><span class=secno>8.2.4.13. </span><dfn
- id=character7>Character reference in attribute value state</dfn></h5>
+ <h5 id=character3><span class=secno>8.2.4.13. </span><dfn
+ id=character8>Character reference in attribute value state</dfn></h5>
<p>Attempt to <a href="#consume">consume a character reference</a>.
@@ -50463,8 +50490,8 @@
<p>This section defines how to <dfn id=consume>consume a character
reference</dfn>. This definition is used when parsing character references
- <a href="#character6" title="character reference data state">in text</a>
- and <a href="#character7" title="character reference in attribute value
+ <a href="#character7" title="character reference data state">in text</a>
+ and <a href="#character8" title="character reference in attribute value
state">in attributes</a>.
<p>The behavior depends on the identity of the next character (the one
@@ -50821,7 +50848,7 @@
<p>If the last character matched is not a U+003B SEMICOLON (<code
title="">;</code>), there is a <a href="#parse2">parse error</a>.</p>
- <p>If the character reference is being consumed <a href="#character7"
+ <p>If the character reference is being consumed <a href="#character8"
title="character reference in attribute value state">as part of an
attribute</a>, and the last character matched is not a U+003B SEMICOLON
(<code title="">;</code>), and the next character is in the range U+0030
Modified: source
===================================================================
--- source 2008-09-12 10:07:56 UTC (rev 2171)
+++ source 2008-09-12 23:25:42 UTC (rev 2172)
@@ -535,7 +535,18 @@
agent.</p>
+ <h4>Character encodings</h4>
+ <p>An <dfn>ASCII-compatible character encoding</dfn> is one that is
+ a superset of US-ASCII (specifically, ANSI_X3.4-1968) for bytes in
+ the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C -
+ 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any
+ character sets we want to support do things outside that range?
+ -->. <!-- XXX #refs RFC1345 ? --></p>
+
+
+
+
<h3>Conformance requirements</h3>
<p>All diagrams, examples, and notes in this specification are
@@ -8678,13 +8689,6 @@
state</span>, then the character encoding used must be an
<span>ASCII-compatible character encoding</span>.</p>
- <p>An <dfn>ASCII-compatible character encoding</dfn> is one that is
- a superset of US-ASCII (specifically, ANSI_X3.4-1968) for bytes in
- the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C -
- 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any
- character sets we want to support do things outside that range?
- -->. <!-- XXX #refs RFC1345 ? --></p>
-
<p>Authors should not use JIS_X0212-1990, x-JIS0208, and encodings
based on EBCDIC. Authors should not use UTF-32. Authors must not use
the CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a
@@ -23964,8 +23968,25 @@
</dd>
</dl>
+ <p>The <code>form</code> element represents a collection of <span
+ title="category-field">data fields</span> that can be submitted to a
+ server for processing.</p>
+
+ <p>The <dfn
+ title="attr-form-accept-charset"><code>accept-charset</code></dfn>
+ attribute gives the character encodings that are to be used for the
+ submission. If specified, the value must be an <span>ordered set of
+ space-separated tokens</span>, and each token must be the preferred
+ name of an <span>ASCII-compatible character encoding</span>. <a
+ href="#refsIANACHARSET">[IANACHARSET]</a></p>
+
<p class="big-issue">...</p>
+ <p>The <dfn
+ title="dom-form-accept-charset"><code>accept-charset</code></dfn>
+ DOM attribute must <span>reflect</span> the content attribute of the
+ same name.</p>
+
<p>The <dfn title="dom-form-elements"><code>elements</code></dfn>
DOM attribute must return an <code>HTMLFormControlsCollection</code>
rooted at the <code>Document</code> node, whose filter matches <span
@@ -45445,6 +45466,11 @@
should support all the IANA-registered aliases. <a
href="#refsIANACHARSET">[IANACHARSET]</a></p>
+ <!-- XXX should all this be abstracted out so it can be used for
+ <script charset=""> and <form accept-charset="">? Maybe move this
+ stuff and the 'character encodings' section of the terminology
+ section into its own infrastructure subsection? -->
+
<p>When comparing a string specifying a character encoding with the
name or alias of a character encoding to determine if they are
equal, user agents must ignore all characters in the ranges U+0009
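
Illustration (not part of the commit): the definition of "ASCII-compatible character encoding" moved by this revision, and the new accept-charset authoring requirement, can be approximated with a short Python sketch. The function names looks_ascii_compatible and check_accept_charset are hypothetical. The check decodes each listed byte on its own, which is only an approximation of "superset of US-ASCII for these bytes" and says nothing about stateful or multi-byte sequences. It also relies on Python's codec registry rather than the IANA registry, so it does not verify that a token is the preferred name, and it assumes duplicate tokens are compared case-sensitively.

```python
import codecs

# Byte values listed in the spec's definition of an
# "ASCII-compatible character encoding".
ASCII_COMPATIBLE_BYTES = (
    [0x09, 0x0A, 0x0C, 0x0D]
    + list(range(0x20, 0x23))   # 0x20 - 0x22
    + [0x26, 0x27]
    + list(range(0x2C, 0x40))   # 0x2C - 0x3F
    + list(range(0x41, 0x5B))   # 0x41 - 0x5A
    + list(range(0x61, 0x7B))   # 0x61 - 0x7A
)


def looks_ascii_compatible(encoding_name):
    """Approximate check: every listed byte, decoded in isolation,
    must yield the same character as in US-ASCII."""
    try:
        codecs.lookup(encoding_name)
    except LookupError:
        return False  # unknown to Python's codec registry
    for b in ASCII_COMPATIBLE_BYTES:
        try:
            if bytes([b]).decode(encoding_name) != chr(b):
                return False
        except (UnicodeDecodeError, LookupError):
            # Undecodable single byte, or a codec that is not a text encoding.
            return False
    return True


def check_accept_charset(value):
    """Loose check of an accept-charset value: whitespace-separated
    tokens, no repeats, each naming an encoding that passes the
    approximation above. str.split() accepts more whitespace
    characters than HTML's space characters, so this is lenient."""
    tokens = value.split()
    if len(tokens) != len(set(tokens)):
        return False
    return all(looks_ascii_compatible(t) for t in tokens)
```

Under these assumptions, a value such as "UTF-8 ISO-8859-1" passes, while any value containing UTF-16 is rejected because a single byte cannot be decoded as UTF-16 at all.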