[html5] r1910 - [] (0) Make the coercions section not invent a new syntax. (Bug 5808) (credit: hs)
whatwg at whatwg.org
Wed Jul 23 01:40:07 PDT 2008
Author: ianh
Date: 2008-07-23 01:40:07 -0700 (Wed, 23 Jul 2008)
New Revision: 1910
Modified:
index
source
Log:
[] (0) Make the coercions section not invent a new syntax. (Bug 5808) (credit: hs)
Modified: index
===================================================================
--- index 2008-07-23 07:41:48 UTC (rev 1909)
+++ index 2008-07-23 08:40:07 UTC (rev 1910)
@@ -51014,122 +51014,80 @@
is not compatible with the XML tool chain in certain subtle ways. For
example, an XML toolchain might not be able to represent attributes with
the name <code title="">xmlns</code>, since they conflict with the
- Namespaces in XML syntax. <a href="#refsXMLNS">[XMLNS]</a>
+ Namespaces in XML syntax. There is also some data that the <a
+ href="#html-0">HTML parser</a> generates that isn't included in the DOM
+ itself. This section specifies some rules for handling these issues.
- <p>There is also some data that the <a href="#html-0">HTML parser</a>
- generates that isn't included in the DOM itself.
+ <p>If the XML API being used doesn't support DOCTYPEs, tools may drop
+ DOCTYPEs altogether.
- <p>To allow tools to apply a consistent set of adjustments to the output of
- their <a href="#html-0">HTML parser</a> to allow for compatibility with
- the rest of their XML toolchain, this section documents a set of mutations
- and conventions that will convert the output of the <a href="#html-0">HTML
- parser</a> for any arbitrary input into an XML Infoset that doesn't have
- any problematic characteristics.
+ <p>If the XML API doesn't support attributes in no namespace that are named
+ "<code title="">xmlns</code>", attributes whose names start with "<code
+ title="">xmlns:</code>", or attributes in the <a href="#xmlns">XMLNS
+ namespace</a>, then the tool may drop such attributes.
- <p>Tools that cannot convey the out-of-band information using out-of-band
- mechanisms, or that cannot convey the DOM exactly as prescribed by this
- specification, may either ignore the offending information or DOM feature,
- or may represent it internally in the DOM using the conventions described
- below.
+ <p>The tool may annotate the output with any namespace declarations
+ required for proper operation.
- <p>These conventions are not conforming HTML, and user agents must not
- output such syntax outside of their XML pipeline.
+ <p>If the XML API being used restricts the allowable characters in the
+ local names of elements and attributes, then the tool may map all element
+ and attribute local names that the API wouldn't support to a set of names
+ that <em>are</em> allowed, by replacing any character that isn't supported
+ with the upper case letter U and the five digits of the character's
+ Unicode codepoint when expressed in hexadecimal.
- <dl>
- <dt>The <code>DocumentType</code> node's <code title="">name</code>, <code
- title="">publicId</code>, and <code title="">systemId</code> attributes
+ <p class=example>For example, the element name <code
+ title="">.foo&lt;bar</code>, which can be output by the <a
+ href="#html-0">HTML parser</a>, though it is neither a legal HTML element
+ name nor a well-formed XML element name, would be converted into <code
+ title="">U0002EfooU0003Cbar</code>, which <em>is</em> a well-formed XML
+ element name (though it's still not legal in HTML by any means).
- <dd>If the XML API being used doesn't support DOCTYPEs, tools may drop
- DOCTYPEs altogether or create a set of three attributes on the root
- element, named <code title="">__doctype_name__</code>, <code
- title="">__doctype_publicid__</code>, and <code
- title="">__doctype_systemid__</code>, respectively, whose values are the
- values that would have been put on the <code>DocumentType</code> node.
+ <p class=example>As another example, consider the attribute
+ <code>xlink:href</code>. Used on a MathML element, it becomes, after being
+ <a href="#adjust">adjusted</a>, an attribute with a prefix
+ "<code title="">xlink</code>" and a local name "<code
+ title="">href</code>". However, used on an HTML element, it becomes an
+ attribute with no prefix and the local name "<code
+ title="">xlink:href</code>", which is not a valid NCName, and thus might
+ not be accepted by an XML API. It could thus get converted, becoming
+ "<code title="">xlinkU0003Ahref</code>".
- <dt>The document being set to <i><a href="#no-quirks">no quirks
- mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
- <i><a href="#quirks">quirks mode</a></i>
+ <p class=note>The resulting names from this conversion conveniently can't
+ clash with any attribute generated by the <a href="#html-0">HTML
+ parser</a>, since those are all either lowercase or those listed in the <a
+ href="#adjust">adjust foreign attributes</a> algorithm's table.
- <dd>To convey this information, create an attribute <code
- title="">__mode__</code> on the root element, with values "noquirks",
- "limitedquirks", or "quirks" respectively.
+ <p>If the XML API restricts comments from having two consecutive U+002D
+ HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE
+ character between any such offending characters.
- <dt>Elements that have a namespace without appropriate <code
- title="">xmlns</code> attributes being in scope
+ <p>If the XML API restricts allowed characters in character data, the tool
+ may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE
+ character, and any other literal non-XML character with a U+FFFD
+ REPLACEMENT CHARACTER.
- <dd>Construct the DOM as if appropriate namespace declarations were in
- scope.
+ <p>If the tool has no way to convey out-of-band information, then the tool
+ may drop the following information:
- <dt>Elements whose names contain U+003A COLON (:) characters or characters
- that cannot be represented in XML element names
+ <ul>
+ <li>Whether the document is set to <i><a href="#no-quirks">no quirks
+ mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
+ <i><a href="#quirks">quirks mode</a></i>
- <dd>Drop the element and all its children, or replace any offending
- characters with a U+005F LOW LINE (_) character.
+ <li>The association between form controls and forms that aren't their
+ nearest <code>form</code> element ancestor (use of the <a
+ href="#form-element"><code>form</code> element pointer</a> in the parser)
+ </ul>
- <dt>Attributes named <code title="">xmlns</code> whose values match the
- namespace of the element node
+ <p class=note>The mutations allowed by this section apply <em>after</em>
+ the <a href="#html-0">HTML parser</a>'s rules have been applied. For
+ example, a <code title="">&lt;a::&gt;</code> start tag will be closed by a
+ <code title="">&lt;/a::&gt;</code> end tag, and never by a <code
+ title="">&lt;/aU0003AU0003A&gt;</code> end tag, even if the user agent is
+ using the rules above to then generate an actual element in the DOM with
+ the name <code title="">aU0003AU0003A</code> for that start tag.
- <dd>Construct the DOM as if these were default namespace declarations.
-
- <dt>Attributes named <code title="">xmlns:xlink</code> whose values match
- the <a href="#xlink">XLink namespace</a>, on elements whose namespace is
- not the <a href="#html-namespace0">HTML namespace</a>
-
- <dd>Construct the DOM as if these were namespace prefix declarations.
-
- <dt>Other attributes whose names are <code title="">xmlns</code> or start
- with <code title="">xmlns:</code>
-
- <dd>Drop the attributes or add two U+005F LOW LINE (_) characters to the
- start of the attributes' names and replace any U+003A COLON (:)
- characters with a U+005F LOW LINE (_) character.
-
- <dt>Other attributes in no namespace whose names contain U+003A COLON (:)
- characters
-
- <dt>Attributes whose names contain characters that cannot be represented
- in XML attribute names
-
- <dd>Drop the attributes or replace any offending characters with a U+005F
- LOW LINE (_) character, dropping any attributes where doing this would
- cause an attribute name clash.
-
- <dt>Form controls associated with forms that aren't their nearest ancestor
- (use of the <a href="#form-element"><code>form</code> element
- pointer</a>)
-
- <dd>Create an attribute <code title="">__formid__</code> on the form, with
- a value unique amongst <code title="">__formid__</code> attributes in the
- document, and create an attribute <code title="">__form__</code> on the
- form control, whose value matches the unique identifier given to the
- form.
-
- <dt>Any U+000C FORM FEED (FF) character
-
- <dd>Replace the character with a U+0020 SPACE character.
-
- <dt>Any other literal non-XML character
-
- <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER.
-
- <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS characters
- (--).
-
- <dd>Insert a U+0020 SPACE character between them.
- </dl>
-
- <p>Tools that use these conventions should guard against documents that
- include markup that clashes with them by always dropping all attributes in
- the document that start with two U+005F LOW LINE (_) characters.
-
- <p class=note>These conventions apply <em>after</em> the <a
- href="#html-0">HTML parser</a>'s rules have been applied. For example, a
- <code title="">&lt;a::&gt;</code> start tag will be closed by a <code
- title="">&lt;/a::&gt;</code> end tag, and never by a <code
- title="">&lt;/a__&gt;</code> end tag, even if the user agent is using the
- rules above to then generate an actual element in the DOM with the name
- <code title="">a__</code> for that start tag.
-
<h3 id=namespaces><span class=secno>8.3 </span>Namespaces</h3>
<p>The <dfn id=html-namespace0>HTML namespace</dfn> is:
Modified: source
===================================================================
--- source 2008-07-23 07:41:48 UTC (rev 1909)
+++ source 2008-07-23 08:40:07 UTC (rev 1910)
@@ -48102,141 +48102,89 @@
constructed DOM is not compatible with the XML tool chain in certain
subtle ways. For example, an XML toolchain might not be able to
represent attributes with the name <code title="">xmlns</code>,
- since they conflict with the Namespaces in XML syntax. <a
- href="#refsXMLNS">[XMLNS]</a></p>
+ since they conflict with the Namespaces in XML syntax. There is also
+ some data that the <span>HTML parser</span> generates that isn't
+ included in the DOM itself. This section specifies some rules for
+ handling these issues.</p>
- <p>There is also some data that the <span>HTML parser</span>
- generates that isn't included in the DOM itself.</p>
+ <p>If the XML API being used doesn't support DOCTYPEs, tools may
+ drop DOCTYPEs altogether.</p>
- <p>To allow tools to apply a consistent set of adjustments to the
- output of their <span>HTML parser</span> to allow for compatibility
- with the rest of their XML toolchain, this section documents a set
- of mutations and conventions that will convert the output of the
- <span>HTML parser</span> for any arbitrary input into an XML Infoset
- that doesn't have any problematic characteristics.</p>
+ <p>If the XML API doesn't support attributes in no namespace that
+ are named "<code title="">xmlns</code>", attributes whose names
+ start with "<code title="">xmlns:</code>", or attributes in the
+ <span>XMLNS namespace</span>, then the tool may drop such
+ attributes.</p>
- <p>Tools that cannot convey the out-of-band information using
- out-of-band mechanisms, or that cannot convey the DOM exactly as
- prescribed by this specification, may either ignore the offending
- information or DOM feature, or may represent it internally in the
- DOM using the conventions described below.</p>
+ <p>The tool may annotate the output with any namespace declarations
+ required for proper operation.</p>
- <p>These conventions are not conforming HTML, and user agents must
- not output such syntax outside of their XML pipeline.</p>
+ <p>If the XML API being used restricts the allowable characters in
+ the local names of elements and attributes, then the tool may map
+ all element and attribute local names that the API wouldn't support
+ to a set of names that <em>are</em> allowed, by replacing any
+ character that isn't supported with the upper case letter U and the
+ five digits of the character's Unicode codepoint when expressed in
+ hexadecimal.</p>
- <dl>
+ <p class="example">For example, the element name <code
+ title="">.foo&lt;bar</code>, which can be output by the <span>HTML
+ parser</span>, though it is neither a legal HTML element name nor a
+ well-formed XML element name, would be converted into <code
+ title="">U0002EfooU0003Cbar</code>, which <em>is</em> a well-formed
+ XML element name (though it's still not legal in HTML by any
+ means).</p>
- <dt>The <code>DocumentType</code> node's <code
- title="">name</code>, <code title="">publicId</code>, and <code
- title="">systemId</code> attributes</dt>
+ <p class="example">As another example, consider the attribute
+ <code>xlink:href</code>. Used on a MathML element, it becomes, after
+ being <span title="adjust foreign attributes">adjusted</span>, an attribute
+ with a prefix "<code title="">xlink</code>" and a local name "<code
+ title="">href</code>". However, used on an HTML element, it becomes
+ an attribute with no prefix and the local name "<code
+ title="">xlink:href</code>", which is not a valid NCName, and thus
+ might not be accepted by an XML API. It could thus get converted,
+ becoming "<code title="">xlinkU0003Ahref</code>".</p>
- <dd>If the XML API being used doesn't support DOCTYPEs, tools may
- drop DOCTYPEs altogether or create a set of three attributes on the
- root element, named <code title="">__doctype_name__</code>, <code
- title="">__doctype_publicid__</code>, and <code
- title="">__doctype_systemid__</code>, respectively, whose values
- are the values that would have been put on the
- <code>DocumentType</code> node.</dd>
+ <p class="note">The resulting names from this conversion
+ conveniently can't clash with any attribute generated by the
+ <span>HTML parser</span>, since those are all either lowercase or
+ those listed in the <span>adjust foreign attributes</span>
+ algorithm's table.</p>
+ <p>If the XML API restricts comments from having two consecutive
+ U+002D HYPHEN-MINUS characters (--), the tool may insert a single
+ U+0020 SPACE character between any such offending characters.</p>
- <dt>The document being set to <i>no quirks mode</i>, <i>limited
- quirks mode</i>, or <i>quirks mode</i></dt>
+ <p>If the XML API restricts allowed characters in character data,
+ the tool may replace any U+000C FORM FEED (FF) character with a
+ U+0020 SPACE character, and any other literal non-XML character with
+ a U+FFFD REPLACEMENT CHARACTER.</p>
- <dd>To convey this information, create an attribute <code
- title="">__mode__</code> on the root element, with values
- "noquirks", "limitedquirks", or "quirks" respectively.</dd>
+ <p>If the tool has no way to convey out-of-band information, then
+ the tool may drop the following information:</p>
+ <ul>
- <dt>Elements that have a namespace without appropriate <code
- title="">xmlns</code> attributes being in scope</dt>
+ <li>Whether the document is set to <i>no quirks mode</i>,
+ <i>limited quirks mode</i>, or <i>quirks mode</i></li>
- <dd>Construct the DOM as if appropriate namespace declarations were
- in scope.</dd>
+ <li>The association between form controls and forms that aren't
+ their nearest <code>form</code> element ancestor (use of the
+ <span><code>form</code> element pointer</span> in the parser)</li>
+ </ul>
- <dt>Elements whose names contain U+003A COLON (:) characters or
- characters that cannot be represented in XML element names</dt>
+ <p class="note">The mutations allowed by this section apply
+ <em>after</em> the <span>HTML parser</span>'s rules have been
+ applied. For example, a <code title="">&lt;a::&gt;</code> start tag
+ will be closed by a <code title="">&lt;/a::&gt;</code> end tag, and
+ never by a <code title="">&lt;/aU0003AU0003A&gt;</code> end tag, even
+ if the user agent is using the rules above to then generate an
+ actual element in the DOM with the name <code
+ title="">aU0003AU0003A</code> for that start tag.</p>
- <dd>Drop the element and all its children, or replace any offending
- characters with a U+005F LOW LINE (_) character.</dd>
- <dt>Attributes named <code title="">xmlns</code> whose values match
- the namespace of the element node</dt>
-
- <dd>Construct the DOM as if these were default namespace
- declarations.</dd>
-
-
- <dt>Attributes named <code title="">xmlns:xlink</code> whose values
- match the <span>XLink namespace</span>, on elements whose namespace
- is not the <span>HTML namespace</span></dt>
-
- <dd>Construct the DOM as if these were namespace prefix
- declarations.</dd>
-
-
- <dt>Other attributes whose names are <code title="">xmlns</code> or
- start with <code title="">xmlns:</code></dt>
-
- <dd>Drop the attributes or add two U+005F LOW LINE (_) characters
- to the start of the attributes' names and replace any U+003A COLON
- (:) characters with a U+005F LOW LINE (_) character.</dd>
-
-
- <dt>Other attributes in no namespace whose names contain U+003A
- COLON (:) characters</dt>
- <dt>Attributes whose names contain characters that cannot be
- represented in XML attribute names</dt>
-
- <dd>Drop the attributes or replace any offending characters with a
- U+005F LOW LINE (_) character, dropping any attributes where doing
- this would cause an attribute name clash.</dd>
-
-
- <dt>Form controls associated with forms that aren't their
- nearest ancestor (use of the <span><code>form</code> element
- pointer</span>)</dt>
-
- <dd>Create an attribute <code title="">__formid__</code> on the
- form, with a value unique amongst <code title="">__formid__</code>
- attributes in the document, and create an attribute <code
- title="">__form__</code> on the form control, whose value matches
- the unique identifier given to the form.</dd>
-
-
- <dt>Any U+000C FORM FEED (FF) character</dt>
-
- <dd>Replace the character with a U+0020 SPACE character.</dd>
-
-
- <dt>Any other literal non-XML character</dt>
-
- <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER.</dd>
-
-
- <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS
- characters (--).</dt>
-
- <dd>Insert a U+0020 SPACE character between them.</dd>
-
- </dl>
-
- <p>Tools that use these conventions should guard against documents
- that include markup that clashes with them by always dropping all
- attributes in the document that start with two U+005F LOW LINE (_)
- characters.</p>
-
- <p class="note">These conventions apply <em>after</em> the
- <span>HTML parser</span>'s rules have been applied. For example, a
- <code title="">&lt;a::&gt;</code> start tag will be closed by a <code
- title="">&lt;/a::&gt;</code> end tag, and never by a <code
- title="">&lt;/a__&gt;</code> end tag, even if the user agent is using
- the rules above to then generate an actual element in the DOM with
- the name <code title="">a__</code> for that start tag.</p>
-
-
-
<h3>Namespaces</h3>
<p>The <dfn>HTML namespace</dfn> is: <code>http://www.w3.org/1999/xhtml</code></p>
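
The coercions introduced by this revision can be sketched in a few lines of Python. This is only an illustration, not part of the commit. The function names are hypothetical, and the set of "supported" name characters is an assumption (an imaginary XML API that accepts only ASCII NCName characters); a real tool would substitute whatever its own API actually rejects. The rule for character data uses the Char production from XML 1.0.

```python
import re
import string

# Assumption: the hypothetical XML API accepts only ASCII NCName characters.
_NAME_START = set(string.ascii_letters + "_")
_NAME_REST = _NAME_START | set(string.digits + "-.")


def coerce_local_name(name: str) -> str:
    """Replace each unsupported character with 'U' plus its hex code point."""
    out = []
    for index, ch in enumerate(name):
        allowed = _NAME_START if index == 0 else _NAME_REST
        # %05X pads to five hex digits; code points above U+FFFFF
        # would produce six digits.
        out.append(ch if ch in allowed else "U%05X" % ord(ch))
    return "".join(out)


def coerce_comment(data: str) -> str:
    """Insert one space between any two consecutive hyphen-minus characters."""
    return re.sub(r"-(?=-)", "- ", data)


def _is_xml_char(cp: int) -> bool:
    # Char production from XML 1.0.
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def coerce_character_data(data: str) -> str:
    """Map FORM FEED to SPACE and any other non-XML character to U+FFFD."""
    out = []
    for ch in data:
        if ch == "\f":
            out.append(" ")
        elif _is_xml_char(ord(ch)):
            out.append(ch)
        else:
            out.append("\ufffd")
    return "".join(out)
```

Under these assumptions, coerce_local_name reproduces the three examples given in the diff: ".foo<bar" becomes "U0002EfooU0003Cbar", "xlink:href" becomes "xlinkU0003Ahref", and "a::" becomes "aU0003AU0003A".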