[html5] r960 - /
whatwg at whatwg.org
Sat Jun 23 02:53:30 PDT 2007
Author: ianh
Date: 2007-06-23 02:49:27 -0700 (Sat, 23 Jun 2007)
New Revision: 960
Modified:
index
source
Log:
[e] (1) Make a new section to group the various charset support requirements; fix a cross-reference
Modified: index
===================================================================
--- index 2007-06-23 09:40:45 UTC (rev 959)
+++ index 2007-06-23 09:49:27 UTC (rev 960)
@@ -1475,10 +1475,13 @@
<li><a href="#determining0"><span class=secno>8.2.2.1.
</span>Determining the character encoding</a>
- <li><a href="#preprocessing"><span class=secno>8.2.2.2.
+ <li><a href="#character0"><span class=secno>8.2.2.2.
+ </span>Character encoding requirements</a>
+
+ <li><a href="#preprocessing"><span class=secno>8.2.2.3.
</span>Preprocessing the input stream</a>
- <li><a href="#changing"><span class=secno>8.2.2.3. </span>Changing
+ <li><a href="#changing"><span class=secno>8.2.2.4. </span>Changing
the encoding while parsing</a>
</ul>
@@ -32363,14 +32366,14 @@
described below.
<p>RCDATA elements can have <a href="#text1" title=syntax-text>text</a> and
- <a href="#character0" title=syntax-entities>character entity
+ <a href="#character1" title=syntax-entities>character entity
references</a>, but the text must not contain an <a href="#ambiguous"
title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. There are also
<a href="#cdata-rcdata-restrictions">further restrictions</a> described
below.
<p>Normal elements can have <a href="#text1" title=syntax-text>text</a>, <a
- href="#character0" title=syntax-entities>character entity references</a>,
+ href="#character1" title=syntax-entities>character entity references</a>,
other <a href="#elements2" title=syntax-elements>elements</a>, and <a
href="#comments0" title=syntax-comments>comments</a>, but the text must
not contain the character U+003C LESS-THAN SIGN (<code>&lt;</code>) or an
@@ -32466,7 +32469,7 @@
<p><dfn id=attribute0 title=syntax-attribute-value>Attribute values</dfn>
are a mixture of <a href="#text1" title=syntax-text>text</a> and <a
- href="#character0" title=syntax-entities>character entity references</a>,
+ href="#character1" title=syntax-entities>character entity references</a>,
except with the additional restriction that the text cannot contain an <a
href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous
ampersand</a>.
@@ -32805,7 +32808,7 @@
<p>An <dfn id=escaping title=syntax-escape>escaping text span</dfn> is a
span of <a href="#text1" title=syntax-text>text</a> (in CDATA and RCDATA
- elements) and <a href="#character0" title=syntax-entities>character entity
+ elements) and <a href="#character1" title=syntax-entities>character entity
references</a> (in RCDATA elements) that starts with an <a
href="#escaping0" title=syntax-escape-start>escaping text span start</a>
that is not itself in an <a href="#escaping" title=syntax-escape>escaping
@@ -32854,7 +32857,7 @@
references</h4>
<p>In certain cases described in other sections, <a href="#text1"
- title=syntax-text>text</a> may be mixed with <dfn id=character0
+ title=syntax-text>text</a> may be mixed with <dfn id=character1
title=syntax-entities>character entity references</dfn>. These can be used
to escape characters that couldn't otherwise legally be included in <a
href="#text1" title=syntax-text>text</a>.
@@ -33466,6 +33469,9 @@
may heuristically decide which to use as a default.
</ol>
+ <h5 id=character0><span class=secno>8.2.2.2. </span>Character encoding
+ requirements</h5>
+
<p>User agents must at a minimum support the UTF-8 and Windows-1252
encodings, but may support more.
@@ -33477,13 +33483,6 @@
all the IANA-registered aliases. <a
href="#refsIANACHARSET">[IANACHARSET]</a>
- <h5 id=preprocessing><span class=secno>8.2.2.2. </span>Preprocessing the
- input stream</h5>
-
- <p>Given an encoding, the bytes in the input stream must be converted to
- Unicode characters for the tokeniser, as described by the rules for that
- encoding.
-
<p>When a user agent would otherwise use the ISO-8859-1 encoding, it must
instead use the Windows-1252 encoding. User agents must not support the
CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a href="#refsCESU8">[CESU8]</a>
@@ -33493,6 +33492,13 @@
<p>Support for UTF-32 is not recommended. This encoding is rarely used, and
frequently misimplemented.
+ <h5 id=preprocessing><span class=secno>8.2.2.3. </span>Preprocessing the
+ input stream</h5>
+
+ <p>Given an encoding, the bytes in the input stream must be converted to
+ Unicode characters for the tokeniser, as described by the rules for that
+ encoding.
+
<p>Bytes or sequences of bytes in the original byte stream that could not
be converted to Unicode characters must be converted to U+FFFD REPLACEMENT
CHARACTER code points.
@@ -33532,7 +33538,7 @@
method) is consumed. Otherwise, the "EOF" character is not a real
character in the stream, but rather the lack of any further characters.
- <h5 id=changing><span class=secno>8.2.2.3. </span>Changing the encoding
+ <h5 id=changing><span class=secno>8.2.2.4. </span>Changing the encoding
while parsing</h5>
<p>When the parser requires the user agent to <dfn id=change>change the
@@ -36386,13 +36392,13 @@
<p><a href="#insert" title="insert an html element">Insert an HTML
element</a> for the token.</p>
- <p>If the element has a <code title=attr-meta-charset><a
- href="#charset0">charset</a></code> attribute, and its value is a
- supported encoding, and the <a href="#confidence"
- title=concept-encoding-confidence>confidence</a> is currently
- <i>tentative</i>, then <a href="#change">change the encoding</a> to
- the encoding given by the value of the <code
+ <p id=meta-charset-during-parse>If the element has a <code
title=attr-meta-charset><a href="#charset0">charset</a></code>
+ attribute, and its value is a supported encoding, and the <a
+ href="#confidence" title=concept-encoding-confidence>confidence</a>
+ is currently <i>tentative</i>, then <a href="#change">change the
+ encoding</a> to the encoding given by the value of the <code
+ title=attr-meta-charset><a href="#charset0">charset</a></code>
attribute.</p>
<p>Otherwise, if the element has a <code title=attr-meta-charset><a
Modified: source
===================================================================
--- source 2007-06-23 09:40:45 UTC (rev 959)
+++ source 2007-06-23 09:49:27 UTC (rev 960)
@@ -30976,6 +30976,9 @@
</ol>
+
+ <h5>Character encoding requirements</h5>
+
<p>User agents must at a minimum support the UTF-8 and Windows-1252
encodings, but may support more.</p>
@@ -30987,13 +30990,6 @@
should support all the IANA-registered aliases. <a
href="#refsIANACHARSET">[IANACHARSET]</a></p>
-
- <h5>Preprocessing the input stream</h5>
-
- <p>Given an encoding, the bytes in the input stream must be
- converted to Unicode characters for the tokeniser, as described by
- the rules for that encoding.</p>
-
<p>When a user agent would otherwise use the ISO-8859-1 encoding, it
must instead use the Windows-1252 encoding. User agents must not
support the CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a
@@ -31003,6 +30999,14 @@
<p>Support for UTF-32 is not recommended. This encoding is rarely
used, and frequently misimplemented.</p>
+
+
+ <h5>Preprocessing the input stream</h5>
+
+ <p>Given an encoding, the bytes in the input stream must be
+ converted to Unicode characters for the tokeniser, as described by
+ the rules for that encoding.</p>
+
<p>Bytes or sequences of bytes in the original byte stream that
could not be converted to Unicode characters must be converted to
U+FFFD REPLACEMENT CHARACTER code points.</p>
@@ -33524,7 +33528,7 @@
<p><span title="insert an html element">Insert an HTML
element</span> for the token.</p>
- <p>If the element has a <code
+ <p id="meta-charset-during-parse">If the element has a <code
title="attr-meta-charset">charset</code> attribute, and its
value is a supported encoding, and the <span
title="concept-encoding-confidence">confidence</span> is
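
The requirements regrouped by this revision (treat ISO-8859-1 as Windows-1252, refuse CESU-8, UTF-7, BOCU-1 and SCSU, and turn unconvertible bytes into U+FFFD) can be illustrated with a minimal sketch. It is not taken from the spec: the helper names `effective_encoding` and `decode_input` are hypothetical, and it relies on Python's own codec registry, whose alias tables and Windows-1252 mapping may differ in detail from what a browser implements.

```python
import codecs

# Encodings the spec text says user agents must not support.
# (Canonical spellings as Python reports them; an assumption of this sketch.)
FORBIDDEN = {"cesu-8", "utf-7", "bocu-1", "scsu"}


def effective_encoding(label: str) -> str:
    """Map a declared encoding label to the codec that would actually be used."""
    name = label.strip().lower()
    if name in FORBIDDEN:
        raise LookupError(f"encoding {label!r} must not be supported")
    # codecs.lookup resolves aliases and raises LookupError for unknown labels.
    canonical = codecs.lookup(name).name
    if canonical in FORBIDDEN:
        raise LookupError(f"encoding {label!r} must not be supported")
    # "When a user agent would otherwise use ISO-8859-1, it must instead
    # use Windows-1252" (Python calls that codec cp1252).
    if canonical == "iso8859-1":
        return "cp1252"
    return canonical


def decode_input(data: bytes, label: str) -> str:
    """Convert input bytes to Unicode, replacing bad sequences with U+FFFD."""
    return data.decode(effective_encoding(label), errors="replace")


if __name__ == "__main__":
    # 0x93 and 0x94 are curly quotes in Windows-1252 but C1 controls in ISO-8859-1.
    print(repr(decode_input(b"\x93hi\x94", "ISO-8859-1")))
    # 0xFF can never appear in well-formed UTF-8.
    print(repr(decode_input(b"a\xffb", "utf-8")))
```

Python's `errors="replace"` handler emits U+FFFD for each undecodable sequence, which matches the replacement behaviour the preprocessing paragraph describes; how many replacement characters a given malformed run produces is a codec detail this sketch does not try to pin down.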