[html5] r5258 - [e] (0) Some more references to UTF-8.

Mon Aug 9 18:16:12 PDT 2010

Author: ianh
Date: 2010-08-09 18:16:10 -0700 (Mon, 09 Aug 2010)
New Revision: 5258

Modified:
   complete.html
   index
   source
Log:
[e] (0) Some more references to UTF-8.

Modified: complete.html
===================================================================

--- complete.html	2010-08-10 00:58:32 UTC (rev 5257)
+++ complete.html	2010-08-10 01:16:10 UTC (rev 5258)
@@ -13408,12 +13408,12 @@
   <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>.</p>
 
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
-  advise authors against using legacy encodings.</p>
+  advise authors against using legacy encodings. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <div class=impl>
 
   <p>Authoring tools should default to using UTF-8 for newly-created
-  documents.</p>
+  documents. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   </div>
 
@@ -27759,7 +27759,7 @@
 
   <p>A <dfn id=websrt-file>WebSRT file</dfn> must consist of a <a href=#websrt-file-body>WebSRT file
   body</a> encoded as UTF-8 and labeled with the <a href=#mime-type>MIME
-  type</a> <code><a href=#text/srt>text/srt</a></code>.</p>
+  type</a> <code><a href=#text/srt>text/srt</a></code>. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p>A <dfn id=websrt-file-body>WebSRT file body</dfn> consists of zero or more <a href=#websrt-line-terminator title="WebSRT line terminator">WebSRT line terminators</a>,
   followed by zero or more <a href=#websrt-cue title="WebSRT cue">WebSRT cues</a>
@@ -28027,7 +28027,7 @@
   interpreting them as UTF-8, and then must parse the resulting string
   according to the <a href=#websrt-parser-algorithm>WebSRT parser algorithm</a> below. This
   results in <a href=#timed-track-cue title="timed track cue">timed track cues</a>
-  being added to <var title="">output</var>.</p>
+  being added to <var title="">output</var>. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p>A <a href=#websrt-parser>WebSRT parser</a>, specifically its conversion and
   parsing steps, is typically run asynchronously, with the input byte
@@ -61630,7 +61630,7 @@
   encoded using UTF-8. Data in application cache manifests is
   line-based. Newlines must be represented by U+000A LINE FEED (LF)
   characters, U+000D CARRIAGE RETURN (CR) characters, or U+000D
-  CARRIAGE RETURN (CR) U+000A LINE FEED (LF) pairs.</p>
+  CARRIAGE RETURN (CR) U+000A LINE FEED (LF) pairs. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p class=note>This is a <a href=#willful-violation>willful violation</a> of two
   aspects of RFC 2046, which requires all <code title="">text/*</code>
@@ -61790,7 +61790,7 @@
    a U+FFFD REPLACEMENT CHARACTER. <!--All U+0000 NULL characters must
    be replaced by U+FFFD REPLACEMENT CHARACTERs. (this isn't black-box
    testable since neither U+0000 nor U+FFFD are valid anywhere in the
-   syntax and thus both will be treated the same anyway)--></li>
+   syntax and thus both will be treated the same anyway)--> <a href=#refsRFC3629>[RFC3629]</a></li>
 
    <li><p>Let <var title="">base URL</var> be the <a href=#absolute-url>absolute
    URL</a> representing the manifest.</li>
@@ -70765,7 +70765,7 @@
     steps.</p>
 
     <p>If the attempt succeeds, then convert the script resource to
-    Unicode by assuming it was encoded as UTF-8, to obtain its <var title="">source</var>.</p>
+    Unicode by assuming it was encoded as UTF-8, to obtain its <var title="">source</var>. <a href=#refsRFC3629>[RFC3629]</a></p>
 
     <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -71510,7 +71510,7 @@
       steps.</p>
 
       <p>If the attempt succeeds, then convert the script resource to
-      Unicode by assuming it was encoded as UTF-8, to obtain its <var title="">source</var>.</p>
+      Unicode by assuming it was encoded as UTF-8, to obtain its <var title="">source</var>. <a href=#refsRFC3629>[RFC3629]</a></p>
 
       <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -72091,7 +72091,7 @@
                 ; a Unicode character other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)</pre>
 
   <p>Event streams in this format must always be encoded as
-  UTF-8.</p>
+  UTF-8. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p>Lines must be separated by either a U+000D CARRIAGE RETURN U+000A
   LINE FEED (CRLF) character pair, a single U+000A LINE FEED (LF)
@@ -72107,8 +72107,9 @@
 
   <h4 id=event-stream-interpretation><span class=secno>10.2.5 </span>Interpreting an event stream</h4>
 
-  <p>Bytes or sequences of bytes that are not valid UTF-8 sequences
-  must be interpreted as the U+FFFD REPLACEMENT CHARACTER.</p>
+  <p>Streams must be decoded as UTF-8 text. Bytes or sequences of
+  bytes that are not valid UTF-8 sequences must be interpreted as the
+  U+FFFD REPLACEMENT CHARACTER. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
   any are present.</p>
@@ -78703,7 +78704,7 @@
   <h5 id=character-encodings-0><span class=secno>12.2.2.2 </span>Character encodings</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252
-  encodings, but may support more.</p>
+  encodings, but may support more. <a href=#refsRFC3629>[RFC3629]</a> <a href=#refsWIN1252>[WIN1252]</a></p>
 
   <p class=note>It is not unusual for Web browsers to support dozens
   if not upwards of a hundred distinct character encodings.</p>

Modified: index
===================================================================
--- index	2010-08-10 00:58:32 UTC (rev 5257)
+++ index	2010-08-10 01:16:10 UTC (rev 5258)
@@ -13332,12 +13332,12 @@
   <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>.</p>
 
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
-  advise authors against using legacy encodings.</p>
+  advise authors against using legacy encodings. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <div class=impl>
 
   <p>Authoring tools should default to using UTF-8 for newly-created
-  documents.</p>
+  documents. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   </div>
 
@@ -27686,7 +27686,7 @@
 
   <p>A <dfn id=websrt-file>WebSRT file</dfn> must consist of a <a href=#websrt-file-body>WebSRT file
   body</a> encoded as UTF-8 and labeled with the <a href=#mime-type>MIME
-  type</a> <code><a href=#text/srt>text/srt</a></code>.</p>
+  type</a> <code><a href=#text/srt>text/srt</a></code>. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p>A <dfn id=websrt-file-body>WebSRT file body</dfn> consists of zero or more <a href=#websrt-line-terminator title="WebSRT line terminator">WebSRT line terminators</a>,
   followed by zero or more <a href=#websrt-cue title="WebSRT cue">WebSRT cues</a>
@@ -27954,7 +27954,7 @@
   interpreting them as UTF-8, and then must parse the resulting string
   according to the <a href=#websrt-parser-algorithm>WebSRT parser algorithm</a> below. This
   results in <a href=#timed-track-cue title="timed track cue">timed track cues</a>
-  being added to <var title="">output</var>.</p>
+  being added to <var title="">output</var>. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p>A <a href=#websrt-parser>WebSRT parser</a>, specifically its conversion and
   parsing steps, is typically run asynchronously, with the input byte
@@ -61566,7 +61566,7 @@
   encoded using UTF-8. Data in application cache manifests is
   line-based. Newlines must be represented by U+000A LINE FEED (LF)
   characters, U+000D CARRIAGE RETURN (CR) characters, or U+000D
-  CARRIAGE RETURN (CR) U+000A LINE FEED (LF) pairs.</p>
+  CARRIAGE RETURN (CR) U+000A LINE FEED (LF) pairs. <a href=#refsRFC3629>[RFC3629]</a></p>
 
   <p class=note>This is a <a href=#willful-violation>willful violation</a> of two
   aspects of RFC 2046, which requires all <code title="">text/*</code>
@@ -61726,7 +61726,7 @@
    a U+FFFD REPLACEMENT CHARACTER. <!--All U+0000 NULL characters must
    be replaced by U+FFFD REPLACEMENT CHARACTERs. (this isn't black-box
    testable since neither U+0000 nor U+FFFD are valid anywhere in the
-   syntax and thus both will be treated the same anyway)--></li>
+   syntax and thus both will be treated the same anyway)--> <a href=#refsRFC3629>[RFC3629]</a></li>
 
    <li><p>Let <var title="">base URL</var> be the <a href=#absolute-url>absolute
    URL</a> representing the manifest.</li>
@@ -71814,7 +71814,7 @@
   <h5 id=character-encodings-0><span class=secno>10.2.2.2 </span>Character encodings</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252
-  encodings, but may support more.</p>
+  encodings, but may support more. <a href=#refsRFC3629>[RFC3629]</a> <a href=#refsWIN1252>[WIN1252]</a></p>
 
   <p class=note>It is not unusual for Web browsers to support dozens
   if not upwards of a hundred distinct character encodings.</p>

Modified: source
===================================================================
--- source	2010-08-10 00:58:32 UTC (rev 5257)
+++ source	2010-08-10 01:16:10 UTC (rev 5258)
@@ -14071,12 +14071,13 @@
   <span>ASCII-compatible character encoding</span>.</p>
 
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
-  advise authors against using legacy encodings.</p>
+  advise authors against using legacy encodings. <a
+  href="#refsRFC3629">[RFC3629]</a></p>
 
   <div class="impl">
 
   <p>Authoring tools should default to using UTF-8 for newly-created
-  documents.</p>
+  documents. <a href="#refsRFC3629">[RFC3629]</a></p>
 
   </div>
 
@@ -30126,7 +30127,7 @@
 
   <p>A <dfn>WebSRT file</dfn> must consist of a <span>WebSRT file
   body</span> encoded as UTF-8 and labeled with the <span>MIME
-  type</span> <code>text/srt</code>.</p>
+  type</span> <code>text/srt</code>. <a href="#refsRFC3629">[RFC3629]</a></p>
 
   <p>A <dfn>WebSRT file body</dfn> consists of zero or more <span
   title="WebSRT line terminator">WebSRT line terminators</span>,
@@ -30474,7 +30475,7 @@
   interpreting them as UTF-8, and then must parse the resulting string
   according to the <span>WebSRT parser algorithm</span> below. This
   results in <span title="timed track cue">timed track cues</span>
-  being added to <var title="">output</var>.</p>
+  being added to <var title="">output</var>. <a href="#refsRFC3629">[RFC3629]</a></p>
 
   <p>A <span>WebSRT parser</span>, specifically its conversion and
   parsing steps, is typically run asynchronously, with the input byte
@@ -69633,7 +69634,7 @@
   encoded using UTF-8. Data in application cache manifests is
   line-based. Newlines must be represented by U+000A LINE FEED (LF)
   characters, U+000D CARRIAGE RETURN (CR) characters, or U+000D
-  CARRIAGE RETURN (CR) U+000A LINE FEED (LF) pairs.</p>
+  CARRIAGE RETURN (CR) U+000A LINE FEED (LF) pairs. <a href="#refsRFC3629">[RFC3629]</a></p>
 
   <p class="note">This is a <span>willful violation</span> of two
   aspects of RFC 2046, which requires all <code title="">text/*</code>
@@ -69816,7 +69817,7 @@
    a U+FFFD REPLACEMENT CHARACTER. <!--All U+0000 NULL characters must
    be replaced by U+FFFD REPLACEMENT CHARACTERs. (this isn't black-box
    testable since neither U+0000 nor U+FFFD are valid anywhere in the
-   syntax and thus both will be treated the same anyway)--></p></li>
+   syntax and thus both will be treated the same anyway)--> <a href="#refsRFC3629">[RFC3629]</a></p></li>
 
    <li><p>Let <var title="">base URL</var> be the <span>absolute
    URL</span> representing the manifest.</p></li>
@@ -79552,7 +79553,7 @@
 
     <p>If the attempt succeeds, then convert the script resource to
     Unicode by assuming it was encoded as UTF-8, to obtain its <var
-    title="">source</var>.</p>
+    title="">source</var>. <a href="#refsRFC3629">[RFC3629]</a></p>
 
     <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -80425,7 +80426,7 @@
 
       <p>If the attempt succeeds, then convert the script resource to
       Unicode by assuming it was encoded as UTF-8, to obtain its <var
-      title="">source</var>.</p>
+      title="">source</var>. <a href="#refsRFC3629">[RFC3629]</a></p>
 
       <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -81105,7 +81106,7 @@
                 ; a Unicode character other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)</pre>
 
   <p>Event streams in this format must always be encoded as
-  UTF-8.</p>
+  UTF-8. <a href="#refsRFC3629">[RFC3629]</a></p>
 
   <p>Lines must be separated by either a U+000D CARRIAGE RETURN U+000A
   LINE FEED (CRLF) character pair, a single U+000A LINE FEED (LF)
@@ -81121,8 +81122,9 @@
 
   <h4 id="event-stream-interpretation">Interpreting an event stream</h4>
 
-  <p>Bytes or sequences of bytes that are not valid UTF-8 sequences
-  must be interpreted as the U+FFFD REPLACEMENT CHARACTER.</p>
+  <p>Streams must be decoded as UTF-8 text. Bytes or sequences of
+  bytes that are not valid UTF-8 sequences must be interpreted as the
+  U+FFFD REPLACEMENT CHARACTER. <a href="#refsRFC3629">[RFC3629]</a></p>
 
   <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
   any are present.</p>
@@ -89841,7 +89843,9 @@
   <h5>Character encodings</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252
-  encodings, but may support more.</p>
+  encodings, but may support more. <a
+  href="#refsRFC3629">[RFC3629]</a> <a
+  href="#refsWIN1252">[WIN1252]</a></p>
 
   <p class="note">It is not unusual for Web browsers to support dozens
   if not upwards of a hundred distinct character encodings.</p>
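
Illustrative note (not part of the commit): the reworded "Interpreting an event stream" paragraph says streams are decoded as UTF-8, invalid byte sequences become U+FFFD REPLACEMENT CHARACTER, and one leading U+FEFF BYTE ORDER MARK is ignored. A minimal sketch of that decoding step follows. The function name `decode_event_stream` is hypothetical, and Python's built-in `replace` error handler is assumed to be an acceptable stand-in for the replacement behaviour; the exact number of U+FFFD characters emitted for a multi-byte invalid sequence is left to the codec.

```python
BOM = "\ufeff"  # U+FEFF BYTE ORDER MARK


def decode_event_stream(raw: bytes) -> str:
    """Decode event-stream bytes as UTF-8 (hypothetical helper).

    Invalid byte sequences are mapped to U+FFFD by the codec's
    "replace" error handler; a single leading BOM is then dropped.
    """
    # The plain "utf-8" codec keeps a leading BOM as U+FEFF,
    # so it has to be removed explicitly.
    text = raw.decode("utf-8", errors="replace")
    if text.startswith(BOM):
        text = text[1:]  # ignore exactly one leading BOM
    return text
```

For example, `decode_event_stream(b"\xef\xbb\xbfdata: hi\n")` yields the text without the BOM, and a stray `0xFF` byte comes out as U+FFFD.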