[html5] r5545 - [giow] (2) Match Gecko for character encoding processing for <script> Fixing htt [...]

whatwg at whatwg.org
Tue Sep 28 18:04:33 PDT 2010


Author: ianh
Date: 2010-09-28 18:04:32 -0700 (Tue, 28 Sep 2010)
New Revision: 5545

Modified:
   complete.html
   index
   source
Log:
[giow] (2) Match Gecko for character encoding processing for <script>
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=10656

Modified: complete.html
===================================================================
--- complete.html	2010-09-29 00:07:50 UTC (rev 5544)
+++ complete.html	2010-09-29 01:04:32 UTC (rev 5545)
@@ -14305,10 +14305,12 @@
   <code><a href=#document>Document</a></code> objects can also have this flag set; it's
   propagated to the <code><a href=#document>Document</a></code> when the script runs.</p>
 
-  <p>The fifth and sixth pieces of state are <dfn id="the-script-block's-type"><var>the script
-  block's type</var></dfn> and <dfn id="the-script-block's-character-encoding"><var>the script block's character
-  encoding</var></dfn>. They are determined when the script is run,
-  based on the attributes on the element at that time.</p>
+  <p>The last few pieces of state are <dfn id="the-script-block's-type"><var>the script block's
+  type</var></dfn>, <dfn id="the-script-block's-character-encoding"><var>the script block's character
+  encoding</var></dfn>, and <dfn id="the-script-block's-fallback-character-encoding"><var>the script block's fallback
+  character encoding</var></dfn>. They are determined when the script
+  is run, based on the attributes on the element at that time, and the
+  <code><a href=#document>Document</a></code> of the <code><a href=#script>script</a></code> element.</p>
 
   <p>When a <code><a href=#script>script</a></code> element that is not marked as being
   <a href=#parser-inserted>"parser-inserted"</a> experiences one of the events listed
@@ -14466,10 +14468,13 @@
     <var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var> for this
     <code><a href=#script>script</a></code> element be the encoding given by the <code title=attr-script-charset><a href=#attr-script-charset>charset</a></code> attribute.</p>
 
-    <p>Otherwise, let <var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var>
-    for this <code><a href=#script>script</a></code> element be the same as <a href="#document's-character-encoding" title="document's character encoding">the encoding of the document
-    itself</a>.</p>
+    <p>Otherwise, let <var><a href="#the-script-block's-fallback-character-encoding">the script block's fallback character
+    encoding</a></var> for this <code><a href=#script>script</a></code> element be the same as
+    <a href="#document's-character-encoding" title="document's character encoding">the encoding of the
+    document itself</a>.</p>
 
+    <p class=note>Only one of these two pieces of state is set.</p>
+
    </li>
 
    <li id=script-processing-src-prepare>
@@ -14495,13 +14500,6 @@
     user agent must act as if it had received an empty HTTP 400
     response.</p>
 
-    <p>Once the resource's <a href=#content-type title=Content-Type>Content Type
-    metadata</a> is available, if it ever is, apply the
-    <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding from a
-    Content-Type</a> to it. If this returns an encoding, and the
-    user agent supports that encoding, then let <var><a href="#the-script-block's-character-encoding">the script
-    block's character encoding</a></var> be that encoding.</p>
-
     <p>For performance reasons, user agents may start fetching the
     script as soon as the attribute is set, instead, in the hope that
     the element will be inserted into the document. Either way, once
@@ -14648,44 +14646,64 @@
         <p>The contents of that file, interpreted as a string of
         Unicode characters, are the script source.</p>
 
-        <p>For each of the rows in the following table, starting with
-        the first one and going down, if the file has as many or more
-        bytes available than the number of bytes in the first column,
-        and the first bytes of the file match the bytes given in the
-        first column, then set <var><a href="#the-script-block's-character-encoding">the script block's character
-        encoding</a></var> to the encoding given in the cell in the second
-        column of that row, irrespective of any previous value:</p>
+        <p>To obtain the string of Unicode characters, the user agent
+        must run the following steps:</p>
 
-        <!-- this table is present in several forms in this file; keep them in sync -->
-        <table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
-           <th>Encoding
-         <tbody><!-- nobody uses this
-          <tr>
-           <td>00 00 FE FF
-           <td>UTF-32BE
-          <tr>
-           <td>FF FE 00 00
-           <td>UTF-32LE
---><tr><td>FE FF
-           <td>Big-endian UTF-16
-          <tr><td>FF FE
-           <td>Little-endian UTF-16
-          <tr><td>EF BB BF
-           <td>UTF-8
-<!-- nobody uses this
-          <tr>
-           <td>DD 73 66 73
-           <td>UTF-EBCDIC
--->
-        </table><p class=note>This step looks for Unicode Byte Order Marks
-        (BOMs).</p>
+        <ol><li><p>If the resource's <a href=#content-type title=Content-Type>Content
+         Type metadata</a>, if any, specifies a character encoding,
+         and the user agent supports that encoding, then let <var title="">character encoding</var> be that encoding, and jump
+         to the bottom step in this series of steps.</li>
 
-        <p>The file must then be converted to Unicode using the
-        character encoding given by <var><a href="#the-script-block's-character-encoding">the script block's character
-        encoding</a></var>.</p>
+         <li><p>If the algorithm above set <var><a href="#the-script-block's-character-encoding">the script block's
+         character encoding</a></var>, then let <var title="">character
+         encoding</var> be that encoding, and jump to the bottom step
+         in this series of steps.</li>
 
-       </dd>
+         <li><p>For each of the rows in the following table, starting
+         with the first one and going down, if the file has as many or
+         more bytes available than the number of bytes in the first
+         column, and the first bytes of the file match the bytes given
+         in the first column, then set <var title="">character
+         encoding</var> to the encoding given in the cell in the
+         second column of that row, and jump to the bottom step in
+         this series of steps:</p>
 
+          <!-- this table is present in several forms in this file; keep them in sync -->
+          <table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
+             <th>Encoding
+           <tbody><!-- nobody uses this
+            <tr>
+             <td>00 00 FE FF
+             <td>UTF-32BE
+            <tr>
+             <td>FF FE 00 00
+             <td>UTF-32LE
+  --><tr><td>FE FF
+             <td>Big-endian UTF-16
+            <tr><td>FF FE
+             <td>Little-endian UTF-16
+            <tr><td>EF BB BF
+             <td>UTF-8
+  <!-- nobody uses this
+            <tr>
+             <td>DD 73 66 73
+             <td>UTF-EBCDIC
+  -->
+          </table><p class=note>This step looks for Unicode Byte Order Marks
+          (BOMs).</p>
+
+         </li>
+
+         <li><p>Let <var title="">character encoding</var> be <var><a href="#the-script-block's-fallback-character-encoding">the
+         script block's fallback character encoding</a></var>.</li>
+
+         <li><p>Convert the file to Unicode using <var>character
+         encoding</var>, following the rules for doing so given by the
+         specification for <var><a href="#the-script-block's-type">the script block's
+         type</a></var>.</li>
+
+        </ol></dd>
+
        <dt>If the script is from an external file and <var><a href="#the-script-block's-type">the script block's type</a></var> is an XML-based language</dt>
 
        <dd>

Modified: index
===================================================================
--- index	2010-09-29 00:07:50 UTC (rev 5544)
+++ index	2010-09-29 01:04:32 UTC (rev 5545)
@@ -14282,10 +14282,12 @@
   <code><a href=#document>Document</a></code> objects can also have this flag set; it's
   propagated to the <code><a href=#document>Document</a></code> when the script runs.</p>
 
-  <p>The fifth and sixth pieces of state are <dfn id="the-script-block's-type"><var>the script
-  block's type</var></dfn> and <dfn id="the-script-block's-character-encoding"><var>the script block's character
-  encoding</var></dfn>. They are determined when the script is run,
-  based on the attributes on the element at that time.</p>
+  <p>The last few pieces of state are <dfn id="the-script-block's-type"><var>the script block's
+  type</var></dfn>, <dfn id="the-script-block's-character-encoding"><var>the script block's character
+  encoding</var></dfn>, and <dfn id="the-script-block's-fallback-character-encoding"><var>the script block's fallback
+  character encoding</var></dfn>. They are determined when the script
+  is run, based on the attributes on the element at that time, and the
+  <code><a href=#document>Document</a></code> of the <code><a href=#script>script</a></code> element.</p>
 
   <p>When a <code><a href=#script>script</a></code> element that is not marked as being
   <a href=#parser-inserted>"parser-inserted"</a> experiences one of the events listed
@@ -14443,10 +14445,13 @@
     <var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var> for this
     <code><a href=#script>script</a></code> element be the encoding given by the <code title=attr-script-charset><a href=#attr-script-charset>charset</a></code> attribute.</p>
 
-    <p>Otherwise, let <var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var>
-    for this <code><a href=#script>script</a></code> element be the same as <a href="#document's-character-encoding" title="document's character encoding">the encoding of the document
-    itself</a>.</p>
+    <p>Otherwise, let <var><a href="#the-script-block's-fallback-character-encoding">the script block's fallback character
+    encoding</a></var> for this <code><a href=#script>script</a></code> element be the same as
+    <a href="#document's-character-encoding" title="document's character encoding">the encoding of the
+    document itself</a>.</p>
 
+    <p class=note>Only one of these two pieces of state is set.</p>
+
    </li>
 
    <li id=script-processing-src-prepare>
@@ -14472,13 +14477,6 @@
     user agent must act as if it had received an empty HTTP 400
     response.</p>
 
-    <p>Once the resource's <a href=#content-type title=Content-Type>Content Type
-    metadata</a> is available, if it ever is, apply the
-    <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding from a
-    Content-Type</a> to it. If this returns an encoding, and the
-    user agent supports that encoding, then let <var><a href="#the-script-block's-character-encoding">the script
-    block's character encoding</a></var> be that encoding.</p>
-
     <p>For performance reasons, user agents may start fetching the
     script as soon as the attribute is set, instead, in the hope that
     the element will be inserted into the document. Either way, once
@@ -14625,44 +14623,64 @@
         <p>The contents of that file, interpreted as a string of
         Unicode characters, are the script source.</p>
 
-        <p>For each of the rows in the following table, starting with
-        the first one and going down, if the file has as many or more
-        bytes available than the number of bytes in the first column,
-        and the first bytes of the file match the bytes given in the
-        first column, then set <var><a href="#the-script-block's-character-encoding">the script block's character
-        encoding</a></var> to the encoding given in the cell in the second
-        column of that row, irrespective of any previous value:</p>
+        <p>To obtain the string of Unicode characters, the user agent
+        must run the following steps:</p>
 
-        <!-- this table is present in several forms in this file; keep them in sync -->
-        <table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
-           <th>Encoding
-         <tbody><!-- nobody uses this
-          <tr>
-           <td>00 00 FE FF
-           <td>UTF-32BE
-          <tr>
-           <td>FF FE 00 00
-           <td>UTF-32LE
---><tr><td>FE FF
-           <td>Big-endian UTF-16
-          <tr><td>FF FE
-           <td>Little-endian UTF-16
-          <tr><td>EF BB BF
-           <td>UTF-8
-<!-- nobody uses this
-          <tr>
-           <td>DD 73 66 73
-           <td>UTF-EBCDIC
--->
-        </table><p class=note>This step looks for Unicode Byte Order Marks
-        (BOMs).</p>
+        <ol><li><p>If the resource's <a href=#content-type title=Content-Type>Content
+         Type metadata</a>, if any, specifies a character encoding,
+         and the user agent supports that encoding, then let <var title="">character encoding</var> be that encoding, and jump
+         to the bottom step in this series of steps.</li>
 
-        <p>The file must then be converted to Unicode using the
-        character encoding given by <var><a href="#the-script-block's-character-encoding">the script block's character
-        encoding</a></var>.</p>
+         <li><p>If the algorithm above set <var><a href="#the-script-block's-character-encoding">the script block's
+         character encoding</a></var>, then let <var title="">character
+         encoding</var> be that encoding, and jump to the bottom step
+         in this series of steps.</li>
 
-       </dd>
+         <li><p>For each of the rows in the following table, starting
+         with the first one and going down, if the file has as many or
+         more bytes available than the number of bytes in the first
+         column, and the first bytes of the file match the bytes given
+         in the first column, then set <var title="">character
+         encoding</var> to the encoding given in the cell in the
+         second column of that row, and jump to the bottom step in
+         this series of steps:</p>
 
+          <!-- this table is present in several forms in this file; keep them in sync -->
+          <table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
+             <th>Encoding
+           <tbody><!-- nobody uses this
+            <tr>
+             <td>00 00 FE FF
+             <td>UTF-32BE
+            <tr>
+             <td>FF FE 00 00
+             <td>UTF-32LE
+  --><tr><td>FE FF
+             <td>Big-endian UTF-16
+            <tr><td>FF FE
+             <td>Little-endian UTF-16
+            <tr><td>EF BB BF
+             <td>UTF-8
+  <!-- nobody uses this
+            <tr>
+             <td>DD 73 66 73
+             <td>UTF-EBCDIC
+  -->
+          </table><p class=note>This step looks for Unicode Byte Order Marks
+          (BOMs).</p>
+
+         </li>
+
+         <li><p>Let <var title="">character encoding</var> be <var><a href="#the-script-block's-fallback-character-encoding">the
+         script block's fallback character encoding</a></var>.</li>
+
+         <li><p>Convert the file to Unicode using <var>character
+         encoding</var>, following the rules for doing so given by the
+         specification for <var><a href="#the-script-block's-type">the script block's
+         type</a></var>.</li>
+
+        </ol></dd>
+
        <dt>If the script is from an external file and <var><a href="#the-script-block's-type">the script block's type</a></var> is an XML-based language</dt>
 
        <dd>

Modified: source
===================================================================
--- source	2010-09-29 00:07:50 UTC (rev 5544)
+++ source	2010-09-29 01:04:32 UTC (rev 5545)
@@ -15134,10 +15134,12 @@
   <code>Document</code> objects can also have this flag set; it's
   propagated to the <code>Document</code> when the script runs.</p>
 
-  <p>The fifth and sixth pieces of state are <dfn><var>the script
-  block's type</var></dfn> and <dfn><var>the script block's character
-  encoding</var></dfn>. They are determined when the script is run,
-  based on the attributes on the element at that time.</p>
+  <p>The last few pieces of state are <dfn><var>the script block's
+  type</var></dfn>, <dfn><var>the script block's character
+  encoding</var></dfn>, and <dfn><var>the script block's fallback
+  character encoding</var></dfn>. They are determined when the script
+  is run, based on the attributes on the element at that time, and the
+  <code>Document</code> of the <code>script</code> element.</p>
 
   <p>When a <code>script</code> element that is not marked as being
   <span>"parser-inserted"</span> experiences one of the events listed
@@ -15332,11 +15334,13 @@
     <code>script</code> element be the encoding given by the <code
     title="attr-script-charset">charset</code> attribute.</p>
 
-    <p>Otherwise, let <var>the script block's character encoding</var>
-    for this <code>script</code> element be the same as <span
-    title="document's character encoding">the encoding of the document
-    itself</span>.</p>
+    <p>Otherwise, let <var>the script block's fallback character
+    encoding</var> for this <code>script</code> element be the same as
+    <span title="document's character encoding">the encoding of the
+    document itself</span>.</p>
 
+    <p class="note">Only one of these two pieces of state is set.</p>
+
    </li>
 
    <li id="script-processing-src-prepare">
@@ -15363,13 +15367,6 @@
     user agent must act as if it had received an empty HTTP 400
     response.</p>
 
-    <p>Once the resource's <span title="Content-Type">Content Type
-    metadata</span> is available, if it ever is, apply the
-    <span>algorithm for extracting an encoding from a
-    Content-Type</span> to it. If this returns an encoding, and the
-    user agent supports that encoding, then let <var>the script
-    block's character encoding</var> be that encoding.</p>
-
     <p>For performance reasons, user agents may start fetching the
     script as soon as the attribute is set, instead, in the hope that
     the element will be inserted into the document. Either way, once
@@ -15536,52 +15533,77 @@
         <p>The contents of that file, interpreted as a string of
         Unicode characters, are the script source.</p>
 
-        <p>For each of the rows in the following table, starting with
-        the first one and going down, if the file has as many or more
-        bytes available than the number of bytes in the first column,
-        and the first bytes of the file match the bytes given in the
-        first column, then set <var>the script block's character
-        encoding</var> to the encoding given in the cell in the second
-        column of that row, irrespective of any previous value:</p>
+        <p>To obtain the string of Unicode characters, the user agent
+        must run the following steps:</p>
 
-        <!-- this table is present in several forms in this file; keep them in sync -->
-        <table id="table-script-bom">
-         <thead>
-          <tr>
-           <th>Bytes in Hexadecimal
-           <th>Encoding
-         <tbody>
-<!-- nobody uses this
-          <tr>
-           <td>00 00 FE FF
-           <td>UTF-32BE
-          <tr>
-           <td>FF FE 00 00
-           <td>UTF-32LE
--->
-          <tr>
-           <td>FE FF
-           <td>Big-endian UTF-16
-          <tr>
-           <td>FF FE
-           <td>Little-endian UTF-16
-          <tr>
-           <td>EF BB BF
-           <td>UTF-8
-<!-- nobody uses this
-          <tr>
-           <td>DD 73 66 73
-           <td>UTF-EBCDIC
--->
-        </table>
+        <ol>
 
-        <p class="note">This step looks for Unicode Byte Order Marks
-        (BOMs).</p>
+         <li><p>If the resource's <span title="Content-Type">Content
+         Type metadata</span>, if any, specifies a character encoding,
+         and the user agent supports that encoding, then let <var
+         title="">character encoding</var> be that encoding, and jump
+         to the bottom step in this series of steps.</p></li>
 
-        <p>The file must then be converted to Unicode using the
-        character encoding given by <var>the script block's character
-        encoding</var>.</p>
+         <li><p>If the algorithm above set <var>the script block's
+         character encoding</var>, then let <var title="">character
+         encoding</var> be that encoding, and jump to the bottom step
+         in this series of steps.</p></li>
 
+         <li><p>For each of the rows in the following table, starting
+         with the first one and going down, if the file has as many or
+         more bytes available than the number of bytes in the first
+         column, and the first bytes of the file match the bytes given
+         in the first column, then set <var title="">character
+         encoding</var> to the encoding given in the cell in the
+         second column of that row, and jump to the bottom step in
+         this series of steps:</p>
+
+          <!-- this table is present in several forms in this file; keep them in sync -->
+          <table id="table-script-bom">
+           <thead>
+            <tr>
+             <th>Bytes in Hexadecimal
+             <th>Encoding
+           <tbody>
+  <!-- nobody uses this
+            <tr>
+             <td>00 00 FE FF
+             <td>UTF-32BE
+            <tr>
+             <td>FF FE 00 00
+             <td>UTF-32LE
+  -->
+            <tr>
+             <td>FE FF
+             <td>Big-endian UTF-16
+            <tr>
+             <td>FF FE
+             <td>Little-endian UTF-16
+            <tr>
+             <td>EF BB BF
+             <td>UTF-8
+  <!-- nobody uses this
+            <tr>
+             <td>DD 73 66 73
+             <td>UTF-EBCDIC
+  -->
+          </table>
+
+          <p class="note">This step looks for Unicode Byte Order Marks
+          (BOMs).</p>
+
+         </li>
+
+         <li><p>Let <var title="">character encoding</var> be <var>the
+         script block's fallback character encoding</var>.</p></li>
+
+         <li><p>Convert the file to Unicode using <var>character
+         encoding</var>, following the rules for doing so given by the
+         specification for <var>the script block's
+         type</var>.</p></li>
+
+        </ol>
+
        </dd>
 
        <dt>If the script is from an external file and <var>the script block's type</var> is an XML-based language</dt>
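
Illustration (not part of the commit): the new precedence order for external, non-XML scripts can be sketched in Python as below. This is a rough reading of the five steps added in r5545, not the specification's own algorithm text. The function and parameter names (choose_script_encoding, content_type_charset, charset_attr_encoding, fallback_encoding) are hypothetical, Python's codec registry stands in for "the user agent supports that encoding", and the final decode uses Python's replacement-character error handling as an assumption, since the diff defers conversion rules to the specification for the script block's type.

```python
import codecs

# Rows of the BOM table from the diff, in the order given. The UTF-32 and
# UTF-EBCDIC rows are commented out in the spec source, so they are omitted.
BOM_TABLE = [
    (b"\xFE\xFF", "utf-16-be"),    # Big-endian UTF-16
    (b"\xFF\xFE", "utf-16-le"),    # Little-endian UTF-16
    (b"\xEF\xBB\xBF", "utf-8"),    # UTF-8
]


def is_supported(label):
    """Assumption: 'supported' means Python's codec registry knows the label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def choose_script_encoding(data, content_type_charset,
                           charset_attr_encoding, fallback_encoding):
    """Pick the encoding for an external script, following steps 1 to 4.

    Per the note in the diff, only one of charset_attr_encoding and
    fallback_encoding is expected to be set; the other is None.
    """
    # Step 1: a supported encoding from the Content-Type metadata wins.
    if content_type_charset is not None and is_supported(content_type_charset):
        return content_type_charset
    # Step 2: otherwise use the encoding set from the charset attribute.
    if charset_attr_encoding is not None:
        return charset_attr_encoding
    # Step 3: otherwise look for a byte order mark, first matching row wins.
    for bom, encoding in BOM_TABLE:
        if len(data) >= len(bom) and data.startswith(bom):
            return encoding
    # Step 4: otherwise fall back to the document's encoding.
    return fallback_encoding


def decode_script(data, content_type_charset,
                  charset_attr_encoding, fallback_encoding):
    """Step 5, approximated: decode with the chosen encoding."""
    encoding = choose_script_encoding(
        data, content_type_charset, charset_attr_encoding, fallback_encoding)
    # Assumption: malformed bytes become U+FFFD; the real rules come from the
    # specification for the script block's type.
    return data.decode(encoding, errors="replace")
```

The visible effect of the change is that a byte order mark no longer overrides everything else: before r5545 the BOM check ran "irrespective of any previous value", whereas in this ordering it is consulted only when neither the Content-Type metadata nor the charset attribute supplied an encoding.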



