[whatwg] Please always use utf-8 for Web Workers

Ian Hickson ian at hixie.ch
Wed Oct 14 03:55:28 PDT 2009


On Fri, 25 Sep 2009, Simon Pieters wrote:
>
> Workers are new and seem very likely to be incompatible with existing 
> scripts. So they are not subject to legacy content with legacy encodings. 
> Therefore, we should be able to always use utf-8 for workers. Always 
> using utf-8 is simpler to implement and test and encourages people to 
> switch to utf-8 elsewhere.

On Fri, 25 Sep 2009, Jonathan Cook wrote:
>
> The importScripts portion of the Web Workers API is compatible with 
> existing scripts, but I'm all for more UTF-8 :)  If the restriction is 
> added to the spec, I'd want to know that a very clear error was going to 
> be thrown explaining the problem.

On Fri, 25 Sep 2009, Simon Pieters wrote:
> 
> I'm not sure that throwing an error is a good idea. Would you throw an 
> error when there's no declared encoding? That seems to be annoying for 
> the common case of just using ASCII characters. Throwing an error when 
> there is a declared encoding that is not utf-8 might work, but are there 
> many scripts that have a declared encoding and are not utf-8?
> 
> I think it is better to just ignore any declared encoding and assume 
> utf-8. If people are using non-ascii in another encoding, then they 
> would notice by seeing that their text looks like garbage. Browsers 
> could also log messages to their error consoles about encoding 
> declarations declaring non-utf-8 and/or sequences of bytes that are not 
> valid utf-8.

On Fri, 25 Sep 2009, Drew Wilson wrote:
>
> Are you saying that if I load a script via a <script> tag in a web page, 
> then load it via importScripts() in a worker, that the result of loading 
> that script in those two cases should/could be different because of 
> different decoding mechanisms?
>
> If that's what's being proposed, that seems bad.

On Fri, 25 Sep 2009, Anne van Kesteren wrote:
>
> That could happen already if the script loaded via <script> did not have 
> an encoding set and got it from <script charset>.

On Fri, 25 Sep 2009, Drew Wilson wrote:
>
> Certainly. If I explicitly override the charset, then that seems like 
> reasonable behavior. Having the default decoding vary between 
> importScripts() and <script> seems bad, especially since you can't 
> override charsets with importScripts().

On Fri, 25 Sep 2009, Anne van Kesteren wrote:
> 
> It does not need to be overridden per se. If the document character 
> encoding is different from UTF-8 then a script loaded through <script> 
> will be decoded differently from a script loaded through importScripts() 
> as well.

On Mon, 28 Sep 2009, Michael Nordman wrote:
>
> Leaving legacy encodings behind would be a good thing if we can get away 
> with it... jmho.

Ok, I've made workers assume UTF-8 always.
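
As a rough illustration of the behavior discussed in this thread (ignore 
any declared encoding, always decode as UTF-8, and log a message rather 
than throw), here is a minimal TypeScript sketch. The function name 
decodeWorkerScript, its parameters, its return shape, and the warning 
strings are all hypothetical and are not taken from the spec; only 
TextDecoder is a real API. It is not meant to describe how any browser 
actually implements this.

```typescript
// Hypothetical result type for illustration only.
interface DecodedScript {
  text: string;
  warnings: string[];
}

// Hypothetical helper: decodes worker script bytes as UTF-8 regardless of
// any declared charset, collecting console-style warnings instead of throwing.
function decodeWorkerScript(bytes: Uint8Array, declaredCharset?: string): DecodedScript {
  const warnings: string[] = [];

  // The declared encoding never affects decoding; it is only reported.
  if (declaredCharset !== undefined && !/^utf-?8$/i.test(declaredCharset.trim())) {
    warnings.push(`ignoring declared encoding "${declaredCharset}"; decoding as UTF-8`);
  }

  // Strict pass, used only to detect byte sequences that are not valid UTF-8.
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    warnings.push("script contains byte sequences that are not valid UTF-8");
  }

  // Lenient pass: invalid sequences are replaced with U+FFFD.
  const text = new TextDecoder("utf-8").decode(bytes);
  return { text, warnings };
}
```

With this sketch, a pure-ASCII script with no declared encoding decodes 
cleanly and produces no warnings, which covers the common case Simon 
mentions. A script saved as ISO-8859-1 that contains non-ASCII bytes 
comes out with U+FFFD replacement characters and two warnings, so the 
author sees garbage text plus a hint about why.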

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

