[whatwg] Default encoding to UTF-8?
Leif Halvard Silli
xn--mlform-iua at xn--mlform-iua.no
Mon Dec 5 20:54:03 PST 2011
Boris Zbarsky Mon Dec 5 19:18:10 PST 2011:
> On 12/5/11 9:55 PM, Leif Halvard Silli wrote:
>> I said I agreed with him that Faruk's solution was not good. However, I
>> would not be against treating <DOCTYPE html> as a 'default to UTF-8'
>> declaration
>
> This might work, if there hasn't been too much cargo-culting yet. Data
> urgently needed!
Yeah, it would be a pity if it had already become a widespread
cargo cult to, all at once, use the HTML5 doctype without using UTF-8
*and* without using any encoding declaration, *and* thus effectively
rely on the locale's default encoding ... Who has a data corpus?
Henri, as the Validator.nu developer?
This change would involve adding one more step to the HTML5 parser's
encoding sniffing algorithm. [1] The question then is when, upon seeing
the HTML5 doctype, the default to UTF-8 ought to kick in, in order to be
useful. It seems it would have to happen after the processing of the
explicit metadata (steps 1 to 5) but before the last three steps, i.e.
steps 6, 7 and 8:
Step 6: 'if the user agent has information on the likely encoding'
Step 7: UA 'may attempt to autodetect the character encoding'
Step 8: 'implementation-defined or user-specified default'
The role of the HTML5 DOCTYPE, encoding-wise, would then be to ensure
that steps 6 to 8 do not happen.
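
To make the proposed ordering concrete, here is a rough sketch in
Python. It is not the spec's algorithm, only an illustration of where
the new step would sit. All names are hypothetical: `explicit` stands
for whatever steps 1 to 5 produced (or None), `likely` and
`autodetected` stand for the results of steps 6 and 7, and the
windows-1252 value for step 8 is just an assumed locale default. The
doctype check is a simplified regex, not a real tokenizer.

```python
import re
from typing import Optional

# Simplified, hypothetical check: the input starts with <!DOCTYPE html>
# (case-insensitive), with no public or system identifier.
HTML5_DOCTYPE = re.compile(rb"^\s*<!doctype\s+html\s*>", re.IGNORECASE)


def has_html5_doctype(head: bytes) -> bool:
    """Return True if the first bytes of the document look like the HTML5 doctype."""
    return HTML5_DOCTYPE.match(head) is not None


def sniff_encoding(
    head: bytes,
    explicit: Optional[str] = None,       # outcome of steps 1 to 5, if any
    likely: Optional[str] = None,         # step 6: UA has info on the likely encoding
    autodetected: Optional[str] = None,   # step 7: UA's autodetection result
    locale_default: str = "windows-1252", # step 8: assumed locale default
) -> str:
    # Steps 1 to 5: explicit metadata always wins.
    if explicit is not None:
        return explicit
    # Proposed new step: the HTML5 doctype acts as a 'default to UTF-8'
    # declaration, so steps 6 to 8 are never reached.
    if has_html5_doctype(head):
        return "utf-8"
    # Steps 6 to 8: unchanged fallbacks for everything else.
    if likely is not None:
        return likely
    if autodetected is not None:
        return autodetected
    return locale_default
```

With this ordering, a page that has the HTML5 doctype and no encoding
declaration gets UTF-8 regardless of locale, while a page with any
other doctype (or none) behaves exactly as it does today.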
[1] http://dev.w3.org/html5/spec/parsing#encoding-sniffing-algorithm
--
Leif H Silli