[whatwg] Default encoding to UTF-8?
Leif Halvard Silli
xn--mlform-iua at xn--mlform-iua.no
Mon Dec 5 20:54:03 PST 2011
Boris Zbarsky Mon Dec 5 19:18:10 PST 2011:
> On 12/5/11 9:55 PM, Leif Halvard Silli wrote:
>> I said I agreed with him that Faruk's solution was not good. However, I
>> would not be against treating <DOCTYPE html> as a 'default to UTF-8'
> This might work, if there hasn't been too much cargo-culting yet. Data
> urgently needed!
Yeah, it would be a pity if it had already become a widespread
cargo-cult to - all at once - use the HTML5 doctype without using UTF-8
*and* without using any encoding declaration, *and* thus effectively
rely on the default locale encoding ... Who has a data corpus?
Henri, as Validator.nu developer?
This change would involve adding one more step to the HTML5 parser's
encoding sniffing algorithm. The question then is at which point, upon
seeing the HTML5 doctype, the default to UTF-8 ought to happen in order
to be useful. It seems it would have to happen after the processing of
the explicit metadata (steps 1 to 5) but before the last three steps,
that is, steps 6, 7 and 8:
Step 6: 'if the user agent has information on the likely encoding'
Step 7: UA 'may attempt to autodetect the character encoding'
Step 8: 'implementation-defined or user-specified default'
The role of the HTML5 DOCTYPE, encoding-wise, would then be to ensure
that steps 6 to 8 do not happen.
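
To make the proposed ordering concrete, here is a rough sketch in Python. It is only an illustration of where the new step would sit, not spec text: the function name, its parameters, the doctype pattern and the windows-1252 fallback are all assumptions of mine, and steps 1 to 5 are collapsed into a single "explicit" input.

```python
import re
from typing import Optional

# Hypothetical check for the HTML5 doctype at the very start of the bytes.
# A real parser would tokenize; this pattern is only for illustration.
HTML5_DOCTYPE = re.compile(rb"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def choose_encoding(head: bytes,
                    explicit: Optional[str] = None,
                    likely: Optional[str] = None,
                    autodetected: Optional[str] = None,
                    locale_default: str = "windows-1252") -> str:
    """Return the encoding to use for a document starting with `head`.

    explicit       -- result of steps 1 to 5 (explicit metadata), if any
    likely         -- step 6: encoding the UA has information about, if any
    autodetected   -- step 7: result of optional autodetection, if any
    locale_default -- step 8: implementation-defined or user-specified
                      default (windows-1252 is just an assumed example)
    """
    # Steps 1 to 5: explicit metadata always wins.
    if explicit is not None:
        return explicit
    # Proposed new step: the HTML5 doctype implies UTF-8,
    # so steps 6 to 8 are never reached for such documents.
    if HTML5_DOCTYPE.match(head):
        return "utf-8"
    # Step 6: information on the likely encoding.
    if likely is not None:
        return likely
    # Step 7: autodetection.
    if autodetected is not None:
        return autodetected
    # Step 8: the locale default.
    return locale_default
```

With this ordering, a page that starts with the HTML5 doctype and carries no declaration gets UTF-8, while a page with any other doctype, or none, still falls through to steps 6 to 8 as today.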
Leif H Silli