[whatwg] Default encoding to UTF-8?
bzbarsky at MIT.EDU
Mon Dec 5 19:18:10 PST 2011
On 12/5/11 9:55 PM, Leif Halvard Silli wrote:
> If that is all they tested, then I'd say they did not test enough.
That's normal for the web.
>> (For the record, reading a particular page in a language is a much
>> simpler task than reading the language; I can't "read German", but I can
>> certainly read a German subway map.)
> Or Polish subway map - which doesn't default to said encoding.
Indeed. I don't think anyone thinks the existing situation is all fine
> I said I agreed with him that Faruk's solution was not good. However, I
> would not be against treating <!DOCTYPE html> as a 'default to UTF-8'
This might work, if there hasn't been too much cargo-culting yet. Data
>> Not unless we change the authoring tools. Half the time these things
>> are just directly exported from a word processor.
> Please educate me. I'm perhaps 'handicapped' in that regard: I haven't
> used MS Word on a regular basis since MS Word 5.1 for Mac. Also, if
> "export" means "copy and paste"
It can mean that, or "save as HTML" followed by copy and paste.
> then on the Mac, everything gets
> converted via the clipboard
On Mac, the default OS encoding is UTF-8 last I checked. That's
decidedly not the case on Windows.
>>> OK: Quotation marks. However, in 'old web pages', then you also find
>>> much more use of HTML entities (such as &ldquo;) than you find today.
>>> We should take advantage of that, no?
>> I have no idea what you're trying to say,
> Sorry. What I meant was that character entities are encoding-independent.
> And that lots of people - and authoring tools - have
> inserted non-ASCII letters and characters as character entities,
Sure. And lots have inserted them "directly".
> At any rate: A page which uses
> character entities for non-ascii would render the same regardless of
> encoding, hence a switch to UTF-8 would not matter for those.
Sure. We're not worried about such pages here.
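To make the distinction concrete, here is a minimal sketch (not from the thread) of why entity-only pages are unaffected by a change of default encoding while pages with directly inserted characters are. The sample markup, the function names, and the choice of windows-1252 as the authoring encoding are all assumptions made for illustration; real browsers do considerably more than this.

```python
import html

# Hypothetical sample markup, not taken from any real page.
ENTITY_PAGE = "<p>&ldquo;Z&uuml;rich&rdquo;</p>"   # non-ASCII written as entities
DIRECT_PAGE = "<p>\u201cZ\u00fcrich\u201d</p>"      # non-ASCII inserted "directly"


def render(raw: bytes, encoding: str) -> str:
    """Decode the bytes with the given encoding, then resolve character references."""
    return html.unescape(raw.decode(encoding, errors="replace"))


def renders_same(text: str, authored_as: str = "windows-1252") -> bool:
    """True if the page reads identically under a windows-1252 and a UTF-8 default."""
    raw = text.encode(authored_as)  # the bytes the server would send
    return render(raw, "windows-1252") == render(raw, "utf-8")


if __name__ == "__main__":
    print("entity page unaffected:", renders_same(ENTITY_PAGE))
    print("direct page unaffected:", renders_same(DIRECT_PAGE))
```

The entity page is pure ASCII on the wire, so both decoders agree on it. The direct page contains bytes that are not valid UTF-8, which is the case the thread is actually worried about.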