[whatwg] Null characters
Boris Zbarsky
bzbarsky at MIT.EDU
Tue Oct 9 07:21:40 PDT 2012
On 10/9/12 12:09 AM, Cameron Zemek wrote:
> How is it not web-compatible?
Because shipping it "breaks" sites. That is, it makes them render
differently than they do in current browsers, badly enough that it's
a problem.
> Yeah I don't have any numbers to see if this is the case or not.
As Anne said, we tried shipping this and got user feedback indicating
that enough sites were broken for it to be unacceptable to us.
> But just thinking about it logically what issues would there be in showing Null character as
> the replacement character instead? Visually would see some extra
> characters if the document author had Null characters. What is the big
> deal with doing that?
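It makes text unreadable. Consider text that's actually UTF-16 but is
declared as ISO-8859-1. In UTF-16, every ASCII-range character is stored
as two bytes, one of which is zero, so reading those bytes as ISO-8859-1
puts a null next to every real character. If you strip the nulls, it all
works out. But if you don't, every other character is a replacement
character. This is not a rare situation on the web, unfortunately.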
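A minimal Python sketch of the effect described above, assuming ASCII-only text encoded as little-endian UTF-16 (big-endian would put each null before its character instead of after it). The string "Hello" is a hypothetical stand-in for page content. The code only contrasts the two rendering choices; it is not how any particular browser implements them.

```python
# Hypothetical page text; any ASCII-only string shows the same effect.
original = "Hello"

# UTF-16LE stores each ASCII character as its byte followed by 0x00.
raw = original.encode("utf-16-le")  # b'H\x00e\x00l\x00l\x00o\x00'

# Misread the bytes as ISO-8859-1: every byte becomes one character,
# so each zero byte turns into U+0000.
misdecoded = raw.decode("iso-8859-1")

# Choice 1: drop the nulls. The original text comes back intact.
stripped = misdecoded.replace("\x00", "")

# Choice 2: show each null as U+FFFD REPLACEMENT CHARACTER.
replaced = misdecoded.replace("\x00", "\ufffd")

print(stripped)  # Hello
print(replaced.count("\ufffd"), "of", len(replaced), "characters are U+FFFD")
# 5 of 10 characters are U+FFFD
```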
> Why do authors even have null characters in
> their HTML documents?
Because they have UTF-16 text in their database that they dump into an
ISO-8859-1 document. They have no idea there are any "null characters"
involved.
> I assume I'm probably missing some historical reason for this
Yes, that reason is "the browsers all do it this way, so web sites
depend on it".
-Boris