[whatwg] Valid Unicode

Henri Sivonen hsivonen at iki.fi
Sun Dec 3 01:40:45 PST 2006

On Dec 3, 2006, at 03:47, Sam Ruby wrote:

>> What I am advocating is making sure that *conforming* HTML5 documents
>> can be serialized as XHTML5 without dataloss.
> Then you will also need to disallow newlines in attribute values.

I believe that is not the case. See the last line of the table at the  
end of section 3.3.3 in the XML 1.0 spec.
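
To illustrate the point about section 3.3.3, here is a minimal sketch. It assumes Python's standard xml.etree.ElementTree module, which is backed by Expat. The helper name escape_attr is hypothetical and not part of any library. A literal newline in an attribute value is normalized to a space, but the same newline written as a character reference comes back intact:

```python
import xml.etree.ElementTree as ET


def escape_attr(value):
    """Hypothetical helper: escape an attribute value so that tab, LF and CR
    are written as character references and survive XML parsing."""
    out = value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    for ch in "\t\n\r":
        out = out.replace(ch, "&#x%X;" % ord(ch))
    return out


original = "line one\nline two"

# Literal newline in the attribute: the parser normalizes it to a space.
literal = ET.fromstring('<p title="%s"/>' % original).get("title")

# Newline written as &#xA;: the parser keeps it as a newline.
escaped = ET.fromstring('<p title="%s"/>' % escape_attr(original)).get("title")
```

So a serializer only needs to emit character references for whitespace in attribute values; the newlines themselves need not be disallowed.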

(Note that if some of this doesn't currently work in Gecko, Gecko has
a bug. Expat does the XML-compliant thing, but then nsExpatDriver runs
whitespace normalization again, which is bogus:
https://bugzilla.mozilla.org/show_bug.cgi?id=343870
It doesn't make sense to fix it until bug 18333 has landed.)

> In any case, I understand the desire; my read is that the WG's desire
> for backwards compatibility is higher.  Limiting the character set to
> the allowable XML 1.1 character set should not be a problem for
> backwards compatibility purposes.

XML 1.1 doesn't really solve anything in this area. XML 1.1 is part  
of the problem. It creates incompatibility in corner cases without  
compelling benefits. The real XML that is known to work with any "XML  
tool chain" is XML 1.0.
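
For reference, the XML 1.0 character set in question is the Char production of the XML 1.0 spec. The following is a rough sketch of a check against it; the function names are made up for illustration. It shows that a control character such as U+000C (form feed) falls outside what XML 1.0 can carry:

```python
def is_xml10_char(ch):
    """Return True if ch matches the Char production of XML 1.0:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_non_xml10_char(text):
    """Return the index of the first character that XML 1.0 cannot
    represent, or -1 if the whole string is representable."""
    for i, ch in enumerate(text):
        if not is_xml10_char(ch):
            return i
    return -1
```

A string that passes this check can be serialized as XML 1.0 text, given the usual escaping of markup characters.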

I should point out that HTML5 proclaims non-conforming some things
that no doubt exist on the Web and are far more common than form
feeds. You can't even achieve any useful effect by including a form
feed in HTML.

Henri Sivonen
hsivonen at iki.fi
