[whatwg] Valid Unicode
Sam Ruby
rubys at intertwingly.net
Sat Dec 2 17:47:15 PST 2006
On 12/2/06, Henri Sivonen <hsivonen at iki.fi> wrote:
> On Dec 2, 2006, at 18:24, Sam Ruby wrote:
>
> > It would not be wise for HTML5 to limit itself to the more constrained
> > character set of XML. In particular, the form feed character is
> > pretty popular,
BTW, I copied and pasted the wrong table. The characters I mentioned
are the discouraged ones (which include such things as Microsoft smart
quotes mislabeled as iso-8859-1). The actual allowed set in XML 1.0 is
as follows:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
For XML 1.1 the list is as follows:
[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
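To make the two productions concrete, here is a minimal Python sketch. The function names are hypothetical and chosen only for illustration. It tests code points against each allowed set and flags the C1 control ranges ([#x7F-#x84] | [#x86-#x9F]) that XML 1.0 discourages. It also shows how mislabeled smart quotes land in that range: the Windows-1252 bytes 0x93 and 0x94 decode as U+0093 and U+0094 when read as iso-8859-1.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF]
    | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_discouraged_c1(cp: int) -> bool:
    """C1 control ranges that XML 1.0 discourages: [#x7F-#x84] | [#x86-#x9F]."""
    return 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F


def invalid_chars(text: str, xml11: bool = False) -> list[str]:
    """Return the code points in text that fall outside the chosen Char set."""
    allowed = is_xml11_char if xml11 else is_xml10_char
    return [f"U+{ord(ch):04X}" for ch in text if not allowed(ord(ch))]


# A form feed is outside the XML 1.0 set but inside the XML 1.1 set.
doc = "page one\fpage two"
print(invalid_chars(doc))              # ['U+000C']
print(invalid_chars(doc, xml11=True))  # []

# Windows-1252 "smart quotes" mislabeled as iso-8859-1 decode to C1 controls.
raw = b"\x93quoted\x94"
as_latin1 = raw.decode("iso-8859-1")
print(any(is_discouraged_c1(ord(ch)) for ch in as_latin1))  # True
print(raw.decode("cp1252") == "\u201cquoted\u201d")         # True
```

One caveat: XML 1.1 classifies most C0 controls, form feed included, as RestrictedChar. Such characters may appear only as character references like `&#xC;`, not as literal characters.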
> > This is yet another case where "take HTML5, read it into a DOM, and
> > serialize it as XML, and voilà: you have valid XHTML" doesn't work.
>
> What I am advocating is making sure that *conforming* HTML5 documents
> can be serialized as XHTML5 without dataloss.
Then you will also need to disallow newlines in attribute values.
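The newline point follows from XML's attribute-value normalization: a conforming parser replaces a literal line break inside an attribute value with a space. The minimal sketch below uses Python's standard-library ElementTree, chosen here only for illustration. It contrasts a literal newline with one written as the character reference `&#10;`.

```python
import xml.etree.ElementTree as ET

# A literal newline in an attribute value is normalized to a space on parse.
literal = ET.fromstring('<p title="line one\nline two"/>')
print(repr(literal.get("title")))  # 'line one line two'

# The same newline written as a character reference survives parsing.
escaped = ET.fromstring('<p title="line one&#10;line two"/>')
print(repr(escaped.get("title")))  # 'line one\nline two'
```

A newline in an attribute therefore survives the trip through XML only if it is written as a character reference. A serializer that emits it literally silently turns it into a space.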
In any case, I understand the desire; my read is that the WG's desire
for backwards compatibility is stronger. Limiting the character set to
the set of characters allowed by XML 1.1 should not be a problem for
backwards compatibility purposes.
- Sam Ruby