[whatwg] Control and Undefined Characters
ian at hixie.ch
Wed Oct 10 16:07:57 PDT 2012
On Thu, 11 Oct 2012, Cameron Zemek wrote:
> The spec states:
> "Any occurrences of any characters in the ranges U+0001 to U+0008,
> U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters
> U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
> U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF,
> U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE,
> U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF,
> U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse
> errors. These are all control characters or permanently undefined
> Unicode characters (noncharacters)."
> Additionally, character references for these code points also
> return these Unicode characters. Therefore, as far as I can tell, these
> characters are passed to the tree construction stage, and I see no
> handling of them in tree construction.
> Elsewhere in the specification it says:
> "Text nodes and attribute values must consist of Unicode characters,
> must not contain U+0000 characters, must not contain permanently
> undefined Unicode characters (noncharacters), and must not contain
> control characters other than space characters."
All these requirements relate to authoring conformance criteria.
User agents are required to treat U+0001 the same as, say, "A".
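
For reference, the code point list in the quoted spec text can be written
as a small predicate. This is only a sketch in Python, and the function
name is made up for illustration. It assumes the list is exactly as quoted
above: the listed control ranges, U+000B, U+FDD0 to U+FDEF, and the last
two code points of every plane. It only says which characters a parser
would flag as parse errors; it does not imply the character is dropped.

```python
def is_parse_error_code_point(cp: int) -> bool:
    """Return True if cp is in the parse-error list quoted above.

    Hypothetical helper for illustration; not part of any spec or library.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")

    # Control characters: U+0001..U+0008, U+000B, U+000E..U+001F,
    # U+007F..U+009F. Tab, LF, FF and CR are deliberately excluded.
    if (0x0001 <= cp <= 0x0008 or cp == 0x000B
            or 0x000E <= cp <= 0x001F or 0x007F <= cp <= 0x009F):
        return True

    # Noncharacters in the BMP block U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True

    # U+FFFE/U+FFFF, U+1FFFE/U+1FFFF, ... U+10FFFE/U+10FFFF:
    # the last two code points of each of the 17 planes.
    return (cp & 0xFFFF) >= 0xFFFE
```

Note that U+0000 is not in the quoted list, so this sketch returns False
for it.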
> And testing in Firefox and Chrome, it appears these characters are
> ignored. But I see no mention anywhere of ignoring them or of how to
> handle them.
Do you have a test case demonstrating this? When I tested it, the
characters did not seem to be ignored:
(This test is testing whether a U+0001 is lost either in the JS parser,
document.write(), the HTML tokeniser, the HTML parser, the DOM API, or the
JS string API, and it seems to get through all of those fine.)
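
A rough analogue of this kind of round-trip check can be run outside a
browser. The sketch below is not the browser test described above: it
uses Python's standard-library html.parser, which is not a conforming
HTML parser, and the class and function names are made up for
illustration. It only shows the shape of the check: put a literal U+0001
into markup, parse it, and see whether the character is still in the
resulting text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text content seen while parsing."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so keep them all.
        self.chunks.append(data)


def text_after_parsing(markup: str) -> str:
    """Parse markup and return the concatenated text content."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# A literal U+0001 between two ordinary letters.
result = text_after_parsing("<p>a\x01b</p>")
survived = "\x01" in result
```

If `survived` is true, the control character made it through this
particular parser unchanged, which is the behaviour the test above
reports for browsers.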
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'