[whatwg] Parsing processing instructions in HTML syntax: Bogus comment state

Ian Hickson ian at hixie.ch
Tue Mar 2 02:54:56 PST 2010

On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
> The handling of processing instructions in the XHTML syntax seems
> reasonably well-defined; but it feels a little off in the HTML syntax.

There's no such thing as processing instructions in text/html.

There was such a thing in HTML4, because of its SGML heritage, though it 
was explicitly deprecated.

> Briefly it seems that <? causes the parser to go into Bogus comment 
> state, which is fair enough. (I wouldn't really recommend that anyone 
> use processing instructions in HTML syntax anyway.) However the parser 
> comes out of that state at the first >. Because processing instructions 
> can contain > and terminate only at the two character sequence ?> this 
> could cause PI processing to terminate early and leave a lot more error 
> handling and a confused parser state in the text yet to come.

In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the 
syntax of PIs when the SGML options used by HTML4 are applied.

In any case, the parser in HTML5 is based on what browsers do, which is 
also to terminate at the first >. It's unlikely that we can change that, 
given backwards-compatibility needs.
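
To illustrate the behaviour described above, here is a minimal sketch in Python. It is not the HTML5 tokenizer, only an approximation of the one rule under discussion; the function name split_bogus_comment is hypothetical. It assumes that input starting with "<?" becomes a comment whose data begins at the "?" and stops just before the first ">", or at end of input if there is no ">".

```python
from typing import Tuple


def split_bogus_comment(markup: str) -> Tuple[str, str]:
    """Approximate the bogus-comment handling of '<?...' in text/html.

    Returns (comment_data, remaining_text). The comment ends at the
    FIRST '>', not at '?>', so anything after that '>' is left over
    for the parser to treat as ordinary content.
    """
    if not markup.startswith("<?"):
        raise ValueError("expected input starting with '<?'")

    end = markup.find(">")
    if end == -1:
        # No '>' anywhere: the comment runs to the end of the input.
        return markup[1:], ""

    # Comment data starts at the '?' and excludes the terminating '>'.
    return markup[1:end], markup[end + 1:]


if __name__ == "__main__":
    comment, rest = split_bogus_comment("<?target a > b ?>rest")
    print(repr(comment))  # data of the bogus comment
    print(repr(rest))     # text left over after the first '>'
```

With an XML-style PI such as "<?target a > b ?>", the comment stops after "a " and the remainder, including the "?>", is left in the document as text. That is the early termination the original question describes.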

There's a simple workaround: don't use PIs in text/html, since they don't 
exist in HTML5 at all, and don't send XML as text/html, since XML and HTML 
have different syntaxes and aren't compatible.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
