[whatwg] Parsing processing instructions in HTML syntax: Bogus comment state

Brett Zamir brettz9 at yahoo.com
Wed Mar 3 02:55:46 PST 2010

On 3/2/2010 6:54 PM, Ian Hickson wrote:
> On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
>> Briefly it seems that <? causes the parser to go into Bogus comment
>> state, which is fair enough. (I wouldn't really recommend that anyone
>> use processing instructions in HTML syntax anyway.) However the parser
>> comes out of that state at the first >. Because processing instructions
>> can contain > and terminate only at the two character sequence ?> this
>> could cause PI processing to terminate early and leave a lot more error
>> handling and a confused parser state in the text yet to come.
> In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the
> syntax of PIs when the SGML options used by HTML4 are applied.
> In any case, the parser in HTML5 is based on what browsers do, which is
> also to terminate at the first >. It's unlikely that we can change that,
> given backwards-compatibility needs.

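To make the difference between the two termination rules concrete, here is
a minimal Python sketch. It is not taken from any browser or from the spec
text; the function names and the sample string are hypothetical, and it
only models where each rule stops consuming input. It assumes the input
begins with "<?" and, for the HTML-syntax case, that the "?" is kept as the
first character of the comment data.

```python
from typing import Tuple

# Hypothetical helper names; neither function comes from a real parser.

def split_html_style(text: str) -> Tuple[str, str]:
    """HTML-syntax rule: '<?' opens a bogus comment that ends at the first '>'."""
    assert text.startswith("<?")
    end = text.find(">", 2)
    if end == -1:
        # No '>' anywhere: the comment runs to the end of the input.
        return text[1:], ""
    # The '?' stays in the comment data; everything after '>' is parsed normally.
    return text[1:end], text[end + 1:]

def split_xml_style(text: str) -> Tuple[str, str]:
    """XML rule: a processing instruction ends only at the two-character '?>'."""
    assert text.startswith("<?")
    end = text.find("?>", 2)
    if end == -1:
        raise ValueError("unterminated processing instruction")
    return text[2:end], text[end + 2:]

sample = "<?demo a > b ?>rest"
print(split_html_style(sample))  # ('?demo a ', ' b ?>rest')
print(split_xml_style(sample))   # ('demo a > b ', 'rest')
```

With the first rule, the tail " b ?>" leaks into the following text, which
is the early termination described in the quoted message. With the second,
the whole instruction is consumed and only "rest" remains.
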
Are there really a lot of folks out there depending on old HTML4-style
processing instructions not being broken? As I understand it, such HTML4
processing instructions were not even used by any standard at that time,
whereas XHTML 1.0+ processing instructions brought the XML form into
practice. Especially with all of the progress made in X/HTML5 on
harmonizing HTML and XHTML, I'd think it would really be ideal if this
issue did not get in the way (along with the unfortunate loss of
external DTDs)...

As long as website creators have the freedom to be sloppy, why not go a
little further to make XML compatibility better? It'd be a whole lot
more appealing to work in both environments out of the box than to deal
with complex server-side conversion solutions...

