[whatwg] Parsing processing instructions in HTML syntax: Bogus comment state

Brett Zamir brettz9 at yahoo.com
Thu Mar 4 18:23:46 PST 2010

On 3/3/2010 7:06 PM, Philip Taylor wrote:
> On Wed, Mar 3, 2010 at 10:55 AM, Brett Zamir <brettz9 at yahoo.com> wrote:
>> On 3/2/2010 6:54 PM, Ian Hickson wrote:
>>> On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
>>>> Briefly it seems that <? causes the parser to go into Bogus comment
>>>> state, which is fair enough. (I wouldn't really recommend that anyone
>>>> use processing instructions in HTML syntax anyway.) However the parser
>>>> comes out of that state at the first >. Because processing instructions
>>>> can contain > and terminate only at the two character sequence ?> this
>>>> could cause PI processing to terminate early and leave a lot more error
>>>> handling and a confused parser state in the text yet to come.
>>> In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the
>>> syntax of PIs when the SGML options used by HTML4 are applied.
>>> In any case, the parser in HTML5 is based on what browsers do, which is
>>> also to terminate at the first >. It's unlikely that we can change that,
>>> given backwards-compatibility needs.
>> Are there really a lot of folks out there depending on old HTML4-style
>> processing instructions not being broken?
> Yes, e.g. a load of pages like
> http://www.forex.com.cn/html/2008-01/821561.htm (to pick one example
> at random) say:
>    <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
> and don't have the string "?>" anywhere.

OK, fair enough. But while it is great that HTML5 seeks to be
transitional and backwards compatible, HTML5 (thankfully) already breaks
compatibility for the sake of XML compatibility (e.g., localName or
getElementsByTagNameNS). It seems to me that there should still be room
for eventually transitioning into something more full-featured in a
fundamental, language-neutral way (e.g., supporting a fuller subset of
XML's features, such as external entities and, yes, XML-style processing
instructions). Such a language would also be extensible, including the
ability to include XML from other namespaces (which may themselves
encourage or rely on their own XML processing instructions) for those
who wish to experiment with or supplement the standard HTML behavior.
And it would be more harmonious and compatible with a simpler syntax
(i.e., XML's), even if the more complex syntax remains more prominent
and continues to be supported indefinitely.
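
For anyone following along, the two termination rules discussed in the
quoted text can be compared with a rough Python sketch. This is only an
illustration under my own assumptions: the function names and the sample
strings are made up for this message, and the code is not taken from the
HTML5 spec or from any browser. It only locates where each rule would
stop consuming input.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# taken from any browser or from the HTML5 spec text.

def html_bogus_comment_end(text, start=0):
    """HTML rule: the construct ends just past the first '>' (or at EOF)."""
    i = text.find(">", start)
    return len(text) if i == -1 else i + 1


def xml_pi_end(text, start=0):
    """XML rule: the PI ends just past the first '?>'; None if unterminated."""
    i = text.find("?>", start)
    return None if i == -1 else i + 2


# A PI whose data contains '>' (sample input made up for this sketch).
sample = "<?target a > b?>tail"

html_end = html_bogus_comment_end(sample)
xml_end = xml_pi_end(sample)

print(repr(sample[:html_end]), "| left over:", repr(sample[html_end:]))
print(repr(sample[:xml_end]), "| left over:", repr(sample[xml_end:]))

# Office-style markup like Philip's example, with no '?>' anywhere.
office = '<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><p>text'
print(xml_pi_end(office))
```

Under the first rule, the remainder " b?>tail" is left behind as ordinary
text, which is the early termination Elliotte describes. Under the
second rule, the Office-style markup never finds a terminator at all,
which is the compatibility problem Philip points out.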

