[whatwg] Parsing processing instructions in HTML syntax: Bogus comment state

Philip Taylor excors+whatwg at gmail.com
Wed Mar 3 03:06:52 PST 2010

On Wed, Mar 3, 2010 at 10:55 AM, Brett Zamir <brettz9 at yahoo.com> wrote:
> On 3/2/2010 6:54 PM, Ian Hickson wrote:
>> On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
>>> Briefly it seems that <? causes the parser to go into Bogus comment
>>> state, which is fair enough. (I wouldn't really recommend that anyone
>>> use processing instructions in HTML syntax anyway.) However the parser
>>> comes out of that state at the first >. Because processing instructions
>>> can contain > and terminate only at the two-character sequence ?>, this
>>> could cause PI processing to terminate early and leave a lot more error
>>> handling and a confused parser state in the text yet to come.
>> In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the
>> syntax of PIs when the SGML options used by HTML4 are applied.
>> In any case, the parser in HTML5 is based on what browsers do, which is
>> also to terminate at the first >. It's unlikely that we can change that,
>> given backwards-compatibility needs.
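As a rough illustration of the behaviour Ian describes, here is a simplified Python sketch of the HTML5 bogus comment state: after `<?`, characters are consumed up to the first `>` (or the end of input) and become a comment token, with the `?` itself included in the comment data. The function name `bogus_comment` and the PHP-style sample input are made up for illustration, and the sketch leaves out details of the real tokenizer such as NULL-character handling. The sample also shows Elliotte's concern: a `>` inside the PI ends the comment early and the remainder leaks out as text.

```python
def bogus_comment(html: str, pos: int) -> tuple[str, int]:
    """Simplified sketch of the HTML5 bogus comment state (hypothetical helper).

    `pos` is the index of the character that caused the switch into the
    state (here the '?' right after '<'). Characters are consumed up to the
    first '>' or the end of input; everything before that '>' becomes the
    comment data. Returns (comment data, index where tokenizing resumes).
    """
    end = html.find(">", pos)
    if end == -1:
        # End of input: the comment simply runs to the end of the document.
        return html[pos:], len(html)
    # Tokenizing resumes in the data state just after the '>'.
    return html[pos:end], end + 1


# Made-up input: a PHP-style PI whose data contains a '>'.
doc = "<?php echo $a > $b ?><p>after</p>"
data, resume = bogus_comment(doc, 1)  # index 1 is the '?' after '<'
print(repr(data))          # '?php echo $a '
print(repr(doc[resume:]))  # ' $b ?><p>after</p>'
```

Here the comment ends at the `>` inside the PI, so ` $b ?>` is tokenized as ordinary text before the `<p>` element.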
> Are there really a lot of folks out there depending on old HTML4-style
> processing instructions not being broken?

Yes, e.g. a load of pages like
http://www.forex.com.cn/html/2008-01/821561.htm (to pick one example
at random) say:

  <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />

and don't have the string "?>" anywhere.
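To make the compatibility argument concrete, the sketch below compares the two termination rules on the Office-generated line quoted above, followed by some ordinary markup that is invented for illustration. The helper names `end_at_gt` and `end_at_qgt` are hypothetical; each returns the index where parsing would resume, treating a missing terminator as running to the end of input. Ending at the first `>` recovers the following markup intact, while waiting for `?>` consumes the whole rest of the page, because no `?>` ever appears.

```python
# The Office-generated line quoted above, followed by made-up page content.
page = ('<?xml:namespace prefix = o ns = '
        '"urn:schemas-microsoft-com:office:office" />'
        '<p>Hello</p><p>World</p>')


def end_at_gt(html: str, pos: int) -> int:
    """Resume index if the PI ends at the first '>' (HTML4 SGML-style PI
    close, and what HTML5's bogus comment state does)."""
    end = html.find(">", pos)
    return len(html) if end == -1 else end + 1


def end_at_qgt(html: str, pos: int) -> int:
    """Resume index if the PI ends only at '?>' (XML style)."""
    end = html.find("?>", pos)
    return len(html) if end == -1 else end + 2


print(repr(page[end_at_gt(page, 1):]))   # '<p>Hello</p><p>World</p>'
print(repr(page[end_at_qgt(page, 1):]))  # '' (rest of the page swallowed)
```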

Philip Taylor
excors at gmail.com
