[whatwg] <% text %> and <? text ?> in corporate intranet html content
Boris Zbarsky
bzbarsky at MIT.EDU
Tue Feb 9 21:26:01 PST 2010
On 2/9/10 11:56 PM, Tab Atkins Jr. wrote:
> On Tue, Feb 9, 2010 at 9:05 PM, Biju <bijumaillist at gmail.com> wrote:
>> What should a user agent display when html content is...
>>
>> <html><body>
>> <%@ page language="java" %>
>> </body></html>
>>
>> At present IE and Safari display blank
>>
>> Firefox display <%@ page language="java" %>

As does Opera, and Firefox with the HTML5 parser enabled.

>> But for
>> <html><body>
>> abc<? echo ">" ?> xyz
>> </body></html>
>>
>> Firefox display...
>> abc " ?> xyz

As does Opera, and Firefox with the HTML5 parser enabled.

> Can someone else with more familiarity with the parser algorithm help
> out here?
For the "<%@" case, it looks like the state machine will go through the
following states [1]:

    Data state -> Tag open state

When encountering a '%' in the "Tag open" state, the specification
says [2]:

    Parse error. Emit a U+003C LESS-THAN SIGN character token
    and reconsume the current input character in the data state.

The tokenizer is then back in the data state and stays there until the
next '&' or '<' or EOF is seen, so the entire string up to the </body>,
including the leading '<', will be treated as literal text.
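
A minimal sketch of that path in Python, in case it helps to see it
spelled out. This is a toy model, not the specification's tokenizer: it
covers only the data state and the "anything else" branch of the tag
open state, does not decode '&', and the function name is made up for
illustration.

```python
import string


def data_state_text(html: str) -> str:
    """Return the literal text produced before the first real tag.

    Assumption: only '<' followed by something that is not an ASCII
    letter, '/', '!' or '?' is modelled; anything else ends the sketch.
    """
    out = []
    i = 0
    while i < len(html):
        ch = html[i]
        if ch != "<":
            out.append(ch)  # data state: ordinary character token
            i += 1
            continue
        # Tag open state: look at the character after '<' ("" means EOF).
        nxt = html[i + 1] if i + 1 < len(html) else ""
        if nxt and (nxt in string.ascii_letters or nxt in "/!?"):
            break  # a tag, comment or bogus comment starts here; out of scope
        # "Anything else": parse error, emit '<' as a character token and
        # reconsume the next character in the data state.
        out.append("<")
        i += 1
    return "".join(out)


print(data_state_text('<%@ page language="java" %>\n</body></html>'))
```

Under those assumptions the "<%@ ... %>" line comes back unchanged as
text, and the sketch stops at the '<' of </body>.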
For the "<?" case, the state transitions will be [1],[2]:

    Data state -> Tag open state -> Bogus comment state

Then the specification says to [3]:

    Consume every character up to and including the first U+003E
    GREATER-THAN SIGN character (>) or the end of the file (EOF),
    whichever comes first. Emit a comment token whose data is the
    concatenation of all the characters starting from and including
    the character that caused the state machine to switch into the bogus
    comment state, up to and including the character immediately before
    the last consumed character (i.e. up to the character just before the
    U+003E or EOF character). (If the comment was started by the end of
    the file (EOF), the token is empty.)

    Switch to the data state.

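In other words, stop the bogus comment at the first '>' you see and
then start parsing normally again. In this testcase the first '>' is
the one inside the quoted string, so the comment swallows '? echo "',
and everything after it up to the next '<' or '&' or EOF, that is
'" ?> xyz', is treated as literal text.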
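
Again as a rough Python sketch rather than the real algorithm: the
function below models only "<?" entering the bogus comment state and
treats every other character as text. The function name and the token
tuples are hypothetical, chosen just for this illustration.

```python
def bogus_comment_tokens(html: str):
    """Split input into ("text", s) and ("comment", s) tokens.

    Assumption: only '<?' is recognised; real tags, '<!' and '&' are
    deliberately left as plain text in this toy model.
    """
    tokens = []
    text = []
    i = 0
    n = len(html)

    def flush():
        # Emit any pending run of character tokens as one text token.
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    while i < n:
        if html.startswith("<?", i):
            flush()
            # Bogus comment state: consume up to and including the first
            # '>' (or EOF). The data starts at the '?' that triggered the
            # state and excludes the '>' itself.
            end = html.find(">", i + 1)
            if end == -1:
                end = n
            tokens.append(("comment", html[i + 1:end]))
            i = end + 1  # back to the data state after the '>'
        else:
            text.append(html[i])
            i += 1
    flush()
    return tokens


for token in bogus_comment_tokens('abc<? echo ">" ?> xyz'):
    print(token)
```

For the testcase above this yields a text token for 'abc', a comment
token whose data is '? echo "', and a text token for '" ?> xyz'.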
So the currently-specified behavior in fact matches the observed Firefox
behavior (with either parser) on these simple testcases.
-Boris
[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#data-state
[2] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
[3] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#bogus-comment-state