[whatwg] Should ambiguous ampersand be a parse error?
ian at hixie.ch
Wed Jan 22 13:48:06 PST 2014
On Tue, 10 Dec 2013, Boris Zbarsky wrote:
> On 12/10/13 11:11 AM, Peter Cashin wrote:
> > Is the specification intended to have compliant HTML agents stop
> > parsing ambiguous ampersands?
> Compliant HTML agents are allowed to do so, I guess, per the technical
> rules about parse errors, just like for any other parse error. But I
> expect that this is at least partly for conformance classes other than
> "browsers"; all browsers press on through parse errors in HTML. Maybe
> the allowed behavior for parse errors should be made conditional on
> conformance class...
While I agree that it's unlikely that any browser will ever make use of
this in its default mode, I've still allowed it, because it can be a
useful mode to use in an authoring or educational environment.
On Tue, 10 Dec 2013, Jukka K. Korpela wrote:
> Authoring requirements as such are just policy statements, therefore
> regularly ignored.
Conformance requirements for authors are really just a way to try to help
authors avoid making what they would consider mistakes. The specification
actually has a whole section that explains why we bother to have them:
> Allowing user agents to stop parsing after a parse error (BTW, where
> exactly does the WHATWG HTML Living Standard allow that?)
It's in the sentence that follows the one that defines "parse error":
> is really just avoidance.
Not sure what you mean by "avoidance". What does it avoid?
> If browsers actually apply some specific error recovery, what’s the
> excuse for not making that mandatory?
We allow these two implementation strategies because not all tools
actually need to recover. For example, an HTML publishing pipeline might
want to assume that its input is valid, and simply refuse to handle
invalid input, rather than applying the error handling rules (which can
cause a big mess, e.g. reordering content!).
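To make the two strategies concrete, here is a rough sketch in Python. It is not taken from the spec and is not a real HTML tokenizer. It assumes, purely for illustration, that the only error condition is an ambiguous ampersand ("&", then ASCII alphanumerics, then ";", not matching a known named character reference). The names scan_ampersands and ParseAborted are made up for this example; the table of known names comes from Python's standard html.entities.html5 dictionary.

```python
import re
from html.entities import html5  # keys look like "amp;" (trailing semicolon included)

# "&" followed by one or more ASCII alphanumerics and then ";"
CANDIDATE = re.compile(r"&([0-9A-Za-z]+;)")


class ParseAborted(Exception):
    """Hypothetical exception: raised when the strict strategy gives up."""


def scan_ampersands(text, strict=False):
    """Return the offsets of ambiguous ampersands found in text.

    strict=False models the "recover and keep going" strategy: every
    error is recorded and scanning continues to the end of the input.
    strict=True models the "abort at the first parse error" strategy.
    """
    errors = []
    for match in CANDIDATE.finditer(text):
        if match.group(1) in html5:
            continue  # a known named character reference, not an error
        if strict:
            raise ParseAborted(f"ambiguous ampersand at offset {match.start()}")
        errors.append(match.start())
    return errors
```

A publishing pipeline that wants to reject invalid input would call this with strict=True and treat the exception as "refuse this document"; a browser-like consumer would use the default and carry on.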
> Different user agents can really do very different things. But I don’t
> think it’s a good idea to make that a rule of *parsing HTML*.
It's not really different things, it's either doing what the spec says, or
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'