[whatwg] Parser-related philosophy

Fri Jul 26 14:41:05 PDT 2013

(I think these e-mails have largely been overtaken by events, but I 
promised to reply to all substantive feedback, so here we go.)

On Sat, 12 Jan 2013, Vipul S. Chawathe wrote:
>
> It's okay for authors who leave deploying content to publisher to stop 
> with looking at html appearance from browser to users. Xhtml's fewer 
> publishers maybe bonded abit over-tightly with it if their quantity is 
> lesser considering how helpful transforms are. Repetitive content 
> over-counted is more likelier for html than transformable xml 
> serializations. The publisher may favour plug-ins for flash, jvm, 
> Silverlight and whichever else. However, small publishers who are 
> impacted by semantic significance of content grasped by search engine, 
> oft deliver same data using link tag with rel="alternate" attribute than 
> difficult to index proprietary plug-in based formats. The alternate 
> representation might be atom, rdf, ... using grddl xslt or some such 
> html sibling spec, so xhtml may not be well-supported but vanilla 
> support is another matter. For my personal interest, I'm looking forward 
> to seamless iframes, though styled iframe does hide the frame appearance 
> for javascript that breaks on main xhtml page, and place it in another 
> page that's plain html. My point is, if the spec can be precise w. r. t. 
> DOM to avoid usability breakage in xhtml, then the spec hopefully will 
> be precise, leaving aside when xhtml should be considered dead to 
> user-supporters at present.

I agree that the spec should be precise. If there are concrete examples 
where it isn't, please do bring them up so we can fix them.

On Mon, 14 Jan 2013, Henri Sivonen wrote:
> On Fri, Jan 11, 2013 at 10:00 PM, Ian Hickson <ian at hixie.ch> wrote:
> > On Fri, 11 Jan 2013, Henri Sivonen wrote:
> > >
> > > I understand that supporting XML alongside HTML is mainly a burden 
> > > for browser vendors and I understand that XML currently doesn't get 
> > > much love from browser vendors.
> >
> > Not just browser vendors. Authors rarely if ever use XML for HTML 
> > either.
>
> When you say "use XML", do you mean serving content using an XML content 
> type?

I mean using XML that is processed as XML by one or more parts of the 
toolchain, as opposed to being treated as HTML (parsed by an HTML parser) 
or text (parsed by custom parsers or hand-edited in a way that doesn't 
check at any point for XML well-formedness), and contains nodes in the 
HTML namespace during this processing.

> > Anyway, I'm not suggesting that they diverge beyond the syntax (which 
> > is already a lost cause). All I've concretely proposed is syntax for 
> > binding Web components in text/html; I haven't described how this 
> > should be represented in the DOM, for instance. If we define <foo/bar> 
> > as being a text/html syntactic shorthand for <foo 
> > xml:component="bar">, or <foo xmlcomponent="bar">, in much the same 
> > way as we say that <svg> is a shorthand for <svg 
> > xmlns="http://www.w3.org/2000/svg">, then the DOM remains the same for 
> > both syntaxes, and (as far as I can tell) we're fine.
>
> I didn't realize you were suggesting that HTML parsers in browsers 
> turned <bar/foo> into <bar xml:component="foo"> in the DOM. How is 
> xml:component="foo" better than is="foo"?

It's exactly the same, as far as I can tell. It's the syntax that'd be
better, IMHO.

> Why not <bar foo="">, which is what <bar/foo> parses into now? (I can 
> think of some reasons against, but I'd like to hear your reasons.)

Namespace collision. That space (attribute names) is already taken up by 
defined attributes.
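
To make the collision concrete, here is a rough sketch in Python. It is not 
a real HTML tokenizer: the regular expression, the function names 
parse_current and parse_shorthand, and the choice of "xml:component" as the 
attribute name are all assumptions made for illustration, covering only the 
simple <name/word> form discussed here.

```python
import re

# Hypothetical pattern: matches only the simple "<name/word>" form.
SLASH_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)/([a-zA-Z][a-zA-Z0-9-]*)>")


def _split(tag):
    """Return (element name, word after the slash) or raise ValueError."""
    match = SLASH_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a <name/word> start tag: {tag!r}")
    return match.group(1).lower(), match.group(2)


def parse_current(tag):
    """Approximate today's parsing: the word becomes an empty attribute."""
    name, word = _split(tag)
    return name, {word.lower(): ""}


def parse_shorthand(tag, attribute="xml:component"):
    """Assumed shorthand: the word becomes the value of one fixed attribute."""
    name, word = _split(tag)
    return name, {attribute: word}


if __name__ == "__main__":
    # Today, the word lands in the same space as defined attributes.
    print(parse_current("<input/disabled>"))
    # Under the shorthand reading, it is kept apart as a component name.
    print(parse_shorthand("<input/disabled>"))
```

With the current parsing, <input/disabled> is indistinguishable from 
<input disabled="">, which is why reusing that form as-is would collide 
with existing attribute names.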

> > Some of the constraints are:
> >
> >  - The binding has to be done at element creation time
> >  - The binding has to be immutable during element lifetime
> >  - The syntax must not make authors think the binding is mutable
> >    (hence why the <select is="map"> proposal was abandoned)
>
> How does xml:component="map" suggest mutability less than is="map"?

It doesn't, but you wouldn't see that in the markup.

> - It must be possible to generate the syntax using a serializer that
> exposes (only) the SAX2 ContentHandler interface to an XML system and
> generates text/html in response to calls to the methods of the
> ContentHandler interface and the XML system may enforce the calls to
> ContentHandler representing a well-formed XML document (i.e. would
> produce a well-formed XML doc if fed into an XML serializer). The
> syntax must round-trip if the piece of software feeding the serializer
> is an HTML parser that produces SAX2 output in a way that's consistent
> with the way the parsing spec produces DOM output. (This is a concrete
> way to express “must be producable with Infoset-oriented systems
> without having a different Infoset mapping than the one implied by the
> DOM mapping in browsers”. As noted, dealing with <template> already
> bends this requirement but in a reasonably straightforward way.)

Personally I don't think this is a requirement we should worry about.

> - It must be possible to generate the syntax with XSLT. (Remember, we 
> already have <!DOCTYPE html SYSTEM "about:legacy-compat">, because this 
> is important enough a case.)

I _really_ don't think this is a use case we should worry about. I'd be 
happy to dump all XSLT support. The legacy DOCTYPE is there because people 
offered it as a cheap way of making them happy. It doesn't constrain the 
language's development.

On Mon, 14 Jan 2013, Henri Sivonen wrote:
> > >
> > > As for <command> behavior in the parser, all major browsers have 
> > > shipped releases with <command> as void, so we won't be able to 
> > > reliably introduce a non-void element called "command" in the future 
> > > anyway. Therefore, I don't see value in removing the voidness of 
> > > "command" from parsing or serialization.
> >
> > The element doesn't exist, so there's no value in having it. We can 
> > easily introduce a non-void <command> in ten years if we need to, 
> > since by then the current parsers will be gone.
>
> Even if we accept, for the sake of the argument, that the current 
> parsers will be gone in 10 years, it is incorrect to consider only 
> parsers. Considering serializers is also relevant.

Serialisers have to be updated when we add new void elements (or drop 
them), certainly, but just as with parsers, over the years I think the 
number of legacy serialisers will be manageable, and we can thus extend 
the language over time without much difficulty.
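
As a minimal sketch of what "updating a serialiser" amounts to, assume a 
toy tree and a serialiser that takes the set of void element names as a 
parameter. The Element class, the serialize function and the short 
ILLUSTRATIVE_VOIDS set below are hypothetical (the set is not the spec's 
full list), and text escaping and attributes are left out for brevity.

```python
from dataclasses import dataclass, field

# Illustrative subset only; NOT the complete list of void elements.
ILLUSTRATIVE_VOIDS = frozenset({"br", "hr", "img", "input", "link", "meta"})


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # Element or str items


def serialize(node, void_elements=ILLUSTRATIVE_VOIDS):
    """Serialise a toy tree; void elements get no children and no end tag."""
    if isinstance(node, str):
        return node  # escaping omitted in this sketch
    start = f"<{node.name}>"
    if node.name in void_elements:
        return start
    inner = "".join(serialize(child, void_elements) for child in node.children)
    return f"{start}{inner}</{node.name}>"


if __name__ == "__main__":
    tree = Element("menu", [Element("command"), "x"])
    # A serialiser that still treats "command" as void:
    print(serialize(tree, ILLUSTRATIVE_VOIDS | {"command"}))
    # A serialiser that has dropped it from the list:
    print(serialize(tree, ILLUSTRATIVE_VOIDS))
```

In a serialiser built this way, adding or dropping a void element is a 
one-entry change to the set, though a real serialiser may of course 
hard-code the list in less convenient ways.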

> The voidness of "command" has already propagated to various 
> places—including serializer specs like 
> http://www.w3.org/TR/xslt-xquery-serialization-30/ . (No doubt the XSLT 
> folks will be super-happy when we tell them that the list of void 
> elements has changed again.)

I don't think it makes sense for XSLT to be hard-coding anything about 
HTML.

> At any point of the future, it is more likely that picking a new element 
> name for a newly-minted non-void element will cause less (maybe only an 
> epsilon less but still less) hassle than trying to re-introduce 
> "command" as non-void. Why behave as if finite-length strings were in 
> short supply? Why not treat "command" as a burned name just like 
> "legend" and pick something different the next time you need something 
> of the same theme when interpreted as an English word?

I still think treating "legend" as a burnt-out name was a mistake. I don't 
think we should repeat that mistake.

> What makes an element "exist" for you? Evidently, basefont and bgsound 
> exist enough to get special parsing and serialization treatment. Is 
> multiple interoperable parsing and serialization implementations not 
> enough of existence and you want to see deployment in existing content, 
> too?

What matters is deployed content, right. Deployed user agents are often a 
convenient proxy for deployed content. Bug reports filed in response to 
changes to user agents, in particular, are rather useful as such a proxy.

> Did you measure the non-deployment of <command> on the Web or are we 
> just assuming it hasn't been used in the wild? Even if only a few 
> authors have put <command> in <head>, changing parsing to make <command> 
> break out of <head> is bad.

I'm aware of no data that shows that pages break in browsers that treat 
<command> as non-void or non-head-friendly.

> What do we really gain except for test case churn, makework in code and 
> potential breakage from changing "command" as opposed to treating it as 
> a used-up identifier and minting a new identifier in the future if a 
> non-void element with a "command"-like name is needed in the future?

We gain the ability to use "command" as a non-void (or, indeed, void) 
element in the future, and we gain epsilon more simplicity and leanness 
in the code (and spec) in the meantime.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'