[whatwg] xml:lang and xmlns in HTML

Fri Dec 1 12:11:40 PST 2006

On Fri, 1 Dec 2006, Sam Ruby wrote:
> > 
> > Except that wouldn't be backwards compatible since xml:lang="" isn't 
> > treated as a language attribute in legacy UAs.
> 
> I thought that the HTML definition of backwards compatibility was "If a 
> user agent encounters an attribute it does not recognize, it should 
> ignore the entire attribute specification (i.e., the attribute and its 
> value)."

In the context of features that already work in legacy UAs, backwards 
compatibility also implies that the feature should work as much as 
possible in legacy UAs. (Graceful fallback.)

> > > This would make it possible to have documents conformant with both 
> > > syntaxes at the same time.
> > 
> > I thought XHTML-sent-as-text/html had explained in painful detail why 
> > that's not a desirable end goal. Why would we want this?
> 
> Perhaps the problem is that your reformulation of Michel's assertion 
> doesn't capture the essence of the perceived requirement.

I don't understand.

> And given the frequency with which this question comes up, there 
> probably is a kernel of validity hiding in there somewhere.

It actually doesn't come up that much, compared to other things (e.g. "how 
do I find what the selection in a text field is?" is more common than "how 
do I make my document conformant to XML and HTML at the same time?").

> > > This could also help reinforce the idea that it's the media type 
> > > that differentiates HTML from XHTML. It'd make many valid XHTML1 
> > > documents out there conformant with HTML5 with a mere modification 
> > > to the doctype.
> > 
> > Not if they use things like <![CDATA[...]]> or the empty element 
> > syntax on non-void elements, or any number of other XMLisms.
> 
> Until yesterday, empty element syntax on void elements was also an 
> XMLism. Perhaps the question as to whether <![CDATA[..]]> should be 
> allowed should be explored with the same pragmatism as the empty/void 
> question was pursued.

But with <![CDATA[..]]>, namespaces, and most other XMLisms, the proposals 
fall down at the first step: they aren't compatible with legacy handling 
of HTML. The void element trailing "/" proposal only got considered 
because it was compatible with legacy UA handling, and would be of 
considerable help to authors who had fallen prey to the XHTML1 Appendix C 
fallacy and were trying to move to HTML5.

> > > What do you think?
> > 
> > I don't think it's a goal for the two serialisations to have a common 
> > subset.
> 
> Whether it is a goal or not, it is a reality that the two serializations 
> are similar enough to confuse many.

I agree. However, until such time as all browsers support XHTML, I don't 
see any reason to use it. When all browsers support XHTML, then we can 
dump text/html altogether. Trying to use XHTML before XHTML is supported 
is putting the cart before the horse.

So, one of the two serialisations can be ignored, and authors need only 
use the "latest version", namely HTML5.

Furthermore, HTML and XML are _different formats_. People don't use the 
same parser for RDF n3 and RDF XML, or the same parser for PNG and GIF, or 
the same parser for RelaxNG XML and RelaxNG Compact Syntax. Why would you 
use the same parser for XML and HTML? Treat them as different syntaxes. 
They have their own idiosyncrasies, conformance rules, parsing rules, and 
they only have a tiny amount of overlap. Treating them as the same 
language is not good design practice.
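
As a rough illustration of how little the two syntaxes share, here is a 
minimal Python sketch that feeds the same markup to an XML parser and to 
a lenient HTML tokenizer. It assumes Python's standard-library 
html.parser as a stand-in for text/html handling; that module is not a 
browser parser and does not implement the HTML5 parsing algorithm. The 
names StartTagCollector, html_start_tags and xml_parses are hypothetical 
helpers made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StartTagCollector(HTMLParser):
    """Records (tag, attributes) for every start tag the tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


def html_start_tags(markup):
    """Tokenize markup leniently and return the start tags encountered."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def xml_parses(markup):
    """Return True if markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


SAMPLE_ATTR = '<p xml:lang="en">hello</p>'
SAMPLE_UNCLOSED = '<p>one<br>two</p>'

# The XML parser resolves the xml: prefix to a namespace; the HTML
# tokenizer just sees an attribute whose name contains a colon.
xml_attr_names = list(ET.fromstring(SAMPLE_ATTR).attrib)
html_attr_names = list(html_start_tags(SAMPLE_ATTR)[0][1])
print(xml_attr_names)
print(html_attr_names)

# An unclosed <br> is a fatal error in XML but unremarkable in HTML.
print(xml_parses(SAMPLE_UNCLOSED))
print(html_start_tags(SAMPLE_UNCLOSED))
```

The same bytes produce a namespaced attribute in one parser and a plain 
"xml:lang" name in the other, and markup that one parser rejects outright 
is accepted by the other.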

(There also seems to be an implicit assumption that XML is better than 
text/html. This certainly was true back when XML was well-defined and HTML 
was a mess of undefined reverse-engineering. However, HTML5 changes this; 
now, HTML is at least as well defined as XML, if not better. The assumption that 
XML is intrinsically better should be revisited.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'