[whatwg] several messages about XML syntax and HTML5

Tue Dec 5 14:03:34 PST 2006

On Tue, 5 Dec 2006, Mike Schinkel wrote:
> > 
> > XHTML5 is not really intended to be used, it's only defined for the 
> > purposes of making sure XML users don't try to each invent their own 
> > version, resulting in dozens of incompatible versions. HTML5 as 
> > text/html is the main serialisation format for HTML5.
> 
> So am I to understand that, moving forward, the W3C will recommend HTML5 
> for web pages and XHTML only for special cases?

I can't speak for the W3C. Regarding the WHATWG, the spec says:

| [HTML5] is the format recommended for most authors. [...] Generally 
| speaking, authors are discouraged from trying to use XML on the Web, 
| because XML has much stricter syntax rules than the "HTML5" variant 
| described above [...]
  -- http://www.whatwg.org/specs/web-apps/current-work/#html-vs

> If so, and there are freely available conformant parsers on all 
> platforms that the spec explicitly recognizes, I'd be happy with that.

Work is ongoing to ensure that this happens.

> > but I encourage you to speak to browser vendors and search engines and 
> > see what they say.
> 
> It's nothing a browser vendor would need to support. OTOH, from a search 
> engine perspective, you work for Google...? :)

And I've already told you my opinion. :-)

> > > How about *real* XML Data Islands then?
> >
> > What would those be?
> 
> For example:
> 
> <XMLDATA>
> 
> 	Data in XML format goes here.
> 
> </XMLDATA>
> 
> The HTML5 parser would pass anything within <XMLDATA> elements to an XML 
> parser and insert whatever it returns into the response stream.  This 
> could allow SVG and MathML to work, no?

What's the use case? What's the processing model?

See:

   http://blog.whatwg.org/proposing-features

It's not clear to me what problem we're trying to solve here, nor 
what the proposal for solving it is. What are the parsing requirements? 

Note that any feature that, when misused, will "work better" in browsers 
that _don't_ support the feature than in browsers that _do_ support the 
feature is doomed to failure, because browsers will be forced to emulate 
the browsers that don't support the feature instead. This basically 
implies that any syntax checking in a text/html document that results in a 
fatal error (even for a subpart) but that renders OK in legacy browsers is 
a non-starter.

> I mean, it will take a long time for people to start using HTML5, until 
> HTML5 pipeline tools become ubiquitous, if it's left up to the free market 
> to develop them and they are not specified as part of the spec. That means 
> lots of string-concatenation apps will be built in the meantime and create 
> another slate of legacy apps.

Yes, that's why we are _already_ working on the tools. You are welcome to 
take part in this work; ask in #whatwg on Freenode IRC.

> > > XHTML, which introduced a new format, provides a single direction?
> >
> > I'm confused. I thought it was the introduction of XHTML that 
> > introduced multiple formats!
> 
> Actually, text/plain came even before text/html. :)  Anyway, XHTML was 
> presented by the W3C as an eventual replacement for text/html.  I'm 
> ideally hoping that we can have one target with the rest marked as 
> legacy, not multiple incompatible ones.

The "one target" now is HTML5.

XHTML5 is just a reserialisation of HTML5 using the XML syntax; the XML 
syntax isn't described by the HTML5 spec. You could easily imagine a JSON 
serialisation of HTML5, or a BXML serialisation, or a XAML serialisation, 
or a perl Data::Dumper serialisation, or an SGML version, or any number of 
other formats. However, HTML5 is the recommended one.
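To make the "reserialisation" point concrete, here is a minimal Python sketch, assuming a tiny hypothetical document (a `p` with an `em` child). The same abstract tree is written once with the XML syntax and once as JSON. The JSON shape ([name, attributes, children...]) and the helper name `to_json_like` are invented for illustration; nothing in the spec defines them.

```python
import json
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def build_tree():
    # One abstract document: <p class="note">Hello <em>world</em></p>
    p = ET.Element(f"{{{XHTML_NS}}}p", {"class": "note"})
    p.text = "Hello "
    em = ET.SubElement(p, f"{{{XHTML_NS}}}em")
    em.text = "world"
    return p

def to_json_like(elem):
    # Hypothetical JSON encoding: [local-name, attributes, child nodes...]
    name = elem.tag.split("}", 1)[-1]
    node = [name, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_json_like(child))
        if child.tail:
            node.append(child.tail)
    return node

tree = build_tree()
ET.register_namespace("", XHTML_NS)  # serialise XHTML as the default namespace
xml_form = ET.tostring(tree, encoding="unicode")
json_form = json.dumps(to_json_like(tree))
print(xml_form)
print(json_form)
```

Both strings describe the same tree; which syntax you pick is a transport decision, not a different language.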

> > Anyway, with HTML5, you have a single direction: HTML5-as-text/html.
> 
> What's the position of XHTML?  It *seems* like it will still be 
> presented as a viable option by the W3C.

I can't speak for the W3C. From the WHATWG perspective it's a viable 
option only because XML exists and we can't stop it from existing (just 
like we can't stop text/html from existing). However, the recommended 
format is HTML5.

> > Anyway. Just consider HTML5-as-text/html to be your only language, and 
> > you'll be set.
> 
> No man is an island, especially on the Internet.  I can't consider HTML5 
> as the only one to target for future if others head down the XHTML path.  
> For one of 1000 considerations, how do I know if the website I'm posting 
> a comment to uses HTML5 or XHTML (as a non-technical user)?

They're using HTML5. Anything using text/html is HTML5, and everyone 
basically uses text/html. There are exceptions (Sam, e.g.), but there are 
_always_ exceptions.

If you write content and transmit it as text/html, then it is HTML5. You 
don't have the choice to send XHTML5 as text/html, and nor does anyone 
else -- merely sending the content as text/html makes it HTML5, regardless 
of what it looks like. (And since xmlns="http://www.w3.org/1999/xhtml" and 
"/>" syntax are now allowed in HTML5, there really is no way to tell which 
the author _intended_; the MIME type is what matters.)
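A minimal sketch of that rule, assuming a consumer that picks its parser from the Content-Type header alone. The function name `parser_for` and the set of XML types are illustrative assumptions, not an exhaustive list from any spec; the point is that the markup itself is never consulted.

```python
from typing import Optional

# MIME types this sketch hands to an XML parser (an assumption; not exhaustive).
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

def parser_for(content_type: str) -> Optional[str]:
    """Pick a parser from the Content-Type alone; the body is never inspected."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"  # HTML5 parsing, even if the markup uses xmlns="..." or "/>"
    if mime in XML_TYPES or mime.endswith("+xml"):
        return "xml"   # XML parsing: XHTML5 if the root is in the XHTML namespace
    return None        # not a markup type this sketch knows about

# The same XHTML-looking bytes get different treatment depending only on the header.
print(parser_for("text/html; charset=utf-8"))  # html
print(parser_for("application/xhtml+xml"))     # xml
```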

> > HTML has a well-defined extensibility model, as used by the 
> > Microformats community. It's even got a good accessibility story.
> 
> One of the limitations on Microformat design is the lack of available 
> tags.

How do you mean?

> This issue I'm bringing up is new (from me), but what about 
> allowing several more attributes to be added to the standard attribute 
> list for all elements?  For example, it would be really nice if 
> attributes like abbr, href, name, rel, rev, scope, size, src, type, and 
> value were available on ALL elements. (Please, pretty please... :)

Could you elaborate on what each one of these attributes would mean?

Taking "abbr", though: if you want to extend HTML with some custom 
features and one of the things you have to add is an abbreviation, then use 
the <abbr> element. It already supports abbreviations.

> > > Until then, the preferred technique for extracting things like 
> > > trackback metadata will continue to be screen scraping with regular 
> > > expressions.
> >
> > I believe pingback shows quite clearly that extension mechanisms for 
> > such things already exist and that the fact that trackback doesn't use 
> > them is not a fault of HTML.
> 
> Mind if I ask for clarification on this?  I am not advocating anything 
> here; you just piqued my interest in learning what you meant.

The pingback specification does exactly what the trackback specification 
does, but without relying on RDF blocks in comments or anything silly like 
that. It just uses the Microformats approach, and is far easier to use, 
and doesn't require any additional bits to add to HTML.
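As a rough illustration of how little HTML machinery that needs, here is a Python sketch of pingback endpoint discovery from a page's `<link rel="pingback" href="...">` element, using only the standard-library `html.parser`. The class and function names are hypothetical, and the sketch ignores the `X-Pingback` HTTP header that the Pingback 1.0 specification also allows for advertising the endpoint.

```python
from html.parser import HTMLParser
from typing import Optional

class PingbackFinder(HTMLParser):
    """Records the href of the first <link> whose rel tokens include 'pingback'."""

    def __init__(self):
        super().__init__()
        self.endpoint: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; "<link ... />" also lands here.
        if self.endpoint is not None or tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()  # rel is a space-separated list
        if "pingback" in rels and attr.get("href"):
            self.endpoint = attr["href"]

def find_pingback(html: str) -> Optional[str]:
    finder = PingbackFinder()
    finder.feed(html)
    finder.close()
    return finder.endpoint

# example.com is a placeholder host.
page = '<html><head><link rel="pingback" href="http://example.com/xmlrpc"></head></html>'
print(find_pingback(page))  # http://example.com/xmlrpc
```

The same `rel`-token check works for any other link type, which is the general extension mechanism rather than anything pingback-specific.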

> > Ok. HTML supports this today, in both the HTML and XML serialisations, 
> > using class values and rel types. Microformats.org is the community 
> > that is most actively working with these mechanisms, but the 
> > mechanisms are open to anyone to use. As I mentioned above, this even 
> > has a pretty decent accessibility story (which is unusual for 
> > extension mechanisms).
> 
> Let me amplify the need to have more attributes available for extension. 
> Without more attributes, the story is really not that decent. ;-)

Why not? Could you elaborate on the problem you would like us to solve? I 
don't understand your use case.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'