[whatwg] several messages about XML syntax and HTML5

Mike Schinkel mikeschinkel at gmail.com
Wed Dec 6 20:52:50 PST 2006


Ian Hickson wrote:
| [HTML5] is the format recommended for most authors. [...] Generally 
| speaking, authors are discouraged from trying to use XML on the Web, 
| because XML has much stricter syntax rules than the "HTML5" variant 
| described above [...]
  -- http://www.whatwg.org/specs/web-apps/current-work/#html-vs

Well, it sounds like HTML5 should become the "preferred" web authoring format.
Let's see whether the community and the industry embrace that or fight it.

>> > It's nothing a browser vendor would need to support. OTOH, from a 
>> > search engine perspective, you work for Google...? :)
>> And I've already told you my opinion. :-)

Then is there anyone else at Google that I can speak with?  ;-)


>> > The HTML5 parser would pass anything within <XMLDATA> elements to an 
>> > XML parser and insert whatever it returns into the response stream.  
>> > This could allow SVG and MathML to work, no?
>> What's the use case? 

The use case is to allow arbitrary XML to be embedded into HTML, just as an
Excel spreadsheet can be embedded into a Word document, with the container
behaving as expected and the contained content behaving as expected.
Specifically MathML or SVG, but also anything new that someone comes up with
after the HTML5 spec is sealed.

>> What's the processing model?

I don't understand what you are asking.

>> It's not clear to me what the problem is we're trying to solve here, not
what the proposal for solving it is. 

The ability to insert XML-based solutions into HTML and have them processed
as XML. By default an <xmldata> element could reference an XSLT file via a
URL in an attribute, but browsers could also implement a plug-in architecture
to process known schemas contained within. This would allow almost unlimited
flexibility moving forward and would not require an HTML6 for many things.
What's more, it would help us see what extensions the world at large has
created, and so understand what interests people. And because the content
would be required to be valid (or at least well-formed) XML, it would give
HTML publishers a chance to learn the rules of XML.

It would also allow embedding of what I'll call "Microdirectives", i.e.
basically metadata, but visible, like Microformats. It would let someone
publish both a human-readable document and a machine-readable document.

Like XHTML, which just won't display when it isn't well-formed, this XML
wouldn't display either if it were invalid; but since it would be contained
in an HTML file that still displays, that would be okay, because it wouldn't
be an all-or-nothing situation.
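To make that processing model a little more concrete, here is a rough sketch in Python. It is only an illustration under loud assumptions: <xmldata> is the hypothetical element from my proposal, a regular expression stands in for a real HTML5 tokenizer, and nested <xmldata> elements aren't handled. What it shows is that each island is handed to an XML parser on its own, so a malformed island fails by itself while the rest of the page is unaffected.

```python
import re
import xml.etree.ElementTree as ET

# <xmldata> is the hypothetical element from this proposal, not part of HTML.
# A regex stands in for a real HTML5 tokenizer; nesting is not handled.
XMLDATA_RE = re.compile(r"<xmldata\b[^>]*>(.*?)</xmldata>",
                        re.IGNORECASE | re.DOTALL)


def extract_xml_islands(html_text):
    """Hand each <xmldata> island to an XML parser independently.

    Returns a list of (ok, result) pairs: result is the parsed root
    Element when ok is True, or the parser's error message when False.
    A malformed island fails on its own; other islands are unaffected.
    """
    results = []
    for match in XMLDATA_RE.finditer(html_text):
        try:
            results.append((True, ET.fromstring(match.group(1).strip())))
        except ET.ParseError as err:
            results.append((False, str(err)))
    return results


page = ('<p>Intro</p>'
        '<xmldata><svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg></xmldata>'
        '<p>More text</p>'
        '<xmldata><math><mi>x</math></xmldata>')

for ok, result in extract_xml_islands(page):
    print("parsed" if ok else "skipped", result.tag if ok else result)
```

The SVG island parses; the second island has a mismatched tag, so it is reported and skipped, while the surrounding HTML paragraphs are never touched.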

Oh, and it would be a great place to store RDF. ;-)

>> What are the parsing requirements? 

Again, not exactly sure what you are asking.  Have I answered already?

>> Note that any feature that, when misused, will 
>> "work better" in browsers that _don't_ support 
>> the feature than in browsers that _do_ support 
>> the feature, are doomed to failure, because 
>> browsers will be forced to emulate the browsers 
>> that don't support the feature instead. This 
>> basically implies that any syntax checking in a 
>> text/html document that results in fatal error 
>> (even for a subpart) but that renders ok in 
>> legacy browsers is a non-starter.

I don't follow this. Did you state it correctly? Does it apply to what I'm
talking about, and if so, why? A legacy browser would simply ignore an
<xmldata> element. Why would it be bad if older browsers worked better? I
can't even conceive of an example.

>> Yes, that's why we are _already_ working on 
>> the tools. You are welcome to take part in this 
>> work; ask in #whatwg on Freenode IRC.

I'd love to, but I don't have anyone sponsoring me to participate on this
list, and I'm getting to the point where I need to start spending my time
making money via other activities again, or look for someone to sponsor me. :)

>> The "one target" now is HTML5.

If that comes across LOUD AND CLEAR to the public upon release, then my
concerns in that regard are addressed.  Thanks.

>> They're using HTML5. Anything using text/html 
>> is HTML5, and everyone basically uses text/html. 
>> There are exceptions (Sam, e.g.), but there are 
>> _always_ exceptions.

A clarification, then: what happens if IE8 supports application/xhtml+xml?
It sounds like that would actually be a bad thing, since we might then end up
with two camps, the HTML5 camp and the XHTML camp, and all the associated
chaos.

>> > This is a this issue I'm bringing up is new (from me) but what about 
>> > allowing several more attributes to be added to the standard attribute 
>> > list for all elements?  For example, if would be really nice if 
>> > attributes like abbr, href, name, rel, rev, scope, size, src, type, and
>> > value were available on ALL elements. (Please, pretty please... :)
>> Could you elaborate on what each one of these attributes would mean?

I don't have specifics, but I know from participating on the Microformat
list that one of the biggest problems is the lack of available attributes. If
there were more attributes, the Microformat community could develop much
less verbose markup; i.e. instead of nesting three <DIV> tags around an
element, each carrying one usable attribute, they could define Microformats
that required only one <DIV> tag. But I can't tell you exactly what each
would be used for, only that the Microformat community could learn to apply
them effectively. By analogy, it would be like requiring TimBL to specify up
front all the kinds of content people would put on a web page before giving
him the green light to work on the web. I'm asking for building blocks; the
Microformat community would define how they apply.

>> Taking "abbr", though, if you want to extend HTML with 
>> some custom features and one of the thing you have to 
>> add is an abbreviation, then use the <abbr> element. 
>> It already supports abbreviations.

For one example, if I already have <td> tags enclosing a value, why do I
need to add almost 50% more characters when I could instead do it more
cleanly?  

	<td><abbr title="United States">USA</abbr></td>
vs.
	<td abbr="United States">USA</td>
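For what it's worth, here is the arithmetic behind "almost 50%" as a quick Python check, counting the characters of the two snippets exactly as written above:

```python
# The two snippets from above, character for character.
with_element = '<td><abbr title="United States">USA</abbr></td>'
with_attribute = '<td abbr="United States">USA</td>'

extra = len(with_element) - len(with_attribute)
percent_more = round(100 * extra / len(with_attribute))
print(len(with_element), len(with_attribute), percent_more)  # 47 33 42
```

So the element form is 47 characters against 33, roughly 42% longer.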

I haven't been active on the Microformat list for over a month, but I
remember that the biggest problem I found was the lack of available and
_semantically_correct_ attributes for tagging metadata. We ended up having to
wrap two tags where one would have sufficed:

	<abbr class="currency" title="USD">
		<span class="amount">54.97</span>
	</abbr>
vs.
	<span class="currency" type="USD" description="amount">54.97</span>

Another thing that would be nice is to add a <uf> tag (u = Micro, f = Format)
and give it lots of attributes with short names for semantic application:

       <span class="money">
               <span class="symbol" title="dollar">$</span>
               <abbr class="currency" title="USD">
                       <span class="amount">54.97</span>
               </abbr>
       </span>
vs.
       <uf c="money">
               <uf c="symbol" t="dollar">$</uf>
               <uf a="currency" t="USD" n="amount">54.97</uf>
       </uf>

But I know you'll probably consider <uf> too strange...

>> The pingback specification does exactly what the trackback 
>> specification does, but without relying on RDF blocks in 
>> comments or anything silly like that. It just uses the 
>> Microformats approach, and is far easier to use, and doesn't 
>> require any additional bits to add to HTML.

[offtopic]
I'd never heard of pingback. I googled for it and found your website first,
but couldn't find the RFC number.  You have a copyright of 2002, and it
appears that Trackback was also developed in 2002. So are you implying they
should have used Pingback instead?  It appears they were developed in
parallel?

BTW, why did you use XML-RPC when a simple RESTful POST would have sufficed
(and been easier to implement)?
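To illustrate what I mean, here is a small Python sketch comparing the two request bodies. The pingback.ping method and its (sourceURI, targetURI) parameters are from the Pingback spec as I understand it; the form-encoded alternative and its field names are my own hypothetical example, not from any spec, and the URLs are placeholders.

```python
import xmlrpc.client
from urllib.parse import urlencode

# Placeholder URLs for illustration.
source = "http://example.com/my-post"
target = "http://example.org/their-post"

# Pingback: an XML-RPC call to pingback.ping(sourceURI, targetURI).
xmlrpc_body = xmlrpc.client.dumps((source, target), methodname="pingback.ping")

# Hypothetical RESTful alternative: a plain form-encoded POST body.
# The field names "source" and "target" are illustrative, not from any spec.
rest_body = urlencode({"source": source, "target": target})

print(xmlrpc_body)
print(rest_body)
```

The XML-RPC version needs an XML envelope with methodCall, params and typed values; the form-encoded version is a single line that any HTML form or HTTP client can produce.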

-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/
