[whatwg] Annotating structured data that HTML has no semantics for

Fri May 15 04:11:51 PDT 2009

On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak <mjs at apple.com> wrote:
> [...]
> From my cursory study, I think microdata could subsume many of the use cases
> of both microformats and RDFa.
Maybe. But microformats and RDFa can handle *all* of these cases.
Again, what are the benefits of creating something entirely new to
replace what already exists when it can't even handle all the cases
of what it is replacing? Both the new syntax and the restrictions on
the cases it covers are costs: what are these costs buying? If it's not
clear what we are getting for these costs, it is impossible to
evaluate whether the costs are worth it or not.

> It seems to me that it avoids much of what microformats advocates find objectionable
Could you specify, please? Do you mean anything other than WHATWG's
almost irrational hate toward CURIEs and everything that involves
prefixes?

> but at the same time it seems it can represent a full RDF data
> model.
No, it *can't* represent a full RDF model: this has already been shown
several times on this thread.

> Thus, I think we have the potential to get one solution that works for everyone.
RDFa itself doesn't work for everyone; but microdata is even more
restricted: it leaves out the cases that RDFa leaves out, but it also
leaves out some cases that RDFa was able to handle. So, where do you
see such potential?

> I'm not 100% sure microdata can really achieve this, but I think making the
> attempt is a positive step.
What do you mean by "making the attempt"? If there is something
microdata can't handle, it won't be able to handle it without changing
the spec. If you meant that evolving the microdata proposal towards
something that works for everyone is a positive step, then I agree;
but if you meant engraving this microdata approach into the spec,
setting it in stone, and then trying to get everyone to accept it, then
I definitely disagree. So, please, could you clarify the meaning of
that statement? Thanks.

> One other detail that it seems not many people have picked up on yet is that
> microdata proposes a DOM API to extract microdata-based info from a live
> document on the client side. In my opinion this is huge and has the
> potential to greatly increase author interest in semantic markup.
All right, an API may be a benefit. Most probably it is. However, a
similar API could have been built on RDFa, or eRDF, or EASE, or any
other existing or new solution; so it doesn't justify creating
a new syntax. I have to insist: what are the benefits of such a
built-from-the-ground-up, restrictive *syntax*? That's what we need to
know to evaluate it against its costs.
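
To illustrate that an extraction API does not depend on a new syntax,
here is a minimal sketch in Python that pulls (property, value) pairs
out of markup using the existing RDFa `property` and `content`
attributes. The class name, the function name and the sample markup are
hypothetical, and the sketch deliberately ignores `about`, `typeof`,
`rel`, `datatype`, prefix resolution and nesting of subjects, all of
which real RDFa processing needs.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not kept on the stack.
VOID_ELEMENTS = {"meta", "link", "br", "img", "hr", "input"}


class PropertyExtractor(HTMLParser):
    """Collects (property, value) pairs; a toy, not an RDFa processor."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # extracted (property, value) tuples
        self._stack = []   # open elements: (tag, property or None, text chunks)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is not None and "content" in attrs:
            # An explicit content attribute wins over the element's text.
            self.pairs.append((prop, attrs["content"]))
            prop = None
        if tag not in VOID_ELEMENTS:
            self._stack.append((tag, prop, []))

    def handle_data(self, data):
        # Text belongs to every open element that is waiting for a value.
        for _, prop, chunks in self._stack:
            if prop is not None:
                chunks.append(data)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, plus anything
        # left unclosed inside it.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                closed = self._stack[i:]
                del self._stack[i:]
                for _, prop, chunks in reversed(closed):
                    if prop is not None:
                        self.pairs.append((prop, "".join(chunks).strip()))
                break


def extract_properties(html):
    """Returns the (property, value) pairs found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical sample markup, only to show the call.
sample = ('<div about="#me"><span property="foaf:name">Eduard Pascual</span>'
          '<meta property="dc:date" content="2009-05-15"></div>')
print(extract_properties(sample))
```

The same shape of function could sit behind a DOM method; nothing in it
requires the attribute names to be new ones.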

> Now, it may be that microdata will ultimately fail, either because it is
> outcompeted by RDFa, or because not enough people care about semantic
> markup, or whatever. But at least for now, I don't see a reason to strangle
> it in the cradle.
At least for now, I don't see a reason why it was created to begin
with. Maybe if somebody could enlighten us with this detail, this
discussion could evolve into something more useful and productive.

On Fri, May 15, 2009 at 6:53 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>
> On May 14, 2009, at 1:30 PM, Shelley Powers wrote:
>
>> So, if I'm pushing for RDFa, it's not because I want to "win". It's
>> because I have things I want to do now, and I would like to make sure have a
>> reasonable chance of working a couple of years in the future. And yeah, once
>> SVG is in HTML5, and RDFa can work with HTML5, maybe I wouldn't mind giving
>> old HTML a try again. Lord knows I'd like to user ampersands again.
>
> It sounds like your argument comes down to this: you have personally
> invested in RDFa, therefore having a competing technology is bad, regardless
> of the technical merits.
Pause, please. Before going on, I need to ask again: what are those
technical merits?

> I don't mean to parody here - I am somewhat sympathetic to this line of argument.
I think I'm interpreting Shelley's argument slightly differently. She
didn't choose RDFa because it was better than microdata. She chose RDFa
because it was better than the other options, and microdata didn't even
exist yet. Now microdata comes out, some drawbacks are highlighted in
comparison with RDFa (lack of typing; inability to depict the full RDF
model; reversed domains that are as ugly as CURIEs, except that CURIEs
at least resolve to something useful, while reversed domains often
don't resolve at all), and you ask RDFa proponents to give microdata a
chance, to not "strangle it in the cradle"; but nobody seems willing
to answer the one question: what does microdata provide to make up for
its drawbacks?

> Often pragmatic concerns mean that an incremental improvement just isn't worth the cost of switching
Wait. Are you referring to microdata as an incremental improvement over
RDFa? IMO, it's rather a decremental enworsement.

> My personally judgment is that we're not past the point of
> no return on data embedding. There's microformats, RDFa, and then dozens of
> other serializations of RDF (some of which you cited). This doesn't seem
> like a space on the verge of picking a single winner, and the players seem
> willing to experiment with different options.
Of course. I'm willing to experiment. I experimented with RDFa some
time ago and found it unsuitable for my needs. I have taken a look at
microdata and it isn't any more suitable. Some people experimented
with RDFa and found it good or even perfect for their needs; for them,
microdata is simply less good: do you expect it to get any relevant
amount of experimentation?

> Supporting XHTML 1.1 has about 0.00000000001% as much value as supporting
>  text/html.
Could you please point to the source of that figure? It differs widely
from my experience browsing the web. Surely there is more
HTML tag soup than X-whatever out there, but I still encounter
XHTML1.1 quite often.

> XHTML 2.0 is completely irrelevant to the Web, and looks on
> track to remain so. So I don't find this point very persuasive.
That might be a wish for some, but it's definitely not a fact. Many
web authoring communities have been watching the evolution of
both XHTML2 and HTML5. Sure, HTML5 currently has the advantage that
vendors are willing to support it. Yet XHTML2 simply doesn't need
vendor support: it can be entirely implemented on the author's side
via XSLT + JavaScript (both of which are already supported by all
vendors), and some people have already done some work along those lines.
So, XHTML2 is quite relevant to the web: if HTML5 doesn't do a good
enough job, authors will build their "browser-in-a-browser" to
make their pages in XHTML2 instead, or just stay with XHTML1/HTML4. So
I don't find your point persuasive at all.

>> Why you think something completely brand new, no vendor support, drummed
>> up in a few hours or a day or so is more robust, and a better option than a
>> mature spec in wide use, well frankly boggles my mind.
>
> I haven't evaluated it enough to know for sure (as I said). I do think
> avoiding CURIEs is extremely valuable from the point of view of sane
> text/html semantics and ease of authoring; and RDF experts seem to think it
> works fine for representing RDF data models. So tentatively, I don't see any
> gaping holes. If you see a technical problem, and not just potential
> competition for the technology you've invested in, then you should
> definitely cite it.
¬¬'
Technical problems have already been cited. Here's a summary; please
review the thread for further details on each:
- Microdata can't handle non-string objects: even if I use the <time>
element to mark up a date with microdata, it will be taken as a string
rather than as a date. Since tools may be relying on explicit typing
for some tasks, this limitation renders microdata less usable than
RDFa.
- Microdata is an entirely new syntax. It requires implementation and
deployment of consumers before it can be used at all; while RDFa has
already gone through this step. (This is not a very serious problem,
but it is a cost to be considered.)
- Microdata can't represent the full RDF data model (while RDFa can):
some complex structures are just not expressible with microdata.
- Microdata inherits all of the flaws of RDFa (except for the use of
CURIEs, which some people here consider an inconvenience). For example,
marking up a list of items requires lots of redundant code.
- Microdata relies on reversed domains. While some people argue that
these are better than CURIEs, they are equally horrendous for the
average user, and have the additional disadvantage that they don't map
to anything useful (if they map to something at all), while CURIEs map
to the descriptions and/or definitions of what they represent.
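
To make the typing and naming points above concrete, here is a small
Python sketch, not taken from either spec. The prefix table, the
function names and the `org.example.birthday` name are hypothetical;
the two namespace URIs are the usual FOAF and Dublin Core ones. It only
shows the general idea: a CURIE expands mechanically to a URI one can
look up, and an explicit datatype lets a consumer get a date instead of
a string.

```python
from datetime import date

# Hypothetical prefix table; in RDFa these bindings come from xmlns:* declarations.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expands 'prefix:reference' to a full URI using the prefix table."""
    prefix, _, reference = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError("unknown prefix: " + prefix)
    return prefixes[prefix] + reference


def typed_value(lexical, datatype=None):
    """Returns a date when the value is explicitly typed, else the raw string."""
    if datatype == "xsd:date":
        return date.fromisoformat(lexical)
    return lexical


# A CURIE leads to a URI that can be dereferenced for a definition.
print(expand_curie("foaf:name"))

# A reversed-domain name such as this hypothetical one has no defined
# expansion step; it is just an opaque identifier.
print("org.example.birthday")

# With explicit typing the consumer can compare dates; without it, it
# only has text to work with.
print(typed_value("2009-05-15", "xsd:date") < date(2009, 12, 31))
print(type(typed_value("2009-05-15")).__name__)
```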

So, here they are, again. Technical problems with microdata. I'm eager
to hear your answers to them.

>>> One other detail that it seems not many people have picked up on yet is
>>> that microdata proposes a DOM API to extract microdata-based info from a
>>> live document on the client side. In my opinion this is huge and has the
>>> potential to greatly increase author interest in semantic markup.
>>>
>>
>> Not really. Can do this now with RDFa in XHTML. And I don't need any new
>> DOM to do it.
>>
>> The power of semantic markup isn't really seen until you take that markup
>> data _outside_ the document. And merge that data with data from other
>> documents. Google rich snippets. Yahoo searchmonkey. Heck, even an
>> application that manages the data from different subsites of one domain.
>
> I respectfully disagree. An API to do things client-side that doesn't
> require an external library is extremely powerful, because it lets content
> authors easily make use of the very same semantic markup that they are
> vending for third parties, so they have more incentive to use it and get it
> right.
Sure, an API is quite valuable. But, besides the fact that defining an
API didn't require creating a brand new syntax (it could have been
defined for any of the existing syntaxes; see my comments on
this above), it doesn't change the fact that the value of the metadata
once taken outside the document is extremely important. Creating a
new way to exploit the data doesn't render the other ways irrelevant.

> With due respect, you're the one who brought competition into this
> discussion by saying there can only be one winner. I don't really think
> that's true, in this case.
With due respect, it was the WHATWG who brought competition into the
whole web spec environment, due to disagreements with the W3C (I find
the WHATWG's reasons quite valid, and don't want to discuss them now);
but now this competition seems to be going to extremes. While some
people here seem to take the position "it's RDFa or nothing", there
are others who seem to be at the "everything is fine except RDFa"
pole. Extremes are never good, and this discussion is no
exception.

Regards,
Eduard Pascual