[whatwg] Annotating structured data that HTML has no semantics for

Mon May 18 06:05:51 PDT 2009

On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen <hsivonen at iki.fi> wrote:
> On May 14, 2009, at 23:52, Eduard Pascual wrote:
>
>> On Thu, May 14, 2009 at 3:54 PM, Philip Taylor <excors+whatwg at gmail.com>
>> wrote:
>> It doesn't matter one syntax or another. But if a syntax already
>> exists (RDFa), building a new syntax should be properly justified.
>
> It was at the start of this thread:
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html
Ian's initial message goes step by step through the creation of this
new syntax; but does *not* mention at all *why* it was being created
in the first place. The insight into the choices taken is indeed a
good thing, and I thank Ian for it; but he omitted to provide insight
into the first choice taken: discarding the multiple options already
available (not only Microformats and RDFa, but also other less
discussed ones such as eRDF, EASE, etc). Sure, there has been a lot of
discussion on this topic; and it's possible that the choice was taken
as part of such discussions. In any case, I think Ian should have
clearly stated the reasons to build a brand new solution when many
others have been out for a while and users have been able to try and
test them.
Please keep in mind that I'm not criticizing the choice itself (at
least, not now), but the lack of information and reasoning behind that
choice.
>
>> As
>> of now, the only supposed benefit I have heard of for this syntax is
>> that it avoids CURIEs... yet it replaces them with reversed domains??
>> Is that a benefit?
>
> There's no indirection. A decade of Namespaces in XML shows that both
> authors and implementors have trouble getting prefix-based indirection
> right.
Really? I haven't seen any hint about that. Sure, there will be some
people who have trouble understanding namespaces, just like there are
some people who have trouble understanding why something like
"<tr><td>foo</td><td>bar</tr></td>" is wrong.
Please, could you quote a source for that claim? I could also claim
something like "fifteen years of Java show that reversed domains are
error-prone and harmful", and even argue about it; but this kind of
argument, without a serious analysis or study to back it, is
completely meaningless and definitely subjective.
>
> (If we were limited to reasoning about something that we don't have
> experience with yet, I might believe that people can't be too inept to use
> prefix-based indirection. However, a decade of actual evidence shows that
> actual behavior defies reasoning here and prefix-based indirection is
> something that both authors and implementors get wrong over and over again.)
Curious: you refer to "a decade of actual evidence", but you fail to
refer to any actual evidence. I'm eager to see that evidence; could
you share it with us? Thank you.
>
>> I have been a Java programmer for some years, and
>> still find that convention absurd, horrible, and annoying. I'll agree
>> that CURIEs are ugly, and maybe hard to understand, but reversed
>> domains are equally ugly and hard to understand.
>
> Problems shared by CURIEs, URIs and reverse DNS names:
>  * Long.
>  * Identifiers outlive organization charts.
Ehm. CURIEs ain't really long: the main point of prefixes is to make
them as short as reasonably possible.
Good identifiers outlive bad organization charts. Good organization
outlives bad identifiers. Good organization and good identifiers tend
to outlive the context they are used in.
>
> Problems that reverse DNS names don't have but CURIEs and URIs do have:
>  * "http://" 7 characters of even extra length.
>  * Affordance of dereferenceability when mere identifier semantics are meant.
A CURIE (at least as typed by an author) doesn't have the "http://":
it is a prefix, a colon, and whatever goes after it. Once resolved
(ie: after replacing the prefix and colon by what the prefix
represents) what you get is no longer a CURIE, but a URI like the ones
you'd type in your browser or inside a link's href attribute.
Dereferenceability is not a problem in itself: having more than what is
strictly needed can be either irrelevant or an advantage, not a
problem. Of course, it *may* be the cause of some actual problem, but
in that case you should rather describe the problem itself, so it can
be evaluated.
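
To make the resolution step concrete, here is a minimal sketch of it. The
prefix table is only an illustrative assumption (in a real document the
mappings would come from that document's own prefix declarations), and
expand_curie is a hypothetical helper name, not part of any RDFa tool:

```python
# Illustrative prefix table (an assumption for this sketch); a real
# processor would read these mappings from the document itself.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Replace the prefix and colon with what the prefix represents."""
    prefix, colon, reference = curie.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"cannot resolve CURIE: {curie!r}")
    # What comes out is no longer a CURIE but a plain URI.
    return prefixes[prefix] + reference


if __name__ == "__main__":
    print(expand_curie("foaf:Person"))
```

What the author types is just "foaf:Person"; the "http://" only shows up
after resolution.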
>
> Problems that reverse DNS names and URIs don't have but CURIEs have:
>  * Prefix-based indirection.
Indirection can't be taken as a problem when most currently used RDFa
tools don't use it at all (which proves that they can work without
relying on it). Sure, it's not as big an advantage as some may claim
it to be. But the capability of indirection itself, even if not 100%
guaranteed to work, is an actual advantage. As a real world
example, I have been able to learn about vocabularies I didn't know by
following the "links" on prefix declarations in documents using them.
>  * Violation of the DOM Consistency Design Principle if xmlns:foo used.
*if* xmlns:foo is used. Very strong emphasis on the conditional, and
on the multiple possibilities that have already been proposed to deal
with this.

>>> (I understand that if the microdata syntax offered no advantages over
>>> RDFa,
>>> then it would be a wasted effort to diverge.
>>
>> Which are the advantages it offers?
>
> The syntax is simpler for the use cases it was designed for. It uses a
> simpler conceptual model (trees as opposed to graphs). It allows short token
> identifiers. It doesn't use prefix-based indirection. It doesn't violate the
> DOM Consistency Design Principle.
Ok, the syntax is simpler for a subset of the use cases; but it leaves
entirely out the rest of use cases. The conceptual model is more of
the same: simplifies the things for a subset at the expense of failing
for the rest of cases.
If the cases addressed actually are a majority, then I find it OK and
advisable for the syntax to favor them, but not at the expense of
leaving other cases out.
The point of not using indirection is not an advantage because, as
stated above, such indirection is not a real problem.
The DOM Consistency again is not an advantage of the microdata syntax
because this could have been fulfilled with other syntaxes as well.

> On May 15, 2009, at 14:11, Eduard Pascual wrote:
>
>> On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>>>
>>> [...]
>>> From my cursory study, I think microdata could subsume many of the use
>>> cases
>>> of both microformats and RDFa.
>>
>> Maybe. But microformats and RDFa can handle *all* of these cases.
>> Again, which are the benefits of creating something entirely new to
>> replace what already exists while it can't even handle all the cases
>> of what it is replacing?
>
> Compared to microformats, microdata defines the processing model and
> conformance criteria. The microformats community has failed to provide
> processing model and conformance criteria on similar level of detail. The
> processing model side is perceived to be such a serious issue that the lack
> of a unified microformats parsing spec is cited as a motivation to use RDFa
> instead of microformats.
A processing model and conformance criteria for RDFa in HTML5 could
have been perfectly defined, so it is just not enough to justify the
new syntax. The syntax needs to be justified on its own merits, rather
than adding extra work to it that could have been done for any other
syntax.

>>> It seems to me that it avoids much of what microformats advocates find
>>> objectionable
>>
>> Could you specify, please? Do you mean anything else than WHATWG's
>> almost irrational hate toward CURIEs and everything that involves
>> prefixes?
>
> RDFa uses a data model that is an overkill for the use cases.
Which use cases? If you arbitrarily define a subset of use cases that
exclude everything that microdata can't handle, of course a more
complete approach looks like an overkill; but this would be an act of
self-deceit. For anyone concerned by those excluded use cases,
resolving all others is the same as not resolving anything.
Besides that, the data model of RDFa is the RDF data model, which is
extremely simple (just "node to node" relations, where relations can
go in one direction or the other between any two nodes). To top it off,
these multi-dimensional graphs can be serialized as just a mere list
of relationships; so the data model is actually clean, simple, and
still incredibly flexible. The data model in microdata is actually
more complex, since it is "RDF's plus this and that restriction"; and
definitely far less flexible.
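
As a rough sketch of what "a mere list of relationships" means: the node
names and the "knows" / "worksFor" relations below are made up for
illustration and do not come from any real vocabulary.

```python
# Each relationship is one (subject, predicate, object) triple.
# All names here are hypothetical, chosen only for this sketch.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "alice"),     # a cycle between two nodes
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),   # "acme" is reached from two nodes
]


def objects_of(subject, predicate, graph):
    """Follow a relation forwards from a node."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def subjects_of(predicate, obj, graph):
    """Follow a relation backwards to the nodes pointing at obj."""
    return [s for s, p, o in graph if p == predicate and o == obj]


if __name__ == "__main__":
    print(objects_of("alice", "knows", triples))
    print(subjects_of("worksFor", "acme", triples))
```

The flat list holds both the cycle and the shared node without any extra
machinery; a strictly tree-shaped structure cannot hold either of them
directly, since each node in a tree has at most one parent.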
>>> but at the same time it seems it can represent a full RDF data
>>> model.
>>
>> No, it *can't* represent a full RDF model: it has already been shown
>> several times on this thread.
>
> That's a feature.
What?? Being unable to deal with all the use cases is a feature??
Please, elaborate on that; because the way you state it is a blatant
fallacy.

>> Wait. Are you referring to microdata as an incremental improvement over
>> RDFa?? IMO, it's rather a decremental enworsement.
>
> That depends on the point of view. I'm sensing two major points of view:
>
> 1) Graphs are more general than trees. Hence, being able to serialize graphs
> is better.
>
> 2) Graphs are more general than trees. Hence, graphs are harder to design
> UIs for, harder to traverse and harder for authors to grasp. Hence, if trees
> are enough to address use cases, we should only enable trees to be
> serialized.
¬¬ Again, what's your basis to decide that "trees are enough to
address use cases"?? Of course, they are enough to solve some use
cases, but the convenience of dealing with just trees is not worth
sacrificing the needs of those use cases you are arbitrarily deciding
to ignore.
> I subscribe to view #2, and it seems that trees are indeed enough for the
> use cases (that were stipulated by the pro-graph people!).
>
>> - Microdata can't represent the full RDF data model (while RDFa can):
>> some complex structures are just not expressable with microdata.
>
> That's not a use case. That's "theoretical purity".
It's not "theoretical purity", it's something simpler:
*extensibility*. And, with over two decades between versions of the
specs, this is a strong requirement: if a problem is noticed after
HTML5 becomes "the standard", it's essential to be able to solve it
without waiting 10 or 20 years for HTML6 to come out. In addition,
your alleged "simplified" data model is actually an over-complication,
as it is defined in the form of restrictions and/or limitations over
RDF's model. Try to explain what can be represented in RDF, and what
can be represented with microdata, and you'll see what's simpler.

>
>> - Microdata relies on reversed domains. While some people argue these
>> to be better than CURIEs, they are equally horrendous for the average
>> user, and have the additional disadvantage that they don't map to
>> anything useful (if they map to something at all), while CURIEs map to
>> the descriptions and/or definitions of what they represent.
>
> I consider it an advantage that reverse domains don't suggest that you
> should try dereferencing identifiers as if they were addresses.
But they still try to look address-like enough. Lots of users will try
to "reorder" the domain and recover the address behind it (without
knowing that there isn't a real address behind it), with the obvious
confusion when they fail at all attempts (since something that
doesn't exist can't be recovered).
If you really want identifiers that are not addresses at all, you need
identifiers that don't look like addresses at all. In other words, the
advantage you suggest becomes a fake one when the identifiers are
actually made of the same elements as addresses are made of. Does
"foaf:Person" look more like an address than
"org.foaf-project.Person"? A CURIE doesn't suggest that you should
try dereferencing it; but if you actually do you get something. Your
advantage actually applies to most CURIEs. Of course, someone could
try to follow a prefix definition (which looks much more like an
address, and most often a browser would manage to resolve it to
something), but what they get then (a formal, yet human readable,
description of the vocabulary such a prefix refers to) is definitely
useful.
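
The "reordering" I expect users to attempt is trivial to sketch.
guess_host is a hypothetical helper, and it assumes the last component is
the local name and everything before it is a reversed domain, which
nothing in the identifier itself guarantees:

```python
def guess_host(reversed_name):
    """Naively turn a reverse-DNS identifier into (host guess, local name)."""
    parts = reversed_name.split(".")
    # Assumption: the final component is the local name and the rest is
    # a domain written backwards.
    domain_parts, local_name = parts[:-1], parts[-1]
    return ".".join(reversed(domain_parts)), local_name


if __name__ == "__main__":
    print(guess_host("org.foaf-project.Person"))
```

The result looks exactly like an address, yet the identifier promises
nothing about what, if anything, is served there.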
>
> --
> Henri Sivonen
> hsivonen at iki.fi
> http://hsivonen.iki.fi/
>
>
>