[whatwg] Micro-data/Microformats/RDFa Interoperability Requirement

Thu May 7 19:40:49 PDT 2009

Ian Hickson wrote:
> On Thu, 7 May 2009, Manu Sporny wrote:
>> So, this argument isn't "Don't use @class at all", but rather "Don't 
>> create ambiguity in @class where there is none currently."
> 
> I agree that if HTML5 re-uses vocabulary "root" terms from Microformats, 
> it should either not use class="", or should do so in a way that does not 
> cause existing consumers of Microformats to treat existing content in a 
> way that is not compliant with HTML5.

Good, there's agreement there, then. :)

>> Not re-defining things to mean something different than the Microformats
>> community has already defined should be a design requirement.
> 
> That requirement has been broken for some time, because the definitions of 
> rel="license" and rel="tag" in HTML5 differ from those on the Microformats 
> wiki. (The definitions in HTML5 are closer to what existing 
> implementations and content rely on, though, so I don't think it violates 
> the requirement to "not cause existing consumers of Microformats to treat 
> existing content in a way that is not compliant with HTML5".)

That's certainly not what the WHATWG blog stated just 20 days ago for
rel="license":

http://blog.whatwg.org/the-road-to-html-5-link-relations#rel-license
http://blog.whatwg.org/the-road-to-html-5-link-relations#rel-tag

and the spec doesn't seem to clearly outline the difference in
definition either (at least, that's not my reading of the spec):

http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#link-type-license
http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#link-type-tag

Am I looking at the wrong spec? How can the definitions of rel="license"
and rel="tag" be different when Mark Pilgrim has stated very clearly
that they were standardized in the Microformats community and the HTML5
spec makes no mention of any difference in definition between HTML5
and Microformats?

>>> and RDFa because any use of the "rel" attribute can do the same.
>> No, not /any/ use - /specific/ uses of rel, and then only if the HTML5 
>> micro-data solution does something that is counter to how RDFa uses the 
>> attribute or the value.
> 
> Actually as far as I can tell, it really is any use.

No it isn't, more below. :)

> The RDFa specification is very confusing to me (e.g. I don't understand 
> how the normative processing model is separate from the section "RDFa 
> Processing in detail"), so I may be misinterpreting things, but as far as 
> I can tell:
> 
>   <html xmlns="http://www.w3.org/1999/xhtml">
>    <head>
>     <base href="http://example.com/"/>
>     <link about="http://example.net/"
>           rel="dc.author" 
>           href="http://a.example.org/"/>
>    ...
> 
> ...will result in the following triple:
> 
>    <http://example.net/> <http://example.com/dc.author> <http://a.example.org/> .

Two corrections:

The first is that an RDFa processor would not generate this triple. It
wouldn't know what "dc.author" meant. As a general design principle,
RDFa ignores all values it doesn't know anything about. The processor
would detect the subject being set via the @about, then it would go
searching for a known CURIE value or (in XHTML1.1+RDFa) a known
@rel/@rev reserved attribute value (such as "next", "stylesheet", etc.).
Since "dc.author" is neither a known CURIE value nor a known reserved
word, it would exit the LINK element without creating any triples.
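
To make that concrete, here is a rough sketch of the "ignore what you
don't know" rule. It is not taken from any real RDFa processor: the
function name, the trimmed reserved-word list and the prefix dictionary
are all assumptions made up for illustration.

```python
# Illustrative only: a trimmed reserved-word list and a hypothetical helper,
# not code from any real RDFa processor.
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"
RESERVED_REL = {"alternate", "next", "prev", "stylesheet"}  # subset, for illustration


def triples_for_link(about, rel, href, prefixes):
    """Return (subject, predicate, object) triples for one LINK element."""
    triples = []
    for token in rel.split():
        if token in RESERVED_REL:
            # Known reserved word: map it into the XHTML vocabulary.
            predicate = XHTML_VOCAB + token
        elif ":" in token and token.split(":", 1)[0] in prefixes:
            # CURIE whose prefix the author has declared.
            prefix, reference = token.split(":", 1)
            predicate = prefixes[prefix] + reference
        else:
            # Unknown value such as "dc.author": skip it, generate nothing.
            continue
        triples.append((about, predicate, href))
    return triples


# The LINK element from the example above, with no prefixes declared.
no_triples = triples_for_link(
    "http://example.net/", "dc.author", "http://a.example.org/", prefixes={}
)
```

With no prefix declared and "dc.author" absent from the reserved list,
the list of triples comes back empty.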

The second is that you are assuming that @rel/@rev reserved attribute
values are the same for all XML and non-XML family languages - they
aren't. Reserved words such as "next" and "prev" can be defined per
language - or, none can be defined... it's up to the language adopting
RDFa to define those values. For a list of the current values, you can see:

http://www.w3.org/TR/rdfa-syntax/#relValues

>> The most important issue with RDFa is not re-using attributes already 
>> defined by XHTML1.1+RDFa without them having the exact same use in 
>> HTML5. Attributes like @about, @property, @datatype, @resource, @content 
>> and @typeof.
> 
> Given that HTML4 already has five of RDFa's 10 attributes (not counting 
> prefix declaration mechanisms), and defines processing for these that 
> conflicts with RDFa's (e.g. as described below), it's not clear to me what 
> benefit there would be in completely avoiding the other five attributes if 
> there was a need to use such an attribute.

Note that I didn't say "completely avoid" attributes... I said "re-using
attributes already defined by XHTML1.1+RDFa without them having the
exact same use in HTML5". At this point, I don't see any conflict
between XHTML1.1+RDFa and HTML5.

I was asking for semantic equivalence when adopting attributes, not
complete avoidance.

> For example, it would be somewhat presumptuous of RDFa to prevent any 
> future version of HTML from being able to use the word "resource" as an 
> attribute name. What if we want to extend the forms features to have an 
> XForms "datatype" compatibility layer; why should we not be able to use 
> the "datatype" and "typeof" attributes?

As long as their legacy nature was preserved, and those uses didn't
create ambiguity in RDFa processors and semantic equivalence was
ensured, I don't see why they shouldn't be re-used.

> Surely this is what namespaces were intended for.

Uhh, what sort of namespaces are we talking about here? xmlns-style
namespaces? I thought those were on their way out in HTML5? Henri seems
to hate them with a passion.

Are we talking about the implied HTML5 document namespace? If so, in a
world where there is no RDFa in HTML5, I'm guessing that web developers
are going to shove XHTML1.1+RDFa into HTML5 documents anyway, and at
that point the document namespace isn't going to mean much - which is
one of the reasons that I'm so concerned about HTML5 re-defining
attributes and attribute values.

>>> Similarly, the rules for handling CURIEs in RDFa, especially in 
>>> rel="", are already incompatible with HTML4 and HTML5 rules. 

No, they are not, as explained above in the @rel reserved attribute
values discussion.

>>> For 
>>> example, the way that "n:next" and "next" can end up being equivalent 
>>> in RDFa processors despite being different per HTML rules (assuming an 
>>> "n" namespace is appropriately declared).
>> If they end up being equivalent in RDFa, the RDFa author did so 
>> explicitly when declaring the 'n' prefix to the default prefix mapping 
>> and we should not second-guess the author's intentions.
> 
> My only point is that it is not compatible with HTML4 and HTML5, because 
> they end up with different results in the same situation (one can treat 
> two different values as the same, while the other can treat two different 
> values as different).

It is only not compatible with HTML5 if this community chooses for it to
not be compatible with HTML5. Do you agree or disagree that we shouldn't
second-guess the author's intentions if they go out of their way to
declare a mapping for 'n'?
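
For illustration, a minimal sketch of how that equivalence comes about.
The helper name is hypothetical, and treating every bare token as a
reserved word is a simplifying assumption. The point is only that the
two values meet when, and only when, the author declares the mapping.

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def expand_rel(token, prefixes):
    """Expand one @rel token to a full URI, or None if it cannot be expanded."""
    if ":" in token:
        prefix, reference = token.split(":", 1)
        base = prefixes.get(prefix)
        # An undeclared prefix means the value is ignored.
        return base + reference if base is not None else None
    # Simplification: assume a bare token is a reserved word.
    return XHTML_VOCAB + token


# Hypothetical document in which the author explicitly maps "n"
# to the XHTML vocabulary.
author_prefixes = {"n": XHTML_VOCAB}
same = expand_rel("n:next", author_prefixes) == expand_rel("next", author_prefixes)

# Without that declaration, "n:next" expands to nothing at all.
undeclared = expand_rel("n:next", {})
```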

> Another example would be the following:
> 
>   <html xmlns="http://www.w3.org/1999/xhtml">
>    <head about="">
>     <link rel="stylesheet" href="...">
>     <link rel="STYLESHEET" href="...">
>     ...
> 
> ...which would be treated as two different triples in an RDFa processor, 
> but treated as two identical imports according to an HTML4/5 processor.

Hmmm, this is a good technical point, but may be a bug in the spec. My
processor doesn't do a case-insensitive match, but perhaps the RDFa
processor should do a case-insensitive match for reserved words. I know
we discussed this at one point, but can't remember what we decided. I'll
post something about this to the RDFa mailing list to see if anyone
there can remember.

If we do a case-insensitive match, which we probably should, the point
still stands - the semantic meaning of the statements is equivalent.
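
A quick sketch of what a case-insensitive match for reserved words
could look like. This is an assumption about one possible fix, not what
the RDFa spec or my processor currently does, and the names and the
shortened reserved-word list are invented for the example.

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"
RESERVED_REL = {"alternate", "next", "stylesheet"}  # subset, for illustration


def reserved_predicate(token, case_insensitive=True):
    """Return the predicate URI for a reserved @rel word, or None if unknown."""
    key = token.lower() if case_insensitive else token
    return XHTML_VOCAB + key if key in RESERVED_REL else None


# With case folding, both spellings map to a single predicate.
folded = {reserved_predicate(t) for t in ("stylesheet", "STYLESHEET")}

# With a case-sensitive match, the upper-case spelling is simply unknown.
strict = reserved_predicate("STYLESHEET", case_insensitive=False)
```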

> Another example would be:
> 
>   <html xmlns="http://www.w3.org/1999/xhtml">
>    <head about="">
>     <link rel="stylesheet alternate next" href="...">
>     ...
> 
> ...which in RDFa would cause the following triples to be created:
> 
>    <> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <...> .
>    <> <http://www.w3.org/1999/xhtml/vocab#alternate> <...> .
>    <> <http://www.w3.org/1999/xhtml/vocab#next> <...> .
> 
> ...but according to HTML4/5, is really only two relations (an alternative 
> stylesheet and the next document).

That's a very strained argument. The contents of @rel are supposed to be
LinkTypes, which are space-separated keywords:

http://www.w3.org/TR/html4/types.html#type-links

It just so happens that when you use "alternate" and "stylesheet"
together that the browser is supposed to recognize that an alternate
stylesheet exists.

I've always thought that this was an abuse of the rel attribute - there
should have been an "alternate-stylesheet" LinkType, but what's done is
done.

AFAIK, there is nothing in the HTML4 or HTML5 spec that states that
rel="alternate stylesheet" expresses only one relation. In your example,
there are three relationships because three separate LinkTypes are specified.
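
Put another way, reading @rel as a space-separated list gives one
relation per keyword. A tiny sketch, where the function name is
hypothetical and every keyword is assumed to be a reserved word:

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def relations(rel):
    """Split @rel on whitespace and return one predicate URI per LinkType."""
    return [XHTML_VOCAB + token for token in rel.split()]


# The @rel value from the example above yields three predicates.
three = relations("stylesheet alternate next")
```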

>>> I don't think there's much that can be done about this (this isn't 
>>> something that we can change HTML5 rules for; browser vendors would 
>>> not accept having to resolve QNames in rel="" attributes as part of 
>>> processing, for one).
>>
>> This has been explained many[2] times[3] now[4], CURIEs are not QNames. 
>> If you have an issue with CURIEs, please state the exact issue that you 
>> have with CURIEs and don't use a false analogy.
> 
> Browser vendors would not accept having to resolve prefixes in attribute 
> values as part of processing link relations.

Why not? They're already required to resolve values using an external
document for CSS when processing every @class attribute. That is a far
steeper requirement than being required to use an internal mapping
mechanism when processing a few attributes.

> (What's the difference between mapping to a full URI and mapping to a 
> namespace,local pair? 

Not much, but a QName is technically much more than that - which is why
I take issue with members of this community continuing to assert that
CURIEs and QNames are equivalent. A QName can be used anywhere in an XML
document, except for in attributes:

http://www.w3.org/TR/REC-xml-names/#ns-qualnames

The namespace and local part are required to be validated.

A CURIE can only be used in very specific attributes defined by a
language (such as XHTML1.1+RDFa or HTML5) and is not required to be
validated:

http://www.w3.org/TR/curie/#sec_1.1.

They differ in:

- The locations in which they can be used.
- Whether or not they are required to be validated.

The only similarity they have is that they can be expanded to full URIs.

Saying that QNames and CURIEs are equivalent is like saying XHTML2 and
HTML5 are equivalent because they both use <a href="...">. Just because
two technologies have one thing in common doesn't make them equivalent.

> The problem with QNames in attributes is that they 
> require the attribute processor to have information from the namespace 
> processor, and as far as I can tell this continues to exist in RDFa.)

If that's really the problem, why don't you just have a prefix processor
that the attribute processor relies on and drop the namespace processor
entirely?

Why is it such a big deal for the attribute processor to have a
reference to the prefix processor?
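
To show how little is involved, here is a rough sketch of that split.
Both class names are hypothetical, the "ex" prefix and its URI are
placeholders, and nothing here is taken from a browser or from an
existing RDFa implementation.

```python
from collections import ChainMap


class PrefixProcessor:
    """Hypothetical: tracks in-scope prefix mappings, with no validation."""

    def __init__(self):
        self._scopes = ChainMap()

    def push(self, declarations):
        # Entering an element: its declarations shadow the outer ones.
        self._scopes = self._scopes.new_child(dict(declarations))

    def pop(self):
        # Leaving an element: drop its declarations again.
        self._scopes = self._scopes.parents

    def resolve(self, prefix):
        return self._scopes.get(prefix)


class AttributeProcessor:
    """Hypothetical: expands prefixed attribute values via a PrefixProcessor."""

    def __init__(self, prefix_processor):
        self._prefixes = prefix_processor

    def expand(self, value):
        prefix, sep, reference = value.partition(":")
        if not sep:
            return None  # no prefix, nothing to expand here
        base = self._prefixes.resolve(prefix)
        return None if base is None else base + reference


prefixes = PrefixProcessor()
prefixes.push({"ex": "http://example.com/vocab#"})  # placeholder mapping
attributes = AttributeProcessor(prefixes)
expanded = attributes.expand("ex:license")
```

The attribute processor only ever asks the prefix processor one
question: what does this prefix map to?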

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/