[whatwg] Trying to work out the problems solved by RDFa
Benjamin Hawkes-Lewis
bhawkeslewis at googlemail.com
Wed Feb 4 00:13:05 PST 2009
On 4/2/09 03:15, Calogero Alex Baldacchino wrote:
> For what concerns XHTML, I disagree with the introduction of RDFa
> attribute into the basic namespace, and I wouldn't encourage the same in
> HTML5 spec. In first place, I think there is a possible conflict with
> respect to the "content" attribute semantics, because it now requires a
> different processing when used as an RDFa attribute and as a <meta>
> attribute associated to an "http-equiv" or a "name" value (for instance).
What conflict?
1. Attributes in XHTML can be distinguished by the elements they apply
to as well as their name (e.g. the "name" attribute).
2. In XHTML+RDFa, "content" actually means the same thing on "meta" as
on any other element in XHTML, which is presumably why they reused that
attribute rather than introducing a new (better-named?) one:
http://www.w3.org/TR/rdfa-syntax/#rdfa-attributes
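
To illustrate point 2, here is a minimal sketch, not a conforming RDFa
processor: it assumes that "content" simply supplies a literal value
whenever it appears alongside "property", and applies the same rule to
every element. The class name, the sample markup and the "dc:" property
names are hypothetical.

```python
from html.parser import HTMLParser


class ContentLiteralCollector(HTMLParser):
    """Record (element, property, content) for any start tag with both attributes."""

    def __init__(self):
        super().__init__()
        self.literals = []

    def handle_starttag(self, tag, attrs):
        # The same rule is applied to every element; "meta" gets no special case.
        attr_map = dict(attrs)
        if "property" in attr_map and "content" in attr_map:
            self.literals.append((tag, attr_map["property"], attr_map["content"]))


# Hypothetical sample markup: one "meta" and one non-"meta" element.
SAMPLE = """
<meta property="dc:creator" content="Example Author" />
<span property="dc:date" content="2009-02-04">4 February 2009</span>
"""

collector = ContentLiteralCollector()
collector.feed(SAMPLE)
for element, prop, literal in collector.literals:
    print(element, prop, literal)
```

The sketch ignores subjects, CURIE expansion and datatypes; it only
shows that one rule for "content" can cover "meta" and "span" alike.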
> In second place, it might be confusing for authors and lead to the
> misconception that every xhtml 1.x processor is also capable to process
> rdfa metadata (this is a limit of namespace + dtd/schema based
> modularization, because one can define the structure of a document, but
> not "orthogonal" behaviours requiring a specific support, not covered by
> the basic document model - such as collecting rdf triples declared by
> rdfa attributes, or calling a plugin and embedding its output - however,
> defining a proper namespace, maybe including its creation date somehow,
> may suggest what to expect from UAs).
There's no way to query a user agent about support for the
specifications associated with a particular namespace, and namespaces
are an unreliable guide to what user agents actually support, so I don't
buy this concern.
Existing XHTML 1.x user agents don't always implement all the features
of XHTML 1.x (e.g. exposing "longdesc" and "cite" to the user). HTML5 is
introducing new elements and attributes into the same namespace, and
authors would be wrong to assume that any XHTML-supporting browser will
know what to do with them beyond inserting them into the DOM. XHTML
modularization means you can't count on an XHTML user agent to implement
any particular feature in the XHTML namespace.
A more reliable guide to what user agents support is looking at the list
of supported features (as opposed to namespaces or modules or any other
proxy) in their documentation.
> In third place, creating a different namespace would have resulted in a
> far easier introduction of RDFa attributes into other xml languages
> without having to change the language to host them (by the way, the
> xhtml namespace and a related prefix can be used, but this require a
> more specific support due to the "content" attribute issue, especially
> by UAs not supporting DTDs or schemata - that is, what should happen if
> an element were declared with both xhtml:name or xhtml:http-equiv,
> xhtml:content and xhtml:datatype, in an xml document accepting any
> attributes from external namespaces?
I cannot understand how RDFa attributes in a different namespace would
be easier to reuse, either in another language or in an XML document
where the host is not XHTML.
"content" and "datatype" mean the same on all elements, so your
particular example seems like a non-problem to me - at least from the
perspective of RDFa, which doesn't define processing for "name" or
"http-equiv".
In so far as there is a problem, it's already a problem with
bog-standard XHTML. How should <myml:bar xhtml:name="foo"
xhtml:http-equiv="baz" xhtml:content="quux"> be processed?
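
As a rough illustration of why that question is open: a generic XML
parser only reports such attributes as namespace-qualified names and
values, with no processing attached. The sketch below uses Python's
standard xml.etree.ElementTree; the "myml" namespace URI is a
hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MYML_NS = "http://example.org/myml"  # hypothetical namespace for the host language

doc = (
    f'<myml:bar xmlns:myml="{MYML_NS}" xmlns:xhtml="{XHTML_NS}" '
    'xhtml:name="foo" xhtml:http-equiv="baz" xhtml:content="quux"/>'
)

element = ET.fromstring(doc)

# ElementTree exposes namespaced attributes as "{namespace}localname" keys.
xhtml_attrs = {
    name.split("}", 1)[1]: value
    for name, value in element.attrib.items()
    if name.startswith("{" + XHTML_NS + "}")
}

print(element.tag)
print(xhtml_attrs)
```

The parser hands back three strings in the XHTML namespace; whether
they mean anything on a non-XHTML element is left to whatever
specification the consumer chooses to implement.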
> of course, this is solvable, but
> rdfa:content, rdfa:datatype and so on would make things easier, or at
> least _cleaner_ and less confusing for authors having to understand that
> an XML and RDF processor can/must support the xhtml namespace and its
> _whole_ semantics, not just dom-related structures, but limited to RDFa
> attributes, so that no <meta> or <object> or <link> can be used hoping
> their semantics is supported, despite the support for the xhtml
> namespace...).
An "XML and RDF processor" doesn't have to support XHTML or RDFa - XML
and RDF are independent specifications.
A conforming XHTML+RDFa user agent "MUST support all of the features
required in this specification. A conforming user agent must also
support the User Agent conformance requirements as defined in XHTML
Modularization [XHTMLMOD] section on 'XHTML Family User Agent
Conformance'."
http://www.w3.org/TR/rdfa-syntax/#uaconf
Those further requirements can be read at:
http://www.w3.org/TR/xhtml-modularization/conformance.html#s_conform_user_agent
An XHTML+RDFa conforming user agent does not have to implement "meta",
"object", or "link", and as explained above, authors cannot assume
support for particular features based on namespaces.
> Also there might have been fewer attributes, each one
> with a different semantic (assuming someone might not find useful to
> have a link with rel="stylesheet" representing a triple, for instance).
I don't follow. link with rel="stylesheet" _does_ represent information
expressible as a triple. Why would it be useful to pretend otherwise?
And how would doing so make for fewer attributes?
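
To make the triple concrete, here is a minimal sketch under stated
assumptions: the subject is the document's own URL, the predicate is
the rel token appended to the XHTML vocabulary namespace, and the
object is the resolved href. The class name and example.org URLs are
hypothetical, and unlike a real RDFa processor the sketch maps every
rel token rather than only the reserved link types.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


class LinkTripleCollector(HTMLParser):
    """Turn <link rel="..." href="..."> into (subject, predicate, object) tuples."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel, href = attr_map.get("rel"), attr_map.get("href")
        if not rel or not href:
            return
        # rel is a space-separated list; each token yields one triple.
        for token in rel.split():
            self.triples.append(
                (self.base_url, XHTML_VOCAB + token.lower(), urljoin(self.base_url, href))
            )


# Hypothetical document URL and markup.
link_collector = LinkTripleCollector("http://example.org/page.html")
link_collector.feed('<link rel="stylesheet" href="style.css" />')
for triple in link_collector.triples:
    print(triple)
```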
> If there were a general agreement, a new element/attribute would be
> introduced as a result of a "bottom up" process (starting from
> experimentations) integrated with a "top down" community evaluation -
> for specific purposes, not generic machine exposure, I mean.
There is no general agreement to that AFAICT, and I doubt that using
unstandardized elements or attributes or using data-* for public use
would be good approaches to extending HTML: the former blocks potential
extension points (e.g. "canvas") and the latter pointlessly introduces
the risk that a private use might be confused with a public one.
> (I'm not sure a generic machine data attribute - in general, not just
> referring to rdfa - would solve that, because each new occurrence of the
> problem might require a "brand new" datatype that only newer, updated
> UAs would understand (older ones would just parse the attribute and
> provide it as a string for further elaboration by a script, at most, but
> this might not be much better than using a data-* attribute for private
> script consumption), therefore, that wouldn't be necessarily different
> than creating a new appropriate attribute/element as needed and
> providing such new feature in newer, compliant UAs).
It would be very different in practice, because (like new "class"
names), new "content" values wouldn't need to go through the W3C/WHATWG
standards process.
That has a cost of course. You might end up with a worse design,
especially if you don't go through a community like microformats. But
that cost arguably isn't so bad when you're talking about embedding
arbitrary data rather than features like "canvas" or "datagrid" that
require new parsing, DOM APIs, and user interface from popular user
agents. This cost appears to be acceptable in the case of microformat
"class" names, for example. Now, you could already embed data with a bad
design using HTML5's other extension mechanisms (e.g. "script"). It's
just that microformats choose to abuse other attributes ("title")
instead, partly because they allow you to wrap some human-readable
content with its machine-readable equivalent (i.e. it's a more
"markup-like" way of doing things). My feeling is that the cost of bad
designs for embedded data is (1) unavoidable and (2) less than the
benefits of avoiding misuse of other (X)HTML features for embedding data.
--
Benjamin Hawkes-Lewis