[whatwg] Trying to work out the problems solved by RDFa

Tue Feb 3 19:15:42 PST 2009

Benjamin Hawkes-Lewis wrote:
> On 12/1/09 20:26, Calogero Alex Baldacchino wrote:
>> I just mean that, as far as I know, there is no official standard
>> requiring UAs to support (parse and expose through the DOM) attributes
>> and elements which are not part of the HTML language but are found in
>> text/html documents.
>
> Perhaps, but then prior to HTML5, much of what practical user agents 
> must do with HTML has not been required by any official standard. ;)
>
> RFC 2854 does say that "Due to the long and distributed development of 
> HTML, current practice on the Internet includes a wide variety of HTML 
> variants. Implementors of text/html interpreters must be prepared to 
> be 'bug-compatible' with popular browsers in order to work with many 
> HTML documents available the Internet."
>
> http://tools.ietf.org/html/rfc2854
>
> HTML 4.01 does recommend that "[i]f a user agent encounters an element 
> it does not recognize, it should try to render the element's content" 
> and "[i]f a user agent encounters an attribute it does not recognize, 
> it should ignore the entire attribute specification (i.e., the 
> attribute and its value)".
>
> http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.2
>
> Clearly these suggestions are incompatible with respect to attributes; 
> AFAIK all popular UAs insert unrecognized attributes into the DOM and 
> plenty of web content depends on that behaviour.
>

Very, very true. HTML 4.01 also says the recommended behaviours are meant 
"to facilitate experimentation and interoperability between 
implementations of various versions of HTML", whereas the "specification 
does not define how conforming user agents handle general error 
conditions, including how user agents behave when they encounter 
elements, attributes, attribute values, or entities not specified in 
this document", and since "user agents may vary in how they handle error 
conditions, authors and users must not rely on specific error recovery 
behavior". I think that last sentence defines a best practice 
everyone should follow, instead of relying on a common quirk that 
supports invalid markup. However, whether or not it is good practice, 
there will always be authors doing whatever they please, so it is 
quite safe to assume UAs will always expose invalid/unrecognized 
attributes (that's unavoidable, given the need for backward compatibility).
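
To illustrate what "exposing unrecognized attributes" means in practice, here is a minimal sketch using Python's standard html.parser module. It is not a browser DOM, and the names AttributeCollector and collect_attributes are made up for this example; it only shows a text/html parser handing every attribute to a script, whether or not the attribute belongs to the language.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Records every attribute seen on start tags, known or unknown."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # The parser does not filter attributes against any HTML vocabulary;
        # it passes along whatever the markup contains.
        for name, value in attrs:
            self.seen.append((tag, name, value))


def collect_attributes(markup):
    """Return (tag, attribute, value) triples for all attributes in markup."""
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen
```

Called on a fragment such as <span about="#me" property="foaf:name" data-x="1">, the function reports all three attributes as plain strings, leaving any further interpretation to the script.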

>
> Just like proprietary elements/attributes introduced with user agent 
> behaviours (marquee, autocomplete, canvas), scripted uses of "data-*" 
> might suggest new features to be added to HTML, which would then 
> become requirements for UAs.
>
> But unlike proprietary elements/attributes introduced with user agent 
> behaviors, scripted uses of "data-*" do not impose new processing 
> requirements on UAs.
>
> Therefore, unlike proprietary elements/attributes introduced with user 
> agent behaviors, scripted uses of "data-*" impose _no_ design 
> constraints on new features.
>
> Establishing user agent behaviours with "data-*" attributes, on the 
> other hand, imposes almost as many design constraints as establishing 
> them with proprietary elements and attributes. (There's just less 
> pollution of the primary HTML "namespace".)
>
> If no RDFa was in deployment, you could argue it would be less wrong 
> (from this perspective) to abuse "data-*" than introduce new attributes.

Oh, well, I don't want to argue about that. For me the idea of using 
"data-rdfa-*" can rest in peace, since in practice it's no different 
from using RDFa attributes as they are, at least as far as they're 
handled by scripts, either client- or server-side. However, I think that:

* it is currently not clear enough what UAs not involved in a 
particular project should do with RDFa attributes, beyond exposing their 
content for processing by a script. Before any new elements/attributes 
are introduced in a formal specification, a precise behaviour should be 
defined, along with any class of UAs clearly identified as not required 
to support it, and any caveats on possible problems and their 
solutions;

* actual deployment might be harmed by the use of xml namespaces in the 
html serialization.

Also, I see design suggestions more than impositions. If a new (and 
proprietary/private) attribute/element/convention is convincingly 
useful/needed, it gets supported by other UAs and introduced in a 
specification; otherwise, if too few pages would break to matter, it 
might even be redefined for use with different semantics. And a 
possible process involving data-* attributes would/could be: experiment 
privately => extend the scale, involving other people who find it useful 
for their needs => get it into the primary namespace of an official 
specification (discarding the "data-" part and any other useless parts 
of the experimental name), so that existing pages may still work with 
their custom scripts or easily migrate to the new standard (and benefit 
from the new default support) by running a simple regex.
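
A minimal sketch of such a regex-based migration, assuming the experimental attributes were named "data-rdfa-*" and the standardized names simply drop that prefix (both the prefix and the function name promote_attributes are assumptions made for illustration):

```python
import re


def promote_attributes(markup, prefix="data-rdfa-"):
    """Rename prefix-foo="..." attributes to foo="..." in an HTML string.

    This is a plain text substitution, not a parser: it only looks for the
    prefixed name preceded by whitespace and followed by "=".
    """
    pattern = re.compile(
        r"(?<=\s)" + re.escape(prefix) + r"([a-z][a-z0-9-]*)(?=\s*=)"
    )
    return pattern.sub(r"\1", markup)
```

Being a text substitution, it would also rewrite a matching string that happens to appear in element content or inside a script, so a real migration might prefer to go through a parser.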

>
> But to the extent that these attributes are already in use in 
> text/html and standardized within the "http://www.w3.org/1999/xhtml" 
> namespace, processing requirements are effectively already being 
> imposed on user agents (such as not introducing conflicting treatment 
> of the "about" attribute). All that adding user agent behaviours with 
> "data-rdfa*" attributes would do at this point is add _more_ 
> requirements, without rescuing the polluted attributes.
>

As far as the html serialization is concerned, introducing xml namespaces 
(and thus xml extensibility, as a whole or in part) might be worse than 
breaking current experiments. Since XHTML, nearly all W3C output has 
converged towards XML, suggesting a direction the web has not completely 
embraced, and instead raising objections to xml features that a good 
number of people feel are useless or unwanted, among them namespaces and 
extensibility; hence the need to evolve the html serialization to 
address new demands without forcing a migration towards xml. Therefore, 
introducing pieces of xml inside text/html documents may be problematic. 
Of course, other surrogate mechanisms might be defined to indicate a 
namespace for the sole purposes of RDFa, but this would raise 
consistency issues between html and xhtml (as reported by Henri 
Sivonen), perhaps solvable by specifying a double mechanism for xhtml 
(the html-specific one, and the "classic" xml one), but such a choice 
might add complexity to UAs and be confusing for authors.

As for XHTML, I disagree with the introduction of RDFa attributes into 
the basic namespace, and I wouldn't encourage the same in the HTML5 
spec. In the first place, I think there is a possible conflict with 
respect to the "content" attribute semantics, because it now requires 
different processing when used as an RDFa attribute and when used as a 
<meta> attribute associated with an "http-equiv" or a "name" value (for 
instance).
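
The overlap can be made concrete with a small sketch. The function name content_roles and the two labels it returns are invented for this example, and the rule it applies is a deliberate simplification of how a processor might decide what a "content" value is for, not a statement of what any specification requires:

```python
def content_roles(attrs):
    """Return the ways the 'content' value of an element could be read.

    attrs is a dict of attribute names to values for a single element.
    """
    roles = set()
    if "content" not in attrs:
        return roles
    if "name" in attrs or "http-equiv" in attrs:
        # Classic <meta> usage: content is tied to name/http-equiv.
        roles.add("html-meta")
    if "property" in attrs:
        # RDFa-style usage: content supplies a literal for the property.
        roles.add("rdfa-literal")
    return roles
```

An element carrying both "name" and "property" gets both labels, which is the case where a processor has to know both sets of rules to handle "content" correctly.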

In the second place, it might be confusing for authors and lead to the 
misconception that every xhtml 1.x processor is also capable of 
processing rdfa metadata (this is a limitation of namespace + dtd/schema 
based modularization, because one can define the structure of a 
document, but not "orthogonal" behaviours requiring specific support 
that is not covered by the basic document model - such as collecting rdf 
triples declared by rdfa attributes, or calling a plugin and embedding 
its output - however, defining a proper namespace, maybe including its 
creation date somehow, may suggest what to expect from UAs).

In the third place, creating a different namespace would have made it 
far easier to introduce RDFa attributes into other xml languages 
without having to change the language to host them. (By the way, the 
xhtml namespace and a related prefix can be used, but this requires 
more specific support due to the "content" attribute issue, especially 
from UAs not supporting DTDs or schemata. That is, what should happen if 
an element were declared with both xhtml:name or xhtml:http-equiv, 
xhtml:content and xhtml:datatype, in an xml document accepting any 
attributes from external namespaces? Of course, this is solvable, but 
rdfa:content, rdfa:datatype and so on would make things easier, or at 
least _cleaner_ and less confusing for authors, who otherwise have to 
understand that an XML and RDF processor can/must support the xhtml 
namespace and its _whole_ semantics, not just dom-related structures, 
but only as far as RDFa attributes are concerned, so that no <meta> or 
<object> or <link> can be used in the hope that their semantics is 
supported, despite the support for the xhtml namespace...) Also, there 
might have been fewer attributes, each one with a distinct semantics 
(assuming some people might not find it useful to have a link with 
rel="stylesheet" representing a triple, for instance).

Of course, this is my opinion.

> > I also guess that,
>> if microformats experience (or the "realworld semantics" they claim to
>> be based on) had suggested the need to add a new element/attribute to
>> the language, a new element/attribute would have been added.
>
> I'm not really sure what you mean.
>
> (It's watching the microformats community struggle with the problem of 
> encoding machine data equivalents, for things like dates and telephone 
> number types and measurements, that persuaded me HTML5 should include 
> a generic machine data attribute, because it seems likely to me that 
> the problem will be recurrent.)
>
> -- 
> Benjamin Hawkes-Lewis

If there were general agreement, a new element/attribute would be 
introduced as a result of a "bottom up" process (starting from 
experimentation) combined with a "top down" community evaluation - 
for specific purposes, not generic machine exposure, I mean.

(I'm not sure a generic machine data attribute - in general, not just 
with reference to rdfa - would solve that, because each new occurrence 
of the problem might require a "brand new" datatype that only newer, 
updated UAs would understand. Older ones would at most parse the 
attribute and provide it as a string for further processing by a script, 
which might not be much better than using a data-* attribute for private 
script consumption. Therefore, that wouldn't necessarily be different 
from creating a new appropriate attribute/element as needed and 
providing such a new feature in newer, compliant UAs.)

WBR, Alex
