[whatwg] Trying to work out the problems solved by RDFa
Calogero Alex Baldacchino
alex.baldacchino at email.it
Tue Feb 3 19:15:42 PST 2009
Benjamin Hawkes-Lewis wrote:
> On 12/1/09 20:26, Calogero Alex Baldacchino wrote:
>> I just mean that, as far as I know, there is no official standard
>> requiring UAs to support (parse and expose through the DOM) attributes
>> and elements which are not part of the HTML language but are found in
>> text/html documents.
>
> Perhaps, but then prior to HTML5, much of what practical user agents
> must do with HTML has not been required by any official standard. ;)
>
> RFC 2854 does say that "Due to the long and distributed development of
> HTML, current practice on the Internet includes a wide variety of HTML
> variants. Implementors of text/html interpreters must be prepared to
> be 'bug-compatible' with popular browsers in order to work with many
> HTML documents available the Internet."
>
> http://tools.ietf.org/html/rfc2854
>
> HTML 4.01 does recommend that "[i]f a user agent encounters an element
> it does not recognize, it should try to render the element's content"
> and "[i]f a user agent encounters an attribute it does not recognize,
> it should ignore the entire attribute specification (i.e., the
> attribute and its value)".
>
> http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.2
>
> Clearly these suggestions are incompatible with respect to attributes;
> AFAIK all popular UAs insert unrecognized attributes into the DOM and
> plenty of web content depends on that behaviour.
>
Very, very true. HTML 4.01 also says the recommended behaviours are meant
"to facilitate experimentation and interoperability between
implementations of various versions of HTML", whereas the "specification
does not define how conforming user agents handle general error
conditions, including how user agents behave when they encounter
elements, attributes, attribute values, or entities not specified in
this document", and since "user agents may vary in how they handle error
conditions, authors and users must not rely on specific error recovery
behavior". I just think the last sentence defines a best practice
everyone should follow, instead of relying on a common quirk that supports
invalid markup. However, whether something is good or bad practice,
there will always be authors doing whatever they please, so it is
quite safe to assume UAs will always expose invalid/unrecognized
attributes (that's unavoidable, given the need for backward compatibility).
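
To illustrate the behaviour described above, here is a minimal sketch in
Python. It uses the standard library html.parser, which is not a browser
and only stands in for a lenient UA; the names AttributeCollector and
collect_attributes, and the sample markup, are made up for the example.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Hypothetical helper: records every attribute seen on every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []  # list of (tag, attribute name, attribute value)

    def handle_starttag(self, tag, attrs):
        # The parser keeps no list of "known" attributes, so unrecognized
        # ones are reported exactly like recognized ones.
        for name, value in attrs:
            self.seen.append((tag, name, value))


def collect_attributes(markup):
    """Parse the markup and return all (tag, name, value) triples found."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


# Sample markup (an assumption): one HTML attribute, one RDFa attribute,
# and one attribute that belongs to no specification at all.
sample = '<p class="intro" property="dc:title" made-up="x">Hello</p>'
for entry in collect_attributes(sample):
    print(entry)
```

All three attributes come out the same way, as plain strings, and it is
up to a script to decide what, if anything, they mean.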
>
> Just like proprietary elements/attributes introduced with user agent
> behaviours (marquee, autocomplete, canvas), scripted uses of "data-*"
> might suggest new features to be added to HTML, which would then
> become requirements for UAs.
>
> But unlike proprietary elements/attributes introduced with user agent
> behaviors, scripted uses of "data-*" do not impose new processing
> requirements on UAs.
>
> Therefore, unlike proprietary elements/attributes introduced with user
> agent behaviors, scripted uses of "data-*" impose _no_ design
> constraints on new features.
>
> Establishing user agent behaviours with "data-*" attributes, on the
> other hand, imposes almost as many design constraints as establishing
> them with proprietary elements and attributes. (There's just less
> pollution of the primary HTML "namespace".)
>
> If no RDFa was in deployment, you could argue it would be less wrong
> (from this perspective) to abuse "data-*" than introduce new attributes.
Oh, well, I don't want to argue about that. For me the idea of using
"data-rdfa-*" can rest in peace, since in practice it is no different
from using RDFa attributes as they are, at least as far as they're
handled by scripts, either client- or server-side. However, I think that:
* at present it is not clear enough what UAs not involved in a
particular project should do with RDFa attributes, besides exposing their
content for processing by scripts; a precise behaviour should be
defined, any class of UAs not required to support it should be clearly
identified, and any caveats about possible problems and their
solutions should be stated, before introducing any new
elements/attributes in a formal specification;
* actual deployment might be harmed by the use of xml namespaces in the
html serialization.
Also, I see design suggestions more than impositions. If a new (and
proprietary/private) attribute/element/convention is convincingly
useful/needed, it gets supported by other UAs and introduced in a
specification; otherwise, if the number of pages that would break is
not significant enough, it might even be redefined with different
semantics. A possible process involving data-* attributes
would/could be: experiment privately => extend the scale, involving other
people who find it useful for their needs => get it into the primary
namespace of an official specification (dropping the "data-" part and
any other useless parts of the experimental name), so that existing
pages may keep working with their custom scripts or easily migrate to the
new standard (and benefit from the new default support) by running a
simple regex.
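
The "simple regex" step could look roughly like the following Python
sketch. The prefix "data-myexp-" and the function name migrate are
assumptions made up for the example, not part of any specification. A
regex run over raw markup can also touch text that merely looks like an
attribute, so a careful migration would probably prefer a real parser.

```python
import re

# Hypothetical experimental prefix; an assumption for this sketch only.
EXPERIMENTAL_PREFIX = "data-myexp-"

# Match the prefixed attribute name when it follows whitespace and is
# followed by "=", and capture the part of the name after the prefix.
_ATTR = re.compile(
    r"(\s)" + re.escape(EXPERIMENTAL_PREFIX) + r"([a-z][a-z0-9-]*)(?=\s*=)"
)


def migrate(markup):
    """Drop the experimental prefix from matching attribute names."""
    return _ATTR.sub(r"\1\2", markup)


print(migrate('<span data-myexp-rating="4">good</span>'))
# prints: <span rating="4">good</span>
```

Attributes with any other "data-" prefix are left untouched, so private
scripts relying on them keep working.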
>
> But to the extent that these attributes are already in use in
> text/html and standardized within the "http://www.w3.org/1999/xhtml"
> namespace, processing requirements are effectively already being
> imposed on user agents (such as not introducing conflicting treatment
> of the "about" attribute). All that adding user agent behaviours with
> "data-rdfa*" attributes would do at this point is add _more_
> requirements, without rescuing the polluted attributes.
>
As far as the html serialization is concerned, introducing xml namespaces
(and, thus, xml extensibility - as a whole or in part) might be worse than
breaking current experiments. Since xhtml, nearly all W3C output
has converged towards XML, suggesting a direction the web has not
embraced completely, and instead raising objections to xml
features felt to be useless or unwanted by a good number of people,
including namespaces and extensibility; hence the need to evolve the html
serialization to address new demands without forcing a migration towards
xml. Therefore, introducing pieces of xml inside text/html documents may
be problematic. Of course, other surrogate mechanisms might be defined
to indicate a namespace for the sole purposes of RDFa, but this would
raise consistency issues between html and xhtml (as reported by Henri
Sivonen), perhaps solvable by specifying that both mechanisms work
in xhtml (the html-specific one and the "classic" xml one), but such a
choice might add complexity to UAs and be confusing for authors.
As for XHTML, I disagree with the introduction of RDFa
attributes into the basic namespace, and I wouldn't encourage the same in
the HTML5 spec. In the first place, I think there is a possible conflict
with respect to the "content" attribute semantics, because it now requires
different processing when used as an RDFa attribute and when used as a
<meta> attribute associated with an "http-equiv" or a "name" value (for
instance).
In the second place, it might be confusing for authors and lead to the
misconception that every xhtml 1.x processor is also capable of processing
rdfa metadata (this is a limit of namespace + dtd/schema based
modularization, because one can define the structure of a document, but
not "orthogonal" behaviours requiring specific support, not covered by
the basic document model - such as collecting rdf triples declared by
rdfa attributes, or calling a plugin and embedding its output - however,
defining a proper namespace, maybe including its creation date somehow,
may suggest what to expect from UAs).
In the third place, creating a different namespace would have resulted in
a far easier introduction of RDFa attributes into other xml languages
without having to change the language to host them (by the way, the
xhtml namespace and a related prefix can be used, but this requires
more specific support due to the "content" attribute issue, especially
by UAs not supporting DTDs or schemata - that is, what should happen if
an element were declared with both xhtml:name or xhtml:http-equiv,
xhtml:content and xhtml:datatype, in an xml document accepting any
attributes from external namespaces? Of course, this is solvable, but
rdfa:content, rdfa:datatype and so on would make things easier, or at
least _cleaner_ and less confusing for authors, who otherwise have to
understand that an XML and RDF processor can/must support the xhtml
namespace and its _whole_ semantics, not just dom-related structures, but
only for the RDFa attributes, so that no <meta> or <object> or <link> can
be used in the hope that their semantics are supported, despite the
support for the xhtml namespace...). Also, there might have been fewer
attributes, each one with distinct semantics (assuming someone might not
find it useful to have a link with rel="stylesheet" representing a triple,
for instance).
Of course, this is my opinion.
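
To make the "content" question concrete, here is a deliberately naive
Python sketch. It is not the RDFa processing model, only an illustration
of what a processor without a DTD or schema might see; the function name
content_readings and the labels "meta" and "rdfa" are made up for the
example.

```python
def content_readings(attrs):
    """Return the readings of "content" that the given attributes permit.

    attrs is a plain dict mapping attribute names to string values.
    """
    if "content" not in attrs:
        return []
    readings = []
    if "name" in attrs or "http-equiv" in attrs:
        # Classic <meta>-style name/value pair.
        readings.append("meta")
    if "property" in attrs or "datatype" in attrs:
        # RDFa-style literal, possibly typed by "datatype".
        readings.append("rdfa")
    return readings


# The case asked about above: name, content and datatype on one element.
mixed = {"name": "author", "content": "Alex", "datatype": "xsd:string"}
print(content_readings(mixed))
# prints: ['meta', 'rdfa']
```

When both readings come back, the processor needs an extra rule to pick
one, which is the kind of special-casing separate rdfa:* attributes would
avoid.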
> > I also guess that,
>> if microformats experience (or the "realworld semantics" they claim to
>> be based on) had suggested the need to add a new element/attribute to
>> the language, a new element/attribute would have been added.
>
> I'm not really sure what you mean.
>
> (It's watching the microformats community struggle with the problem of
> encoding machine data equivalents, for things like dates and telephone
> number types and measurements, that persuaded me HTML5 should include
> a generic machine data attribute, because it seems likely to me that
> the problem will be recurrent.)
>
> --
> Benjamin Hawkes-Lewis
If there were general agreement, a new element/attribute would be
introduced as a result of a "bottom up" process (starting from
experiments) combined with a "top down" community evaluation -
for specific purposes, not generic machine exposure, I mean.
(I'm not sure a generic machine data attribute - in general, not just
with reference to rdfa - would solve that, because each new occurrence of
the problem might require a "brand new" datatype that only newer, updated
UAs would understand; older ones would, at most, just parse the attribute
and provide it as a string for further processing by a script, which
might not be much better than using a data-* attribute for private
script consumption. Therefore, that wouldn't necessarily be different
from creating a new appropriate attribute/element as needed and
providing that new feature in newer, compliant UAs.)
WBR, Alex