[whatwg] foreign attributes Re: several messages about XML syntax and HTML5

Elias Torres elias at torrez.us
Tue Dec 5 14:01:40 PST 2006

I'm back. Sorry for the delay.

Ian Hickson wrote:
> On Tue, 5 Dec 2006, Elias Torres wrote:
>> In one of the products, we need two things: one to specify our own piece
>> of structure data (call it microformat, call it RDFa data).
> Could you give an example of the kind of data you're talking about and how 
> you'd use it? Obviously I don't mean to ask you for details that would 
> compromise an NDA or customer information, but I'm curious what level of 
> structure you're talking about. Are we talking about inline data that you 
> want to annotate, like a list of tracks on a CD album which needs artist 
> information and beats-per-minute? Are we talking about manufacturing 
> parameters like instructions for an automated lathe? What's the kind of 
> schema that the data needs? Is it just a list of strings? Strings with 
> associated flags? A two-dimensional row/column deal? Arbitrary tuples with 
> arbitrarily deep nested data?

At the moment we have data defined by XML schemas that customers use to
describe industry-specific information such as automotive parts.
Therefore, we would like to specify objects that are nested multiple
levels deep and carry some datatype information. It's not limited to
that, however: we are also working on database round-tripping using
RDFa-like mechanisms.

>> Our first problem is parsing microformats. We must somehow explain in 
>> prose how to parse our specific microformats just like hCard/hEvent do 
>> or do it via GRDDL (XSLT transform) or custom JavaScript parsing code. 
>> If you can read in the uF wiki [1] there's really not much guidance on 
>> how to parse one or all of them.
> "Parse" is the wrong word here. How to process the data is not well 
> defined; I and others have sent this feedback to the Microformats 
> community several times in the past.
> _Parsing_ of Microformats is actually well defined; you end up with a DOM, 
> as described by the HTML5 parser spec, and the DOM is a tree, each node 
> of which can have attributes, certain of which -- class, rel, title, id, 
> e.g. -- are especially relevant.

I totally agree that parsing is done by the browser/library and that
what we then have, in the case of JavaScript, is access to the DOM. In
other cases, we could use a Python-based SGML parser or a custom HTML5
parser.

>> RDFa on the other hand gives a generic parsing mechanism to extract 
>> properties (hopefully as a JSON object) for our [...] JS libraries to 
>> enhance the UI of our applications.
> RDFa gives you no more than HTML5's parsing algorithm does -- you still 
> just end up with an arbitrary blob of data, the meaning of which you have 
> to define.

I respectfully disagree. I'm not sure how familiar you are with RDFa, but
it gives specific instructions on how to find and extract tagged data
within the page. In a nutshell, you look for rel, rev or property on any
element; each tells you the relationship between two resources. In most
cases the subject is the web page itself; in others, it is a specific
element identified by the id or about attribute. Finally, the object is
the content of the element itself. This set of simple rules allows us
to write a single /extractor/ of metadata within the HTML document. In
uF, we really have no way of differentiating between a style and a
property within a class attribute, for example, so extracting
information comes down to a case-by-case basis.
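
As a rough illustration of the rules just described, here is a minimal
sketch of such a generic extractor in Python. It is not the RDFa
specification's algorithm: it assumes well-formed markup, treats an
href as the object of rel/rev (an assumption beyond the description
above), and the names TripleExtractor and extract_triples are made up
for this example.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TripleExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from rel/rev/property."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []
        # The bottom frame makes the page itself the default subject.
        self._stack = [{"subject": page_url, "props": [], "text": []}]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # 'about' or 'id' names a new subject; otherwise inherit the parent's.
        if a.get("about"):
            subject = a["about"]
        elif a.get("id"):
            subject = f"{self.page_url}#{a['id']}"
        else:
            subject = self._stack[-1]["subject"]
        # rel/rev relate the subject to the linked resource (assumption: href).
        if a.get("href"):
            for pred in (a.get("rel") or "").split():
                self.triples.append((subject, pred, a["href"]))
            for pred in (a.get("rev") or "").split():
                self.triples.append((a["href"], pred, subject))
        if tag not in VOID:
            props = (a.get("property") or "").split()
            self._stack.append({"subject": subject, "props": props, "text": []})

    def handle_data(self, data):
        # Text belongs to every open element that declared a property.
        for frame in self._stack:
            if frame["props"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or len(self._stack) == 1:
            return
        frame = self._stack.pop()
        value = "".join(frame["text"]).strip()
        # The object of a property is the content of the element itself.
        for pred in frame["props"]:
            self.triples.append((frame["subject"], pred, value))


def extract_triples(html, page_url):
    parser = TripleExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = """
    <div about="http://example.org/people/jane">
      <span property="foaf:name">Jane Doe</span>
      <a rel="foaf:weblog" href="http://example.org/blog">blog</a>
    </div>
    <p class="highlight"><span property="dc:title">Sample page</span></p>
    """
    for triple in extract_triples(sample, "http://example.org/page"):
        print(triple)
```

Note that the class attribute is never consulted, so a purely
presentational value such as "highlight" cannot be mistaken for a
property, and the extractor needs no knowledge of any particular
vocabulary.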

>> Secondly, we need our customers to safely express details about objects 
>> also via HTML using a metadata approach that allows this, currently 
>> RDFa, without worrying about class names collisions.
> While this sounds interesting, I am very skeptical (sorry) that this 
> actually happens, and even more skeptical that if it does, that it should.

Across our Lotus product line, developers are specifying attributes that
they add to a person object, for example. These attributes are mostly
associated with individual components. Let's take URL as an example. A
person might have a URL in each one of our offerings: community URL,
blog(s) URL, dogear links URL, activity URL, etc. We can't simply use
the 'url' property in an hCard to denote each special case of URL. An
obvious choice would be to call them blog-url, activity-url, etc.
However, Microformats lack a mechanism for us to add "new" properties
to an existing microformat like hCard without having to write custom
JavaScript code. RDFa allows us to mix and match properties from
different "schemas" yet (if you believe my explanation of RDFa's parsing
algorithm) provides a common extraction mechanism, possibly materialized
as a JSON model.
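
To make the "JSON model" idea concrete, here is a small Python sketch
that groups extracted statements by subject. The prefixes (vcard:,
blog:, activity:) and the function name triples_to_model are
hypothetical, chosen only to show properties from different "schemas"
sitting side by side without colliding.

```python
import json
from collections import defaultdict


def triples_to_model(triples):
    """Group (subject, property, value) triples into a JSON-ready dict."""
    model = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in triples:
        # A property may legitimately occur more than once (e.g. two blogs),
        # so every value is kept in a list.
        model[subject][prop].append(value)
    return {subject: dict(props) for subject, props in model.items()}


if __name__ == "__main__":
    person = "http://example.org/people/jane"  # hypothetical subject
    triples = [
        (person, "vcard:url", "http://example.org/jane"),
        (person, "blog:url", "http://example.org/blogs/jane"),
        (person, "blog:url", "http://example.org/blogs/team"),
        (person, "activity:url", "http://example.org/activities/jane"),
    ]
    print(json.dumps(triples_to_model(triples), indent=2))
```

Because each property carries its own prefix, a component can add a
new one without touching the extraction code or the other components'
properties.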

> Obviously within a walled garden it doesn't matter what the 
> industry-standard specifications say, since interoperability isn't 
> required. And similarly, once you're on the wide Web, proprietary 
> extensions are undesirable, since the end user wouldn't be able to make 
> use of them. I'm not sure which case this is.

HTML pages are one possible representation of resources. These
resources have data models that exist beyond the HTML page; frequently
they exist as XML. When these resources are rendered as HTML, we would
like to still be able to tie the visual representation back to the
underlying data model. This allows us, for example, to deduce that a
person, an event or a customer order is on the page.

Understanding the type of the data on the page allows us to, for
example, dynamically render additional information about a person, or
provide links to their artifacts, e.g. their blog. In the same way, our
customers' purchase orders can be recognized in a page and additional
information can be dynamically rendered, such as further information
about the supplier. All this can be achieved without deep integration,
as it can occur lazily right within the browser.
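
One way to picture this is a registry of handlers keyed by the type of
the recognized item. The sketch below is in Python purely for
illustration (in the browser this would be JavaScript), and the type
names, field names and functions are all hypothetical.

```python
# Registry mapping a recognized item type to the code that enhances it.
ENHANCERS = {}


def enhancer(type_name):
    """Decorator that registers a handler for one item type."""
    def register(fn):
        ENHANCERS[type_name] = fn
        return fn
    return register


@enhancer("person")
def enhance_person(item):
    return f"See also: {item['name']}'s blog at {item['blog-url']}"


@enhancer("purchase-order")
def enhance_order(item):
    return f"Supplier details: {item['supplier']}"


def enhance(items):
    """Apply whichever handlers are registered; unknown types are skipped."""
    notes = []
    for item in items:
        handler = ENHANCERS.get(item.get("type"))
        if handler is not None:
            notes.append(handler(item))
    return notes


if __name__ == "__main__":
    found_on_page = [
        {"type": "person", "name": "Jane", "blog-url": "http://example.org/blog"},
        {"type": "purchase-order", "supplier": "Acme"},
        {"type": "event"},  # no handler registered, silently ignored
    ]
    for note in enhance(found_on_page):
        print(note)
```

The page producer and the enhancing code only have to agree on the
type and property names, not on each other's implementation.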

We see huge advantages to this late binding model and already have
similar "server-side" approaches in our portal offerings.

>> I hope this gets a bit more concrete, but as I check around tomorrow at 
>> work, I'll try to get more details. Thanks.
> More details would be very useful, yes please.
> Cheers,

There are many different areas of interest in RDFa at IBM, including
accessibility work on browsers, our portal-enabled solutions,
collaborative products, search engine, etc. I'm really stretching the
freedom I have to share about these efforts, in the hope that we can
help you see that microformats are not adequate as a Web-wide
mechanism for metadata in HTML, IOHO. I know Sam Ruby is interested in
working on a validator and so am I. We see here an opportunity to give
HTML a few missing attributes that will in fact enable us to embed
metadata in a general way, as opposed to the approach microformats are
taking today.

For all intents and purposes, we can drop the name RDFa. What we are
really after is a *validated* handful of attributes, ignored by
browsers, that are specific to encoding "subject/property
relationship" data as microformats do, without having to depend on
an extractor with a priori knowledge of the class values and the level
they should be found at.

