[whatwg] Ghosts from the past and the semantic Web

Fri Aug 29 12:21:43 PDT 2008

(Note: I've been discussing just such a CSS-like RDF format with Ben
off-list, inspired directly by Eduard's proposal earlier.)

On Fri, Aug 29, 2008 at 1:09 PM, Ben Adida <ben at adida.net> wrote:

> Greg Houston wrote:
> > My suggestion keeps the metadata code tidy, and more human readable.
> > Sprawling out all the different metadata properties just makes a huge
> > mess of the markup.
>
> I understand how it would seem messy the first few times you look at it,
> but I'm having trouble seeing how your proposal is any less messy. If
> you want a lot of metadata, you've got a lot of markup. "Messy" is
> subjective.
>
> > This sort of mess isn't acceptable to me:
> >
> > <div class="vcard" id="weborganics"
> >                xmlns:foaf="http://xmlns.com/foaf/0.1/"
> >                typeof="foaf:Person"
> >                about="#weborganics">
> > <p><span property="foaf:name" class="fn">Martin McEvoy</span></p>
> > <p rel="foaf:img">
> > <img alt="weborganics" src="http://weborganics.co.uk/images/me.jpg"
> > class="photo"/>
> > </p>
> > <p>Contact: <a rel="foaf:mbox" title="Email" class="email"
> > href="mailto:info at weborganics.co.uk">Email</a>
> > Web: <a rel="foaf:weblog me" class="url"
> > href="http://weborganics.co.uk/index.xhtml">WebOrganics</a></p>
> > <div class="geo" id="weblog" rel="foaf:based_near"
> >        xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
> >        <span typeof="geo:Point" about="#weblog">
> >                <abbr property="geo:lat" content="53.7552" title="53.7552"
> > class="latitude">N 53.7552</abbr>,
> >                <abbr property="geo:long" content="-2.3675"
> > title="-2.3675"
> > class="longitude">W -2.3675</abbr>
> >        </span>
> > </div>
>
> Which part of this is "messy" and "unacceptable?"
>
> This particular example is trying to do both microformats and quite a
> bit of RDFa. Think of it as an advanced example, which certainly does
> complicate things a bit. But other than the whitespace which isn't
> conducive to easy reading, what are the problems?

It's... the whole thing.  Does that not put you *directly* in mind of the
bad old days of presentational HTML, when every file was fat with useless
cruft?  CSS pushed all that out of the HTML.  Very often it saves us effort,
as we need only say something once in the CSS and it applies to every page
we link it into.  Even when it doesn't, though, even when we're crafting a
single specifically targeted chunk of CSS for one single page on our
website, the combination of CSS+HTML makes the entire thing cleaner and more
maintainable.  Sometimes it actually requires *more* coding, but that code
is generally much easier to understand and to edit later.  Arguably, it
makes it more aesthetically pleasing too.

Exact same thing here.  As you say, if we're going to have metadata, we
*can't* separate it from the content - it must *be* the content.  What we
want is merely a way of specifying which content is which metadata.  This
exemplifies DRY, as you're not repeating the important data - it appears
only once in the actual content, so if it's updated there the metadata is
also updated for free (being the same thing).

It's just that... syntax!  It's horrible!  It's ugly!  It's bloated!  It's
useful when you want to present a single copy-paste-able chunk of HTML for
clueless web authors to paste into their blog or Facebook account, but we
professional authors recoil in horror at such a thing, and with good
reason.  You *can* pull this stuff out into a separate chunk.  We can even
follow the useful example of CSS and allow partially-inline coding with a
<metadata> tag, or fully separated stuff in another file.

This is *not* separation of content and metadata.  It's separation of
content and *classification*.  We don't need classification inline to reduce
data rot, because the data is still in a single, primary location - the
content itself.  Nor do we need it inline to enable cross-format
metadata-faithful transfer - if this ever becomes a convenient reality, it
will be performed by RDF-aware user agents anyway, which have already built
up an internal RDF graph of the data in the page.  They can export in
appropriate formats by themselves.

As an added bonus, this kills the entire QName/CURIE/URI debate completely,
as that data is pulled out to the classification format where you don't have
to worry about it messing with DOM parsing or whatever.
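
A rough sketch of what "separate classification" could look like to a tool,
in Python.  This is *not* the CRDF draft syntax; the RULES table, the
trimmed-down HTML, and the extract() helper are all invented here purely for
illustration, and the subject handling is deliberately naive.  The point is
only that the markup carries plain class names while the mapping to RDF
properties lives elsewhere:

```python
from html.parser import HTMLParser

# Hypothetical classification rules, kept apart from the markup:
# class name -> RDF property.  Names mirror the quoted example.
RULES = {
    "fn": "foaf:name",
    "latitude": "geo:lat",
    "longitude": "geo:long",
}

# Trimmed-down version of the quoted markup, with no RDFa attributes.
HTML = """
<div class="vcard" id="weborganics">
  <p><span class="fn">Martin McEvoy</span></p>
  <div class="geo">
    <span class="latitude">53.7552</span>,
    <span class="longitude">-2.3675</span>
  </div>
</div>
"""


class Classifier(HTMLParser):
    """Collects (subject, property, value) triples using external rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.stack = []      # matched property (or None) per open element
        self.subject = None  # naive: the first id seen is the only subject
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.subject is None and "id" in attrs:
            self.subject = "#" + attrs["id"]
        classes = (attrs.get("class") or "").split()
        prop = next((self.rules[c] for c in classes if c in self.rules), None)
        # Assumes every start tag has a matching end tag (no bare <img>).
        self.stack.append(prop)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only text directly inside a classified element becomes a value.
        if text and self.stack and self.stack[-1]:
            self.triples.append((self.subject, self.stack[-1], text))


def extract(html, rules):
    parser = Classifier(rules)
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract(HTML, RULES):
        print(triple)
```

Changing what the page "means" is then a matter of editing RULES; the HTML
itself is never touched, and the values still live only in the content.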

I've converted Greg Houston's complex example into my early-draft syntax
that I discussed with you off-list, and put it up on my site.[1]  In this
example the CRDF code is taller (though the metadata will likely be put away
in a separate file and just linked in, so that's not an issue) and larger
(by 90 bytes), but if you use a single extra point or another vcard on your
site, you immediately see savings.  What's more, the code is much
*cleaner*.  It has a few more elements than I'd prefer to see, but that's a
tradeoff you have to make with any categorization scheme because it's
difficult to target text directly.  The whole thing is much easier to
understand now, though, and that's the sort of cognitive savings that are
*really* important to an author.

(Note that I'm going off of pure intuition as to how foaf and geo work, so I
may have some details wrong.  Corrections are invited off-list.)

[1]: http://www.xanthir.com/rdfa-vs-crdf.php

~TJ