[whatwg] Trying to work out the problems solved by RDFa
Calogero Alex Baldacchino
alex.baldacchino at email.it
Fri Jan 9 12:07:47 PST 2009
Julian Reschke wrote:
> Calogero Alex Baldacchino wrote:
>> ...
>> This is why I was thinking about somewhat "data-rdfa-about",
>> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the
>> purposes of an RDFa processor working on top of HTML5 UAs (perhaps in
>> a test phase, if needed at all, of course), an element dataset would
>> give access to "rdfa-about", instead of just "about", that is using
>> the prefix "rdfa-" as acting as a namespace prefix in xml (hence, as
>> if there were "rdfa:about" instead of "data-rdfa-about" in the markup).
>> ...
>
> That clashed with the documented purpose of data-*.
Hmm, I'm not sure there is a clash, since I was suggesting a *custom*
and essentially *private* mechanism to experiment with RDFa in the HTML
serialization, for the *small-scale* needs of organizations willing to
embed RDFa metadata in text/html documents and to exchange them with
each other, using a convention that is likely to avoid name clashes with
other private metadata. I think it's unlikely to find data-rdfa-* used
with different semantics in the very same page, and in a small-scale
scenario involving a few *selected* sources of RDFa-modelled
information, one should know in advance that the other parties are
using the same convention. A document marked up this way might be
handled by an external RDFa processor, thus requiring no direct support
in the browser.
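As a rough sketch of the external-processor idea (Python only for
illustration; the class and function names below are made up for this
example and belong to no existing RDFa tool), a script could pick up the
data-rdfa-* attributes and expose them under their RDFa-like names. It
only collects attributes and does not build any triples:

```python
from html.parser import HTMLParser

PREFIX = "data-rdfa-"  # the private convention discussed above


class DataRdfaCollector(HTMLParser):
    """Collect data-rdfa-* attributes, stripped of the prefix (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []  # one (tag, {name: value}) entry per annotated element

    def handle_starttag(self, tag, attrs):
        # Keep only the attributes following the convention, e.g.
        # data-rdfa-property -> property
        rdfa = {
            name[len(PREFIX):]: value
            for name, value in attrs
            if name.startswith(PREFIX)
        }
        if rdfa:
            self.found.append((tag, rdfa))


def collect(markup):
    parser = DataRdfaCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


sample = (
    '<p>I am holding <span data-rdfa-property="cal:summary">'
    'one last summer Barbecue</span> on '
    '<span data-rdfa-property="cal:dtstart" '
    'data-rdfa-content="2007-09-16T16:00:00-05:00" '
    'data-rdfa-datatype="xsd:dateTime">September 16th at 4pm</span>.</p>'
)
result = collect(sample)
```

Elements without any data-rdfa-* attribute are simply skipped, so other
private data-* metadata in the same page would not interfere.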
However, such a convention might be "clash-free" enough to work on a
wider scale, so it might become widespread and provide evidence that the
web /needs/, or at least /has chosen/, to use RDFa as one of the most
common ways to embed metadata in a document. That might be enough to
justify adding native support for the whole range of RDFa attributes,
possibly along with support for the earlier experimental ones (such as
"data-rdfa-*" and "rdfa:*", for backward compatibility). And actually I
can't see much of a problem if a privately born feature became the basis
of a widespread and widely accepted convention. (I'm not saying the spec
should name data-rdfa-* as a means to implement RDFa; rather, I think
that if no general agreement can be found on whether and how RDFa must
be specified and implemented, such an experiment might be proposed to
the semantic web industry, and we could wait for the results, given that
a lack of support might prevent any interested party from using RDFa and
HTML5 together.)
>
> *If* we want to support RDFa, why not add the attributes the way they
> are already named???
>
For instance, to find out whether it is worth changing the "if we want"
into "we do want", without requiring an early implementation and
specification, and without depending on whether and how a given browser
vendor might want to experiment differently from the others (such a
convention would only require support for HTML5 datasets and a script or
plugin capable of handling them as RDFa metadata). The point here is
that, once data-* attributes have been introduced as a means to support
custom attributes, any browser vendor might decide to drop support for
other kinds of custom attributes in the HTML serialization (that is, for
attributes that are neither part of the language nor data-* ones).
Therefore, if they (or any of them) decided not to support RDFa
attributes until these were introduced in a specification, there might
be no way to experiment with them in general (that is, cross-browser)
without resorting either to data-* or to "rdfa:*" (the latter in XHTML).
Anyway, /in general/, what should a browser do with RDFa metadata on a
*wide scale*, other than classifying a portion of the open web (e.g. in
its local history), possibly allowing users to select trusted sources?
Actually, I don't think that would bring enough benefit to *average*
users, compared to the risk of getting a lot of spam metadata from
/heterogeneous/ sources. I really don't expect average users to
understand how to filter sites based on metadata reliability (and just
for the purpose of using a metadata-based query interface, because a
site with wrong metadata might still contain useful information).
Instead, they might just try a query interface the same way they use a
default search bar, get wrong results (once spam metadata became
widespread) and decide the mechanism doesn't work well (possibly
complaining about it). Some kind of antispam filter might help, but I
think that working out whether metadata are reliable, that is, whether
they really correspond to the content of a web page, is a hard problem
for a bot to solve without a good degree of Artificial Intelligence
(filtering emails by looking for suspicious patterns is far easier than
implementing a filter capable of /understanding/ metadata,
/understanding/ natural language and comparing /semantics/).
Likewise, I don't expect the great majority of web pages to contain
"valid" metadata: most people would not care about them, and a
potentially growing number might copy and paste code containing metadata
from other sites as a kind of template, then edit the content and ignore
the metadata, thus breaking reliability. I do think wide-scale use of
metadata coming from heterogeneous sources can be more harmful than
useful. *If* we do agree that small-scale needs are the main context
where RDFa can bring benefits, perhaps a custom mechanism and external
plugins are all we need; otherwise, it should be proved that /misused/
and /abused/ metadata can be filtered out *easily* and *automatically*,
without requiring average users to understand the problem and without
affecting overall efficiency. IMHO.
>> ...
>> However, AIUI, actual xml serialization (xhtml5) allows the use of
>> namespaces and prefixed attributes, thus couldn't a proper namespace
>> be introduced for RDFa attributes, so they can be used, if needed, in
>> xhtml5 documents? I think such might be a valuable choice, because it
>> seems to me RDFa attributes can be used to address such cases where
>> metadata must stay as close as possible to correspondent data, but a
>> mistake in a piece of markup may trigger the adoption agency or
>> foster parenting algorithms, eventually causing a separation between
>> metadata and content, thus possibly breaking reliability of gathered
>> informations. From this perspective, a parser stopping on the very
>> first error might give a quicker feedback than one rearranging
>> misnested elements as far as it is reasonably possible (not
>> affecting, and instead improving, content presentation and users'
>> "direct" experience, but possibly causing side-effects with metadata).
>> ...
>
> That would make RDFa as used in XHTML 1.* and RDFa used in HTML 5
> incompatible. What for?
>
> > ...
>
> BR, Julian
Because I'm not sure RDFa can work well with the HTML serialization. To
clarify that, let me take and modify an example from the W3C
Recommendation (without claiming that it makes a good worst-case
scenario, but just to give an idea):
[...]
<p>
I'm holding
<span property="cal:summary">
one last summer Barbecue
</span>, to meet friends and have a party before the end of holidays
on
<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
datatype="xsd:dateTime">
September 16th at 4pm
</span>.
</p>
[...]
Now let's consider it written as:
[...]
<p>
I'm holding
<span property="cal:summary">
one last summer Barbecue
<!-- now the </span> close tag is missing here -->,
to meet friends and have a party before the end of holidays
on
<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
datatype="xsd:dateTime">
September 16th at 4pm
</span>.
</p>
[...]
The above would result in a parse error in an XML-serialized document,
since the document isn't well-formed. As part of an HTML-serialized
document, instead, the fragment would be processed anyway, improving the
user experience (compared with a page that stops rendering at a missing
close tag), but potentially causing metadata to be bound imprecisely to
the data, and thus potentially harming automated data extraction (for
some purposes). Therefore, using such metadata only inside
XML-serialized pages might give quick feedback on such a problem as soon
as the author checked the appearance of a page (which I think would be
the very first check, just as I think almost no one would check the
_whole_ range of possible queries people might make over a document to
look for errors).
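To give an idea of the difference, here is a small sketch (Python only
for illustration; all names are made up for the example). The XML parser
from the standard library rejects the fragment with the missing close
tag, while a lenient reader goes on and attaches far more text to
"cal:summary" than intended. Note that html.parser is only a tokenizer,
and the stack of open spans below is my own simplified stand-in for the
HTML5 tree-building rules, not an implementation of them:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The second fragment above, with the first </span> missing.
BROKEN = (
    '<p>I\'m holding <span property="cal:summary">one last summer Barbecue, '
    'to meet friends and have a party before the end of holidays on '
    '<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00" '
    'datatype="xsd:dateTime">September 16th at 4pm</span>.</p>'
)


def xml_is_well_formed(markup):
    """True if the standard XML parser accepts the fragment."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class PropertyText(HTMLParser):
    """Gather the text found inside each open property-bearing <span>."""

    def __init__(self):
        super().__init__()
        self.open_spans = []  # property name (or None) for each open <span>
        self.text = {}

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            prop = dict(attrs).get("property")
            self.open_spans.append(prop)
            if prop is not None:
                self.text.setdefault(prop, "")

    def handle_endtag(self, tag):
        # A </span> closes the innermost open span, whichever it is.
        if tag == "span" and self.open_spans:
            self.open_spans.pop()

    def handle_data(self, data):
        # Text belongs to every span that is still open.
        for prop in self.open_spans:
            if prop is not None:
                self.text[prop] += data


def lenient_property_text(markup):
    parser = PropertyText()
    parser.feed(markup)
    parser.close()
    return parser.text
```

With this, xml_is_well_formed(BROKEN) is false, while
lenient_property_text(BROKEN) reports a "cal:summary" that runs to the
end of the paragraph and swallows the date as well. The author sees the
failure at once in the first case, and nothing at all in the second.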
*If* this is meaningful, supporting RDFa attributes as "rdfa:*" might
ensure that the XML serialization is preferred by people who really need
to use this kind of metadata (while leaving a chance to experiment with
RDFa in the HTML serialization, because no one can be prohibited from
using data-<prefix>-* for this purpose alongside a proper script or
plugin). Introducing "about", "property", "content", "datatype" and so
on directly in the HTML namespace, as attributes shared by all elements,
would instead make the choice between one serialization and the other
indifferent, thus leading to every possible side effect the HTML
serialization may cause.
As a side note, it seems that people at the W3C are considering
resorting to extensibility to introduce RDFa attributes into
XML-serialized HTML documents, and they also have some doubts about
whether to allow RDFa attributes in the HTML serialization:
"The HTML WG is encouraged to provide a mechanism to permit
independently developed vocabularies such as Internationalization Tag
Set (ITS), Ruby, and RDFa to be mixed into HTML documents. /Whether this
occurs through the extensibility mechanism of XML, *whether it is also
allowed in the classic HTML serialization*, and whether it uses the DTD
and Schema modularization techniques/, is for the HTML WG to determine."
(from <http://www.w3.org/2007/03/HTML-WG-charter#deliverables>)
WBR, Alex