[whatwg] Bibliography Markup in HTML5

Tue Oct 6 01:16:49 PDT 2009

On Mon, Oct 5, 2009 at 7:51 PM, Ian Hickson <ian at hixie.ch> wrote:
> On Sun, 27 Sep 2009, tjeddo wrote:
>>
>> I am surprised at how little concern there seems to be over the lack of
>> bibliography markup in HTML5.
>
> There's a lot of concern, but it was deemed that microdata is a better way
> of addressing this than specific elements.

Thanks for your response. After reviewing the info on microdata, I
certainly agree that microdata would be a great fit for marking up
bibliographies and their entries. I do hope that a controlled
vocabulary is worked out and gets widely adopted... but I recall this
issue was already discussed at length.

>> What if HTML5 specified this approach--except that in place of the <dl>
>> (definition list) tags, a collection of entries would be contained
>> between <bibliography> tags? That is, the above example would look as
>> follows:
>>
>> <bibliography> ...
>>
>> <dt id="refsRFC5322">[RFC5322]</dt>
>> <dd><cite><a href="http://www.ietf.org/rfc/rfc5322.txt">Internet Message
>> Format</a></cite>, P. Resnick. IETF, October 2008.</dd>
>>
>> ...
>> </bibliography>
>>
>> The value here is the elimination of ambiguity
>
> What ambiguity?

In my example scheme, a parsing program that encounters a
<bibliography> section would be able to determine (by context) that
the dt and dd elements within it represent bibliography entries, just
as a parsing program that encounters a <figure> section can determine
that a dt element contains the caption for the figure. However, if dt
and dd are encountered simply within a dl element, then no additional
semantic information can be determined. It would have been more
accurate to say that ambiguity is reduced in my example scheme, not
eliminated.
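To make the "by context" point concrete, here is a rough sketch of
such a parsing program, written in Python with the standard
html.parser module. The class name, the classify() helper, and the
wording of the labels are made up for illustration, and <bibliography>
is of course only the element proposed in this thread, not something
in the spec.

```python
from html.parser import HTMLParser


class EntryLabelParser(HTMLParser):
    """Collects the text of each <dt>, tagged with its enclosing container."""

    # Containers whose dt/dd children this sketch knows how to interpret.
    # "bibliography" is the hypothetical element proposed in this thread.
    CONTAINERS = {"dl", "bibliography", "figure"}

    def __init__(self):
        super().__init__()
        self.stack = []     # open container elements, innermost last
        self.in_dt = False  # True while inside a <dt>
        self.labels = []    # (container, dt text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self.stack.append(tag)
        elif tag == "dt":
            self.in_dt = True
            container = self.stack[-1] if self.stack else None
            self.labels.append((container, ""))

    def handle_endtag(self, tag):
        if tag == "dt":
            self.in_dt = False
        elif self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.in_dt:
            container, text = self.labels[-1]
            self.labels[-1] = (container, text + data)


def classify(html):
    """Return (dt text, inferred meaning) pairs for every <dt> in html."""
    parser = EntryLabelParser()
    parser.feed(html)
    parser.close()
    meaning = {
        "bibliography": "bibliography entry label",
        "figure": "figure caption",
    }
    # Inside a plain <dl> (or no container) nothing more can be inferred.
    return [
        (text.strip(), meaning.get(container, "term, nothing more known"))
        for container, text in parser.labels
    ]
```

Calling classify() on the example entry wrapped in <bibliography>
labels "[RFC5322]" as a bibliography entry label, while the same dt
inside a plain <dl> comes back only as a term.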

>> and that a number of new inferences can now be drawn by user agents.
>> With the <dl> tags, the interpreting agent can only determine that there
>> is a definition list containing term/definition entries.  Whereas, in
>> the context of a new bibliography section element, user agents can
>> unambiguously interpret the 'dt' element to be the displayed content
>> that humans identify a bibliography entry by (e.g., "[RFC5322]" in the
>> example given).
>
> Why is this valuable? How do you expect browser vendors to change their
> interface to use this?
>
> Why would it not be better to have a microdata vocabulary for this?

As I understand it, microdata seems like a sufficient way to handle
bibliography entries, again assuming that a standardized vocabulary
develops. I suggested introducing a 'bibliography' element and reusing
the 'dt' and 'dd' elements within it simply because that felt
consistent with the other new HTML5 elements that describe the pieces
of a virtual document (e.g., article, section, figure, aside). The
scheme also reuses 'dt' and 'dd' in the 'bibliography' context just as
they are reused in the new 'figure' and 'details' contexts. I have to
admit, though, that I'm not sure I'm a fan of this element overloading,
as opposed to introducing explicit tags to cover these concepts when
appropriate. But I do understand that HTML5 is constrained by legacy
HTML, and that microdata is another way to work around these
constraints.
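For comparison, here is a rough sketch of how a program might read the
same RFC5322 entry if it were marked up with microdata instead. Several
things here are assumptions on my part: the attribute names itemscope
and itemprop (the draft's exact attribute names may differ), the
property names "title", "author", "publisher" and "date" (no agreed
vocabulary exists yet), and the Python names ItemPropParser and
read_properties(). It also assumes well-formed input in which every
element is explicitly closed.

```python
from html.parser import HTMLParser

# The entry from the example above, with hypothetical microdata added.
ENTRY = """
<dd itemscope>
<cite itemprop="title"><a href="http://www.ietf.org/rfc/rfc5322.txt">Internet Message Format</a></cite>,
<span itemprop="author">P. Resnick</span>.
<span itemprop="publisher">IETF</span>,
<span itemprop="date">October 2008</span>.</dd>
"""


class ItemPropParser(HTMLParser):
    """Collects the text content of every element carrying itemprop."""

    def __init__(self):
        super().__init__()
        # One slot per open element: its itemprop name, or None.
        self.stack = []
        self.props = {}

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        self.stack.append(name)
        if name:
            self.props.setdefault(name, "")

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each end tag closes the last open element.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every property element that is currently open.
        for name in self.stack:
            if name:
                self.props[name] += data


def read_properties(html):
    """Return a dict mapping each itemprop name to its text content."""
    parser = ItemPropParser()
    parser.feed(html)
    parser.close()
    return parser.props
```

With read_properties(ENTRY), the title, author, publisher and date
each come back as separate fields, which is more than the element-only
scheme can offer; the cost is the extra attributes in the markup.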

I'm not arguing that microdata isn't the best approach here, but it
should be considered that first-class elements are more legible than
microdata. And I'm sure this is why many of the new HTML5 elements are
not implemented as microdata.

I'm just raising ideas here.

Regards,
Tim Eddo