[whatwg] on bibtex-in-html5

Ian Hickson ian at hixie.ch
Tue Jul 7 19:45:12 PDT 2009

Based on the feedback below, I've removed the BibTeX vocabulary from 
HTML5. The primary use case -- enabling drag-and-drop in a manner that the 
target document could automatically add a reference to the source document 
-- can still be done between cooperating sources, it's just no longer a 
first-class citizen in the automatically generated drag-and-drop JSON 
object. (The previous mechanism found relevant citation information in the 
page or section footer and automatically included that.)

I would encourage people interested in enabling this use case to develop a 
format for this to expose in the drag-and-drop API, along with some 
scripts to enable it. This doesn't really require built-in support so long 
as scripting is enabled; the APIs do provide the power to do this already.

On Wed, 10 Jun 2009, Simon Spiegel wrote:
> > > 
> > > Most of them are defined as aliases and are handled just fine by 
> > > biblatex. For example, journal works just as well as journaltitle. 
> > > While there may be small differences they're definitely not 
> > > essential. In real life, most of the bibtex data publicly available 
> > > differs from "pure bibtex" to about the same degree. There are very 
> > > few places where you can get 100% correct bibtex. Biblatex certainly 
> > > doesn't bring a new level of incompatibility here.
> > 
> > My original point was just that it seems unnecessarily incompatible 
> > with BibTeX, and that the latter appears to have more deployed 
> > support.
> > 
> > I disagree that using the same term to mean something else (as in the 
> > "inbook" case) is a "small difference" that is "not essential".
> Are we talking on a theoretical level about what would be best "in 
> principle", or do we talk about what actually happens? From the fact 
> that you originally chose BibTeX I inferred that you want to go for a 
> "practical" solution which takes account of what is used in the real 
> world. Now if we do that, we also must take a look at what actually 
> happens in the real world. And although this may just be anecdotal 
> evidence, I can assure you that in my experience a) 100% correct BibTeX 
> is the exception and b) the compatibility problems between BibTeX data 
> that you can download from various sites and biblatex are no big deal. 
> Just about every BibTeX style introduces its own quirks; in the majority 
> of cases you have to clean your data anyway after you download it. So I 
> really don't see a fundamental problem here. But I certainly do see a 
> fundamental problem, both theoretical and practical, if you go for a 
> standard which is limited in major ways and which from the start 
> excludes just about everything which is not English-speaking hard 
> science. There will always be a tradeoff; the question is which is the 
> lesser evil.

On Wed, 10 Jun 2009, Simon Spiegel wrote:
> On 10.06.2009, at 11:44, Ian Hickson wrote:
> > On Wed, 20 May 2009, Bruce D'Arcus wrote:
> > > 
> > > Re: the recent microdata work and the subsequent effort to include 
> > > BibTeX in the spec, I summarized my argument against this on my 
> > > blog:
> > > 
> > > <http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5>
> > 
> > | 1. BibTeX is designed for the sciences, that typically only cite
> > |    secondary academic literature. It is thus neither adequate for, nor
> > |    widely used in, many fields outside of the sciences: the humanities and law
> > |    being quite obvious examples. For this reason, BibTeX cannot by
> > |    default adequately represent even the use cases Ian has identified.
> > |    For example, there are many citations on Wikipedia that can only be
> > |    represented using effectively useless types such as "misc" and which
> > |    require new properties to be invented.
> > 
> > We will probably have to increase the coverage in due course, yes. 
> > However, we should verify that the mechanism works in principle before 
> > investing the time to extend the vocabulary.
> I really don't think that a body like WHATWG is suited for this task. 
> Especially since other groups have already been working on this exact 
> issue.
> > | 2. Related, BibTeX cannot represent much of the data in widely used
> > |    bibliographic applications such as Endnote, RefWorks and Zotero except
> > |    in very general ways.
> > 
> > If such data is important, we can always add support when this becomes 
> > clear.
> What does this mean? When would it become clear? BibTeX's deficits have 
> been clear for ages. Just about everyone who works in the humanities 
> knows that every bibliographic solution which has been introduced in the 
> past was too limited. Why do we have to go through the same things over 
> and over again? The problems of the current standards are known; that's 
> why new solutions like biblatex or the bibliographic ontology have been 
> developed.

On Wed, 10 Jun 2009, Bruce D'Arcus wrote:
> No; you should drop this proposal and move it to an experimental annex.
> If you do insist, against all reason, on pushing forward with this 
> without modification, then I suggest you explain how this process of 
> extension will work. If, as I suspect, it'll be another case of a 
> centralized authority (you, who have admitted you really know nothing 
> about this space), then that's a deal-breaker from my perspective.
> [...]
> The two biggest problems in bibtex are two properties:
> book
> journal
> They're a problem because they're both horribly concrete/narrow, and
> (arguably) redundant.
> If those were instead replaced with something more generic like either:
> 1) publication-title
> ... or, better yet ...
> 2) a nested/related object (call it "publication" or "container" or "isPartOf")
> ... then extension becomes easier. If I need to encode a newspaper
> article, then I just do:
> title = Some Article
> publication-title = Some Newspaper
> ... or (better, because I can attach other information to the container):
> title = Some Article
> publication = [ title = Some Newspaper ]
> As is, you need to add stuff like this just to resolve the problems
> I've repeatedly pointed out:
> newspaper-title
> magazine-title
> court-reporter-title
> television-program-title
> radio-program-title
> Aside: of course, some of the above could be collapsed into more
> generic stuff like "broadcast-title", but I'm just following the same,
> broken, approach as bibtex.
> This stuff isn't theoretical Ian. Just look through this wikipedia
> page, for example:
> <http://en.wikipedia.org/wiki/Guantanamo_Bay_detention_camp>
> The citations include references to legal cases and briefs, and news
> articles (television, radio and print). Your proposal doesn't cover
> this stuff.
> OTOH, applications like Zotero can.
> > | 4. The BibTeX model conflicts with Dublin Core and with vCard, both of
> > |    which are quite sensibly used elsewhere in the microdata spec to
> > |    encode information related to the document proper. There seems little
> > |    justification in having two different ways to represent a document
> > |    depending on whether it is THIS document or THAT document.
> >
> > I don't understand this point. Could you provide an example of this
> > conflict?
> Here's an academic article in an open access biology journal.
> <http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000082>
> THIS article refers to the metadata about the document proper, with
> the title "Accelerated Adaptive Evolution on a Newly Formed X
> Chromosome."
> The metadata about the documents referenced in text are included in
> the bibliography. This is what I mean by THAT document.
> My point, and this is an important one, is that one should be able to
> use the same mechanism to describe both, but still be able to
> distinguish them. I'd think this journal would insist on it.
> > | 5. Aspects of BibTeX's core model are ambiguous/confusing. For example,
> > |    what number does "number" refer to? Is it a document number, or an
> > |    issue number?
> >
> > What's the difference? Why does it matter?
> I can't find the example, but I've come across cases where one needed
> both an issue and document number. Since I haven't cited it, though, I
> guess you can leave it aside ;-).
> > | My suggestion instead?
> > | 1. reuse Dublin Core and vCard for the generic data: titles,
> > |    creators/contributors, publisher, dates, part/version relations, etc.,
> > |    and only add those properties (volume, issue, pages, editors, etc.)
> > |    that they omit
> >
> > This seems unduly heavy duty (especially the use of vCard for author
> > names) when all that is needed is brief bibliographic entries.
> On what basis do you make this claim? "[A]ll that is needed" for whom?
> I'll point out here that the article I link to above includes
> affiliation information for the authors.
> But this isn't the most critical point.
> > | 2. typing should NOT be handled by a bibtex-type property, but the same way
> > |    everything else is typed in the microdata proposal: a global
> > |    identifier
> >
> > Why?
> a) consistency; why introduce a new mechanism (from the standpoint of
> microdata)?
> b) flexibility (since I've made clear that bibtex is not adequate, and
> I have no intention of relying on the WHATWG to determine what's
> important)
> > | 3. make it possible for people to interweave other, richer, vocabularies
> > |    such as bibo within such item descriptions. In other words, extension
> > |    properties should be URIs.
> >
> > This is already possible.
> OK, possible; but hardly very easy. See above.
> > | 4. define the mapping to RDF of such an "item" description; can we say,
> > |    for example, that it constitutes a dct:references link from the
> > |    document to the described source?
> >
> > The mapping to RDF is already defined; further mappings can be done using
> > the "sameAs" mechanism.
> How so? I'm asking: what's the relationship between the document and
> the cited document?
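
The flat-versus-nested distinction drawn in the message above (a generic `publication-title` string versus a nested `publication` object) can be illustrated with a minimal Python sketch. It is hypothetical: the `Item` class and its `type`, `title`, `publication` and `extra` fields are assumptions made for illustration, not part of BibTeX, microdata, or any proposal in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical citation record; any item may be part of another item."""

    type: str                            # e.g. "article", "newspaper", "journal"
    title: str
    publication: Optional[Item] = None   # the container this item is part of
    extra: dict[str, str] = field(default_factory=dict)  # other details

    def container_title(self) -> Optional[str]:
        # Read the container's title the same way whatever kind of container
        # it is, so no "newspaper-title" or "journal" property is needed.
        return self.publication.title if self.publication else None


# Option 1 from the message: a single generic flat property.
flat = {"title": "Some Article", "publication-title": "Some Newspaper"}

# Option 2: the newspaper is itself an Item, so extra information
# (here a placeholder place name) can be attached to the container.
paper = Item(type="newspaper", title="Some Newspaper",
             extra={"place": "Example City"})
article = Item(type="article", title="Some Article", publication=paper)
```

Here `article.container_title()` returns "Some Newspaper", and the same accessor would work unchanged for a journal, court reporter or broadcast container. New kinds of container become new values of `type` rather than new `*-title` properties, which is the extensibility argument made above.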

On Wed, 10 Jun 2009 simon at simifilm.ch wrote:
> Related to this, I want to remark on some things at a more general level: We 
> currently experience major changes in the world of bibliographic 
> software. At least, this is how I experience it. After years of limited 
> and/or closed formats and models like BibTeX or Endnote we finally see 
> new models like CSL or biblatex emerging which try to learn from the 
> lessons of the past. Of course, I do not know how things will evolve, 
> but looking at the success of solutions like Zotero I think it's not so 
> bold to say that things will change quite a bit in the coming years.
> And then we have HTML5, an emerging standard which is now getting 
> support from the newest and latest browsers. I do know even less how HTML5 
> will evolve, what impact it will have on the web. But it's probably fair 
> to say that widespread adoption of HTML5 will not happen overnight.
> Honestly, I really don't get why a coming web standard should support a 
> bibliographic standard which is obviously outdated. The fact that BibTeX 
> is widely used is really a non-argument, because if we follow this logic 
> we won't have any development. By the same logic you should avoid 
> something like <video>; after all, there isn't any support for it 
> *yet*. If HTML5 wants to be forward-looking, it certainly shouldn't 
> adopt a twenty-year-old standard but should instead try to support 
> something new which is really up to date and has a chance of being useful 
> in the future.

On Wed, 10 Jun 2009, Jonas Sicking wrote:
> [...] I'd prefer to see these things developed elsewhere. Mostly because 
> the group of people with expertise in developing a better version of 
> bibtex is not the people in this WG.
> I do think it's important to show that microdata is able to express 
> something like bibtex. And I do think that the discussion in the past 
> weeks has been interesting, since people haven't actually been finding 
> problems in microdata's ability to express something like bibtex, but 
> rather in the exact bibtex format itself.
> But the exact microdata format does not seem productive to have here. It 
> seems completely orthogonal to the rest of HTML, so there seems to be no 
> win to put it in the HTML 5 spec.
> If bibtex-in-microdata can't gather enough interest outside of the HTML 
> 5 spec, it probably is a bad spec.

On Thu, 11 Jun 2009, Simon Spiegel wrote:
> I completely agree with this conclusion. I also think that it would be a 
> big mistake to include bibtex and then extend it later as Ian has 
> suggested.
> Let me give a concrete example; take the following bibliographic entry: 
> Doe, John: Foreword. In: Doe, Jane: The Book. Middle-Earth 2008.
> What we have here is a chapter by an author in a book by someone else. 
> This someone else is not the editor, though, but the author of the book. 
> This kind of text is fairly common in my field, but it cannot be 
> expressed in bibtex since bibtex originally only has fields for 'author' 
> and 'editor', but not for 'bookauthor'.
> According to Ian, something like this could be covered by extending the 
> bibtex vocabulary. For me, two problems pop up here:
> Who will decide how the vocabulary gets extended? And on what will these 
> decisions be based?
> Now let's say that some kind of process to extend the bibtex vocabulary 
> can be established and that the addition of a 'bookauthor' field will be 
> decided. The problem then is that something gets added to bibtex which 
> no existing bibtex style (and no other tool which can import bibtex) 
> knows about. AFAIK only biblatex has a 'bookauthor' field. In other 
> words: We then have data which is not useable with the traditional 
> bibtex tools (they don't break, they just won't process the new fields). 
> If bibtex gets extended (which would be absolutely necessary since all 
> kinds of additional fields are needed), we unavoidably end up with some 
> kind of superbibtex which no tool in the world can process. In other 
> words: We then have a new format which looks like bibtex but which 
> cannot be used in a traditional bibtex workflow. At this point the whole 
> argument why bibtex should be used in this spec breaks down. Ian is in 
> favor of bibtex because it is widely used; but if we unavoidably end up 
> with an unusable superbibtex, this argument becomes moot.
> If compatibility with existing formats is the main objective, we simply 
> can't extend an old format like bibtex. If the goal is to cover 
> substantially more than bibtex does, we need a different format.
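
The point that classic tools "don't break, they just won't process the new fields" can be sketched in a few lines of Python. This is a hypothetical illustration: `CLASSIC_FIELDS` is an assumed, deliberately small subset of standard BibTeX field names (not a complete list), and `to_bibtex` is an invented helper, not part of any real BibTeX tool.

```python
from __future__ import annotations

# Assumed subset of classic BibTeX field names, for illustration only.
CLASSIC_FIELDS = {"author", "editor", "title", "booktitle", "publisher",
                  "address", "year", "pages"}


def to_bibtex(entry_type: str, key: str,
              fields: dict[str, str]) -> tuple[str, list[str]]:
    """Serialize only known fields; also return the names of dropped fields."""
    kept = {k: v for k, v in fields.items() if k in CLASSIC_FIELDS}
    dropped = sorted(k for k in fields if k not in CLASSIC_FIELDS)
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in kept.items())
    return f"@{entry_type}{{{key},\n{body}\n}}", dropped


# "Doe, John: Foreword. In: Doe, Jane: The Book. Middle-Earth 2008."
foreword = {
    "author": "Doe, John",
    "title": "Foreword",
    "bookauthor": "Doe, Jane",   # biblatex field; unknown to classic BibTeX
    "booktitle": "The Book",
    "address": "Middle-Earth",
    "year": "2008",
}

text, lost = to_bibtex("incollection", "doe2008", foreword)
```

After this runs, `lost` is `['bookauthor']`: the entry is still well-formed, but nothing in the output records that Jane Doe wrote the book containing the foreword.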

On Thu, 11 Jun 2009, David Gerard wrote:
> I was about to mention Wikipedia! The citation templates there would be 
> an excellent set of examples of what a citation format would need to 
> cover in practical use. See:
> http://en.wikipedia.org/wiki/Category:Citation_templates
> There's a lot there, but many aren't that heavily used. You can see how 
> many uses there are of a template, or if there are any at all, by going 
> to the template page and clicking on "What links here" in the sidebar. 
> The ones whose name starts "Template:Cite ..." include the biggies.
> These constitute a bunch of special cases, but you'll be pleased to know 
> that similar templates tend to get combined with time. I certainly 
> wouldn't suggest a set of special cases in a spec for this. But these 
> will be useful for ideas and examples of what sort of citations are in 
> demand on the web.

On Thu, 11 Jun 2009, Bruce D'Arcus wrote:
> My immediate concern has been this particular use case, and I've been 
> assuming that the microdata proposal will be included in HTML5.
> In a vacuum, I think microdata is fine technically.
> In the context of an existing spec that covers the same use cases 
> (RDFa), I think it's creating unnecessary and unproductive duplication.
> Just to go back to the use case I'm focusing on here, it puts metadata 
> producers and consumers in an awkward position of having to likely 
> support two different specs; means double work with no obvious benefits. 
> This is happening JUST as RDFa is starting to be implemented by major 
> players, and starting to build up a head of steam in terms of tools.
> And to put this in some context, the only reasonable technical point 
> that Ian has made in favor of throwing out RDFa and creating a new spec 
> is the prefix issue. But I have a really hard time seeing how prefixes 
> are so onerous a burden as to justify the costs (to the WHATWG, and to 
> metadata producers and consumers) of creating and maintaining a new 
> spec.
> FWIW, some possibly relevant background from the OpenDocument 
> experience:
> To make a long story short, ODF 1.2 will have an extensible metadata 
> system based on RDF/XML (for in-package metadata) and a subset of 
> RDFa (for embedded metadata). Getting to this solution was a long and tortuous 
> process, and the original proposal effectively forked RDFa by requiring 
> fully unqualified URIs for names. The technical reasons were 
> more-or-less the same as those that drove Ian to invent an entirely new 
> spec: that in a GUI environment where users are copy-and-pasting 
> content, dealing with prefixes was an additional burden on implementers. 
> In addition, people don't hand-author ODF files, so prefixes have no 
> authoring benefit.
> In the end, though, I understand the ODF TC decided to include prefixes, 
> since implementers found the burdens largely theoretical (OpenOffice 
> should see an initial implementation in 3.2 I understand), and because 
> in general the group prefers to stick as closely to existing specs as 
> reasonable.
> On predefined vocabularies, we thought about doing something similar 
> informally, but decided it was out-of-scope; better initially to put a 
> solid extensible system in place and let developers start working with 
> it.
> My work on the Bibliographic Ontology was in part done with that in 
> mind, though has the added benefit it can be repurposed for RDFa in 

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
