[whatwg] on bibtex-in-html5
Ian Hickson
ian at hixie.ch
Tue Jul 7 19:45:12 PDT 2009
Based on the feedback below, I've removed the BibTeX vocabulary from
HTML5. The primary use case -- enabling drag-and-drop in a manner that the
target document could automatically add a reference to the source document
-- can still be done between cooperating sources; it's just no longer a
first-class citizen in the automatically generated drag-and-drop JSON
object. (The previous mechanism found relevant citation information in the
page or section footer and automatically included that.)
I would encourage people interested in enabling this use case to develop a
format for this to expose in the drag-and-drop API, along with some
scripts to enable it. This doesn't really require built-in support so long
as scripting is enabled; the APIs do provide the power to do this already.
On Wed, 10 Jun 2009, Simon Spiegel wrote:
> > >
> > > Most of them are defined as aliases and are handled just fine by
> > > biblatex. For example, journal works just as fine as journaltitle.
> > > While there may be small differences they're definitely not
> > > essential. In real life, most of the bibtex data publicly available
> > > differs from "pure bibtex" to about the same degree. There are very
> > > few places where you can get 100% correct bibtex. Biblatex certainly
> > > doesn't bring a new level of incompatibility here.
> >
> > My original point was just that it seems unnecessarily incompatible
> > with BibTeX, and that the latter appears to have more deployed
> > support.
> >
> > I disagree that using the same term to mean something else (as in the
> > "inbook" case) is a "small difference" that is "not essential".
>
> Are we talking on a theoretical level about what would be best "in principle", or
> do we talk about what actually happens? From the fact that you
> originally chose BibTeX I inferred that you want to go for a "practical"
> solution which takes account of what is used in the real world. Now if
> we do that, we also must take a look at what actually happens in the
> real world. And although this may just be anecdotal evidence, I can
> assure you that in my experience a) 100% correct BibTeX is the exception
> and b) the compatibility problems between BibTeX data that you can
> download from various sites and biblatex are no big deal. About every
> BibTeX style introduces its own quirks, in the majority of cases you
> have to clean your data anyway after you downloaded it. So I really
> don't see a fundamental problem here. But I certainly do see a
> fundamental problem both theoretical and practical if you go for a
> standard which is limited in major ways and which from the start
> excludes about everything which is not English-speaking hard science.
>
> There will always be a tradeoff, the question is which is the lesser
> evil.
On Wed, 10 Jun 2009, Simon Spiegel wrote:
> On 10.06.2009, at 11:44, Ian Hickson wrote:
> > On Wed, 20 May 2009, Bruce D'Arcus wrote:
> > >
> > > Re: the recent microdata work and the subsequent effort to include
> > > BibTeX in the spec, I summarized my argument against this on my
> > > blog:
> > >
> > > <http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5>
> >
> > | 1. BibTeX is designed for the sciences, that typically only cite
> > | secondary academic literature. It is thus inadequate for, and not
> > | widely used in, many fields outside of the sciences: the humanities and law
> > | being quite obvious examples. For this reason, BibTeX cannot by
> > | default adequately represent even the use cases Ian has identified.
> > | For example, there are many citations on Wikipedia that can only be
> > | represented using effectively useless types such as "misc" and which
> > | require new properties to be invented.
> >
> > We will probably have to increase the coverage in due course, yes.
> > However, we should verify that the mechanism works in principle before
> > investing the time to extend the vocabulary.
>
> I really don't think that a body like WHATWG is suited for this task.
> Especially since other groups have already been working on this exact
> issue.
>
> > | 2. Related, BibTeX cannot represent much of the data in widely used
> > | bibliographic applications such as Endnote, RefWorks and Zotero except
> > | in very general ways.
> >
> > If such data is important, we can always add support when this becomes
> > clear.
>
> What does this mean? When would it become clear? BibTeX's deficits have
> been clear for ages. About everyone who works in humanities knows that
> every bibliographic solution which has been introduced in the past was
> too limited. Why do we have to go through the same things over and over
> again? The problems of the current standards are known, that's why new
> solutions like biblatex or the bibliographic ontology have been
> developed.
On Wed, 10 Jun 2009, Bruce D'Arcus wrote:
>
> No; you should drop this proposal and move it to an experimental annex.
>
> If you do insist, against all reason, in pushing forward with this
> without modification, then I suggest you explain how this process of
> extension will work. If, as I suspect, it'll be another case of a
> centralized authority (you; who have admitted you really know nothing
> about this space), then that's a deal-breaker from my perspective.
>
> [...]
> The two biggest problems in bibtex are two properties:
>
> book
> journal
>
> They're a problem because they're both horribly concrete/narrow, and
> (arguably) redundant.
>
> If those were instead replaced with something more generic like either:
>
> 1) publication-title
>
> ... or, better yet ...
>
> 2) a nested/related object (call it "publication" or "container" or "isPartOf")
>
> ... then extension becomes easier. If I need to encode a newspaper
> article, then I just do:
>
> title = Some Article
> publication-title = Some Newspaper
>
> .. or (better, because I can attach other information to the container):
>
> title = Some Article
> publication = [ title = Some Newspaper ]
>
> As is, you need to add stuff like this just to resolve the problems
> I've repeatedly pointed out:
>
> newspaper-title
> magazine-title
> court-reporter-title
> television-program-title
> radio-program-title
>
> Aside: of course, some of the above could be collapsed into more
> generic stuff like "broadcast-title", but I'm just following the same,
> broken, approach as bibtex.
>
> This stuff isn't theoretical Ian. Just look through this wikipedia
> page, for example:
>
> <http://en.wikipedia.org/wiki/Guantanamo_Bay_detention_camp>
>
> The citations include references to legal cases and briefs, and news
> articles (television, radio and print). Your proposal doesn't cover
> this stuff.
>
> OTOH, applications like Zotero can.
>
> > | 4. The BibTeX model conflicts with Dublin Core and with vCard, both of
> > |    which are quite sensibly used elsewhere in the microdata spec to
> > |    encode information related to the document proper. There seems little
> > |    justification in having two different ways to represent a document
> > |    depending on whether it is THIS document or THAT document.
> >
> > I don't understand this point. Could you provide an example of this
> > conflict?
>
> Here's an academic article in an open access biology journal.
>
> <http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000082>
>
> THIS article refers to the metadata about the document proper, with
> the title "Accelerated Adaptive Evolution on a Newly Formed X
> Chromosome."
>
> The metadata about the documents referenced in text are included in
> the bibliography. This is what I mean by THAT document.
>
> My point -- and this is an important one -- is that one should be able to
> use the same mechanism to describe both, but still be able to
> distinguish them. I'd think this journal would insist on it.
>
> > | 5. Aspects of BibTeX's core model are ambiguous/confusing. For example,
> > |    what number does "number" refer to? Is it a document number, or an
> > |    issue number?
> >
> > What's the difference? Why does it matter?
>
> I can't find the example, but I've come across cases where one needed
> both an issue and document number. Since I haven't cited it, though, I
> guess you can leave it aside ;-).
>
> > | My suggestion instead?
> > | 1. reuse Dublin Core and vCard for the generic data: titles,
> > |    creators/contributors, publisher, dates, part/version relations, etc.,
> > |    and only add those properties (volume, issue, pages, editors, etc.)
> > |    that they omit
> >
> > This seems unduly heavy duty (especially the use of vCard for author
> > names) when all that is needed is brief bibliographic entries.
>
> On what basis do you make this claim? "[A]ll that is needed" for whom?
>
> I'll point out here that the article I link to above includes
> affiliation information for the authors.
>
> But this isn't the most critical point.
>
> > | 2. typing should NOT be handled by a bibtex-type property, but the same way
> > |    everything else is typed in the microdata proposal: a global
> > |    identifier
> >
> > Why?
>
> a) consistency; why introduce a new mechanism (from the standpoint of
> microdata)?
>
> b) flexibility (since I've made clear that bibtex is not adequate, and
> I have no intention of relying on the WHATWG to determine what's
> important)
>
> > | 3. make it possible for people to interweave other, richer, vocabularies
> > |    such as bibo within such item descriptions. In other words, extension
> > |    properties should be URIs.
> >
> > This is already possible.
>
> OK, possible; but hardly very easy. See above.
>
> > | 4. define the mapping to RDF of such an "item" description; can we say,
> > |    for example, that it constitutes a dct:references link from the
> > |    document to the described source?
> >
> > The mapping to RDF is already defined; further mappings can be done using
> > the "sameAs" mechanism.
>
> How so? I'm asking: what's the relationship between the document and
> the cited document?
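As an illustration of the nested container model Bruce suggests above
("publication = [ title = Some Newspaper ]"), here is a minimal Python
sketch. The class name, field names, and the container_title helper are all
hypothetical; they are not taken from BibTeX, microdata, or any other spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """A bibliographic item. All names here are hypothetical."""
    title: str
    item_type: str = "document"
    # The container this item is part of (newspaper, journal, book, ...).
    # Because it is itself an Item, it can carry its own metadata.
    publication: Optional["Item"] = None


def container_title(item: Item) -> Optional[str]:
    """Return the title of the containing publication, if there is one."""
    return item.publication.title if item.publication else None


# A newspaper article needs no "newspaper-title" property of its own;
# the newspaper is simply the article's container.
article = Item(
    title="Some Article",
    item_type="article",
    publication=Item(title="Some Newspaper", item_type="newspaper"),
)
```

With this shape, a television or radio citation reuses the same
"publication" slot with a different item_type, rather than requiring a new
"*-title" property per medium.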
On Wed, 10 Jun 2009 simon at simifilm.ch wrote:
>
> Related to this I want to remark on some things at a more general level: We
> currently experience major changes in the world of bibliographic
> software. At least, this is how I experience it. After years of limited
> and/or closed formats and models like BibTeX or Endnote we finally see
> new models like CSL or biblatex emerging which try to learn the
> lessons of the past. Of course, I do not know how things will evolve,
> but looking at the success of solutions like Zotero I think it's not so
> bold to say that things will change quite a bit in the coming years.
>
> And then we have HTML5, an emerging standard which is now getting
> support from the newest browsers. I know even less about how HTML5
> will evolve or what impact it will have on the web. But it's probably fair
> to say that widespread adoption of HTML5 will not happen overnight.
>
> Honestly, I really don't get why a coming web standard should support a
> bibliographic standard which is obviously outdated. The fact that BibTeX
> is widely used is really a non-argument, because if we follow this logic
> we won't have any development. By the same logic you should avoid
> something like <video>; after all, there isn't any support for it
> *yet*. If HTML5 wants to be forward-looking, it certainly shouldn't
> adopt a twenty-year-old standard but should instead try to support
> something new which is really up to date and has a chance of being useful
> in the future.
On Wed, 10 Jun 2009, Jonas Sicking wrote:
>
> [...] I'd prefer to see these things developed elsewhere. Mostly because
> the group of people with expertise in developing a better version of
> bibtex is not the people in this WG.
>
> I do think it's important to show that microdata is able to express
> something like bibtex. And I do think that the discussion in the past
> weeks has been interesting since people haven't actually been finding
> problems in microdata's ability to express something like bibtex, but
> rather in the exact bibtex format itself.
>
> But the exact microdata format does not seem productive to have here. It
> seems completely orthogonal to the rest of HTML, so there seems to be no
> win to put it in the HTML 5 spec.
>
> If bibtex-in-microdata can't gather enough interest outside of the HTML
> 5 spec, it probably is a bad spec.
On Thu, 11 Jun 2009, Simon Spiegel wrote:
>
> I completely agree with this conclusion. I also think that it would be a
> big mistake to include bibtex and then extend it later as Ian has
> suggested.
>
> Let me give a concrete example. Take the following bibliographic entry:
> Doe, John: Foreword. In: Doe, Jane: The Book. Middle-Earth 2008.
>
> What we have here is a chapter by an author in a book by someone else.
> This someone else is not the editor though, but the author of the book.
> This kind of text is fairly common in my field but it cannot be
> expressed in bibtex since bibtex originally only has fields for 'author'
> and 'editor', but not for 'bookauthor'.
>
> According to Ian, something like this could be covered by extending the
> bibtex vocabulary. For me, two problems pop up here:
>
> Who will decide how the vocabulary gets extended? And on what will these
> decisions be based?
>
> Now let's say that some kind of process to extend the bibtex vocabulary
> can be established and that the addition of a 'bookauthor' field will be
> decided. The problem then is that something gets added to bibtex which
> no existing bibtex style (and no other tool which can import bibtex)
> knows about. AFAIK only biblatex has a 'bookauthor' field. In other
> words: We then have data which is not usable with the traditional
> bibtex tools (they don't break, they just won't process the new fields).
> If bibtex gets extended (which would be absolutely necessary since all
> kind of additional fields are needed), we unavoidably end up with some
> kind of superbibtex which no tool in the world can process. In other
> words: We then have a new format which looks like bibtex but which
> cannot be used in a traditional bibtex workflow. At this point the whole
> argument why bibtex should be used in this spec breaks down. Ian is in
> favor of bibtex because it is widely used; but if we unavoidably end up
> with an unusable superbibtex, this argument becomes moot.
>
> If compatibility with existing formats is the main objective, we simply
> can't extend an old format like bibtex. If the goal is to cover
> substantially more than bibtex does, we need a different format.
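To make the "superbibtex" concern above concrete, here is a minimal Python
sketch of a consumer that only understands the classic BibTeX fields. The
field list is the standard set from the original BibTeX documentation as
best recalled, and should be treated as an assumption; split_fields and the
sample entry are hypothetical and do not model any particular tool.

```python
# Standard fields of classic BibTeX (assumed list; see lead-in).
CLASSIC_BIBTEX_FIELDS = frozenset({
    "address", "annote", "author", "booktitle", "chapter", "crossref",
    "edition", "editor", "howpublished", "institution", "journal", "key",
    "month", "note", "number", "organization", "pages", "publisher",
    "school", "series", "title", "type", "volume", "year",
})


def split_fields(entry):
    """Split an entry into fields a classic tool would process and
    the names of fields it would silently skip."""
    known = {k: v for k, v in entry.items() if k in CLASSIC_BIBTEX_FIELDS}
    ignored = sorted(k for k in entry if k not in CLASSIC_BIBTEX_FIELDS)
    return known, ignored


# Simon's example: a foreword by one author in a book by another author.
foreword = {
    "author": "Doe, John",
    "title": "Foreword",
    "bookauthor": "Doe, Jane",
    "booktitle": "The Book",
    "address": "Middle-Earth",
    "year": "2008",
}

known, ignored = split_fields(foreword)
```

Nothing breaks, but the book's author is lost on the consumer side, which is
the behaviour described above: the extended data looks like bibtex without
being usable in a traditional bibtex workflow.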
On Thu, 11 Jun 2009, David Gerard wrote:
>
> I was about to mention Wikipedia! The citation templates there would be
> an excellent set of examples of what a citation format would need to
> cover in practical use. See:
>
> http://en.wikipedia.org/wiki/Category:Citation_templates
>
> There's a lot there, but many aren't that heavily used. You can see how
> many uses there are of a template, or if there are any at all, by going
> to the template page and clicking on "What links here" in the sidebar.
> The ones whose name starts "Template:Cite ..." include the biggies.
>
> These constitute a bunch of special cases, but you'll be pleased to know
> that similar templates tend to get combined with time. I certainly
> wouldn't suggest a set of special cases in a spec for this. But these
> will be useful for ideas and examples of what sort of citations are in
> demand on the web.
On Thu, 11 Jun 2009, Bruce D'Arcus wrote:
>
> My immediate concern has been this particular use case, and I've been
> assuming that the microdata proposal will be included in HTML5.
>
> In a vacuum, I think microdata is fine technically.
>
> In the context of an existing spec that covers the same use cases
> (RDFa), I think it's creating unnecessary and unproductive duplication.
>
> Just to go back to the use case I'm focusing on here, it puts metadata
> producers and consumers in the awkward position of likely having to
> support two different specs, which means double work with no obvious
> benefits.
> This is happening JUST as RDFa is starting to be implemented by major
> players, and starting to build up a head of steam in terms of tools.
>
> And to put this in some context, the only reasonable technical point
> that Ian has made in favor of throwing out RDFa and creating a new spec
> is the prefix issue. But I have a really hard time seeing how prefixes
> are so onerous a burden as to justify the costs (to the WHATWG, and to
> metadata producers and consumers) of creating and maintaining a new
> spec.
>
> FWIW, some possibly relevant background from the OpenDocument
> experience:
>
> To make a long story short, ODF 1.2 will have an extensible metadata
> system based on RDF/XML (for in-package metadata) and a subset of
> RDFa (for embedded). Getting to this solution was a long and torturous
> process, and the original proposal effectively forked RDFa by requiring
> fully unqualified URIs for names. The technical reasons were
> more-or-less the same as those that drove Ian to invent an entirely new
> spec: that in a GUI environment where users are copy-and-pasting
> content, dealing with prefixes was an additional burden on implementers.
> In addition, people don't hand author ODF files, so prefixes have no
> authoring benefit.
>
> In the end, though, I understand the ODF TC decided to include prefixes,
> since implementers found the burdens largely theoretical (OpenOffice
> should see an initial implementation in 3.2 I understand), and because
> in general the group prefers to stick as closely to existing specs as
> reasonable.
>
> On predefined vocabularies, we thought about doing something similar
> informally, but decided it was out-of-scope; better initially to put a
> solid extensible system in place and let developers start working with
> it.
>
> My work on the Bibliographic Ontology was in part done with that in
> mind, though it has the added benefit that it can be repurposed for RDFa in
> XHTML.
Cheers,
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'