[whatwg] Mathematics in HTML5

James Graham jg307 at cam.ac.uk
Tue Jun 6 07:49:26 PDT 2006


juanrgonzaleza at canonicalscience.com wrote:
> James Graham wrote:
>> I could go on but
>> at least in academic  fields, LaTeX is either the only format accepted
>> for publication or the  preferred format.
> 
> In mathematics, and theoretical physics sure, in rest of science? I doubt.
> In chemistry, LaTeX is not preferred for example.

Not just in theoretical physics, but in all varieties of physics that I have
ever encountered. Nor, as far as I can tell, is the widespread use of LaTeX just
limited to the mathematics and physics communities. It is also, for example, one
of four accepted submission formats of the Royal Society of Chemistry (Word,
WordPerfect, RTF, LaTeX), the only format accepted by Electronic Notes in
Theoretical Computer Science and the only acceptable format for IEEE
Transactions on Wireless Communications. In general, Googling for these
examples, I was unable to find a single print journal which accepted
electronic submissions but did not accept LaTeX as a format. Indeed, it
is the _only_ hand-authored format accepted by the journals I encountered on my
brief search, except for one online-only robotics journal which published
in HTML and accepted submissions in HTML. Even in that case, the submissions
page is quick to suggest a LaTeX to HTML workflow, implying that engineers are
another group who often work with LaTeX, a speculation lent credence by
http://www.eng.cam.ac.uk/help/tpl/textprocessing/ which contains an extensive
set of notes for engineers on using LaTeX and begins "TeX is a powerful
text processing language and is the required format for some periodicals
now".

Of course using Google to turn up a few journals hardly makes for a good sample 
and you can no doubt provide counter-examples but it is extremely disingenuous 
to suggest that only pure mathematicians and a small subset of physicists 
commonly use LaTeX - it is clearly in very widespread use wherever mathematical 
communication is required.


> The key is that you learn any new tool when it is useful and solves
> problems. TeX-LaTeX solves a minimum subset of problems of real life and
> reason is not popular except in some academic communities. The only really
> good point of TeX-LaTeX systems is on mathematical typesetting; textual,
> graphical, diagrams, and others items are best done with different systems
> and approaches.

Ah. That would be called "doing one thing and doing it well". I've heard
that it's commonly believed to be a good design principle. In this case,
the problem I would like to solve is "how do we typeset mathematics on
the internet so that people actually use the technology rather than
ignoring it into oblivion"? We've already determined that LaTeX solves
the same problem offline so it seems like a reasonable place to start
when addressing the question for online publishing.

>> You may think I
>> am overstating this but I disagree - bear in mind  that a significant
>> fraction of astronomical (chosen merely because it is the  field I know
>> best) software is written in Fortran 77. For many of these people
>> almost 30 years of language design has never happened.
> 
> If Fortran 77 fulfills the needs they have no reason for the change but if
> it does not fulfill then they will adopt Fortran 90, or C++, or Java, or
> Maple, or anything else.

Technically, all the languages you've suggested are clearly better than Fortran 
77. They don't have irritating limitations like fixed column numbers. They have 
_very_ useful features like dynamically allocatable data structures. It would 
make many people's lives better to migrate away from Fortran 77. But they 
don't - because they are in the business of doing research, not learning new 
technology - so they are always in a metastable state which perhaps doesn't 
provide the most long term benefit but does work well at any given moment. (Of 
course some people embrace new technology, particularly if it is relatively easy 
to use. But don't be fooled into thinking that people will use new technologies 
just because they are in some global sense "better" than the ones they are 
familiar with, particularly if there is no easy path from here to there).

> There are old academicians still using ordinary mail for communicating
> with colleagues. Is this an argument against e-mail or when designing a
> new communication model would we think in a subset of guys loving ordinary
> mail?

Well it maps pretty well to ordinary mail. For example an email address
like jg307 at cam.ac.uk corresponds to the addressing format
commonly used in ordinary mail (starts with the name, becomes more
general toward the end). But more importantly, there are a number of
immediately obvious and tangible benefits to email. In particular the
fact that it is near instant. I don't see anything in your proposals
that offers anything like the same level of obvious and tangible benefits.

> I always am perplexed of double measurement scale of TeX-people. They
> rudely critique mathematical typesetting of programs such as MSWord.

I'm not a "TeX-person", merely a LaTeX user and, in the context of this
discussion, my "pro-LaTeX" stance is merely a practical one; I have come
at it by considering the needs of the audience, not through a desire to
advocate one particular technology. Nor have I mentioned MSWord, except
as an accepted format for submission to some academic journals. Indeed, if 
anything I am quite an anti-LaTeX person - I would never consider using it for a 
poster or slide presentation for example. I have, however, used LaTeX to create 
the equations for a poster and embedded the resulting postscript into another 
package. That is closer to the level of interaction I am advocating.

> However, most of web pages generated from TeX-LaTeX systems are really
> unprofessional even at that small subset of static and boring academic
> webpages.

Indeed. But there are two main reasons for that:
1) latex2html sucks

2) Academics have no interest in learning any language other than LaTeX
(did I say that already?). They have to use LaTeX to prepare documents
for publication, it is the only language they know for typesetting
mathematics and, in general, the web is not their major target medium.
LaTeX generated websites tend to be html representations of lecture
notes or papers that are primarily designed for consumption in paper or
PDF formats. So the html version only exists at all because it is
relatively little effort to produce it in addition to the main
publication format. When that is not the case, there will simply be no
html version provided.

> People abandoned TeX-LaTeX in favor of best approaches in many places.

Where? In no journal I could find. If you mean publishers, for archival, that is 
irrelevant because, on the web, most content is created by individuals who are 
not publishers by profession. The tools suitable for the two groups are quite 
different.

> Some weeks ago I received a draft of manuscript prepared by a
> mathematician and will probably be published in MSOR journal in brief. He
> is not using TeX or LateX because limitations and write:
> 
> <blockquote>
> Mathematicians have been served well by TeX and LaTeX for their
> mathematical typesetting. Too well, perhaps. At least, if an dedicated
> TeXnician of the last
> ten years has a chance to \relax and look about himself he will see that
> the rest
> of the world has moved on in several incompatible ways to the cosy world of
> TeX.
> </blockquote>

So one person contacted you and made a comment, which has no substantial content 
I can discern? What's your point? Who are the rest of the world and where are 
they? Why should I listen to this person? For comparison I did a straw poll of 
two people who I work with, asking "will astronomers ever be prepared to learn 
languages other than LaTeX for typesetting mathematics?" they both answered 
"no". But I don't think it's really meaningful enough to talk about.

>> This is why the web is liberally
>> sprinkled with the ugly gif output of  latex2html. If we want this
>> situation to change, the _only_ solution is to allow  LaTeX as a
>> document creation format.
> 
> For creation of unprofessional webpages or electronic documents? Okay.
> Somewhat as anyone can create low quality webpages using “save as” in
> MSWord, but if you want professional webpages then MSWord is not the
> correct tool. Similar thoughts apply to TeX-LaTeX.

That doesn't follow at all. For example Google are successful in making
excellent HTML+js applications starting from Java. If I write a program
in C it's likely to be much better than an equivalent one I write in
assembler. Writing a document in Docbook and converting to postscript is
much easier than writing in postscript directly. Computing is full of
examples of people writing in one language and transforming to something
else for consumption. I am merely stating that for any meaningful
adoption of our chosen output format, it must be compatible with the
chosen high-level format of the majority of research scientists - LaTeX.

> As an exercise let me comment ITeX output in one of your pages. I will not
> review your web page “I'll go and play with words and pictures”, and I
> will say nothing on the quality of the rest of web design not in its
> typesetting.

(Nice job on the subtle implication that because my webpages won't win
any awards for beauty I have no business in a technical discussion, by
the way. In return I won't mention any of the dubious content on your
webpages :) )

> You begin from an IteX source (a dialect of LaTeX) and next present the
> MathML output generated. Then you claim
> 
> <blockquote>
> It's pretty clear which version is easier to enter, read and maintain.
> </blockquote>
> 
> Well. It is clear that IteX is easier to enter and read than MathML. But
> if use this as an argument in favor of IteX then let me say that ASCIIMath
> is still easier to enter and read. Therefore if easiner reallt matter one
> would discard IteX and other Tex-LaTeX approaches.

But is ASCIIMath so expressive? It certainly isn't so widely known.
Therefore it won't be so widely adopted. I seem to have to keep
repeating this point that compatibility with existing technology is
important. The existing technology in the field of mathematical authoring is LaTeX.

> However, IteX is not easier to maintain. If you are looking for basic
> unprofessional encoding of mathematical formulae, then IteX is okay, but
> if you are looking for professional encoding of formulae, IteX is not good
> enough and this will obligate to you to learn CSS, XSL-FO, and p-MathML
> for fine-tuning and maybe DOM, Javascript, or c-MathML (or even OpenMath)
> if you want add interactivity and semantics to your encoding.

No professional I know wants to do that though. They just want to
present mathematical equations in a sane way. GIFs are not a particularly sane 
way - they are ugly and do not scale with the text on the page but, despite 
this, the evidence of the web suggests that they are the best we currently have. 
MathML is not sane - it is too hard to author. ITeX, though far from perfect, is 
much much better.

[lots of irrelevant junk about the itex2mathml output]

> Another point of disappointment is in the encoding of the differential.
> The differential is encoded as a simple variable d. There exist special
> entities defined in MathML DTD and also special Unicode fonts and the true
> is those special character were designed with accessibility in mind.
> Still, if by some reason the author wan not use the special differential
> character, one can easily see that differential is not and variable or
> identifier but a operator. Therefore, <mo>d</mo> is more accurate. The
> same error appears in the other integral.

First of all, that's a false argument. I could just as easily have gone
through my character map, found the differential-d and used that in the
formula when writing the ITeX as I could when writing the MathML. The
problem is, in absolute terms it's much harder than just writing "d". This is a 
big problem I have with any solution that requires the extensive use of 
characters not found on a standard US keyboard. The single best idea I've seen 
in this entire discussion is the text-transform:math* properties for CSS.

Now consider: what extra information would we gain by going through our
character map application until we find the right codepoint to express
the d operator? Presumably in each case a visual UA will display almost
exactly the same thing. An aural UA will probably read out "d x" (or
whatever) in either case, in the same way a human would. I guess a
hypothetical computer algebra package that can accept input from the web
might get confused but that seems like such a marginal case that it's
hardly worth optimizing for with the price of damaging language adoption.

> The code is how we can see very deficient even ignoring accessibility
> issues. Note that vectorial quantities are rendered in italic bold font.
> Many authors and some journals prefer roman font for vectors. Imagine you
> have 5 electronic documents containing 10 equations each one. Either you
> learn MathML (and then you are obligated to study three or four language
> even for simplest tasks) and modify by hand the 50 equations or either you
> modify the IteX source. Since the IteX source is presentational, you would
> change each \mathbf in the 50 equations (even using a macro or an
> automated search and replace the task wastes time).

Of course, as an author, one can improve this in TeX by writing a single
\vec macro that changes the formatting to a vector style. Then it is a
simple matter to change vector formatting everywhere with a single
simple change to the macro definition. So, if one wanted to make life
easy for LaTeX authors who envisioned targeting the web, one could
provide a package that would add some mapping onto the more semantic
constructs of the target language. But the majority of authors will have
legacy content that does not use these features and it must be possible
to convert that legacy content to the new output format if you want it
to gain any traction, even if the content produced is not so suitable
for wholesale style changes at a later date (which is a feature that
authors have lived without for years).
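
To make the macro idea concrete, here is a minimal sketch of what I have
in mind. The example equation is purely illustrative, and this assumes
the author is happy to redefine LaTeX's built-in \vec (which draws an
arrow by default):

```latex
\documentclass{article}
% A single definition controls how every vector in the document is typeset.
% Here vectors come out upright bold. To switch style everywhere, change
% only this line, e.g. to \boldsymbol{#1} (with \usepackage{amsmath})
% for bold italic.
\renewcommand{\vec}[1]{\mathbf{#1}}
\begin{document}
% Illustrative equation only: the source marks up meaning, not a font.
Newton's second law: $\vec{F} = m\vec{a}$.
\end{document}
```

The equations themselves only ever say \vec{...}, so a converter could
in principle map that one macro onto whatever the target language offers
for vectors.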

> How do encode this example in HTML-Math? Well, that may be debated here
> but a workling possibility could be (I use MathML entities by commodity,
> they could be substituted by Unicode)

[lots of musings about language design]

Note that designing a markup language that can represent maths is
trivial by comparison to the task of making people use that language. My
point throughout is that if you want people to use the language then
backwards compatibility is key.

> And what if I send a document? Would I send the source? The
> final HTML? Both?

It depends who you're sending it to, and for what purpose, obviously. If
you were sending it to a coworker for editing, you would send the
original source. If you've done something fancy manipulating the DOM of
the final output, you would have to send that. It's no different to
LaTeX-postscript (or any other conversion process) - in 99% of cases where the
postscript can be regenerated from the original source, you edit the LaTeX,
in the 1% of cases where you manually edited the postscript file, you'll
have to work with that from now on.

> I see no reason for limiting capabilities of a web markup by
> satisfying a subset of academicians who want not waste their time on
> learning best markup languages. 

I see no point in wasting time designing a document markup language that
will be roundly ignored by ~100% of the people creating content.

> Somewhat as HTML was not designed with
> LaTeX as a “document creation format” in mind but was derived from solid
> and sophisticated SGML

But HTML succeeded for 2 reasons:
1) It was simple (consider the relative semantics on offer with Docbook,
for example, and the relative popularity of each)
2) It wasn't SGML. At least not for long. Browsers brought an
unprecedented ease of authoring to HTML. Sure, it has come back to bite us
now, but the fact that you could send almost any garbage to a browser
and get something rendered on the screen made HTML accessible to people
who wouldn't otherwise have been authors.

>> I should say that, as far as I can tell, using LaTeX as the input
>> language isn't  the accessibility disaster that you make out.
> 
> you? Have you noted that LaTeX was ignored by Maple, Mathematica, ISO
> 12083, EuroMath, MathML, OpenMath...

Yeah and look at how many authors are using those to create content
(note: the primary function of Maple and Mathematica is computer
algebra, not document creation). They may be used by big publishers, I
don't know, but that's utterly irrelevant. The web is primarily a self-published
medium and so things have to be easy for individual authors. Big
publishers also use Docbook but that doesn't mean we should be trying to
use it on the web. Those creating mathematical content largely use
LaTeX. If our publishing solution is not designed so that LaTeX -> foo
converters produce good-looking output then the exercise will be as
futile as XHTML2 is looking to be.


