[whatwg] Mathematics in HTML5

Sun Jun 4 06:23:57 PDT 2006

James Graham wrote:
>
> I could go on but
> at least in academic  fields, LaTeX is either the only format accepted
> for publication or the  preferred format.

In mathematics and theoretical physics, sure. In the rest of science? I
doubt it. In chemistry, for example, LaTeX is not the preferred format.

> Note also that very very few people have the slightest interest in the
> publishing process itself. They simply wish to achieve high quality
> results at a  minimum of effort. This means that they will not be
> prepared to invest any time  in learning a new language, particularly
> one that is not already widely accepted  (chicken and egg problem) or is
> harder to use than the language they are  familiar with.

People learned to use typewriters, then computers, then word processors,
then TeX and LaTeX, and e-mail, and HTML...

The key is that you learn any new tool when it is useful and solves
problems. TeX-LaTeX solves only a small subset of real-life problems, which
is the reason it is not popular except in some academic communities. The
only really good point of TeX-LaTeX systems is mathematical typesetting;
text, graphics, diagrams, and other items are best done with different
systems and approaches.

> You may think I
> am overstating this but I disagree - bear in mind  that a significant
> fraction of astronomical (chosen merely because it is the  field I know
> best) software is written in Fortran 77. For many of these people
> almost 30 years of language design has never happened.

If Fortran 77 fulfills their needs, they have no reason to change; if it
does not, they will adopt Fortran 90, or C++, or Java, or Maple, or
anything else.

There are old academicians still using ordinary mail for communicating
with colleagues. Is this an argument against e-mail? When designing a new
communication model, should we think only of the subset of people who love
ordinary mail?

> So, in general the people likely to be publishing mathematical content
> to the  internet have _no_ interest in writing their content in any
> format other than  LaTeX and especially not to a verbose format of the
> type that fits the XML data  model.

I am always perplexed by the double standard of TeX people. They rudely
criticize the mathematical typesetting of programs such as MSWord. They
like to use the word “unprofessional” when ranking many non-TeX systems.
However, most web pages generated from TeX-LaTeX systems are really
unprofessional, even within that small subset of static and boring academic
webpages.

People have abandoned TeX-LaTeX in favor of better approaches in many
places. Some weeks ago I received a draft of a manuscript prepared by a
mathematician, which will probably be published in the MSOR journal soon.
He is not using TeX or LaTeX because of their limitations, and he writes:

<blockquote>
Mathematicians have been served well by TeX and LaTeX for their
mathematical typesetting. Too well, perhaps. At least, if a dedicated
TeXnician of the last ten years has a chance to \relax and look about
himself he will see that the rest of the world has moved on in several
incompatible ways to the cosy world of TeX.
</blockquote>

>This is why the web is liberally
> sprinkled with the ugly gif output of  latex2html. If we want this
> situation to change, the _only_ solution is to allow  LaTeX as a
> document creation format.

For the creation of unprofessional webpages or electronic documents? Okay.
Anyone can create low-quality webpages using “save as” in MSWord, but if
you want professional webpages then MSWord is not the correct tool. Similar
thoughts apply to TeX-LaTeX.

As an exercise, let me comment on the IteX output in one of your pages. I
will not review your web page “I'll go and play with words and pictures”
as a whole, and I will say nothing about the quality of the rest of the
web design nor of its typesetting.

You begin from an IteX source (a dialect of LaTeX) and then present the
generated MathML output. Then you claim

<blockquote>
It's pretty clear which version is easier to enter, read and maintain.
</blockquote>

Well. It is clear that IteX is easier to enter and read than MathML. But
if you use this as an argument in favor of IteX, then let me say that
ASCIIMath is easier still to enter and read. Therefore, if ease of entry
really matters, one would discard IteX and other TeX-LaTeX approaches.

However, IteX is not easier to maintain. If you are looking for a basic,
unprofessional encoding of mathematical formulae, then IteX is okay, but
if you are looking for a professional encoding of formulae, IteX is not
good enough, and this will oblige you to learn CSS, XSL-FO, and p-MathML
for fine-tuning, and maybe DOM, JavaScript, or c-MathML (or even OpenMath)
if you want to add interactivity and semantics to your encoding.

At the same time, IteX is not useful for the rest of the webpage content
(images, links, menus, text, metadata), nor for the preparation of
electronic scientific datuments. This will oblige you to learn HTML or
other systems.

Take the MathML code you generated with your Linux/x86 binary

<math xmlns='http://www.w3.org/1998/Math/MathML' display='block'>
  <msub>
    <mo>∮</mo>
    <mtext>loop</mtext>
  </msub>
  <mstyle fontweight="bold">
     <mrow>
       <mi>H</mi>
     </mrow>
  </mstyle>
  <mo>⋅</mo>
  <mrow>
      <mi>d</mi>
    <mstyle fontweight="bold">
       <mrow>
          <mi>l</mi>
       </mrow>
    </mstyle>
  </mrow>
  <mo>=</mo>
  <msub>
    <mi>I</mi>
    <mrow>
      <mtext>free</mtext>
    </mrow>
  </msub>
  <mo>+</mo>
  <msub>
    <mo>∫</mo>
    <mtext>surface</mtext>
  </msub>
  <mfrac>
    <mrow>
       <mo>∂</mo>
       <mstyle fontweight="bold">
         <mrow>
           <mi>D</mi>
         </mrow>
       </mstyle>
    </mrow>
    <mrow>
      <mo>∂</mo>
      <mi>t</mi>
    </mrow>
  </mfrac>
  <mo>⋅</mo>
  <mi>d</mi>
  <mstyle fontweight="bold">
    <mrow>
      <mi>s</mi>
    </mrow></mstyle>
</math>

The first trouble is that the structure of the MathML code is very wrong.
TeX-like systems are token-based systems designed for fixed layouts. The
web and electronic publications are different, and good structure matters.
For example, good structure helps when breaking large formulae in liquid
layouts, and it is essential when copying and pasting fragments or
manipulating substructures with specialized tools. In fact, the point that
TeX does not correctly structure mathematics is one reason it was rejected
as a basis for MathML, as it was for every other (SGML, XML...)
mathematical or scientific markup.

The second trouble is the use of MathML entities. This can cause problems
in data interchange if the receiving tool or document cannot access the
DTD entity declarations.
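
One way around the entity problem is to rewrite named entities as numeric
character references, which need no DTD. A minimal sketch, not something
taken from IteX itself: the function name entities_to_numeric is
hypothetical, and the lookup table is Python's standard html.entities.html5
table, which is assumed here to cover the MathML entity names of interest.

```python
import re
from html.entities import html5

# The five entities predefined by XML itself must stay untouched.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}


def entities_to_numeric(markup: str) -> str:
    """Rewrite named entities such as &oint; as numeric character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            # Unknown name: leave it as written rather than guess.
            return match.group(0)
        # Some entity names expand to more than one code point.
        return "".join(f"&#x{ord(c):X};" for c in chars)

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


if __name__ == "__main__":
    print(entities_to_numeric("<mo>&oint;</mo><mtext>a &lt; b</mtext>"))
```

The result can be parsed by any XML tool without access to the MathML DTD.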

The third trouble is the modification of the tree structure by the
addition of an <mstyle> tag. Presentational markup should be as unintrusive
as possible, which is one of the reasons the old <font> tag of HTML was
replaced by the style attribute on HTML elements. A related difficulty is
the use of the fontweight attribute, which is presentational and
therefore to be discouraged. Using a bold attribute in math is as tedious
as using <b> instead of <strong> in HTML.

I have no problem accepting that unprofessional markup if the webpage were
a schoolwork document generated by a 15-year-old student (just as I would
not oblige a student to present me a 1200 dpi document printed with
LaTeX). However, I would oblige an academician to encode it using the
type="vector" attribute. Ah! And do not forget that fontweight is
deprecated in MathML 2.0.

The use of a special mstyle tag is tedious, but I find the redundant mrow
around the H token more perplexing. I find the code as foolish as a
mathematician would find the expression x((x-1)) when you mean x(x-1);
moreover, there are well-known technical difficulties with redundant mrows
(at least in Gecko engines). Mozilla itself recommends avoiding any
unnecessary mrow, whitespace, or markup.

The IteX-generated markup

  <mstyle fontweight="bold">
     <mrow>
       <mi>H</mi>
     </mrow>
  </mstyle>

should be encoded as

<mi type="vector">H</mi>

This double error is also present in other parts of the code.
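
The redundant-mrow half of this error can be removed mechanically. A
minimal sketch, assuming the MathML is well-formed XML in the MathML
namespace and contains no undeclared named entities (the standard parser
used here rejects those); the function name unwrap_single_child_mrows is
hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
MROW = f"{{{MATHML_NS}}}mrow"


def unwrap_single_child_mrows(elem):
    """Replace each attribute-free <mrow> wrapping exactly one element by that element."""
    for index, child in enumerate(list(elem)):
        # Clean the subtree first, so nested redundant mrows collapse bottom-up.
        unwrap_single_child_mrows(child)
        is_redundant = (
            child.tag == MROW
            and len(child) == 1
            and not child.attrib
            and not (child.text or "").strip()
            and not (child[0].tail or "").strip()
        )
        if is_redundant:
            only = child[0]
            only.tail = child.tail  # keep whatever followed the mrow
            elem[index] = only


if __name__ == "__main__":
    ET.register_namespace("", MATHML_NS)
    source = (
        '<math xmlns="http://www.w3.org/1998/Math/MathML">'
        '<mstyle fontweight="bold"><mrow><mi>H</mi></mrow></mstyle>'
        "</math>"
    )
    root = ET.fromstring(source)
    unwrap_single_child_mrows(root)
    print(ET.tostring(root, encoding="unicode"))
```

Because only single-child mrows are unwrapped, the number of arguments of
elements such as msub or mfrac is left unchanged.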

Another disappointing point is the encoding of the differential. The
differential is encoded as a simple variable d. There exist special
entities defined in the MathML DTD, and also special Unicode characters,
and the truth is that those special characters were designed with
accessibility in mind. Still, if for some reason the author does not want
to use the special differential character, one can easily see that the
differential is not a variable or identifier but an operator. Therefore,
<mo>d</mo> is more accurate. The same error appears in the other integral.

Again I find a redundant <mrow> around the “free” text fragment.

The code is, as we can see, very deficient, even ignoring accessibility
issues. Note that vector quantities are rendered in a bold italic font.
Many authors and some journals prefer a roman font for vectors. Imagine you
have 5 electronic documents containing 10 equations each. Either you learn
MathML (and then you are obliged to study three or four languages even for
the simplest tasks) and modify the 50 equations by hand, or you modify the
IteX source. Since the IteX source is presentational, you would have to
change each \mathbf in the 50 equations (even with a macro or an automated
search and replace, the task wastes time). Next you would have to parse the
source again to generate new MathML markup, which would then be uploaded. I
do not call that maintainable (and this is the reason most academic
publishers do not use TeX-like code in their publishing/archiving
systems).

I would use a more solid HTML-Math approach and a standard CSS external
stylesheet. I could change the rendering of millions of pages in my site
with a simple change in a CSS rule.

Moreover, there are additional problems when trying to add dynamism or
links to the code when using IteX code.

How would one encode this example in HTML-Math? Well, that may be debated
here, but a working possibility could be (I use MathML entities for
convenience; they could be substituted by Unicode characters)

<df>
∮<sub>loop</sub>
<var class="vc">H</var>·<var class="df">d</var><var class="vc">l</var>
 = <var>I</var><sub>free</sub> + ∫<sub>surface</sub>
<frac>
<num>∂<var class="vc">D</var></num>
<den>∂<var>t</var></den>
</frac>·<var class="df">d</var><var class="vc">s</var>
</df>

which of course can be changed and improved in many ways. Note that there
is more information than in the original IteX source after translation to
MathML (for example, I am encoding differentials). If the main goal is
simplicity of markup and one just wants to replicate the IteX results, then
one could try something like

<df>
∮<sub>loop</sub>
<var><b>H</b></var>·d<var><b>l</b></var>
 = <var>I</var><sub>free</sub> + ∫<sub>surface</sub>
<frac>
<num>∂<var><b>D</b></var></num>
<den>∂<var>t</var></den>
</frac>
·d<var><b>s</b></var>
</df>

or, simpler still (more IteX-like),

<df>
∮<sub>loop</sub>
<b>H</b>·d<b>l</b>
 = <i>I</i><sub>free</sub> + ∫<sub>surface</sub>
<frac>
<num>∂<b>D</b></num>
<den>∂<i>t</i></den>
</frac>
·d<b>s</b>
</df>

If you compare this last version (containing the same information as the
IteX source), then you can see that HTML-Math is not much more complex than
IteX. Look at the following pairs:

∮<sub>loop</sub>

\oint_\text{loop}

Note that the MathML entity is longer than the TeX command. Using the
Unicode character, the verbosity of HTML-Math is the same.

<b>H</b>

\mathbf{H}

d<b>l</b>

{d\mathbf{l}}

<i>I</i><sub>free</sub>

I_\text{free}

∫<sub>surface</sub>

\int_\text{surface}

Here too the main difference arises because I used a MathML entity. The
verbosity is very close to that of IteX when using the Unicode character.

<frac><num>∂<b>D</b></num><den>∂<i>t</i></den></frac>

\frac{\partial \mathbf{D}}{\partial t}

This is by far more verbose in HTML-Math. The MathML entities add
verbosity over the \partial command. Using the Unicode character

<frac>
  <num>∂<b>D</b></num>
  <den>∂<i>t</i></den>
</frac>

Of course there is some room for improvement. For example, HTML lets you
omit end tags in cases where there is no possibility of confusion. The
same option was available in SGML 12083 math. Therefore,

<frac>
  <num>∂<b>D</b>
  <den>∂<i>t</i>
</frac>

or

<frac>∂<b>D</b><den>∂<i>t</i></frac>

would be valid markup.
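
The character counts behind this comparison are easy to check. A small
sketch using the pairs quoted above, with Unicode characters in place of
entities; the names PAIRS and compare_lengths are only illustrative.

```python
# (HTML-Math fragment, IteX fragment) pairs copied from the comparison above.
# \u222e is the contour integral, \u222b the integral, \u2202 the partial sign.
PAIRS = [
    ("\u222e<sub>loop</sub>", r"\oint_\text{loop}"),
    ("<b>H</b>", r"\mathbf{H}"),
    ("d<b>l</b>", r"{d\mathbf{l}}"),
    ("<i>I</i><sub>free</sub>", r"I_\text{free}"),
    ("\u222b<sub>surface</sub>", r"\int_\text{surface}"),
    (
        "<frac><num>\u2202<b>D</b></num><den>\u2202<i>t</i></den></frac>",
        r"\frac{\partial \mathbf{D}}{\partial t}",
    ),
]


def compare_lengths(pairs):
    """Return (html, tex, len(html), len(tex)) for every pair."""
    return [(html, tex, len(html), len(tex)) for html, tex in pairs]


if __name__ == "__main__":
    # Print only the counts and the ASCII IteX fragment for each pair.
    for _html, tex, n_html, n_tex in compare_lengths(PAIRS):
        print(f"HTML-Math {n_html:3d}  IteX {n_tex:3d}  {tex}")
```

Counting characters is only a rough proxy for verbosity; it says nothing
about how easy each form is to type or to read.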

In any case, only with fractions do you find significant verbosity over
IteX, and with none of the disadvantages of using a non-web markup. Now
imagine that you want to add some behavioral properties to HTML-Math,
fine-tune the visual rendering, change the visual style of a numerator for
didactic reasons, and so on. You will find further difficulties when using
a TeX-LaTeX-IteX source that must be translated to HTML-XML. And what if I
send a document? Would I send the source? The final HTML? Both?

LaTeX-like converters generate basic webpages with an unprofessional (and
boring) look. I see no reason for limiting the capabilities of a web markup
to satisfy a subset of academicians who do not want to spend their time
learning better markup languages. Just as HTML was not designed with LaTeX
as a “document creation format” in mind but was derived from the solid and
sophisticated SGML, I think that HTML-Math cannot be based on LaTeX but
should be based on SGML math (ISO 12083), on XML-MAIDEN, or on a similar
approach.

I think that this is so obvious that no time should be devoted to
discussing it. Of course, anyone is free to develop a LaTeX to HTML-Math
translator if they wish, just as anyone is free to use tables for layout;
simply note the limitations of the approach.

> If, or whatever reason MathML is a poor
> target language for TeX->foo converters then maybe we should talk about
> improving the situation. But authors _will_not_ learn anything other
> than LaTeX.

They will learn if they need to solve problems that are not solved by
LaTeX, just as they learn Java when programming, just as they learn e-mail
when communicating, just as they learn to use Adobe Reader for printing
PDFs, just as they learned Mathematica when they needed to compute a
1000-term summation in a partition function.

> I should say that, as far as I can tell, using LaTeX as the input
> language isn't  the accessibility disaster that you make out.

You? Have you noticed that LaTeX was ignored by Maple, Mathematica, ISO
12083, EuroMath, MathML, OpenMath...

You cite, for example, ASTER. It is true that ASTER was a breakthrough
some time ago, but it cannot be considered the last word on the topic.
Obvious limitations of ASTER are the absence of any kind of navigation
support for math formulae, and that with complex math expressions it
becomes very difficult to follow the audio output effectively. Since it
relies on LaTeX input, it can be ambiguous. How would ASTER read the
following TeX fragments?

f(x-1), a(x), dy, Df, Df, Df, b^2, x^2, x^2.

> directly in  an XML language, the verbosity of the output language is
> almost irrelevant

Is it? Then why was the MathML WG so worried about the verbosity of
fine-grained parallel MathML markup, to the point that they provided an
alternative _less verbose_ encoding?

Juan R.

Center for CANONICAL |SCIENCE)