[whatwg] Mathematics on HTML5

juanrgonzaleza at canonicalscience.com
Fri May 26 06:58:41 PDT 2006


I have read this program with great interest, and I would recommend
reconsidering the role of mathematical markup in HTML5. But first I would
like to explain my position. Initially, I believed that web authoring was
the "Save As" command in MS Word. Next I began to work with a real HTML
tool and soon discovered the XML world. Then, seeing all the hype around
the exciting new technology, I decided to build a pure XML website: XHTML
for text, CML for chemistry, MathML for maths, SVG for graphics, XSLT for
programming, XSL-FO for style...

Big mistake! Later, because of the difficulties of implementing young
technologies, cross-browser (in)compatibility, and so on, I narrowed my
focus to XHTML+MathML and used CSS as the first styling language (because
browsers do not support FO today, I told myself). More errors! After that
I learned DOM and JavaScript, because something as simple as a drop-down
menu cannot be done in XSLT (it is not meant for dynamic pages).

Now I have learned XSL-FO, and even if it were implemented in browsers
tomorrow I would *not* use FO. It is ugly and inefficient! I have similar
thoughts about SVG: I tried it and quickly abandoned it. I watch with
horror how people have been criticizing "old" gigantic table-layout pages
filled with presentational <center>, <b>, and <font> tags, whereas now it
is "in" to serve clients a giant SVG file with lots of presentational tags
simulating tables, paragraphs...

More headaches came from the XHTML part, especially incompatibilities
with browsers and search engines, the nightmare of MIME types, and so on.
Finally I abandoned it...

But my biggest error was trying to use MathML. MathML is full of incorrect
design choices and technical holes! Even some MathML authors acknowledge
that content MathML was not "well thought out" because of a lack of
agreement in the committee.

The failure of HTML math was not due to a lack of interest in mathematics
or to any inability of HTML to represent math. It failed because the
design of HTML-Math managed to combine the weaknesses of TeX with the
weaknesses of SGML, and it was rejected. The W3C did poor work with
HTML-Math and also with MathML 1.x and, most recently, 2.0.


**************************
Some problems with MathML:

1) Insanely complicated and inefficient. In some cases I have computed 15
times more bandwidth and server storage when using MathML than with the
alternatives.
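A rough way to see where the extra bytes come from: the sketch below writes the same formula, E=mc2, once as plain HTML 4 and once as presentation MathML, and counts the UTF-8 bytes of each. Both strings are hand-written for illustration (not output from any real tool), and the helper name markup_size is hypothetical; one small formula says nothing about the 15x figure above, it only shows the per-token overhead.

```python
# Illustrative sizes only: the same formula, E=mc2, written two ways.
# Both strings are hand-written examples, not output from any real tool.
HTML4 = "E=mc<sup>2</sup>"
MATHML = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>'
    "<mi>E</mi><mo>=</mo><mi>m</mi><mo>&#x2062;</mo>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow></math>"
)


def markup_size(markup: str) -> int:
    """Bytes needed to store or transmit the markup as UTF-8."""
    return len(markup.encode("utf-8"))


if __name__ == "__main__":
    html_bytes = markup_size(HTML4)
    mathml_bytes = markup_size(MATHML)
    print(f"HTML 4:              {html_bytes} bytes")
    print(f"presentation MathML: {mathml_bytes} bytes")
    print(f"ratio:               {mathml_bytes / html_bytes:.1f}x")
```

Every token in the MathML version carries its own opening and closing tag, so the overhead grows with the number of symbols rather than with the structure of the formula.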

2) Not fully compatible with other basic technologies such as CSS and DOM.
I find it interesting that only now, after ten years and many
specifications, the MathML WG is beginning to ask what should change in
MathML to make it CSS-friendly! The MathML WG has also clearly stated that
no backward-incompatible changes will be made in the future MathML 3.0.
End of story.

The position paper for HTML 5 says:

<blockquote>
Web application technologies should be based on technologies authors are
familiar with, including HTML, CSS, DOM, and JavaScript.
</blockquote>

Well, MathML is not really based on those. But we can render math using
just HTML and CSS, and we can use JS and DOM in the same way we use them
for text in HTML and CSS. See XML-MAIDEN
[http://www.geocities.com/csssite/index.xml] for ideas, samples, etc.

And it adds:

<blockquote>
Basic Web application features should be implementable using behaviors,
scripting, and style sheets in IE6 today so that authors have a clear
migration path. Any solution that cannot be used with the current
high-market-share user agent without the need for binary plug-ins is
highly unlikely to be successful.
</blockquote>

Well, MathML violates precisely that. George is preparing a cross-browser
CSS that also works in MSIE. MSIE does not provide native support for
MathML because of the difficulty of unifying it with the rest of the DOM
and the rendering engine; Microsoft relies on an external plug-in instead.
Similarly, the Opera developers rejected native support for MathML rather
than break the browser, whereas Firefox uses a separate module, built in,
for similar reasons.

The principles "Users should not be exposed to authoring errors" and
"Device-specific profiling should be avoided" are also violated by MathML.
For example, MathML rendering in Firefox depends on specific fonts that
must be downloaded and installed. This approach has been criticized. One
of its problems is that once the new STIX fonts are available I can use
them in HTML, and also in CSS rendering of math, but I cannot use them in
Firefox, since the MathML module would have to be rewritten and the full
engine recompiled, forcing users to download and install new versions of
the browser just for new fonts!!!

3) Incompatible with other markup models. For example, superscripts are
encoded differently in XHTML than in MathML; you would use the style=""
attribute to change the font or colour of a token in XHTML but may use the
<mstyle> tag in MathML, etc. In XML-MAIDEN you use <sub>, <sup>, and
style="" in the same way as in HTML.

4) The default printing of MathML is not good, and people are returning to
TeX for that!

5) Accessibility is very poor in most cases, because people are not using
invisible operators or the correct number of <mrow> elements.

Accessibility is better with the old HTML+GIF+ALT model! Aural renderers
of HTML could easily be adapted to HTML math. Only content MathML 2.0 is
designed for accessibility (in theory), but support for it in current
browsers is zero.

Moreover, the situation is even worse than that! Many sites claiming
theoretical accessibility (e.g. Distler's blog) are serving (ds)^2 as
<mi>d</mi><msup><mi>s</mi><mn>2</mn></msup>, i.e. d(s^2) = 2s ds!!!
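To make the grouping problem concrete, here is a minimal sketch that reads a tiny subset of presentation MathML (mi, mn, mo, mrow, msup) back into linear text, roughly as a naive aural renderer might. The helper names parse and linearize are hypothetical, and real screen readers are far more elaborate; the point is only that the markup as served attaches the exponent to s alone, while (ds)^2 requires an <mrow> around the base.

```python
import xml.etree.ElementTree as ET


def parse(fragment: str) -> ET.Element:
    """Wrap a MathML fragment in <math> so sibling elements form a single tree."""
    return ET.fromstring(f"<math>{fragment}</math>")


def linearize(node: ET.Element) -> str:
    """Linear text for a tiny presentation-MathML subset: mi, mn, mo, mrow, msup."""
    if node.tag in ("mi", "mn", "mo"):
        return (node.text or "").strip()
    if node.tag in ("math", "mrow"):
        return "".join(linearize(child) for child in node)
    if node.tag == "msup":
        base, exponent = list(node)
        base_text = linearize(base)
        # Parenthesize a multi-token base; in MathML such a base must be an <mrow>.
        if base.tag == "mrow" and len(base) > 1:
            base_text = f"({base_text})"
        return f"{base_text}^{linearize(exponent)}"
    raise ValueError(f"unsupported element: {node.tag}")


AS_SERVED = "<mi>d</mi><msup><mi>s</mi><mn>2</mn></msup>"
INTENDED = "<msup><mrow><mi>d</mi><mi>s</mi></mrow><mn>2</mn></msup>"

print(linearize(parse(AS_SERVED)))  # ds^2   (d applied to s squared)
print(linearize(parse(INTENDED)))   # (ds)^2
```

Both fragments look almost identical on screen, which is exactly why the error goes unnoticed by authors.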

6) There are problems with the default rendering of entities and with the
use of invisible operators. Accessible code renders badly on screen,
whereas visually correct code is inaccessible. This could be corrected
with HTML math and proper use of CSS to select the rendering (e.g. italic
vs. roman).

Take the case of x = 10 m. In HTML I could use <var>x</var> = 10 m or even
<span class="unit">m</span> (an approach similar to other markup models
for scientific units, such as STMML), and I could add CSS rules if needed.
How is this encoded in MathML? The W3C technical note on units says

<blockquote>
Unit symbols are written in roman (upright) type, are not altered in the
plural, are not followed by a period except at the end of a sentence, and
no space is left between a prefix and a unit symbol. This is accomplished
in MathML by using the mi element. Single character symbols must be
qualified by setting the mathvariant attribute to normal as otherwise they
would be italicized. For example,

<mi mathvariant='normal'>m</mi>
</blockquote>

Yes, I find the use of mathvariant as odd as you would find the use of a
hypothetical

<span textvariant='italic'>x</span> instead of <var>x</var>.

7) The possibility of automated searches for math remains largely a myth.
I can search for E=mc2 in Google today when the formula is encoded in
HTML 4, and it works reasonably well, but how would I search for the
formula in a MathML search engine?

<mi>E</mi>
<mo>=</mo>
<msup>
<mrow>
<mi>m</mi>
<mi>c</mi>
</mrow>
<mn>2</mn>
</msup>

or maybe

<mi>E</mi>
<mo>=</mo>
<mrow>
<mi>m</mi>
<msup>
<mi>c</mi>
<mn>2</mn>
</msup>
</mrow>

or maybe

<mrow>
<mi>E</mi>
<mo>=</mo>
<mrow>
<mi>m</mi>
<mo>&InvisibleTimes;</mo>
<msup>
<mi>c</mi>
<mn>2</mn>
</msup>
</mrow>
</mrow>

or maybe

<mi>E</mi>
<mo>=</mo>
<mi>m</mi>
<msup>
<mi>c</mi>
<mn>2</mn>
</msup>

or maybe

<mi>E</mi>
<mo>=</mo>
<mi>m</mi>
<mi>c</mi>
<msup>
<mrow/>
<mn>2</mn>
</msup>

or maybe

<mi>E</mi>
<mo>=</mo>
<msup>
<mi>mc</mi>
<mn>2</mn>
</msup>

or maybe

<mi>E</mi>
<mo>=</mo>
<mi>m</mi>
<msup>
<mrow>
<mi>c</mi>
</mrow>
<mn>2</mn>
</msup>

...

I have seen almost all of these encodings generated by real presentation
MathML tools. And note that this is a simple E=mc2!!
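A small experiment along these lines, under stated assumptions: a hypothetical search index keyed on the element tree (the signature helper below, which is not how any particular MathML search engine works) against one that throws the markup away (flat_text). The seven encodings listed above are copied without whitespace, and &InvisibleTimes; is written as the character U+2062 because XML parsers only predefine five entities.

```python
import xml.etree.ElementTree as ET

# The seven E=mc2 encodings listed above, written without whitespace.
VARIANTS = [
    "<mi>E</mi><mo>=</mo><msup><mrow><mi>m</mi><mi>c</mi></mrow><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><mrow><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></mrow>",
    "<mrow><mi>E</mi><mo>=</mo><mrow><mi>m</mi><mo>\u2062</mo>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>",
    "<mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><mi>m</mi><mi>c</mi><msup><mrow/><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><msup><mi>mc</mi><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><mi>m</mi><msup><mrow><mi>c</mi></mrow><mn>2</mn></msup>",
]


def parse(fragment: str) -> ET.Element:
    """Wrap a fragment in <math> so sibling elements form a single tree."""
    return ET.fromstring(f"<math>{fragment}</math>")


def signature(node: ET.Element) -> str:
    """Hypothetical structural search key: the element tree written out as text."""
    children = list(node)
    if not children:
        return f"{node.tag}:{(node.text or '').strip()}"
    return f"{node.tag}(" + ",".join(signature(c) for c in children) + ")"


def flat_text(node: ET.Element) -> str:
    """All character data, with the markup thrown away."""
    return "".join(node.itertext())


trees = {signature(parse(v)) for v in VARIANTS}
texts = {flat_text(parse(v)) for v in VARIANTS}
print(f"{len(trees)} distinct trees, {len(texts)} distinct flattened strings")
```

Run as written, it reports seven distinct trees for one formula. Even discarding the markup entirely does not fully help: the invisible operator in the third encoding still yields a second, different string.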

8) Visual rendering is not incremental as it is with CSS. This can cause
problems with large documents or even with server failures. I find it
curious that the W3C emphasizes abandoning the non-incremental rendering
of old HTML presentational table layouts in favour of CSS layouts, while
forcing the use of non-incremental MathML presentational markup. Some
mathematical documents take on the order of 10 minutes to render in
Firefox.

9) MathML rendering does not adapt to user preferences the way CSS does.

10) The advantages of using a "standard" vanish when one observes the
infinite malleability of MathML code. For example, people are simulating
tensors with nested msup, msub, msubsup, and tricky mrows instead of using
<mmultiscripts> and <none/>. The hypothetical advantages of
standardization are then lost.

In HTML math one would reuse <sup> and <sub>, and perhaps one other tag,
for a full representation of *any* tensorial structure.

11) Presentation MathML (p-MathML) is not good enough for rendering math
in browsers. Luca Padovani writes,

<blockquote>
A quick analysis of the MathML markup reveals that there is no way to
preserve the structure of the formula and still have a "correct" rendering
at the same time.
</blockquote>

12) The use of presentational markup is contrary to common sense. I write
<H1> in HTML and then state once, and only once, in a CSS file how the
heading should be rendered in my document. That CSS file, when stored
externally, can be referenced by billions of other HTML documents. In
MathML you are forced to repeat the presentation in each formula of each
document, to use mstyle...

The use of a presentational language for mathematics reminds me of the
old days of the <font>, <b>, <i>, and <center> tags. The small impact of
MathML on the web reminds me of the failure of XSL-FO to conquer the web.
Instead of specific presentational MathML markup complemented with lots of
<mstyle> tags, I would prefer semantic or structural markup.

There is a kind of general confusion here. MathML authors believe that

<apply><divide/><ci>b</ci><cn>2</cn></apply>

is the only content-oriented encoding (note that we are really encoding
<divide/> as the first child), whereas

<mfrac><mi>b</mi><mn>2</mn></mfrac>

is presentational MathML. However, one could follow the ISO 12083
standard and write

<frac><num>b</num><den>2</den></frac>

This is not presentational markup; it is structural. You are encoding a
fraction and its structural elements, the numerator and the denominator.
This is similar to splitting HTML documents into the structural <head> and
<body>. The structural elements <frac>, <num>, and <den> are then rendered
via CSS rules. By modifying the CSS rules you modify the presentation
(heights, line style, font sizes, colours, etcetera), just as you modify
the presentation of headings.
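One way to check the claim that the structural <frac>/<num>/<den> markup carries the same information as the two MathML forms is to reduce a tiny subset of each encoding to linear text. The helper to_linear below is hypothetical and handles only the elements in the three examples above (plus nesting inside <num>/<den>); it says nothing about rendering, which in this proposal would be left to CSS.

```python
import xml.etree.ElementTree as ET

# The three encodings of b/2 discussed above.
ENCODINGS = {
    "content MathML": "<apply><divide/><ci>b</ci><cn>2</cn></apply>",
    "presentation MathML": "<mfrac><mi>b</mi><mn>2</mn></mfrac>",
    "ISO 12083 style": "<frac><num>b</num><den>2</den></frac>",
}


def to_linear(node: ET.Element) -> str:
    """Linear text for a tiny subset of each encoding (illustration only)."""
    tag = node.tag
    if tag in ("mi", "mn", "ci", "cn"):
        return (node.text or "").strip()
    if tag in ("num", "den"):
        # Structural containers hold either plain text or nested elements
        # (mixed content is not handled in this sketch).
        children = list(node)
        if children:
            return "".join(to_linear(child) for child in children)
        return (node.text or "").strip()
    if tag in ("frac", "mfrac"):
        numerator, denominator = list(node)
        return f"{to_linear(numerator)}/{to_linear(denominator)}"
    if tag == "apply":
        operator, *operands = list(node)
        if operator.tag == "divide" and len(operands) == 2:
            return f"{to_linear(operands[0])}/{to_linear(operands[1])}"
    raise ValueError(f"unsupported element: {tag}")


for name, markup in ENCODINGS.items():
    print(f"{name:20} -> {to_linear(ET.fromstring(markup))}")
```

All three reduce to the same b/2; the structural version simply names the parts (numerator, denominator) instead of an operator or a layout.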

**************************************
Proposals (from less to more radical):

A) Eliminate the following text from the specification

"Authors are encouraged to use MathML for marking up mathematics"

because authors would then use more concise, powerful, and solid markup
for mathematics.

B) Add a special math attribute that can be used in structural markup. For
example, math="num" would be equivalent to class="num", but would use a
math-specific attribute, leaving the class attribute free for other tasks.
This could be solved if space attributes (2.2.7) are implemented.

C) A more complete approach is to provide a set of structural and/or
semantic tags for use with HTML5.

This would close the circle begun when HTML was designed as a small,
lightweight, non-proprietary, easy-to-use document format for the
publication and distribution of scientific documents.

Few tags are needed, because <sub>, <sup>, <var>, and <table> can be
reused. The number of new tags and their usage would be debated,
considering the different proposals available (XML-MAIDEN, ISO 12083...),
but as illustrative examples, <math>, <frac>, <num>, <den>, <root>,
<scripts>, and perhaps two or three more tags would be sufficient.

Note that <frac> could be reused in text to introduce inline fractions,
which are usually simulated as <sup>2</sup>/<sub>5</sub>. Note also that
<scripts> could be used in text mode for typography (composed diacritics,
for example).

Implementation would be cheap because the model is backward compatible
with existing CSS, HTML, and DOM technologies. Moreover, visual rendering
is almost solved: browsers would simply apply one of the available style
sheets (e.g. George's) by default (much as <h1> elements are already
rendered by default).


Juan R.

Center for CANONICAL |SCIENCE)
