[whatwg] Mathematics on HTML5

Henri Sivonen hsivonen at iki.fi
Mon May 29 13:57:15 PDT 2006

First of all, I am not saying that MathML is great. However, when  
someone proposes replacing a language with another one in the context  
of the Web where there are strong network effects and formats cannot  
be considered on their intrinsic technical merits alone, I think the  
burden of proof is very much on the one proposing a new format.

From time to time people vent their frustration with a particular  
language by proposing abandoning a language for something supposedly  
better in a one-off message to a list. Your message seemed to start  
in the rant mode, hence my dismissive tone. Sorry.

On May 27, 2006, at 20:10, <juanrgonzaleza at canonicalscience.com> wrote:

> Henri Sivonen <hsivonen at iki.fi> wrote:
>> Math even more than schemas or vector graphics needs to have an XML
>> syntax, because math needs to integrate in prose on a more profound
>> level than e.g. replaced elements would allow.
> I do not agree; the current MathML is not really integrated with  

I said that math needs to integrate with the surrounding prose. I did  
not say that MathML is integrated right. The point was mainly that  
there needs to be an XML syntax rendered by the same engine as the  
prose--or at minimum the renderers need to communicate the baseline  
and line breaking--and rendering math as a replaced element (possibly  
using a totally non-XML syntax) is not good enough.

> Similar thoughts when one decides
> reusing old but effective HTML <table> element instead of adding new
> redundant ones: <mtable>, <mtr>, and <mtd>.

Unfortunately, HTML tables carry a parsing legacy that makes them  
problematic in text/html inline content. (And matrices in math need  
to occur in inline content.)

>> gzip
> 1) First I would note that I was not talking of verbosity of XML end tags
> or similar, but of the inefficient markup model specific to MathML. Have
> you tried to encode E=mc2 in full parallel MathML? And what about fine
> parallel markup?

I haven't. I decided long ago that I don't want to write MathML by hand.

> 2) It does no sense to offer people gzip archives of online documents for
> downloading and reading off-line!

Gzip is built into HTTP.

Very often when people complain about the size of something, enabling  
compression on the HTTP level works just fine and is non-disruptive.  
That's why it should always be considered before more drastic measures.
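
A minimal sketch of the size effect, assuming a hypothetical page that repeats one small presentation MathML formula 200 times. The sample markup and the helper name compression_report are illustrative only, and a page of identical formulae flatters gzip compared with real documents. On the wire, the client sends "Accept-Encoding: gzip" and the server answers with "Content-Encoding: gzip"; the stored document is unchanged.

```python
import gzip
from typing import Tuple

# Hypothetical sample: presentation MathML for E = mc^2.
FORMULA = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow><mi>E</mi><mo>=</mo><mi>m</mi>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow>"
    "</math>"
)


def compression_report(markup: str) -> Tuple[int, int]:
    """Return (raw_bytes, gzipped_bytes) for the given markup."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Assumption: a page made of 200 copies of the same formula.
page = FORMULA * 200
raw_size, gz_size = compression_report(page)
print(f"raw: {raw_size} bytes, gzip: {gz_size} bytes")
```

The repeated tag names are exactly the kind of redundancy that deflate removes, which is why tag verbosity alone is a weak argument about transfer size.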

> The w3c has done a big effort on providing us lightweight rational
> alternatives to old insane approaches. A typical example is the usage of a
> simple CSS external document for all your HTML documents instead of
> repetitive encoding of font style in each paragraph of each document.
> MathML just break this tendency providing one of most ultraverbose and
> redundant encodings I have seen in my life.

The W3C's track record with CSS, PNG and XML 1.0 does not necessarily  
generalize to other specs. :-(

> The code in a MathML fashion would be typed like
> <mp><mr>This</mr> <mr>is</mr> <mr>an</mr> <mi>important</mi>
> <mr>text</mr></mp>

I see your point.

> However, since presentational markup is not likely, above <i> is better
> encoded as <em> in HTML. Next <em> is already rendered as italics by
> default but I can change rendering via i) CSS in the head of document ii)
> external CSS is used by several documents at once iii) Special CSS rule
> add via style attribute

(You can change the presentation of <i> in the exact same way.)

> 4) Still using gzip other approaches are less expensive in both disk and
> bandwidth.

Perhaps, but with torrented videos and spam using up Internet  
bandwidth, I don't think that alone is good enough a reason to revamp  

>> How is MathML not compatible with the DOM?
> Introducing specific DOM model does implementation in browsers mainly
> impossible. MathML is not integrable with rest of browsers technologies as
> DOM, CSS, and WS model.

WS? Web Services? How is that relevant?

> There is a lot of technical details in Opera browser developers site on
> why they rejected *native* MathML support.

Could you cite a URL, please? I searched and I found a discussion  
where the XML-MAIDEN guy was against MathML in Opera and two Opera  
employees (Moose and csant) were in favor.


>>> Well MathML is not really based in those. But we can render math using
>>> just HTML, and CSS, and we can use JS and DOM in the same way we use in
>>> HTML or CSS for text. Look XML-MAIDEN
>>> [http://www.geocities.com/csssite/index.xml] for ideas, samples, etc.
>> Interesting. However, the results have the look and feel of a
>> afterthought math editor for a word processor rather than the look and
>> feel of pdfLaTeX output.

I admit I did not properly appreciate the page. I guess my bogometer  
was fooled by the domain name, a demo link leading to a Yahoo! error  
page and the mention of DTDs.

Now that I took a closer look, it appears that the spec is hidden in  
comments in DTDs. XML-MAIDEN could really use a proper spec.

> The look and feel are better that with MathML.

I disagree. And I am on a Mac here and MathML rendering in the Mac  
builds of Gecko is known to suck.

> Moreover, the MAIDEN markup can be transformed to TeX for printing via TeX
> engines whereas better CSS-based printed engines are not ready.

I found the XSLT program but are there examples of results on the Web?

> Do not forget that those articles are generated with a couple of simple
> CSS 2.1 rules *without* font metrics information

I think that's a bug rather than a feature. The layout should not  
require a particular font, but the math layout engine should make  
good use of font metrics at the client.

> Fine tuning in the web can be achieved complementing the generic
> XML-MAIDEN stylesheet with more rules for special cases or with fine
> tuning CSS rules directly inserted in the document.

I thought one shouldn't have to tweak CSS to get math to render  
properly. By default, the authoring experience should be on a similar  
level of "just works" as LaTeX with the default templates.

> It is not difficult to achieve TeX output quality when using font metrics.

Any demos of that?

There are many programs out there that have access to font metrics  
and still get nowhere near TeX.

> And, of course, it is close to impossibility that you can provide a TeX
> engine for the web

IIRC, IBM tried, though.

>>> One of problems
>>> with this approach is that once new STIX fonts available I can use them in
>>> HTML, also in CSS rendering of math, but I cannot use them in firefox,
>>> since MathML module would be rewritten, and the full engine recompiled,
>>> obligating to users to download and install new versions of browser for
>>> new fonts!!!
>> The PUA mapping is indeed a problem. If you want to see a change here,
>> I suggest creating an OpenType font that uses the Type 1
>> outlines from the YandY version of Computer Modern and has proper
>> Unicode mappings.
> I prefer to follow usual web design guidelines providing rendering engines
> and technologies were independent of the fonts installed at the client
> side.

That was not the point.

The point is that there's an incentive to build custom support for  
particular legacy fonts, because properly encoded TrueType or  
OpenType fonts are missing. To bootstrap an ecosystem of properly  
encoded fonts, the obvious course of action is repackaging the  
outlines from the legacy fonts. That would remove the incentive to  
build custom code.

>>> 4) The default printing of MathML is not good and people is returning to
>>> TeX for that!
>> In general, Knuth was over 20 years ahead of everyone else. CSS-based
>> typesetters are still catching up with TeX on some things. (And the
>> bar is pretty high.)
> No. It is relatively trivial to provide TeX quality in different markups
> when one knows font metrics.

Where's the demo? Why do most math renderers suck if it is trivial to  
get right?

> In fact, one can see several authors providing TeX quality with SVG and
> even with HTML approaches when font metrics are known.

Could you cite URLs, please? How were those documents produced? Was  
there some kind of LaTeX->pdfLaTeX->PDF->pdf2svg->SVG pipeline involved?

> The really difficult problem is to provide good typesetting quality
> without rely on specific fonts; Knuth has not solved this still ;-)

One can change fonts in TeX. The problem is that CM is hard to beat  
in consistent glyph coverage.

> HTML Math could be incorporated in a few days because it an incremental
> implementation.

Presumably, if it is so easy, in a few days we can download your  
custom build of Gecko or WebCore that implements a prototype, right?

>>> 5) Accessibility is very deficient
>> A different syntax won't help. Implementations of accessibility tools
>> will.
> False. Alternative syntaxes for mathematics already proposed are more
> accessible than MathML by several technical motives.

Do you mean "Are more accessible with existing tools." or "Would  
theoretically be more amenable to accessibility tools that have not  
yet been developed."?

> Ambiguous rendering are absent when one uses old GIF model + ALT.

Assuming that whoever produced the page bothered to provide  
alternative text.

Some automated tools put LaTeX or Mathematica source there, which may  
be quite reasonable both for copying into those tools and for human  
consumption. (I remember reading about a blind person who reads LaTeX  
source in Braille.)

>>> Moreover, the situation is still poor than that! Many sites claiming
>>> theoretical accessibilities (e.g. Distler blog) are serving (ds)^2 as
>>> <mi>d</mi><msup><mi>s</mi><mn>2</mn></msup>, i.e. 2s ds!!!
>> I'm pretty sure Distler doesn't claim his math to be accessible, and
>> I'm pretty sure he is quite aware of the paradox that AsTeR does not
>> support MathML even though its author was on the WG.
>> http://golem.ph.utexas.edu/~distler/blog/archives/000199.html
> Distler does lot of different claims. In the accessibility statement
> [http://golem.ph.utexas.edu/~distler/blog/accessibility.html]
> says
> “Equations are written in MathML 2.0.”

That's a statement of fact about what language is used. It is not an  
assertion about the accessibility properties of the language.

> And do not explain that accessibility of their self-proclaimed
> ultra-advanced blog is poor that if had used old HTML + GIF + ALT model
> (or using PDF or even LaTeX).

Chances are it would be more accessible with GIF + alt.

> Perfect p-MathML 2.0 code is not really accessible, but still poor, MathML
> code is being served by Distler is structurally invalid and based in
> tricks. For example, he is using <mrow> and tricky collections of
> <msubsup> for simulation of tensors. He does not use invisible operators
> introduced in MathML. He encodes prescripts as in TeX via empty groups
> instead using <prescript> tag. This is odd!

Not odd really, considering that his MathML is programmatically  
generated from iTeX source.

> Accessibility of MathML is just a myth.

It is quite possible that it is inherently broken. I certainly  
believe that it is a myth right now.

>>> 7) The possibility for automated searches of math continues being
>>> largely
>>> a myth.
>> Many, many things related to searchability, internationalization and
>> accessibility are myths in the realm of semantic markup.
> Then this is argument for not usage of MathML in HTML 5, point.
> However, you just are systematically failing to understand main points
> here. Take the case of searches.

I was not suggesting that MathML is really searchable. I was agreeing  
that there are myths in the area.

>>> 8) Visual rendering is not incremental as in CSS. This can offer us
>>> problems with large documents or even with server failures. I find just
>>> curious the w3c emphasis on abandoning non incremental rendering of old
>>> HTML presentational table layout models in favour of CSS layouts, whereas
>>> forcing usage of a non incremental MathML presentational markup. Some
>>> mathematical documents take order of 10 minutes before rendering in
>>> Firefox.
>> This is not a design problem with MathML. This is Mozilla bug #18333.
> Is _that_ bug related to the 10 minutes, to that MathML rendering is not
> incremental when compared to a CSS solution, or what?

It is the bug related to the general lack of incrementality in XML  
rendering in Gecko. Designing languages around that bug is a *really*  
bad idea. Moreover, it is even worse to think that the bug is somehow  
inherent to MathML or XML. (Yes, people do entertain such ideas.)

> Precisely LaTeX substituted TeX presentational markup by default
> stylesheets and macros with emphasis on content. You apparently misguided
> the point that HTML was semantic markup, after transformed in
> presentational markup by big developers (the nightmare of <font>) and
> recently retransformed again in semantic markup with presentation best
> done by CSS and elimination of <font> and family.

Formulae in LaTeX are still not guaranteed to be unambiguous enough  
("semantic") to be processed as math by an AI-incomplete symbolic  
math package. Mathematica, on the other hand, encodes data sufficient  
for unambiguous input for symbolic math manipulation.

Of course, what everyone would like is something that looks as good  
as LaTeX output but that can be pasted into Mathematica or Maple or  
somesuch and symbolically manipulated and evaluated. But that's a  
hard problem.

> Any other
> alternative approach could be implemented in browsers in a few days,
> because one can reuse working HTML, CSS, and DOM.

I look forward to your implementation.

>>> C) A more complete approach is providing a set of structural and/or
>>> semantic tags for usage with HTML5.
>> Scope creep.
> ?????

Adding math to HTML5 would broaden the scope of HTML5. A broader  
scope translates to more editorial work and more implementation work.

> But again apple and oranges. I am explaining that one can provide a better
> support for online mathematics than using presentation MathML just by
> addition of a few tags to the future HTML. Your appeal to LaTeX appears a
> bit off topic and in any case is not relevant for not considering my
> proposal of avoiding MathML as mathematical markup.

I will believe that "a few tags" are sufficient and that LaTeX provides  
too much when I see a working LaTeX to few-tags converter that can  
deal with the wide range of mathematics currently typeset using LaTeX.

Henri Sivonen
hsivonen at iki.fi
