[whatwg] Mathematics on HTML5

juanrgonzaleza at canonicalscience.com
Sun Jun 4 06:12:33 PDT 2006


Michel Fortin wrote:

>
> One thing I know however is that the next time I'll have to put an
> equation on a web page, I won't go looking for a MathML editor just   to
> be able to generate the markup, convert the page to XHTML served   as
> application/xhtml+xml (so that it works with MathML) and ask my   users
> to install the required plugin or web browser just to see my   equation.
> I'll use an image: it'll be a lot simpler.

Not so simple if you need to add maintenance, search, storage, printing,
and accessibility to the list of requirements.

> What Juan propose, about adding a limited number of elements to HTML
> for maths, actually makes sense to me, especially if you can get not-
> too-bad results with CSS.

With a more sophisticated CSS stylesheet, more powerful CSS engines, and
good Unicode fonts, one could achieve TeX-quality output.

> HTML is designed to be easy to learn and
> write; if we had a markup like that for mathematics which integrates
> easily in HTML it'd be much more used than MathML, I'm sure.

And if you can reuse existing text tags (e.g. <table>, <sub>, or <sup>)
and use the same content model, DOM, and CSS techniques in both text and
math, then that is cheap. With MathML you cannot.


James Graham wrote:

> In this situation, I imagine most scientists will simply write LaTeX and
>  use a tool to produce the output format that they desire.

I doubt it, because LaTeX does not have sufficient capabilities for full
web design.

> For MathML, there
> is already a reasonable story here since Itex2MML exists, although it
> really needs to be integrated with tools like hyperlatex if it is ever
> to be widely used. I would also argue that the difficulty of providing
> suitable imaged-based fallback content is a massive hindrance to the
> adoption of mathematical markup.

I have shown on several occasions (in my weblog and on the official MathML
mailing list) that the MathML code generated by the Itex tool is very bad.

> Look
> at the test page - some of the rendering is awful (the radical signs in
> particular stand out here).

The approach was designed to be minimalist. Of course it can be improved.
Moreover, radicals (looking better than in Firefox with native MathML
support) could best be rendered via future CSS embellishments for math.

> And, despite being sold as a simpler
> solution than a MathML implementation, it works in about 1% of UAs (by
> number of users) compared to > 95% that have a story for native or
> plugin-based MathML.

The original approach works in many rendering engines, including off-line
engines such as Prince. It has recently been generalized to work with
several XSL-FO formatters as well (MathML does not work in FO).

The current problems lie in the current implementations of the CSS
standards rather than in George's approach. For example, it needs good
support for inline CSS blocks (display: inline-block). Firefox has a bug
there. The same bug affected Opera 8 and Prince 4, but was fixed.

The Prince developers fixed the bug within a few days, whereas they were
unable to integrate MathML into the rendering engine despite many efforts.

The Firefox bug is scheduled to be fixed in a future release, so inline
CSS blocks will then be rendered correctly. However, there is no schedule
for the unification of MathML in browsers, nor for full support of the 2.0
specification. There is no schedule for implementing MathML in Opera,
MSIE, FO formatters...

Moreover, for the last three or four weeks George has been working on a
cross-browser version of the stylesheet (using, for example, moz
extensions to work around the bug above). The last I heard, he had a
standard stylesheet working in MSIE, Safari, Opera, and Firefox, and also
in several off-line CSS printers and some FO engines, for almost all the
math (the limitations are due to partial support for the standards in
browsers, and the situation will improve once the standards are fully
supported). This is more than MathML has been able to do in 10 years with
specialized tools, specifications, markup, browsers, dozens of publicity
efforts in journals...!

And do not forget that nobody really knows Opera's usage statistics,
because Opera's simulation mode (identifying itself as another browser)
makes statistics tools confuse it with other browsers such as Firefox or
Explorer.

>The language that they have used is also overly
> simplistic.

They? Overly simplistic?

> For example one would expect most text in a formula to be in
>  italics except where actual words were being used in which case the
> text  should be roman. So you need an additional element to distinguish
> text  from ordinary numbers. Add a few more considerations like that and
> you  soon have a language that's just as painful to hand-author as
> MathML  (which, I agree, is far from perfect) and little support among
> end users.

Use roman text by default (as in ordinary text) and use <var> for italics.
In HTML you write

<p>This is an <i>important</i> text</p>

instead of (<r> for roman)

<p><r>This</r> <r>is</r> <r>an</r> <i>important</i> <r>text</r></p>

MathML uses the latter approach, which is both verbose and redundant.
Moreover, instead of reusing working tags, MathML introduces new ones. For
example, you can use the <var> tag in plain text but you are forced to use
<mi> when writing a MathML fragment.
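
To make the comparison concrete, here is a minimal Python sketch that
counts the elements needed for the expression 2x + 1 in each style. Both
fragments are illustrative assumptions, not markup taken from any real
page; the <span> wrapper and the helper name count_elements are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments for "2x + 1" (assumptions, not from a real page).
# HTML style: roman by default, <var> only where italics are wanted.
html_style = "<span>2<var>x</var> + 1</span>"

# Presentation MathML style: every number, identifier and operator
# sits in its own token element (<mn>, <mi>, <mo>).
mathml_style = (
    "<math><mrow>"
    "<mn>2</mn><mi>x</mi><mo>+</mo><mn>1</mn>"
    "</mrow></math>"
)


def count_elements(markup: str) -> int:
    """Return the number of elements in a well-formed XML fragment."""
    return sum(1 for _ in ET.fromstring(markup).iter())


if __name__ == "__main__":
    print("HTML style:  ", count_elements(html_style), "elements")
    print("MathML style:", count_elements(mathml_style), "elements")
```

The count is only a rough proxy for authoring effort; it says nothing
about rendering quality in either style.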

> Also, I think it's worth mentioning that trying to get accessibility
> right for Maths content is likely to be extremely challenging. The
> chance of authors investing the time to allow a semantic e.g.
> spoken-word representation is 0 (this is incompatible from the
> 'everything will be generated from LaTeX hypothesis above). So I think
> it would be useful to know what actual scientific users currently do
> when faced with mathematical content in e.g. a PDF document.

This is a question of personal responsibility. If journal X encourages you
to submit articles in AMS-TeX via FTP, you do not send a 5¼-inch floppy by
ordinary mail containing the article in an Atari word-processor format.

If you have no interest in accessibility, OK. But if you are interested in
accessibility, or you are required by law to provide accessible content
(e.g. official bodies in several countries), then what? MathML? Content
MathML does not work for most of math and is not implemented in browsers.
Presentation MathML cannot disambiguate expressions even when perfectly
encoded, and most presentational code is far from perfect. For example,
the Itex tools encode prescripts via tricks (the same tricks avoided in
ISO 12083) instead of using the specific <mprescripts/> tag of p-MathML.
What is worse, since authors are encouraged not to learn the final MathML
code, they have no idea whether they are encoding a = b + c or sqrt(pi)
instead.

Take the case of Distler's blog (the same problem also appears in Living
Reviews in Relativity articles generated by HERMES). He is encoding and
serving 2 s ds when he originally tried to encode (ds)^2.

In short, the MathML code being served on the web is less accessible than
the old GIF + ALT model. I see no reason to use a technology that does
worse than the available ones.


"Mihai Sucan" wrote:

> I
> am   currently using only Mathematica Notebook documents because MathML
> is not   supported by Opera and Gecko's support is not something I
> consider   awesome.

Hi.

The problem with MathML is that it is not compatible with other web
standards and is therefore very difficult to implement. There are also
difficulties related to semantics, automated searching, printing, and
accessibility.

> Good-enough implementations for MathML would probably make-up for some
> of the bad things in MathML.

Quality implementation = quality programmers + quality specification

The original HTML-Math (initially designed by the W3C) was so bad that it
was, extraordinarily, rejected. Subsequent MathML specifications contained
several errors and design mistakes that made implementing and spreading
the markup very difficult or even impossible.

Take the example of content MathML. Its authors apparently ignored decades
of experience in symbolic mathematics and produced several specifications
with notable mistakes. Some of the mistakes were fortunately corrected in
the latest content MathML 2.0, but others remain. Recently, it has been
shown that something as simple as integral sin (x) dx is not correctly
encoded in content MathML, whereas it *can* be encoded in OpenMath (or in
other approaches).

If your markup does worse than those older techniques, why would you waste
time implementing something that will not work even if you are the perfect
programmer?

Take now the case of presentational markup. How do you encode subscripts
in TeX, in HTML, or in ISO 12083?

a_b, a<sub>b</sub>, and a<sub>b</sub> respectively.

In MathML it is done (ignoring tokens) as <msub>a b</msub>. This creates
difficulties for correctly translating TeX sources, for correctly
implementing MathML in browser rendering engines, and for printing, as
well as backward incompatibility with ISO 12083 and HTML...

What is more, the authors of ISO 12083 provided a very powerful script
model. The later MathML has not matched it.

Since MathML puts the base inside the script element, you cannot easily
extend the markup. For example, if I want to add superscripts I add one
new tag: ^ in TeX, and <sup> in both HTML and ISO 12083. You cannot do
that in MathML because of the base conflict, which obliges you to
introduce two new tags and a new content model: <msup> and <msubsup>.
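
The difference between the two script models can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the function names html_scripts and mathml_scripts are hypothetical, and
the ISO 12083 form is taken to match the HTML form, as described above.

```python
from typing import Optional


def html_scripts(base: str, sub: Optional[str] = None,
                 sup: Optional[str] = None) -> str:
    """HTML / ISO 12083 style: each script is a sibling that follows the base.

    Adding a new kind of script means adding one new tag; the base is
    never touched.
    """
    out = base
    if sub is not None:
        out += f"<sub>{sub}</sub>"
    if sup is not None:
        out += f"<sup>{sup}</sup>"
    return out


def mathml_scripts(base: str, sub: Optional[str] = None,
                   sup: Optional[str] = None) -> str:
    """MathML style: the base lives inside the script element.

    The wrapper element therefore depends on which combination of
    scripts is present (<msub>, <msup> or <msubsup>).
    Arguments are expected to be token elements such as "<mi>a</mi>".
    """
    if sub is not None and sup is not None:
        return f"<msubsup>{base}{sub}{sup}</msubsup>"
    if sub is not None:
        return f"<msub>{base}{sub}</msub>"
    if sup is not None:
        return f"<msup>{base}{sup}</msup>"
    return base


if __name__ == "__main__":
    print(html_scripts("a", sub="b", sup="2"))
    print(mathml_scripts("<mi>a</mi>", sub="<mi>b</mi>", sup="<mn>2</mn>"))
```

In the first function each script is handled by an independent branch; in
the second, every combination of scripts needs its own case.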

With *fewer* tags, ISO 12083 can encode script structures that can *not*
be encoded in MathML. For example, the structures with 5 and 6 scripts
that I need can easily be achieved by copying the ISO 12083 markup model
for scripts (the model is improved in my own canonical language). Those
structures cannot be encoded in MathML because of limitations of the
markup. So I suggested the following new encoding for the next MathML 3.0
on the official MathML list

<msubsupunderover> Base script1 script2 script3 script4 </msubsupunderover>

For each new script structure not covered by p-MathML you will need a new
content model and new tags.

> If implementors can reach an agreement
> together,   they could even break the current MathML. If their new ideas
> would prove   good, then be sure the W3C will include those changes.

Unfortunately, the MathML WG has officially said no to that.

> Another different take:
> If LaTeX is considered to be the best available language for writing
> mathematical scientific documents, and the best for printing too... why
>  not have user agents implement it?

It is not THE best. It is very good (but boring) at mathematical
typesetting, but it is not good enough for the web, which is the reason it
was rejected by several mathematical markup efforts (ISO 12083, EuroMath,
MathML, OpenMath, XML-MAIDEN...).

>> But biggest error was try to use MathML. MathML is full of incorrect
>> design options and technical holes! Even some MathML author recognizes
>> that content MathML was not "well thought" due to lack of agreement on
>>   the
>> committee.
>
> You entirely dislike MathML. What do you think of OpenMath?

As do many, many other people (this is the reason for its lack of
popularity and support after 10 years and a lot of propaganda).

OpenMath is just for content and lacks presentational/structural
capabilities. The number of tools is really small :-). OpenMath’s design
is more solid than Content MathML 2.0 in some respects, but I am not sure
of its ability to correctly encode the meaning of mathematical concepts,
nor am I sure of its potential as a universal data format between tools.

> Make your own proposal. Which is the currently available standard you'd
>  like implemented? None?

For what? typesetting? semantic web? computerized mathematics?

For structure and presentation I think that the ISO 12083 international
standard for electronic documents is good enough. In fact, XML-MAIDEN is a
modification of ISO 12083 (which was designed before XML and CSS) to make
it more browser accessible (e.g. CSS friendly).

In fact, I have offered p-MathML support on the canonical website since
2005, but I got only headaches despite much effort, dozens of tools,
plugins, fonts, MIME types, lots of emails, and so on. I decided to
abandon MathML support.

> I'd be interested of your Canon (Markup Language).

Thanks! M is for meta :-)

Please copy anything of interest, and report errors or better ways of
doing things to me.

>> 1) Insanely complicated and inefficient. In some cases, I have
>> computed   15
>> times more bandwidth and server storage when using MathML than
>> alternatives.
>
> Bandwidth is becoming more and more less of a problem. Hard disk space
> is   not close to a problem, since it's cheap and everybody who's
> serious about   working with lots of data has terabytes storage. I
> myself have close to a   TB, and I am not doing anything too special.

The tendency in web design (and in web recommendations) is precisely to
save KBs.

Look at the first of the benefits of W3C-compliant web sites on the list:

[http://snews.awddesign.co.uk/snews/designs/snews_business/index.php?id=9]

I find the W3C effort on CSS layout models for saving bandwidth
interesting (see figures)

[http://www.maccaws.org/kit/primer/]

[http://www.w3.org/WAI/bcase/benefits.html]

[http://www.zeldman.com/dwws/]

whereas MathML breaks completely with that tendency, providing markup that
can be 15 or more times more verbose than is reasonable.

I also find this reply from the MathML FAQ very interesting

[http://www.w3.org/Math/mathml-faq.html]

<blockquote>
Does the verbose MathML syntax takes a long time to transfer across the
net and parse?

By comparison with existing image based methods of embedding math(s) in
web pages, for example GIF files, MathML is relatively quick to transfer
and process.
</blockquote>

and I also find it amazing that *full* fine-grained parallel markup was so
insanely verbose and unmanageable that the MathML WG itself was obliged to
introduce an alternative approach.

I would recommend encoding E=mc2 in full parallel MathML markup to anyone
as an exercise.
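
As a starting point for that exercise, the Python sketch below holds one
possible encoding of E=mc2 and compares its size with the HTML form. It
is an assumption-laden illustration: it uses the simpler top-level
parallel markup (one presentation tree plus one content annotation inside
<semantics>), not the fine-grained form with id/xref cross-references,
which would be longer still. The helper name compact_length is
hypothetical.

```python
import xml.etree.ElementTree as ET

# Plain HTML form used for comparison.
html_style = "E=mc<sup>2</sup>"

# Top-level parallel markup: presentation first, content as an annotation.
# One possible encoding, written for illustration only.
parallel_mathml = """\
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <mrow>
      <mi>E</mi><mo>=</mo>
      <mrow><mi>m</mi><mo>&#x2062;</mo><msup><mi>c</mi><mn>2</mn></msup></mrow>
    </mrow>
    <annotation-xml encoding="MathML-Content">
      <apply><eq/>
        <ci>E</ci>
        <apply><times/>
          <ci>m</ci>
          <apply><power/><ci>c</ci><cn>2</cn></apply>
        </apply>
      </apply>
    </annotation-xml>
  </semantics>
</math>
"""


def compact_length(markup: str) -> int:
    """Length in characters after dropping indentation and line breaks."""
    return len("".join(line.strip() for line in markup.splitlines()))


if __name__ == "__main__":
    ET.fromstring(parallel_mathml)  # raises ParseError if not well-formed
    html_len = compact_length(html_style)
    mathml_len = compact_length(parallel_mathml)
    print(f"HTML: {html_len} chars, parallel MathML: {mathml_len} chars")
    print(f"ratio: {mathml_len / html_len:.1f}x")
```

The ratio printed applies to this one expression and this one encoding
only; it is not a general measurement.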

> As for alternatives to MathML, be sure they've got their set of
> weaknesses. MathML shouldn't be trashed just because some people don't
> like it.

It is simpler than like/dislike: work/does-not-work.

> The specification didn't just reach recommendation suddenly,
> without receiving any positive appreciation and implementations.

Well, I do not know the details of the history, but I know that the group
has been described as immune to external criticism. I also know that the
design of MathML was full of political pressure because of internal wars
in the committee. This is even acknowledged by Neil Soiffer!

<blockquote>
From the earliest days, the MathML working drafts included structured
presentation. This was not without controversy
</blockquote>

Presentational markup violates basic web guidelines. It is amazing that
while <font>, <i>, and <b> are deprecated in HTML, MathML introduces lots
of presentational tags.

<blockquote>
some members were very opposed to disambiguation characters
</blockquote>

Understandable. They do not work correctly. A problem with this was
discussed last month on the MathML mailing list. I already did not support
disambiguating characters in my previous CanonMath approach (later
abandoned), but an alternative idea has been reused in my previous
program.

<blockquote>
Human authorable MathML was one of the goals listed in the MathML charter.
Many people felt human authorablility was one of the reasons HTML was so
successful.
</blockquote>

but he adds

<blockquote>
Ultimately, the MathML committee couldn’t reach agreement on an input
syntax and decided that the marketplace should on the syntax.
</blockquote>

<blockquote>
Many people asked why didn’t we use TEX?

Which part of TEX? TEX is not amenable to the growing number of XML tools
such as CSS, XSLT, DOM, parsers...
</blockquote>

And now we see people using TeX as the input syntax because MathML is so
insanely verbose. In the end, the MathML code generated from TeX syntaxes
is as limited as the initial TeX syntax is.

About content MathML

<blockquote>
Not all members were in favor of it for MathML 1.
</blockquote>

<blockquote>
Many of the people proposing content were opposed to presentation and the
earlier form allowed specifying some things not possible in the latter
notation.
</blockquote>

<blockquote>
Due to greater emphasis and discussion on presentation MathML, content
MathML was not as well thought out as Presentation MathML. As evidence of
this, MathML2 has many content fixes (deprecates <fn> and <reln>), and
adds some tags that were glaring omissions (eg, <lcm/>).
</blockquote>

One of the points I find most surprising is that just a month or so ago
one of the MathML authors asked in public what would need to be changed in
MathML for full compliance with the CSS standard! After 10 years and 5
drafts/recommendations they are asking for compatibility with CSS now!
Something is not working here.

> IMHO, a good implementation for any math-related web technology must not
>   ask the user to download fonts, to install some plugin or anything
> similar. I do not like Gecko for the fact it asks me to download
> mathematical fonts.

Still worse, the Gecko rendering engine is optimized for *those* fonts,
and if the fonts change you (they) may have to rebuild the rendering
engine and ship a new browser version to be downloaded and installed by
clients, together with the new fonts of course.

> Math WebSearch - A semantic search engine
> http://search.mathweb.org/
> http://kwarc.eecs.iu-bremen.de/software/mmlsearch/
>
> I'm not sure if searching math is entirely a myth. This is a recent
> guided   research project done by a student of Dr. Kohlhase.

I was referring to MathML. Just as MathML is not very popular on the
browser side, it is not popular on the search engine side.

>> **************************************
>> Proposals (from less to more radical):
>>
>> A) Eliminate next text from specification
>>
>> "Authors are encouraged to use MathML for marking up mathematics"
>>
>> because authors would use more concise powerful and solid markup for
>> mathematics.
>
> I don't agree because authors would just use images with no alternate
> text   or... they'd even give up adding math to their page. If they
> don't,   they'll find a way to use a WYSIWYG editor to generate the
> borked HTML   code that looks "perfect" in IE & FF.

And using MathML, academic journals are encoding (ds)^2 as 2 s ds

<mi>d</mi><msup><mi>s</mi><mn>2</mn></msup>

whereas via simple HTML I can write ds<sup>2</sup>. Which is worse?
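
The grouping problem in the MathML fragment can be checked mechanically.
The Python sketch below is illustrative only: the outer <mrow> is added
so that each fragment has a single root, the "intended" encoding is one
possible way to mark up (ds)^2 and is an assumption, and the helper name
msup_base_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# The served encoding quoted above, wrapped in <mrow> to give it one root.
served = "<mrow><mi>d</mi><msup><mi>s</mi><mn>2</mn></msup></mrow>"

# One possible encoding in which the exponent applies to "ds" as a whole
# (an assumption made for illustration).
intended = (
    "<mrow><msup><mrow><mi>d</mi><mi>s</mi></mrow><mn>2</mn></msup></mrow>"
)


def msup_base_text(markup: str) -> str:
    """Return the text of the base (first child) of the first <msup>."""
    msup = ET.fromstring(markup).find(".//msup")
    if msup is None:
        raise ValueError("no <msup> element found")
    return "".join(msup[0].itertext())


if __name__ == "__main__":
    print("served base:  ", msup_base_text(served))
    print("intended base:", msup_base_text(intended))
```

In the served fragment the exponent is attached to s alone, so the d sits
outside the power.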

And the author of the self-proclaimed semantic HERMES project is serving
XHTML docs where layout is done with empty paragraphs <p></p> and authors
or dates are encoded as level-3 headings. But that is not a problem of
XHTML; it is a problem of the author.

>> C) A more complete approach is providing a set of structural and/or
>> semantic tags for usage with HTML5.
>
> Mathematics is a field which is very complex and you cannot dream of a
> simple solution for displaying advanced scientific documents.

Then MathML and OpenMath are a waste of time. Why recommend the former in
the WHATWG specification?


Juan R.

Center for CANONICAL |SCIENCE)





