[whatwg] Mathematics on HTML5

juanrgonzaleza at canonicalscience.com
Sun Jun 4 06:15:53 PDT 2006


Henri Sivonen wrote:

> I said that math needs to integrate with the surrounding prose. I did
> not say that MathML is integrated right. The point was mainly that
> there needs to be an XML syntax rendered by the same engine as the
> prose--or at minimum the renderers need to communicate the baseline
> and line breaking--and rendering math as a replaced element (possibly
> using a totally non-XML syntax) is not good enough.

Precisely, the main difficulty with p-MathML is integrating it with the
rest of the browser rendering engine. The standard technique, XML or HTML
for content and CSS for presentation, is exactly the right way.

>> Similar thoughts when one decides
>> reusing old but effective HTML <table> element instead of adding new
>> redundant ones: <mtable>, <mtr>, and <mtd>.
>
> Unfortunately, HTML tables carry a parsing legacy that makes them
> problematic in text/html inline content. (And matrices in math need to
> occur in inline content.)

Then use a CSS inline table (display: inline-table).

MathML needs to introduce a new table model that is not compatible with
the HTML table model (precisely, this is one of the reasons for the
deficient implementation of MathML in Firefox). With better CSS support in
browsers you can display inline mathematical tabular structures (e.g.
matrices), and the same code can be reused for other purposes in plain
text.

>> Have you tried to encode E=mc2 in full parallel MathML? And what about
>> fine parallel markup?
>
> I haven't. I decided long ago that I don't want to write MathML by hand.

I do not know of any tool that can do that encoding.

>> 2) It does no sense to offer people gzip archives of online documents
>> for downloading and reading off-line!
>
> Gzip is built into HTTP.
>
> Very often when people complain about the size of something, enabling
> compression on the HTTP level works just fine and is non-disruptive.
> That's why it should always be considered before more drastic measures.

Please note that the verbosity of full parallel MathML is so immense that
the MathML WG was obliged to provide an alternative encoding via id and
xref (which is still on the order of 15 times more verbose for E=mc^2, a
trivial equation).

Moreover, HTTP compression does not help if I want to open a very large
database or a complex mathematical equation on my computer for reading or
editing.
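To make the size argument concrete, here is a small Python sketch (my own hand-written illustration, not the output of any MathML tool) that writes E=mc^2 as plain text, as presentation MathML, and as parallel presentation + content MathML, then reports raw and gzipped byte counts. Gzip helps on the wire, but the file opened and edited locally is the raw one.

```python
import gzip

# E = m c^2 written three ways. The MathML strings are a hand-written
# illustration using MathML 2.0 element names; no tool produced them.
PLAIN = "E=mc^2"

PRESENTATION = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow><mi>E</mi><mo>=</mo>"
    "<mrow><mi>m</mi><mo>&#x2062;</mo>"  # U+2062 INVISIBLE TIMES
    "<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>"
    "</math>"
)

# Parallel markup: the presentation tree plus a content-MathML
# annotation (apply/eq/times/power) inside <semantics>.
PARALLEL = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics>'
    "<mrow><mi>E</mi><mo>=</mo>"
    "<mrow><mi>m</mi><mo>&#x2062;</mo>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>"
    '<annotation-xml encoding="MathML-Content">'
    "<apply><eq/><ci>E</ci>"
    "<apply><times/><ci>m</ci>"
    "<apply><power/><ci>c</ci><cn>2</cn></apply></apply></apply>"
    "</annotation-xml></semantics></math>"
)


def sizes(markup: str) -> tuple[int, int]:
    """Return (raw bytes, gzip-compressed bytes) for a UTF-8 string."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    for name, markup in [("plain", PLAIN), ("presentation", PRESENTATION),
                         ("parallel", PARALLEL)]:
        raw, packed = sizes(markup)
        print(f"{name:>12}: {raw:4d} bytes raw, {packed:4d} bytes gzipped")
```

Note that gzip only changes what travels over HTTP; an editor still loads the uncompressed markup.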

> The W3C's track record with CSS, PNG and XML 1.0 does not necessarily
> generalize to other specs. :-(

But we can correct that. In fact, this initiative was born from the
disappointment of users and developers with some of the W3C guidelines for
the web.

>> The code in a MathML fashion would be typed like
>>
>> <mp><mr>This</mr> <mr>is</mr> <mr>an</mr> <mi>important</mi>
>> <mr>text</mr></mp>
>
> I see your point.

OK. Note that I was only simulating p-MathML in text. But MathML also
defines content markup and parallel markup; therefore, the illustrative
example above would be retyped as

<semantics>
<cp><cr>This</cr> <cr>is</cr> <cr>an</cr> <ci>important</ci>
<cr>text</cr></cp>
<annotation-xml encoding='XHTML-Presentation'>
<mp><mr>This</mr> <mr>is</mr> <mr>an</mr> <mi>important</mi>
<mr>text</mr></mp>
</annotation-xml>
</semantics>

The computer knows that the content text as a whole is related to the
presentational text as a whole, but it does not know whether
<mi>important</mi> is the presentation of <cr>This</cr>, of
<ci>important</ci>, or of <cr>text</cr>. You may then use fine-grained
parallel markup, which is more verbose and complex still: you map each
content structure to its presentational counterpart, recursively.
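To show what fine-grained parallel markup asks of a program, here is a Python sketch using the id/xref cross-reference style of MathML 2.0 on a*x + b (a real MathML expression rather than my toy text markup). The markup is hand-written for illustration, namespaces are omitted, and the helper name link_content_to_presentation is hypothetical. Every content node has to point back to its presentation twin before software can pair them.

```python
import xml.etree.ElementTree as ET

# Fine-grained parallel markup for a*x + b: presentation elements carry
# id, content elements point back to them via xref (MathML 2.0 style).
# Namespaces are omitted to keep the sketch short.
MARKUP = """
<semantics>
  <mrow id="E">
    <mrow id="E.1"><mi id="E.1.1">a</mi><mo id="E.1.2">&#x2062;</mo><mi id="E.1.3">x</mi></mrow>
    <mo id="E.2">+</mo>
    <mi id="E.3">b</mi>
  </mrow>
  <annotation-xml encoding="MathML-Content">
    <apply xref="E"><plus/>
      <apply xref="E.1"><times/><ci xref="E.1.1">a</ci><ci xref="E.1.3">x</ci></apply>
      <ci xref="E.3">b</ci>
    </apply>
  </annotation-xml>
</semantics>
"""


def link_content_to_presentation(markup: str) -> dict[str, str]:
    """Map each cross-referenced content node ("tag#xref") to the tag of
    the presentation element whose id it points at."""
    root = ET.fromstring(markup.strip())
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    links = {}
    for el in root.iter():
        ref = el.get("xref")
        if ref is not None and ref in by_id:
            links[f"{el.tag}#{ref}"] = by_id[ref].tag
    return links


if __name__ == "__main__":
    for content, presentation in link_content_to_presentation(MARKUP).items():
        print(content, "->", presentation)
```

Even for a three-symbol expression, every node needs its own id and matching xref, which is where the extra verbosity comes from.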

But this only says that “important” is a content identifier (<ci>)
rendered in italics (<mi>). If you want to add semantics to the generic
<ci> token, you would add a semantic definition via a definitionURL and
<csymbol>.

Moreover, you must repeat this markup for each piece of text, again and
again, in every document.

In HTML or XHTML I write content-oriented markup

<p>This is an <em>important</em> text</p>

and display it via a simple CSS rule that I write just once, in the head
of the document or in an external stylesheet.

> Perhaps, but with torrented videos and spam using up Internet
> bandwidth, I don't think that alone is good enough a reason to revamp
> markup.

Note that the MathML WG itself revamped the original parallel encoding
because of its verbosity.

>>> How is MathML not compatible with the DOM?
>>
>> Introducing specific DOM model does implementation in browsers mainly
>> impossible. MathML is not integrable with rest of browsers technologies
>> as DOM, CSS, and WS model.
>
> WS? Web Services? How is that relevant?

Whitespace (WS) is handled differently by XML and MathML parsers.
<mi> a </mi> and <mi>a</mi> are equivalent for a MathML parser but not for
an XML parser.
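A minimal Python sketch of that difference: a generic XML parser (xml.etree here) reports the text of <mi> a </mi> with its spaces, while MathML token elements trim leading and trailing whitespace. The mathml_token_text helper is my simplified approximation of the MathML rule (trim, then collapse internal runs), not a full implementation.

```python
import xml.etree.ElementTree as ET


def xml_text(markup: str) -> str:
    """Text of the root element exactly as a generic XML parser reports it."""
    return ET.fromstring(markup).text or ""


def mathml_token_text(markup: str) -> str:
    """Simplified MathML token rule: trim both ends and collapse inner
    whitespace runs to one space (approximation, not the full spec)."""
    return " ".join(xml_text(markup).split())


if __name__ == "__main__":
    for sample in ("<mi> a </mi>", "<mi>a</mi>"):
        print(sample, repr(xml_text(sample)), repr(mathml_token_text(sample)))
```

Any tool built on a plain XML API therefore has to re-implement this trimming before it can compare or index MathML tokens.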

If I remember correctly, a tree structure such as <mrow><mi>a</mi></mrow>
is also treated like <mi>a</mi> by MathML parsers.

Apparently Gecko ignores this, so the Mozilla developers carefully
recommend that authors eliminate extra mrows from their code because of
parsing, memory, and rendering trouble (difficult advice when MathML is
assumed not to be seen by authors). Even something as simple as the
fraction 1/n is generated incorrectly by the iTeX tool, which adds two
extra <mrow> elements to the MathML code.
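The clean-up that Mozilla recommends doing by hand can be sketched in Python: replace every <mrow> that wraps a single child by that child, recursively. The input below is a hypothetical redundant encoding of 1/n in the spirit described above, not verbatim iTeX output, and flatten_mrows/clean are illustrative names.

```python
import xml.etree.ElementTree as ET


def flatten_mrows(elem: ET.Element) -> ET.Element:
    """Return elem with every single-child <mrow> replaced by its child.

    Works bottom-up, so nested redundant wrappers such as
    <mrow><mrow><mi>a</mi></mrow></mrow> collapse to <mi>a</mi>.
    """
    for index, child in enumerate(list(elem)):
        replacement = flatten_mrows(child)
        if replacement is not child:
            # Keep whatever text followed the removed wrapper in its parent.
            replacement.tail = child.tail
            elem[index] = replacement
    # An <mrow> with exactly one child and no text of its own is redundant.
    if (elem.tag == "mrow" and len(elem) == 1
            and not (elem.text or "").strip()
            and not (elem[0].tail or "").strip()):
        return elem[0]
    return elem


def clean(markup: str) -> str:
    """Parse namespace-free MathML and serialize the flattened tree."""
    return ET.tostring(flatten_mrows(ET.fromstring(markup)), encoding="unicode")


if __name__ == "__main__":
    redundant = "<mfrac><mrow><mn>1</mn></mrow><mrow><mi>n</mi></mrow></mfrac>"
    print(clean(redundant))  # <mfrac><mn>1</mn><mi>n</mi></mfrac>
```

A MathML renderer is supposed to treat both forms identically; the point is that authors should not have to run a pass like this themselves.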

> Could you cite an URL, please? I searched and I found a discussion
> where the XML-MAIDEN guy was against MathML in Opera and two Opera
> employees (Moose and csant) were in favor.
>
> http://my.opera.com/community/forums/topic.dml?id=36775

Both employees critiqued MathML many times in that thread and in others,
and began a collaboration with George on rendering mathematical formulae
via CSS in the Opera browser.

See also

[http://my.opera.com/community/forums/topic.dml?id=69327]

[http://my.opera.com/community/forums/topic.dml?id=132308]

[http://my.opera.com/community/forums/topic.dml?id=41533]

[http://my.opera.com/community/forums/topic.dml?id=39677]

[http://my.opera.com/community/forums/topic.dml?id=56685]

and related links. See also my reply to Anne van Kesteren below.


> I admit I did not properly appreciate the page. I guess my bogometer
> was fooled by the domain name, a demo link leading to a Yahoo! error
> page and the mention of DTDs.
>
> Now that I took a closer look, it appears that the spec is hidden in
> comments in DTDs. XML-MAIDEN could really use a proper spec.

Please separate the two main topics of this discussion:

1) MathML is incompatible with HTML, CSS, and the DOM, and it does not
achieve any of its original goals. Support in browsers is weak because the
specification is not solid.

In the same way that this group has produced interesting alternatives to
W3C specifications, it could also provide an alternative for the
presentation/structure of math.

2) We can create an alternative mathematical markup. Which one?
XML-MAIDEN is one option, an HTML version of the SGML ISO 12083 math is
another, a mixture of both is a third, et cetera. That is to be debated.

>> The look and feel are better that with MathML.
>
> I disagree. And I am on a Mac here and MathML rendering in the Mac
> builds of Gecko is known to suck.

I am not a specialist on this. George can offer better replies, but as far
as I know there are problems on the Mac (Firefox and Safari). It is a
problem with support for standard CSS properties, not a problem with the
MAIDEN approach. Look at the PDFs. I have rendered both MathML and CSS
mathematical objects in Firefox 1.0 on a Windows PC, and both render the
same.

>> Moreover, the MAIDEN markup can be transformed to TeX for printing via
>> TeX engines whereas better CSS-based printed engines are not ready.
>
> I found the XSLT program but are there examples of results on the Web?

I am not sure I understand the question. You can easily translate

<fraction>
<num>a</num>
<den>2</den>
</fraction>

to \frac{a}{2} and then print the result with a TeX engine.
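The XML-MAIDEN project ships its own XSLT; the Python sketch below is not that program, only an illustration of how little is needed for the <fraction>/<num>/<den> subset shown above. Treating every other element as a plain wrapper is my assumption, and maiden_to_tex/to_tex are hypothetical names.

```python
import xml.etree.ElementTree as ET


def maiden_to_tex(elem: ET.Element) -> str:
    """Translate the <fraction>/<num>/<den> subset into LaTeX source."""
    if elem.tag == "fraction":
        num, den = elem.find("num"), elem.find("den")
        if num is None or den is None:
            raise ValueError("<fraction> needs both <num> and <den>")
        return r"\frac{" + maiden_to_tex(num) + "}{" + maiden_to_tex(den) + "}"
    # Any other element (num, den, ...) is treated as a plain wrapper:
    # its own text, then each translated child followed by that child's tail.
    parts = [(elem.text or "").strip()]
    for child in elem:
        parts.append(maiden_to_tex(child))
        parts.append((child.tail or "").strip())
    return "".join(parts)


def to_tex(markup: str) -> str:
    return maiden_to_tex(ET.fromstring(markup))


if __name__ == "__main__":
    source = """<fraction>
<num>a</num>
<den>2</den>
</fraction>"""
    print(to_tex(source))  # \frac{a}{2}
```

Nested fractions fall out of the recursion, so the same function handles a fraction inside <den>.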

> I think that's a bug rather than a feature. The layout should not
> require a particular font, but the math layout engine should make good
> use of font metrics at the client.

I do not follow you on this point.

>> Fine tuning in the web can be achieved complementing the generic
>> XML-MAIDEN stylesheet with more rules for special cases or with fine
>> tuning CSS rules directly inserted in the document.
>
> I thought one shouldn't have to tweak CSS to get math to render
> properly. By default, the authoring experience should be on a similar
> level of "just works" as LaTeX with the default templates.

Yes, I agree, but one still needs fine-tuning. In TeX approaches this is
achieved via special commands, attributes, and macros. In the CSS approach
you need just the same CSS rules one also uses for text. For example,
fine-tuning of the indices in roots could be achieved by adjusting the left
and right margins and the vertical alignment in CSS; AMS-TeX introduces two
special commands for that, with arguments in absolute units.

>> It is not difficult to achieve TeX output quality when using font
>> metrics.
>
> Any demos of that?

Several MathML formatters listed on the official MathML software page use
a TeX-like formatting approach.

>> And, of course, it is close to impossibility that you can provide a TeX
>> engine for the web
>
> IIRC, IBM tried, though.

And its popularity is close to zero, with related IBM projects such as
TeXML abandoned years ago.

>> In fact, one can see several authors providing TeX quality with SVG and
>> even with HTML approaches when font metrics are known.
>
> Could you cite URLs, please? How were those documents produced? Was
> there some kind of LaTeX->pdfLaTeX->PDF->pdf2svg->SVG pipeline involved?

There are examples of MathML-to-SVG translators using a TeX-like approach
on the official MathML software page.

Regarding HTML, there is the jsMath approach, which uses TeX rules, TeX
fonts, and images to render math on screen with good quality. I personally
dislike that approach.

PDF is unnecessary in both approaches. jsMath uses LaTeX syntax embedded
in HTML pages.

John Wiley & Sons, Inc. uses families of TeX fonts for publishing from XML
markup but does not use a TeX engine.

>> The really difficult problem is to provide good typesetting quality
>> without rely on specific fonts; Knuth has not solved this still ;-)
>
> One can change fonts in TeX. The problem is that CM is hard to beat in
> consistent glyph coverage.

Yes, one can, but only with TeX-friendly fonts, not any font I want. That
is a very difficult task compared to implementing TeX formatting rules and
TeX font metrics.

CM was designed not for web display but for printing on paper with
high-quality devices.

>> HTML Math could be incorporated in a few days because it an incremental
>> implementation.
>
> Presumably, if it is so easy, in a few days we can download your
> custom build of Gecko or WebCore that implements a prototype, right?

You need a full implementation of CSS 2.1, and some extensions from the
upcoming CSS 3 would help. Moreover, the time and effort spent implementing
these standards is recouped in other documents, whereas the time spent
implementing MathML in a browser cannot be reused for better support of
textual or graphical content.

After 10 years, most browsers ignore MathML because of difficulties with
the specification. Firefox offers native support for less than half of the
specification and cannot offer better support, because MathML is
incompatible with previous rendering/layout engines and with CSS. There is
no connection between MathML and XSL-FO (George's techniques were
implemented in an FO engine in two weeks). There is no connection with the
CSS group either, and the last information I have is that the CSS group
will abandon the CSS Math module because of technical difficulties
(incompatibilities) with MathML.

Moreover, MathML is not completely DOM-compatible, needs namespaces and
XML, and handles whitespace differently from the rest of XML parsing
engines.

I still think that HTML math can be implemented cheaply and easily. The
final result would be a solid, unified design, where mathematical markup
can be typed by hand if one wants, math is accessible via standard
JavaScript and the DOM, and it can be styled via CSS.

> Do you mean "Are more accessible with existing tools." or "Would
> theoretically be more amenable to accessibility tools that have not
> yet been developed."?

Both. An ALT attribute on math could be reused by current tools (aural
browsers). We could also research improvements in that direction that
future tools could exploit.
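As a sketch of how an existing tool could reuse such an attribute, the Python below collects alt text in document order from <img> and from a hypothetical math element so that an aural browser could speak it as a fallback. The element names <math> and <f> carrying alt are assumptions (HTML does not define either today), and AltTextCollector/spoken_fallbacks are illustrative names.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect alt text, in document order, from <img> and from
    hypothetical math elements."""

    # Hypothetical math element names (assumption, not in any HTML spec).
    MATH_TAGS = {"math", "f"}

    def __init__(self) -> None:
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "img" or tag in self.MATH_TAGS:
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)


def spoken_fallbacks(html: str) -> list[str]:
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.alts


if __name__ == "__main__":
    page = ('<p>Energy: <f alt="E equals m c squared">E=mc^2</f>, line element '
            '<img src="ds2.gif" alt="d s squared"></p>')
    for text in spoken_fallbacks(page):
        print(text)
```

The same attribute serves today's GIF + ALT pages and a future native math element without any change to the tool.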

>> Distler does lot of different claims. In the accessibility statement
>>
>> [http://golem.ph.utexas.edu/~distler/blog/accessibility.html]
>>
>> says
>>
>> “Equations are written in MathML 2.0.”
>
> That's a statement of fact about what language is used. It is not an
> assertion about the accessibility properties of the language.

And any novice reader automatically believes that using MathML generates
accessible pages (is that not the reason for mentioning MathML in the
accessibility statement?), when the code being served is poorer than GIF +
ALT and, in some cases, poorer than plain HTML.

>> Perfect p-MathML 2.0 code is not really accessible, but still poor,
>> MathML code is being served by Distler is structurally invalid and based
>> in tricks. For example, he is using <mrow> and tricky collections of
>> <msubsup> for simulation of tensors. He does not use invisible operators
>> introduced in MathML. He encodes prescripts as in TeX via empty groups
>> instead using <prescript> tag. This is odd!
>
> Not odd really, considering that his MathML is programmatically
> generated from iTeX source.

What is odd is the final MathML code, not the method of generating it,
which is of no interest to end users. Also odd is the aura of proclaiming
that Musings is the “most technologically advanced blog”. What really
matters are the results, never the technology used.

Using the self-proclaimed most advanced technology in the world (specific
plugins, specific MIME types, a specific XHTML + MathML DTD, a specific
editor/converter, a specific input language (iTeX), and specific fonts and
browsers on the client side), string theorist Distler is serving very
obtuse, verbose, and inaccessible code that in most cases is structurally
invalid.

Using Notepad, old HTML 3.2 without a DTD, and a cheap GeoCities site, I
can encode (ds)^2 correctly; the result can be seen without specific fonts
or browsers and can be indexed by Google. I think people should think about
this.

> Mathematica, on the other hand, encodes data sufficient for
> unambiguous input for symbolic math manipulation.

People from the MathML and OpenMath communities would claim the contrary.
In fact, OpenMath started because of problems with CAS.

> Of course, what everyone would like is something that looks as good as
> LaTeX output but that can be pasted into Mathematica or Maple or
> somesuch and symbolically manipulated and evaluated. But that's a hard
> problem.

Yes, I know that problem personally :-(

But I believe it can be solved. The key is providing a superset of the
MathML and TeX/LaTeX infosets.

> Adding math to HTML5 would broaden the scope of HTML5. A broader scope
> translates to more editorial work and more implementation work.

More work will be needed to unify MathML with HTML 5 in browsers than to
implement a new HTML Math in browsers.

Henri Sivonen wrote:

> I think the following could be technically feasible:
>   1) Author writes iTeX code as the text content of an <f> element
> for inline formulae (and <df> for display formulae; two elements to
> cut down on verbosity of attributes).

Yes, a pair like <span>/<div> or <q>/<blockquote> would be better.
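Henri's <f>/<df> idea can be sketched with Python's html.parser: collect the iTeX source of inline (<f>) and display (<df>) formulae so it can be handed to a converter. The element names come from the proposal quoted above; the converter itself is not shown, nested formula elements are not handled, and FormulaCollector/collect_formulas are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class FormulaCollector(HTMLParser):
    """Collect the source text of <f> (inline) and <df> (display) formulae.

    <f>/<df> are the elements from the proposal quoted above, not
    shipped HTML. Nested formula elements are not supported here.
    """

    MODES = {"f": "inline", "df": "display"}

    def __init__(self) -> None:
        super().__init__()
        self.formulas: list[tuple[str, str]] = []
        self._mode: Optional[str] = None
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.MODES and self._mode is None:
            self._mode = self.MODES[tag]
            self._buffer = []

    def handle_data(self, data):
        if self._mode is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.MODES and self._mode == self.MODES[tag]:
            self.formulas.append((self._mode, "".join(self._buffer).strip()))
            self._mode = None


def collect_formulas(html: str) -> list[tuple[str, str]]:
    parser = FormulaCollector()
    parser.feed(html)
    parser.close()
    return parser.formulas


if __name__ == "__main__":
    page = r"<p>Inline <f>\frac{1}{n}</f> and display:</p><df>E = m c^2</df>"
    for mode, source in collect_formulas(page):
        # A real implementation would pass `source` to an iTeX converter here.
        print(mode, source)
```

Keeping two element names, as with span/div, lets the inline/display distinction live in the tag instead of in an attribute.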


Anne van Kesteren wrote:

>> There is a lot of technical details in Opera browser developers site
>> on why they rejected *native* MathML support.
>
> As far as I know, working for Opera, we never made an official
> statement regarding MathML. I tried looking one up:

Well, I did not use the word “official”.

>    http://www.google.com/search?q=site%3Aopera.com+mathml
>
> ... didn't work out. Then again, I'm also unaware of an "Opera browser
> developers site" :-)

On opera.com I can read

[http://my.opera.com/community/forums/topic.dml?id=36775]

<blockquote>
Is Opera going to implement MathML any time soon?
</blockquote>

<blockquote>
MathML was discussed here many times, try to search forum. Answer is that
MathML will not be implemented in near future.
</blockquote>

None of the Opera Software people participating in the discussion (e.g.
Moose or csant) officially refuted the point.

Moose made some good points, such as

<blockquote>
So, this leaves us here:

1. We reject MathML, as it is not suitable for our needs
</blockquote>

and right after that, binoculous, a user not from Opera Software, wrote

<blockquote>
There are lots of standards that aren't supported by Opera. Many aren't
even relevant to Opera. MathML is not what makes or breaks Opera's
standards support either.

Since MathML is a huge thing to add support for, since it is a return to
the old days of non-separated content and style, since it is useless to
most people, and since good alternatives already exist, Opera shouldn't
worry about supporting it the next few years.
</blockquote>

and again nobody from Opera Software officially said the contrary or
refuted the point. On the contrary, right afterwards both White Linx and
Moose began a discussion on what tags would be needed for doing math
without MathML. Later, two members of Opera Software discussed
possibilities for rendering MathML in Opera via CSS, i.e. without needing
native support that would break the browser.

Later Moose (Opera Software) adds

<blockquote>
Think how many people wanted to use MathML, wanted to style it, and
failed. We can help them, even though we know so many pitfalls and dangers
that we might be pressed to divert the people from using MathML.

We can give people CSS-styled MathML today. That does not preclude
research on better alternatives for MathML as a markup language, does it?
</blockquote>

That does not preclude research on better alternatives to MathML as a
markup language. This is my point for this group: try to do it better than
the W3C did.

In

[http://my.opera.com/community/forums/topic.dml?id=132308]

jaks, also from Opera Software, said

<blockquote>
We never support stuff because it is on a list somewhere, we support it
because there is a real need. It isn't our goal, and has never been, to
support the most standards possible. However the standards we do support
we should support well.

I have some sympathy for caret navigation, but it doesn't make sense for
us right now. For MathML you might also want to have a look at existing
discussions. We have given priority to Web Forms 2 over XForms, Opera 9
has experimental support here.
</blockquote>

And White Link adds some more links and data there (Friday, 14 April 2006,
12:37:26).

Also interesting is the discussion about the semantic web and why the W3C
has failed

[http://my.opera.com/community/forums/topic.dml?id=39677]

with bloated markup such as MathML being completely inaccessible in the
real world. Old HTML 3.2 or GIF + ALT alternatives are more accessible to
end users, despite all the W3C hype about its markup.

Finally, Moose (from Opera Software) says

<blockquote>
Yes. I'll finish my Torture page today, as I don't like unfinished
projects :smile:

I also have to continue the propagatioon of backend redesign on my site -
the author pages. Lots of work!

As for Math, I think we are converging to something. I think once we are
done with creation of a set of replicable MathML moosifization with CSS,
we should invest real and true effort in creating an XML alternative,
based on your work. And I mean, for real, not simply as an example...
</blockquote>

Now take a look at

[http://my.opera.com/community/forums/topic.dml?id=56685]

Again, an Opera Software author says

<blockquote>
Yes, both MathML and client-side XSLT are better discussed in their
respective threads. There is no enthusiasm anywhere in Opera for XSL-FO on
the web, but there is no enthusiasm for XSL-FO in the web community
either, so that hasn't turned out to be any problem.
</blockquote>

and

<blockquote>
If people were happy with MathML and the advantages of supporting it
outweighted the disadvantages we might support it. Right now that spec is
on hold, where it has been for quite a few years.
</blockquote>


Now note the following

[http://www.whatwg.org/]

<blockquote>
The position paper submitted by Opera and Mozilla represents the
fundamental principles upon which the WHAT working group intends to
operate.
</blockquote>

and the position paper says (my comments between { }):

<blockquote>
The following seven principles represent what we believe to be the most
critical requirements for this work.

Backwards compatibility, clear migration path
    Web application technologies should be based on technologies authors
are familiar with, including HTML, CSS, DOM, and JavaScript.

{MathML is not compatible with the rest of these technologies, and authors
are not familiar with it}

    Basic Web application features should be implementable using
behaviors, scripting, and style sheets in IE6 today so that authors
have a clear migration path. Any solution that cannot be used with the
current high-market-share user agent without the need for binary
plug-ins is highly unlikely to be successful.

{MathML needs a plugin. George's latest CSS approach already works in MSIE}

Well-defined error handling
    Error handling in Web applications must be defined to a level of
detail where User Agents do not have to invent their own error
handling mechanisms or reverse engineer other User Agents'.

{Have you tried to interchange and render different MathML encodings from
different tools?}

Users should not be exposed to authoring errors
    Specifications must specify exact error recovery behaviour for each
possible error scenario. Error handling should for the most part be
defined in terms of graceful error recovery (as in CSS), rather than
obvious and catastrophic failure (as in XML).

{Is this not another point in favour of HTML + CSS rather than MathML?}

Practical use
    Every feature that goes into the Web Applications specifications must
be justified by a practical use case. The reverse is not necessarily
true: every use case does not necessarily warrant a new feature.

{Mathematical markup is needed in the research, economics, publishing, and
education fields. Even K-12 mathematics is not completely covered by
MathML} {Wikipedia contains lots of educational articles with mathematics
served as images. Would it not be cheaper to provide accessible,
searchable, and structured mathematical markup rendered via CSS (like the
rest of the text)?}

    Use cases should preferably be based on real sites where the authors
previously used a poor solution to work around the limitation.

{There is a lot of information in the Opera forums about this. Look also at
the XML-MAIDEN project} {There are many other similar projects on the web}
{A couple of days ago, a mathematician working on input syntax for
mathematics told me in a personal communication that he will not support
Content MathML because “it is not the way”}

{snip}

Device-specific profiling should be avoided
    Authors should be able to depend on the same features being
implemented in desktop and mobile versions of the same UA.

{How is this achieved by MathML?} {How is this achieved when the Gecko
rendering engine relies on specific fonts?} {How can ultra-verbose MathML
code be parsed by mobile devices?} {A CSS-based approach, however, can be
easily implemented on both desktop and mobile}

Open process
    The Web has benefited from being developed in an open environment. Web
Applications will be core to the web, and its development should also
take place in the open. Mailing lists, archives and draft
specifications should continuously be visible to the public.

{Has the MathML WG not been criticized for being immune to external
criticism?}

Answers to Questions on Web Applications

What functionality is needed for Web applications? What should a hosting
environment provide?

    More of the same. HTML, CSS, DOM, and JavaScript provide enough power
that Web developers have managed to base entire businesses on them.
What is required are extensions to these technologies

{MathML is incompatible with HTML, incompatible with CSS and the DOM, and
not very amenable to JavaScript parsers in practice}

to provide much-needed features such as:

        * Native pop-up menus and context menus.
        * Inline markup for pop-up windows, for example for dialog boxes
or tool palettes, so that dialogs need not be defined in separate
files.
        * Command updating: applications that have several access points
for the same feature, for instance a menu item and a tool-bar
button, would benefit from having to disable such commands only
once, instead of having to keep each access point synchronized
with the feature's availability at all times. Similarly menu items
or tool-bar buttons that represent a toggle state could
automatically stay synchronized whenever toggled.
        * Server-sent events: triggering DOM3 Events from the server-side,
for example for tickers or status updates.
        * Client-server communications methods that do not require page
loads, enabling on-demand data retrieval (where the UA
automatically fetches data from the server as required), remote
procedure calls (where script can invoke code on the server side
and get an XML fragment in return), etc.
        * More device-independent DOM events: The DOM event set needs
device-independent events, such as events that fire when a button
or link is activated, whether via the mouse or the keyboard.
DOMActivate is a start, but it lacks equivalent HTML attributes,
and additional events may be needed.
        * Richer widget set: the existing HTML controls are quite limited,
some controls for commonly used types such as date controls and
range controls would be useful.
        * Sortable and multicolumn tree views and list views with rich
formatting.
        * Ability to define custom widgets cleanly, for example using XBL
and APIs to query and control focus state, widget state, the
position and state of input devices, etc.
        * Rich text editing: an underlying architecture upon which
domain-specific editors can be created.
        * A predefined HTML editor based on the rich text editing
architecture.
        * Drag and drop APIs.
        * Text selection manipulation APIs.
        * Clipboard APIs (if the security and privacy concerns can be
addressed).
        * Flexible box model: The existing box model in CSS is designed
largely for documents rather than user interface. We need a new
box model designed for user interface which would relieve author
complaints about other aspects of CSS and also reduce the need for
tables for layout.

{Half a dozen new CSS rules for math (somewhat like the planned CSS Math
module) and new tags, e.g. copied/adapted from SGML ISO 12083, are all one
needs for encoding mathematics on the web.}

    Some less important features would be good to have as well:

        * Window-based state management (so that new windows don't
interfere with existing sessions), for example implemented as a
per-domain, per-window "file system". This would allow multiple
instances of the same application (from the same site) to run
without the instances overwriting each other's cookies.
        * Elements for semantics commonly found in applications, such as
<byline>, <footer>, <section>, <navigation>, etc.

{How will MathML be unified with these when nobody has yet unified it with
CSS, the DOM, or XSL-FO?}

        * Markup to denote mutually exclusive sections (as in the commonly
seen wizard interfaces).
        * An improved CSS object model, for example with better APIs for
animation, simpler ways to navigate the rendered content, a way to
find the position of an element, methods to list the elements
under a coordinate, etc.

{If MathML is already incompatible with CSS, how will animation and the
rest be implemented?}

{Etcetera}
</blockquote>


Juan R.

Center for CANONICAL |SCIENCE)





