[whatwg] Mathematics in HTML5

Thu Jun 8 11:15:03 PDT 2006

Some people have proposed reusing LaTeX.

People who want to reuse LaTeX can “do” it, in the same way that one can
reuse existing jsMath. However, mixing two different languages is usually
considered bad practice. For example, x < 5 is fine in TeX but not allowed
in XML, where < must be escaped. Conversely, A = 3$ &amp; B = 5% is
well-formed XML, but the same text is not valid TeX, where $, & and % are
all reserved characters.
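To make the reserved-character clash concrete, here is a minimal Python sketch using the standard library's XML parser. The helpers `is_well_formed_xml` and `escape_for_tex` are hypothetical names chosen for illustration, and the TeX escape table is an assumption covering only the usual LaTeX special characters in text mode.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as a well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical table: characters with special meaning in LaTeX text mode,
# mapped to forms that print them literally.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "$": r"\$",
    "&": r"\&",
    "%": r"\%",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_for_tex(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


# "x < 5" must be escaped before it can appear in XML...
print(is_well_formed_xml("<p>x < 5</p>"))                     # False
print(is_well_formed_xml("<p>" + escape("x < 5") + "</p>"))   # True
# ...while "A = 3$ & B = 5%" needs &amp; in XML and three escapes in TeX.
print(is_well_formed_xml("<p>" + escape("A = 3$ & B = 5%") + "</p>"))  # True
print(escape_for_tex("A = 3$ & B = 5%"))  # A = 3\$ \& B = 5\%
```

Neither escaping scheme is a superset of the other: text that is safe in one language has to be rewritten for the other, which is one reason mixing them in a single document is fragile.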

Besides the problems with entities and reserved characters (Unicode vs.
7-bit encoding) and other related difficulties, there are technical reasons
for rejecting this approach, as *all* markup and computer languages have
done over the last decades, except TeX-like ones, of course. The fact that
TeX/LaTeX is not suitable for the web is what led the LaTeX guru D.
Carlisle to become interested in a mathematical markup language and to
join the W3C. Carlisle knows LaTeX, and the extensions he has written, very
well. He could answer your questions better than I can.

There have been multiple attempts to extend or adapt TeX to the web
(extended TeX approaches, TeXML...), and all of them failed for one reason
or another. There have also been attempts to display math on the web
using browsers that natively understand TeX source (e.g. IBM launched one
of this kind), but these also failed for technical reasons.

I do not have time to compile all the information on the topic of “why
not TeX/LaTeX”. However, I can cite some relevant links:

http://www.w3.org/Arena/tour/math1.html

http://www.mathmlcentral.com/history.html

http://my.opera.com/White%20Lynx/blog/show.dml/256124

http://www.w3.org/Math/mathml-faq.html

It should also be noted that the next LaTeX will change to adapt to the
SGML/XML world. For example, current research work on the future LaTeX3
promises a new syntax for better compatibility with the SGML world and
will incorporate novel features such as native support for the SGML
concepts of “entity”, “attribute” and “short reference”. The new LaTeX3
will also incorporate style-sheet concepts such as those we use today with
HTML.

I would also note other limitations of TeX and related systems that have
apparently been ignored on this mailing list.

Jonathan Fine adds:

<blockquote>
Plain TeX, LaTeX and ConTeXt all use the familiar `backslash and braces'
input syntax. This can cause problems, because it is not rigorous.
Translation to HTML, for example, requires that the source document be
parsed. But LaTeX for example is in general the only program that can
successfully parse LaTeX documents. This tends to result in (La)TeX living
in a world of its own, isolated from the world of desktop publishing and
word processing. For some communities of users, such as mathematicians,
this may not be a hardship.
</blockquote>

But it is a hardship for the rest of the world, and that is the reason
Active TeX was proposed.

In fact, LaTeX is much more complex than the kind of approach being
proposed here. Rahtz described LaTeX as

<blockquote>
hugely powerful, but chaotic, and on the verge of becoming unmanageable.
</blockquote>

When people here talk about LaTeX, I suspect they are talking only about
some basic LaTeX constructs such as \frac, \vec and others.

I would be glad if anyone here could explain how the beta-th root of k can
be encoded in LaTeX and then fine-tuned, e.g. by moving the index 2 units
to the left and 4 units up; or how we would encode a hat over a wide base
such as ABCD, or a quadruple dot over Q; or what the TeX/LaTeX constructs
are for

  ∑’ H
  0<i<m

or

 a
 X and X
       b

Mihai Sucan wrote:

>>> I'd be interested in your Canon (Markup Language).
>>
>> Thanks! M is for Meta, because the language is also a formal language
>> :-)
>>
>> Please copy anything of interest and report errors or better ways of
>> doing things to me.
>
> I would like to see a specification of CanonML and working examples with
> an experimental implementation. Your site provides only talk about
> CanonML. Is it too early to ask for this?
>
> If you have some, send it over to my (private email).

CanonML is at an early research stage. There is no formal specification
yet, because the research is still in progress. The canonical science
blog includes the history of the program and an outline of current ideas;
more recent posts update earlier ones. The research has proved to be more
variable than I expected at first. I initially copied many aspects from
TeX, SGML/XML, Content MathML and OpenMath, but later replaced them with
better options. Recently I have eliminated the concept of “entities”
(which, I have discovered, has also been proposed by some XML gurus such
as Tim Bray).

The variability of the research is understandable, since I am trying to
offer more power than SGML/XML while keeping the full language simpler
than TeX/LaTeX, and that is really very difficult. However, I expect to
finish the work in a few weeks and then prepare a first formal
specification for debate.

OK, I will send it to you.

>>> Math WebSearch - A semantic search engine
>>> http://search.mathweb.org/
>>> http://kwarc.eecs.iu-bremen.de/software/mmlsearch/
>>>
>>> I'm not sure if searching math is entirely a myth. This is a recent
>>> guided research project done by a student of Dr. Kohlhase.
>>
>> I was referring to MathML. Just as MathML is not very popular on the
>> browser side, it is not popular on the search engine side.
>
> Maybe you didn't look into the site carefully enough. I was also
> referring to MathML.

Sorry, I meant “I was referring to presentation MathML. Just as
presentation MathML is not very popular on the browser side, it is not
popular on the search engine side.”

> MathWebSearch is an entire application which deals with indexing Content
> MathML and allows users to input Content MathML code as search query.
> Therefore, that's not something different. As far as I know, the
> semantic search engine can be extended to any XML format (for indexing
> and for search query input). The guy actually added support for
> OpenMath search queries. More technical details are available in the
> page.

This approach is highly limited, since it is based on two specifications
that have problems. I have discussed these topics with specialists from a
popular search engine (for a project here at the Center), and they
recommended that I not follow that route.

I can enter

  <apply>
   <eq></eq>
   <ci>E</ci>
   <apply>
    <times></times>
    <ci>m</ci>
    <apply>
     <power></power>
     <ci>c</ci>
     <cn type='integer'>2</cn>
    </apply>
   </apply>
  </apply>

in the engine, and I receive “Your search returned no results!” The same
happens with other versions. However, go to Google and try E=mc^2 or even
E=mc2.
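The gap between the Content MathML query and the one-line E=mc^2 typed into Google can be illustrated with a minimal Python sketch that flattens the query back into linear text. The function `to_linear` and its operator table are hypothetical and cover only the operators used here (eq, plus, times, power); the MathML namespace is omitted, as in the query above.

```python
import xml.etree.ElementTree as ET

# The Content MathML query from the message (namespace omitted, as there).
QUERY = """
<apply>
 <eq></eq>
 <ci>E</ci>
 <apply>
  <times></times>
  <ci>m</ci>
  <apply>
   <power></power>
   <ci>c</ci>
   <cn type='integer'>2</cn>
  </apply>
 </apply>
</apply>
"""

# Hypothetical operator table: symbol and precedence (higher binds tighter).
OPERATORS = {"eq": (" = ", 0), "plus": (" + ", 1), "times": ("*", 2), "power": ("^", 3)}


def to_linear(node: ET.Element, parent_prec: int = -1) -> str:
    """Flatten a tiny Content MathML subset into a linear, search-box string."""
    if node.tag in ("ci", "cn"):
        return node.text.strip()
    if node.tag != "apply":
        raise ValueError(f"unsupported element: {node.tag}")
    op, *args = list(node)  # first child is the operator, the rest are operands
    if op.tag not in OPERATORS:
        raise ValueError(f"unsupported operator: {op.tag}")
    symbol, prec = OPERATORS[op.tag]
    text = symbol.join(to_linear(arg, prec) for arg in args)
    # Parenthesize when a looser operation is nested inside a tighter one.
    return f"({text})" if prec < parent_prec else text


print(to_linear(ET.fromstring(QUERY.strip())))  # E = m*c^2
```

Thirteen lines of markup reduce to the short string people actually type; an engine that indexes only the tree form has no obvious way to accept that string as a query.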

Content MathML can only encode simple formulae, has no browser support
(it is less browser-friendly than presentation MathML) and has very few
authoring tools. Moreover, even for very simple formulae it poses
problems, because we do not know how formulae will be encoded. For
example, I can guess how a mathematician or physicist would encode a+b as
text: most (all?) people would encode it as $a+b$ in raw TeX, and I could
search for “a+b” in a hypothetical TeX-like search engine. But what would
I type in the MathML search engine you cited?
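The encoding problem can be illustrated with a small Python sketch that writes the same a+b three ways: as raw inline TeX, as Content MathML and as presentation MathML. The function names are hypothetical and the MathML namespace is omitted for brevity; the point is only that a plain substring search for “a+b” matches the TeX form and neither MathML form.

```python
import xml.etree.ElementTree as ET


def leaf(tag: str, text: str) -> ET.Element:
    """Create an element holding a single token, e.g. <ci>a</ci>."""
    element = ET.Element(tag)
    element.text = text
    return element


def content_sum(x: str, y: str) -> str:
    """x+y in Content MathML: an <apply> of the <plus/> operator."""
    node = ET.Element("apply")
    ET.SubElement(node, "plus")
    node.append(leaf("ci", x))
    node.append(leaf("ci", y))
    return ET.tostring(node, encoding="unicode")


def presentation_sum(x: str, y: str) -> str:
    """x+y in presentation MathML: identifiers and an operator in an <mrow>."""
    row = ET.Element("mrow")
    row.append(leaf("mi", x))
    row.append(leaf("mo", "+"))
    row.append(leaf("mi", y))
    return ET.tostring(row, encoding="unicode")


def tex_sum(x: str, y: str) -> str:
    """x+y as raw inline TeX."""
    return f"${x}+{y}$"


for encode in (tex_sum, content_sum, presentation_sum):
    text = encode("a", "b")
    # Would a naive text search for "a+b" find this encoding?
    print(encode.__name__, text, "a+b" in text)
```

Presentation MathML also admits variants (for example, whether the operands are wrapped in an <mrow> at all), so a MathML search engine would have to normalize encodings before matching, whereas the TeX form is what users already type.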

Michel Fortin wrote:
> I would also use <f> instead of <formula> (as Juan used in one of his
> examples), because it's shorter and fits well with many other widely
> used container elements: <p>, <h1>-<h6>, <ol>, <ul>, <li>, <dl>, <dt>,
> and <dd>.

Or maybe tag names could follow some XHTML conventions:

quote    <--->   blockquote

code     <--->   blockcode

formula  <--->   blockformula

for an easier transition from HTML to HTML-Math. But some people would
prefer the HTML5 way

q   <--->   blockquote

f   <--->   blockformula

and others a TeX way

$   <--->   $$

f   <--->   ff

In any case, this part of the debate could be postponed until the end.
Once the elements, attributes and content model are known, we can debate
the names.

Alexey Feldgendler wrote:

> Why have <f> at all? When I'm writing about <var>x</var>, why should I
> write <f><var>x</var></f>? What would be the difference? I think a
> <formula> element is only needed for what is called "display equations"
> -- they are rendered out of line, usually centered, and sometimes
> numbered.
>
> That way, inline math would require no special element at all -- just
> write math in the middle of a sentence, and it should work. On the other
> hand, when math is put inside a <formula>, it's displayed on a line by
> itself, centered, numbered etc. And, by the way, one can actually have
> just plain text inside a formula, such as some statement in prose that
> needs to be centered and numbered like other formulae.

I prefer to keep a special tag for inline formulae, just as there is a
$...$ construct in TeX or an inline quote element in HTML. Moreover, you
can still write <var>x</var> in textual mode.

Juan R.

Center for CANONICAL |SCIENCE)