[whatwg] The problems with namespaces in text/html
hsivonen at iki.fi
Sat Nov 4 17:14:51 PST 2006
On Nov 5, 2006, at 01:19, Elliotte Harold wrote:
> Henri Sivonen wrote:
>> Anne is talking about the text/html serialization, which is
>> supposed to be parsed using an HTML5 parser. It is a special-
>> purpose alternative serialization for a subset of possible
>> infosets--like RELAX NG Compact Syntax. Please ignore the
>> superficial syntactic similarity to XML 1.0.
> Does that subset include MathML?
Not yet. Whether it should is what is being discussed.
> However, if the plan is to mix in entire additional languages, then
> I think this is driving off a cliff. MathML and MathML tools are
> designed under the assumption that they can rely on well-formedness
> and namespaces. Integrating MathML with HTML absolutely needs this.
You wouldn't be able to feed MathML-enabled HTML5 to MathML tools
that use an XML parser. You'd either have to use an HTML5-to-XHTML5
converter to create an intermediate XML 1.0 serialization that can be
fed to an XML parser, or you could optimize away the serialization
and plug an HTML5 parser directly into the XML processing pipeline,
the way TagSoup is used.
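As a rough illustration of the first option, here is a toy text/html-to-XML converter in Python built on the standard library's `html.parser`, whose output is then handed to an ordinary XML parser (`xml.etree.ElementTree`). It is a sketch under strong assumptions, not an HTML5 parser: it does not implement the HTML5 tree-construction algorithm, assumes a single root element, knows only a handful of void elements, drops prefixed attributes, and assigns the MathML namespace simply by declaring it on every `math` element. `ToyXhtmlConverter` and `html_to_xml` are hypothetical names.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Partial list of elements that never take an end tag in text/html.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class ToyXhtmlConverter(HTMLParser):
    """Hypothetical sketch: re-serialize simple text/html as well-formed XML 1.0.

    NOT the HTML5 parsing algorithm. It self-closes void elements, drops end
    tags with no matching open element, and closes whatever is still open at
    the end of input (a TagSoup-style recovery heuristic).
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the XML serialization
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        if tag == "math":
            # MathML subtree: make the MathML namespace the default here.
            parts.append('xmlns="%s"' % MATHML_NS)
        elif not self.stack:
            # Root element: everything else defaults to the XHTML namespace.
            parts.append('xmlns="%s"' % XHTML_NS)
        seen = set()
        for name, value in attrs:
            # XML forbids duplicate attributes (keep the first one); prefixed
            # names (xmlns:*, xlink:*) would need declarations, so drop them.
            if name in seen or name == "xmlns" or ":" in name:
                continue
            seen.add(name)
            parts.append("%s=%s" % (name, quoteattr(value or "")))
        if tag in VOID_ELEMENTS:
            self.out.append("<%s/>" % " ".join(parts))
        else:
            self.out.append("<%s>" % " ".join(parts))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        # Close any elements opened after the matching one, then the match.
        while True:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references are already decoded; re-escape for XML.
        self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered text
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())
        return "".join(self.out)


def html_to_xml(source):
    converter = ToyXhtmlConverter()
    converter.feed(source)
    return converter.result()


if __name__ == "__main__":
    xml_text = html_to_xml(
        "<html><body><p>x &lt; y<br><math><mi>x</mi><mo>&lt;</mo><mi>y</mi></math></body>"
    )
    root = ET.fromstring(xml_text)  # an ordinary namespace-aware XML parser
    print(root.find(".//{%s}mo" % MATHML_NS).tag)
```

The input in the usage example omits `</p>` and `</html>`, which the converter supplies, and the resulting `mo` element comes out of the XML parser as `{http://www.w3.org/1998/Math/MathML}mo`. Plugging a real HTML5 parser directly into a SAX or tree pipeline, as TagSoup does, would avoid the intermediate string entirely.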
> It sounds to me like the working group is considering the needs of
> thick-client web browsers
The WHAT WG is very much biased towards Gecko, Presto, WebKit and
Trident as the consumers of documents.
> and people hand-authoring HTML in text editors to the complete
> exclusion of every other community and use case.
Personally, I think MathML is so hopelessly verbose for hand
authoring that this really shouldn't be about enabling hand-authored
MathML-in-HTML5. It should be about enabling MathML-in-HTML5 (perhaps
generated by a future version of itex2mml or a similar tool) to be
served through content management systems that are not built around
a SAX pipeline, an XML tree API or XSLT, but are built as tag soup
systems that simply cannot guarantee well-formedness. I mean systems
like WordPress and MovableType.
> Please prove me wrong. If it's not true that you're planning on
> sending mixed HTML and MathML documents on the wire without
> namespaces or perhaps even well-formedness, then please say so; but
> so far I'm not hearing anyone deny that.
A conforming HTML5 byte stream is *never* a well-formed XML 1.0 byte
stream. However, for every *conforming* HTML5 byte stream there
should (in my opinion) exist (in the mathematical sense of existence)
an XML 1.0 byte stream that parses into the same infoset. (If this is
not the case, I consider it a spec bug that needs to be fixed.)
So far, what has been suggested is that the MathML elements parsed out
of an HTML5 byte stream would be in the MathML namespace in the infoset.
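To make the infoset argument concrete: different XML 1.0 byte streams can parse into the same infoset, because a namespace-aware parser records each element's expanded name (namespace URI plus local name), not the prefix or where the declaration was written. The Python sketch below compares two such serializations of a document containing MathML, using ElementTree's `{uri}local` tags as a simplified stand-in for the infoset. It ignores comments, processing instructions and other infoset properties, so it is not a full infoset comparison; `simplified_infoset` and `same_simplified_infoset` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Two different XML 1.0 byte streams: default namespaces vs. prefixes.
DEFAULT_DECLS = (
    b'<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    b'<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    b"</body></html>"
)
PREFIXED = (
    b'<h:html xmlns:h="http://www.w3.org/1999/xhtml"'
    b' xmlns:m="http://www.w3.org/1998/Math/MathML"><h:body>'
    b"<m:math><m:mi>x</m:mi></m:math></h:body></h:html>"
)


def simplified_infoset(elem):
    """Reduce an element to nested tuples: (name, attributes, text, children, tail).

    A rough stand-in for the XML Infoset: ElementTree stores each element's
    expanded name as '{namespace-uri}local-name' and keeps neither the
    prefixes nor the xmlns attributes used to write it.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text or "",
        [simplified_infoset(child) for child in elem],
        elem.tail or "",
    )


def same_simplified_infoset(a, b):
    return simplified_infoset(ET.fromstring(a)) == simplified_infoset(ET.fromstring(b))


if __name__ == "__main__":
    print(same_simplified_infoset(DEFAULT_DECLS, PREFIXED))
    print(ET.fromstring(PREFIXED).find(".//{%s}mi" % MATHML_NS).tag)
```

The same check reports a difference for a copy of the document with no namespace declarations at all, since its `math` element is then simply `math` in no namespace rather than a MathML element.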