[whatwg] Unsafe SGML minimizations
ian at hixie.ch
Fri Mar 10 15:57:04 PST 2006
On Thu, 8 Sep 2005, Henri Sivonen wrote:
> > >
> > > * <>
> > > * </>
> > Agreed. Those should generate comment nodes, I think.
> Opera, Firefox and Safari already interoperably handle <> as character
> data (equivalent to &lt;&gt;) and ignore </>.
> > > * tagc omission ie. <foo<bar>...</bar</foo>
> > Well we have to define what that does, and the most obvious error handling
> > behaviour here is to start the new tag. So effectively, I would say we
> > should have TAGC omission.
> But it would still be an error as far as a conformance checker is
> concerned, right?
That's what the spec says.
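To make the recovery behaviour discussed above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the spec's tokenizer: `split_tags` is a hypothetical helper that only separates tags from text, treats `<>` as character data, drops `</>`, and starts a new tag when it meets `<` inside an unterminated tag. The `closed` flag records whether the `>` was actually present, so a conformance checker could still report the omission as an error.

```python
def split_tags(markup):
    """Hypothetical lenient splitter; returns ("text", s) and ("tag", name, closed) tuples."""
    tokens = []
    i = 0
    while i < len(markup):
        if markup[i] == "<":
            # Scan to the next "<" or ">" (or the end of input).
            j = i + 1
            while j < len(markup) and markup[j] not in "<>":
                j += 1
            name = markup[i + 1:j]
            closed = j < len(markup) and markup[j] == ">"
            if name == "":
                # "<>" (or a stray "<") is kept as character data.
                tokens.append(("text", "<>" if closed else "<"))
            elif name == "/" and closed:
                # "</>" is ignored entirely.
                pass
            else:
                # closed == False means the ">" was omitted: recoverable, but an error.
                tokens.append(("tag", name, closed))
            # If the tag ended at a "<", leave it in place so it starts the next tag.
            i = j + 1 if closed else j
        else:
            j = markup.find("<", i)
            if j == -1:
                j = len(markup)
            tokens.append(("text", markup[i:j]))
            i = j
    return tokens
```

For example, `split_tags("<foo<bar>...</bar</foo>")` reports `foo` and `/bar` with `closed` set to `False`, which is where a checker would flag the missing `>`.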
> > > * <foo/bar/
> > Agreed, sadly. That would be equivalent to something like <foo /bar/="">
> > (or something similar).
> I think the HTML5 spec should allow TagSoup to be updated for HTML5 or
> an equivalent of TagSoup for HTML5 to be written. TagSoup guarantees to
> the application that it acts as if it was an XML parser parsing XHTML.
> Therefore, XML and, by extension, the SAX2 API contract restrict the
> attribute names to legal XML attribute names. If HTML5 required "/bar/"
> to be reported as an attribute name, TagSoup would have to violate that
> constraint and could not claim conformance.
Well, <foo/bar/> gets parsed as <foo bar="">, but there are plenty of
other ways to get non-XML-well-formed output from an HTML5 stream. For
example, <foo \=> tokenises to a start tag token with an attribute "\".
I'm not convinced we don't want to do that.
But then the HTML parsing model already requires the parser to sometimes
actively go in and modify the DOM on the fly, so I don't think it's
possible to guarantee that it will look like an XML parser at all.
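The two examples above can be reproduced with a small sketch. This is Python written for illustration only, assuming a much-simplified attribute grammar: `tokenize_start_tag` and `is_xml_name` are hypothetical helpers, stray slashes inside the tag are skipped like whitespace, and the XML name check is an ASCII-only approximation of the real XML Name production.

```python
import re

# One attribute: optional junk, a name, then an optional "=value".
ATTR = re.compile(
    r"[\s/]*"                          # skip whitespace and stray slashes
    r"([^\s/>=]+)"                     # name: runs up to space, "/", ">" or "="
    r"(?:\s*=\s*"                      # optional "=" ...
    r"(\"[^\"]*\"|'[^']*'|[^\s>]*))?"  # ... then a quoted or unquoted value
)

# ASCII-only approximation; real XML names allow many more characters.
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def tokenize_start_tag(tag):
    """Return (tag_name, attrs) for one start tag such as '<foo/bar/>'."""
    body = tag[1:-1]                   # assumes the tag starts with "<" and ends with ">"
    name = re.match(r"[^\s/>]+", body).group(0)
    attrs = {}
    for match in ATTR.finditer(body, len(name)):
        attr_name, value = match.group(1), match.group(2) or ""
        if value[:1] in ("'", '"'):
            value = value[1:-1]        # drop the surrounding quotes
        attrs.setdefault(attr_name, value)  # first occurrence wins
    return name, attrs


def is_xml_name(name):
    """True if name fits the simplified XML name pattern above."""
    return XML_NAME.fullmatch(name) is not None
```

With this sketch, `<foo/bar/>` yields the attribute `bar` with an empty value, while `<foo \=>` yields an attribute named `\`, which `is_xml_name` rejects. That is the kind of name a SAX2-style consumer would have to filter or rewrite.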
> > > * attribute name omission (except for the well-known "boolean
> > > attributes")
> > Again, we have to define error handling. <foo bar baz> will probably just
> > be equivalent to <foo bar="" baz="">.
> I have previously argued for <foo bar="bar" baz="baz"> in the
> TagSoup-like scenario, because that would be the same as the treatment
> required for the "boolean attributes".
Yeah, I still need to look at boolean attributes.
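The two treatments of value-less attributes debated above differ only in the default value. A minimal Python sketch, assuming attributes arrive as (name, value) pairs with `None` marking a missing value; the function name and the policy labels are hypothetical:

```python
def fill_missing_values(attrs, policy="empty"):
    """Fill in values for attributes written without one, e.g. <foo bar baz>.

    policy="empty": bar=""     (the error handling suggested for HTML5)
    policy="name":  bar="bar"  (the SGML-style "boolean attribute" treatment)
    """
    if policy not in ("empty", "name"):
        raise ValueError("unknown policy: " + policy)
    filled = []
    for name, value in attrs:
        if value is None:
            value = "" if policy == "empty" else name
        filled.append((name, value))
    return filled
```

Attributes that already carry a value pass through unchanged under either policy, so the choice only matters for the minimized form.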
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'