[whatwg] [html5] tags, elements and generated DOM

Ian Hickson ian at hixie.ch
Fri Feb 24 14:57:33 PST 2006


On Wed, 6 Apr 2005, Olav Junker Kjær wrote:
> 
> An innocent question (no flamewar intended): What is the benefit of 
> having HTML defined as an application of SGML ?

You could use SGML tools with it, including well-established validator 
tools; the parsing model (for compliant documents) is very clear; SGML has 
a lot of abbreviation syntaxes that make it quick to write markup; and it 
means we're not reinventing the wheel.

Unfortunately, in practice, nobody uses SGML tools; validators are unable 
to catch a number of important (computer-checkable) conformance problems; 
the parsing model doesn't handle non-compliant documents, and the majority 
of documents are non-compliant; the abbreviation syntaxes are extremely 
complicated, largely unimplemented, and incompatible with existing 
content; and the wheel was already reinvented.


On Wed, 6 Apr 2005, Olav Junker Kjær wrote:
> 
> The problem is that validators use the term "valid" in a very limited 
> sense, but web authors without a thorough understanding of DTD-validation 
> would naturally assume that "valid" would mean "valid according to the 
> spec".

Indeed; the term "valid" in an XML/SGML context is used to mean a specific 
subset of "conformant", but most users don't know this and assume it means 
"fully conformant".

I've tried to work around this in the spec.


On Wed, 6 Apr 2005, Olav Junker Kjær wrote:
> 
> There are three types of conformance criteria:
> (1) Criteria that can be expressed in a DTD
> (2) Criteria that cannot be expressed by a DTD, but can still be checked by a
> machine.
> (3) Criteria that can only be checked by a human.
> 
> A conformance checker must check (1) and (2). A simple validator which only
> checks (1) is therefore not conformant.

I've put this in the spec; I hope that's OK.


On Thu, 7 Apr 2005, Olav Junker Kjær wrote:
> 
> A DTD or schema in the spec would be redundant anyway, since it would 
> only echo what is described in prose.

Indeed.


> DTD validation would be almost useless in the case of WF2, except 
> perhaps for catching spelling errors in attribute names. A schema in a 
> sufficiently expressive language would go a long way, though.

For WF2 that may go far enough; I'm not sure. For HTML5 I'm pretty sure no 
schema language (short of a Turing-complete one) is expressive enough.


> I notice that <input type="text" src="some url" checked="true"> is valid
> according to the schema for XHTML.

Indeed.

It'll probably be conformant in HTML5 as well, to be honest, because you 
might want to set things up for a dynamic change of |type|. I don't know 
where to draw the line there. (Similarly, should empty paragraphs be 
conformant? I often use empty paragraphs as somewhere to later fill in 
some text.)
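
As a rough illustration of the kind of rule a DTD cannot express, here is 
a minimal Python sketch that flags attributes which do not apply to the 
current |type| of an <input>. The APPLICABLE table and the check_inputs() 
helper are hypothetical names invented for this example, and the table's 
contents are an assumption rather than anything taken from a spec. The 
results are collected as warnings, not errors, since whether such markup 
should be non-conformant is exactly the open question above.

```python
from html.parser import HTMLParser

# Hypothetical table mapping an attribute name to the <input> types it
# applies to. Illustrative only; not taken from any specification.
APPLICABLE = {
    "src": {"image"},
    "checked": {"checkbox", "radio"},
}


class InputAttributeChecker(HTMLParser):
    """Collects warnings for attributes that do nothing on the given type."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        # Assumption: a missing type attribute is treated as "text".
        input_type = (attr_map.get("type") or "text").lower()
        for name in attr_map:
            allowed = APPLICABLE.get(name)
            if allowed is not None and input_type not in allowed:
                self.warnings.append(
                    f'{name} has no effect on <input type="{input_type}">'
                )


def check_inputs(markup):
    """Return a list of warning strings for the given markup."""
    checker = InputAttributeChecker()
    checker.feed(markup)
    checker.close()
    return checker.warnings


if __name__ == "__main__":
    for warning in check_inputs(
        '<input type="text" src="some url" checked="true">'
    ):
        print(warning)
```

The dependency of one attribute's meaning on another attribute's value is 
what puts this check out of reach of a DTD, which can only say that 
<input> may carry src and checked at all.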


> Actually I think it would be beneficial for interoperability and perhaps 
> discovery of weaknesses in the spec, if several schemas were developed 
> by independent parties during the call for implementation.

Absolutely.


On Thu, 7 Apr 2005, Olav Junker Kjær wrote:
> 
> Actually, the HTML element has a (deprecated!) version attribute, which 
> could be used for this purpose. I agree it feels cleaner than using the 
> doctype syntax.

It's not clear to me what the purpose would be.


> OTOH authors are going to use doctypes for the foreseeable future anyway, 
> since they want to trigger standards compliant mode in browsers, so we 
> might as well put the doctype to some use.

What use?


On Thu, 7 Apr 2005, Olav Junker Kjær wrote:
> 
> A conformance checker is a rubber stamp. Therefore it's quite important 
> that a conformance checker actually checks conformance to the spec, 
> otherwise it is snake oil.

Hear hear!


> As HTML applications become more complex it becomes more important that 
> the markup and code are correct, but DTD-validation becomes even less 
> sufficient to catch errors. A basic validity error like forgetting to 
> close a <b> tag will not cause the page to stop working. However, a 
> syntax error in the initial value of a date control *will* cause the 
> page to stop working as intended.

Indeed.
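
To make the date example concrete, here is a minimal Python sketch of a 
machine check that a DTD cannot perform: verifying that the initial value 
of a date control is a real calendar date. It assumes the value must be 
written as YYYY-MM-DD, which is an assumption made for this example, and 
is_valid_date_value() and find_bad_dates() are hypothetical helper names.

```python
import re
from datetime import date
from html.parser import HTMLParser

# Assumption: date values use the form YYYY-MM-DD.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_valid_date_value(value):
    """True if value is YYYY-MM-DD and names a real calendar date."""
    match = DATE_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = map(int, match.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. 30 February
    except ValueError:
        return False
    return True


class DateValueChecker(HTMLParser):
    """Collects the bad initial values of <input type="date"> controls."""

    def __init__(self):
        super().__init__()
        self.bad_values = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag != "input" or (attr_map.get("type") or "").lower() != "date":
            return
        value = attr_map.get("value")
        # A missing or empty value is treated as "no initial value".
        if value and not is_valid_date_value(value):
            self.bad_values.append(value)


def find_bad_dates(markup):
    """Return the invalid date values found in the given markup."""
    checker = DateValueChecker()
    checker.feed(markup)
    checker.close()
    return checker.bad_values


if __name__ == "__main__":
    print(find_bad_dates('<b>oops <input type="date" value="2005-02-30">'))
```

Note that the unclosed <b> in the usage line does not stop the check from 
running; only the bad date value is reported.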


> > now I realise it's to the advantage of existing browser manufacturers 
> > to rubber stamp complicated heuristic behaviour they've already solved 
> > into a spec (it prevents new entrants from coming along)  but how is 
> > it to the advantage to the rest of us - understanding specifications 
> > becomes harder and harder and relies on the fact that we knew what 
> > happened before...
> 
> If you are referring to the paragraph about parse errors in 
> <http://whatwg.org/specs/web-forms/current-work/#handling> I tend to 
> agree with you.

In HTML5 there is less and less that is left up to reverse engineering. 
Hopefully that addresses your concern; I hope to continue in this 
direction to the point where eventually maybe there will not be any need 
for reverse engineering at all.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

