[whatwg] [html5] tags, elements and generated DOM

Ian Hickson ian at hixie.ch
Thu Feb 23 18:51:34 PST 2006

On Wed, 6 Apr 2005, Lachlan Hunt wrote:
> Validators should not be non-conformant simply because they only do 
> their job to validate a document and nothing else.

Validators are not conformant conformance checkers. They are validators, 
which is a subset of conformance checking. You could implement other 
subsets; there's nothing particularly magical about validators that makes 
them any more appropriate than other incomplete conformance checkers.

> I don't see any reason why such a statement needs to be included at all.

Because people rely on validators in a way that implies they think they 
are getting a complete conformance check.

> If OpenSP was non-conformant, then any current or future UA that is 
> built with OpenSP as the parser would be non-conformant also, which 
> should not be the case.

It is the case. HTML5 is not an SGML application. Any UA based on an SGML 
parser would not be able to handle today's existing Web content.

> What the?  I disagree with that.  HTML should remain an application of 
> SGML, and browsers should be built to conform properly.  Aside from the 
> unimplemented SHORTTAG features (which can be turned off in the DTD 
> anyway) and the mostly undefined error handling, what about HTML 5 will 
> be so incompatible with SGML to warrant such a decision?

Why do you discard the error handling so lightly? It is critical. To a 
rough approximation, all the content on the Web is erroneous, invalid, or 
non-conformant. We have to handle it.

> > > Also, while on the topic of handling invalid documents, is this spec 
> > > going to attempt to address the <x><y></x></y> problem?
> > 
> > Probably not, as there is no generally accepted solution. In fact 
> > there is no known solution (to my knowledge) that is entirely 
> > satisfactory.
> Agreed, since no existing browser I know of handles it in the most 
> logical and, IMHO, the most correct way (ie. when a parent element is 
> closed, all unclosed children elements should be closed and not reopened 
> after); and no two browsers that I know of create completely compatible 
> DOMs with any other method.

We've addressed it anyway, in a way that manages to remain closely 
compatible with Safari and Mozilla, and mostly compatible with IE and 
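
For illustration only, the rule described in the quoted text above (when 
a parent element is closed, close all of its unclosed children and do not 
reopen them) can be sketched as follows. This is a minimal sketch of that 
quoted proposal, not the handling the spec adopted; the function name and 
the tuple-based tree are assumptions made for the example, and only bare 
tags without attributes are recognised.

```python
import re

# Matches simple start and end tags such as <x> or </x> (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def close_children_tree(markup):
    """Build a nested (name, children) tree using the quoted rule."""
    root = ("#root", [])
    stack = [root]  # currently open elements, innermost last
    for match in TAG.finditer(markup):
        is_end_tag = match.group(1) == "/"
        name = match.group(2).lower()
        if not is_end_tag:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
            continue
        open_names = [open_name for open_name, _ in stack[1:]]
        if name in open_names:
            # Close the matching element and every unclosed child with it.
            while stack.pop()[0] != name:
                pass
        # Otherwise the end tag is stray: ignore it and reopen nothing.
    return root

tree = close_children_tree("<x><y></x></y>")
```

Under this rule the y element ends up as a child of x, and the trailing 
</y> is dropped as a stray end tag.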

> Anne van Kesteren wrote:
> > Jim Ley wrote:
> > 
> > > This is clearly an example of how existing browsers are 
> > > non-conformant,
> > 
> > Doing otherwise would result in a lot of broken pages
> Those pages are already broken.  Authors just don't know it because the 
> browsers are even more broken by being forced to deal with them.

More importantly, users don't know it. Changing the behaviour of the UA 
would make them consider that UA broken. That would prevent that UA from 
getting any market share, and would make the spec irrelevant as all UAs 
would end up ignoring it.

> > and probably less market share for the browser.
> I thought this was about standardisation, not some marketing gimmick for 
> browser vendors!

There's no point having standards if nobody uses them. The browsers won't 
use them if they would end up losing customers as a result.

> Documents that contain </ within script and style elements, that are not 
> </script> and </style> respectively (or the SHORTTAG version </>) are 
> broken.

Not as far as the user can see -- they work fine.
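
A rough sketch of why such documents work for users even though they are 
broken under SGML rules: under the SGML-style rule, script content ends 
at the first "</" followed by a name-start character, whereas browsers 
look for "</script". Both patterns below are simplifications assumed for 
the example (real parsers have further special cases), and the function 
and constant names are hypothetical.

```python
import re

# SGML-style rule: content ends at the first "</" followed by a letter.
SGML_END = re.compile(r"</[A-Za-z]")
# Simplified browser-style rule: only "</script" ends the element.
BROWSER_END = re.compile(r"</script", re.IGNORECASE)

def script_content(source, end_pattern):
    """Return the text before the first match of end_pattern."""
    match = end_pattern.search(source)
    return source if match is None else source[:match.start()]

SAMPLE = 'document.write("<b>hi</b>");</script>'

sgml_view = script_content(SAMPLE, SGML_END)
browser_view = script_content(SAMPLE, BROWSER_END)
```

With this sample, the SGML-style rule cuts the script off inside the 
string literal, while the browser-style rule keeps the whole statement.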

> I see no problem with defining error handling for broken documents, but 
> no need to break conformance with SGML in the process.

The error handling must be compatible with existing content. Existing 
content is incompatible with SGML.

> HTML is an application of SGML, regardless of all the broken 
> implementations and documents we currently have, and I don't want to see 
> that changed.

HTML was only an application of SGML from versions 2 to 4. The original 
version, the second version (HTML+), and the new version (HTML5) are not.

On Wed, 6 Apr 2005, Lachlan Hunt wrote:
> In the note in that section [1]:
> | Conformance checkers that only perform validation are non-conformant,
> In fact, now that I've read it again, it seems rather contradictory. Just
> before the note, it states:
> | Conformance checkers are exempt from detecting errors that require
> | interpretation of the author's intent (for example, while a document
> | is non-conformant if the content of a blockquote element is not a
> | quote, conformance checkers do not have to check that blockquote
> | elements only contain quoted material).
> I would argue that conformance requirements that cannot be expressed by 
> a DTD *are* constraints that require interpretation by the author. 

This is clearly not the case, since it is trivial to show 
machine-checkable constraints that DTDs or Schemas can't express. (The 
exact range of such constraints varies based on the DTD or Schema.)

On Wed, 6 Apr 2005, Lachlan Hunt wrote:
> > A conformance checker might do things validators do too, but that 
> > doesn't make it one.
> I believe such conformance checkers are often called lints and they are 
> usually not true validators, despite what many claim, so you are correct 
> in that a conformance checker may not be a validator.  But, from what I 
> understand of the wording in the spec, a validator is a form of 
> conformance checker. Basically, metaphorically speaking, it's like a 
> square is a rectangle, but a rectangle is not always a square.

A validator is an incomplete conformance checker. It's like one side of 
the rectangle.

Lints are tools that check for things that the specification does not 
require, generally; they are not conformance checkers.

> Yes, but "Conformance checkers that only perform validation" are, unless 
> I am mistaken, validators.

Correct. They are incomplete conformance checkers.

On Thu, 7 Apr 2005, Lachlan Hunt wrote:
> Olav Junker Kjær wrote:
> > There are three types of conformance criteria:
> > (1) Criteria that can be expressed in a DTD
> > (2) Criteria that cannot be expressed by a DTD, but can still be checked by
> > a machine.
> Such as...?

The syntax of the datetime attribute, for example.

> > (3) Criteria that can only be checked by a human.
> > 
> > A conformance checker must check (1) and (2). A simple validator which only
> > checks (1) is therefore not conformant.
> Which is exactly what I'm complaining about.  A user agent *must not* be 
> automatically non-conformant for doing its job correctly!!!

A user agent whose intent is to only implement half the specification may 
do its job correctly, but that doesn't make it a conformant 
implementation. It's no different whether the UA is a conformance checker 
or a Web browser.
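
To make the gap between criteria (1) and (2) concrete: a DTD can say that 
a datetime attribute holds character data, but not that the value is a 
real date and time. A machine can still check that. The sketch below 
assumes a simplified YYYY-MM-DDThh:mm:ssZ shape, which is an assumption 
for illustration and not the exact syntax the spec requires; the function 
name is hypothetical.

```python
import re
from datetime import datetime

# Assumed, simplified shape of a datetime value: YYYY-MM-DDThh:mm:ssZ
SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")

def check_datetime(value):
    """Return True if value has the assumed shape and names a real date."""
    if SHAPE.fullmatch(value) is None:
        return False
    try:
        # strptime rejects impossible values such as month 13 or 30 February.
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True
```

A validator limited to the DTD would accept any string here; a checker 
covering criteria (2) would reject a value such as 2006-02-30T18:51:34Z.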

On Thu, 7 Apr 2005, Lachlan Hunt wrote:
> If every conformance checker has to implement their own [DTD], there's 
> more chance that some of them will make mistakes, and each end up with 
> differing DOCTYPES. If that happens, then chances are each validator 
> would give differing results, which is even more confusing and would 
> result in no-one validating at all!

The same argument could be applied to Web browsers. However, history has 
shown again and again that such competition raises the quality of all 
involved, rather than lowering it and killing the market as you suggest.

> I wouldn't bother including character entity references in HTML 5, their 
> use should be deprecated, although UAs should be advised to support the 
> HTML4 entities for bugwards compatibility.

They are included in HTML5, but excluded in XHTML5.

> There is no need to make HTML 5 no longer an SGML application.  The only
> reason one might consider it to not be is due to broken documents, which
> should be fixed

There are billions of such documents. Are you volunteering?

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
