[whatwg] The problems with namespaces in text/html

Elliotte Harold elharo at metalab.unc.edu
Mon Nov 6 05:42:39 PST 2006

James Graham wrote:

> But everything to do with the actual reasons that people will choose one 
> system over another -- ease of authoring, attractiveness of final 
> output, ease of maintenance; /not/ well-formedness of the final HTML.
> Insisting on well-formed HTML can only make tools harder to use, because
> it requires that all templates and all /content/ be well formed.
> Templates may be written by geeks but content is typically written by
> users who don't really understand well-formedness. Therefore avoiding error
> messages that are read as "can't publish because the blah wibble foo on 
> line 80 frobulates the kaniki bar on line 71" is impossible (several 
> popular blogs that use XHTML have exactly this problem. Fortunately in 
> those cases the audience tend to be web geeks or string-theory geeks so 
> it's not a critical problem.). Needless to say systems which regularly 
> spew out such apparent nonsense will not be popular with their users.

All true, which is why such systems, properly written, should not spew out
such apparent nonsense. It is easy to take the input from regular
end-users who are not markup geeks or string theorists and transform it 
into proper, well-formed, even valid XHTML before republishing it. I was 
doing this years ago. It is trivial compared to some of the things 
WordPress is already doing.

Doing this does require a certain amount of markup savvy on the part of
implementers of content management and publishing systems. However, it 
is not unreasonable to expect these implementers to be markup geeks, or 
at the very least to be able to use excellent libraries like Tidy and 
TagSoup for this task.

Remember, almost all of the major systems are already refusing to accept 
arbitrary user input and raw HTML. Instead they:

1. Convert things like Markdown to HTML.
2. Strip the HTML of potentially dangerous markup such as scripts.

I'm just suggesting that a third operation be added in which the HTML is 
passed through Tidy or TagSoup or equivalent. It's not like this is 
hard. Steps 1 and 2 are actually much more complicated and error-prone.
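
To make the third step concrete, here is a rough sketch in Python of what
such a cleanup pass could look like. It is not Tidy or TagSoup and does
not use their APIs; the Balancer class and the tidy() function are
hypothetical names for a deliberately minimal stand-in built on the
standard library's html.parser. It assumes steps 1 and 2 have already
run, so it only balances tags, quotes attributes, and escapes text. A
real system would more likely call one of the libraries named above.

```python
import re
from html import escape
from html.parser import HTMLParser

# Elements that never take content or a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}

# Conservative attribute-name filter (an assumption of this sketch).
SAFE_NAME = re.compile(r"[a-z_][a-z0-9_.-]*")


class Balancer(HTMLParser):
    """Hypothetical stand-in for Tidy/TagSoup: re-emit tag soup as balanced markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments, joined at the end
        self.stack = []  # names of elements that are currently open

    def handle_starttag(self, tag, attrs):
        seen, parts = set(), []
        for name, value in attrs:
            if name in seen or not SAFE_NAME.fullmatch(name):
                continue  # drop duplicate or odd attribute names
            seen.add(name)
            # A valueless attribute such as "alt" becomes alt="alt".
            text = name if value is None else value
            parts.append(' %s="%s"' % (name, escape(text, quote=True)))
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, "".join(parts)))
        else:
            self.out.append("<%s%s>" % (tag, "".join(parts)))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # Close everything opened since the matching start tag.
        while self.stack:
            name = self.stack.pop()
            self.out.append("</%s>" % name)
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def tidy(fragment):
    """Return a balanced, escaped version of an HTML fragment."""
    parser = Balancer()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    closing = "".join("</%s>" % name for name in reversed(parser.stack))
    return "".join(parser.out) + closing


if __name__ == "__main__":
    print(tidy("<P>Hello <b>world<i>!</b><br>AT&T </i>"))
```

The user never sees an error message here: mis-nested or unclosed tags
are quietly repaired rather than rejected, which is the point. Comments
and doctype declarations are simply dropped in this sketch.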

Elliotte Rusty Harold  elharo at metalab.unc.edu
Java I/O 2nd Edition Just Published!
