[whatwg] Custom elements and attributes
Henri Sivonen
hsivonen at iki.fi
Tue Oct 31 03:46:15 PST 2006
On Oct 31, 2006, at 01:03, Øistein E. Andersen wrote:
> On 23 Oct 2006, at 12:43PM, Henri Sivonen wrote:
>
>> Using custom schemas with the HTML parser is for experts only
>> and produces very wrong results unless the schema is suitable.
>
> Indeed so, but then any tool can potentially be misused.
> Still, I do realise that this is not a priority, of course.
It isn't about me being worried about misuse. Rather, I have not
taken steps to prevent users of custom schemas from shooting
themselves in the foot. (Taking those steps would involve a
non-trivial amount of work.)
There are no gotchas with using a custom schema with the XML parser.
There are also no gotchas in making a copy of one of the schemas that
the service offers for use with the HTML parser and adding custom
*attributes*, except that the attributes have to be legal in XML as
well, constrained to ASCII, written in the schema in lower case, and
must not collide with case-folded or boolean attributes on other HTML
elements.
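As a rough illustration of the last three conditions, here is a minimal Python sketch. The function name and the contents of SPECIAL_HTML_ATTRIBUTES are placeholders invented for this example; the set is not the actual list of case-folded or boolean attributes the service knows about, and XML legality is not checked here.

```python
# Placeholder set for illustration only: NOT the real list of case-folded
# or boolean HTML attributes used by the service.
SPECIAL_HTML_ATTRIBUTES = frozenset(
    {"checked", "disabled", "selected", "method", "type"}
)


def custom_attribute_problems(name):
    """Return the reasons a custom attribute name looks unsuitable.

    An empty list means none of the three checks below found a problem.
    XML legality of the name is deliberately not checked here.
    """
    problems = []
    if not name.isascii():
        problems.append("not ASCII")
    if name != name.lower():
        problems.append("not lower case")
    if name.lower() in SPECIAL_HTML_ATTRIBUTES:
        problems.append("collides with an existing attribute")
    return problems
```

For example, custom_attribute_problems("x-widget") returns an empty list, while "Checked" is reported both for its case and for the collision.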
If you add custom *elements* and use the HTML parser, the system does
not ensure that the custom elements would not adversely interact with
tag inference or error handling in browsers. That is, the schema
might validate a tree, but there's no guarantee that you'd get the
same tree in a browser. If you add custom elements, you just have to
know what you are doing in order to keep the results useful for the
purpose of authoring for browsers.
But in any case, using a custom schema is no longer checking HTML5
conformance but checking your private dialect.
>> personally I am not at all sympathetic to extending HTML5 with
>> names that contain non-ASCII (due to case folding issues),
>
> It might be interesting to see how current browsers handle element
> names containing such characters:
> The current draft seems to describe Firefox's behaviour on this point.
Which is good for security, since Unicode case folding involves
security issues similar to non-shortest forms in UTF-8.
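One concrete illustration of the kind of surprise meant here (an example chosen for this note, not taken from the draft): a Unicode-aware lower-casing step can map a non-ASCII character onto an ASCII letter, so two different names become equal after folding. The sketch below contrasts Python's built-in str.lower() with a hypothetical helper, ascii_lower, that folds only A-Z.

```python
def ascii_lower(name):
    """Lower-case only the ASCII letters A-Z; leave all other characters alone."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c
        for c in name
    )


# The first character is U+212A KELVIN SIGN, which looks like "K" but is not ASCII.
kelvin_name = "\u212Aey"

# Unicode-aware lower-casing turns the Kelvin sign into a plain ASCII "k".
unicode_folded = kelvin_name.lower()

# ASCII-only folding leaves the Kelvin sign as it is.
ascii_folded = ascii_lower(kelvin_name)

print(unicode_folded == "key")  # True: the name now collides with "key"
print(ascii_folded == "key")    # False: no collision
```

With ASCII-only folding, the set of names that can match a given ASCII name stays small and predictable.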
>> non-XML characters (due to XML serializability issues)
>
> Which are those characters? Do you mean <, >, ", ' and &?
I mean characters that do not match the production named Char in XML
1.0.
http://www.w3.org/TR/REC-xml/#NT-Char
For example, \0, form feed and U+FFFF are non-XML characters.
Of course, the production is rather arbitrary, but XML 1.0 is written
in stone.
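For reference, the Char production allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. A minimal Python sketch of that test follows; the function name is_xml_char is made up for this example.

```python
def is_xml_char(ch):
    """Return True if the single character ch matches Char in XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# The three examples given above are all rejected.
for ch in ("\0", "\f", "\uffff"):
    print(hex(ord(ch)), is_xml_char(ch))
```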
Actually, I should have said that the minimum condition that I think
is necessary for a name of a custom attribute or element to be
reasonable is that the name matches the NCName production from
Namespaces in XML 1.0 and only contains characters from the Basic
Latin (ASCII) block.
http://www.w3.org/TR/REC-xml-names/#NT-NCName
The NCName production is arbitrary, too, but, again, Namespaces in
XML 1.0 is written in stone.
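Restricted to ASCII, the NCName production reduces to a simple pattern: a letter or underscore, followed by letters, digits, ".", "-" or "_", with no colon anywhere. A minimal Python sketch of that check, with ASCII_NCNAME and is_reasonable_custom_name as names invented for this example:

```python
import re

# ASCII subset of NCName: starts with a letter or "_", then letters,
# digits, ".", "-" or "_". Colons are never allowed in an NCName.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_reasonable_custom_name(name):
    """Return True if name is an NCName made only of ASCII characters."""
    return ASCII_NCNAME.fullmatch(name) is not None
```

Under this check "x-foo" passes, while "foo:bar", "1abc" and any name containing a non-ASCII character are rejected.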
>> Any attribute or element not specifically allowed in the spec is
>> non-conforming.
>> Therefore, all "custom attributes" and "custom elements" are
>> non-conforming.
>
> Custom attributes are (I believe, though I do not have any
> statistics to support this) quite common in the wild
I don't know how common they are.
> and can certainly be useful in combination with
> scripting. Furthermore, current browsers handle custom attributes
> effortlessly.
On these points, I agree.
> I therefore find it unfortunate that custom attributes are not
> allowed in a conforming HTML5 document.
It does not necessarily follow that custom attributes have to be
conforming. The alternative is that advanced scripters make an
informed decision to be non-conforming, in a harmless way, at a
particular point.
Not that I like designing specs to be violated in an informed way,
but the alternative is not that elegant, either.
> Still, allowing /any/ attribute name would of course
> make it impossible to add new attributes later on (HTML6?);
Another problem is that making a conformance checker silently pass
unknown attributes would also make it useless in catching typos in
attribute names.
> that is why I propose explicitly to reserve attribute names starting
> with "x-" (inspired by codes for custom languages, but any prefix
> would be fine) for use by authors and to make documents containing
> custom attributes of this form fully conforming.
That could work. In my case, I could put a filter between the parser
and the rest of the conformance checking back end and drop "x-"
attributes. It would probably cause the addition of one more checkbox
in the UI, though.
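The filtering idea can be sketched in a few lines of Python. This is only an illustration: it assumes each element's attributes arrive as a plain dict of name to value, and drop_prefixed_attributes is a hypothetical name, not part of the actual checker.

```python
def drop_prefixed_attributes(attributes, prefix="x-"):
    """Return a copy of attributes without the names that start with prefix.

    attributes is assumed to be a dict mapping attribute names to values,
    as they would look after parsing and before conformance checking.
    """
    return {
        name: value
        for name, value in attributes.items()
        if not name.startswith(prefix)
    }


parsed = {"href": "/index.html", "x-tracking": "42", "titel": "Home"}
print(drop_prefixed_attributes(parsed))
```

The misspelled "titel" is still passed on to the checker and would still be reported, so typo detection is kept for everything outside the reserved prefix.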
However, I'd expect XML folks to scream, because their wildcard
tooling is tuned for unknown namespaces rather than magic prefixes
within the local name.
> Ideally, I would like the same principle to apply for element names;
> such elements should probably be parsed as phrasing elements and be
> allowed to contain strictly inline-level content only to be
> conforming.
Given the off-the-shelf technologies that I have chosen for the
conformance checker, I don't see an *elegant* way to implement that.
I do see an inelegant way, though, but it would produce confusing
error messages unless fixed with even more inelegance. (See point
about XML tooling above.) Of course, it doesn't follow that the spec
couldn't go there.
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/