[whatwg] Augmenting HTML parser to recognize new elements

Wed Jan 18 14:17:43 PST 2012

On Wed, Jan 18, 2012 at 2:00 PM, Adam Barth <w3c at adambarth.com> wrote:
> On Wed, Jan 18, 2012 at 1:55 PM, Dimitri Glazkov <dglazkov at chromium.org> wrote:
>> On Wed, Jan 18, 2012 at 1:47 PM, Adam Barth <w3c at adambarth.com> wrote:
>>> On Wed, Jan 18, 2012 at 1:29 PM, Dimitri Glazkov <dglazkov at chromium.org> wrote:
>>>> On Wed, Jan 18, 2012 at 1:14 PM, Dimitri Glazkov <dglazkov at chromium.org> wrote:
>>>>> Ah, that's a good question. This also must be specified. It should
>>>>> depend on the parent of the <content> element. If the parent is a shadow
>>>>> root or <table>, then it should make <tr> a child of <content>.
>>>>> Otherwise, it should use foster parenting as usual.
>>>>
>>>> Oops, not "foster parenting", but "ignore" as you mentioned. Still
>>>> getting through the details of the parsing spec.
>>>
>>> There's also some subtlety w.r.t. the pending character tokens.
>>>
>>> More generally, I think we'd all be much more sane if the HTML parsing
>>> algorithm was specified in the HTML living standard rather than
>>> modified ad-hoc in a number of different documents.
>>
>> That makes sense, but how will we handle the fact that the elements in
>> the algorithm aren't part of the HTML specification?
>
> Through the magic of legacy support, that's already the case today!
> (I'm looking at you <xmp>.)
>
> The parsing algorithm just says how to construct a DOM.  You can have
> all sorts of crazy futuristic/obsolete elements in the DOM.

This sounds bewildering yet encouraging. Should I just attempt writing
a patch against the spec and ask Hixie to review it?

:DG<
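The rule proposed above for Ryosuke's <content><tr>hi</tr></content> case can be written as a small decision function. This is a minimal sketch of the proposal only, not the spec's algorithm. Action, SHADOW_ROOT and tr_start_tag_in_content are hypothetical names, and the pending-character-token subtlety Adam raises is deliberately not modeled.

```python
from enum import Enum


class Action(Enum):
    APPEND_TO_CONTENT = "insert <tr> as a child of <content>"
    IGNORE = "parse error; ignore the <tr> start tag"


# Hypothetical marker meaning "the parent is a shadow root" (not a real DOM name).
SHADOW_ROOT = "#shadow-root"


def tr_start_tag_in_content(content_parent: str) -> Action:
    """Proposed (hypothetical) handling of a <tr> start tag while <content>
    is the current node.

    content_parent is SHADOW_ROOT or the lowercase tag name of <content>'s parent.
    """
    # Shadow root or <table>: keep the row inside the insertion point.
    if content_parent in (SHADOW_ROOT, "table"):
        return Action.APPEND_TO_CONTENT
    # Anywhere else: behave like "in body" does today and drop the stray <tr>.
    return Action.IGNORE


for parent in (SHADOW_ROOT, "table", "div"):
    print(f"{parent:>12}: {tr_start_tag_in_content(parent).value}")
```

Only the placement of the <tr> element is decided here. Where the "hi" text ends up would still need its own rule.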

>
> Adam
>
>
>>>>> On Wed, Jan 18, 2012 at 10:58 AM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>>>> What if <content> wraps elements that the parser ignores? E.g.
>>>>>> <content><tr>hi</tr></content>
>>>>>>
>>>>>> What should the content element include in that case?
>>>>>>
>>>>>> - Ryosuke
>>>>>>
>>>>>> On Jan 18, 2012 10:19 AM, "Dimitri Glazkov" <dglazkov at chromium.org> wrote:
>>>>>>>
>>>>>>> 'sup, Whatwg!
>>>>>>>
>>>>>>> The new HTML elements in the shadow DOM spec
>>>>>>> (http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/shadow/index.html)
>>>>>>> and the nascent HTML templates spec (see it all explained here:
>>>>>>> http://dvcs.w3.org/hg/webcomponents/raw-file/tip/explainer/index.html)
>>>>>>> require tweaking of the HTML parsing behavior -- mostly the tree
>>>>>>> construction bits.
>>>>>>>
>>>>>>> A typical example would be specifying an insertion point (the
>>>>>>> <content> element) as a child of a <table>:
>>>>>>>
>>>>>>> <table>
>>>>>>>    <content>
>>>>>>>        <tr>
>>>>>>>            ...
>>>>>>>        </tr>
>>>>>>>    </content>
>>>>>>> </table>
>>>>>>>
>>>>>>> Both <shadow> and <template> elements have similar use cases.
>>>>>>>
>>>>>>> What would be the sane way to document such changes to the HTML parser
>>>>>>> behavior? A list of modifications to tree construction modes in each
>>>>>>> respective spec? Some "generic insertion point element" clause in the
>>>>>>> HTML spec? Give me ideas.
>>>>>>>
>>>>>>> Also -- what are the side effects of such a change? Surely, there's
>>>>>>> something I am not thinking of.
>>>>>>>
>>>>>>> :DG<
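The table example above shows why the tree construction rules need changing. Below is a minimal sketch, assuming a toy model of the stack of open elements, of how the current "in table" insertion mode handles the start tags <table><content><tr>, as I read the WHATWG tree-construction rules. <content> is a parse error and gets foster-parented in front of the table. <tr> then clears the stack back to a table context, which pops <content>, so the row lands in an implied <tbody> and <content> stays empty. Element, outline and parse_table_content_tr are hypothetical helpers. Whitespace, end tags, attributes and the rest of the algorithm are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


def outline(el: Element) -> str:
    """Compact tree dump, e.g. body(content, table(...))."""
    if not el.children:
        return el.name
    return f"{el.name}({', '.join(outline(c) for c in el.children)})"


def parse_table_content_tr() -> Element:
    """Toy replay of the start tags <table><content><tr> inside <body>."""
    html, body, table = Element("html"), Element("body"), Element("table")
    html.children.append(body)
    body.children.append(table)
    stack = [html, body, table]  # stack of open elements, current node last

    # <content> in "in table": "anything else" -> parse error, handled with the
    # "in body" rules while foster parenting is enabled. The current node is
    # <table>, so the element goes into the table's parent, just before it.
    content = Element("content")
    foster_parent = body  # the last <table>'s parent node
    foster_parent.children.insert(foster_parent.children.index(table), content)
    stack.append(content)

    # <tr> in "in table": clear the stack back to a table context
    # (table, template or html), which pops <content> off the stack...
    while stack[-1].name not in ("table", "template", "html"):
        stack.pop()
    # ...then insert an implied <tbody> and reprocess <tr> in "in table body".
    tbody = Element("tbody")
    stack[-1].children.append(tbody)
    stack.append(tbody)
    tr = Element("tr")
    tbody.children.append(tr)
    stack.append(tr)
    return body


print(outline(parse_table_content_tr()))  # body(content, table(tbody(tr)))
```

Under these rules the author's intent (rows inside the insertion point) is lost, which is why <content>, <shadow> and <template> need explicit tree-construction hooks.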