[whatwg] DOM Range Deletions

Aryeh Gregor Simetrical+w3c at gmail.com
Wed Jul 27 13:47:43 PDT 2011


(answering some old feedback on DOM Range that Hixie pointed me to)

On Tue, Jun 15, 2010 at 6:52 AM, Andrew Oakley <andrew at ado.is-a-geek.net> wrote:
> I've been trying to implement DOM Range but can't work out how ranges
> are supposed to work under mutation.

This should now be more or less fully defined in the DOM Range spec,
with a pretty decent test suite:

http://html5.org/specs/dom-range.html#range-behavior-under-document-mutation

I wrote up the new definitions a couple of months ago.  They aim to be
both precise and compatible with browser behavior, which means they
mostly match DOM 2 Range but differ in some respects.

> In the following examples I use *this* to indicate a range being deleted
> and slashes to indicate another range.
>
>
> Section 2.6 - Deleting Content with a Range gives the example of
>
> <FOO>X*Y<BAR>Z*W</BAR>Q</FOO> -> <FOO>X^<BAR>W</BAR>Q</FOO>
>
>
> Section 2.12.2 - Deletions says:
>
> "If a boundary-point of the original Range is within the content being
> deleted, then after the deletion it will be at the same position as the
> resulting boundary-point of the (now collapsed) Range used to delete the
> contents."

This is not what browsers do, and not what the new DOM Range spec
requires.  DOM 2 Range treats a deletion as the removal of a range,
but browsers and the new DOM Range spec both treat it as a series of
node-by-node mutations.  deleteContents() additionally modifies the
range you call it on so that it always ends up collapsed, as defined
in detail here:

http://html5.org/specs/dom-range.html#dom-range-deletecontents

Note how the last step is "Set the context object's start and end to
(new node, new offset)", so the range you call the method on is
changed differently from other ranges.

If you have a range <FOO>X[Y<BAR>Z]W</BAR>Q</FOO> (using [] to denote
the endpoints), then the algorithm works as follows:

* "If original start node is an ancestor container of original end
node, set new node to original start node and new offset to original
start offset."  Original start node here is the Text node "XY", and
original end node is the Text node "ZW".  The former is neither equal
to nor an ancestor of the latter, so this doesn't apply, and we go to
the other branch.

* "Let reference node equal original start node."  So reference node
is now the Text node "XY".

* "While reference node's parent is not null and is not an ancestor
container of original end node, set reference node to its parent."
Reference node's parent is <FOO>, which is not null, but is an
ancestor container of original end node.  Thus we do nothing in this
step.

* "Set new node to the parent of reference node, and new offset to one
plus the index of reference node."  Thus new node is <FOO>, and new
offset is 1.

So the Range whose contents you delete ends up collapsed at
<FOO>X{}<BAR>W</BAR>Q</FOO>.  Note that here I use curly braces
instead of brackets, to indicate that the endpoint of the Range is in
an Element node, not a Text node.  The old DOM 2 Range standard is
unclear on that point, but my spec matches what browsers do.
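
As a rough illustration, the steps quoted above can be modeled in a
few lines of Python.  This is only a sketch of those steps, not the
spec algorithm in full: the Node class and the collapse_point and
is_ancestor_container helpers are names made up for this example, and
"ancestor container" is assumed to mean "the node itself or one of its
ancestors", as in the walkthrough.

```python
class Node:
    """Hypothetical stand-in for a DOM node (not a real DOM API)."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def index(self):
        # Position of this node among its parent's children.
        return self.parent.children.index(self)


def is_ancestor_container(a, b):
    """True if a is b or an ancestor of b (assumed definition)."""
    while b is not None:
        if b is a:
            return True
        b = b.parent
    return False


def collapse_point(start_node, start_offset, end_node):
    """Compute (new node, new offset) following the quoted steps."""
    if is_ancestor_container(start_node, end_node):
        return start_node, start_offset
    reference = start_node
    while (reference.parent is not None
           and not is_ancestor_container(reference.parent, end_node)):
        reference = reference.parent
    return reference.parent, reference.index() + 1


# <FOO>XY<BAR>ZW</BAR>Q</FOO>, range from ("XY", 1) to ("ZW", 1)
xy = Node('"XY"')
zw = Node('"ZW"')
bar = Node("BAR", [zw])
q = Node('"Q"')
foo = Node("FOO", [xy, bar, q])

node, offset = collapse_point(xy, 1, zw)
print(node.name, offset)  # FOO 1
```

The result (FOO, 1) is the {} position shown above, in the Element
rather than in either Text node.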

> We then have the example of:
>
> <P>ABCD *efgh The <EM>R*ange</EM> ijkl</P>
>              /            \
>
> Goes to
>
> <P>ABCD <EM>ange</EM> ijkl</P>
>           /    \

In the syntax I'm using, that's: <P>ABCD [efgh T[[he <EM>R]ange]]</EM>
ijkl</P>, where I use single brackets for the range being deleted and
double brackets for the other, for lack of better syntax.

The new specification uses entirely different rules when the Range
being deleted is different from the one being modified, as I noted.
The deletion is treated as a sequence of separate mutations of
individual nodes.  In this case, deleteContents() will do the
following:

1) Call deleteData() on the Text node "ABCD efgh The ", with offset 5
and count 9.  This deletes "efgh The " and leaves only "ABCD ".
Current DOM Core defines this as replacing data with offset 5, count
9, and data "", so we look at the "When something replaces data of a
CharacterData node" case at
<http://html5.org/specs/dom-range.html#range-behavior-under-document-mutation>.

The first boundary point of the [[ range has offset 11, and 5 < 11 <=
5 + 9, so we hit the case "For every boundary point whose node is
node, and whose offset is greater than offset but less than or equal
to offset plus count, set its offset to offset."  Thus the offset is
set to offset, i.e., 5.  This gives us:

<P>ABCD [[<EM>Range]]</EM> ijkl</P>

2) Call deleteData() on the Text node "Range", with offset 0 and count
1.  This deletes "R" and leaves "ange".  We're replacing data with
offset 0, count 1, and data "", and the second boundary point of the
[[ range has offset 5, and 5 > 0 + 1, so we hit the case "For every
boundary point whose node is node, and whose offset is greater than
offset plus count, add the length of data to its offset, then subtract
count from it."  The length of data is 0 and count is 1, so we set the
new offset to 5 + 0 - 1 = 4.  This gives us:

<P>ABCD [[<EM>ange]]</EM> ijkl</P>
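
The arithmetic in the two steps above can be checked with a short
Python sketch.  It models only the two "replace data" rules quoted in
this message (the spec may contain further cases), and adjust_offset
and replace_data are hypothetical helper names, not DOM methods.

```python
def adjust_offset(bp_offset, offset, count, data):
    """New offset of a boundary point inside the mutated Text node.

    Only the two rules quoted above are modeled; any other boundary
    point is assumed to stay where it is.
    """
    if offset < bp_offset <= offset + count:
        return offset
    if bp_offset > offset + count:
        return bp_offset + len(data) - count
    return bp_offset


def replace_data(text, offset, count, data):
    """Replace count characters of text, starting at offset, with data."""
    return text[:offset] + data + text[offset + count:]


# Step 1: deleteData(5, 9) on "ABCD efgh The "; the [[ start is at 11.
first = "ABCD efgh The "
start = adjust_offset(11, 5, 9, "")
first = replace_data(first, 5, 9, "")

# Step 2: deleteData(0, 1) on "Range"; the ]] end is at offset 5.
second = "Range"
end = adjust_offset(5, 0, 1, "")
second = replace_data(second, 0, 1, "")

print(repr(first), start)  # 'ABCD ' 5
print(repr(second), end)   # 'ange' 4
```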

The example in DOM 2 Range implies something more like <P>ABCD
<EM>[[ange]]</EM> ijkl</P>.  I agree this is wrong according to DOM 2
Range itself.  DOM 2 Range is a decent spec for its time, but we've
moved to much greater levels of precision these days.  One thing it
often fails to do is clearly distinguish boundary points that "look"
the same, in that no nodes or characters lie between them.

> I assume that the range indicated by the underline in the spec and like
> *this* here collapses to just before the <EM> tag as this document has
> the same structure as the other example I pulled out of the spec.  This
> would mean that the start point of the other range should also be just
> before the <EM>, but that isn't what has happened in this example.

The example is buggy, yes.  The starting <EM> tag should be
highlighted according to both specs and according to browser behavior.

> Any idea what I've got wrong?  Some browsers (e.g. Safari) seem to
> behave as in the example, others (e.g. Firefox) put the end point before
> the <EM> (as I would have expected).

Here's a test case:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1086

Firefox 7.0a2, Chrome 14 dev, and Opera 11.50 all log "ABCD", "5",
"ange", "4", which matches my spec.  IE10PP2 logs "ABCD", "5",
"undefined", "1".  The "undefined" appears because IE puts the new
endpoint in the <em> with offset 1, instead of in the Text node
"ange" with offset 4.  IE might or might not be able to argue that
it's correct per DOM 2 Range, but it's not correct according to the
new spec.

I have a reasonably comprehensive test suite for range mutation
behavior, by the way:

http://aryeh.name/spec/dom-range/test/Range-mutations.html

It only tests what happens with basic DOM operations like
replaceData, though.  It doesn't check whether things like
Range.deleteContents do any additional magic.

