[whatwg] HTML-to-plaintext conversion (innerText and Selection.toString())

Boris Zbarsky bzbarsky at MIT.EDU
Fri Feb 4 12:15:53 PST 2011

On 2/4/11 2:59 PM, Aryeh Gregor wrote:
> On Fri, Feb 4, 2011 at 10:32 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
>> Until they try to use it on a disconnected subtree and it does something
>> weird, right?
> Well, it shouldn't do weird stuff on a disconnected subtree.  :)  It
> doesn't in IE.

I thought you said WebKit would refuse to implement that sort of behavior?

>> This whole thing seems to me like an exercise in premature standardization.
>> Browsers are actively experimenting with their dom-to-text conversion APIs.
>> It'd be nice if it were happening behind vendor prefixes, but they started
>> before such prefixing was popular in the DOM world.
> Authors are using these features

Yes, I realize that.

> and they're implemented inconsistently.

Yes, I also realize that.  That's what it means that UAs are experimenting!

> If browsers are experimenting and you think there's
> some chance that we'll eventually get a standardizable algorithm

I have no idea whether they will, because I'm still not sure what 
problems we're trying to solve here...

> then I don't see why the new algorithm can't use a new prefixed name
> while we reserve the legacy names for legacy-compatible behavior.

Compatible with what legacy?  We have at least 4 different legacies 
here, right?

> From what WebKit and Opera people have told me, innerText is necessary
> for web-compat for non-Gecko browsers.  There are sites out there that
> use textContent if they sniff Firefox, and innerText otherwise.

That's really unfortunate (esp. if they actually sniff for "Firefox").  :(
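
To make the concern concrete, here is a rough sketch of the kind of sniffing described above. The helper name pickTextProperty and the second user-agent string are hypothetical, made up for illustration and not taken from any real site or browser. A Gecko-based browser whose user-agent string lacks the literal "Firefox" would be sent down the innerText path:

```typescript
// Hypothetical helper mirroring the sniffing pattern described in the thread;
// it is an illustration only, not code from any real site.
type TextProperty = "textContent" | "innerText";

function pickTextProperty(userAgent: string): TextProperty {
  // Sniffs for the literal string "Firefox" rather than testing for the feature.
  return userAgent.indexOf("Firefox") !== -1 ? "textContent" : "innerText";
}

// A Firefox-style user-agent string takes the textContent path.
const firefoxUA = "Mozilla/5.0 (X11; Linux x86_64; rv:2.0) Gecko/20110101 Firefox/4.0";

// Made-up Gecko-based browser without "Firefox" in its string (an assumption).
const otherGeckoUA = "Mozilla/5.0 (X11; Linux x86_64; rv:2.0) Gecko/20110101 ExampleBrowser/1.0";

console.log(pickTextProperty(firefoxUA));
console.log(pickTextProperty(otherGeckoUA));
```

The second call selects innerText even though the engine is Gecko, which is the failure mode that sniffing for "Firefox" specifically would cause.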

> innerText apparently can't be exactly the same as textContent --
> Maciej said that "I know that if <br> doesn't produce newlines, stuff
> will break"

But the "Firefox" path takes the textContent and does its own newline 
processing or something?

> and Opera does add extra newlines for <br> (but doesn't
> seem to change much else).

I could live with an innerText that was textContent but converted <br> 
to newline, I guess...
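
As a minimal sketch of that idea (textContent, except that <br> becomes a newline), assuming a simplified stand-in for the DOM so the code runs outside a browser: the SimpleNode types and the function names below are hypothetical, and this is not what any browser actually implements.

```typescript
// Minimal stand-in for a DOM tree so the sketch runs without a browser.
// These types are hypothetical and are not the real DOM interfaces.
type TextNode = { kind: "text"; data: string };
type ElementNode = { kind: "element"; tag: string; children: SimpleNode[] };
type SimpleNode = TextNode | ElementNode;

// textContent-like behavior: concatenate all descendant text in document order.
function textContentOf(node: SimpleNode): string {
  if (node.kind === "text") return node.data;
  return node.children.map((child) => textContentOf(child)).join("");
}

// Same traversal, except that a <br> element contributes a single newline.
function innerTextSketch(node: SimpleNode): string {
  if (node.kind === "text") return node.data;
  if (node.tag.toLowerCase() === "br") return "\n";
  return node.children.map((child) => innerTextSketch(child)).join("");
}

// Equivalent of: <p>first line<br>second line</p>
const sample: SimpleNode = {
  kind: "element",
  tag: "p",
  children: [
    { kind: "text", data: "first line" },
    { kind: "element", tag: "br", children: [] },
    { kind: "text", data: "second line" },
  ],
};

console.log(JSON.stringify(textContentOf(sample)));
console.log(JSON.stringify(innerTextSketch(sample)));
```

The two functions differ only in the <br> case, so everything else (whitespace, hidden elements, block boundaries) would behave exactly as textContent does under this sketch.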

> I'm slightly less sure about Selection.toString(), but I'd be inclined
> to take the same general approach.  It's much better for authors to
> have to code around browsers not offering them enough features than to
> code around browsers offering them incompatible features.  At least
> then they only have to do the coding work once.

I still think it'll really confuse authors that Selection.toString() 
won't do the same thing as copying.  But maybe I overestimate the 
problems it would cause....

