[whatwg] HTML-to-plaintext conversion (innerText and Selection.toString())

Aryeh Gregor Simetrical+w3c at gmail.com
Fri Feb 4 13:17:13 PST 2011


On Fri, Feb 4, 2011 at 3:15 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
>> Well, it shouldn't do weird stuff on a disconnected subtree.  :)  It
>> doesn't in IE.
>
> I thought you said Webkit would refuse to implement that sort of behavior?

I don't think I said that -- I can't speak for WebKit at all.  The
only things I know about it are what Maciej has said to me.

> I have no idea whether they will, because I'm still not sure what problems
> we're trying to solve here...

The thing is, I don't think there are any big problems.  AFAICT, the
use-cases are marginal and can be solved via JavaScript if people
actually want to.  So I don't see a reason to maintain incompatibility
here.
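
To illustrate the "can be solved via JavaScript" point, here is a minimal
sketch of a script-side plaintext conversion.  It is not any browser's
innerText algorithm: the name toPlainText and the BLOCK_TAGS list are
hypothetical, the list is incomplete, and CSS is ignored entirely.  Because
it never consults layout, it behaves the same on a disconnected subtree.

```javascript
// Assumed, incomplete set of element names treated as block-level.
const BLOCK_TAGS = new Set(["DIV", "P", "LI", "H1", "H2", "H3", "PRE", "BLOCKQUOTE"]);

// Hypothetical helper: walk a node and its descendants, emitting text with
// a newline for <br> and a newline after each block-level element.
function toPlainText(node) {
  if (node.nodeType === 3) return node.nodeValue;   // text node
  if (node.nodeType !== 1) return "";               // skip comments etc.
  if (node.nodeName === "BR") return "\n";
  let out = "";
  for (const child of Array.from(node.childNodes)) {
    out += toPlainText(child);
  }
  return BLOCK_TAGS.has(node.nodeName) ? out + "\n" : out;
}
```

A page that wants newline handling could call toPlainText(element) where it
would otherwise read element.innerText, and keep using textContent where it
does not care about newlines.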

>> From what WebKit and Opera people have told me, innerText is necessary
>> for web-compat for non-Gecko browsers.  There are sites out there that
>> use textContent if they sniff Firefox, and innerText otherwise.
>
> That's really unfortunate (esp. if they actually sniff for "Firefox").  :(

Apparently search.blogs.ebay.com, or some page on it, formerly contained

if(vjo.dsf.client.Browser.bFirefox){oL.textContent=_d;}else{oL.innerText=_d;}

I don't know if this is very common, though, to be fair.
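
For contrast, a sketch of what the same assignment could look like with
feature detection instead of browser sniffing.  The helper name setText is
hypothetical and is not taken from the eBay code; it only assumes the
element exposes one of the two properties.

```javascript
// Hypothetical helper: choose the property by checking what the element
// supports, rather than by checking the browser name.
function setText(el, text) {
  if ("textContent" in el) {
    el.textContent = text;
  } else {
    el.innerText = text;
  }
}
```

With this, the line above would become setText(oL, _d), and a browser that
supports only innerText would still take the second branch.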

> But the "Firefox" path takes the textContent and does its own newline
> processing or something?

I don't know.  I haven't seen specific examples where innerText can't
behave exactly like textContent.  But Maciej said he's seen such
content, and newline handling is the only substantive difference in
Opera between innerText and textContent, so I assume it's there for a
reason.  It could be that such pages just break if innerText doesn't
exist.  I've certainly found such pages, e.g.,

http://beihai.gov.cn/5863/2010_4_28/5863_98289_1272423962000.html?open=1

A whole chunk of content is missing in Firefox.  Such pages seem to be
mostly East Asian.

