[whatwg] HTML-to-plaintext conversion (innerText and Selection.toString())
Tim Down
timdown at gmail.com
Fri Feb 4 02:22:38 PST 2011
On 4 February 2011 01:25, Aryeh Gregor <Simetrical+w3c at gmail.com> wrote:
> On Thu, Feb 3, 2011 at 4:41 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
>> And all I'm saying is that there are at least three pieces of data here:
>>
>> 1) innerText return value
>> 2) Selection.toString() return value
>> 3) What the browser actually copies
>>
>> My point is that browsers must be free to modify #3 as desired. Dictating it
>> in a web spec is not acceptable, imo.
>
> Sure.
>
>> Agreed, I think; but should that be Selection.toString() or some other API?
>> That is, are we hijacking Selection.toString() because it's convenient, or
>> because it's the right way to expose such an algorithm?
>
> innerText seems like a reasonable place to put such an API, if only
> because WebKit already does it. It's not ideal a priori, but by the
> consistency standards of the web platform it's not noticeably bad. I
> should particularly point out that your typical author is not going to
> have the foggiest notion of separation of DOM and CSS and so on -- it
> will make intuitive sense to authors to have it at innerText as much
> as anywhere.
>
> I did actually find a couple of sites that defined functions that
> accepted an HTML string, created a div, assigned the HTML to the div's
> innerHTML, and returned the innerText (or textContent if innerText is
> unavailable):
>
> http://api.opencast.naver.com/CS888/23
> http://bbs.ptbus.com/thread-22143-1-1.html
>
> I didn't find what they were actually used for, though. Note that
> this breaks if innerText doesn't work correctly for non-displayed
> elements, so basically it will only do any prettification in IE.
>
>> Depending on your definition of "okay", yes. I mean... we have an "okay"
>> way that's interoperable now (I hope): Range.toString. Except you don't
>> think it does an okay job, clearly. I agree on that; I don't necessarily
>> agree that current browser Selection.toString does an "okay" job.
>
> Actually, if browsers are willing to converge on making innerText work
> like textContent and Selection.toString() work like Range.toString(),
> I'd be okay with that. There are use-cases for a standardized
> plaintext conversion API, but at this point I think they're too
> marginal to be worth the effort of actually specifying and
> implementing. Such an API is inherently going to be either not very
> good or unreasonably complicated. There's no reason at all that you
> couldn't implement such an API in a JavaScript library -- I don't see
> why it has to be part of the web platform.
>
> I've been told Opera doesn't care about this and will implement
> whatever is specced as long as it's web-compatible and not too
> complicated to be worth the effort. Gecko (at least that portion that
> I'm talking to :) ) seems to be skeptical of implementing anything
> very complicated here either. But Maciej has told me that WebKit
> doesn't want to scrap its elaborate plaintext-conversion APIs (which
> have by far the best fidelity of any browser's from what I see).
>
> So as I see it, the easiest solution would be for WebKit to agree to
> move its APIs to prefixed versions if it wants to keep them, and
> change behavior of the unprefixed ones to something like textContent.
> (Possibly with minimal differences for web-compat -- Opera's behave
> slightly differently, and IIRC I was told it's for web-compat
> reasons.)
>
> On the other hand, if WebKit is unwilling to accept anything other
> than a complicated plaintext conversion algorithm here, I don't think
> we're going to have interop in the foreseeable future no matter what.
> Even if it gets specced, no one will want to implement it. I'm not
> clear on whether WebKit would be willing to implement a standardized
> algorithm either, given the nonexistent web-compat issues. So in that
> case I'd try to ask Microsoft, and unless they side with WebKit, we
> can at least have everyone but WebKit converge on
> innerText/Selection.toString() behaving as similarly to
> textContent/Range.toString() as possible.
>
> How does that sound to everyone?
It sounds less than ideal to me. From the perspective of a web
developer, that removes useful functionality. I'm not too bothered
about innerText, but it's not hard to come up with use cases for an
implementation of Selection.toString that returns the text that is
visually selected on the screen rather than the trivial concatenation
of calling toString on its Ranges. For example, a bookmarklet to
search the web for the text the user has selected in the current page,
or a tooltip that shows content relating to the currently selected text.
I don't think it's necessary to have perfect interoperability for this
to be useful: the current situation is not that bad, although IE9
worsens it since it implements the Range-toString-concatenation
approach that is in the current spec and is now being suggested again.
I also suspect that use of Selection.toString is fairly widespread and
browsers changing their implementation to this could break a lot of
pages.
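
To make the distinction concrete, here is a minimal sketch of the
Range-toString-concatenation approach alongside the bookmarklet use case.
It is an illustration only, not any browser's actual implementation. The
function names and the search.example URL are hypothetical; the only DOM
members assumed are Selection.rangeCount, Selection.getRangeAt() and
Range.toString().

```javascript
// Sketch of the concatenation behaviour: join the toString() result of
// every Range in a Selection-like object. No rendering information
// (line breaks, hidden elements, CSS) is taken into account.
// `selection` only needs `rangeCount` and `getRangeAt(i)`.
function concatenateRangeText(selection) {
  let text = "";
  for (let i = 0; i < selection.rangeCount; i++) {
    text += selection.getRangeAt(i).toString();
  }
  return text;
}

// Hypothetical bookmarklet-style helper: build a search URL for the
// selected text. "search.example" is a placeholder, not a real service.
function buildSearchUrl(selectedText) {
  return "https://search.example/?q=" + encodeURIComponent(selectedText.trim());
}
```

In a page, comparing concatenateRangeText(window.getSelection()) with
window.getSelection().toString() shows how far a given browser's
Selection.toString departs from plain concatenation, and therefore what a
bookmarklet built on buildSearchUrl() would actually search for.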
Tim