[whatwg] HTML-to-plaintext conversion (innerText and Selection.toString())

Glenn Maynard glenn at zewt.org
Wed Feb 2 17:16:49 PST 2011


On Wed, Feb 2, 2011 at 5:30 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> I should note that it's not clear to me how much we want to standardize
> what browsers actually copy when the user copies.  This seems like something
> that users may want to configure and where we want to let browsers
> experiment with heuristics and such; I have a really hard time believing
> that the current browser behavior here is the best we can do.
>

Given how often I've had poor results from copying (hidden blocks being
included, copying image alts, sprinkling newlines in strange places, and so
on), this seems important--browsers should be free to improve on copying
without violating the spec.

> That leaves the question of whether Selection.toString should produce the
> same string as the user copying and pasting would, of course. Perhaps it
> shouldn't.  I'm not sure we'd want to make what toString produces depend on
> new CSS layout modes, for example, since that could break scripts... but the
> user-facing copied text might want to depend on those.
>

I'd intuitively expect toString to give the same results the user would
get by copying.  If the two differ, there should be a separate method that
returns exactly what a copy would produce, including any browser-specific
heuristics and so on.  That way, scripts can get the best possible text
representation available, rather than the most precisely-defined one, when
that's what they want.
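
As a rough sketch of that split, assuming a made-up method name
(toCopiedText is purely illustrative; it is not in any spec or browser, and
SelectionLike is just a stand-in type), a script could prefer the
copy-equivalent text and fall back to the precisely-defined toString:

```typescript
// Sketch only. `toCopiedText` is a hypothetical name, not a real DOM method.
interface SelectionLike {
  // Script-facing, precisely specified serialization.
  toString(): string;
  // Hypothetical: the text a user-initiated copy would produce,
  // including whatever heuristics the browser applies.
  toCopiedText?: () => string;
}

// Prefer the copy-equivalent text when the (hypothetical) method exists,
// otherwise fall back to the well-defined toString().
function bestAvailableText(sel: SelectionLike): string {
  return sel.toCopiedText ? sel.toCopiedText() : sel.toString();
}
```

Scripts that need stable output would keep calling toString directly; only
those that want whatever the browser considers the best text would go
through something like bestAvailableText.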

-- 
Glenn Maynard


