[whatwg] HTML-to-plaintext conversion (innerText and Selection.toString())

Glenn Maynard glenn at zewt.org
Wed Feb 2 17:16:49 PST 2011

On Wed, Feb 2, 2011 at 5:30 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> I should note that it's not clear to me how much we want to standardize
> what browsers actually copy when the user copies.  This seems like something
> that users may want to configure and where we want to let browsers
> experiment with heuristics and such; I have a really hard time believing
> that the current browser behavior here is the best we can do.

Given how often I've had poor results from copying (hidden blocks being
included, copying image alts, sprinkling newlines in strange places, and so
on), this seems important: browsers should be free to improve on copying
without violating the spec.

> That leaves the question of whether Selection.toString should produce the
> same string as the user copying and pasting would, of course. Perhaps it
> shouldn't.  I'm not sure we'd want to make what toString produces depend on
> new CSS layout modes, for example, since that could break scripts... but the
> user-facing copied text might want to depend on those.

I'd intuitively expect toString to give the same results that the user would
get if he did a copy.  If the two differ, there should be a separate method
that returns exactly what a copy would, including any browser-specific
heuristics and so on.  That way, scripts can get the best possible text
representation available, rather than the most precisely defined one, when
that's what they want.

Glenn Maynard

