[whatwg] converting word (was <code> attributes

Adrian Sutton adrian.sutton at ephox.com
Fri May 1 04:42:25 PDT 2009

On 01/05/2009 12:27, "Bruce Lawson" <brucel at opera.com> wrote:
> On Fri, 01 May 2009 12:22:32 +0100, Adrian Sutton
> <adrian.sutton at ephox.com> wrote:
> Off topic, I know - but couldn't a VBA macro hook into Word and actually
> make an "export as semantic html" option that exported the heading levels
> as h1..h6, honoured bold, italics, links, bullets and numbers as ul and
> ol, and just ignored all colours, font changes etc. So there is nothing to
> clean up?

Yes, but you'd have to get users to install the macro and get around virus
checkers etc. Then you'd have to write a separate AppleScript version for
Mac. I'm not sure how difficult all that would be - it's cheaper for us to
just tweak and update the existing filtering code than to go down this
route now.
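
To give a rough idea of what the filtering approach amounts to, here is a
minimal sketch in Python. It is not our actual filtering code; the tag
whitelist, the class name SemanticFilter and the function to_semantic_html
are all made up for illustration. It keeps headings, paragraphs, lists,
bold, italics and links, and drops every other tag and attribute while
keeping the text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of tags to keep; everything else is unwrapped.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li",
        "b", "i", "strong", "em", "a"}


class SemanticFilter(HTMLParser):
    """Keep whitelisted tags and drop all attributes except href on links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in KEEP:
            return  # e.g. <span>, <font>, <o:p>: drop the tag, keep its text
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self.out.append('<a href="%s">' % escape(href, quote=True))
        else:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-escape text so the output stays well-formed.
        # Note: a real filter would also skip the contents of <style>
        # and <head>; that is omitted here to keep the sketch short.
        self.out.append(escape(data, quote=False))


def to_semantic_html(word_html):
    """Return word_html reduced to the whitelisted structural tags."""
    parser = SemanticFilter()
    parser.feed(word_html)
    parser.close()
    return "".join(parser.out)
```

Feeding it a fragment such as <h1 style="color:red">Title</h1> leaves just
the bare h1 element. Real Word output needs far more special-casing than
this (lists in particular), which is where most of the tweaking goes.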

It also wouldn't give the option to preserve the styles in an HTML
compliant way, which the filtering process does (with pretty impressive
accuracy in rendering, given how little control HTML gives compared to
Word's native model). EditLive!, for example, could include the styles as
either an embedded stylesheet (thus preserving any custom styles from Word
as class names) or inline styles (which makes for a very messy but
compliant bit of HTML, and is useful if you're only editing a fragment).
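
The difference between the two output modes can be sketched like this. It
is only an illustration under assumed inputs, not how EditLive! is
implemented: the (style name, CSS declarations, text) tuples and the
functions render_embedded and render_inline are hypothetical, and the
class-name sanitising is deliberately crude.

```python
from html import escape


def render_embedded(paragraphs):
    """Emit one CSS class per Word style plus an embedded stylesheet."""
    rules = {}
    body = []
    for style, css, text in paragraphs:
        # Crude mapping from a Word style name to a class name.
        cls = style.lower().replace(" ", "-")
        rules[cls] = css
        body.append('<p class="%s">%s</p>' % (cls, escape(text)))
    sheet = "\n".join(".%s { %s }" % (c, d) for c, d in rules.items())
    return "<style>\n%s\n</style>\n%s" % (sheet, "\n".join(body))


def render_inline(paragraphs):
    """Repeat the declarations on every element: messy but self-contained."""
    return "\n".join(
        '<p style="%s">%s</p>' % (escape(css, quote=True), escape(text))
        for _style, css, text in paragraphs
    )
```

With two paragraphs that share a "Pull Quote" style, the embedded version
emits a single .pull-quote rule and two class attributes, while the inline
version repeats the declarations on each paragraph, so a fragment can be
cut out and pasted elsewhere without losing its formatting.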

For the record, OpenOffice and its variants tend to create pretty good HTML
out of the box and don't need this kind of filtering at all. Even Excel is
surprisingly good, but it does add some proprietary attributes.


Adrian Sutton.
Adrian Sutton, CTO
UK: +44 1 753 27 2229  US: +1 (650) 292 9659 x717
Ephox <http://www.ephox.com/>
Ephox Blogs <http://planet.ephox.com/>, Personal Blog
