[whatwg] [authoring] Roundtrip editing between word and xhtml
Karl Dubost
karl at w3.org
Tue Feb 27 03:30:45 PST 2007
This is an interesting post from Jon Udell about the two extremes of
HTML authoring, with their pros and cons, and the bridges he developed
between them.
February 19, 2007
Blogging from Word 2007, crossing the chasm [1]
To that end, I’m developing some Python code to help me
wrangle Word’s default .docx format, which is a zip file
containing the document in WordML and a bunch of other
stuff. At the end of this entry you can see what I’ve got
so far. I’m using this code to explore what kind of XML I
can inject programmatically into a Word 2007 document,
what kind comes back after a round trip through the
application, how that XML relates to the HTML that gets
published to WordPress, and which of these
representations will be the canonical one that I’ll want
to store and process.
So far my conclusion is that none of these
representations will be the canonical one, and that I’ll
need to find (or more likely create) a transform to and
from the canonical representation where I’ll store and
process all my stuff. We’ll see how it goes.
[1] http://blog.jonudell.net/2007/02/19/blogging-from-word-2007-crossing-the-chasm/
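To make the "zip file" remark in the quoted passage concrete, here is a minimal Python sketch of opening a .docx container. It is not Jon Udell's code; the helper names are hypothetical, and it assumes the main document part sits at the conventional path word/document.xml.

```python
import io
import zipfile

# Assumption: the main document part lives at the conventional path.
MAIN_PART = "word/document.xml"


def list_docx_parts(docx_bytes: bytes) -> list:
    """Return the names of all parts stored in a .docx zip container."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


def read_main_document(docx_bytes: bytes) -> str:
    """Return the XML of the main document part as text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(MAIN_PART).decode("utf-8")
```

Reading the file with open(path, "rb").read() and passing the bytes to these helpers is enough to start comparing the XML before and after a round trip through Word.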
I like the mention of the canonical form.
It is not exactly the same canonical form as his, but it would be good
to have a canonical form of HTML for editing. It would help in building
tools such as htmldiff, tidy serialization, and source code
visualizers in editing tools.
It would also help authors to work the way they want with their files
and still exchange those files with other parties.
my source code layout <-- T1 --> canonical form <-- T2 --> your source code layout
T1 and T2 being formatting transformations.
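
As a rough illustration of one direction of T1 (source layout to canonical form), here is a minimal Python sketch built on the standard library's html.parser. The canonical form chosen here is only an assumption for the example: lowercase tag and attribute names, attributes sorted by name, whitespace runs collapsed, whitespace-only text dropped, one token per line. It ignores comments, doctypes, and the special cases of pre, script, and style, and it treats whitespace between inline elements as insignificant, which a real canonical form could not do.

```python
from html import escape
from html.parser import HTMLParser


class Canonicalizer(HTMLParser):
    """Collect tags and text as normalized tokens (hypothetical canonical form)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        rendered = ""
        for name, value in sorted(attrs, key=lambda pair: pair[0]):
            rendered += ' {}="{}"'.format(name, escape(value or "", quote=True))
        self.parts.append("<{}{}>".format(tag, rendered))

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br>: emit the start tag only.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        # Collapse whitespace runs; drop text that is only whitespace.
        text = " ".join(data.split())
        if text:
            self.parts.append(escape(text, quote=False))


def canonicalize(source: str) -> str:
    """Map any source layout to the one-token-per-line canonical form."""
    parser = Canonicalizer()
    parser.feed(source)
    parser.close()  # flush any buffered text
    return "\n".join(parser.parts)
```

With this sketch, '<P  Class="x" ID="y">Hello\n   world</P>' and '<p id="y"\n   class="x">\n  Hello world\n</p>' canonicalize to the same string, so a diff tool comparing canonical forms would report no change. The opposite direction (canonical form back to a preferred layout) would be a pretty-printer and is not shown.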
--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***