[whatwg] [authoring] Roundtrip editing between word and xhtml

Karl Dubost karl at w3.org
Tue Feb 27 03:30:45 PST 2007

This is an interesting post from Jon Udell about the two extremes of
authoring HTML, with their pros and cons, and the bridges he has
developed between them.

February 19, 2007
Blogging from Word 2007, crossing the chasm [1]

     To that end, I’m developing some Python code to help me
     wrangle Word’s default .docx format, which is a zip file
     containing the document in WordML and a bunch of other
     stuff. At the end of this entry you can see what I’ve got
     so far. I’m using this code to explore what kind of XML I
     can inject programmatically into a Word 2007 document,
     what kind comes back after a round trip through the
     application, how that XML relates to the HTML that gets
     published to WordPress, and which of these
     representations will be the canonical one that I’ll want
     to store and process.

     So far my conclusion is that none of these
     representations will be the canonical one, and that I’ll
     need to find (or more likely create) a transform to and
     from the canonical representation where I’ll store and
     process all my stuff. We’ll see how it goes.

[1] http://blog.jonudell.net/2007/02/19/blogging-from-word-2007- 

I like the mention of a canonical form.
It is not exactly the same canonical form as his, but it would be good
to have a canonical form of HTML for editing. It would help in building
tools such as htmldiff, tidy serialization, and source code
visualizers in editing tools.

It would also let authors work the way they want with their files
and still exchange files with other parties:

my source code layout <-- T1 --> canonical form <-- T2 --> your source code layout

T1 and T2 are formatting transformations.
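
To make the idea concrete, here is a minimal sketch in Python using only
the standard library. Everything in it is an assumption made for
illustration: the function names to_canonical and to_layout are
hypothetical, and the "canonical form" chosen here (lowercase names,
sorted double-quoted attributes, one token per line, no indentation) is
just one possible choice, not a proposal. It treats all whitespace
around tags as layout, which is not true for inline content, and it
drops comments and doctypes, so it is nowhere near a real serializer.

```python
from html import escape
from html.parser import HTMLParser

# Elements with no end tag (a small illustrative subset, not the full list).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class _Tokens(HTMLParser):
    """Collect a flat token list, discarding layout-only whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        pairs = tuple(sorted((k, v or "") for k, v in attrs))
        self.tokens.append(("start", tag, pairs))

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.tokens.append(("end", tag, ()))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.tokens.append(("text", text, ()))


def _tokenize(source):
    parser = _Tokens()
    parser.feed(source)
    parser.close()  # flush any buffered trailing text
    return parser.tokens


def to_layout(source, indent):
    """Serialize with one token per line, nested with `indent`."""
    lines, depth = [], 0
    for kind, value, attrs in _tokenize(source):
        if kind == "end":
            depth = max(depth - 1, 0)
            lines.append(indent * depth + f"</{value}>")
        elif kind == "start":
            attr_text = "".join(f' {k}="{escape(v)}"' for k, v in attrs)
            lines.append(indent * depth + f"<{value}{attr_text}>")
            if value not in VOID:
                depth += 1
        else:
            lines.append(indent * depth + escape(value, quote=False))
    return "\n".join(lines) + "\n"


def to_canonical(source):
    """Canonical form: the same serialization with no indentation."""
    return to_layout(source, "")


# T1: my layout -> canonical form; T2: canonical form -> your layout.
mine = '<UL Class="nav" ID=main>\n\t<LI>one</LI>\n\t<LI>two</LI>\n</UL>\n'
canon = to_canonical(mine)
yours = to_layout(canon, "  ")

# Both layouts reduce to the same canonical form.
assert to_canonical(yours) == canon
```

With something like this, a diff tool would compare to_canonical(a) with
to_canonical(b) instead of the raw files, so differences in indentation,
attribute order, or tag case would not show up as changes.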

Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
   QA Weblog - http://www.w3.org/QA/
      *** Be Strict To Be Cool ***
