Backwards Compatibility in Office Open XML

 

As a member of my country's national standards body committee on electronic data processing, I have lately been spending considerable time deliberating what our position should be at the upcoming Office Open XML ISO Ballot Resolution Meeting in Geneva. My biggest objection concerns large parts of the standard that are proposed to live in an Annex containing normative descriptions of deprecated features that will only be used by existing binary documents. The rationale behind this decision is backwards compatibility. My opinion is that this solution is counterproductive for a number of reasons.

Burden on Third-Party Applications

The current disposition of comments for Greece proposes various parts to be moved to a normative Annex. Here is a partial list of the corresponding Part 4 paragraphs.

  • 2.15.1.28, pages 1,158–1,172 (Hash algorithm)
  • 6, pages 4,343–4,960 (VML)
  • 2.15.3.26, pages 1,416–1,417 (Word 95 footnotes)
  • 2.15.3.31, pages 1,426–1,427 (line wrap like Word 6)
  • 2.15.5.32, pages 1,427–1,428 (small caps Word for Mac)
  • 2.15.3.41, pages 1,422–1,423 (shapeLayoutLikeWW8)
  • 2.15.3.51, pages 1,462–1,463 (suppressTopSpacingWP)
  • 2.15.3.53, pages 1,467–1,468 (truncateFontHeightsLikeWP6)
  • 2.15.3.6, pages 1,378–1,379 (autoSpaceLikeWord95)
  • 2.15.3.63, page 1,481 (useWord2002TableStyleRules)
  • 2.15.3.64, pages 1,482–1,483 (useWord97LineBreakRules)
  • 2.15.3.65, pages 1,483–1,484 (wpJustification)
  • 2.15.3.66, page 1,485 (wpSpaceWidth)
  • 2.16.5.5, page 1,512 (AUTONUM)
  • 2.15.1.28, pages 1,158–1,172 (document protection)
This material covers more than 600 pages. Adding to it another 100 pages of the Office Open XML Math (pages 4,964–5,102 of the December 2006 version of the proposed standard) gives us 700 pages of specifications that will exist in the standard solely for backward compatibility. To put this size in perspective, the complete standard for the C programming language (ISO/IEC 9899:1999) is 554 pages. Any third party wishing to fully support Office Open XML would have to implement those 700 pages just to retain compatibility with legacy documents.

Backwards Compatibility is Not Preserved

Microsoft claims that a new standard and its huge normative Annex are required for backwards compatibility with legacy formats. Let's see how well this backwards compatibility works. I opened a new Word 2000 (SP3) document and wrote in it the words "hello, world". I then tried to save it using Microsoft's own Word 2007 document conversion support. This is the message I got:
may contain features that are not compatible with Word 2007 document format
Now, if Microsoft's software can't faithfully convert a simple two-word document into XML, what are the chances of handling more complicated stuff? Therefore, let's drop the backward compatibility excuse.

The Proposed Solution is a Sham

Let us be honest about it. Backward compatibility with legacy documents can be preserved without burdening the standard with hundreds of pages of descriptions of non-standard formats. A current .docx document is a zip bundle containing various XML files, like the following.

 Length     Size  Ratio   Date   Time     Name
--------  ------- -----   ----   ----     ----
    1312      358  73%  01-01-80 00:00    [Content_Types].xml
     590      243  59%  01-01-80 00:00    _rels/.rels
     817      250  69%  01-01-80 00:00    word/_rels/document.xml.rels
    1035      463  55%  01-01-80 00:00    word/document.xml
    6992     1686  76%  01-01-80 00:00    word/theme/theme1.xml
    2172     1015  53%  01-01-80 00:00    word/settings.xml
    1031      382  63%  01-01-80 00:00    word/fontTable.xml
     260      187  28%  01-01-80 00:00    word/webSettings.xml
     725      386  47%  01-01-80 00:00    docProps/app.xml
     775      385  50%  01-01-80 00:00    docProps/core.xml
   14818     1788  88%  01-01-80 00:00    word/styles.xml
--------  -------  ---                    -------
   30527     7143  77%                    11 files
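A listing like the one above can be produced with any zip tool. As a minimal sketch in Python, assuming only the standard zipfile module: the function names list_bundle and make_demo_bundle are hypothetical, and the demo bundle is a tiny stand-in built in memory, not a valid .docx file.

```python
import io
import zipfile


def list_bundle(data: bytes) -> list[tuple[str, int, int]]:
    """Return (name, uncompressed size, compressed size) for each bundle member."""
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in bundle.infolist()
        ]


def make_demo_bundle() -> bytes:
    """Build a tiny in-memory zip with two placeholder members (not a real .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("[Content_Types].xml", "<Types/>")
        bundle.writestr("word/document.xml", "<w:document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    # Print one line per member: length, compressed size, name.
    for name, size, packed in list_bundle(make_demo_bundle()):
        print(f"{size:8d} {packed:8d}  {name}")
```

To inspect a real document, the bytes of the .docx file read from disk could be passed to list_bundle in place of the demo bundle.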
The only thing that is needed in order to faithfully preserve a legacy document is to include in the bundle the document in its binary format. An application can then choose to open the document in its legacy form, or in its current form. This can even be done in XML. The following XML document contains a gzipped-base64-encoded version of a Microsoft Word 2000 document.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:legacy_document xmlns:w="http://schemas.openxmlformats.org/legacy">
H4sICAojvUcAA2hlbGxvLmRvYwDtXF1sVEUU/ububttdit1WLBW0LD+iic2GhmiAGNPaglUDNLZI
Y0i0pYtd3O2t2yUNSEjxlxgTa3jQGBPhwQQ00arESGIi+sSLkReMxBc0mvhgTEEfhATWc2bmtre3
f7tEIMD5krlnfs+Zv3v3fnNn9tQP1WcPf7boFwTwMEK4XIiizBenyC31AnFguY27XCgUOCpBriC4
[...]
i95+P4DNNAt2FlN8EmrIPj/D+JlVSv97lozVFPVAHu36XsiUZL+WWjBX+71578mSDBSBUvvfD+//
ewS3JhSNfihm5lDw2c33aWDvWqu7fVc21Z/X7wQbOziOovTNzP6kl55cg3/Wfv7C/zXDBVcL/wEX
ulKCAEwAAA==
</w:legacy_document>
However, such support for legacy documents should not be part of a standard, and solutions that "support" the migration of legacy documents through this mechanism should not be deemed to be standard compliant.
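To make concrete how trivial this kind of wrapping is, here is a minimal sketch in Python of the gzip-plus-base64 round trip described above. The element name and namespace URI are taken from the illustrative XML sample, which is not part of any standard; the function names wrap_legacy and unwrap_legacy are hypothetical.

```python
import base64
import gzip
import xml.etree.ElementTree as ET

# Namespace URI copied from the illustrative sample; it is an assumption,
# not something defined by Office Open XML.
LEGACY_NS = "http://schemas.openxmlformats.org/legacy"


def wrap_legacy(binary: bytes) -> bytes:
    """Gzip and base64-encode a binary document, then wrap it in one XML element."""
    payload = base64.b64encode(gzip.compress(binary)).decode("ascii")
    ET.register_namespace("w", LEGACY_NS)
    root = ET.Element(f"{{{LEGACY_NS}}}legacy_document")
    root.text = payload
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def unwrap_legacy(xml_bytes: bytes) -> bytes:
    """Recover the original binary document from the XML wrapper."""
    root = ET.fromstring(xml_bytes)
    return gzip.decompress(base64.b64decode(root.text))


if __name__ == "__main__":
    original = b"placeholder bytes standing in for a legacy .doc file"
    wrapped = wrap_legacy(original)
    # The round trip returns exactly the bytes that went in.
    print(unwrap_legacy(wrapped) == original)
```

The wrapper is well-formed XML, yet it tells a third-party implementer nothing about the document it carries, which is exactly why it should not count as standards compliance.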

Proper Solution

So what is the proper solution to this problem? I am not convinced that Microsoft couldn't work within the existing ISO/IEC 26300:2006 (ODF) standard to cover the needs of its applications. Nevertheless, assuming that there is indeed a need for a second standard for office applications, the solution for backwards compatibility would be to translate legacy formats to conform to the main part of the proposed standard. If there are formatting styles that cannot be accommodated, then the standard should be amended to support those styles. If Microsoft can't write reliable code to transform legacy formats to the new format, then this is Microsoft's problem, not one that should be passed on to all implementers and users by including support for legacy formats in a new standard. VML and Open XML Math should go.



Last modified: Thursday, February 21, 2008 6:37 pm

Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.