Designing Schemas for Backward and Forward Compatibility

Issue

How do you design a schema so that different versions of the schema are backward and forward compatible?

[Definition] An old schema and a new schema are backward and forward compatible if each can validate XML instance documents that were written to the other. Thus, a new application can process XML instances from an old application, and an old application can process XML instances from a new application.

This document will show how to design backward and forward compatible XML Schemas, Relax NG schemas, and Schematron schemas.

Example of Backward and Forward Compatibility for Three Versions of an XML Schema

The example we will use is a Book schema.

Suppose there are three applications - app1, app2, app3.
app1 is designed to produce and consume the version #1 Book schema.
app2 is designed to produce and consume the version #2 Book schema.
app3 is designed to produce and consume the version #3 Book schema.
The technique shown below enables app1 to process XML instances from app2 or app3, app2 to process XML instances from app1 or app3, and app3 to process XML instances from app1 or app2. Nice!

Here's how to design the schemas to support backward and forward compatibility:

The version #1 Book schema creates an optional <Other> element into which arbitrary new elements can be placed:

<element name="Book">
    <complexType>
        <sequence>
            <element name="Title" type="string"/>
            <element name="Author" type="string"/>
            <element name="Date" type="gYear"/>
            <element name="ISBN" type="string"/>
            <element name="Publisher" type="string"/>
            <element name="Other" minOccurs="0">
                <complexType>
                    <sequence>
                        <any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
                    </sequence>
                </complexType>
            </element>
        </sequence>
    </complexType>
</element> 

Book contains Title, Author, Date, ISBN, Publisher, and an optional Other, which can contain anything.

Here's a sample XML instance:

<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
</Book>

... Time elapses. It is decided to update the Book schema. We want to add a <NumPages> element, without breaking old applications (version #1 applications).

Inside the <Other> element add a declaration for <NumPages> and add another (nested) <Other> element:

<element name="Book">
    <complexType>
        <sequence>
            <element name="Title" type="string"/>
            <element name="Author" type="string"/>
            <element name="Date" type="gYear"/>
            <element name="ISBN" type="string"/>
            <element name="Publisher" type="string"/>
            <element name="Other" minOccurs="0">
                <complexType>
                    <sequence>
                        <element name="NumPages" type="nonNegativeInteger"/>
                        <element name="Other" minOccurs="0">
                            <complexType>
                                <sequence>
                                    <any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
                                </sequence>
                            </complexType>
                        </element>
                    </sequence>
                </complexType>
            </element>
        </sequence>
    </complexType>
</element>

Now Book contains Title, Author, Date, ISBN, Publisher, and an optional Other.
Inside Other are NumPages and another optional Other, which can contain anything.

Here's a sample XML instance:

<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
    <Other>
        <NumPages>345</NumPages>
    </Other>
</Book>

This instance will validate against the version #1 schema as well as the version #2 schema.

Further, the version #1 instance shown above will validate against this new schema.

... More time elapses. It is decided to update the Book schema again. We want to add a <Hardcover> element, without breaking old applications (version #1 or version #2 applications).

Inside the nested <Other> element add a declaration for <Hardcover> and add another (nested) <Other> element:

<element name="Book">
    <complexType>
        <sequence>
            <element name="Title" type="string"/>
            <element name="Author" type="string"/>
            <element name="Date" type="gYear"/>
            <element name="ISBN" type="string"/>
            <element name="Publisher" type="string"/>
            <element name="Other" minOccurs="0">
                <complexType>
                    <sequence>
                        <element name="NumPages" type="nonNegativeInteger"/>
                        <element name="Other" minOccurs="0">
                            <complexType>
                                <sequence>
                                    <element name="Hardcover" type="boolean"/>
                                    <element name="Other" minOccurs="0">
                                        <complexType>
                                            <sequence>
                                                <any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
                                            </sequence>
                                        </complexType>
                                    </element>
                                </sequence>
                            </complexType>
                        </element>
                    </sequence>
                </complexType>
            </element>
        </sequence>
    </complexType>
</element>

Now Book contains Title, Author, Date, ISBN, Publisher, and an optional Other.
Inside Other are NumPages and a second, optional Other.
Inside the second Other are Hardcover and a third, optional Other, which can contain anything.

Here's a sample XML instance:

<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
    <Other>
        <NumPages>345</NumPages>
        <Other>
            <Hardcover>true</Hardcover>
        </Other>
    </Other>
</Book>

This instance will validate against the version #1 schema as well as the version #2 schema as well as the version #3 schema.

In fact, all three sample instances validate against all three schemas. There is backward and forward compatibility among all schema versions!
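
To make the nested layout concrete, here is a minimal Python sketch of how an application might pull the extension elements out of the chain of <Other> elements. It is not part of the original schemas: the helper name collect_extensions is hypothetical, it uses only the standard library, and it reads the document without validating it against any schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper (not from the original document): walks the chain
# Book/Other, Book/Other/Other, ... and gathers every extension element.
def collect_extensions(book):
    """Return {element name: text} for each non-<Other> child found
    at any level of the nested <Other> chain."""
    found = {}
    other = book.find("Other")
    while other is not None:
        for child in other:
            if child.tag != "Other":
                found[child.tag] = child.text
        other = other.find("Other")  # descend one version level
    return found

# The version #3 sample instance shown above.
V3_INSTANCE = """<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
    <Other>
        <NumPages>345</NumPages>
        <Other>
            <Hardcover>true</Hardcover>
        </Other>
    </Other>
</Book>"""

book = ET.fromstring(V3_INSTANCE)
extensions = collect_extensions(book)
print(extensions)
```

A version #1 application would never look inside <Other> at all, and a version #2 application would read NumPages and skip whatever the nested <Other> holds. That is why each application can keep processing instances from the others.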

Disadvantage: this approach is not scalable. Imagine 50 versions of the schema: the result would be roughly 50 levels of nesting in both the schema and the instance documents.
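
The growth in nesting can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the extension names Ext2 through Ext50 are hypothetical, and the element text is placeholder data. It follows the pattern above, where each version after #1 adds one element inside one more <Other>.

```python
import xml.etree.ElementTree as ET

def build_nested_instance(extension_names):
    """Build a <Book> in which each extension element sits inside one
    more level of <Other>, as in the version #2 and #3 instances."""
    book = ET.Element("Book")
    for name in ("Title", "Author", "Date", "ISBN", "Publisher"):
        ET.SubElement(book, name).text = "placeholder"
    parent = book
    for name in extension_names:
        other = ET.SubElement(parent, "Other")
        ET.SubElement(other, name).text = "placeholder"
        parent = other  # the next extension nests inside this <Other>
    return book

def other_depth(book):
    """Count how many <Other> elements are nested inside <Book>."""
    depth = 0
    other = book.find("Other")
    while other is not None:
        depth += 1
        other = other.find("Other")
    return depth

# Hypothetical extensions Ext2 ... Ext50: one per version after version #1.
names = [f"Ext{version}" for version in range(2, 51)]
book_v50 = build_nested_instance(names)
print(other_depth(book_v50))
```

A version N instance that uses every extension carries N - 1 nested <Other> elements, so the depth grows by one with each release.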

Advantage: the schemas are all backward and forward compatible. An XML instance created against one version of the schema will validate against the other versions. Nice!

Example of Backward and Forward Compatibility for Three Versions of a Relax NG Schema

The example we will use is a Relax NG version of the Book schema.

As in the above discussion, suppose there are three applications - app1, app2, app3.
app1 is designed to produce and consume the version #1 Book Relax NG schema.
app2 is designed to produce and consume the version #2 Book Relax NG schema.
app3 is designed to produce and consume the version #3 Book Relax NG schema.
The technique shown below enables app1 to process XML instances from app2 or app3, app2 to process XML instances from app1 or app3, and app3 to process XML instances from app1 or app2. Nice!

Here's how to design the Relax NG schemas to support backward and forward compatibility:

The version #1 Book Relax NG schema creates an optional section into which an element (with any name) can be placed, followed by zero or more other elements (also with any name):

<element name="Book">
    <element name="Title">
        <text/>
    </element>
    <element name="Author">
        <text/>
    </element>
    <element name="Date">
        <text/>
    </element>
    <element name="ISBN">
        <text/>
    </element>
    <element name="Publisher">
        <text/>
    </element>
    <optional>
        <element>     
            <anyName/>
            <text/>
        </element>
        <zeroOrMore>
            <element>     
                <anyName/>
                <text/>
            </element>
        </zeroOrMore>
    </optional>
</element> 

Book contains Title, Author, Date, ISBN, Publisher, and optionally other elements.

Here's a sample XML instance:

<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
</Book>

... Time elapses. It is decided to update the Book Relax NG schema. We want to add a <NumPages> element, without breaking old applications (version #1 applications).

Inside the optional section replace the first element with <NumPages> and replace the zeroOrMore section with another (nested) optional section:

<element name="Book">
    <element name="Title">
        <text/>
    </element>
    <element name="Author">
        <text/>
    </element>
    <element name="Date">
        <text/>
    </element>
    <element name="ISBN">
        <text/>
    </element>
    <element name="Publisher">
        <text/>
    </element>
    <optional>
        <element name="NumPages"> 
            <data type="nonNegativeInteger"/>
        </element>
        <optional>
            <element>     
                <anyName/>
                <text/>
            </element>
            <zeroOrMore>
                <element>     
                    <anyName/>
                    <text/>
                </element>
            </zeroOrMore>
        </optional>
    </optional>
</element>

Now Book contains Title, Author, Date, ISBN, Publisher, an optional NumPages, and optionally other elements after NumPages.

Here's a sample XML instance:

<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
    <NumPages>345</NumPages>
</Book>

This instance will validate against the version #1 Relax NG schema as well as the version #2 Relax NG schema.

Further, the version #1 instance shown above will validate against this new Relax NG schema.

... Time elapses. It is decided to update the Book Relax NG schema again. We want to add a <Hardcover> element, without breaking old applications (version #1 or version #2 applications).

Inside the nested optional section replace the first element with <Hardcover> and replace the zeroOrMore section with another (nested) optional section:

<element name="Book">
    <element name="Title">
        <text/>
    </element>
    <element name="Author">
        <text/>
    </element>
    <element name="Date">
        <text/>
    </element>
    <element name="ISBN">
        <text/>
    </element>
    <element name="Publisher">
        <text/>
    </element>
    <optional>
        <element name="NumPages"> 
            <data type="nonNegativeInteger"/>
        </element>
        <optional>
            <element name="Hardcover"> 
                <data type="boolean"/>
            </element>
            <optional>
                <element>     
                    <anyName/>
                    <text/>
                </element>
                <zeroOrMore>
                    <element>     
                        <anyName/>
                        <text/>
                    </element>
                </zeroOrMore>
            </optional>
        </optional>
    </optional>
</element>

Now Book contains Title, Author, Date, ISBN, Publisher, an optional NumPages, an optional Hardcover after it, and optionally other elements after Hardcover.

Here's a sample XML instance:

<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
    <NumPages>345</NumPages>
    <Hardcover>true</Hardcover>
</Book>

This instance will validate against the version #1 Relax NG schema as well as the version #2 Relax NG schema as well as the version #3 Relax NG schema.

In fact, all three sample instances validate against all three Relax NG schemas. There is backward and forward compatibility among all Relax NG schema versions!
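
The nested-optional pattern can be modelled at the level of element names to check the compatibility claim. The Python sketch below is a simplified model, not a Relax NG validator: it ignores data types and text content, and the names accepts, SCHEMAS and INSTANCES are hypothetical. Each schema version is represented only by the ordered list of extension elements it declares after the core elements.

```python
CORE = ["Title", "Author", "Date", "ISBN", "Publisher"]

def accepts(known_extensions, child_names):
    """Name-level model of the nested <optional> content model.

    known_extensions: extension names this schema version declares, in
    order (for example ["NumPages"] for version #2).
    child_names: names of the children of <Book>, in document order.
    """
    if child_names[:len(CORE)] != CORE:
        return False
    tail = child_names[len(CORE):]
    for position, expected in enumerate(known_extensions):
        if position >= len(tail):
            return True   # the remaining <optional> sections are simply absent
        if tail[position] != expected:
            return False  # a declared extension must occupy its own slot
    return True  # anything left over is absorbed by the anyName elements

SCHEMAS = {1: [], 2: ["NumPages"], 3: ["NumPages", "Hardcover"]}
INSTANCES = {
    1: CORE,
    2: CORE + ["NumPages"],
    3: CORE + ["NumPages", "Hardcover"],
}

# Check every sample instance against every schema version.
matrix = {
    (schema, instance): accepts(SCHEMAS[schema], INSTANCES[instance])
    for schema in SCHEMAS
    for instance in INSTANCES
}
for (schema, instance), ok in sorted(matrix.items()):
    print(f"schema #{schema} accepts instance #{instance}: {ok}")
```

In this model all nine combinations come out accepted. The model also shows the condition the technique relies on: new elements must be added in the same order in every version. For example, accepts(["NumPages"], CORE + ["Hardcover"]) is False, because version #2 expects NumPages in the first extension slot.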

Advantages:

  1. This approach is scalable. Suppose 50 versions of the Relax NG schema are made. While the schema will have a 50-level nesting, the XML instance documents remain flat (unnested). Nice!
  2. The Relax NG schemas are all backward and forward compatible. An XML instance created against one schema will validate against any other version. Nice!

Example of Backward and Forward Compatibility for Three Versions of a Schematron Schema

Achieving backward and forward compatibility using Schematron is trivial. For each new version add a new Schematron "phase". Thus, the first phase is used to validate the core Book data - Title, Author, Date, ISBN, and Publisher. The second phase validates that each Book has a NumPages element. The third phase validates that each Book has a Hardcover element.

Here is the Schematron schema, showing all three phases:

<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:phase id="coreBookDataValidation">

      <sch:p>Validate that every book has the core data: Title, Author, Date, ISBN, and Publisher.</sch:p>

       <sch:active pattern="core" />

   </sch:phase>

   <sch:phase id="version2Validation">

      <sch:p>Validate that each book has NumPages.</sch:p>

       <sch:active pattern="count" />

   </sch:phase>

   <sch:phase id="version3Validation">

      <sch:p>Validate that each book has Hardcover.</sch:p>

       <sch:active pattern="cover" />

   </sch:phase>

   <sch:pattern name="Core Book Data" id="core">

      <sch:p>A Book is minimally required to provide the Title,
             Author, Date, ISBN, and Publisher.</sch:p> 

      <sch:rule context="Book">

         <sch:assert test="count(Title) = 1 and
                           count(Author) = 1 and
                           count(Date) = 1 and
                           count(ISBN) = 1 and
                           count(Publisher) = 1">
             Book is comprised of one Title, one Author, one Date, 
             one ISBN, and one Publisher
         </sch:assert>

      </sch:rule>

   </sch:pattern>

   <sch:pattern name="NumPages Extension" id="count">

      <sch:p>The Book data is extended with an indication
             of the number of pages.</sch:p>  

      <sch:rule context="Book">

         <sch:assert test="count(NumPages) = 1">
             A Book is comprised of one NumPages
         </sch:assert>

      </sch:rule>

   </sch:pattern>

   <sch:pattern name="Hardcover Extension" id="cover">

      <sch:p>The Book data is extended with an indication
             of whether it's a hardcover.</sch:p>  

      <sch:rule context="Book">

         <sch:assert test="count(Hardcover) = 1">
             A Book is comprised of one Hardcover
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>

Here are three sample XML instances:

Instance #1:
<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
</Book>
----------------------------------------------------
Instance #2:
<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
    <NumPages>345</NumPages>
</Book>
----------------------------------------------------
Instance #3:
<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
    <NumPages>345</NumPages>
    <Hardcover>true</Hardcover>
</Book>

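The first instance passes the coreBookDataValidation phase, which activates the "core" pattern. The second and third instances pass both the coreBookDataValidation and version2Validation phases (patterns "core" and "count"). The third instance passes all three phases: coreBookDataValidation, version2Validation and version3Validation (patterns "core", "count" and "cover").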

There is backward and forward compatibility!
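
The effect of choosing a phase can be imitated in plain Python. This sketch does not run Schematron. It hard-codes the three patterns above as "exactly one of each element" checks and assumes that <Book> is the root element. The names PATTERNS, PHASES, passes_phase and make_book are hypothetical.

```python
import xml.etree.ElementTree as ET

# Pattern id -> elements that must each occur exactly once in <Book>,
# mirroring the count(...) = 1 assertions in the schema above.
PATTERNS = {
    "core": ["Title", "Author", "Date", "ISBN", "Publisher"],
    "count": ["NumPages"],
    "cover": ["Hardcover"],
}

# Phase id -> active pattern ids, mirroring the <sch:phase> elements.
PHASES = {
    "coreBookDataValidation": ["core"],
    "version2Validation": ["count"],
    "version3Validation": ["cover"],
}

def passes_phase(book, phase_id):
    """True if every check of every pattern active in the phase holds."""
    return all(
        len(book.findall(name)) == 1
        for pattern_id in PHASES[phase_id]
        for name in PATTERNS[pattern_id]
    )

def make_book(extra_xml=""):
    """Build one of the sample instances: core data plus optional extras."""
    core = ("<Title>My Life and Times</Title><Author>Paul McCartney</Author>"
            "<Date>1998</Date><ISBN>1-56592-235-2</ISBN>"
            "<Publisher>McMillan Publishing</Publisher>")
    return ET.fromstring("<Book>" + core + extra_xml + "</Book>")

instances = {
    1: make_book(),
    2: make_book("<NumPages>345</NumPages>"),
    3: make_book("<NumPages>345</NumPages><Hardcover>true</Hardcover>"),
}

results = {
    (number, phase_id): passes_phase(book, phase_id)
    for number, book in instances.items()
    for phase_id in PHASES
}
for (number, phase_id), ok in results.items():
    print(f"instance #{number} / {phase_id}: {'pass' if ok else 'fail'}")
```

Every instance passes coreBookDataValidation, while instance #1 fails version2Validation and version3Validation. Compatibility therefore depends on each application running only the phases for the versions it understands.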

Advantages:

  1. It is trivial to achieve backward and forward compatibility - simply add more phases as desired.
  2. This approach is scalable. Suppose 50 versions of the Schematron schema are made. Simply add 50 phases. Nice!
  3. The Schematron schema is backward and forward compatible. Every instance passes the coreBookDataValidation phase regardless of which version produced it, and an application runs only the phases for the versions it understands. Nice!

Last Updated: August 27, 2007