Creating Variable Content Container Elements

(A Collectively Developed Set of Schema Design Guidelines)


Issue
Introduction
Example
Method 1: Implementing variable content containers using an abstract element and element substitution
Method 2: Implementing variable content containers using a <choice> element
Method 3: Implementing variable content containers using an abstract type and type substitution
Method 4: Implementing variable content containers using a dangling type
Best Practice
Acknowledgements

Issue

What is the Best Practice for implementing a container element that is to be comprised of variable content?

Introduction

A typical problem when creating an XML Schema is to design a container element (e.g., Catalogue) which is to be comprised of variable content (e.g., Book, or Magazine, or ...). Some things to consider:

Do we allow the elements in the variable content container to come from disjoint sources, i.e., do we allow the container element to contain dissimilar, independent, loosely coupled elements?
How do we design the variable content container so that the kinds of elements which it may contain can grow over time, i.e., how do we design an extensible variable content container?

Example

Throughout this discussion we will consider variable content containers (e.g., <Catalogue>) which are comprised of a collection of elements, where each element is variable.

Here's an example of a <Catalogue> container element comprised of two different kinds of elements:
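
The instance listing that belonged here appears to have been lost. A minimal sketch, modelled on the Catalogue instances shown later in this document, could look like this:

```xml
<Catalogue>
    <Book> ... </Book>
    <Magazine> ... </Magazine>
    <Book> ... </Book>
</Catalogue>
```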

Below are four methods for implementing variable content containers.

Method 1: Implementing variable content containers using an abstract element and element substitution

Description:

There are five XML Schema concepts that must be understood for implementing this method:

- An element can be declared abstract.
- Abstract elements cannot be instantiated in instance documents (they are only placeholders).
- In instance documents the abstract element must be substituted by non-abstract (i.e., concrete) elements which have been declared to be in a substitutionGroup with the abstract element.
- Elements may be declared to be in a substitutionGroup with the abstract element if and only if their type is the same as, or derives from, the abstract element's type.
- The abstract element and all elements in its substitutionGroup must be declared as global elements.

Implementation:

Declare an abstract element (Publication).

Declare a variable content container element (Catalogue) to have as its content the abstract element (using a "ref" to the abstract element declaration).

Note that maxOccurs="unbounded", so Catalogue may contain a collection (one or more) of Publication elements.
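
The two declarations described above might be sketched as follows. This is a fragment, not the original listing: it assumes it sits inside an xsd:schema element with the xsd prefix bound to the XML Schema namespace, and that PublicationType is defined elsewhere in the same schema.

```xml
<!-- Abstract placeholder: it can never appear in an instance document -->
<xsd:element name="Publication" abstract="true" type="PublicationType"/>

<!-- The container refers to the abstract element, one or more times -->
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="Publication" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
```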

Declare the concrete elements (Book and Magazine) that are to be the contents of the variable content container and declare them to be in a substitutionGroup with the abstract element:
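
A sketch of those concrete declarations, under the same assumptions as before (a fragment inside an xsd:schema element; BookType and MagazineType are the types described in the next paragraph):

```xml
<!-- Global, concrete elements that may stand in for Publication -->
<xsd:element name="Book" substitutionGroup="Publication" type="BookType"/>
<xsd:element name="Magazine" substitutionGroup="Publication" type="MagazineType"/>
```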

In order for Book and Magazine to substitute for Publication, their types (BookType and MagazineType) must derive from Publication's type (PublicationType). Here are the type definitions:

PublicationType - the base type:
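
The original definition is missing here. One plausible shape is sketched below. Only the Author element is confirmed by the text; the Title and Date elements, and their types, are assumptions made for illustration. Author is declared optional so that a derived type can later strike it out by restriction.

```xml
<xsd:complexType name="PublicationType">
    <xsd:sequence>
        <!-- Title and Date are hypothetical; only Author is named in the text -->
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
</xsd:complexType>
```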

BookType - extends PublicationType by adding two new elements, ISBN and Publisher:

MagazineType - restricts PublicationType by striking out the Author element:
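
The two derived types might be sketched as follows, assuming a base PublicationType whose content is a sequence of Title, an optional Author, and Date (Title and Date are illustrative names, not taken from the text). The element types chosen for ISBN and Publisher are likewise assumptions.

```xml
<!-- Derivation by extension: base content followed by ISBN and Publisher -->
<xsd:complexType name="BookType">
    <xsd:complexContent>
        <xsd:extension base="PublicationType">
            <xsd:sequence>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

<!-- Derivation by restriction: the content model is restated without
     Author, which is legal only because Author is optional in the base -->
<xsd:complexType name="MagazineType">
    <xsd:complexContent>
        <xsd:restriction base="PublicationType">
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:gYear"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
```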

Advantages:

Extensible: This method allows you to extend the set of elements that may be used in the variable content container element, even if the schema for the variable content container element is outside your control. For example, suppose that you do not have the privilege to modify the above Catalogue schema. Currently, the Catalogue element can only contain Book and Magazine elements. But suppose that your application has a hard requirement for CD elements as well:
```
   <Catalogue>
        <Book> ... </Book>
        <CD> ... </CD>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
   </Catalogue>
```
How can you extend the set of elements that Catalogue may be comprised of, without modifying its schema?
Answer: You can create your own separate schema which contains a declaration of CD (with a type, CDType, that extends the PublicationType in the Catalogue schema), and declares CD to be in the Publication substitutionGroup:
```
    <xsd:include schemaLocation="Catalogue.xsd"/>
    <xsd:complexType name="CDType">
        <xsd:complexContent>
            <xsd:extension base="PublicationType">
                <xsd:sequence>
                    <xsd:element name="RecordingCompany" 
                                 type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>
    <xsd:element name="CD" substitutionGroup="Publication"
                 type="CDType"/>
```
The CD element meets the requirements for being in the variable content container:
- its type (CDType) derives from the PublicationType, and
- it is a member of the Publication element's substitutionGroup.
Book, Magazine, and CD may now be used within the Catalogue element. Thus, we see that this method allows us to extend the set of elements that may be used in the Catalogue element, without modifying its schema. Nice!
Semantic Cohesion: the elements in the variable content container all descend from the same type hierarchy (PublicationType). This type hierarchy binds them together, giving a structural (and, by implication, semantic) coherence to all the elements that may be in the variable content container.

Disadvantages:

No Independent Elements: The types of the elements that are to be used in the variable content container must all descend from the abstract element's type (PublicationType). Further, the elements must be in a substitutionGroup with the abstract element. Thus, the variable content container cannot contain elements whose type does not derive from the abstract element's type, or which are not in the substitutionGroup with the abstract element - as would typically be the case with independently developed elements. For example, suppose another schema author creates a "Newspaper" element, with a type that does not descend from PublicationType. <Catalogue> would not be able to contain the <Newspaper> element.
Limited Structural Variability: Over time a schema will evolve, and the kinds of elements which may occur in the variable content container will typically grow. There is no way to know a priori in what direction it will grow. The new elements may be conceptually related but structurally vastly different from the original set of elements. The abstract element's type (e.g., PublicationType) may have been a good base type for the original set of elements which were all structurally related, but may not be a good base type for the new elements which have vastly different structures.
So you are faced with a tradeoff:
- create a simple base type to support lots of different structures (but then you can make fewer assumptions about the structure of the members), or
- create a rich base type to support strong data type checking (but then you reduce the ability to add elements with radically different types).
Nonscalable Processing: Processing a collection of differently named elements requires a lot of special-case code. For example, consider a stylesheet to process each element in <Catalogue>:
```
     <xsl:if test="Book">
        <!-- process Book -->
     </xsl:if>
     <xsl:if test="Magazine">
        <!-- process Magazine -->
     </xsl:if>
```
This stylesheet snippet suffers from lack of scalability, i.e., it breaks as soon as a new element is added.
This argument needs some qualification. If the contents of <Catalogue> are just elements that substitute for the abstract Publication element, then each element can be uniformly processed, as follows:
```
     <xsl:for-each select="Catalogue/*">
        <!-- process the element -->
     </xsl:for-each>
```
This stylesheet snippet processes each element within Catalogue, regardless of the element name. Obviously, this is scalable, and does not break when a new element is added.
Processing becomes non-scalable when Catalogue contains multiple abstract elements:
```
    <xsd:element name="Catalogue">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Publication"
                             maxOccurs="unbounded"/>
                <xsd:element ref="Retailer" 
                             maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
```
Suppose that both Publication and Retailer are abstract elements, and there can be any number of each kind of element within Catalogue. Here's a sample instance:
```
    <Catalogue>
        <Book> ... </Book>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
        <MarketBasket> ... </MarketBasket>
        <Macys> ... </Macys>
    </Catalogue>   
```
If you wish to process just the Publication elements (e.g., Book, Magazine) then you will need to write special-case code, as shown above. This is not scalable: every time a new element is added to the collection of elements that may substitute for the Publication element, your code will have to be updated. This is costly.
No Control over Namespace Exposure: This method requires that the elements which may be used in the variable content container be in a substitutionGroup with the abstract element (e.g., Book and Magazine must be in a substitutionGroup with Publication). A requirement of using substitutionGroup is that all elements must be declared globally. The namespace of global elements can never be hidden in instance documents. As a consequence, there is no way to hide (localize) the namespaces of the elements used in the variable content container. This fails the Best Practice rule which states that you should design your schema to be able to hide or expose namespaces at your discretion (using elementFormDefault as an exposure switch). (See Hide (Localize) Versus Expose Namespaces.)

Method 2: Implementing variable content containers using a <choice> element

Description:

This method is quite straightforward - simply list within a <choice> element all the elements which can appear in the variable content container, and embed the <choice> element in the container element.

Implementation:

Declare within a <choice> element all the elements (e.g., Book, Magazine) that may be used in the variable content container. Embed the <choice> element within the container element (Catalogue):
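
A minimal sketch of such a declaration follows. It is a fragment that assumes an enclosing xsd:schema element; BookType and MagazineType stand for whatever types the two elements have, and with this method they need not be related. The maxOccurs="unbounded" on the choice is an assumption, added so that Catalogue can hold a collection of elements.

```xml
<xsd:element name="Catalogue">
    <xsd:complexType>
        <!-- Each child of Catalogue is either a Book or a Magazine -->
        <xsd:choice maxOccurs="unbounded">
            <xsd:element name="Book" type="BookType"/>
            <xsd:element name="Magazine" type="MagazineType"/>
        </xsd:choice>
    </xsd:complexType>
</xsd:element>
```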

Advantages:

Independent Elements: The elements in the variable content container do not need a common type ancestry. They don't have to be related in any way. Thus, the variable content container can contain dissimilar, independent, loosely coupled elements.

Disadvantages:

Nonextensible: Suppose that the Catalogue schema is outside your control. Currently the variable content container only supports Book and Magazine. Suppose that you have a hard requirement for your instance documents to use CD as well as Book and Magazine within Catalogue, e.g.,
```
    <Catalogue>
        <Book> ... </Book>
        <CD> ... </CD>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
    </Catalogue>
```
This method requires that the <choice> element in the Catalogue schema be modified to include the CD element. However, we stipulated that the Catalogue schema is outside your control, so it cannot be modified. This method has serious extensibility restrictions!
No Semantic Coherence: The <choice> element allows you to group together dissimilar elements. While that has been touted as an advantage, it is really a double-edged sword. The elements in the variable content container have no type hierarchy to bind them together, to provide structural (and, by implication, semantic) coherence among the elements. Thus, when processing an instance document you can make no assumptions about the structure of the elements.

Method 3: Implementing variable content containers using an abstract type and type substitution

Description:

There are three XML Schema concepts that must be understood for implementing this method:

- A complexType can be declared abstract.
- An element declared to be of an abstract type cannot have its type instantiated in instance documents (that is, the element can be instantiated, but its abstract content may not).
- In instance documents an element with an abstract type must have its content substituted by content from a non-abstract (concrete) type which derives from the abstract type. This is called type substitution.

Implementation:

Define an abstract base type (PublicationType).

Declare the container element (Catalogue) to contain an element (Publication), which is of the abstract type.

In instance documents, the content of <Publication> can only be of a concrete type which derives from PublicationType, such as BookType or MagazineType (we saw these type definitions in Method 1 above).
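
The abstract type and the container might be sketched like this. It is a fragment that assumes an enclosing xsd:schema element. The Title and Date elements are illustrative assumptions (only Author is named in the text), and maxOccurs="unbounded" is assumed so that Catalogue can hold a series of Publication elements.

```xml
<!-- The type, not the element, is abstract in this method -->
<xsd:complexType name="PublicationType" abstract="true">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
</xsd:complexType>

<!-- Publication is a local, concrete element of the abstract type -->
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Publication" type="PublicationType"
                         maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
```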

With this method instance documents will look different than we saw with the above two methods. Namely, <Catalogue> will not contain variable content. Instead, it will always contain the same element (Publication). However, that element will contain variable content:
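
A sketch of such an instance document, assuming the BookType and MagazineType definitions from Method 1 and a schema without a target namespace. Each Publication element names its concrete type with the standard xsi:type attribute:

```xml
<Catalogue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Publication xsi:type="BookType"> ... </Publication>
    <Publication xsi:type="MagazineType"> ... </Publication>
    <Publication xsi:type="BookType"> ... </Publication>
</Catalogue>
```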

Advantages:

Extensible: Same extensibility benefits as method 1. Namely, this method allows you to easily extend the set of elements that may be used in the variable content container simply by creating new types which derive from the abstract type. These new types can be defined in a separate, independent schema. Thus, you are able to extend the schema without modifying it!
Minimal Dependencies: This method has fewer dependencies (less coupling) than method 1. To extend the collection of elements that may appear in a variable content container using method 1 you need access to both the abstract element (Publication) and its type (PublicationType). With method 3 you only need access to the abstract type. If we assume that in a typical scenario only the types will be put in publicly accessible schemas, then method 3 is the only viable method.
Scalable Processing: Processing a series of <Publication> elements is scalable. For example, a stylesheet could process each publication element as follows:
```
     <xsl:for-each select="Publication">
        <!-- do something -->
     </xsl:for-each>
```
As new types are created (e.g., CDType) no change is needed to the code.
Semantic Cohesion: the elements in the variable content container all descend from the same type hierarchy. This type hierarchy binds them together, giving a structural (and, by implication, semantic) coherence among the elements.
Control over Namespace Exposure: The variable parts of the variable content container are element declarations that are embedded within type definitions. Consequently, we can control exposure of the namespaces of the variable content container elements. This is consistent with the Best Practice design recommendation we issued for hide (localize) versus expose namespaces. (See Hide (Localize) Versus Expose Namespaces.)

Disadvantages:

No Independent Elements: Same weakness as with method 1. All types must descend from an abstract type. This requirement prohibits the use of types which do not descend from the abstract type, as would typically be the situation when the type is in another, independently developed schema.
Limited Structural Variability: Same weakness as with method 1. Namely, to facilitate strong type checking you want to have a rich base type, but this is in direct conflict with the desire for components with vastly different structures, which calls for a weak base type.

Method 4: Implementing variable content containers using a dangling type

Motivation:

Thus far our variable content container has contained complex content (i.e., child elements). Suppose that we want to create a variable content container to hold simple content. None of the previous methods can be used. We need a method that allows us to create simpleType variable content containers.

There is one key XML Schema concept that must be understood for implementing this method:

with an <import> element the schemaLocation attribute is optional

Description:

Let's take an example. Suppose that we desire an element, sensor, which contains the name of a weather station sensor. There are several things to note:

- This element holds a simpleType.
- Each weather station may have sensors that are unique to it. Consequently, we must design our schema so that the sensor element can be customized by each weather station.
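
A sensor element of the kind described above might look like this in an instance document. The value thermometer is an assumed example; the text does not list the actual sensor names.

```xml
<sensor>thermometer</sensor>
```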

Here's an elegant design for making the contents of <sensor> customizable by each weather station:

- When you create sensor, declare it to be of a type from another namespace.
- Then, when you <import> that namespace don't provide a schemaLocation.
- Thus, the element is declared to be of a type for which no particular schema is identified, i.e., we have a dangling type!

Implementation:

Let's go through the design, step by step.

In your schema, declare the sensor element. Note that the sensor element is declared to have a type, sensor_type, which is in a different namespace - the sensor namespace.

Now here's the key - when you <import> this namespace, don't provide a value for schemaLocation! (In an import element schemaLocation is optional.)

The instance document must then identify a schema that implements sensor_type. Thus, at run time (i.e., validation time) we are matching up the reference to sensor_type with an implementation of sensor_type. For example, an instance document may use schemaLocation to identify a schema, boston-sensors.xsd, which is to provide the implementation of sensor_type.
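
The individual steps can be sketched as the fragments below. The s prefix matches the s:sensor_type reference used in the Summary; the namespace URI http://www.sensor.org is a placeholder assumed for illustration, since the original listings are missing.

```xml
<!-- Step 1 (main schema): sensor has a type from another namespace -->
<xsd:element name="sensor" type="s:sensor_type"/>

<!-- Step 2 (main schema, on the xsd:schema element): bind the s prefix
     with xmlns:s="http://www.sensor.org" -->

<!-- Step 3 (main schema): import the namespace without a schemaLocation -->
<xsd:import namespace="http://www.sensor.org"/>

<!-- Step 4 (instance document, on the root element): name an implementation
     with xsi:schemaLocation="http://www.sensor.org boston-sensors.xsd" -->
```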

Let's take a look at the schemas and instance documents for the weather station sensor example we have been considering. Here's the main schema, which contains the dangling type:

weather-station.xsd
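
The original listing is missing. A sketch of what this schema could contain is given below, under stated assumptions: both namespace URIs, the root element name weather-station, and the repeating sensor child are illustrative choices, not taken from the text.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.weather-station.org"
            xmlns="http://www.weather-station.org"
            xmlns:s="http://www.sensor.org"
            elementFormDefault="qualified">

    <!-- No schemaLocation: sensor_type is left dangling -->
    <xsd:import namespace="http://www.sensor.org"/>

    <xsd:element name="weather-station">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="sensor" type="s:sensor_type"
                             maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```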

Note that the <import> element does not have a schemaLocation attribute to identify a particular schema which implements sensor_type. (Stated differently, this schema does not hardcode the identity of the schema which is to provide the implementation of sensor_type.) The schema validator will resolve the reference to sensor_type based upon the collection of schemas that is provided to it in the instance document.

The Boston weather station creates a schema which implements sensor_type:

boston-sensors.xsd
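
A sketch of such an implementation, assuming the placeholder sensor namespace http://www.sensor.org. The three enumeration values are assumed examples; the text does not say which sensors Boston has.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.sensor.org"
            xmlns="http://www.sensor.org"
            elementFormDefault="qualified">

    <!-- Boston's implementation of the dangling type -->
    <xsd:simpleType name="sensor_type">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="barometer"/>
            <xsd:enumeration value="thermometer"/>
            <xsd:enumeration value="anemometer"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
```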

Now an instance document can conform to weather-station.xsd and use boston-sensors.xsd as the implementation of sensor_type:

boston-weather-station.xml
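
A sketch of such an instance document, using the same assumed namespaces, root element name, and sensor values as the sketches above. The xsi:schemaLocation attribute pairs each namespace with a schema document:

```xml
<?xml version="1.0"?>
<weather-station xmlns="http://www.weather-station.org"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.weather-station.org weather-station.xsd
                                     http://www.sensor.org boston-sensors.xsd">
    <sensor>thermometer</sensor>
    <sensor>barometer</sensor>
    <sensor>anemometer</sensor>
</weather-station>
```

Since xsi:schemaLocation is formally only a hint to the processor, it is worth checking that the validator in use actually loads both schemas.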

Suppose that the London weather station has all the sensors that Boston has, plus some additional ones that are unique to the London weather patterns. Thus, London will create its own implementation of sensor_type:

london-sensors.xsd
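
A sketch of London's implementation, under the same assumptions as the Boston sketch (placeholder namespace URI; barometer, thermometer, and anemometer are assumed names). Only hygrometer is named in the text.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.sensor.org"
            xmlns="http://www.sensor.org"
            elementFormDefault="qualified">

    <!-- London's implementation: Boston's values plus hygrometer -->
    <xsd:simpleType name="sensor_type">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="barometer"/>
            <xsd:enumeration value="thermometer"/>
            <xsd:enumeration value="anemometer"/>
            <xsd:enumeration value="hygrometer"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
```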

Note that this schema's sensor_type allows an additional value that Boston's does not have - hygrometer.

Just as with the Boston weather station instance document, the London weather station instance document will conform to a collection of schemas: weather-station.xsd and london-sensors.xsd:

london-weather-station.xml
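
A sketch of the London instance document, again with assumed namespaces and root element name. The only change from a Boston instance is the schema paired with the sensor namespace, which is what makes the hygrometer value valid here:

```xml
<?xml version="1.0"?>
<weather-station xmlns="http://www.weather-station.org"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.weather-station.org weather-station.xsd
                                     http://www.sensor.org london-sensors.xsd">
    <sensor>thermometer</sensor>
    <sensor>hygrometer</sensor>
</weather-station>
```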

Summary:

This method represents an extraordinarily powerful design pattern. The key to this design pattern is:

1. When you declare the variable content container element give it a type that is in another namespace, e.g., s:sensor_type

2. When you <import> that namespace, don't provide a value for schemaLocation, i.e., the <import> element carries only a namespace attribute.

3. Create any number of implementations of the dangling type, e.g.,

- boston-sensors.xsd
- london-sensors.xsd

4. In instance documents, use schemaLocation to identify the schema that you want used to implement the dangling type, e.g., boston-sensors.xsd.

Both simpleType and complexType:

In our examples we have implemented the dangling type as a simpleType. The implementation of a dangling type does not have to be a simpleType. A schema could define it as a complexType.

Advantages:

Dynamic: A schema which contains a dangling type is very dynamic. It does not statically hard-code the identity of a schema to implement the type. Rather, it empowers the instance document author to identify a schema that implements the dangling type. Thus, the type implementation is provided when the instance document is created, rather than when the schema document is created.
Applicable to both Simple and Complex Types: A dangling type can be implemented as either a simpleType or a complexType. The other methods are only applicable to creating variable content containers with a complex type.

Disadvantages:

Different Namespace: The implementation of the dangling type must be in another namespace. It cannot be in the same namespace as the variable content container element. If you have a hard requirement that the contents of your variable content container have the same namespace as the container element then this method cannot be employed.

Best Practice

Which method you should use to create your variable content containers ultimately depends on your requirements. Here are some things to consider.

Use Method 2 (<choice> element) when:

You need to contain a collection of dissimilar, independent elements
It is adequate to have an external authority (i.e., a human) verify the collection of legal elements. Verification is accomplished by the external authority selecting which elements shall be allowed in the <choice> element
Growth of the collection of elements is tightly determined by the external authority that controls the schema.

Use Method 4 (dangling type) when:

You need a simpleType variable content container
You need to extend a simpleType
You need very dynamic, customizable content

Use Method 3 (abstract type with type substitution) when:

All the elements in the variable content container are of the same type, or derived from the same type
The collection of legal elements is verified at run time. Verification is accomplished by the schema validator verifying the type against the base type
The collection of elements may grow, independent of the container schema.

Best Practice: Method 4 + Method 3. That is, create a schema with a dangling type. Then, in the schemas which implement the dangling type, use an abstract type with type substitution.
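
As a rough sketch of this combination, an implementation schema could define the dangling type as an abstract complexType and derive concrete types from it. Every name below (the namespace URI, sensor_type's name child, thermometer_type, units) is hypothetical and chosen only to continue the weather station example.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.sensor.org"
            xmlns="http://www.sensor.org"
            elementFormDefault="qualified">

    <!-- Implementation of the dangling type, as an abstract base type -->
    <xsd:complexType name="sensor_type" abstract="true">
        <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>

    <!-- A concrete type that instance documents can substitute -->
    <xsd:complexType name="thermometer_type">
        <xsd:complexContent>
            <xsd:extension base="sensor_type">
                <xsd:sequence>
                    <xsd:element name="units" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>
</xsd:schema>
```

An instance document would then select both the implementation schema (through schemaLocation) and the concrete type (through xsi:type, e.g., xsi:type="s:thermometer_type" on the sensor element, with the s prefix bound to the sensor namespace).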

Acknowledgements

This issue has turned out to have many interesting twists and turns. Special thanks to Curt Arnold and Len Bullard for their many excellent inputs. Without their inputs, this document would not be nearly as complete and detailed. Also, thanks to Rick Jelliffe and Jeff Rafter.
