Creating Variable Content Container Elements
(A Collectively Developed Set of Schema Design Guidelines)
Issue
What is the Best Practice for implementing a container element that is to be
comprised of variable content?
Introduction
A typical problem when creating an XML Schema is designing a container element
(e.g., Catalogue) that is to be made up of variable content (e.g., Book,
or Magazine, or ...).
Some things to consider:
- Do we allow the elements in the variable content container to come
from disjoint sources, i.e., do we allow the container element to contain
dissimilar, independent, loosely coupled elements?
- How do we design the variable content container so that
the kinds of elements which it may contain can grow over time, i.e.,
how do we design an extensible variable content container?
Example
Throughout this discussion we will consider variable content containers
(e.g., <Catalogue>) which are comprised
of a collection of elements, where each element is variable.
Here's an example of a <Catalogue> container element comprised of two different kinds of
elements:
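The original sample instance did not survive extraction. The following is a minimal reconstruction. It assumes a hypothetical target namespace (http://www.example.org/catalogue) and uses placeholder values. Book carries ISBN and Publisher, and Magazine has no Author, to match the type definitions described under Method 1.

```xml
<?xml version="1.0"?>
<!-- Hypothetical namespace and placeholder data, for illustration only -->
<Catalogue xmlns="http://www.example.org/catalogue">
    <Book>
        <Title>An Example Book</Title>
        <Author>A. Writer</Author>
        <Date>1999</Date>
        <ISBN>0-000-00000-0</ISBN>
        <Publisher>Example Press</Publisher>
    </Book>
    <Magazine>
        <Title>An Example Magazine</Title>
        <Date>2000</Date>
    </Magazine>
</Catalogue>
```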
Below are four methods for implementing variable content containers.
Method 1: Implementing variable content containers using an abstract element and
element substitution
Description:
There are five XML Schema concepts that must be understood for
implementing this method:
- an element can be declared abstract.
- abstract elements cannot be instantiated in instance documents (they are only placeholders).
- in instance documents the abstract element must be substituted by
non-abstract (i.e., concrete) elements which have been declared to be in a substitutionGroup with
the abstract element.
- elements may be declared to be in a substitutionGroup with the abstract element
only if their type is the same as, or derives from, the abstract element's
type.
- the abstract element and all elements in its substitutionGroup must be declared as global
elements.
Implementation:
Declare an abstract element (Publication):
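Here is a sketch of that declaration. It assumes the declaration sits directly under an xsd:schema element whose targetNamespace is also its default namespace, so unprefixed names such as PublicationType resolve there.

```xml
<!-- Global and abstract: Publication itself may never appear in an instance;
     it only marks the spot that Book, Magazine, ... will fill -->
<xsd:element name="Publication" type="PublicationType" abstract="true"/>
```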
Declare a variable content container element (Catalogue) whose content is the
abstract element, using "ref" to point at the abstract element's declaration:
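A sketch of the container declaration, under the same assumption that unprefixed QNames resolve to the schema's target namespace:

```xml
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <!-- ref points at the global abstract Publication declaration;
                 minOccurs defaults to 1, so at least one element is required -->
            <xsd:element ref="Publication" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
```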
Note that maxOccurs="unbounded", so Catalogue may contain a collection (one or more) of Publication elements.
Declare the concrete elements (Book and Magazine) that are to be the contents of the variable content container
and declare them to be in a substitutionGroup with the
abstract element:
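A sketch of the two concrete declarations. It again assumes that unprefixed QNames resolve to the schema's target namespace, and that BookType and MagazineType are defined in the same schema.

```xml
<!-- Both declarations must be global, and their types must derive from PublicationType -->
<xsd:element name="Book" type="BookType" substitutionGroup="Publication"/>
<xsd:element name="Magazine" type="MagazineType" substitutionGroup="Publication"/>
```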
In order for Book and Magazine to substitute for Publication, their types (BookType
and MagazineType) must derive from Publication's type (PublicationType). Here are the type
definitions:
PublicationType - the base type:
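A sketch of the base type. The element names follow the sample Catalogue instance. Making Author optional (minOccurs="0") is an assumption, but it is what later allows MagazineType to drop Author by restriction.

```xml
<xsd:complexType name="PublicationType">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <!-- Optional, so a derived type may remove it by restriction -->
        <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
</xsd:complexType>
```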
BookType - extends PublicationType by adding two new elements, ISBN
and Publisher:
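A sketch of BookType. Extension appends the new particles after the inherited content, so a Book holds Title, Author, and Date, followed by ISBN and Publisher.

```xml
<xsd:complexType name="BookType">
    <xsd:complexContent>
        <xsd:extension base="PublicationType">
            <!-- Appended after the inherited Title, Author, Date -->
            <xsd:sequence>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
```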
MagazineType - restricts PublicationType by striking out the Author
element:
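A sketch of MagazineType. A restriction restates the complete content model. Leaving Author out is legal only because the base type sketched above makes Author optional.

```xml
<xsd:complexType name="MagazineType">
    <xsd:complexContent>
        <xsd:restriction base="PublicationType">
            <!-- Restate the base content model, omitting the optional Author -->
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:gYear"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
```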
Advantages:
- Extensible: This method allows you to extend the set of elements that
may be used in the variable content container element, even if the schema for the variable content container element
is outside your control. For example, suppose that you do not
have permission to modify the above Catalogue schema. Currently,
the Catalogue element can only contain Book and Magazine elements. But suppose that your
application has a hard requirement for CD elements as well.
How can you extend the set of elements that Catalogue may contain, without modifying its schema?
Answer: You can create your own separate schema that declares CD
(with a type, CDType, that extends the PublicationType in the Catalogue schema) and places CD in
the Publication substitutionGroup.
The CD element meets the requirements for being in the variable
content container:
- its type (CDType) derives from the PublicationType, and
- it is a member of the Publication element's substitutionGroup.
Book, Magazine, and CD may now be used within the Catalogue element.
Thus, we see that this method allows us to extend the set of elements that may be used in the
Catalogue element, without modifying its schema. Nice!
- Semantic Cohesion: the elements in the variable content container
all descend from the same type hierarchy (PublicationType). This type hierarchy binds them together,
giving a structural (and, by implication, semantic) coherence to all the elements that may be in the
variable content container.
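The CD extension described under Extensible above might look like the following separate schema. This sketch assumes:
- the Catalogue schema is saved as catalogue.xsd in the hypothetical namespace http://www.example.org/catalogue;
- the new file is a hypothetical catalogue-cd.xsd;
- CDType's extra RecordingCompany element is invented for illustration.

Because the CD schema shares the Catalogue namespace, it uses xsd:include rather than xsd:import. catalogue.xsd itself is not touched.

```xml
<?xml version="1.0"?>
<!-- catalogue-cd.xsd (hypothetical file name) -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/catalogue"
            xmlns="http://www.example.org/catalogue"
            elementFormDefault="qualified">

    <!-- Pull in the unmodified Catalogue schema -->
    <xsd:include schemaLocation="catalogue.xsd"/>

    <!-- CD joins the Publication substitutionGroup... -->
    <xsd:element name="CD" type="CDType" substitutionGroup="Publication"/>

    <!-- ...and its type derives from PublicationType -->
    <xsd:complexType name="CDType">
        <xsd:complexContent>
            <xsd:extension base="PublicationType">
                <xsd:sequence>
                    <xsd:element name="RecordingCompany" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>
</xsd:schema>
```

An instance document whose xsi:schemaLocation points at catalogue-cd.xsd instead of catalogue.xsd can then mix <CD> with <Book> and <Magazine> inside <Catalogue>.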
Disadvantages:
- No Independent Elements: The types of the elements that are to be used in the variable
content container must all descend from the abstract element's type (PublicationType).
Further, the elements must be in a substitutionGroup with the
abstract element. Thus, the variable content container cannot
contain elements whose type does not derive from the abstract
element's type, or that are not in the abstract element's substitutionGroup,
as would typically be the case with independently developed
elements. For example, suppose another schema author creates a
"Newspaper" element with a type that does not descend from
PublicationType.
<Catalogue> would not be able to contain the <Newspaper>
element.
- Limited Structural Variability: Over time a schema will evolve, and
the kinds of elements that may occur in the variable
content container will typically grow. There is no way to know a priori in what direction it
will grow. The new elements may be conceptually related but
structurally vastly different from the original set of elements.
The abstract element's type (e.g., PublicationType) may have been
a good base type for the original set of elements, which were all
structurally related, but it may not be a good base type for the
new elements, which have vastly different structures.
So you are faced with a tradeoff:
- create a simple base type to support lots of different structures
(but then you can make fewer assumptions about the structure of the members), or
- create a rich base type to support strong data type checking
(but then you reduce the ability to add elements with radically different types).
- Nonscalable Processing: Processing a collection of differently named elements
requires a lot of special-case code. For example, consider
a stylesheet that processes each element in <Catalogue> by testing its name,
with one branch for Book, another for Magazine, and so on.
Such a stylesheet suffers from a lack of scalability, i.e., it
breaks as soon as a new element is added.
This argument needs some qualification. If the contents of <Catalogue>
are just elements that substitute for the abstract Publication element, then
each element can be processed uniformly: the stylesheet selects every child
element (select="*") and reads the parts common to PublicationType, such as
Title, without ever testing an element's name.
Such a stylesheet processes each element within Catalogue,
regardless of the element name. Obviously, this is scalable, and it
does not break when a new element is added.
Processing becomes non-scalable when Catalogue contains multiple
abstract elements.
For example, suppose that both Publication and Retailer are abstract elements, and that Catalogue
may contain any number of elements substituting for either one, so that Book and Magazine
elements are interleaved with elements that substitute for Retailer.
If you wish to process just the Publication elements (e.g., Book, Magazine),
then you will need to write special-case code that tests for each of their names. This
is not scalable. Every time a new element is added to the collection of
elements that may substitute for the Publication element, your code will
have to be updated. This is costly.
- No Control over Namespace Exposure: This method requires that the
elements that may be used in the variable content container be in a
substitutionGroup with the abstract element (e.g., Book and Magazine must
be in a substitutionGroup with Publication). A requirement of using substitutionGroup
is that all of the elements involved be declared globally. The namespace of global elements
can never be hidden in instance documents. As a consequence, there is no
way to hide (localize) the namespaces of the elements used in the variable content container.
This violates the Best Practice rule that you should design your schema to be able to
hide or expose namespaces at your discretion (using elementFormDefault as an exposure
switch). (See Hide (Localize) Versus Expose Namespaces.)
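To make the Nonscalable Processing point concrete, here is a sketch of the special-case stylesheet it describes. It covers the case where Catalogue mixes Publication substitutes with Retailer substitutes, and it assumes:
- XSLT 1.0;
- the hypothetical namespace http://www.example.org/catalogue, bound to the prefix c;
- elementFormDefault="qualified", so Title is in that namespace.

The Retailer substitutes are left unnamed because the text does not name them.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:c="http://www.example.org/catalogue">
    <xsl:output method="text"/>

    <xsl:template match="/c:Catalogue">
        <!-- select="*" would also pick up the Retailer substitutes, so each
             Publication substitute must be listed by name. Adding CD (or any
             other new member) means editing this union. -->
        <xsl:for-each select="c:Book | c:Magazine">
            <xsl:value-of select="c:Title"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

When Catalogue holds only Publication substitutes, the union can be replaced by select="*". That is the uniform, scalable version the text describes.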
Method 2: Implementing variable content containers using a <choice> element
Description:
This method is quite straightforward - simply list within a <choice>
element all the elements which can appear in the variable content
container, and embed the <choice> element in the container element.
Implementation:
Declare within a <choice> element all the elements (e.g., Book, Magazine) that may be used in
the variable content container. Embed the <choice>
element within the container element (Catalogue):
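A minimal sketch, assuming BookType and MagazineType are defined elsewhere in the same schema (for example as in Method 1, although with this method they need not share a base type). Placing maxOccurs="unbounded" on the choice lets the selection repeat, so Catalogue holds one or more Book or Magazine elements in any order.

```xml
<xsd:element name="Catalogue">
    <xsd:complexType>
        <!-- Each repetition picks exactly one of the listed alternatives -->
        <xsd:choice maxOccurs="unbounded">
            <xsd:element name="Book" type="BookType"/>
            <xsd:element name="Magazine" type="MagazineType"/>
            <!-- Supporting CD would mean editing this list -->
        </xsd:choice>
    </xsd:complexType>
</xsd:element>
```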
Advantages:
- Independent Elements: The elements in the variable content
container do not need a common
type ancestry. They don't have to be related in any way.
Thus, the variable content container can contain
dissimilar, independent, loosely coupled elements.
Disadvantages:
- Nonextensible: Suppose that the Catalogue schema is outside your control. Currently
the variable content container only supports Book and Magazine. Suppose that you
have a hard requirement for your instance documents to use CD as well as Book and Magazine within Catalogue.
This method requires that the <choice> element in the Catalogue
schema be modified to include the CD element. However, we stipulated that the
Catalogue schema is outside your control, so it cannot be modified. This
method has serious extensibility restrictions!
- No Semantic Coherence: The <choice> element allows you to group together dissimilar
elements. While that has been touted as an advantage, it is really
a double-edged sword. The elements in the variable content container
have no type hierarchy to bind them together, to provide structural
(and, by implication, semantic) coherence
among the elements. Thus, when processing an instance document
you can make no assumptions about the
structure of the elements.
Method 3: Implementing variable content containers using an abstract type and type
substitution
Description:
There are three XML Schema concepts that must be understood for
implementing this method:
- a complexType can be declared abstract.
- an element declared to be of an abstract type cannot have its type
instantiated in instance documents (that is, the element can be instantiated,
but its abstract content may not).
- in instance documents an element with an abstract type must have
its content substituted by content from a non-abstract (concrete) type that
derives from the abstract type, and the element must name that concrete type
with the xsi:type attribute. This is called type substitution.
Implementation:
Define an abstract base type (PublicationType):
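A sketch that reuses the content model assumed for Method 1. The only change is abstract="true".

```xml
<!-- abstract="true": an element may not be validated directly against this type -->
<xsd:complexType name="PublicationType" abstract="true">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
</xsd:complexType>
```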
Declare the container element (Catalogue) to contain an element
(Publication), which is of the abstract type:
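A sketch of the container. Unlike Method 1, Publication here is an ordinary local element and no substitutionGroup is involved. The sketch assumes unprefixed QNames resolve to the schema's target namespace.

```xml
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <!-- Local element of the abstract type -->
            <xsd:element name="Publication" type="PublicationType" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
```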
In instance documents, the content of <Publication> can only be of a
concrete type which derives from PublicationType, such as BookType
or MagazineType (we saw these type definitions in Method 1 above).
With this method, instance documents look different from those of the
two methods above. Namely, <Catalogue> does not contain
variable content. Instead, it always contains the same element
(Publication). However, that element contains variable content:
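In the instance, each <Publication> names its concrete type with xsi:type. This sketch makes the following assumptions:
- the hypothetical namespace http://www.example.org/catalogue and file name catalogue.xsd;
- elementFormDefault="qualified";
- placeholder data.

Because the default namespace is the catalogue namespace, the unprefixed QNames BookType and MagazineType resolve to the types defined there.

```xml
<?xml version="1.0"?>
<Catalogue xmlns="http://www.example.org/catalogue"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.example.org/catalogue catalogue.xsd">
    <!-- Same element name each time; xsi:type selects the content model -->
    <Publication xsi:type="BookType">
        <Title>An Example Book</Title>
        <Author>A. Writer</Author>
        <Date>1999</Date>
        <ISBN>0-000-00000-0</ISBN>
        <Publisher>Example Press</Publisher>
    </Publication>
    <Publication xsi:type="MagazineType">
        <Title>An Example Magazine</Title>
        <Date>2000</Date>
    </Publication>
</Catalogue>
```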
Advantages:
- Extensible: This method has the same extensibility benefits as Method 1. Namely, it allows you to
easily extend the set of elements that may be used in the variable
content container simply by creating new types that derive from the
abstract type. These new types can be defined in a separate, independent
schema. Thus, you are able to extend the schema without modifying
it!
- Minimal Dependencies: This method has fewer dependencies (less coupling) than Method 1.
To extend the collection of elements that may appear in a variable content container using Method 1, you need access to both the abstract element (Publication)
and its type (PublicationType). With Method 3 you only need access to the
abstract type. If we assume that in a typical scenario only the types will
be put in publicly accessible schemas, then Method 3 is the only viable method.
- Scalable Processing: Processing a series of <Publication> elements is scalable.
For example, a stylesheet can process every <Publication> element with a single rule,
whatever concrete type the element carries.
As new types are created (e.g., CDType), no change to the code is needed.
- Semantic Cohesion: the elements in the variable content container
all descend from the same type hierarchy. This type hierarchy binds them together,
giving a structural (and, by implication, semantic) coherence among the elements.
- Control over Namespace Exposure: the variable part of the variable content
container is made up of local element declarations embedded within type definitions
(Publication itself, plus the child elements that each derived type contributes), so none
of them has to be declared globally. Consequently, we can use elementFormDefault to control
exposure of the namespaces of the variable content container elements.
This is consistent with the Best Practice design recommendation we issued
for hide (localize) versus expose namespaces. (See
Hide (Localize) Versus Expose Namespaces.)
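The Scalable Processing point can be sketched as follows. The sketch assumes:
- XSLT 1.0;
- the prefix c bound to the hypothetical catalogue namespace;
- elementFormDefault="qualified", so that Publication and Title are in that namespace.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:c="http://www.example.org/catalogue">
    <xsl:output method="text"/>

    <xsl:template match="/c:Catalogue">
        <!-- Every child is a Publication, whatever its xsi:type, so one
             selection covers BookType, MagazineType, and any future type -->
        <xsl:for-each select="c:Publication">
            <xsl:value-of select="c:Title"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```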
Disadvantages:
- No Independent Elements: Same weakness as with Method 1.
All types must descend from
an abstract type. This requirement prohibits the use of types
that do not descend from the abstract type, as would typically be
the situation when the type is in another, independently developed
schema.
- Limited Structural Variability: Same weakness as with Method 1.
Namely, to facilitate strong type checking you want a rich base type,
but this is in direct conflict with the desire for components
with vastly different structures, which calls for a weak base type.
Method 4: Implementing variable content containers using a dangling type
Motivation:
Thus far our variable content container has held complex content (i.e., child elements).
Suppose that we want to create a variable content container that holds simple content.
None of the previous methods can be used.
We need a method that allows us to create simpleType variable content containers.
There is one key XML Schema concept that must be understood for
implementing this method:
- with an <import> element the schemaLocation attribute is optional
Description:
Let's take an example. Suppose that we desire an element, sensor, which contains
the name of a weather station sensor. For example:
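An illustrative sensor element (the sensor name is a placeholder):

```xml
<sensor>barometer</sensor>
```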
There are several things to note:
- This element holds a simpleType.
- Each weather station may have sensors
that are unique to it. Consequently,
we must design our schema so that each
weather station can customize the sensor
element.
Here's an elegant design for making the contents of <sensor> customizable
by each weather station:
- When you create sensor, declare
it to be of a type from another namespace.
- Then, when you
<import> that namespace, don't provide a schemaLocation.
- Thus, the element is declared to be of a type
for which no particular schema is identified,
i.e., we have a dangling type!
Implementation:
Let's go through the design, step by step. In your schema, declare the sensor element.
Note that the sensor element is declared to have a type "sensor_type",
which is in a different namespace - the sensor namespace:
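Here is a sketch of the declaration. It assumes a hypothetical sensor namespace, http://www.example.org/sensor, bound to the prefix s on the enclosing xsd:schema element.

```xml
<!-- On the xsd:schema start tag: xmlns:s="http://www.example.org/sensor" -->
<xsd:element name="sensor" type="s:sensor_type"/>
```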
Now here's the key - when you <import> this namespace, don't
provide a value for schemaLocation! (In an import element schemaLocation is optional.) For example:
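Using the same hypothetical sensor namespace, the import names only the namespace:

```xml
<!-- namespace is given; schemaLocation is deliberately omitted -->
<xsd:import namespace="http://www.example.org/sensor"/>
```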
The instance document must then identify a schema
that implements sensor_type. Thus, at run time (i.e., validation time)
we are matching up the reference to sensor_type
with an implementation of sensor_type.
For example, an instance document may have this:
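Here is a sketch. It assumes the main schema's target namespace is the hypothetical http://www.example.org/weather and that its document element is named weather-station. The xsi:schemaLocation attribute holds namespace/location pairs, and the second pair supplies the implementation of the sensor namespace.

```xml
<?xml version="1.0"?>
<weather-station xmlns="http://www.example.org/weather"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.example.org/weather weather-station.xsd
                                     http://www.example.org/sensor boston-sensors.xsd">
    <sensor>barometer</sensor>
</weather-station>
```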
In this instance document, xsi:schemaLocation
identifies a schema, boston-sensors.xsd,
that is to provide the implementation of sensor_type.
Let's take a look at the schemas and instance documents for the weather station sensor example we have been considering.
Here's the main schema, which
contains the dangling type:
weather-station.xsd
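A sketch of weather-station.xsd. The namespaces (http://www.example.org/weather and http://www.example.org/sensor) and the weather-station root element are assumptions. The essential parts are the s:sensor_type reference and the xsd:import without schemaLocation.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/weather"
            xmlns="http://www.example.org/weather"
            xmlns:s="http://www.example.org/sensor"
            elementFormDefault="qualified">

    <!-- The dangling type: the namespace is imported, but no schema is named -->
    <xsd:import namespace="http://www.example.org/sensor"/>

    <xsd:element name="weather-station">
        <xsd:complexType>
            <xsd:sequence>
                <!-- Content is whatever s:sensor_type turns out to be at validation time -->
                <xsd:element name="sensor" type="s:sensor_type" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```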
Note that the <import> element does not have a schemaLocation
attribute to identify a particular schema that implements
sensor_type. (Stated differently, this schema does not hardcode the identity
of the schema that is to provide the implementation of sensor_type.) A schema validator that
honors the instance document's xsi:schemaLocation hints resolves the
reference to sensor_type from the collection of schemas
that the instance document identifies.
The Boston weather station creates a schema which implements
sensor_type:
boston-sensors.xsd
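A sketch of boston-sensors.xsd. The particular sensor names are assumptions chosen for illustration. The schema targets the sensor namespace, not the weather namespace.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/sensor"
            xmlns="http://www.example.org/sensor"
            elementFormDefault="qualified">

    <!-- Boston's implementation of the dangling type -->
    <xsd:simpleType name="sensor_type">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="barometer"/>
            <xsd:enumeration value="thermometer"/>
            <xsd:enumeration value="anemometer"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
```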
Now an instance document can conform to weather-station.xsd
and use boston-sensors.xsd as the implementation of sensor_type:
boston-weather-station.xml
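A sketch of boston-weather-station.xml, using the same hypothetical namespaces and sensor names. It assumes a validator that honors xsi:schemaLocation hints, which processors are permitted to ignore.

```xml
<?xml version="1.0"?>
<weather-station xmlns="http://www.example.org/weather"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.example.org/weather weather-station.xsd
                                     http://www.example.org/sensor boston-sensors.xsd">
    <sensor>thermometer</sensor>
    <sensor>barometer</sensor>
    <sensor>anemometer</sensor>
</weather-station>
```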
Suppose that the London weather station has all the sensors
that Boston has, plus some additional ones that are unique to the London
weather patterns. Thus, London will create its own implementation
of sensor_type:
london-sensors.xsd
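A sketch of london-sensors.xsd. It assumes London's sensors are Boston's illustrative set plus hygrometer.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/sensor"
            xmlns="http://www.example.org/sensor"
            elementFormDefault="qualified">

    <!-- London's implementation of the same dangling type -->
    <xsd:simpleType name="sensor_type">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="barometer"/>
            <xsd:enumeration value="thermometer"/>
            <xsd:enumeration value="anemometer"/>
            <!-- Unique to London -->
            <xsd:enumeration value="hygrometer"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
```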
Note that London's sensor_type includes a sensor
that Boston's does not: hygrometer.
Just as with the Boston weather station instance document, the London weather station instance document will conform to
a collection of schemas: weather-station.xsd and london-sensors.xsd:
london-weather-station.xml
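A sketch of london-weather-station.xml, under the same assumptions as the Boston document. Only the second xsi:schemaLocation pair changes.

```xml
<?xml version="1.0"?>
<weather-station xmlns="http://www.example.org/weather"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.example.org/weather weather-station.xsd
                                     http://www.example.org/sensor london-sensors.xsd">
    <sensor>thermometer</sensor>
    <sensor>barometer</sensor>
    <sensor>anemometer</sensor>
    <!-- Valid only because london-sensors.xsd supplies sensor_type -->
    <sensor>hygrometer</sensor>
</weather-station>
```

The same document validated with boston-sensors.xsd in place of london-sensors.xsd would be rejected, because hygrometer is not one of Boston's enumeration values.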
Summary:
This method represents an extraordinarily powerful design pattern. The key to this design pattern is:
1. When you declare the variable content container
element, give it a type that is in another
namespace, e.g., s:sensor_type.
2. When you <import> that namespace, don't provide
a value for schemaLocation.
3. Create any number of implementations of the dangling type, e.g.,
- boston-sensors.xsd
- london-sensors.xsd
4. In instance documents, identify the schema that
you want used to implement the dangling type, e.g., by pairing
the sensor namespace with boston-sensors.xsd in xsi:schemaLocation.
Both simpleType and complexType:
In our examples we have implemented the dangling type as a simpleType.
The implementation of a dangling type does not have
to be a simpleType. A schema could define it as a complexType.
Advantages:
- Dynamic: A schema that contains a dangling type is very dynamic. It does
not statically hard-code the identity of a schema to implement the type. Rather, it
empowers the instance document author to identify a schema that implements the dangling type.
Thus, the type implementation is supplied when the instance document is created,
rather than when the schema document is created.
- Applicable to both Simple and Complex Types: A dangling type can be implemented
as either a simpleType or a complexType. The other methods are only applicable to
creating variable content containers with a complex type.
Disadvantages:
- Different Namespace: The implementation of the dangling type must be in
another namespace. It cannot be in the same namespace as the variable content
container element. If you have a hard requirement that the contents
of your variable content container have the same namespace as the container element
then this method cannot be employed.
Best Practice
Which method you should use to create your variable content containers ultimately
depends on your requirements. Here are some things to consider.
Use Method 2 (<choice> element) when:
- You need to contain a collection of dissimilar, independent elements
- It is adequate to have an external authority (i.e., a human) verify the collection of legal elements.
Verification is accomplished by the external authority selecting which elements shall be
allowed in the <choice> element
- Growth of the collection of elements is tightly determined by the external authority that
controls the schema.
Use Method 4 (dangling type) when:
- You need a simpleType variable content container
- You need to extend a simpleType
- You need very dynamic, customizable content
Use Method 3 (abstract type with
type substitution) when:
- All the elements in the variable content container are of the same type, or derived from the same type
- The collection of legal elements is verified at run time.
Verification is accomplished by the schema validator verifying the type against
the base type
- The collection of elements may grow, independent of the container schema.
Best Practice: Method 4 + Method 3. That is, create a schema with a dangling type.
Then, in the schemas that implement the dangling type, use an abstract type with
type substitution.
Acknowledgements
This issue has turned out to have many interesting twists and turns.
Special thanks to Curt Arnold and Len Bullard for their many excellent inputs. Without their
inputs, this document would not be nearly as complete and detailed. Also,
thanks to Rick Jelliffe and Jeff Rafter.