Extending XML Schemas
(A Collectively Developed Set of Schema Design Guidelines)
Issue
What is the best practice for checking instance documents for constraints that
cannot be expressed in XML Schemas?
Tutorial
This document contains an overview of the topic of extending XML Schemas. For
a much more in-depth, hands-on tutorial on extending XML Schemas please see the
Best Practices Homepage.
The tutorial contains fully worked examples, labs, and a (PowerPoint) tutorial.
Introduction
XML Schemas is very powerful. However, it is not "all powerful". There are many
constraints which cannot be expressed with XML Schemas.
Example. Consider this simple instance document:
With XML Schemas we can check the following constraints:
- the Demo (root) element contains a sequence of elements, A followed by B
- the A element contains an integer
- the B element contains an integer
In fact, here's an XML Schema which expresses these constraints:
XML Schemas does not give us the capability to express the following constraint:
- the value of A must be greater than the value of B
So how do we check this constraint? Note that the XML Schema shown above would
accept the above instance document as valid even though it violates the
constraint: the value of A is less than the value of B. Something else is
needed to check this constraint. There are three options.
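To make the gap concrete, here is a minimal Python sketch. The instance document and its values (10 and 20) are hypothetical, chosen only so that A is less than B. The hand-written passes_schema_level_checks function is an illustrative stand-in for a real XML Schema validator, not a replacement for one.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: the values are assumed for illustration.
INSTANCE = "<Demo><A>10</A><B>20</B></Demo>"


def is_integer(text):
    """Return True if text parses as an integer (rough stand-in for xsd:integer)."""
    try:
        int((text or "").strip())
        return True
    except ValueError:
        return False


def passes_schema_level_checks(root):
    """Check only what the schema can express: Demo contains A then B, both integers."""
    children = list(root)
    return (
        root.tag == "Demo"
        and [child.tag for child in children] == ["A", "B"]
        and all(is_integer(child.text) for child in children)
    )


root = ET.fromstring(INSTANCE)
schema_ok = passes_schema_level_checks(root)
a_greater_than_b = int(root.findtext("A")) > int(root.findtext("B"))

print("schema-level checks pass:", schema_ok)
print("A greater than B:", a_greater_than_b)
```

The structural and type checks succeed while the comparison fails, which is the situation the three options are meant to address.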
Three Options for Extending XML Schemas
(1) Supplement with Another Schema Language
There are many other schema languages besides XML Schemas:
Thus, the first option is to use one (or more) of these schema languages to express the additional constraints.
For example, using Schematron you can embed the additional constraints within
the XSD document (within <appinfo> elements). The XSD document shown earlier
has been enhanced (below) with Schematron directives:
Schematron will extract the directives out of the XSD document to create a
Schematron schema. Schematron will then validate the instance document
against the Schematron schema.
The key points to note about using Schematron are:
- The additional constraints are embedded in <appinfo> elements within the
XML Schema document
- The constraints are expressed using <assert> elements
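The extraction step described above can be sketched in Python. This is only an illustration of the idea, not how the Schematron tooling itself is implemented. The schema fragment, the pattern name, and the assertion message are hypothetical, and the sketch assumes the namespace used by early Schematron releases. It collects the embedded assertions but does not evaluate them.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Namespace used by early Schematron releases; ISO Schematron uses a different one.
SCH_NS = "http://www.ascc.net/xml/schematron"

# Hypothetical schema fragment with one embedded Schematron rule.
XSD_TEXT = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sch="http://www.ascc.net/xml/schematron">
  <xsd:element name="Demo">
    <xsd:annotation>
      <xsd:appinfo>
        <sch:pattern name="Compare A and B">
          <sch:rule context="Demo">
            <sch:assert test="A &gt; B">A must be greater than B</sch:assert>
          </sch:rule>
        </sch:pattern>
      </xsd:appinfo>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="A" type="xsd:integer"/>
        <xsd:element name="B" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def extract_assertions(xsd_text):
    """Return (context, test, message) for each assert found inside an appinfo element."""
    root = ET.fromstring(xsd_text)
    found = []
    for appinfo in root.iter(f"{{{XSD_NS}}}appinfo"):
        for rule in appinfo.iter(f"{{{SCH_NS}}}rule"):
            for assertion in rule.findall(f"{{{SCH_NS}}}assert"):
                message = (assertion.text or "").strip()
                found.append((rule.get("context"), assertion.get("test"), message))
    return found


assertions = extract_assertions(XSD_TEXT)
for context, test, message in assertions:
    print(f"context={context!r} test={test!r} message={message!r}")
```

Evaluating the extracted XPath test against an instance document is a separate step that needs an XPath-capable processor.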
(2) Write Code to Express Additional Constraints
The second option is to write code (in Java, Perl, C++, etc.) to check the additional constraints.
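As a minimal sketch of this option, the following Python function checks the "A must be greater than B" constraint. Python is used here only for illustration; the function name, the error messages, and the sample documents are hypothetical, and the code assumes the instance document has already been accepted by a schema validator.

```python
import xml.etree.ElementTree as ET


def check_a_greater_than_b(xml_text):
    """Return a list of error messages; an empty list means the constraint holds."""
    root = ET.fromstring(xml_text)
    try:
        a = int(root.findtext("A", default=""))
        b = int(root.findtext("B", default=""))
    except ValueError:
        return ["A and B must both be present and contain integers"]
    if a > b:
        return []
    return [f"A ({a}) must be greater than B ({b})"]


print(check_a_greater_than_b("<Demo><A>20</A><B>10</B></Demo>"))
print(check_a_greater_than_b("<Demo><A>10</A><B>20</B></Demo>"))
```

The first call prints an empty list because the constraint holds. The second prints a one-item list reporting that A (10) must be greater than B (20).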
(3) Express Additional Constraints with an XSLT/XPath Stylesheet
The third option is to write a stylesheet to check the constraints.
For example, the following stylesheet checks instance documents to see
whether the content of the A element is greater than the content of the B
element:
Upon running this stylesheet on the above XML data the following output is generated:
This is exactly what is desired.
Thus, the methodology for this third option is:
- check as many constraints as you can using XML Schemas
- for all other constraints write a stylesheet to do the checking
If both the schema validator and the XSL processor report success,
then you know that your instance document is valid.
This combination of XML Schemas plus stylesheets provides a
powerful constraint-checking mechanism.
Advantages/Disadvantages of the Three Options
(1) Supplement with Another Schema Language
Advantages
- Collocated Constraints: Above we saw that Schematron directives expressing
additional constraints can be embedded within the XML Schema document.
There is something very appealing about having
all the constraints expressed within one document rather than being dispersed
over multiple documents. [Editor's Note: This ability to collocate constraints within the XSD
document is a feature of Schematron. As far as I know, the other schema languages
do not have this capability.]
- Simplicity: Many of the schema languages were created in reaction to
the complexity and limitations of XML Schemas. Consequently, most of them are
relatively simple to learn and use.
Disadvantages
- Multiple Schema Languages may be Required: Each schema language has its own
capabilities and limitations. Multiple schema languages may be required to express
all the additional constraints.
- Yet Another Vocabulary (YAV): There are many schema languages, each with
its own vocabulary and semantics. How do you find a schema language with the capability
to express your problem's additional constraints? You have to take the time to
learn each of the schema languages. Hopefully, you will find one that supports
expression of your constraints. Although these languages are relatively easy
to learn and use, it still takes time to learn a new vocabulary and semantics.
- Questionable Long Term Support: In most cases the schema languages listed above
were created by a single author. These authors are busy, very bright people.
Someday their interests will move to something else. At that time you may be
left with a product which is no longer supported. [Editor's Note: Schematron is
basically just a few XSLT/XPath stylesheets. Consequently, Schematron will be
supported as long as there are XSL processors. Also, the author of RELAX has
publicly promised to support RELAX for the next five years.]
(2) Write Code to Express Additional Constraints
Advantages
- Full Power of a Programming Language: The advantage of this
option is that with a single programming language you can express all the
additional constraints.
Disadvantages
- Not Leveraging other XML Technologies: There are other XML technologies
that could be used to express the additional constraints in a declarative manner,
without the effort of compiling, linking, and executing code.
(3) Express Additional Constraints with an XSLT/XPath Stylesheet
Advantages
- Application Specific Constraint Checking: Each application can
create its own stylesheet to check constraints that are unique to the application.
We can enhance the schema without touching it!
- Core Technology: XSLT/XPath is a "core technology" which is well
supported, well understood, and extensively documented.
- Expressive Power: XSLT/XPath is a very powerful language. Most, if
not all, of the constraints that you might ever need to express can be
expressed using XSLT/XPath. Thus you don't have to learn multiple
schema languages to express your additional constraints.
- Long Term Support: XSLT/XPath is well supported, and will be
around for a long time.
Disadvantages
- Separate Documents: With this approach you will write your XML Schema
document, then you will write a separate XSLT/XPath document to express
additional constraints. The two documents must be carefully kept
in sync.