You are tasked with implementing a system's XML data validation requirements. For some data and deployment requirements, there is only one XML validation language that has the needed capability, so the selection of language is clear. For other requirements, however, there is a choice; the requirement could be implemented by several XML validation languages. How do you decide which language to use? What factors should go into making the decision? Should multiple languages be used, or is it better to implement all the data requirements in one language?
Suppose this XML instance document is representative of the type of data that the system exchanges:
<?xml version="1.0"?>
<Document classification="secret">
  <Para classification="unclassified">
    One if by land; two if by sea.
  </Para>
</Document>
And suppose the system's data requirements are:
The first requirement is a co-constraint and cannot currently be expressed using a grammar-based language. It must be implemented using Schematron. At the bottom of this document is a Schematron implementation of this Security Classification co-constraint.
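To see why this is a co-constraint, it can help to look at the check outside of any schema language: the legality of a Para's classification depends on the value of a different attribute, the Document's classification. The following is only a minimal Python sketch using the standard library, not a replacement for the Schematron schema. The `RANK` table and the `violations` function are names assumed for illustration, and the sketch assumes every classification value present is one of the four legal values.

```python
import xml.etree.ElementTree as ET

# Assumed ordering, least to most sensitive (illustrative only).
RANK = {"unclassified": 0, "confidential": 1, "secret": 2, "top-secret": 3}


def violations(xml_text):
    """Return the classification of each Para that outranks its Document."""
    doc = ET.fromstring(xml_text)
    doc_rank = RANK[doc.get("classification")]
    return [
        para.get("classification")
        for para in doc.iter("Para")
        if RANK[para.get("classification")] > doc_rank
    ]


SAMPLE = """<Document classification="secret">
  <Para classification="unclassified">One if by land; two if by sea.</Para>
</Document>"""

print(violations(SAMPLE))  # prints []
```

The comparison `RANK[para] > doc_rank` relates two values found in different places in the instance, which is the part a grammar-based declaration of a single attribute has no way to state.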
For the next two requirements, however, there are alternative XML validation languages that could be used. Here's how the requirements could be implemented using W3C XML Schema:
<attribute name="classification">
  <simpleType>
    <restriction base="string">
      <enumeration value="top-secret" />
      <enumeration value="secret" />
      <enumeration value="confidential" />
      <enumeration value="unclassified" />
    </restriction>
  </simpleType>
</attribute>
Here's how the requirements could be implemented using RELAX NG:
<attribute name="classification">
  <choice>
    <value>top-secret</value>
    <value>secret</value>
    <value>confidential</value>
    <value>unclassified</value>
  </choice>
</attribute>
Here's how the requirements could be implemented using Schematron:
<sch:pattern name="Classification Values">
  <sch:rule context="*[@classification]">
    <sch:assert test="@classification='top-secret' or
                      @classification='secret' or
                      @classification='confidential' or
                      @classification='unclassified'">
      The value of a classification must be one of top-secret,
      secret, confidential, or unclassified.
    </sch:assert>
  </sch:rule>
</sch:pattern>
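The Schematron rule above uses the context `*[@classification]`, so it applies to every element that carries a classification attribute, whatever the element's name. As a rough illustration of that behavior only, here is a small Python sketch using the standard library; `ALLOWED` and `illegal_values` are hypothetical names, and the sketch is not how a Schematron processor is implemented.

```python
import xml.etree.ElementTree as ET

ALLOWED = {"top-secret", "secret", "confidential", "unclassified"}


def illegal_values(xml_text):
    """Return (tag, value) for every element whose classification is not allowed."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.get("classification"))
        for el in root.iter()  # iter() visits the root element and all descendants
        if "classification" in el.attrib
        and el.get("classification") not in ALLOWED
    ]
```

Like the Schematron rule, this check says nothing about elements that lack the attribute; requiring the attribute to be present is a separate constraint.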
All the implementations seem equally plausible. So how does one decide which language to use? What factors should enter into the decision?
For the example that we have been considering, Schematron must be used at least minimally to implement the Security Classification co-constraint. Suppose that after considering the various factors you decide to implement the second and third data requirements using a grammar-based language. That is, the implementation of the system's data requirements will be divided up across multiple languages.
The Schematron schema could be written to assume that all classification values are legal. However, to be safe, it is good practice to provide a "catch-all" rule that traps any remaining errors, not just illegal classification values. Here's how to implement the Security Classification co-constraint with a catch-all rule (see the last rule):
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:pattern name="Security Classification Policy">
<sch:p>A Para's classification value cannot be more sensitive
than the Document's classification value.</sch:p>
<sch:rule context="Para[@classification='top-secret']">
<sch:assert test="/Document/@classification='top-secret'">
If there is a Para labeled "top-secret" then the Document
must be labeled top-secret
</sch:assert>
</sch:rule>
<sch:rule context="Para[@classification='secret']">
<sch:assert test="(/Document/@classification='top-secret') or
(/Document/@classification='secret')">
If there is a Para labeled "secret" then the Document
must be labeled either secret or top-secret
</sch:assert>
</sch:rule>
<sch:rule context="Para[@classification='confidential']">
<sch:assert test="(/Document/@classification='top-secret') or
(/Document/@classification='secret') or
(/Document/@classification='confidential')">
If there is a Para labeled "confidential" then the Document
must be labeled either confidential, secret or top-secret
</sch:assert>
</sch:rule>
<sch:rule context="Para[@classification='unclassified']">
<sch:assert test="(/Document/@classification='top-secret') or
(/Document/@classification='secret') or
(/Document/@classification='confidential') or
(/Document/@classification='unclassified')">
If there is a Para labeled "unclassified" then the Document
must be labeled either unclassified, confidential, secret or top-secret
</sch:assert>
</sch:rule>
<sch:p>Catch all rule: a valid Para element should fire on one
of the above rules. If for whatever reason none of the above
rules fire then drop into this "catch all" rule.
This rule will be fired if a Para doesn't have a classification
attribute or if it has an illegal classification value.</sch:p>
<sch:rule context="Para">
<sch:assert test="false()">
If there is a Para without a classification or with a classification
label other than top-secret, secret, confidential, or unclassified
then the document is in error
</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
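The catch-all rule works because, within a single Schematron pattern, a node is tested only by the first rule whose context matches it. A Para with a legal classification is claimed by one of the four specific rules; anything else falls through to the final `Para` rule, whose assertion always fails. The following Python sketch mimics that first-match behavior using only the standard library. It is an illustration under assumptions, not a Schematron processor: `LEVELS` and `check_paras` are hypothetical names, and the error messages are paraphrased.

```python
import xml.etree.ElementTree as ET

# Assumed ordering, least to most sensitive (illustrative only).
LEVELS = ["unclassified", "confidential", "secret", "top-secret"]


def check_paras(xml_text):
    """Return one error message per offending Para, mimicking first-match rules."""
    doc = ET.fromstring(xml_text)
    doc_level = doc.get("classification")
    errors = []
    for para in doc.iter("Para"):
        level = para.get("classification")
        if level in LEVELS:
            # A specific rule matches: the Document must be labeled at
            # least as sensitive as this Para.
            allowed = LEVELS[LEVELS.index(level):]
            if doc_level not in allowed:
                errors.append(
                    f"Para labeled {level}: Document must be one of {allowed}"
                )
        else:
            # Catch-all: no specific rule matched, so always report an error.
            errors.append(f"Para has a missing or illegal classification: {level!r}")
    return errors
```

Without the `else` branch, a Para with no classification attribute or with a misspelled value would pass silently, which is the gap the catch-all rule closes when the grammar-based schema is not run first.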
The following people contributed to the creation of this document:
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="Security Classification Policy">
    <sch:p>A Para's classification value cannot be more sensitive
      than the Document's classification value.</sch:p>
    <sch:rule context="Para[@classification='top-secret']">
      <sch:assert test="/Document/@classification='top-secret'">
        If there is a Para labeled "top-secret" then the Document
        must be labeled top-secret
      </sch:assert>
    </sch:rule>
    <sch:rule context="Para[@classification='secret']">
      <sch:assert test="(/Document/@classification='top-secret') or
                        (/Document/@classification='secret')">
        If there is a Para labeled "secret" then the Document
        must be labeled either secret or top-secret
      </sch:assert>
    </sch:rule>
    <sch:rule context="Para[@classification='confidential']">
      <sch:assert test="(/Document/@classification='top-secret') or
                        (/Document/@classification='secret') or
                        (/Document/@classification='confidential')">
        If there is a Para labeled "confidential" then the Document
        must be labeled either confidential, secret or top-secret
      </sch:assert>
    </sch:rule>
    <sch:rule context="Para[@classification='unclassified']">
      <sch:assert test="(/Document/@classification='top-secret') or
                        (/Document/@classification='secret') or
                        (/Document/@classification='confidential') or
                        (/Document/@classification='unclassified')">
        If there is a Para labeled "unclassified" then the Document
        must be labeled either unclassified, confidential, secret or top-secret
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
Last Updated: July 19, 2007