Selecting Language(s) to Implement a System's Data Validation Requirements

Issue

You are tasked with implementing a system's XML data validation requirements. For some data and deployment requirements, there is only one XML validation language that has the needed capability, so the selection of language is clear. For other requirements, however, there is a choice; the requirement could be implemented by several XML validation languages. How do you decide which language to use? What factors should go into making the decision? Should multiple languages be used, or is it better to implement all the data requirements in one language?

Example

Suppose this XML instance document is representative of the type of data that the system exchanges:

        <?xml version="1.0"?>
        <Document classification="secret">
              <Para classification="unclassified">
                   One if by land; two if by sea.
              </Para>
        </Document>
    

And suppose the system's data requirements are:

  1. The <Para> classification value cannot be more sensitive than the <Document> classification value (top-secret is more sensitive than secret, which is more sensitive than confidential, which is more sensitive than unclassified).
  2. The <Document> element must have a classification attribute, whose value is either top-secret, secret, confidential, or unclassified.
  3. The <Para> element must have a classification attribute, whose value is either top-secret, secret, confidential, or unclassified.
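
To make the first requirement concrete, here is a hypothetical instance (not one of the system's real documents) that satisfies requirements 2 and 3 but violates requirement 1, because the <Para> is labeled secret while the enclosing <Document> is only confidential:

        <?xml version="1.0"?>
        <Document classification="confidential">
              <!-- Legal value, but more sensitive than the Document's value -->
              <Para classification="secret">
                   One if by land; two if by sea.
              </Para>
        </Document>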

The first requirement is a co-constraint: the legal values of one attribute depend on the value of another. It cannot currently be expressed using a grammar-based language, so it must be implemented using Schematron. At the bottom of this document is a Schematron implementation of this Security Classification co-constraint.

For the next two requirements, however, there are alternative XML validation languages that could be used. Here's how the requirements could be implemented using W3C XML Schema:

        <attribute name="classification" use="required">
            <simpleType>
                <restriction base="string">
                    <enumeration value="top-secret" />
                    <enumeration value="secret" />
                    <enumeration value="confidential" />
                    <enumeration value="unclassified" />
                </restriction>
            </simpleType>
        </attribute>
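
The fragment above shows only the attribute declaration. As a rough sketch of how it might sit inside a complete schema, the following declares the legal values once and reuses them for both elements. The xs prefix, the type name ClassificationType, and the content model (a <Document> holding one or more text-only <Para> elements) are assumptions made for illustration; they are not taken from the system's actual schema.

        <?xml version="1.0"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

            <!-- Hypothetical named type: the legal values are listed once -->
            <xs:simpleType name="ClassificationType">
                <xs:restriction base="xs:string">
                    <xs:enumeration value="top-secret"/>
                    <xs:enumeration value="secret"/>
                    <xs:enumeration value="confidential"/>
                    <xs:enumeration value="unclassified"/>
                </xs:restriction>
            </xs:simpleType>

            <xs:element name="Document">
                <xs:complexType>
                    <xs:sequence>
                        <!-- Assumed content model: one or more text-only Para elements -->
                        <xs:element name="Para" maxOccurs="unbounded">
                            <xs:complexType>
                                <xs:simpleContent>
                                    <xs:extension base="xs:string">
                                        <!-- Requirement 3 -->
                                        <xs:attribute name="classification"
                                                      type="ClassificationType"
                                                      use="required"/>
                                    </xs:extension>
                                </xs:simpleContent>
                            </xs:complexType>
                        </xs:element>
                    </xs:sequence>
                    <!-- Requirement 2 -->
                    <xs:attribute name="classification"
                                  type="ClassificationType"
                                  use="required"/>
                </xs:complexType>
            </xs:element>

        </xs:schema>

Declaring the type once keeps the two attribute declarations from drifting apart if the list of classification values ever changes.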
    

Here's how the requirements could be implemented using RELAX NG:

        <attribute name="classification">
            <choice>
                <value>top-secret</value>
                <value>secret</value>
                <value>confidential</value>
                <value>unclassified</value>
            </choice>
        </attribute>
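
For context, here is a minimal sketch of a complete RELAX NG grammar built around that attribute pattern. The define name classification and the content model (a <Document> holding one or more text-only <Para> elements) are assumptions for illustration only. In RELAX NG an attribute pattern is required unless it is wrapped in <optional>, so the "must have" part of requirements 2 and 3 needs no extra markup.

        <?xml version="1.0"?>
        <grammar xmlns="http://relaxng.org/ns/structure/1.0">

            <start>
                <element name="Document">
                    <!-- Requirement 2 -->
                    <ref name="classification"/>
                    <!-- Assumed content model: one or more text-only Para elements -->
                    <oneOrMore>
                        <element name="Para">
                            <!-- Requirement 3 -->
                            <ref name="classification"/>
                            <text/>
                        </element>
                    </oneOrMore>
                </element>
            </start>

            <!-- Hypothetical named pattern shared by Document and Para -->
            <define name="classification">
                <attribute name="classification">
                    <choice>
                        <value>top-secret</value>
                        <value>secret</value>
                        <value>confidential</value>
                        <value>unclassified</value>
                    </choice>
                </attribute>
            </define>

        </grammar>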
    

Here's how the requirements could be implemented using Schematron:

        <sch:pattern name="Classification Values"> 

           <sch:rule context="*[@classification]">

              <sch:assert test="@classification='top-secret' or
                                @classification='secret' or
                                @classification='confidential' or
                                @classification='unclassified'">
                  The value of a classification must be one of top-secret,
                  secret, confidential, or unclassified.
              </sch:assert>

           </sch:rule>

         </sch:pattern>
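
Note that the rule above only fires on elements that already carry a classification attribute, so it checks the legal values but not the "must have" part of requirements 2 and 3. A companion pattern along the following lines could cover that gap. This is a sketch, assuming the sch prefix is bound to the same Schematron namespace used elsewhere in this document; the pattern name is made up for illustration.

        <sch:pattern name="Classification Presence">

           <!-- Fires on every Document and Para, with or without the attribute -->
           <sch:rule context="Document | Para">

              <sch:assert test="@classification">
                  A Document or Para element must have a classification
                  attribute.
              </sch:assert>

           </sch:rule>

        </sch:pattern>

Because it lives in its own pattern, this rule is evaluated independently of the value check, so an element is tested by both.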
    

All the implementations seem equally plausible. So how does one decide which language to use? What factors should enter into the decision?

Factors to Take into Consideration

Recommendation

For the example that we have been considering, Schematron must be used, at a minimum, to implement the Security Classification co-constraint. Suppose that after considering the various factors you decide to implement the second and third data requirements using a grammar-based language. That is, the implementation of the system's data requirements will be divided across multiple languages.

The Schematron schema could be written to "assume" that all classification values are legal. However, to be safe, it is good practice to provide a "catch all" rule to catch any errors (not just illegal classification values). Here's how to implement the Security Classification co-constraint with a catch-all rule (see the last rule in the pattern):

Security Classification Implementation with Catch-All Rule

<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:pattern name="Security Classification Policy">

      <sch:p>A Para's classification value cannot be more sensitive
             than the Document's classification value.</sch:p>

      <sch:rule context="Para[@classification='top-secret']">

         <sch:assert test="/Document/@classification='top-secret'">
             If there is a Para labeled "top-secret" then the Document
             must be labeled top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='secret']">

         <sch:assert test="(/Document/@classification='top-secret') or
                           (/Document/@classification='secret')">
             If there is a Para labeled "secret" then the Document
             must be labeled either secret or top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='confidential']">

         <sch:assert test="(/Document/@classification='top-secret') or
                           (/Document/@classification='secret') or
                           (/Document/@classification='confidential')">
             If there is a Para labeled "confidential" then the Document 
             must be labeled either confidential, secret or top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='unclassified']">

         <sch:assert test="(/Document/@classification='top-secret') or
                           (/Document/@classification='secret') or
                           (/Document/@classification='confidential') or
                           (/Document/@classification='unclassified')">
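             If there is a Para labeled "unclassified" then the Document
             must be labeled either unclassified, confidential, secret
             or top-secret
         </sch:assert>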

      </sch:rule>

      <sch:p>Catch all rule: a valid Para element should fire on one
         of the above rules. If for whatever reason none of the above
         rules fire then drop into this "catch all" rule.
         This rule will be fired if a Para doesn't have a classification
         attribute or if it has an illegal classification value.</sch:p>

      <sch:rule context="Para">

         <sch:assert test="false()">
             If there is a Para without a classification or with a classification 
             label other than top-secret, secret, confidential, or unclassified 
             then the document is in error
         </sch:assert>

      </sch:rule>


   </sch:pattern>

</sch:schema>
    

Acknowledgements

The following people contributed to the creation of this document:

Schematron Implementation of the Security Classification Co-Constraint

<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:pattern name="Security Classification Policy">

      <sch:p>A Para's classification value cannot be more sensitive
             than the Document's classification value.</sch:p>

      <sch:rule context="Para[@classification='top-secret']">

         <sch:assert test="/Document/@classification='top-secret'">
             If there is a Para labeled "top-secret" then the Document
             must be labeled top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='secret']">

         <sch:assert test="(/Document/@classification='top-secret') or
                           (/Document/@classification='secret')">
             If there is a Para labeled "secret" then the Document
             must be labeled either secret or top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='confidential']">

         <sch:assert test="(/Document/@classification='top-secret') or
                           (/Document/@classification='secret') or
                           (/Document/@classification='confidential')">
             If there is a Para labeled "confidential" then the Document 
             must be labeled either confidential, secret or top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='unclassified']">

         <sch:assert test="(/Document/@classification='top-secret') or
                           (/Document/@classification='secret') or
                           (/Document/@classification='confidential') or
                           (/Document/@classification='unclassified')">
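             If there is a Para labeled "unclassified" then the Document
             must be labeled either unclassified, confidential, secret
             or top-secret
         </sch:assert>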

      </sch:rule>

   </sch:pattern>

</sch:schema>
    

Last Updated: July 19, 2007