Enhancing Data Interoperability with Ontologies, Canonical Forms, and Include Files

by Roger L. Costello

August 10, 2003

Issue: There are many different units-of-measure. Lengths, speeds, locations, etc. can be expressed in many different ways. Fusing two independent pieces of data that, for example, express the same length but use different units requires the fusing agent to recognize that the data is related via a transformation. How can relationships that involve transformations be declaratively expressed? How can an application process input data that may be expressed in many different ways? As an application interacts with a broader range of trading partners, it can expect increasing diversity of expression. How can the application be designed to effectively process data in different forms, without constant code updates?

Executive Summary

Declaratively expressing relationships between entities is good. Relationship information can be used to enhance application interoperability.
RDF Schema (RDFS) and OWL are ontology languages designed for declaratively expressing relationships.
There are many important relationships that cannot be captured by RDFS or OWL, namely relationships that involve a transformation between entities. Here are some examples:
1. Length: A physical length can be expressed in many different forms - as miles, kilometers, furlongs, etc. Relating one form to another requires a transformation. For example, a distance expressed in miles is related to a distance expressed in kilometers by this transformation: kilometer = mile * 1.609344
2. Location: A location can be expressed in many different forms - as cartesian coordinates, polar coordinates, etc. Relating one form to another requires a transformation.
3. Speed: A physical speed can be expressed in many different forms - as miles per hour, meters per second, etc. Relating one form to another requires a transformation.
4. Rate of Fuel Consumption: The rate at which an object consumes fuel can be expressed in many different forms - as miles per gallon, kilometers per liter, etc. Relating one form to another requires a transformation.
Interoperability will be further enhanced if we can find a good approach for expressing these important transformation relationships.
The approach proposed in this paper is:
1. With an ontology language (OWL) declaratively express the canonical form. For example, here are the canonical forms of the above:
  1. Length: The canonical form of a physical length is a value with Meter units.
  2. Location: The canonical form of a location is a value in cartesian coordinates.
  3. Speed: The canonical form of a speed is a value with Meters/Second units.
  4. Rate of Fuel Consumption: The canonical form of a rate of fuel consumption is a value with Meters/Liter units.
2. As well as declaratively expressing the canonical form, the ontology also identifies the location of an include file.
3. With a programming language create an include file that codes the transformation from non-canonical forms to the canonical form.
4. Applications use the ontology and include file as follows:
  1. Applications include the include file.
  2. Applications receiving data that is not in canonical form invoke a function in the include file. The function returns the data in canonical form. The application processes the canonical form.

Enhancing Interoperability with Ontologies

Ontology languages such as RDF Schema (RDFS) and OWL provide the ability to declaratively express the relationships between entities. For example, you can state "type of" relationships such as:

SLR (Single Lens Reflex) is a type of Camera

You can state synonym relationships such as:

f-stop is synonymous with aperture

Applications can use this relationship information to enhance interoperability. For example, suppose an application is coded to process Camera data, but has not been coded to understand SLR. If the input contains SLR data then the application can dynamically discover, by consulting a Camera ontology, that an SLR is a type of Camera. Thus, the ontology provides the relationship information that is needed to bridge from terms that the application doesn't understand (e.g., SLR) to terms that it does understand (e.g., Camera). In this way ontologies can enhance interoperability. For more information on this topic, see [1].

Lesson Learned: Declaratively expressing relationships is good! It helps applications dynamically understand data. Interoperability is enhanced.

Corollary: Ontology languages are intended for declaratively expressing relationships. Ontologies are good! They promote interoperability. Use ontologies!

Ontologies can't State Relationships which Involve Transformations

Ontologies are not able to state relationships between entities that are related via a transformation.

Example 1: Same Length, Different Units

For example, these two XML fragments represent the same physical length:


    <River id="Yangtze">
        <length units="kilometer">6300</length>
    </River>

    <River id="Yangtze">
        <length units="mile">3914</length>
    </River>

The two lengths are related by this mathematical relationship (transformation):


    kilometer = mile * 1.609344

Neither RDFS nor OWL can express this important relationship.

Example 2: Same Location, Different Coordinate Systems

Here's another example ... these two XML fragments represent the same location:


    <Map id="M1">
        <location>
            <cartesian-coordinate>
                <x units="kilometer">100</x>
                <y units="kilometer">100</y>
            </cartesian-coordinate>
        </location>
    </Map>

    <Map id="M2">
        <location>
            <polar-coordinate> 
                <r units="kilometer">141.421</r>
                <theta units="radian">0.7854</theta>
            </polar-coordinate>
        </location>
    </Map>

The two locations are related by this mathematical formula:


    x = r cos theta
    y = r sin theta

Again, neither RDFS nor OWL can express this relationship.
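
As a quick numerical check of the formula, the following Python sketch converts a polar coordinate to a cartesian coordinate. The function name polar_to_cartesian is illustrative only and does not appear in the ontology or include files; the inputs are r = 141.421 and the angle pi/4 radians (about 0.7854), which corresponds to the cartesian point (100, 100).

```python
import math

def polar_to_cartesian(r, theta):
    """Return (x, y) for a polar coordinate; theta is in radians."""
    # x = r cos theta, y = r sin theta
    return r * math.cos(theta), r * math.sin(theta)

# r is in kilometers, so x and y come out in kilometers as well.
x, y = polar_to_cartesian(141.421, math.pi / 4)
print(round(x, 2), round(y, 2))
```

Both results come out within a meter of 100 kilometers, which is the rounding error introduced by writing r with three decimal places.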

Example 3: Same Speed, Different Units

There are many ways to express the speed of an object - miles per hour, kilometers per hour, meters per second, etc. It would be very useful if an application could recognize that two expressions represent the same speed, despite using different units. For example, these two XML fragments express the same speed, but use different units:


    <Comet id="Hale-Bopp">
        <avg-speed units="miles-per-hour">93951.3</avg-speed>
    </Comet>

    <Comet id="Hale-Bopp">
        <avg-speed units="kilometers-per-second">42.0</avg-speed>
    </Comet>

The two speeds are related by this mathematical formula:


    miles-per-hour = kilometers-per-second * 3600 / 1.609344

Neither RDFS nor OWL can express this relationship.

Example 4: Same Rate of Fuel Consumption, Different Units

There are many ways to express the rate of fuel consumption of an object - miles per gallon, kilometers per liter, etc. It would be very useful if an application could recognize that two expressions represent the same fuel consumption rate, despite using different units. For example, these two XML fragments show the same fuel consumption rate, but use different units:


    <Toyota id="Tercel-95">
        <mileage units="miles-per-gallon">29.0</mileage>
    </Toyota>

    <Toyota id="Tercel-95">
        <mileage units="kilometers-per-liter">12.3</mileage>
    </Toyota>

The two fuel consumption rates are related by this mathematical formula:


    miles-per-gallon = kilometers-per-liter * 3.785 / 1.609344

Neither RDFS nor OWL can express this relationship.
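
With compound units it is easy to apply a factor in the wrong direction, so it helps to run the numbers. The Python sketch below checks the Tercel figures; the function names are illustrative, and it assumes the US liquid gallon (3.785411784 liters) and the international mile (1.609344 kilometers).

```python
KM_PER_MILE = 1.609344               # international mile, exact by definition
LITERS_PER_US_GALLON = 3.785411784   # assumption: US liquid gallon

def mpg_to_km_per_liter(mpg):
    """miles/gallon -> kilometers/liter: more km per mile, fewer gallons per liter."""
    return mpg * KM_PER_MILE / LITERS_PER_US_GALLON

def km_per_liter_to_mpg(kpl):
    """kilometers/liter -> miles/gallon: the inverse of the function above."""
    return kpl * LITERS_PER_US_GALLON / KM_PER_MILE

print(round(mpg_to_km_per_liter(29.0), 1))   # 29.0 mpg expressed in km/l
print(round(km_per_liter_to_mpg(12.3), 1))   # 12.3 km/l expressed in mpg
```

The first value rounds to 12.3 and the second to 28.9, so the two XML fragments agree to within the rounding of the published figures.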

Lesson Learned: There are many important relationships that cannot be expressed using the current ontology languages RDFS and OWL. Thus, the full potential for interoperability cannot be realized with the current form of RDFS and OWL.

Purpose of this Document

As we saw above, relationship information is useful - it can be used to enhance application interoperability. We also saw that RDFS and OWL cannot express many important relationships, such as those that require transformations. It would be beneficial if we could declaratively express these relationships. [For the Map example above, imagine how useful it would be if applications could dynamically recognize that the two Map documents are providing data for the same location, just using different coordinate systems. Very powerful!]

How can relationships involving transformations be expressed? The purpose of this document is to provide a concrete approach to expressing these relationships, and for using these relationships.

State the Relationship Between Each Pair of Units-of-Measure?

Consider the number of different units that can be used to express length - kilometer, mile, meter, inch, centimeter, furlong, etc. Stating the relationship between every possible pair of length units-of-measure would be very complex ("n" units would require on the order of n-squared pairwise relationships). Likewise, consider the number of different coordinate systems - rectangular cartesian, spherical polar, cylindrical polar, etc. Again, the complexity of defining the relationship between every possible pair of coordinate systems is too great. The same is true for speed and rate of fuel consumption. This is not a good approach.

State the Relationship to the Canonical Form

All units-of-measure have an International System of Units (SI) canonical form. For example, the canonical form for a length measure is the meter. For coordinate systems the canonical form is the cartesian coordinate system. For speed the canonical form is meters/second. For rate of fuel consumption the canonical form is meters/liter.

Above we looked at the problem of stating the relationship of every possible pair. A better approach is to simply state the relationship to the canonical form. This reduces the complexity greatly (for "n" units, the complexity is n).
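
The saving can be illustrated with a small Python sketch. The table TO_METERS and the functions to_canonical and convert are hypothetical names, not part of the ontology; each unit records only its factor to the canonical unit (meter), and any pair of units is converted by going through the canonical form.

```python
# One factor per unit: how many meters one unit represents.
TO_METERS = {
    "Meter": 1.0,
    "Kilometer": 1000.0,
    "Mile": 1609.344,
    "Furlong": 201.168,
}

def to_canonical(value, units):
    """Convert a length to the canonical form (meters)."""
    return value * TO_METERS[units]

def convert(value, from_units, to_units):
    """Convert between any two units by way of the canonical form."""
    return to_canonical(value, from_units) / TO_METERS[to_units]

n = len(TO_METERS)
# n factors are enough to serve all n * (n - 1) ordered pairs of units.
print(n, "factors cover", n * (n - 1), "ordered unit pairs")
print(convert(6300, "Kilometer", "Mile"))
```

Adding a new unit means adding one entry to the table, rather than one relationship for every unit already known.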

How do Applications Execute the Transformations?

RDFS and OWL are designed to express equivalence and type-of relationships of objects. They weren't designed to express relationships involving transformations. We would need a rather large extension to these ontology languages to enable them to express mathematical relationships.

Further, even if it were possible to extend the languages to express the mathematical relationships there still remains the problem of executing the transformation. For example, suppose that an ontology declaratively expressed the formula to convert a polar coordinate to cartesian coordinate. And suppose an application received input data in polar coordinates, but needs it in cartesian coordinates. Suppose the ontology provides the conversion formula, and suppose the application dynamically retrieves the formula. The application may not be smart enough to dynamically convert the polar coordinates to cartesian coordinates. (In the general case, the problem for the application is to be able to dynamically convert an arbitrary coordinate system to another by using formulas that are dynamically provided. This is very difficult indeed.)

Lesson Learned: Simply expressing a mathematical formula in an ontology may not be very helpful to applications.

A Better Approach: Use Ontologies, Canonical Forms and Include Files

A better approach is to capitalize on what ontologies are good at, as well as on what programming languages are good at. With a small extension to OWL we can state the canonical form. With a programming language we can code the conversion from non-canonical forms to the canonical form.

The approach is very simple:

1. With an ontology language state the canonical form.
2. With a programming language create code to convert non-canonical forms to the canonical form.

Let's take an example. The SI canonical form for length measures is the meter. For example, this XML instance document is expressing the length of the Yangtze River in the canonical form:


    <River rdf:ID="Yangtze">
        <length>
            <Length>
                <value>6300000</value>
                <units rdf:resource="#Meter"/>
            </Length>
        </length>
    </River>

The canonical form of the Length class is:

a value property with a range of xsd:decimal in canonical form
a units property with a range of Length-Unit-of-Measure in canonical form

Note that the canonical form is defined recursively: the canonical form of Length is the set of its properties, each in its own canonical form.

Expressing the Canonical Form of the Length Class in an Ontology

With a small extension to OWL the canonical form of Length can be expressed:
[Note: owlx = OWL eXtension. The <canonicalForm>, <IncludeFile>, and <location> elements are extensions to OWL.]


    <owl:Class rdf:ID="Length">
        <owlx:canonicalForm>
            <owl:Class>
                <owl:intersectionOf rdf:parseType="Collection">
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="#value"/>
                        <owl:allValuesFrom rdf:resource="http://www.w3.org/2001/XMLSchema#decimal"/>
                    </owl:Restriction>
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="#units"/>
                        <owl:allValuesFrom rdf:resource="#Length-Unit-of-Measure"/>
                    </owl:Restriction>
                </owl:intersectionOf>
            </owl:Class>
            <owlx:IncludeFile>
                <rdf:type rdf:resource="XSLT2.0"/>
                <owlx:location rdf:resource="Length-Include-File.xsl"/>
            </owlx:IncludeFile>
        </owlx:canonicalForm>
    </owl:Class>

This is read as: "The canonical form of the Length class consists of instances that have a 'value' property in canonical decimal form, and a 'units' property in canonical Length-Unit-of-Measure form. An include file for converting Length instances to canonical form may be found in Length-Include-File.xsl, and it is implemented using XSLT2.0"

The canonical form of xsd:decimal is defined by the XML Schema specification.
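
An application could locate the include file by reading the ontology. The following Python sketch shows one way this might be done with the standard library. The function name include_files is hypothetical, the ontology is trimmed to the relevant elements, and the owlx namespace URI is an assumption, since this document does not state it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
OWLX = "http://www.example.org/owlx#"  # assumption: placeholder URI for the OWL extension

# Trimmed copy of the Length class: only the include file information is kept.
ONTOLOGY = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}" xmlns:owlx="{OWLX}">
    <owl:Class rdf:ID="Length">
        <owlx:canonicalForm>
            <owlx:IncludeFile>
                <rdf:type rdf:resource="XSLT2.0"/>
                <owlx:location rdf:resource="Length-Include-File.xsl"/>
            </owlx:IncludeFile>
        </owlx:canonicalForm>
    </owl:Class>
</rdf:RDF>
"""

def include_files(ontology_xml, class_id):
    """Return (language, location) pairs declared for the given class."""
    ns = {"rdf": RDF, "owl": OWL, "owlx": OWLX}
    root = ET.fromstring(ontology_xml)
    found = []
    for cls in root.findall("owl:Class", ns):
        if cls.get(f"{{{RDF}}}ID") != class_id:
            continue
        # A canonicalForm may hold zero or more IncludeFile elements.
        for inc in cls.findall("owlx:canonicalForm/owlx:IncludeFile", ns):
            language = inc.find("rdf:type", ns).get(f"{{{RDF}}}resource")
            location = inc.find("owlx:location", ns).get(f"{{{RDF}}}resource")
            found.append((language, location))
    return found

print(include_files(ONTOLOGY, "Length"))
# prints [('XSLT2.0', 'Length-Include-File.xsl')]
```

Returning a list rather than a single value leaves room for several implementations of the same conversion, one per programming language.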

Expressing the Canonical Form of Length-Unit-of-Measure

The canonical form for Length-Unit-of-Measure is the Meter:


    <owl:Class rdf:ID="Length-Unit-of-Measure">
        <owlx:canonicalForm rdf:resource="#Meter"/>
    </owl:Class>

This class does not have an <IncludeFile> element. A <canonicalForm> element may contain zero or more <IncludeFile> elements. Here the <canonicalForm> element contains no <IncludeFile> element, indicating that there is no function available specifically for converting Length-Unit-of-Measure to canonical form. This makes sense, since this class is never used directly - only its subclasses are used.

Meter, Kilometer, Mile, etc. are all subclasses of Length-Unit-of-Measure:


    <owl:Class rdf:ID="Meter">
        <rdfs:subClassOf rdf:resource="#Length-Unit-of-Measure"/>
    </owl:Class>

    <owl:Class rdf:ID="Kilometer">
        <rdfs:subClassOf rdf:resource="#Length-Unit-of-Measure"/>
    </owl:Class>

    <owl:Class rdf:ID="Mile">
        <rdfs:subClassOf rdf:resource="#Length-Unit-of-Measure"/>
    </owl:Class>

The units property is defined to have any Length-Unit-of-Measure value:


    <owl:ObjectProperty rdf:ID="units">
        <rdfs:range rdf:resource="#Length-Unit-of-Measure"/>
    </owl:ObjectProperty>

Thus, this is a valid instance document:


    <River rdf:ID="Yangtze">
        <length>
            <Length>
                <value>6300</value>
                <units rdf:resource="#Kilometer"/>
            </Length>
        </length>
    </River>

However, as the ontology shows, this Length is not in canonical form: its units are Kilometer, not Meter.

Applications can Process any Length, Regardless of the Units

It is important to note that instance documents are not prohibited from expressing Length in a non-canonical form. Quite the contrary. Diversity is encouraged! Applications are empowered to process any data that they receive, in any form. To achieve this, an include file is created that provides code for converting each non-canonical form into the canonical form.

Length Include File

Here is the XSLT 2.0 include file referenced by the ontology for converting Length data that is not in canonical form:


    <xsl:function name="len:Length" as="element()">
        <xsl:param name="length" as="item()"/>

        <xsl:choose>
            <xsl:when test="$length/len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Kilometer'">
                <Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
                    <value><xsl:value-of select="$length/len:value * 1000"/></value>
                    <units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
                </Length>
            </xsl:when>
            <xsl:when test="$length/len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Mile'">
                <Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
                    <value><xsl:value-of select="$length/len:value * 1609.344"/></value>
                    <units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
                </Length>
            </xsl:when>
            <xsl:when test="$length/len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Furlong'">
                <Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
                    <value><xsl:value-of select="$length/len:value * 201.168"/></value>
                    <units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
                </Length>
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="$length"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

The name of the function is "Length", which is in the Length namespace (bound to the len prefix). The input parameter to the function is a Length node, such as:


    <Length>
        <value>6300</value>
        <units rdf:resource="#Kilometer"/>
    </Length>

If the value of "units" is Kilometer then the function converts the data to canonical form by returning an identical XML fragment, except the content of the <value> element has been multiplied by 1000, and <units> is set to have the value #Meter.

For example, if the function is invoked with the above Length it will return:


    <Length>
        <value>6300000</value>
        <units rdf:resource="#Meter"/>
    </Length>

If the value of "units" is Mile then the function converts the input to canonical form by returning an identical XML fragment, except the content of the <value> element has been multiplied by 1609.344, and <units> is set to have the value #Meter.

A complete version of this function would have code to convert any length unit-of-measure into canonical form.

The above function is written in XSLT 2.0. However, it could also be written in Java, C++, etc. In fact, there could be several implementations, each in a different programming language. For each implementation there will be an <IncludeFile> element in the ontology.
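
To illustrate, here is a minimal Python sketch of the same conversion, using the standard xml.etree.ElementTree module. It is not one of the include files that accompany this document; the names canonical_length and METERS_PER_UNIT are hypothetical. Like the XSLT function, it returns the input unchanged when the units are not recognized (or are already Meter).

```python
import xml.etree.ElementTree as ET

LEN = "http://www.xfront.com/owl/ontologies/Length/#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Factor from each non-canonical unit to the canonical unit (Meter).
METERS_PER_UNIT = {
    LEN + "Kilometer": 1000.0,
    LEN + "Mile": 1609.344,
    LEN + "Furlong": 201.168,
}

def canonical_length(length):
    """Return a Length element in canonical form (Meter units)."""
    units = length.find(f"{{{LEN}}}units").get(f"{{{RDF}}}resource")
    factor = METERS_PER_UNIT.get(units)
    if factor is None:
        # Already canonical, or a unit this sketch does not know about.
        return length
    value = float(length.find(f"{{{LEN}}}value").text)
    result = ET.Element(f"{{{LEN}}}Length")
    ET.SubElement(result, f"{{{LEN}}}value").text = str(value * factor)
    ET.SubElement(result, f"{{{LEN}}}units").set(f"{{{RDF}}}resource", LEN + "Meter")
    return result

yangtze = ET.fromstring(
    f'<Length xmlns="{LEN}" xmlns:rdf="{RDF}">'
    f'<value>6300</value><units rdf:resource="{LEN}Kilometer"/></Length>'
)
converted = canonical_length(yangtze)
print(converted.find(f"{{{LEN}}}value").text)   # prints 6300000.0
```

Keeping the factors in a table rather than in a chain of tests means that supporting a new unit is a one-line change.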

Designing Applications for Interoperability

An application simply needs to include the Length include file. Suppose that an application's "preferred" format is length expressed in Kilometers. The application is coded to process length data in Kilometers. However, the application designer anticipates that, with new trading partners, the input may contain length data in other forms. So, the application is coded to also process length data in the canonical format (Meter).

The application processes input data as follows: it checks the data to determine if it is in the "preferred" format. If so, then it processes the data directly. If not, it invokes the include file function. The function returns the data in canonical form, and the application then processes the canonical version.

This strategy enables applications to process any length data, regardless of the units-of-measure that are used!
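
The dispatch logic can be sketched in a few lines of Python. Everything here is illustrative: to_canonical stands in for the include file function, lengths are reduced to a (value, units) pair, and the returned strings stand in for real processing.

```python
PREFERRED = "Kilometer"
CANONICAL = "Meter"

# Stand-in for the include file: factors from known units to meters.
TO_METERS = {"Kilometer": 1000.0, "Mile": 1609.344, "Furlong": 201.168}

def to_canonical(value, units):
    """Stand-in for the include file function: return the length in meters."""
    return value * TO_METERS[units], CANONICAL

def process(value, units):
    """Process a length in the preferred form directly, anything else via the canonical form."""
    if units == PREFERRED:
        return f"preferred: {value} km"
    if units != CANONICAL:
        # Foreign units: let the include file do the conversion.
        value, units = to_canonical(value, units)
    return f"canonical: {value} m"

print(process(6300, "Kilometer"))
print(process(3914, "Mile"))
print(process(6300000, "Meter"))
```

The application itself only ever contains code for two forms, the preferred one and the canonical one; knowledge of every other unit lives in the include file.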

Sample Application

Here is a very simple (XSLT 2.0) application which directly processes length data that is in the "preferred" Kilometer form. For all other forms it converts the data to the canonical form, and then processes the canonical form:


    <xsl:include href="Length-Include-File.xsl"/>

    <xsl:template match="len:Length[len:units/@rdf:resource!='http://www.xfront.com/owl/ontologies/Length/#Kilometer']
                                   [len:units/@rdf:resource!='http://www.xfront.com/owl/ontologies/Length/#Meter']">
        <xsl:variable name="canonical-Length" select="len:Length(.)"/>
        <xsl:apply-templates select="$canonical-Length"/>
    </xsl:template>

    <xsl:template match="len:Length[len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Kilometer']">
        <xsl:text>The input data is in the preferred Kilometer format</xsl:text>
    </xsl:template>

    <xsl:template match="len:Length[len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Meter']">
        <xsl:text>Either the input data was originally in the canonical format, or it was converted to the canonical format</xsl:text>
    </xsl:template>

Summary of the Approach

These four steps summarize the approach:

1. Create an ontology that declaratively expresses the canonical form, and provide a link to an include file.
2. Using your favorite programming language, create an include file that transforms any non-canonical form into canonical form.
3. Include the include file in your application.
4. Design your application to be able to process the data in the canonical form (your application may also be coded to process data that is in the application's "preferred" form). For any data not in the canonical form, call the include file function to transform it to canonical form.

Conclusion

This document describes a simple, concrete design approach that applications may use today to deal with data that uses different units-of-measure. The benefit of this approach is that it provides a means for applications to process input data that is in a different form than what the application was originally coded for. The application doesn't need to be updated each time a new form is encountered. This enhances interoperability and lowers maintenance costs.

Second Example: Coordinate System Expressed in Different Forms

Step 1: Create an ontology that defines the canonical coordinate system

The canonical form for coordinate systems is the cartesian coordinate system. Here is the OWL ontology that declaratively expresses the canonical form:


    <owl:Class rdf:ID="Coordinate-System">
        <owlx:canonicalForm>
            <owl:Class rdf:about="#Cartesian-Coordinate-System"/>
            <owlx:IncludeFile>
                <rdf:type rdf:resource="XSLT2.0"/>
                <owlx:location rdf:resource="CoordinateSystem-Include-File.xsl"/>
            </owlx:IncludeFile>
        </owlx:canonicalForm>
    </owl:Class>

"The canonical coordinate system is the cartesian coordinate system. An XSLT2.0 function for converting non-canonical forms to canonical form may be found in CoordinateSystem-Include-File.xsl"

Here is the definition of the cartesian coordinate system as well as the polar coordinate system:


    <owl:Class rdf:ID="Cartesian-Coordinate-System">
        <rdfs:subClassOf rdf:resource="#Coordinate-System"/>
    </owl:Class>

    <owl:Class rdf:ID="Polar-Coordinate-System">
        <rdfs:subClassOf rdf:resource="#Coordinate-System"/>
    </owl:Class>

For the complete ontology see the links at the bottom of this document.

Step 2: Create an include file to transform non-canonical forms to the canonical form

Below is an XSLT 2.0 function that converts a Polar Coordinate to the canonical Cartesian Coordinate form:


    <xsl:include href="Length-Include-File.xsl"/>

    <xsl:function name="cs:CoordinateSystem" as="element()">
        <xsl:param name="coordinateSystem" as="item()"/>

        <xsl:choose>
            <xsl:when test="local-name($coordinateSystem)='Polar-Coordinate-System'">
                <Cartesian-Coordinate-System xmlns="http://www.xfront.com/owl/ontologies/CoordinateSystem/#">
                    <xsl:variable name="canonical-r-length" select="len:Length($coordinateSystem/cs:r/len:Length)"/>
                    <xsl:variable name="canonical-theta-angle" select="cs:Angle($coordinateSystem/cs:theta/cs:Angle)"/>
                    <x>
                        <Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
                            <value>
                                <!-- x = r cos theta -->
                                <xsl:value-of select="$canonical-r-length/len:value * exslt:cos($canonical-theta-angle/cs:value)"/>
                            </value>
                            <units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
                        </Length>
                    </x>
                    <y>
                        <Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
                            <value>
                                <!-- y = r sin theta -->
                                <xsl:value-of select="$canonical-r-length/len:value * exslt:sin($canonical-theta-angle/cs:value)"/>
                            </value>
                            <units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
                        </Length>
                    </y>
                </Cartesian-Coordinate-System>
            </xsl:when>
            ...
        </xsl:choose>
    </xsl:function>

Note that this include file reuses the Length include file.
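
The layering can be shown in a short Python sketch: the coordinate conversion first brings r and theta into their own canonical forms, and only then applies the polar-to-cartesian formula. All names are hypothetical, the Degree unit is added purely for illustration, and treating the radian as the canonical form for angles is an assumption that this document does not state.

```python
import math

# Length layer: factors to the canonical unit (Meter).
TO_METERS = {"Meter": 1.0, "Kilometer": 1000.0, "Mile": 1609.344}
# Angle layer. Assumption: the radian is taken as the canonical angle unit.
TO_RADIANS = {"Radian": 1.0, "Degree": math.pi / 180.0}

def canonical_length(value, units):
    """Return the length in meters."""
    return value * TO_METERS[units]

def canonical_angle(value, units):
    """Return the angle in radians."""
    return value * TO_RADIANS[units]

def polar_to_canonical_cartesian(r, r_units, theta, theta_units):
    """Return (x, y) in meters, reusing the length and angle conversions."""
    r_m = canonical_length(r, r_units)
    theta_rad = canonical_angle(theta, theta_units)
    # x = r cos theta, y = r sin theta
    return r_m * math.cos(theta_rad), r_m * math.sin(theta_rad)

x, y = polar_to_canonical_cartesian(141.421, "Kilometer", 45.0, "Degree")
print(round(x), round(y))
```

Because each layer only knows its own units, the coordinate conversion automatically accepts any length unit that the length layer supports.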

Again, the full version of this may be seen by following the link at the bottom of this document.

Step 3: The application includes the "include file"

I have created an XSLT 2.0 application. Here is what I have at the top of my application:


    <xsl:include href="CoordinateSystem-Include-File.xsl"/>

Step 4: Design the application to process any "foreign" data by converting to canonical form

My XSLT application checks to see if the location is not in cartesian coordinates, and calls the include file function to transform to cartesian coordinates:


    <xsl:template match="*[local-name(.) != 'Cartesian-Coordinate-System']">
        <xsl:text>The input data is not in the canonical coordinate system. Converting ...</xsl:text>
        <xsl:variable name="canonical-CoordinateSystem" select="cs:CoordinateSystem(.)"/>
        <xsl:apply-templates select="$canonical-CoordinateSystem"/>
    </xsl:template>

    <xsl:template match="cs:Cartesian-Coordinate-System[(cs:x/len:Length/len:units/@rdf:resource != 'http://www.xfront.com/owl/ontologies/Length/#Meter') or
                                                        (cs:y/len:Length/len:units/@rdf:resource != 'http://www.xfront.com/owl/ontologies/Length/#Meter')]">
        <xsl:text>The input data is in the canonical coordinate system, but the Length of x and/or y is not in canonical form. Converting ...</xsl:text>
        <xsl:variable name="canonicalForm" select="cs:CoordinateSystem(.)"/>
        <xsl:apply-templates select="$canonicalForm"/>
    </xsl:template>

    <xsl:template match="cs:Cartesian-Coordinate-System[(cs:x/len:Length/len:units/@rdf:resource = 'http://www.xfront.com/owl/ontologies/Length/#Meter') and
                                                        (cs:y/len:Length/len:units/@rdf:resource = 'http://www.xfront.com/owl/ontologies/Length/#Meter')]">
        <xsl:text>The input data is in the canonical coordinate system, and it is in canonical form.</xsl:text>
    </xsl:template>

Acknowledgements

A great many people contributed to this work:

Tom Passin
John Cowan
Benja Fallenstein
Manos Batsis
Pete Kirkham
David Carlisle
Jon Hanna
Ken Laskey
Paul Swett
Bob Foster
John DeCarlo
Terry Alford
Kit Lueder
Frank Manola
Jeff Grief
Richard McCullough
Nikki Rogers
Joe Chiusano
Bill de hOra

Thanks everyone!

References

[1] http://www.xfront.com/owl/

Links to the Complete Version of the Above Examples

Example 1: Length Example

The Length Ontology: Length.owl
The Length Include File: Length-Include-File.xsl
The Length Application: Length-application.xsl

Example 2: Coordinate System Example

The Coordinate System Ontology: CoordinateSystem.owl
The Coordinate System Include File: CoordinateSystem-Include-File.xsl
The Coordinate System Application: CoordinateSystem-application.xsl