Enhancing Data Interoperability with Ontologies, Canonical Forms, and Include Files
by Roger L. Costello
August 10, 2003
Issue: There are many different units-of-measure. Lengths,
speeds, locations, etc. can be expressed in many different ways.
Fusing two independent pieces of data that, for example, express the same length but use different units
requires the fusing agent to recognize that
the data is related via a transformation. How can relationships
which involve transformations be declaratively expressed? How can an application
process input data that may be expressed in many different ways? As an application
interacts with a broader range of trading partners, it can expect increasing diversity of expression.
How can the application be designed to effectively process
data in different forms, without constant code updates?
Executive Summary
- Declaratively expressing relationships between entities is good. Relationship information
can be used to enhance application interoperability.
- RDF Schema (RDFS) and OWL are ontology languages designed for declaratively expressing relationships.
- There are many important relationships that cannot be captured by RDFS or OWL, namely relationships
that involve a transformation between entities. Here are some examples:
- Length: A physical length can be expressed in many different forms - as miles, kilometers, furlongs, etc.
Relating one form to another requires a transformation. For example, a distance expressed in
miles is related to a distance expressed in kilometers by this transformation:
kilometer = mile * 1.609344
- Location: A location can be expressed in many different forms - as cartesian coordinates, polar coordinates, etc.
Relating one form to another requires a transformation.
- Speed: A physical speed can be expressed in many different forms - as miles per hour, meters per second, etc.
Relating one form to another requires a transformation.
- Rate of Fuel Consumption: The rate at which an object consumes fuel can be expressed in many
different forms - as miles per gallon, kilometers per liter, etc.
Relating one form to another requires a transformation.
- Interoperability will be further enhanced if we can find a good approach for expressing these important
transformation relationships.
- The approach proposed in this paper is:
- With an ontology language (OWL) declaratively express the canonical form. For example, here
are the canonical forms of the above:
- Length: The canonical form of a physical length is a value with Meter units.
- Location: The canonical form of a location is a value in cartesian coordinates.
- Speed: The canonical form of a speed is a value with Meters/Second units.
- Rate of Fuel Consumption: The canonical form of a rate of fuel consumption is a value with Meters/Liter units.
- As well as declaratively expressing the canonical form, the ontology also identifies the location of an include file.
- With a programming language create an include file that codes the transformation from non-canonical
forms to the canonical form.
- Applications use the ontology and include file as follows:
- Applications include the include file.
- An application receiving data that is not in canonical form invokes a function in the include file. The function
returns the data in canonical form. The application processes the canonical form.
- Applications are required to understand only one form - the canonical form. (They may understand other forms,
but are not required.)
- Applications can interoperate with trading partners regardless of the form their data is expressed in!
Enhancing Interoperability with Ontologies
Ontology languages such as RDF Schema (RDFS) and OWL provide the ability
to declaratively express the relationships between entities. For example,
you can state "type of" relationships such as:
- SLR (Single Lens Reflex) is a type of Camera
You can state synonym relationships such as:
- focal-length is synonymous with aperture
Applications can use this relationship information to enhance interoperability.
For example, suppose an application is coded to process Camera data, but has
not been coded to understand SLR. If the input contains SLR data then the
application can dynamically discover, by consulting a Camera ontology,
that an SLR is a type of Camera. Thus, the ontology provides the relationship
information that is needed to make the bridge between terms that the application
doesn't understand (e.g., SLR) to terms that it does understand (e.g.,
Camera). In this way ontologies can enhance interoperability. For more information
on this topic, see [1].
Lesson Learned: Declaratively expressing relationships is good! It helps
applications dynamically understand data. Interoperability is enhanced.
Corollary: Ontology languages are intended for declaratively expressing relationships.
Ontologies are good! They promote interoperability. Use ontologies!
Ontologies can't State Relationships which Involve Transformations
Ontologies are not able to state relationships between entities that are
related via a transformation.
Example 1: Same Length, Different Units
For example, these two XML fragments represent
the same physical length:
<River id="Yangtze">
<length units="kilometer">6300</length>
</River>
<River id="Yangtze">
<length units="mile">3914</length>
</River>
The two lengths are related by this mathematical relationship (transformation):
kilometer = mile * 1.609344
Neither RDFS nor OWL can express this important relationship.
Example 2: Same Location, Different Coordinate Systems
Here's another example ... these two XML fragments represent the same location:
<Map id="M1">
<location>
<cartesian-coordinate>
<x units="kilometer">100</x>
<y units="kilometer">100</y>
</cartesian-coordinate>
</location>
</Map>
<Map id="M2">
<location>
<polar-coordinate>
<r units="kilometer">141.421</r>
<theta units="radian">0.7854</theta>
</polar-coordinate>
</location>
</Map>
The two locations are related by this mathematical formula:
x = r cos theta
y = r sin theta
Again, neither RDFS nor OWL can express this relationship.
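As a quick check of the numbers in Example 2, here is a small Python sketch that converts the cartesian point (100, 100) to polar form and back again. Python is used purely for illustration, and the variable names are assumptions rather than part of the original examples. For this point theta is pi/4, about 0.7854 radians.

```python
import math

# Cartesian coordinates of the location in Map M1 (kilometers).
x, y = 100.0, 100.0

# Cartesian -> polar: r is the distance from the origin, theta the angle in radians.
r = math.hypot(x, y)
theta = math.atan2(y, x)
print(round(r, 3), round(theta, 4))   # prints: 141.421 0.7854

# Polar -> cartesian, using x = r cos theta and y = r sin theta.
x_back = r * math.cos(theta)
y_back = r * math.sin(theta)
print(round(x_back, 6), round(y_back, 6))
```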
Example 3: Same Speed, Different Units
There are many ways to express the speed of an object - miles per hour,
kilometers per hour, meters per second, etc. It would be very useful
if an application could recognize that two expressions represent the same speed,
despite using different units. For example, these two XML fragments
express the same speed, but use different units:
<Comet id="Hale-Bopp">
<avg-speed units="miles-per-hour">93951.3</avg-speed>
</Comet>
<Comet id="Hale-Bopp">
<avg-speed units="kilometers-per-second">42.0</avg-speed>
</Comet>
The two speeds are related by this mathematical formula:
miles-per-hour = kilometers-per-sec * 3600 / 1.609344
Neither RDFS nor OWL can express this relationship.
Example 4: Same Rate of Fuel Consumption, Different Units
There are many ways to express the rate of fuel consumption of an object - miles per gallon,
kilometers per liter, etc. It would be very useful
if an application could recognize that two expressions represent the same fuel consumption rate,
despite using different units. For example, these two XML fragments
show the same fuel consumption rate, but use different units:
<Toyota id="Tercel-95">
<mileage units="miles-per-gallon">29.0</mileage>
</Toyota>
<Toyota id="Tercel-95">
<mileage units="kilometers-per-liter">12.3</mileage>
</Toyota>
The two fuel consumption rates are related by this mathematical formula:
miles-per-gallon = kilometers-per-liter * 3.785 / 1.609344
Neither RDFS nor OWL can express this relationship.
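The compound conversions in Examples 3 and 4 can be derived from three base factors. The Python sketch below is illustrative only. It assumes the international mile (1.609344 kilometers) and the US liquid gallon (3.785411784 liters), and the function names are hypothetical.

```python
KM_PER_MILE = 1.609344               # international mile, exact by definition
SECONDS_PER_HOUR = 3600
LITERS_PER_US_GALLON = 3.785411784   # US liquid gallon (assumed for this sketch)

def km_per_sec_to_miles_per_hour(kps):
    # km/s -> km/h (multiply by 3600), then km -> miles (divide by 1.609344)
    return kps * SECONDS_PER_HOUR / KM_PER_MILE

def km_per_liter_to_miles_per_gallon(kpl):
    # km/l -> km/gal (multiply by liters per gallon), then km -> miles
    return kpl * LITERS_PER_US_GALLON / KM_PER_MILE

mph = km_per_sec_to_miles_per_hour(42.0)
mpg = km_per_liter_to_miles_per_gallon(12.3)
print(round(mph, 1))
print(round(mpg, 1))
```

With these factors, 42.0 kilometers per second comes out at roughly 93951.3 miles per hour, and 12.3 kilometers per liter at roughly 28.9 miles per gallon, which agrees with the 29.0 in the example to within rounding.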
Lesson Learned: There are many important relationships that cannot
be expressed using the current ontology languages RDFS and OWL. Thus, the
full potential for interoperability cannot be realized with the current form of RDFS and OWL.
Purpose of this Document
As we saw above, relationship information is useful - it can be used to enhance application
interoperability. We also saw that RDFS and OWL cannot express many important
relationships, such as those that require transformations.
It would be beneficial if we could declaratively express these
relationships. [For the Map example above, imagine how useful
it would be if applications could dynamically recognize that the two
Map documents are providing data for the same location, just using different
coordinate systems. Very powerful!]
How can relationships involving transformations be expressed?
The purpose of this document is to provide a concrete approach to expressing
these relationships, and for using these relationships.
State the Relationship Between Each Pair of Units-of-Measure?
Consider the number of different units that can be used to express length -
kilometer, mile, meter, inches, centimeter, furlong, etc. Stating the relationship
between every possible pair of length units-of-measure would be very complex ("n" units would require n(n-1)/2
pairs, which grows as n-squared). Likewise, consider the number of different coordinate systems -
rectangular cartesian, spherical polar, cylindrical polar, etc. Again, the
complexity of defining the relationship between every possible pair of coordinate
systems is too great. The same is true for speed and rate of fuel consumption.
This is not a good approach.
State the Relationship to the Canonical Form
All units-of-measure have an International System of Units (SI) canonical form. For example,
the canonical form for a length measure is meter. For coordinate systems
the canonical form is the cartesian coordinate system. For speed the canonical form
is meters/sec. For rate of fuel consumption the canonical form is meters/liter.
Above we looked at the problem of stating the relationship of every possible
pair. A better approach is to simply state the relationship to the canonical form.
This reduces the complexity greatly (for "n" units, the complexity is n).
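The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the table of units and the function names are assumptions, not part of the ontology described in this document. One factor per unit, relating it to the canonical meter, is enough to reach the canonical form from any unit.

```python
# One entry per unit: how many meters (the canonical unit) one of that unit is.
METERS_PER_UNIT = {
    "meter": 1.0,
    "kilometer": 1000.0,
    "mile": 1609.344,
    "furlong": 201.168,
    "inch": 0.0254,
    "centimeter": 0.01,
}

def to_canonical(value, unit):
    """Convert a length in any known unit to meters."""
    return value * METERS_PER_UNIT[unit]

def convert(value, from_unit, to_unit):
    """Any-to-any conversion, obtained by passing through the canonical form."""
    return to_canonical(value, from_unit) / METERS_PER_UNIT[to_unit]

n = len(METERS_PER_UNIT)
pairwise_converters = n * (n - 1)   # one converter per ordered pair of distinct units
canonical_converters = n - 1        # one converter per non-canonical unit

print(pairwise_converters, canonical_converters)   # prints: 30 5
print(convert(1, "mile", "kilometer"))
```

Adding a seventh unit to this table means adding one line, whereas the pairwise approach would need twelve new converters.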
How do Applications Execute the Transformations?
RDFS and OWL are designed to express equivalence and type-of relationships
of objects. They weren't designed to express relationships involving transformations. We would
need a rather large extension to these ontology languages to enable them to
express mathematical relationships.
Further, even if it were possible to extend
the languages to express the mathematical relationships there still remains the
problem of executing the transformation. For example, suppose that an ontology
declaratively expressed the formula to convert a polar coordinate to cartesian coordinate.
And suppose an application received input data in polar coordinates, but needs it
in cartesian coordinates. Suppose the ontology provides the conversion formula,
and suppose the application dynamically retrieves the formula. The
application may not be smart enough to dynamically convert the polar coordinates to
cartesian coordinates. (In the general case, the problem for the application is to be
able to dynamically convert an arbitrary coordinate system to another by using
formulas that are dynamically provided. This is very difficult indeed.)
Lesson Learned: Simply expressing a mathematical formula in an ontology
may not be very helpful to applications.
A Better Approach: Use Ontologies, Canonical Forms and Include Files
A better approach is to capitalize on what ontologies are good at, as well as on
what programming languages are good at. With a small extension to OWL we
can state the canonical form. With a programming language we can code the
conversion from non-canonical forms to the canonical form.
The approach is very simple:
- With an ontology language state the canonical form.
- With a programming language create code to convert non-canonical forms to the canonical form.
Let's take an example. The SI canonical form for length measures is the meter.
For example, this XML instance document is expressing the length of the Yangtze
River in the canonical form:
<River rdf:ID="Yangtze">
<length>
<Length>
<value>6300000</value>
<units rdf:resource="#Meter"/>
</Length>
</length>
</River>
The canonical form of the Length class is:
- a value property with a range of xsd:decimal in canonical form
- a units property with a range of len:Units-of-Measure in canonical form
Note that the canonical form is defined recursively: the canonical form of Length is the set of
properties in their canonical form.
Expressing the Canonical Form of the Length Class in an Ontology
With a small extension to OWL the canonical form of Length can be expressed:
[Note: owlx = OWL eXtension. The <canonicalForm>, <IncludeFile>, and <location> elements
are extensions to OWL.]
<owl:Class rdf:ID="Length">
<owlx:canonicalForm>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Restriction>
<owl:onProperty rdf:resource="#value"/>
<owl:allValuesFrom rdf:resource="http://www.w3.org/2001/XMLSchema#decimal"/>
</owl:Restriction>
<owl:Restriction>
<owl:onProperty rdf:resource="#units"/>
<owl:allValuesFrom rdf:resource="#Length-Unit-of-Measure"/>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
<owlx:IncludeFile>
<rdf:type rdf:resource="XSLT2.0"/>
<owlx:location rdf:resource="Length-Include-File.xsl"/>
</owlx:IncludeFile>
</owlx:canonicalForm>
</owl:Class>
This is read as: "The canonical form of the Length class is the set of instances
that have a 'value' property in canonical decimal form, and a 'units' property
in canonical Length-Unit-of-Measure form. An include file for converting Length
instances to canonical form may be found in Length-Include-File.xsl, and it is
implemented using XSLT 2.0."
The canonical form of xsd:decimal is defined by the XML Schema specification.
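To illustrate how an application might discover the include file from the ontology, here is a hedged Python sketch using the standard library's ElementTree. The owlx namespace URI below is a placeholder, since this document does not specify one, and the function name include_files is hypothetical. The embedded ontology is a trimmed copy of the Length class above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
OWLX = "http://example.org/owlx#"   # placeholder: the real extension namespace is not given

ONTOLOGY = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}" xmlns:owlx="{OWLX}">
  <owl:Class rdf:ID="Length">
    <owlx:canonicalForm>
      <owlx:IncludeFile>
        <rdf:type rdf:resource="XSLT2.0"/>
        <owlx:location rdf:resource="Length-Include-File.xsl"/>
      </owlx:IncludeFile>
    </owlx:canonicalForm>
  </owl:Class>
</rdf:RDF>"""

def include_files(ontology_xml, class_id):
    """Return (implementation type, location) pairs listed for the given class."""
    root = ET.fromstring(ontology_xml)
    found = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        if cls.get(f"{{{RDF}}}ID") != class_id:
            continue   # skip other classes, including anonymous nested ones
        for inc in cls.iter(f"{{{OWLX}}}IncludeFile"):
            kind = inc.find(f"{{{RDF}}}type").get(f"{{{RDF}}}resource")
            location = inc.find(f"{{{OWLX}}}location").get(f"{{{RDF}}}resource")
            found.append((kind, location))
    return found

print(include_files(ONTOLOGY, "Length"))
```

An application could use the returned type to pick the implementation that matches its own programming language.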
Expressing the Canonical Form of Length-Unit-of-Measure
The canonical form for Length-Unit-of-Measure is the Meter:
<owl:Class rdf:ID="Length-Unit-of-Measure">
<owlx:canonicalForm rdf:resource="#Meter"/>
</owl:Class>
A <canonicalForm> element may contain zero or more <IncludeFile> elements.
The <canonicalForm> element of the Length-Unit-of-Measure class does not have an <IncludeFile> element, indicating that there
is no function available specifically for converting Length-Unit-of-Measure to
canonical form. This makes sense, since this class is never used directly - only its
subclasses are used.
Meter, Kilometer, Mile, etc. are all subclasses of Length-Unit-of-Measure:
<owl:Class rdf:ID="Meter">
<rdfs:subClassOf rdf:resource="#Length-Unit-of-Measure"/>
</owl:Class>
<owl:Class rdf:ID="Kilometer">
<rdfs:subClassOf rdf:resource="#Length-Unit-of-Measure"/>
</owl:Class>
<owl:Class rdf:ID="Mile">
<rdfs:subClassOf rdf:resource="#Length-Unit-of-Measure"/>
</owl:Class>
The units property is defined to have any Length-Unit-of-Measure value:
<owl:ObjectProperty rdf:ID="units">
<rdfs:range rdf:resource="#Length-Unit-of-Measure"/>
</owl:ObjectProperty>
Thus, this is a valid instance document:
<River rdf:ID="Yangtze">
<length>
<Length>
<value>6300</value>
<units rdf:resource="#Kilometer"/>
</Length>
</length>
</River>
However, as the ontology shows, this Length is not in canonical form: its units are Kilometer rather than Meter.
Applications can Process any Length, Regardless of the Units
It is important to note that
instance documents are not prohibited from expressing Length in a non-canonical
form. Quite the contrary. Diversity is encouraged! Applications are empowered to process
any data that they receive, in any form. To achieve this,
an include file is created that provides code for converting
each non-canonical form into a canonical form.
Length Include File
Here is the XSLT 2.0 include file referenced by the ontology for converting Length data that is not in
canonical form:
<xsl:function name="len:Length" as="element()">
<xsl:param name="length" as="item()"/>
<xsl:choose>
<xsl:when test="$length/len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Kilometer'">
<Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
<value><xsl:value-of select="$length/len:value * 1000"/></value>
<units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
</Length>
</xsl:when>
<xsl:when test="$length/len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Mile'">
<Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
<value><xsl:value-of select="$length/len:value * 1609.344"/></value>
<units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
</Length>
</xsl:when>
<xsl:when test="$length/len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Furlong'">
<Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
<value><xsl:value-of select="$length/len:value * 201.168"/></value>
<units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
</Length>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$length"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
The name of the function is "Length", which is in the Length namespace (referenced by the len prefix).
The input parameter to the function is a Length node, such as:
<Length>
<value>6300</value>
<units rdf:resource="#Kilometer"/>
</Length>
If the value of "units" is Kilometer then the function converts the data
to canonical form by returning an identical
XML fragment, except the content of the <value> element has been multiplied by 1000, and <units> is
set to have the value #Meter.
For example, if the function is invoked with the above Length it will return:
<Length>
<value>6300000</value>
<units rdf:resource="#Meter"/>
</Length>
If the value of "units" is Mile then the function converts the input
to canonical form by returning an identical
XML fragment, except the content of the <value> element has been multiplied by 1609.344, and <units> is
set to have the value #Meter.
A complete version of this function would have code to convert any length unit-of-measure
into canonical form.
The above function is written in XSLT 2.0. However, it could also be written
in Java, C++, etc. In fact, there could be several implementations, each in
a different programming language. For each implementation there will be an <IncludeFile> element
in the ontology.
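As a rough idea of what an implementation in another language might look like, here is a Python sketch that mirrors the logic of the XSLT function above. It is not part of the published include files. The Length dataclass and the name canonical_length are assumptions made for this illustration, and a length is modeled as a simple value/units pair rather than as an XML node.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Length:
    value: float
    units: str   # local name of the unit class, e.g. "Kilometer", "Mile", "Meter"

# Meters per unit for each non-canonical unit handled by the XSLT function.
_METERS_PER_UNIT = {
    "Kilometer": 1000.0,
    "Mile": 1609.344,
    "Furlong": 201.168,
}

def canonical_length(length):
    """Return the length in canonical form (Meter units)."""
    factor = _METERS_PER_UNIT.get(length.units)
    if factor is None:
        # Mirrors xsl:otherwise: unknown or already-canonical input is returned unchanged.
        return length
    return Length(length.value * factor, "Meter")

print(canonical_length(Length(6300, "Kilometer")))
```

As in the XSLT version, supporting a new unit means adding one entry to the table rather than changing the applications that call the function.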
Designing Applications for Interoperability
An application simply needs to include the Length include file. Suppose that an application's "preferred"
format is length expressed in Kilometers. The application
is coded to process length data in Kilometers. However, the application designer
anticipates that, with new trading partners, the input may contain length data in other forms. So, the application
is coded to also process length data in the canonical format (Meter).
The application processes input data as follows: it checks the data to determine if it is in the "preferred"
format. If so, then it processes the data directly. If not, it invokes the
include file function. The function returns the data in canonical form, and the application
then processes the canonical version.
This strategy enables applications
to process any length data, regardless of the units-of-measure that are used!
Sample Application
Here is a very simple (XSLT 2.0) application which directly processes length data that is in the
"preferred" Kilometer form. For all other forms it converts the data to
the canonical form, and then processes the canonical form:
<xsl:include href="Length-Include-File.xsl"/>
<xsl:template match="len:Length[len:units/@rdf:resource!='http://www.xfront.com/owl/ontologies/Length/#Kilometer']
[len:units/@rdf:resource!='http://www.xfront.com/owl/ontologies/Length/#Meter']">
<xsl:variable name="canonical-Length" select="len:Length(.)"/>
<xsl:apply-templates select="$canonical-Length"/>
</xsl:template>
<xsl:template match="len:Length[len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Kilometer']">
<xsl:text>The input data is in the preferred Kilometer format</xsl:text>
</xsl:template>
<xsl:template match="len:Length[len:units/@rdf:resource='http://www.xfront.com/owl/ontologies/Length/#Meter']">
<xsl:text>Either the input data was originally in the canonical format, or it was converted to the canonical format</xsl:text>
</xsl:template>
Summary of the Approach
These four steps summarize the approach:
- Create an ontology that declaratively expresses the canonical form, and provide
a link to an include file.
- Using your favorite programming language, create an include file that transforms any non-canonical
form into canonical form.
- Include the include file in your application.
- Design your application to be able to process the
data in the canonical form (your application may also be coded to process data that is
in the application's "preferred" form). For any data not in the canonical
form call the include file function to transform it to canonical form.
Conclusion
This document describes a simple, concrete design approach that applications may use
today to deal with data that uses different units-of-measure. The benefit of this approach is that it provides a means for applications to
process input data that is in a different form than what the application was originally
coded for. The application doesn't need to be updated each time a new form is encountered.
This enhances interoperability and lowers maintenance costs.
Second Example: Coordinate System Expressed in Different Forms
Step 1: Create an ontology that defines the canonical coordinate system
The canonical form for coordinate systems is the cartesian coordinate system. Here is the
OWL ontology that declaratively expresses the canonical form:
<owl:Class rdf:ID="Coordinate-System">
<owlx:canonicalForm>
<rdfs:Class rdf:resource="#Cartesian-Coordinate-System"/>
<owlx:IncludeFile>
<rdf:type rdf:resource="XSLT2.0"/>
<owlx:location rdf:resource="CoordinateSystem-Include-File.xsl"/>
</owlx:IncludeFile>
</owlx:canonicalForm>
</owl:Class>
This is read as: "The canonical coordinate system is the cartesian coordinate system. An
XSLT 2.0 function for converting non-canonical forms to canonical form may be found in
CoordinateSystem-Include-File.xsl."
Here is the definition of the cartesian coordinate system as well as the polar
coordinate system:
<owl:Class rdf:ID="Cartesian-Coordinate-System">
<rdfs:subClassOf rdf:resource="#Coordinate-System"/>
</owl:Class>
<owl:Class rdf:ID="Polar-Coordinate-System">
<rdfs:subClassOf rdf:resource="#Coordinate-System"/>
</owl:Class>
For the complete ontology see the links at the bottom of this document.
Step 2: Create an include file to transform non-canonical forms to the canonical form
Below is an XSLT 2.0 function that converts a Polar Coordinate to the canonical Cartesian Coordinate form:
<xsl:include href="Length-Include-File.xsl"/>
<xsl:function name="cs:CoordinateSystem" as="element()">
<xsl:param name="coordinateSystem" as="item()"/>
<xsl:choose>
<xsl:when test="local-name($coordinateSystem)='Polar-Coordinate-System'">
<Cartesian-Coordinate-System xmlns="http://www.xfront.com/owl/ontologies/CoordinateSystem/#">
<xsl:variable name="canonical-r-length" select="len:Length($coordinateSystem/cs:r/len:Length)"/>
<xsl:variable name="canonical-theta-angle" select="cs:Angle($coordinateSystem/cs:theta/cs:Angle)"/>
<x>
<Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
<value>
<!-- x = r cos theta -->
<xsl:value-of select="$canonical-r-length/len:value * exslt:cos($canonical-theta-angle/cs:value)"/>
</value>
<units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
</Length>
</x>
<y>
<Length xmlns="http://www.xfront.com/owl/ontologies/Length/#">
<value>
<!-- y = r sin theta -->
<xsl:value-of select="$canonical-r-length/len:value * exslt:sin($canonical-theta-angle/cs:value)"/>
</value>
<units rdf:resource="http://www.xfront.com/owl/ontologies/Length/#Meter"/>
</Length>
</y>
</Cartesian-Coordinate-System>
</xsl:when>
...
</xsl:choose>
</xsl:function>
Note that this include file reuses the Length include file.
Again, the full version of this may be seen by following the link at the bottom of this document.
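The way the coordinate system function builds on the Length function can be sketched in Python as follows. This is an illustration under stated assumptions, not a translation of the published include file: the unit tables, the function names, and the treatment of angles (radians as the canonical angle unit, with degrees as one non-canonical form) are all assumed here.

```python
import math

METERS_PER_UNIT = {"Meter": 1.0, "Kilometer": 1000.0, "Mile": 1609.344}
RADIANS_PER_UNIT = {"Radian": 1.0, "Degree": math.pi / 180.0}   # assumed angle units

def canonical_length(value, units):
    """Length in meters (plays the role of len:Length)."""
    return value * METERS_PER_UNIT[units]

def canonical_angle(value, units):
    """Angle in radians (plays the role of cs:Angle, whose details are assumed)."""
    return value * RADIANS_PER_UNIT[units]

def polar_to_canonical_cartesian(r, r_units, theta, theta_units):
    """Canonicalize r and theta first, then apply x = r cos theta, y = r sin theta."""
    r_m = canonical_length(r, r_units)
    theta_rad = canonical_angle(theta, theta_units)
    return r_m * math.cos(theta_rad), r_m * math.sin(theta_rad)

# r is given in kilometers and theta in degrees; the result is in meters.
x, y = polar_to_canonical_cartesian(141.421, "Kilometer", 45.0, "Degree")
print(round(x), round(y))   # prints: 100000 100000
```

Because each component is put into canonical form before the coordinate formula is applied, the coordinate function never needs to know about individual length or angle units.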
Step 3: The application includes the "include file"
I have created an XSLT 2.0 application. Here is what I have at the top of my application:
<xsl:include href="CoordinateSystem-Include-File.xsl"/>
Step 4: Design the application to process any "foreign" data by converting to canonical form
My XSLT application checks to see if the location is not in cartesian coordinates, and calls
the include file function to transform to cartesian coordinates:
<xsl:template match="*[local-name(.) != 'Cartesian-Coordinate-System']">
<xsl:text>The input data is not in the canonical coordinate system. Converting ...</xsl:text>
<xsl:variable name="canonical-CoordinateSystem" select="cs:CoordinateSystem(.)"/>
<xsl:apply-templates select="$canonical-CoordinateSystem"/>
</xsl:template>
<xsl:template match="cs:Cartesian-Coordinate-System[(cs:x/len:Length/len:units/@rdf:resource != 'http://www.xfront.com/owl/ontologies/Length/#Meter') or
(cs:y/len:Length/len:units/@rdf:resource != 'http://www.xfront.com/owl/ontologies/Length/#Meter')]">
<xsl:text>The input data is in the canonical coordinate system, but the Length of x and/or y is not in canonical form. Converting ...</xsl:text>
<xsl:variable name="canonicalForm" select="cs:CoordinateSystem(.)"/>
<xsl:apply-templates select="$canonicalForm"/>
</xsl:template>
<xsl:template match="cs:Cartesian-Coordinate-System[(cs:x/len:Length/len:units/@rdf:resource = 'http://www.xfront.com/owl/ontologies/Length/#Meter') and
(cs:y/len:Length/len:units/@rdf:resource = 'http://www.xfront.com/owl/ontologies/Length/#Meter')]">
<xsl:text>The input data is in the canonical coordinate system, and it is in canonical form.</xsl:text>
</xsl:template>
Acknowledgements
A great many people contributed to this work:
- Tom Passin
- John Cowan
- Benja Fallenstein
- Manos Batsis
- Pete Kirkham
- David Carlisle
- Jon Hanna
- Ken Laskey
- Paul Swett
- Bob Foster
- John DeCarlo
- Terry Alford
- Kit Lueder
- Frank Manola
- Jeff Grief
- Richard McCullough
- Nikki Rogers
- Joe Chiusano
- Bill de hOra
Thanks everyone!
References
[1] http://www.xfront.com/owl/
Links to the Complete Version of the Above Examples
Example 1: Length Example
The Length Ontology: Length.owl
The Length Include File: Length-Include-File.xsl
The Length Application: Length-application.xsl
Example 2: Coordinate System Example
The Coordinate System Ontology: CoordinateSystem.owl
The Coordinate System Include File: CoordinateSystem-Include-File.xsl
The Coordinate System Application: CoordinateSystem-application.xsl