The purpose of this document is to identify issues that you should be aware of when creating XPath expressions.
Our approach is to look first at an example problem, which will reveal many of these issues.
Consider this snippet of an XML instance document:
<airplane tailnum="C3H1"> <altitude unit="feet">20000</altitude> </airplane>
Read as: "An airplane, with tail number C3H1, is flying at an altitude of 20,000 feet."
Here's the complete XML instance document: airplane.xml
Design an XPath expression that performs this conversion: convert the value of the altitude element from feet to meters.
Which of these two XPath expressions represents best practice?
Version 1: Before converting the altitude from feet to meters, check whether the <airplane> element has been validated against an XML Schema (here's the XML Schema, airplane.xsd):
if (//airplane[@tailnum='C3H1'] instance of schema-element(airplane))
then //airplane[@tailnum='C3H1']/altitude * .3048
else if (//airplane[@tailnum='C3H1']/altitude[@unit='feet'] castable as xs:double)
then //airplane[@tailnum='C3H1']/altitude * .3048
else 'Error'
Read as: "Check whether the airplane element has been validated against an XML Schema; if it has, perform the conversion. Otherwise, check whether the altitude value can be cast to the appropriate datatype (the XML Schema double datatype); if so, perform the conversion; otherwise, report an error."
Version 2: Starting from the top of the input document, navigate to the <FAA> root element, then to the <airplane> element with tailnum 'C3H1', then to its <altitude> element with units in feet, and multiply its value by .3048:
/FAA/airplane[@tailnum='C3H1']/altitude[@unit='feet'] * .3048
Read as: "Navigate to the altitude element, atomize it, and then multiply its atomic value by .3048"
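To make the behavior of Version 2 concrete, here is a minimal Python sketch that emulates its XPath 1.0 semantics with the standard library's xml.etree.ElementTree, whose limited XPath subset handles the child steps and the [@attr='value'] predicates. It is an emulation under stated assumptions, not a real XPath processor: the names DOC, xpath1_number and version2 are hypothetical, and DOC is an assumed stand-in for airplane.xml that contains only the snippet above under an <FAA> root.

```python
import math
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for airplane.xml: only the <FAA> root and the
# snippet shown above are assumed here.
DOC = """<FAA>
  <airplane tailnum="C3H1">
    <altitude unit="feet">20000</altitude>
  </airplane>
</FAA>"""

# XPath 1.0 number() accepts: optional whitespace, optional minus sign,
# a Number (digits with an optional decimal point), optional whitespace.
_XPATH1_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath1_number(text):
    """Convert a string to a number the way XPath 1.0 number() does."""
    return float(text) if _XPATH1_NUMBER.fullmatch(text) else math.nan


def version2(xml_text):
    """Emulate /FAA/airplane[@tailnum='C3H1']/altitude[@unit='feet'] * .3048."""
    root = ET.fromstring(xml_text)
    if root.tag != "FAA":  # the /FAA step selects nothing
        return math.nan
    nodes = root.findall("airplane[@tailnum='C3H1']/altitude[@unit='feet']")
    if not nodes:  # an empty node-set converts to NaN
        return math.nan
    # "Atomize": a node-set converts to the number of the first node's
    # string value (the concatenation of its descendant text).
    return xpath1_number("".join(nodes[0].itertext())) * 0.3048


print(version2(DOC))
```

Calling version2(DOC) should return a value very close to 6096 (20000 * 0.3048); floating-point rounding may affect the last digits.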
Rarely is a strategy best suited to every circumstance, and in our example what constitutes best practice depends on the circumstance. So let's add this requirement to our XPath: it must be capable of being run in any context. That is, it must be executable in a browser, by an XPath 1.0 or an XPath 2.0 processor, and by a schema-aware or a non-schema-aware processor.
Given this requirement, Version 2 is "best practice." The following shows why this is the case.
By virtue of the semantics of path expressions, we implicitly get structure and value checking.
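An XPath processor will check that the root element of the document is <FAA>.
Further, an XPath processor will check that <FAA> has an <airplane> child whose tailnum attribute is 'C3H1'.
Finally, an XPath processor will check that this <airplane> has an <altitude> child whose unit attribute is 'feet'.
If any of these checks fails, the path selects nothing, and the multiplication yields NaN (XPath 1.0) or the empty sequence (XPath 2.0) rather than a misleading altitude.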
By virtue of the semantics of the multiplication operator, we implicitly get datatype checking.
An XPath processor will check that the value of the <altitude> element is compatible with the multiplication operator, i.e., that it can be converted to a number.
Summary: Version 2 does everything Version 1 does, but more simply, and it is context-independent, i.e., it can be processed by a greater variety of XPath processors.
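The implicit structure, value, and datatype checks can be observed by running an emulation of Version 2 against deliberately broken inputs. This Python sketch assumes XPath 1.0 behavior, emulated with xml.etree.ElementTree; the function name and all test documents are hypothetical, and each document violates exactly one assumption of the path. Every violation produces NaN instead of a plausible-looking altitude. (An XPath 2.0 processor would instead return the empty sequence for the structural cases and raise a dynamic error, err:FORG0001, when the text cannot be cast to xs:double.)

```python
import math
import re
import xml.etree.ElementTree as ET

# XPath 1.0 number() lexical rule (optional whitespace and minus sign).
_NUM = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def version2(xml_text):
    """XPath 1.0-style emulation of
    /FAA/airplane[@tailnum='C3H1']/altitude[@unit='feet'] * .3048."""
    root = ET.fromstring(xml_text)
    nodes = (root.findall("airplane[@tailnum='C3H1']/altitude[@unit='feet']")
             if root.tag == "FAA" else [])
    if not nodes:
        return math.nan  # structure or value check failed: nothing selected
    text = "".join(nodes[0].itertext())
    # Datatype check: non-numeric text converts to NaN.
    return float(text) * 0.3048 if _NUM.fullmatch(text) else math.nan


# Hypothetical variants of the input, each breaking one assumption.
CASES = {
    "ok": '<FAA><airplane tailnum="C3H1"><altitude unit="feet">20000</altitude></airplane></FAA>',
    "wrong root": '<Fleet><airplane tailnum="C3H1"><altitude unit="feet">20000</altitude></airplane></Fleet>',
    "wrong tail number": '<FAA><airplane tailnum="X9"><altitude unit="feet">20000</altitude></airplane></FAA>',
    "wrong unit": '<FAA><airplane tailnum="C3H1"><altitude unit="meters">6096</altitude></airplane></FAA>',
    "not a number": '<FAA><airplane tailnum="C3H1"><altitude unit="feet">high</altitude></airplane></FAA>',
}

for name, doc in CASES.items():
    print(f"{name:18} -> {version2(doc)}")
```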
Our example has implicitly exposed several issues. Let's now make the issues explicit:
Version 2 illustrates the benefit of full, absolute XPath expressions for implicit structure and value checking.
However, a full, absolute XPath expression has the disadvantage of tying the expression to a particular input structure. Sometimes you need an XPath expression that is flexible in the face of varying input. For these situations, relative XPath expressions are better suited.
Example: Compare these two XPath expressions:
(a) //altitude * .3048
(b) /FAA/airplane[@tailnum='C3H1']/altitude[@unit='feet'] * .3048
The latter makes the XPath processor perform much more checking, so errors in the input are far more likely to be revealed.
The former hides many potential problems; for example, the <altitude> being operated on may not belong to the airplane of interest.
Conversely, the latter is tied to a particular input structure, whereas the former can be applied to a variety of input structures.
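A sketch of this trade-off, again emulating XPath 1.0 with xml.etree.ElementTree (root.iter plays the role of //, visiting elements in document order). Both input documents are hypothetical: FLEET places a second airplane, with its altitude in meters, ahead of C3H1, and OTHER uses an unrelated structure. For brevity, number conversion is simplified to Python's float() rather than the exact XPath 1.0 rules.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical input: another airplane appears before C3H1.
FLEET = """<FAA>
  <airplane tailnum="B2G7"><altitude unit="meters">3000</altitude></airplane>
  <airplane tailnum="C3H1"><altitude unit="feet">20000</altitude></airplane>
</FAA>"""

# Hypothetical input with a different structure altogether.
OTHER = '<flight><altitude unit="feet">20000</altitude></flight>'


def first_number(nodes):
    """XPath 1.0 node-set to number: first node in document order, else NaN.
    (Simplified: uses float() rather than the full XPath 1.0 number rules.)"""
    if not nodes:
        return math.nan
    try:
        return float("".join(nodes[0].itertext()))
    except ValueError:
        return math.nan


def expr_a(xml_text):
    """(a) //altitude * .3048: any altitude anywhere in the document."""
    root = ET.fromstring(xml_text)
    return first_number(list(root.iter("altitude"))) * 0.3048


def expr_b(xml_text):
    """(b) /FAA/airplane[@tailnum='C3H1']/altitude[@unit='feet'] * .3048"""
    root = ET.fromstring(xml_text)
    if root.tag != "FAA":
        return math.nan
    nodes = root.findall("airplane[@tailnum='C3H1']/altitude[@unit='feet']")
    return first_number(nodes) * 0.3048


print(expr_a(FLEET), expr_b(FLEET))  # (a) converts B2G7's altitude, not C3H1's
print(expr_a(OTHER), expr_b(OTHER))  # (b) selects nothing and yields nan
```

On FLEET, (a) silently converts the wrong airplane's altitude (3000 meters treated as feet, about 914.4), while (b) finds C3H1. On OTHER, (a) still produces a result and (b) yields NaN.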
Version 2 can be executed by XPath 1.0 and XPath 2.0 processors.
If your XPath needs to execute in a browser, then it must be restricted to XPath 1.0.
XPath 2.0 provides many new features that can make coding much easier, such as the conditional expressions and the instance of and castable as type tests used in Version 1.
XPath may be used to express processing, e.g. in our example XPath is used to convert an altitude from feet to meters.
XPath may also be used to express validation requirements, as is the case with using XPath in Schematron.
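To illustrate the validation use, here is a Schematron-flavored sketch in Python: for each <airplane> (playing the role of a rule context) it evaluates an XPath-style test, altitude[@unit], and reports a message when the test fails. This is not Schematron itself; the rule, the message text, and the sample document are hypothetical, and the test is evaluated with xml.etree.ElementTree's limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the second airplane's altitude has no unit.
DOC = """<FAA>
  <airplane tailnum="C3H1"><altitude unit="feet">20000</altitude></airplane>
  <airplane tailnum="B2G7"><altitude>3000</altitude></airplane>
</FAA>"""


def check_units(xml_text):
    """For each <airplane> (the rule context), assert that it has an
    <altitude> child carrying a unit attribute. Returns failure messages."""
    root = ET.fromstring(xml_text)
    failures = []
    for plane in root.iter("airplane"):
        # The assertion: the XPath-style test altitude[@unit] must select a node.
        if plane.find("altitude[@unit]") is None:
            failures.append(f"airplane {plane.get('tailnum')}: altitude has no unit")
    return failures


print(check_units(DOC))  # ['airplane B2G7: altitude has no unit']
```

Here the same path language answers a yes/no question about the input instead of computing a value from it.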
Sometimes you need an XPath expression that can run anywhere, without a priori knowledge of the context in which it will be executed.
Version 2 is well suited for this context-independent processing capability.
Alternatively, in many situations you know the context in which the XPath will be executed. For example, you may know that the input will be schema-validated and that a schema-aware XPath processor will be used. Version 1 may be better suited to this situation.
The following people contributed to the creation of this document:
Last Updated: February 1, 2008