Namespaces are for Markup and Data

Ouch! Learning a Lesson

Recently I was bitten and I'd like to share the lesson I learned.

Namespaces are great. They modularize your document. They enable module-specific processing. I highly recommend them.

But here's the thing ...

Namespaces are not just important for markup, they are also important for data.

How often have you namespace-qualified a piece of data? I suspect not often. (Until recently, I hadn't.)

Learn from my mistake and start putting your data in namespaces!

How I Got Bitten

I have an xml document containing a list of all the xml schema datatypes:

    <Datatype type="anyURI">...

    <Datatype type="base64Binary">...
    
    ...

I have an application that takes the datatype in an xml schema element declaration, e.g.,

   <xs:element name="Retailer" type="xs:anyURI" />

and finds the matching <Datatype> element in the xml document. (The application strips off the xs: and searches for anyURI)

It worked great until xml schema 1.1 came along ...

In xml schema 1.1 vendors can create their own primitive datatypes, provided they are in a different namespace. For example, a vendor might create his own version of anyURI; then a schema could contain this:

   <xs:element name="Retailer" type="vendor:anyURI" />

Suddenly I realized that my application has a major problem. It is not able to distinguish xs:anyURI from vendor:anyURI.
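
To make the collision concrete, here is a minimal Python sketch of the prefix-stripping lookup described above. The application itself is not shown in this article, so the function name strip_prefix is hypothetical and the code only illustrates the idea:

```python
def strip_prefix(type_name: str) -> str:
    """Drop everything up to the first colon (hypothetical helper)."""
    return type_name.split(":", 1)[-1]


# Two different datatypes collapse to the same lookup key.
w3c_key = strip_prefix("xs:anyURI")
vendor_key = strip_prefix("vendor:anyURI")

print(w3c_key)                # anyURI
print(vendor_key)             # anyURI
print(w3c_key == vendor_key)  # True
```

Once the prefix is thrown away, nothing is left to tell the two datatypes apart.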

Sigh ...

As my xml document evolved, my application was unable to adapt.

And it all boils down to the fact that I didn't use namespaces with my data.

Fixing the Problem

I should have designed my xml document like this:

    <Datatype type="xs:anyURI">...

    <Datatype type="xs:base64Binary">...
    
    ...

Notice that the *value* of the type attribute is now a namespace-qualified name (i.e., its value is a QName).

With the advent of xml schema 1.1, I simply add the vendor unique datatypes to my xml document:

    <Datatype type="xs:anyURI">...

    <Datatype type="vendor:anyURI">...
    
    ...

And then my application searches the xml document based on QNames rather than unqualified names (NCNames).

Here's some xslt code that shows how to do this:

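    <!-- Requires XSLT 2.0; assumes the xs prefix is bound to the xml schema namespace -->
    <xsl:template match="xs:element">

          <xsl:variable name="element" select="." />
          <xsl:variable name="datatype" select="@type" />

          <!-- Each QName is resolved against the element it appears on,
               so the two documents may use different prefixes -->
          <xsl:value-of select="doc('datatypes.xml')//Datatype[resolve-QName(@type, .) eq
                                        resolve-QName($datatype, $element)]" />

    </xsl:template>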
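
The same idea can be sketched outside of xslt. The following Python example uses only the standard library and is an illustration under stated assumptions, not the application's actual code: the vendor namespace URI, the <Datatypes> wrapper element, and the helper names resolve_qname and index_datatypes are all hypothetical. It tracks the namespace declarations in scope while parsing, then indexes each <Datatype> by its (namespace URI, local name) pair:

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
VENDOR = "http://vendor.example/datatypes"  # hypothetical vendor namespace

# Hypothetical datatypes document; the wrapper element is an assumption.
DATATYPES = f"""\
<Datatypes xmlns:xs="{XS}" xmlns:vendor="{VENDOR}">
  <Datatype type="xs:anyURI">W3C anyURI</Datatype>
  <Datatype type="vendor:anyURI">vendor anyURI</Datatype>
</Datatypes>"""


def resolve_qname(qname, in_scope):
    """Turn 'prefix:local' into (namespace URI, local name)."""
    prefix, _, local = qname.rpartition(":")
    if prefix:
        return (in_scope[prefix], local)  # KeyError if the prefix is undeclared
    return (in_scope.get(""), local)      # unprefixed: default namespace, if any


def index_datatypes(xml_text):
    """Map each resolved type QName to its <Datatype> element."""
    bindings = []  # stack of (prefix, uri) declarations currently in scope
    index = {}
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "end-ns", "start")):
        if event == "start-ns":
            bindings.append(payload)
        elif event == "end-ns":
            bindings.pop()
        elif payload.tag == "Datatype":
            key = resolve_qname(payload.get("type"), dict(bindings))
            index[key] = payload
    return index


index = index_datatypes(DATATYPES)

# The schema side may bind the same namespaces to different prefixes.
schema_scope = {"xs": XS, "v": VENDOR}
print(index[resolve_qname("xs:anyURI", schema_scope)].text)  # W3C anyURI
print(index[resolve_qname("v:anyURI", schema_scope)].text)   # vendor anyURI
```

Because the lookup key is the namespace URI plus the local name, the prefix chosen in either document no longer matters.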

Lesson Learned

Had I put my data in a namespace, it could have evolved without breaking my application.

I am going to start using the QName datatype a lot more frequently, e.g.

    <xs:attribute name="type" use="required" type="xs:QName" />

* Retraction *

The excellent minds on the xml-dev list have helped me understand that using QNames in element and attribute content is not a good idea. Let's see why ...

EXAMPLE: Create an <object-id> element, whose value is a URL-based identifier for any object. Here are two approaches:

Approach #1: QName Content

The content of <object-id> is a QName:

    <example xmlns:aquarium="http://www.aquarium.org#">

        <object-id>aquarium:tank</object-id>

    </example>

Approach #2: anyURI Content

The content of <object-id> is an anyURI value:

    <example>

        <object-id>http://www.aquarium.org#tank</object-id>

    </example>

*** Approach #2 is preferred. ***

There are several disadvantages to Approach #1:

It's easier to compare two anyURI values than it is to compare QName values. A QName value has to be "resolved," i.e. the prefix has to be associated with the namespace URI, and then you have to compare the namespace URIs and the local names.

Comparing the value of these two <object-id> elements:

    <example xmlns:aquarium="http://www.aquarium.org#">

        <object-id>aquarium:tank</object-id>

    </example>

    <Boston xmlns:a="http://www.aquarium.org#">

        <object-id>a:tank</object-id>

    </Boston>

is more challenging than comparing these two <object-id> elements:

    <example>

        <object-id>http://www.aquarium.org#tank</object-id>

    </example>

    <Boston>

        <object-id>http://www.aquarium.org#tank</object-id>

    </Boston>
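
A small Python sketch shows the difference in effort. It uses the four documents above and only the standard library; the helper names object_id_text and object_id_expanded are hypothetical, and the QName helper is simplified in that it ignores where a namespace declaration goes out of scope:

```python
import io
import xml.etree.ElementTree as ET

A = ('<example xmlns:aquarium="http://www.aquarium.org#">'
     '<object-id>aquarium:tank</object-id></example>')
B = ('<Boston xmlns:a="http://www.aquarium.org#">'
     '<object-id>a:tank</object-id></Boston>')
C = "<example><object-id>http://www.aquarium.org#tank</object-id></example>"
D = "<Boston><object-id>http://www.aquarium.org#tank</object-id></Boston>"


def object_id_text(xml_text):
    """Return the raw text content of <object-id>."""
    return ET.fromstring(xml_text).findtext("object-id").strip()


def object_id_expanded(xml_text):
    """Return (namespace URI, local name) for QName content of <object-id>."""
    in_scope = {}  # simplified: declarations are never removed from scope
    expanded = None
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload
            in_scope[prefix] = uri
        elif payload.tag == "object-id":
            prefix, _, local = payload.text.strip().rpartition(":")
            expanded = (in_scope[prefix], local)
    return expanded


# QName content: the raw strings differ, so the prefixes must be resolved first.
print(object_id_text(A) == object_id_text(B))          # False
print(object_id_expanded(A) == object_id_expanded(B))  # True

# anyURI content: a plain string comparison is enough.
print(object_id_text(C) == object_id_text(D))          # True
```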

If you, say, use the xslt document() function to copy this element:
```
       <object-id>aquarium:tank</object-id>
```
into another document then you've lost the association between the prefix (aquarium) and its namespace URI (http://www.aquarium.org#).

Thus, QName content doesn't lend itself to copy-and-paste.
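
The effect is easy to reproduce. The sketch below uses Python's xml.etree.ElementTree rather than xslt, so it only illustrates the general hazard; other tools may handle namespace declarations differently when copying. ElementTree declares only the namespaces used in element and attribute names when it serializes, so a prefix that appears solely in text content arrives without its binding:

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    '<example xmlns:aquarium="http://www.aquarium.org#">'
    '<object-id>aquarium:tank</object-id></example>'
)

# Serialize just the <object-id> element, as if pasting it elsewhere.
copied = ET.tostring(source.find("object-id"), encoding="unicode")
print(copied)  # <object-id>aquarium:tank</object-id>
```

The copied element still says aquarium:tank, but nothing in it says what aquarium means.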

The prefix loses its value once it's outside the xml document where it's defined.

A good division of labor is:

- xml parsers deal with markup
- applications deal with data

By using QNames in content you've lost the division of labor between the xml parser and application because the xml parser must process the prefix that's in the content.

Last Updated: August 12, 2009