What advice would you give to someone who asked you, "In general, when should an element (or type) be declared global, and when should it be declared local?" The purpose of this document is to answer that question.
Below is a snippet of an XML instance document. We will explore the different design strategies using this example.
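A minimal sketch of such a snippet, assuming a Book made up of a Title and an Author as discussed throughout this document (the title value is illustrative):

```xml
<Book>
    <Title>Illusions</Title>
    <Author>Richard Bach</Author>
</Book>
```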
In the Russian Doll design, the schema mirrors the instance document. The instance document has all its components bundled together; likewise, the schema bundles together all its element declarations.
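A schema of this bundled kind could be sketched as follows. The target namespace is a placeholder chosen for illustration, not one taken from the source:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            elementFormDefault="unqualified">
    <!-- Book is the only global declaration -->
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <!-- Title and Author are local: they exist only inside Book -->
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```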
This design represents one end of the design spectrum.
At the other end is the Salami Slice design. Note how the schema declares each component individually (Title and Author) and then assembles them (by ref'ing them) to create the Book component.
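A sketch of a schema built this way, assuming a placeholder target namespace that is also bound as the default namespace so that the ref values resolve:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            xmlns="http://www.example.org/book">
    <!-- Each component is declared individually, at the global level -->
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author" type="xsd:string"/>
    <!-- Book is assembled by referencing the global declarations -->
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title"/>
                <xsd:element ref="Author"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```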
These two designs represent opposite ends of the design spectrum.
To understand these designs it may help to think in terms of boxes, where a box represents an element or type:
Here are the characteristics of the Russian Doll design.
[1] Opaque content. The content of Book is opaque to other schemas, and to other parts of the same schema. The impact of this is that none of the types or elements within Book are reusable.
[2] Localized scope. The region of the schema where the Title and Author element declarations are applicable is localized to within the Book element. The impact of this is that if the schema has set elementFormDefault="unqualified" then the namespaces of Title and Author are hidden (localized) within the schema.
[3] Compact. Everything is bundled together into a tidy, single unit.
[4] Decoupled. With this design approach each component is self-contained (i.e., it does not interact with other components). Consequently, changes to a component have limited impact. For example, if the components within Book change, the impact is limited because they are not coupled to components outside of Book.
[5] Cohesive. With this design approach all the related data is grouped together into self-contained components, i.e., the components are cohesive.
Here are the characteristics of the Salami Slice design.
[1] Transparent content. The components which make up Book are visible to other schemas, and to other parts of the same schema. The impact of this is that the types and elements within Book are reusable.
[2] Global scope. All components have global scope. The impact of this is that, irrespective of the value of elementFormDefault, the namespaces of Title and Author will be exposed in instance documents.
[3] Verbose. Everything is laid out and clearly visible.
[4] Coupled. In our example we saw that the Book element depends on the Title and Author elements. If those elements were to change it would impact the Book element. Thus, this design produces a set of interconnected (coupled) components.
[5] Cohesive. With this design approach all the related data is also grouped together into self-contained components. Thus, the components are cohesive.
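The two design approaches differ in two important ways: the Russian Doll design facilitates hiding (localizing) namespaces but offers no reusable components, whereas the Salami Slice design offers reusable components but cannot hide namespaces.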
Consider the Book example again. An alternative design is to create a global type definition which nests the Title and Author element declarations within it:
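A sketch of this alternative, where the type name Publication and the target namespace are assumptions made for illustration:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            xmlns="http://www.example.org/book"
            elementFormDefault="unqualified">
    <!-- A global, reusable type; Title and Author are local to it -->
    <xsd:complexType name="Publication">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
    <!-- Book simply reuses the global type -->
    <xsd:element name="Book" type="Publication"/>
</xsd:schema>
```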
This design has both benefits: it can hide (localize) the namespaces of Title and Author, and it provides a reusable type component.
Salami Slice Design:
The Salami Slice design also results in creating reusable (element) components, but it has absolutely no potential for namespace hiding.
"However," you argue, "suppose that I want namespaces exposed in instance documents. [We have seen cases where this is desired.] So the Salami Slice design is a good approach for me. Right?"
Let's think about this for a moment. What if at a later date you change your mind and wish to hide namespaces (what if your users hate seeing all those namespace qualifiers in instance documents)? You will need to redesign your schema (possibly scrapping it and starting over).
Better to adopt the Venetian Blind Design, which allows you to control whether namespaces are hidden or exposed by simply setting the value of elementFormDefault. No redesign of your schema is needed as you switch from exposing to hiding, or vice versa.
[That said ... your particular project may need to sacrifice the ability to turn on/off namespace exposure because you require instance documents to be able to use element substitution. In such circumstances the Salami Slice design approach is the only viable alternative.]
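The reason is that an element can join a substitution group only if it, and the head element it substitutes for, are declared globally. A minimal sketch, in which Writer is a hypothetical element and the namespace is a placeholder:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            xmlns="http://www.example.org/book">
    <xsd:element name="Title" type="xsd:string"/>
    <!-- Author is the head of the substitution group -->
    <xsd:element name="Author" type="xsd:string"/>
    <!-- Writer (hypothetical) may appear wherever Author is referenced -->
    <xsd:element name="Writer" type="xsd:string" substitutionGroup="Author"/>
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title"/>
                <xsd:element ref="Author"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

With such a schema, an instance document author could write a Writer element in place of Author inside Book.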
Here are the characteristics of the Venetian Blind Design.
[1] Maximum reuse. The primary components of reuse are type definitions.
[2] Maximum namespace hiding. Element declarations are nested within types, thus maximizing the potential for namespace hiding.
[3] Easy exposure switching. Whether namespaces are hidden (localized) in the schema or exposed in instance documents is controlled by the elementFormDefault switch.
[4] Coupled. This design generates a set of components which are interconnected (i.e., dependent).
[5] Cohesive. As with the other designs, the components group together related data. Thus, the components are cohesive.
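As a rough illustration of characteristic [3], assume a Venetian Blind schema in the placeholder namespace http://www.example.org/book that declares Book globally and nests Title and Author inside a type. With elementFormDefault="unqualified", only the global Book element carries a namespace qualifier in the instance document:

```xml
<bk:Book xmlns:bk="http://www.example.org/book">
    <Title>Illusions</Title>
    <Author>Richard Bach</Author>
</bk:Book>
```

Changing only that attribute to elementFormDefault="qualified" would require the local elements to be qualified as well:

```xml
<bk:Book xmlns:bk="http://www.example.org/book">
    <bk:Title>Illusions</bk:Title>
    <bk:Author>Richard Bach</bk:Author>
</bk:Book>
```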
[2] Where your task requires that you make available to instance document authors the option to use element substitution, then use the Salami Slice design.
[3] Where minimizing the size and coupling of components is of utmost concern, use the Russian Doll design.
The Person element contains an id attribute to uniquely identify it.
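A sketch of such a Person element, assuming it holds a Name child; the id value RB is made up for illustration:

```xml
<Person id="RB">
    <Name>Richard Bach</Name>
</Person>
```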
Now that we have a Person component it can be referenced by the Book component:
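One way such a reference could look, assuming a Person whose id is RB and an Author element that points to it through a hypothetical idref attribute:

```xml
<Book>
    <Title>Illusions</Title>
    <!-- idref is assumed to hold the id of a Person element -->
    <Author idref="RB"/>
</Book>
```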
"Book is comprised of a Title followed by an Author. The Author is a Person whose name is Richard Bach."
In this design there are two main components ("objects"), Person and Book. The Author element bridges (associates) the two components. [One could imagine an O-O Object model of Person and Book mapping to these two XML components.]
Let's see how each of the three design approaches would model the Person and Book components.
As is characteristic of the Russian Doll design, all the elements and attributes are nested together in a tidy, compact bundle.
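A sketch of a Russian Doll schema for these two components. The namespace, the Name child, and the id/idref attribute names are assumptions made for illustration:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/library"
            elementFormDefault="unqualified">
    <xsd:element name="Person">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Name" type="xsd:string"/>
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:ID" use="required"/>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <!-- Author is nested inside Book and points to a Person -->
                <xsd:element name="Author">
                    <xsd:complexType>
                        <xsd:attribute name="idref" type="xsd:IDREF" use="required"/>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```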
The notable characteristic of the Salami Slice design is that all the elements and attributes are global. Consequently, their namespaces will always be exposed in instance documents, irrespective of the value of the elementFormDefault "switch".
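A sketch of a Salami Slice schema for Person and Book, with every element and attribute declared globally. The namespace, the Name child, and the id/idref attribute names are assumptions made for illustration:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/library"
            xmlns="http://www.example.org/library">
    <!-- Global attributes -->
    <xsd:attribute name="id" type="xsd:ID"/>
    <xsd:attribute name="idref" type="xsd:IDREF"/>
    <!-- Global leaf elements -->
    <xsd:element name="Name" type="xsd:string"/>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author">
        <xsd:complexType>
            <xsd:attribute ref="idref" use="required"/>
        </xsd:complexType>
    </xsd:element>
    <!-- Person and Book are assembled from the global pieces -->
    <xsd:element name="Person">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Name"/>
            </xsd:sequence>
            <xsd:attribute ref="id" use="required"/>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title"/>
                <xsd:element ref="Author"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```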
Note that in the Venetian Blind design the elements are nested within the types. This enables us to toggle on/off namespace exposure in instance documents.
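A sketch of a Venetian Blind schema for Person and Book. The type names (PersonType, PersonRefType, BookType), the namespace, and the id/idref attribute names are assumptions made for illustration:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/library"
            xmlns="http://www.example.org/library"
            elementFormDefault="unqualified">
    <!-- Reusable global types; their child elements are local -->
    <xsd:complexType name="PersonType">
        <xsd:sequence>
            <xsd:element name="Name" type="xsd:string"/>
        </xsd:sequence>
        <xsd:attribute name="id" type="xsd:ID" use="required"/>
    </xsd:complexType>
    <xsd:complexType name="PersonRefType">
        <xsd:attribute name="idref" type="xsd:IDREF" use="required"/>
    </xsd:complexType>
    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="PersonRefType"/>
        </xsd:sequence>
    </xsd:complexType>
    <!-- Global elements built from the types -->
    <xsd:element name="Person" type="PersonType"/>
    <xsd:element name="Book" type="BookType"/>
</xsd:schema>
```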
The Russian Doll and Venetian Blind designs dramatically distinguish themselves from the Salami Slice design in the instance documents. The instance documents corresponding to the Russian Doll and Venetian Blind designs are simple and unencumbered by namespace qualifiers.
Above we assumed that the Person and Book elements were declared in the same namespace as a Catalog element. Let's suppose instead that they were defined in a separate namespace and then imported (with <import>) into the catalog schema. Here's a representative instance document for the Russian Doll and Venetian Blind designs:
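One way such a document could look. This sketch assumes that the catalog schema declares Person and Book as local elements whose content comes from the imported schema, that both schemas set elementFormDefault="unqualified", and that the catalog namespace and the id/idref attributes are placeholders:

```xml
<cat:Catalog xmlns:cat="http://www.example.org/catalog">
    <!-- No trace of the imported namespace appears below the root -->
    <Person id="RB">
        <Name>Richard Bach</Name>
    </Person>
    <Book>
        <Title>Illusions</Title>
        <Author idref="RB"/>
    </Book>
</cat:Catalog>
```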
Where the Catalog schema got the Person and Book elements is totally transparent to the instance document. That namespace knowledge is hidden within the schema.
On the other hand, the instance document for the Salami Slice design is filled with many namespace qualifiers:
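A sketch of what this could look like, assuming the imported schema declares every element and attribute globally in a placeholder namespace bound here to the prefix lib:

```xml
<Catalog xmlns="http://www.example.org/catalog"
         xmlns:lib="http://www.example.org/library">
    <!-- Every imported element and global attribute must be qualified -->
    <lib:Person lib:id="RB">
        <lib:Name>Richard Bach</lib:Name>
    </lib:Person>
    <lib:Book>
        <lib:Title>Illusions</lib:Title>
        <lib:Author lib:idref="RB"/>
    </lib:Book>
</Catalog>
```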
Where the Catalog schema obtained the Person and Book elements is visibly exposed in the instance document, irrespective of the value of elementFormDefault, because this design declares all the elements globally. The example shows clearly that the Salami Slice design cannot hide namespaces.
What do these principles have to do with XML and Schema design? To see, let's recast the two principles into data modeling terms:
Let's demonstrate applicability of the principles to XML data modeling with an example. Consider this snippet of XML:
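One such snippet might look like the following sketch. The element content is illustrative, and the Catalog wrapper is included only to keep the snippet well-formed:

```xml
<Catalog>
    <!-- Book keeps all book-related data together -->
    <Book>
        <Title>Illusions</Title>
        <Author>Richard Bach</Author>
    </Book>
    <!-- Person keeps all person-related data together and does not refer to Book -->
    <Person>
        <Name>Richard Bach</Name>
    </Person>
</Catalog>
```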
Here we see some data, structured using XML syntax. If we think of Book and Person as components, then these components are very loosely coupled (in fact, not coupled at all), and the related data is nicely bundled together in the appropriate component (i.e., high cohesiveness). This instance document therefore exhibits minimal coupling and maximal cohesiveness of its components; it is well designed with regard to the coupling/cohesion properties.
Schemas are also XML documents. Thus we should be able to analyze a schema to see if its components are decoupled and cohesive. Let's analyze the Russian Doll and the Venetian Blind designs to see how coupled/cohesive are the components that these schema designs produce.
Here's the Schema for the above Book and Person elements using the Russian Doll Design:
Recall that the Russian Doll design mirrors the instance document structure; thus in this schema we see two self-contained element declarations, which mirror the two self-contained elements in the instance document.
Here's what the schema looks like using the Venetian Blind Design:
Recall that the Venetian Blind design spreads out the components into type definitions and then reuses the types.
The instance document we saw earlier is valid for either of these schema designs. Thus both designs enable us to create instance documents which exhibit minimal coupling and maximal cohesiveness.
How do the two schema designs measure up with respect to coupling and cohesiveness of the components they create?
Principle | Russian Doll components | Venetian Blind components |
---|---|---|
Cohesion | High | High |
Coupling | Low | High |
Reusable Components | Low | High |
In general, the Russian Doll design results in components that are highly cohesive, with minimal coupling, but with few reusable components. The Venetian Blind design results in components that are also highly cohesive, with many reusable components, but the components have high coupling.
"Salami Slice" captures both the disassembly process, the resulting flat look of the schema, and implies reassembly as well (into a sandwich).
"Venetian Blind" captures the ability to expose or hide namespaces with a simple switch, and the assembly of slats captures reuse of components.