Domain-Specific Tagging

Universal Tags Plus Semantic Enhancers

Introduction

Below are examples of two "design styles". They both structure information about a book. But they do so in fundamentally different ways.

The objective of this document is to:

Characterize the two design styles
Describe the advantages and disadvantages of each design style
Provide guidance on when each design style should be used

Design Style #1: Structure information using domain-specific tags


<Book>
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>First Anchor Books</Publisher>
</Book>

Design Style #2: Structure information using universal tags (XHTML); enhance semantics with class attributes


<div class="Book">
    <span class="tag">Book</span>
    <ul>
        <li>
            <span class="Title">
                <span class="tag">Title</span>:
                <span class="value">The Wisdom of Crowds</span>
            </span>
        </li>
        <li>
            <span class="Author">
                <span class="tag">Author</span>:
                <span class="value"><cite>James Surowiecki</cite></span>
            </span>
        </li>
        <li>
            <span class="Date">
                <span class="tag">Date</span>:
                <span class="value">2005</span>
            </span>
        </li>
        <li>
            <span class="ISBN">
                <span class="tag">ISBN</span>:
                <span class="value">0-385-72170-6</span>
            </span>
        </li>
        <li>
            <span class="Publisher">
                <span class="tag">Publisher</span>:
                <span class="value"><cite>First Anchor Books</cite></span>
            </span>
        </li>
    </ul>
</div>

Note 1: The XHTML employed by design style #2 is "strict XHTML": content and presentation are completely separate.

Note 2: The above XHTML is one way of structuring the information using XHTML tags. If you can suggest a better way, using more semantically appropriate XHTML tags, please send me a note.
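To illustrate that the style #2 markup above is parseable as well as presentable, here is a minimal sketch that reads the tag/value pairs back into a dictionary. It assumes Python's standard xml.etree.ElementTree module; the function name read_fields is illustrative, and the markup is abbreviated to three of the five fields.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the design style #2 markup (three of the five fields).
XHTML = """
<div class="Book">
    <span class="tag">Book</span>
    <ul>
        <li>
            <span class="Title">
                <span class="tag">Title</span>:
                <span class="value">The Wisdom of Crowds</span>
            </span>
        </li>
        <li>
            <span class="Author">
                <span class="tag">Author</span>:
                <span class="value"><cite>James Surowiecki</cite></span>
            </span>
        </li>
        <li>
            <span class="ISBN">
                <span class="tag">ISBN</span>:
                <span class="value">0-385-72170-6</span>
            </span>
        </li>
    </ul>
</div>
"""


def read_fields(xhtml_text):
    """Map each field's class name to the text of its 'value' span."""
    root = ET.fromstring(xhtml_text)
    fields = {}
    # Each list item holds one span whose class names the field.
    for field in root.findall("./ul/li/span"):
        value = field.find("./span[@class='value']")
        if value is not None:
            # itertext() also collects text nested inside <cite>.
            fields[field.get("class")] = "".join(value.itertext()).strip()
    return fields


book = read_fields(XHTML)
print(book)
```

The same markup can be handed unchanged to a browser for display, which is the point of the second design style.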

Characterize the two design styles

The first thing to notice is that both design styles structure the book information using tags. That is, they are both XML documents.

The first design style uses a collection of tags that presumably were created by a community of like-minded individuals, such as a publishing community. The tags are specific to the information.

The second design style uses a set of tags that are universal descriptors of documents. The tags are not specific to the information. The tags that were used in design style #1 have become class attribute values in design style #2. The class attribute is a mechanism for extending the semantics of the general tags.

Note 3: Namespace-qualified tags in design style #1 become QName class attribute values in design style #2.

Note 4: In addition to the "class" attribute, there is the "id" attribute, and on links (anchors) there are the "rel" and "rev" attributes. All of these can be used to extend the semantics of an XHTML document.
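The relationship between the two styles (tag names in style #1 become class attribute values in style #2) can be sketched as a small conversion. This is a hedged illustration, not a complete converter: the function name to_universal is hypothetical, it produces a compact ul/li form rather than the full div/span markup shown earlier, and it handles only a flat, non-namespaced document.

```python
import xml.etree.ElementTree as ET


def to_universal(domain_xml):
    """Turn a flat domain-specific element into a <ul> whose class
    attributes carry the original tag names."""
    source = ET.fromstring(domain_xml)
    ul = ET.Element("ul", {"class": source.tag})
    for child in source:
        # The domain-specific tag name becomes the class attribute value.
        li = ET.SubElement(ul, "li", {"class": child.tag})
        li.text = child.text
    return ul


BOOK = (
    "<Book>"
    "<Title>The Wisdom of Crowds</Title>"
    "<Author>James Surowiecki</Author>"
    "</Book>"
)

converted = to_universal(BOOK)
print(ET.tostring(converted, encoding="unicode"))
```

Namespace-qualified tags (Note 3) would need extra work: ElementTree reports them as {namespace-uri}localname, so a prefix mapping would have to be supplied to produce QName class values.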

Describe the advantages and disadvantages of each design style

Advantages of Design Style #1

The tags have meaning (semantics) specific to the information. This facilitates development of applications that can perform processing specific to the information. For example, an application will have a much easier time fetching the author of the book when the information is structured in this fashion:
<Book>
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>First Anchor Books</Publisher>
</Book>
than when the information is structured in this fashion:
<ul>
    <li>The Wisdom of Crowds</li>
    <li>James Surowiecki</li>
    <li>2005</li>
    <li>0-385-72170-6</li>
    <li>First Anchor Books</li>
</ul>
On the other hand, fetching the author of the book will be equally easy when the information is structured in this fashion:
<ul class="Book">
    <li class="Title">The Wisdom of Crowds</li>
    <li class="Author">James Surowiecki</li>
    <li class="Date">2005</li>
    <li class="ISBN">0-385-72170-6</li>
    <li class="Publisher">First Anchor Books</li>
</ul>
And this latter version has the additional advantage that it can be processed immediately by any XHTML-aware application, such as browsers, screen-readers, cellphones, PDAs, and so on.
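The comparison above can be made concrete with a short sketch that fetches the author from each of the three forms. It assumes Python's standard xml.etree.ElementTree module, and the markup is abbreviated to two fields; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DOMAIN = (
    "<Book>"
    "<Title>The Wisdom of Crowds</Title>"
    "<Author>James Surowiecki</Author>"
    "</Book>"
)
PLAIN = (
    "<ul>"
    "<li>The Wisdom of Crowds</li>"
    "<li>James Surowiecki</li>"
    "</ul>"
)
CLASSED = (
    '<ul class="Book">'
    '<li class="Title">The Wisdom of Crowds</li>'
    '<li class="Author">James Surowiecki</li>'
    "</ul>"
)

# Domain-specific tags: the tag name identifies the author directly.
author_from_domain = ET.fromstring(DOMAIN).findtext("Author")

# Class-annotated list: the class attribute identifies the author.
author_from_classed = ET.fromstring(CLASSED).findtext("li[@class='Author']")

# Plain list: nothing marks the author, so the code must rely on
# position, which breaks as soon as the item order changes.
author_from_plain = ET.fromstring(PLAIN).findall("li")[1].text

print(author_from_domain, author_from_classed, author_from_plain)
```

The first two lookups name what they want; the third only works because the author happens to be the second item.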

Disadvantages of Design Style #1

The tags have meaning within the community that created them, but outside that community the tags may have no meaning.
Cross-community machine-to-machine interoperability is difficult.

Advantages of Design Style #2

The tags are widely understood and broadly adopted. There are over a billion devices (browsers, cellphones, screen-readers, PDAs) that understand and can process information marked up using XHTML tags.
The tags are independent of any community. Their meaning is understood by all communities. Thus, there is universal interoperability across communities at the presentation level.
The class attributes enhance the semantics, thereby facilitating the development of applications which can perform processing specific to the information.
"Each XHTML element can have multiple classes (it's a space-delimited list)" [1]. For example, James Surowiecki is both an author and a writer for the New York Times:
 Author: <cite>James Surowiecki</cite> 
Information is seamlessly usable both client-side and server-side.
The information is both presentable and parseable.
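Because the class attribute is a space-delimited list, an application looking for a particular class should compare against each token rather than the whole attribute value. The following is a minimal sketch of that lookup in Python; the helper names has_class and find_by_class and the second class name "Journalist" are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """True if `name` is one of the space-delimited class tokens."""
    return name in element.get("class", "").split()


def find_by_class(root, name):
    """Return every element under `root` (inclusive) carrying `name`."""
    return [el for el in root.iter() if has_class(el, name)]


# "Journalist" is a made-up second class for this example.
MARKUP = (
    "<li>"
    '<span class="Author Journalist">'
    "Author: <cite>James Surowiecki</cite>"
    "</span>"
    "</li>"
)

root = ET.fromstring(MARKUP)
authors = find_by_class(root, "Author")
journalists = find_by_class(root, "Journalist")
print(len(authors), len(journalists))
```

An exact-match test such as [@class='Author'] would miss this element, since its attribute value is the full string "Author Journalist"; splitting into tokens avoids that.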

Disadvantages of Design Style #2

There is more markup. For applications that are designed for specific purposes, the extra markup may be an annoyance.
Cross-community machine-to-machine interoperability is difficult.

Provide guidance on when each design style should be used

I probably would not want an XSLT document expressed using the second design style.

On the other hand, I probably would want information about a Book expressed using the second design style so that I can reap the benefits of the information being both presentable and parseable.

In general, if the information is to be used as machine instructions, then use the first design style. Otherwise, use design style #2.

XHTML is no longer just for client-side. It can have a significant role server-side. In adopting design style #2, information becomes seamlessly usable both client-side and server-side.

References

[1] A Pragmatic Path to the Semantic Web http://wiki.commerce.net/images/e/ea/CN-TR-06-01.pdf#search=%22Presentable%20and%20Parseable%20Information%22
[2] Presentable and Parseable Information http://www.xfront.com/presentable-parseable-information.html

XHTML Refresher

div: division, a way of indicating a "chunk of information"

span: an inline "chunk of information"

ul: unordered list

li: list item

class: nearly every XHTML element can have a class attribute. This attribute may be used to extend the semantics of an XHTML document.