Domain-Specific Tagging

versus

Universal Tags Plus Semantic Enhancers

Roger Costello

Introduction

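Below are examples of two "design styles". Both structure information about a book, but they do so in fundamentally different ways.

The objective of this document is to characterize the two design styles, describe the advantages and disadvantages of each, and provide guidance on when each should be used.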

Design Style #1: Structure information using domain-specific tags

<Book>
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>First Anchor Books</Publisher>
</Book>

Design Style #2: Structure information using universal tags (XHTML); enhance semantics with class attributes

<div class="Book">
    <span class="tag">Book</span>
    <ul>
        <li>
            <span class="Title">
                <span class="tag">Title</span>: <span class="value">The Wisdom of Crowds</span>
            </span>
        </li>
        <li>
            <span class="Author">
                <span class="tag">Author</span>: <span class="value"><cite>James Surowiecki</cite></span>
            </span>
        </li>
        <li>
            <span class="Date">
                <span class="tag">Date</span>: <span class="value">2005</span>
            </span>
        </li>
        <li>
            <span class="ISBN">
                <span class="tag">ISBN</span>: <span class="value">0-385-72170-6</span>
            </span>
        </li>
        <li>
            <span class="Publisher">
                <span class="tag">Publisher</span>: <span class="value"><cite>First Anchor Books</cite></span>
            </span>
        </li>
    </ul>
</div>

Note 1: The XHTML employed by design style #2 is "strict XHTML": content and presentation are completely separate.

Note 2: The above XHTML is one way of structuring the information using XHTML tags. If you can suggest a better way, using more semantically appropriate XHTML tags, please send me a note.

Characterize the two design styles

The first thing to notice is that both design styles structure the book information using tags. That is, they are both XML documents.

The first design style uses a collection of tags that presumably were created by a community of like-minded individuals, such as a publishing community. The tags are specific to the information.

The second design style uses a set of tags that are universal descriptors of documents. The tags are not specific to the information. The tags that were used in design style #1 have become class attribute values in design style #2. The class attribute is a mechanism for extending the semantics of the general tags.
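The relationship between the two styles can be sketched as a small conversion. The following is a minimal illustration, not the author's tooling: it assumes a flat record (one level of child elements) and uses Python's standard xml.etree.ElementTree module. The helper names tagged_span and to_style_2 are made up for this sketch, and it omits the <cite> elements that the hand-written example places around the author and publisher.

```python
import xml.etree.ElementTree as ET

# The design style #1 example from above.
STYLE_1 = """<Book>
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>First Anchor Books</Publisher>
</Book>"""


def tagged_span(name, parent):
    """Append <span class="tag">name</span> to parent and return the new span."""
    span = ET.SubElement(parent, "span", {"class": "tag"})
    span.text = name
    return span


def to_style_2(record):
    """Turn a flat style #1 record into a style #2 <div> (hypothetical helper)."""
    # The domain-specific tag name becomes the class attribute value.
    div = ET.Element("div", {"class": record.tag})
    tagged_span(record.tag, div)
    ul = ET.SubElement(div, "ul")
    for field in record:
        li = ET.SubElement(ul, "li")
        outer = ET.SubElement(li, "span", {"class": field.tag})
        label = tagged_span(field.tag, outer)
        label.tail = ": "  # visible separator between label and value
        value = ET.SubElement(outer, "span", {"class": "value"})
        value.text = field.text
    return div


style_2 = to_style_2(ET.fromstring(STYLE_1))
print(ET.tostring(style_2, encoding="unicode"))
```

Every tag name from design style #1 survives the conversion, but only as a class attribute value on a generic div or span.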

Note 3: Namespace-qualified tags in design style #1 become QName class attribute values in design style #2.
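As a rough illustration of Note 3, the sketch below maps namespace-qualified tag names to QName-style class values. The namespace URI http://example.org/publishing, the prefix pub, and the function qname_class are all hypothetical, and the "prefix:LocalName" form of the class value is an assumption about what a QName class value would look like.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen only for illustration.
PREFIXES = {"http://example.org/publishing": "pub"}


def qname_class(tag, prefixes):
    """Map an ElementTree tag such as '{uri}Title' to 'prefix:Title'."""
    if not tag.startswith("{"):
        return tag  # unqualified tag: the class value is just the name
    uri, local = tag[1:].split("}", 1)
    return f"{prefixes[uri]}:{local}"


doc = ET.fromstring(
    '<pub:Book xmlns:pub="http://example.org/publishing">'
    "<pub:Title>The Wisdom of Crowds</pub:Title>"
    "</pub:Book>"
)

# ElementTree reports qualified names as '{uri}local'; convert each one.
classes = [qname_class(el.tag, PREFIXES) for el in doc.iter()]
print(classes)
```

A consumer of the resulting XHTML would still need some way to resolve the prefix to its namespace URI; how that mapping is carried in the document is not covered here.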

Note 4: in addition to the "class" attribute, there is the "id" attribute, and on links (anchors) there are the "rel" and "rev" attributes. All of these are used to extend the semantics of an XHTML document.

Describe the advantages and disadvantages of each design style

Advantages of Design Style #1

Disadvantages of Design Style #1

Advantages of Design Style #2

Disadvantages of Design Style #2

Provide guidance on when each design style should be used

I probably would not want an XSLT document expressed using the second design style.

On the other hand, I probably would want information about a Book expressed using the second design style so that I can reap the benefits of the information being both presentable and parseable.

In general, if the information is to be used as machine instructions, then use the first design style. Otherwise, use design style #2.

XHTML is no longer just for client-side use; it can also play a significant role server-side. By adopting design style #2, information becomes seamlessly usable both client-side and server-side.
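To make the "parseable" half of this concrete, here is a minimal sketch of server-side code reading the design style #2 example back into plain data. It assumes the exact div/ul/li/span layout shown earlier and uses Python's standard xml.etree.ElementTree module; the function name read_record is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# The design style #2 example from above.
STYLE_2 = """<div class="Book">
    <span class="tag">Book</span>
    <ul>
        <li><span class="Title"><span class="tag">Title</span>: <span class="value">The Wisdom of Crowds</span></span></li>
        <li><span class="Author"><span class="tag">Author</span>: <span class="value"><cite>James Surowiecki</cite></span></span></li>
        <li><span class="Date"><span class="tag">Date</span>: <span class="value">2005</span></span></li>
        <li><span class="ISBN"><span class="tag">ISBN</span>: <span class="value">0-385-72170-6</span></span></li>
        <li><span class="Publisher"><span class="tag">Publisher</span>: <span class="value"><cite>First Anchor Books</cite></span></span></li>
    </ul>
</div>"""


def read_record(div):
    """Return (record type, {field name: value}) for a style #2 <div>."""
    record = {}
    for field in div.findall("./ul/li/span"):
        value = field.find("./span[@class='value']")
        # itertext() also collects text nested inside <cite>.
        record[field.get("class")] = "".join(value.itertext()).strip()
    return div.get("class"), record


kind, book = read_record(ET.fromstring(STYLE_2))
print(kind, book)
```

The same markup that a browser can render directly yields the record type and the five fields that design style #1 expresses with dedicated tags.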

References

  1. A Pragmatic Path to the Semantic Web http://wiki.commerce.net/images/e/ea/CN-TR-06-01.pdf#search=%22Presentable%20and%20Parseable%20Information%22
  2. Presentable and Parseable Information http://www.xfront.com/presentable-parseable-information.html

XHTML Refresher

div: division, a way of indicating a "chunk of information"
span: an inline "chunk of information"
ul: unordered list
li: list item
class: nearly every XHTML element can have a class attribute. This attribute may be used to extend the semantics of an XHTML document.