An XML Document is Pure Text

An XML document consists purely of text. That is, the content of an XML document is a string of characters. There are no integers in an XML document. There are no floating point values in an XML document. There are only characters.

Example

Here is a simple XML document. It would appear that the value of the <x> element is an integer:

   <?xml version="1.0"?>
   <x>23</x>

However, that is not the case. The 23 is two characters, 2 and 3.

You can prove to yourself that they are indeed characters by viewing the document in an editor that allows you to see the byte values that are used to encode the XML file. In the graphic below each line of the XML file is shown, along with its bytes (displayed in hex):

[Figure: Hex values of the XML document]

You can see that the byte values of 23 are hex 32 and hex 33, which correspond, in UTF-8, to the character 2 and the character 3.
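If you do not have a hex editor at hand, the same check can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for this example, and the document is held in a string rather than read from a file.

```python
# The XML document from the example, held as text (illustrative variable name).
document = '<?xml version="1.0"?>\n<x>23</x>\n'

# Encode the text as UTF-8 to obtain the bytes that would be stored in the file.
encoded = document.encode("utf-8")

# Locate the element content "23" and collect the hex value of each of its bytes.
start = encoded.index(b"23")
content_bytes = encoded[start:start + 2]
hex_values = [format(byte, "02x") for byte in content_bytes]

print(hex_values)  # ['32', '33']
```

Each of the two bytes is the UTF-8 code for one character, which is what the hex view of the file shows.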

Compare this with the integer value 23: its binary value is 10111 (hex 17). If the 23 in the XML file really were an integer, then its byte value would be 10111 (hex 17). [Note: an alternative way of storing integers is Binary Coded Decimal.]
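The difference between the integer 23 and the text 23 can be made concrete with a short Python sketch. The names are illustrative, and the Binary Coded Decimal line assumes the packed form (one decimal digit per 4-bit nibble), which is only one of several BCD layouts.

```python
value = 23

# The integer 23 expressed in binary and in hex.
binary_digits = format(value, "b")   # '10111'
hex_digits = format(value, "x")      # '17'

# The integer stored in a single byte: hex 17.
as_integer_byte = value.to_bytes(1, "big")

# The text "23" stored as UTF-8: two bytes, hex 32 and hex 33.
as_text_bytes = "23".encode("utf-8")

# Packed BCD (assumed layout): digit 2 in the high nibble, digit 3 in the low nibble.
bcd_byte = bytes([(2 << 4) | 3])

print(as_integer_byte.hex(), as_text_bytes.hex(), bcd_byte.hex())  # 17 3233 23
```

The three encodings produce three different byte sequences. Only the middle one, the text encoding, is what appears in the XML file.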

Thus, an XML document is pure text, and you use a text editor to create, edit, and read an XML document.

Manipulating XML

Consider manipulating an XML document using XSLT. Here is an XSLT statement that multiplies the value of the <x> element by the number 2:

    <xsl:value-of select="x * 2"/>

How can the two characters 23 be multiplied by an integer 2?

Answer: the two characters are first converted into an integer.

Where this conversion takes place depends on whether XML Schema validation is involved. An XML Schema validator will create a model (the PSVI, Post-Schema-Validation Infoset) of the XML document, and the validator may put into the model both the text value "23" and its converted (typed) integer value (hex 17). An XSLT 2.0 processor may use the PSVI's typed value directly to perform the multiplication. If no PSVI is available, then the XSLT processor itself may perform the conversion.
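The conversion step can be imitated outside XSLT. The following Python sketch uses the standard library's xml.etree.ElementTree to play the role of a processor that has no PSVI available. It is not how any particular XSLT processor is implemented, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Parse the example document; no schema is involved, so no typed value exists.
root = ET.fromstring('<?xml version="1.0"?>\n<x>23</x>')

# The parser hands back the element content as text: the two characters "23".
text_value = root.text

# The processor must convert the characters to a number before it can multiply.
typed_value = int(text_value)
result = typed_value * 2

print(type(text_value).__name__, result)  # str 46
```

The parser delivers only characters. The integer comes into existence at the explicit conversion, which corresponds to the step the XSLT processor performs on your behalf.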

Declaring an Element's Datatype in a Schema

Consider an XML Schema that declares the element <x> as an integer:

   <element name="x" type="integer"/>

This element declaration is not stating: The value of the element <x> in an XML instance document is an integer.

Rather, it is stating:

Why is an XML Document Pure Text?

XML was designed to be a human-readable language for marking up data.

Because XML documents are pure text, you can use any text editor to create, edit, and read your data. For example, you can view the 23 in the above XML document.

Because XML documents are pure text, different processors can act on the information in ways unique to the processor:

[Text is a lowest-common-denominator, consumable by varied applications.]

Because an XML document is pure text, it can be read by humans. And because XML organizes the text into a hierarchical structure, you can make some sense of the text even if it is in a language that you don't understand.

Acknowledgements

The following people contributed to the creation of this document:

Last Updated: September 27, 2007