Michael Kay on Application Processing
Michael Kay is one of the XML community's most respected and influential leaders. During the first part of December 2008 there was a discussion on the xml-dev list on the topic of application processing. Below is a summary of Michael Kay's remarks.
You can read more about Michael Kay's ideas on this topic by reading Building Workflow Applications with XML and XQuery.
Q What language(s) do you recommend using for application processing?
... do everything using XML based processing languages (XSLT and XQuery).
The idea I'm pushing is that you write the application end-to-end using XML based languages. That means that you never convert the data into C++ or Java data types. If you only go half way, by calling XQuery from Java and then converting the results to Java values, you lose half the benefits.
Q By "write the application end-to-end using XML based languages" do you mean XML end-to-end or PSVI end-to-end?
I'm not too fussed how the XML is represented, the important thing is to avoid converting it unnecessarily into non-XML models such as Java objects or SQL tables. Where "unnecessarily" means "except as required by the need to interface with other applications".
Q Isn't XSLT a special-purpose programming language?
It's true that XSLT is a specialized language rather than a general-purpose language. But it's capable of a lot more than some people imagine. Some people never really get below the surface to discover its depth.
Q What's wrong with using a mix of XML based processing languages and imperative languages (Java, C++, etc.)?
[If you use a mix of languages then you end up] spending 75% of your programming effort converting your data between one type system and another.
Writing your application in a single language, if you can do it, will always give you a substantial reduction in complexity versus using multiple languages.
And writing in a language that is well adapted to the data it is required to process will save you a lot of effort (human and machine effort) in doing data conversion.
Q Aren't there some applications for which an imperative language is needed? Aren't some problems so complex that they need an imperative language?
There are no applications that need imperative programming (functional programming has provably the same computational power). Whether things are more easily done that way is in large measure a matter of your skills and experience. (I remember working with programmers from an older generation who claimed coding was easier if they used GOTO statements.) There are one or two things I still find easier in imperative languages - notably some graph-walking applications - but they are few and far between.
Many problems that appear to be so complex that you need an imperative language turn out, on examination, to be complex only because you are using an imperative language.
Q I agree that functional languages like XSLT are quite useful. But many of the underlying hardware architectures are von Neumann based and von Neumann machines work best with imperative programs (languages).
If we believed that we would all still be writing in Assembler, or at any rate using GOTO statements.
Q Isn't application processing faster using an imperative language? Aren't imperative languages easier for programmers to use?
... on both ease-of-use and performance I would go for XSLT or XQuery in preference to lower-level languages every time.
If you're receiving lexical XML from a web service, the time taken to process it in XSLT or XQuery is usually less than the time taken to parse and validate it. I would take a lot of convincing that a data binding approach is likely to be faster, given the cost of marshalling and unmarshalling the data.
And on ease of use, I've seen programmers struggling with regenerating all their Java classes when the schema changes and it's horrendous. (Worse, I've seen people refuse to change the schema because it has become too expensive to contemplate!) Having two different models of the same data, understanding how they relate, and organizing yourself to keep them in sync is simply complexity that you don't need.
Q XQuery can be usable when you need to access a small subset of an XML document. However, when one needs to access most of the data, or, worse, access the same data many times, data binding will have a speed/memory advantage.
Evidence please! I don't see any reason why it should.
(There are still people who maintain that coding in assembler is faster than in C, or that coding in C is faster than in Java. All the evidence is that there are very few people with the skills in the lower-level language to beat the optimizers for the higher-level language. It can be done if you try hard enough, but not by the average programmer.)
Q For most web service applications the Java (or other programming language) representation is actually primary, and XML is just being used for interchange. So surely application processing should be done in an imperative language, right?
I don't think it matters which is primary, the complexity comes from having two representations and keeping them aligned. But I would have thought the format used for data interchange is primary in the sense that it needs to be agreed with other parties, whereas the Java representation is under local control. (Unless of course you're running the same code at both ends, in which case I'm not sure why you're using XML at all.)
Q Suppose I have a web page that contains a form that users fill in. When the submit button is pressed the form data is sent to a server. Don't we need something like Java to receive the data? Doesn't this break the "all-xml" approach you advocate?
It doesn't really matter too much if you have a bit of Java glue to accept the input and fire off the [XSLT] transformation, so long as it doesn't try to manipulate the data. But why use Java for this? There are plenty of higher-level frameworks that will do the job - since you're using forms, Orbeon does the job nicely.
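As a rough illustration of the "bit of Java glue" described above, the sketch below accepts submitted form data as XML and hands it straight to a stylesheet through the JDK's standard javax.xml.transform API (which provides XSLT 1.0). The class name XsltGlue, the <order> payload, and the stylesheet are all hypothetical, invented for this example; the only point is that the Java code passes the data through without inspecting or converting it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical glue class: it receives XML and fires off a transformation,
// but never converts the data into Java objects.
class XsltGlue {

    // Hypothetical stylesheet; a real application would load it from a file.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/order'>",
        "    <confirmation><xsl:value-of select='quantity'/> x <xsl:value-of select='item'/></confirmation>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to the XML text and returns the serialized result.
    static String transform(String xml, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Stand-in for the form data posted by the browser (assumed shape).
        String form = "<order><item>widget</item><quantity>3</quantity></order>";
        System.out.println(transform(form, STYLESHEET));
    }
}
```

All of the application logic here lives in the stylesheet; changing what happens to the form data means editing XSLT, not Java.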
Q Why is there such variance in data binding tools with respect to mapping XML Schema structures into data structures in an imperative language?
XSD is used for other things [in addition to validation], notably data binding. To a large extent data binding is outside the scope of the XML Schema specification itself (XML Schema doesn't tell you how its types map to Java or C++) and this probably explains why there is wide variation between products in this area.
Q I would be interested to hear about any non-trivial and practical applications, other than the "XML in, XML out" format-changing kind, which are written exclusively in XSLT/XQuery.
How about an application for managing the creation, processing, and review of capital spending proposals in a large international corporation? The proposals are entered by form-filling using XForms, and each proposal is an XML document in an XML database. The rules defining the approval process (based on the nature of the capital project) are defined in a business rules document, also XML (for example "anything over $1m requires CEO approval"), and the abstract roles defined in that document (such as "CEO", or "finance controller, Taiwan") are mapped to real people in another XML document generated by a transformation of an XML dump of the LDAP directory. There's also an XML document that defines the corporate reporting structure, or actually two structures, one functional and one geographic. These rules are used to construct an approval schedule for the proposal, which is also stored in the XML database, and emails are sent to the relevant people at the relevant times, with clickable URLs that they can use to access the application and approve or deny requests, or ask for more information. There's a full query/reporting system built using the same technology: you define a query by form-filling using XForms, and this generates an XQuery to get the data from the database and an XSLT stylesheet to format the query results.
I believe the only parts of this application which are not written in XSLT, XQuery, XForms, or the Orbeon XML pipeline language (a precursor to XProc) are one or two simple extension functions to do things like sending an email or translating Base64 data from the LDAP directory.
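To make the idea of rules held as XML a little more concrete, here is a minimal sketch of how a rule such as "anything over $1m requires CEO approval" could be evaluated by a stylesheet to produce an approval schedule. Everything in it is an assumption made for illustration: the element and attribute names (rule, over, approver, proposal, amount, schedule), the class name ApprovalSchedule, and the use of the JDK's built-in XSLT 1.0 processor as the host. The application described above kept its documents in an XML database; here the two documents are simply wrapped in one hypothetical <case> element so that a single transformation can see both.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: business rules and proposals are both XML,
// and the decision logic lives entirely in the stylesheet.
class ApprovalSchedule {

    // Assumed rules vocabulary: each rule names a role that must approve
    // any proposal whose amount exceeds the "over" threshold.
    static final String RULES =
        "<rules>"
      + "<rule over='1000000' approver='CEO'/>"
      + "<rule over='0' approver='finance controller'/>"
      + "</rules>";

    // Selects every rule whose threshold the proposal amount exceeds.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/case'>",
        "    <xsl:variable name='amount' select='number(proposal/amount)'/>",
        "    <schedule>",
        "      <xsl:for-each select='rules/rule[$amount &gt; @over]'>",
        "        <approver role='{@approver}'/>",
        "      </xsl:for-each>",
        "    </schedule>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Both arguments must be XML fragments without an XML declaration,
    // because they are wrapped textually in a <case> element.
    static String schedule(String rulesXml, String proposalXml)
            throws TransformerException {
        String input = "<case>" + rulesXml + proposalXml + "</case>";
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(input)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String proposal =
            "<proposal><title>New plant</title><amount>2500000</amount></proposal>";
        System.out.println(schedule(RULES, proposal));
    }
}
```

Changing the approval policy in this sketch means editing the rules document, not the code, which is the property the description above relies on.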
Last Updated: December 07, 2008