XML, Warts and All

Learning to code with angle brackets, and loving it, can be easy if you get the perfect guide to help you. Megginson's Imperfect XML is that guide.

When you are up to your ass in alligators, it is difficult to remember that your original mission was to drain the swamp.
— Traditional engineering wisdom

Every other month in this space I try to review some book of general interest to developers. Often that is some high-minded philosophical tome from which you can learn the finer points of the software development mindset.

Sometimes, though, you just have to roll up your sleeves and get down in the muck. This month, the book I've chosen is from an author who's clearly right down there alongside you. David Megginson's Imperfect XML is an XML book for anyone who has ever tried to make angle brackets work in the real world and come away bruised and bloodied by the experience. If you haven't been through the battle yet, all the better; Megginson's advice can help you avoid some of the traps that are lurking out there for the unwary beginning XML developer.

Standards with a Grain of Salt
Imperfect XML starts by diving right into the morass of XML standards that litter the development landscape. We all know about the benefits of standards — interoperability, reuse, and abstraction — and Megginson disposes of those in just a few pages. Then he spends rather more time pointing out that standardization can also have drawbacks. You don't often see these bandied about, especially by members of standards committees, but they're real:

  • FUD: The mere existence of a standardization effort can stamp out innovation in a field.
  • Monoculture: Standardized software is vulnerable to standardized bugs.
  • Square pegs: You can probably name areas where XML has turned up even though it has no real business being there.
  • Abstraction: This is a two-edged sword; taking too abstract a view can hide low-level problems.

Megginson also surveys the major players in the XML standards world and introduces the most important specifications (at least, the ones that are most important in early 2005). If you're new to XML this will probably be good background reading; if not, you can just skim through it.

The most important message on standards is this: You need to choose the standards and specifications for your XML projects carefully, not have them chosen for you simply because they are specifications. Look at how widely the specification is supported, who wants to use it to share information with you, and what concrete benefits it will bring to your project. Above all, avoid the temptation to become compliant with a particular specification just to add another set of letters to your resume.

Get With the Plan
The second chapter is short but critical: It discusses planning your XML project. This is an area that often goes wrong, in part because developers tend to ignore the disruptive effects of XML technology on existing organizations. Many XML projects require changes to workflow and ways of doing things as well as to the way that things are coded. Here, Megginson emphasizes influencing users and getting people on board, as well as setting realistic expectations. He's also got a sobering section on the pitfalls that are lurking for many XML projects out there.

The XML Triumvirate
Megginson breaks XML projects down into three broad types, and part two of the book, "XML Implementations," devotes a solid section to each type:

  • XML documents, which use XML for storage but are ultimately designed for people to peruse
  • XML data, which are files designed for machines to pass around and munch on
  • XML networking, the use of XML over the wire (including the ubiquitous Web services)

Although some projects will be hybrids or stranger things, in all likelihood you can identify your own XML project as being one of these three types. Once you do, I suggest that you read the appropriate chapter. And then re-read it as necessary until the points sink in. If necessary, photocopy the pages and sleep with them under your pillow. This central section of the book is where you really get the benefit of someone else's pain. In the theoretical world, XML is the wonderful solution to everything. In the real world, there are the inevitable places where things just don't work out quite right, and you need to get out the duct tape and baling wire to make it all fit together. These chapters are your road maps to the pain points.

For example, XML documents are sometimes promoted as ideal for single-source publishing: write once, and then use transforms to publish to both print and Web with no further work. That's fine, until you realize that what works as one long book chapter makes for an atrociously long Web page with too much scrolling. Do you then introduce additional XML markup to indicate how to break up the Web version into multiple pages, at the cost of breaking the neat separation between form and content? Or do you accept the hideous Web page to preserve the cost savings? There's no fixed answer, but it's good to know that the issue is there, and to understand the tradeoffs up front.
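To make the tradeoff concrete, here is a minimal sketch in Python of the first option: adding presentation markup so the Web version can be paginated. The element names (chapter, para, pagebreak) and the split_into_pages helper are hypothetical, invented for illustration; they are not taken from the book.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter markup. The <pagebreak/> element is an assumption:
# it exists only to tell the Web transform where to start a new page.
CHAPTER = """<chapter title="Planning">
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
  <pagebreak/>
  <para>Third paragraph.</para>
</chapter>"""


def split_into_pages(chapter_xml):
    """Return a list of Web pages, each a list of paragraph strings."""
    root = ET.fromstring(chapter_xml)
    pages = [[]]
    for child in root:
        if child.tag == "pagebreak":
            # Start a new Web page at each marker.
            pages.append([])
        elif child.tag == "para":
            pages[-1].append(child.text or "")
    return pages


def print_paragraphs(chapter_xml):
    """The print pipeline ignores the markers and keeps one long chapter."""
    root = ET.fromstring(chapter_xml)
    return [p.text or "" for p in root.iter("para")]


pages = split_into_pages(CHAPTER)
print_version = print_paragraphs(CHAPTER)
```

The cost is visible in the source itself: the pagebreak element says nothing about the content, only about how one output format should look, which is exactly the mixing of form and content that single-source publishing was supposed to avoid.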

As for XML networking ... Megginson reviews everything from RSS to XML-RPC to RESTful ideas to Web services to grid computing to SOAP and a whole lot more, showing two things. First, there is a heck of a lot of activity in this area. Second, there is a lot of competition and a huge tug of war between completeness and simplicity going on, and it's not at all clear what will win out in the long run, and which styles of communication will end up in little backwaters used by only a few developers. This is an area in which to tread carefully for the time being.

For the Advanced Student
Megginson wraps up the book with a medley of topics that anyone doing serious XML work should at least be familiar with. If you're just getting started on your first few XML projects, you might want to set this part aside until later, but it's worth at least knowing what's waiting for you. These are the minefields that have been marked out so that you don't step in them by accident.

The first topic here is XML searching — not searching within XML files, per se, but using XML information to make regular full-text search results better. This chapter, I fear, suffers from a bit of myopia in that it mentions but doesn't emphasize the problem of deceptive markup. On the public Internet, any search mechanism that relies on metadata supplied by document authors will fall prey to the same forces that give us spam in all its forms; it's cheap to generate whatever markup appears to give good search engine results in the hopes of generating any clicks at all. On a private network, these techniques might prove to be useful and interesting.

The next chapter is devoted to XML and legacy information. This is an excellent chapter, discussing both converting the legacy data and alternatives such as placing an XML façade in front of the legacy data and using XML metadata as a sort of card catalog to locate legacy data. If you're faced with a large mass of old data to integrate into a modern system, the alternatives here may get your creative juices flowing.
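As a rough illustration of the façade idea, the following Python sketch leaves the legacy data where it is and builds an XML view of it on request. Everything specific here is an assumption made for the example: the CSV layout, the customers/customer/name/balance element names, and the legacy_as_xml function are not from the book.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for a legacy export; the columns are invented for this sketch.
LEGACY_CSV = (
    "id,name,balance\n"
    "1001,Acme Corp,250.00\n"
    "1002,Widget & Sons,75.50\n"
)


def legacy_as_xml(csv_text):
    """Build an XML view of legacy rows without converting the stored data."""
    root = ET.Element("customers")
    for row in csv.DictReader(io.StringIO(csv_text)):
        customer = ET.SubElement(root, "customer", id=row["id"])
        ET.SubElement(customer, "name").text = row["name"]
        ET.SubElement(customer, "balance").text = row["balance"]
    # ElementTree escapes characters such as "&" when serializing.
    return ET.tostring(root, encoding="unicode")


xml_view = legacy_as_xml(LEGACY_CSV)
```

Consumers see only the XML, so the legacy store can stay untouched or be migrated later without changing what they receive.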

The final chapter goes over XML performance and size. The problems are real, and Megginson shows what they are (and are not) and discusses how to work around them when they crop up. The important thing to remember is that those who dismiss XML out of hand as being fat and slow are not precisely correct.

Learning to Love the Angle Brackets
If you're like me, your own relationship with XML has not always been wonderful. Certainly it's become an unavoidable part of the development toolset these days, but it's also a darned nuisance to work with in many cases due to primitive tools (a situation that's been getting better in recent years). But the other thing that's been missing, at least from my own XML development life, has been a solid body of knowledge on XML best practices. This book goes a long way to fill that gap, and it's earned a spot on the bookshelf right next to my desk, where it will be handy the next time that a document sprinkled with angle brackets lands in my life.

Want to read more of Mike's work? Visit his Larkware site for daily updates at http://www.larkware.com.

About the Author

Mike Gunderloy, MCSE, MCSD, MCDBA, is a former MCP columnist and the author of numerous development books.
