XML, Warts and All
Learning to code with angle brackets, and loving it, can be easy if you get the perfect guide to help you. Megginson's Imperfect XML is that guide.
When you are up to your ass in alligators, it is difficult to remember
that your original mission was to drain the swamp.
Traditional engineering wisdom
Every other month in this space I try to review some book of general
interest to developers. Often that is some high-minded philosophical tome
from which you can learn the finer points of the software development
mindset.
Sometimes, though, you just have to roll up your sleeves and get down
in the muck. This month, the book I've chosen is from an author who's
clearly right down there alongside you. David Megginson's Imperfect
XML is an XML book for anyone who has ever tried to make angle
brackets work in the real world and come away bruised and bloodied by
the experience. If you haven't been through the battle yet, all the better;
Megginson's advice can help you avoid some of the traps that are lurking
out there for the unwary beginning XML developer.
Standards with a Grain of Salt
Imperfect XML starts by diving right into the morass of
XML standards that litter the development landscape. We all know about
the benefits of standards (interoperability, reuse, and abstraction),
and Megginson disposes of those in just a few pages. Then he spends
rather more time pointing out that standardization can also have drawbacks.
You don't often see these bandied about, especially by members of standards
committees, but they're real:
- FUD: The mere existence of a standardization effort can stamp
out innovation in a field.
- Monoculture: Standardized software is vulnerable to standardized
bugs.
- Square pegs: You can probably name areas where XML has turned
up even though it has no real business being there.
- Abstraction: A two-edged sword; taking too abstract a view
can hide low-level problems.
Megginson also surveys the major players in the XML standards world and
introduces the most important specifications (at least, the ones that
are most important in early 2005). If you're new to XML this will probably
be good background reading; if not, you can just skim through it.
The most important message on standards is this: You need to choose the
standards and specifications for your XML projects carefully, not have
them chosen for you simply because they are specifications. Look at how
widely the specification is supported, who wants to use it to share information
with you, and what concrete benefits it will bring to your project. Above
all, avoid the temptation to become compliant with a particular specification
just to add another set of letters to your resume.
Get With the Plan
The second chapter is short but critical: It discusses planning
your XML project. This is an area that often goes wrong, in part because
developers tend to ignore the disruptive effects of XML technology on
existing organizations. Many XML projects require changes to workflow
and ways of doing things as well as to the way that things are coded.
Here, Megginson emphasizes influencing users and getting people on board,
as well as setting realistic expectations. He's also got a sobering section
on the pitfalls that are lurking for many XML projects out there.
The XML Triumvirate
Megginson breaks XML projects down into three broad types, and
part two of the book, "XML Implementations," devotes a solid
section to each type:
- XML documents, which use XML for storage but are ultimately designed
for people to peruse
- XML data, which are files designed for machines to pass around and
munch on
- XML networking, the use of XML over the wire (including the ubiquitous
Web services)
Although some projects will be hybrids or stranger things, in all likelihood
you can identify your own XML project as being one of these three types.
Once you do, I suggest that you read the appropriate chapter. And then
re-read it as necessary until the points sink in. If necessary, photocopy
the pages and sleep with them under your pillow. This central section
of the book is where you really get the benefit of someone else's pain.
In the theoretical world, XML is the wonderful solution to everything.
In the real world, there are the inevitable places where things just don't
work out quite right, and you need to get out the duct tape and baling
wire to make it all fit together. These chapters are your road maps to
the pain points.
For example, XML documents are sometimes promoted as ideal for single-source
publishing: write once, and then use transforms to publish to both print
and Web with no further work. That's fine, until you realize that what
works as one long book chapter makes for an atrociously long Web page
with too much scrolling. Do you then introduce additional XML markup to
indicate how to break up the Web version into multiple pages, at the cost
of breaking the neat separation between form and content? Or do you accept
the hideous Web page to preserve the cost savings? There's no fixed answer,
but it's good to know that the issue is there, and to understand the tradeoffs
up front.
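To make that tradeoff concrete, here is a minimal sketch in Python of the first option: adding a page-break hint to the source markup. The chapter vocabulary and the web-break attribute are invented for illustration; they are not taken from the book or from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter markup; the web-break attribute is an invented
# presentation hint, not part of any standard vocabulary.
CHAPTER = """
<chapter title="Planning">
  <section title="Goals"><para>Set expectations.</para></section>
  <section title="People" web-break="yes"><para>Get users on board.</para></section>
  <section title="Pitfalls"><para>Watch the workflow.</para></section>
</chapter>
"""

def render_section(section):
    """Render one section as a list of plain-text lines."""
    lines = [section.get("title")]
    lines.extend(para.text for para in section.findall("para"))
    return lines

def render_print(chapter):
    """Print output: the whole chapter as one continuous document."""
    lines = [chapter.get("title")]
    for section in chapter.findall("section"):
        lines.extend(render_section(section))
    return "\n".join(lines)

def render_web(chapter):
    """Web output: start a new page at each section flagged web-break."""
    pages = [[]]
    for section in chapter.findall("section"):
        if section.get("web-break") == "yes" and pages[-1]:
            pages.append([])
        pages[-1].extend(render_section(section))
    return ["\n".join(page) for page in pages]

chapter = ET.fromstring(CHAPTER)
print_text = render_print(chapter)
web_pages = render_web(chapter)
```

The print renderer ignores the hint entirely, while the Web renderer depends on it. That is exactly the leak of presentation into content that the paragraph above describes: the source file now knows something about one of its output formats.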
As for XML networking ... Megginson reviews everything from RSS to XML-RPC
to RESTful ideas to Web services to grid computing to SOAP and a whole
lot more, showing two things. First, there is a heck of a lot of activity
in this area. Second, there is a lot of competition and a huge tug of
war between completeness and simplicity going on, and it's not at all
clear what will win out in the long run, and which styles of communication
will end up in little backwaters used by only a few developers. This is
an area in which to tread carefully for the time being.
For the Advanced Student
Megginson wraps up the book with a medley of topics that anyone
doing serious XML work should at least be familiar with. If you're just
getting started on your first few XML projects, you might want to set
this part aside until later, but it's worth at least knowing what's waiting
for you. These are the minefields that have been marked out so that you
don't step in them by accident.
The first topic here is XML searching: not searching within XML
files, per se, but using XML information to make regular full-text search
results better. This chapter suffers from a bit of myopia in
that it mentions but doesn't emphasize the problem of deceptive markup.
For the public Internet, I fear, any search mechanism that relies on metadata
supplied by document authors will fall prey to the same forces that give
us spam in all its forms; it's cheap to generate whatever markup appears
to give good search engine results in the hopes of generating any clicks
at all. On a private network, these techniques might prove to be useful
and interesting.
The next chapter is devoted to XML and legacy information. This is an
excellent chapter, discussing both converting the legacy data and alternatives
such as placing an XML façade in front of the legacy data and using
XML metadata as a sort of card catalog to locate legacy data. If you're
faced with a large mass of old data to integrate into a modern system,
the alternatives here may get your creative juices flowing.
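As a rough illustration of the façade idea, the sketch below leaves the legacy data in its native format and produces XML only when a record is requested. It is written in Python, and the CSV layout, field names, and the as_xml function are all assumptions made up for this example, not anything prescribed by the book.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for a legacy store; in practice this might be a flat file or
# an old database export. The field names are invented for illustration.
LEGACY_CSV = "id,name,balance\n17,Acme Ltd,1250.00\n42,Widgets & Co,310.50\n"

def legacy_records():
    """Read records from the legacy store in its native format."""
    return csv.DictReader(io.StringIO(LEGACY_CSV))

def as_xml(record_id):
    """Facade: return one legacy record as an XML string, or None."""
    for row in legacy_records():
        if row["id"] == record_id:
            customer = ET.Element("customer", id=row["id"])
            ET.SubElement(customer, "name").text = row["name"]
            ET.SubElement(customer, "balance").text = row["balance"]
            return ET.tostring(customer, encoding="unicode")
    return None

xml_text = as_xml("42")
```

Nothing is converted or migrated here; callers simply see XML, and the serializer takes care of escaping characters such as the ampersand in the legacy name. The cost is that every request pays for the translation, which is one of the tradeoffs to weigh against a one-time conversion.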
The final chapter goes over issues of XML performance and size. There
are issues here, and Megginson shows what they are (and are not) and discusses
how to work around them when they crop up. The important thing to remember
is that those who dismiss XML out of hand as being fat and slow are not
precisely correct.
Learning to Love the Angle Brackets
If you're like me, your own relationship with XML has not always
been wonderful. Certainly it's become an unavoidable part of the development
toolset these days, but it's also a darned nuisance to work with in many
cases due to primitive tools (a situation that's been getting better in
recent years). But the other thing that's been missing, at least from
my own XML development life, has been a solid body of knowledge on XML
best practices. This book goes a long way to fill that gap, and it's earned
a spot on the bookshelf right next to my desk, where it will be handy
the next time that a document sprinkled with angle brackets lands in my
life.
Want to read more of Mike's work? Visit his Larkware site for daily
updates at http://www.larkware.com.
About the Author
Mike Gunderloy, MCSE, MCSD, MCDBA, is a former MCP columnist and the author of numerous development books.