Author: Brett McLaughlin

XML has been the biggest buzzword in the Internet community for the past year. But how do you cut through all the hype and actually put it to work? Java revolutionized the programming world by providing a platform-independent programming language. XML takes the revolution a step further by providing a platform-independent language for interchanging data. Java and XML share many features that are ideal for building Web-based enterprise applications, such as platform independence, extensibility, reusability, and global language (Unicode) support, and both are based on industry standards. Together, Java and XML allow enterprises to simplify information sharing and data exchange and to lower their cost. Java and XML shows you how to put the two together, building real-world applications in which both the code and the data are truly portable.

This book covers:
* The basics of XML
* Using standard Java APIs to parse XML
* Designing new document types using DTDs and Schemas
* Writing programs that generate XML data
* Transforming XML into different forms using XSL transformations (XSL/T)
* XML-RPC
* Using a web publishing framework like Apache Cocoon
* XML as a configuration language

This is the first book to cover the most recent versions of the DOM specification (DOM 2), the SAX API (SAX 2), and Sun's Java API for XML.

ISBN: 0596001975
Publisher: O'Reilly
Publish Year: 2001
Language: English
Pages: 428
File Format: PDF
File Size: 5.2 MB
Java & XML, 2nd Edition
Brett McLaughlin
Publisher: O'Reilly
Second Edition September 2001
ISBN: 0-596-00197-5, 528 pages

New chapters on Advanced SAX, Advanced DOM, SOAP and data binding, as well as new examples throughout, bring the second edition of Java & XML thoroughly up to date. Except for a concise introduction to XML basics, the book focuses entirely on using XML from Java applications. It's a worthy companion for Java developers working with XML or involved in messaging, web services, or the new peer-to-peer movement.
Table of Contents

Preface
    Organization
    Who Should Read This Book?
    Software and Versions
    Conventions Used in This Book
    Comments and Questions
    Acknowledgments
1. Introduction
    1.1 XML Matters
    1.2 What's Important?
    1.3 The Essentials
    1.4 What's Next?
2. Nuts and Bolts
    2.1 The Basics
    2.2 Constraints
    2.3 Transformations
    2.4 And More...
    2.5 What's Next?
3. SAX
    3.1 Getting Prepared
    3.2 SAX Readers
    3.3 Content Handlers
    3.4 Error Handlers
    3.5 Gotcha!
    3.6 What's Next?
4. Advanced SAX
    4.1 Properties and Features
    4.2 More Handlers
    4.3 Filters and Writers
    4.4 Even More Handlers
    4.5 Gotcha!
    4.6 What's Next?
5. DOM
    5.1 The Document Object Model
    5.2 Serialization
    5.3 Mutability
    5.4 Gotcha!
    5.5 What's Next?
6. Advanced DOM
    6.1 Changes
    6.2 Namespaces
    6.3 DOM Level 2 Modules
    6.4 DOM Level 3
    6.5 Gotcha!
    6.6 What's Next?
7. JDOM
    7.1 The Basics
    7.2 PropsToXML
    7.3 XMLProperties
    7.4 Is JDOM a Standard?
    7.5 Gotcha!
    7.6 What's Next?
8. Advanced JDOM
    8.1 Helpful JDOM Internals
    8.2 JDOM and Factories
    8.3 Wrappers and Decorators
    8.4 Gotcha!
    8.5 What's Next?
9. JAXP
    9.1 API or Abstraction
    9.2 JAXP 1.0
    9.3 JAXP 1.1
    9.4 Gotcha!
    9.5 What's Next?
10. Web Publishing Frameworks
    10.1 Selecting a Framework
    10.2 Installation
    10.3 Using a Publishing Framework
    10.4 XSP
    10.5 Cocoon 2.0 and Beyond
    10.6 What's Next?
11. XML-RPC
    11.1 RPC Versus RMI
    11.2 Saying Hello
    11.3 Putting the Load on the Server
    11.4 The Real World
    11.5 What's Next?
12. SOAP
    12.1 Starting Out
    12.2 Setting Up
    12.3 Getting Dirty
    12.4 Going Further
    12.5 What's Next?
13. Web Services
    13.1 Web Services
    13.2 UDDI
    13.3 WSDL
    13.4 Putting It All Together
    13.5 What's Next?
14. Content Syndication
    14.1 The Foobar Public Library
    14.2 mytechbooks.com
    14.3 Push Versus Pull
    14.4 What's Next?
15. Data Binding
    15.1 First Principles
    15.2 Castor
    15.3 Zeus
    15.4 JAXB
    15.5 What's Next?
16. Looking Forward
    16.1 XLink
    16.2 XPointer
    16.3 XML Schema Bindings
    16.4 And the Rest...
    16.5 What's Next?
A. API Reference
    A.1 SAX 2.0
    A.2 DOM Level 2
    A.3 JAXP 1.1
    A.4 JDOM 1.0 (Beta 7)
B. SAX 2.0 Features and Properties
    B.1 Core Features
    B.2 Core Properties
Colophon
Java & XML, 2nd Edition

Preface

When I wrote the preface to the first edition of Java & XML just over a year ago, I had no idea what I was getting into. I made jokes about XML appearing on hats and t-shirts; yet as I sit writing this, I'm wearing a t-shirt with "XML" emblazoned across it, and yes, I have a hat with XML on it also (in fact, I have two!). So, the promise of XML has been recognized, without any doubt. And that's good.

However, it has meant that more development is occurring every day, and the XML landscape is growing at a pace I never anticipated, even in my wildest dreams. While that's great for XML, it has made looking back at the first edition of this book somewhat depressing; why is everything so out of date? I talked about SAX 2.0 and DOM Level 2 as mere twinkles in the eye. They are now industry standards. I introduced JDOM, and now it's going through Sun's JSR (Java Specification Request) process. I hadn't even looked at SOAP, UDDI, WSDL, and XML data binding. They take up three chapters in this edition! Things have changed, to say the least.

If you're even remotely suspicious that you may have to work with XML in the next few months, this book can help. And if you've got the first edition lying somewhere on your desk at work right now, I invite you to browse the new one; I think you'll see that this book is still important to you. I've thrown out all the excessive descriptions of basic concepts, condensed the basic XML material into a single chapter, and rewritten nearly every example; I've also added many new examples and chapters. In other words, I tried to make this an in-depth technical book with lots of grit. It will take beginners a little longer, as I do less handholding, but you'll find there is much more knowledge to be gained.

Organization

This book is structured in a very particular way: the first half of the book, Chapter 1 through Chapter 8, focuses on grounding you in XML and the core Java APIs for handling XML. For each of the three XML manipulation APIs (SAX, DOM, and JDOM), I'll give you a chapter on the basics, and then a chapter on more advanced concepts. Chapter 9 is a transition chapter, starting to move up the XML "stack" a bit. It covers JAXP, which is an abstraction layer over SAX and DOM. The remainder of the book, Chapter 10 through Chapter 15, focuses on specific XML topics that are continually brought up at conferences and tutorials I am involved with, and seeks to get you neck-deep in using XML in your applications. These topics include new chapters on SOAP, data binding, and an updated look at business-to-business. Finally, there are two appendixes to wrap up the book. The summary of this content is as follows:

Chapter 1
We will look at what all the hype is about, examine the XML alphabet soup, and spend time discussing why XML is so important to the present and future of enterprise development.
Chapter 2
This is a crash course in XML basics, from XML 1.0 to DTDs and XML Schema to XSLT to Namespaces. For readers of the first edition, this is the sum total (and then some) of all the various chapters on working with XML.

Chapter 3
The Simple API for XML (SAX), our first Java API for handling XML, is introduced and covered in this chapter. The parsing lifecycle is detailed, and the events that can be caught by SAX and used by developers are demonstrated.

Chapter 4
We'll push further with SAX in this chapter, covering less-used but still powerful items in the API. You'll find out how to use XML filters to chain callback behavior, use XML writers to output XML with SAX, and look at some of the less commonly used SAX handlers like LexicalHandler and DeclHandler.

Chapter 5
This chapter moves on through the XML landscape to the next Java and XML API, the DOM (Document Object Model). You'll learn DOM basics, find out what is in the current specification (DOM Level 2), and learn how to read and write DOM trees.

Chapter 6
Moving on through DOM, you'll learn about the various DOM modules like Traversal, Range, Events, CSS, and HTML. We'll also look at what the new version, DOM Level 3, offers and how to use these new features.

Chapter 7
This chapter introduces JDOM and describes how it is similar to and different from DOM and SAX. It covers reading and writing XML using this API.

Chapter 8
In a closer examination of JDOM, we'll look at practical applications of the API, how JDOM can use factories with your own JDOM subclasses, and JAXP integration. You'll also see XPath in action in tandem with JDOM.

Chapter 9
Now a full-fledged API with support for parsing and transformations, JAXP merits its own chapter. Here, we'll look at both the 1.0 and 1.1 versions, and you'll learn how to use this API to its fullest.
Chapter 10
This chapter looks at what a web publishing framework is, why it matters to you, and how to choose a good one. We then cover the Apache Cocoon framework, taking an in-depth look at its feature set and how it can be used to serve highly dynamic content over the Web.

Chapter 11
In this chapter, we'll cover Remote Procedure Calls (RPC), its relevance in distributed computing as compared to RMI, and how XML makes RPC a viable solution for some problems. We'll then look at using XML-RPC Java libraries and building XML-RPC clients and servers.

Chapter 12
In this chapter, we'll look at SOAP, the Simple Object Access Protocol, and see how it expands upon what XML-RPC provides, particularly as it relates to distributed systems and web services.

Chapter 13
Continuing the discussions of SOAP and web services, this chapter details two important technologies, UDDI and WSDL.

Chapter 14
Continuing in the vein of business-to-business applications, this chapter introduces another way for businesses to interoperate, using content syndication. You'll learn about Rich Site Summary, building information channels, and even a little Perl.

Chapter 15
Moving up the XML "stack," this chapter covers one of the higher-level Java and XML APIs, XML data binding. You'll learn what data binding is, how it can make working with XML a piece of cake, and the current offerings. I'll look at three frameworks: Castor, Zeus, and Sun's early access release of JAXB, the Java Architecture for XML Data Binding.

Chapter 16
This chapter points out some of the interesting things coming up over the horizon, and lets you in on some extra knowledge on each. Some of these guesses may be completely off; others may be the next big thing.

Appendix A
This appendix details all the classes, interfaces, and methods available for use in the SAX, DOM, JAXP, and JDOM APIs.
Appendix B
This appendix details the features and properties available to SAX 2.0 parser implementations.

Who Should Read This Book?

This book is based on the premise that XML is quickly becoming (and to some extent has already become) an essential part of Java programming. The chapters instruct you in the use of XML and Java, and other than in Chapter 1, they do not focus on whether you should use XML. If you are a Java developer, you should use XML, without question. For this reason, if you are a Java programmer, want to be a Java programmer, manage Java programmers, or are associated with a Java project, this book is for you. If you want to advance, become a better developer, write cleaner code, or have projects succeed on time and under budget; if you need to access legacy data, need to distribute system components, or just want to know what the XML hype is about, this book is for you.

I tried to make as few assumptions about you as possible; I don't believe in setting the entry point for XML so high that it is impossible to get started. However, I also believe that if you spent your money on this book, you want more than the basics. For this reason, I only assumed that you know the Java language and understand some server-side programming concepts (such as Java servlets and Enterprise JavaBeans). If you have never coded Java before or are just getting started with the language, you may want to read Learning Java by Pat Niemeyer and Jonathan Knudsen (O'Reilly) before starting this book. I do not assume that you know anything about XML, and start with the basics. However, I do assume that you are willing to work hard and learn quickly; for this reason we move rapidly through the basics so that the bulk of the book can deal with advanced concepts. Material is not repeated unless appropriate, so you may need to reread previous sections or flip back and forth as we use previously covered concepts in later chapters.
If you know some Java, want to learn XML, and are prepared to enter some example code into your favorite editor, you should be able to get through this book without any real problem.

Software and Versions

This book covers XML 1.0 and the various XML vocabularies in their latest form as of July of 2001. Because various XML specifications covered are not final, there may be minor inconsistencies between printed publications of this book and the current version of the specification in question.

All the Java code used is based on the Java 1.2 platform. If you're not using Java 1.2 by now, start to work to get there; the collections classes alone are worth it. The Apache Xerces parser, Apache Xalan processor, Apache SOAP library, and Apache FOP libraries were the latest stable versions available as of June of 2000, and the Apache Cocoon web publishing framework used is Version 1.8.2. The XML-RPC Java libraries used are Version 1.0 beta 4. All software used is freely available and can be obtained online from http://java.sun.com/, http://xml.apache.org/, and http://www.xml-rpc.com/.

The source for the examples in this book is contained completely within the book itself. Both source and binary forms of all examples (including extensive Javadoc not necessarily included in the text) are available online from http://www.oreilly.com/catalog/javaxml2/ and http://www.newinstance.com/. All of the examples that could run as servlets, or be converted to run as servlets, can be viewed and used online at http://www.newinstance.com/.

Conventions Used in This Book

The following font conventions are used in this book.

Italic is used for:
• Unix pathnames, filenames, and program names
• Internet addresses, such as domain names and URLs
• New terms where they are defined

Boldface is used for:
• Names of GUI items: window names, buttons, menu choices, etc.

Constant Width is used for:
• Command lines and options that should be typed verbatim
• Names and keywords in Java programs, including method names, variable names, and class names
• XML element names and tags, attribute names, and other XML constructs that appear as they would within an XML document

Comments and Questions

Please address comments and questions concerning this book to the publisher:

O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)

You can also send us messages electronically. To be put on the mailing list or request a catalog, send email to:

info@oreilly.com

To ask technical questions or comment on the book, send email to:

bookquestions@oreilly.com

We have a web site for the book, where we'll list examples, errata, and any plans for future editions. You can access this page at:

http://www.oreilly.com/catalog/javaxml2/
For more information about this book and others, see the O'Reilly web site:

http://www.oreilly.com/

Acknowledgments

Well, here I am writing acknowledgments again. It's no easier to remember everybody this time than it was the first. My editor, Mike Loukides, keeps me up at night stressing out about getting things done, which is exactly what a good editor does! Kyle Hart, marketing superwoman, keeps things going and reminds me that there's light at the end of the tunnel. Tim O'Reilly and Frank Willison are patient, yet pushy, just what good bosses should be. And Bob Eckstein and Marc Loy were there for me for pesky Swing GUI problems. (Besides, Bob's just funny. Face it.) O'Reilly is as good as it gets, all around. I'm honored to be associated with them.

I also want to thank the incredible team of reviewers for this book. Many times, these folks turned a chapter around in less than 24 hours, yet still managed to give honest technical feedback. These guys are a large part of why this book stayed technical. Robert Sese, Philip Nelson, and Victor Brilon, you guys are amazing. Of course, I've always got to thank my partner in crime, Jason Hunter, for being annoyingly dedicated to JDOM and other technical issues (take a night off, man!). Finally, my company, Lutris Technologies, is about as good a place as you could hope to work for. They let me work long hours on this book, with never a complaint. In particular, Yancy Lind, Paul Morgan, David Young, and Keith Bigelow are simply the best at what they do. Thanks, guys!

To my parents, Larry and Judy McLaughlin, thanks again. I love you both for putting up with your rather ambitious and driven son (you realize, of course, those characteristics also make for a terribly obnoxious child!). Sarah Jane, my aunt, and my grandparents, Dean and Gladys McLaughlin, don't ever think that because I don't see you often I don't think about you all the time.
Granddad, I'm more thankful than you'll ever know that you're getting to see a second edition. I love you all. To my second set of parents (my wife's folks), Gary and Shirley Greathouse, you're just the best. One day I'll learn to take these writing skills and explain what you both mean to me, but it might take a whole book on its own. I love you both, for your humor and your wisdom. To Quinn and Joni for providing such levity at Sunday lunches. To Lonnie and Laura, can't wait to see Baby J. To Bill and Terri for being friends, and very wise ones at that, and to Bill for being a pastor like no other.

The laughter in my life comes from several hilarious characters, and I just can't pass up mentioning them here: Kendra, Brittany, Lisette, Janay, Rocky, Dustin, Tony, Stephanie, Robbie, Erin, Angela, Mike, Matt, Carlos, and John. I'll see you all Sunday, and can we please stop going to Mazzio's? And to the nonhuman part of my life, my dogs: Seth, Charlie, Jake, Moses, Molly, and Daisy. You haven't lived until the cold tongue of a basset hound wakes you up in the morning.

Finally, to the two people who mean more to me than anyone: my grandfather, Robert Earl Burden, whom I'll see again one day. I think about you every day, and my children will hear about you soon. Most of all, to my wife, Leigh. Words just don't cut it. One day all the songs and tears that have come to me because of what you mean to me will come out, and you'll finally understand how much you mean to me.

And to the Lord who got me this far. Even so, come Lord Jesus.
Chapter 1. Introduction

Introductory chapters are typically pretty easy to write. In most books, you give an overview of the technology covered, explain a few basics, and try to get the reader interested. However, for this second edition of Java and XML, things aren't so easy. In the first edition, there were still a lot of people coming to XML, or skeptics wanting to see if this new type of markup was really as good as the hype. Over a year later, everyone is using XML in hundreds of ways. In a sense, you probably don't need an introduction. But I'll give you an idea of what's going to be covered, why it matters, and what you'll need to get up and running.

1.1 XML Matters

First, let me simply say that XML matters. I know that sounds like the beginning of a self-help seminar, but it's worth starting with. There are still many developers, managers, and executives who are afraid of XML. They are afraid of the perception that XML is "cutting-edge," and of XML's high rate of change. (This is a second edition, a year later, right? Has that much changed?) They are afraid of the cost of hiring folks like you and me to work in XML. Most of all, they are afraid of adding yet another piece to their application puzzles.

To try to assuage these fears, let me quickly run down the major reasons that you should start working with XML today. First, XML is portable. Second, it allows an unprecedented degree of interoperability. And finally, XML matters... because it doesn't matter! If that's completely confusing, read on and all will soon make sense.

1.1.1 Portability

XML is portable. If you've been around Java long, or have ever wandered through Moscone Center at JavaOne, you've heard the mantra of Java: "portable code." Compile Java code, drop those .class or .jar files onto any operating system, and the code runs. All you need is a Java Runtime Environment (JRE) or Java Virtual Machine (JVM), and you're set.
This has continually been one of Java's biggest draws, because developers can work on Linux or Windows workstations, develop and test code, and then deploy on Sparcs, E4000s, HP-UX, or anything else you could imagine. For the same reason, XML is worth more than a passing look. Because XML is simply text, it can obviously be moved between various platforms. Even more importantly, XML must conform to a specification defined by the World Wide Web Consortium (W3C) at http://www.w3.org/. This means that XML is a standard. When you send XML, it conforms to this standard; when some other application receives it, the XML still conforms to that standard. The receiving application can count on that. This is essentially what Java provides: any JVM knows what to expect, and as long as code conforms to those expectations, it will run. By using XML, you get portable data. In fact, you may recently have heard the phrase "portable code, portable data" in reference to the combination of Java and XML. It's a good saying, because it turns out (as not all marketing-type slogans do) to be true.
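To make the "portable data" claim concrete, here is a minimal sketch that parses XML text with whichever parser the JRE provides, reached through the standard javax.xml.parsers (JAXP) API covered in Chapter 9. It assumes a current JDK rather than the Java 1.2 platform the book targets, and the class name PortableData, its helper methods, and the sample order document are all hypothetical. The point is only that any conforming parser accepts the same well-formed text and rejects text that breaks the XML 1.0 rules, whatever platform it runs on.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PortableData {

    // Parses XML text with whichever JAXP-compliant parser the JRE supplies.
    // Throws SAXException if the text breaks the XML 1.0 well-formedness rules.
    static Document parse(String xml) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if a standard parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Name of the document's root element; rejects malformed input.
    static String rootElementName(String xml) {
        try {
            return parse(xml).getDocumentElement().getTagName();
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item sku=\"123\">Chair</item></order>";
        System.out.println(rootElementName(xml));                   // order
        System.out.println(isWellFormed("<order><item></order>"));  // false
    }
}
```

The sender needs nothing beyond the text itself; the receiver relies only on the published specification, which is the same bargain the JVM offers for code.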
1.1.2 Interoperability

Second, XML allows interoperability above and beyond what we've ever seen in enterprise applications. Some of you probably think this is just another form of portability, but it's more than that. Remember that XML stands for the Extensible Markup Language. And it is extensibility that is so important when businesses interoperate. Consider HTML, the Hypertext Markup Language, for example. HTML is a standard. It's all text. So, in those respects, it's just as portable as XML. In fact, clients using different browsers on different operating systems can all view HTML more or less identically. However, HTML is aimed specifically at presentation. You couldn't use HTML to represent a furniture manifest, or a billing invoice. That's because the standard tightly defines the allowed tags, the format, and everything else in HTML. This allows it to remain focused on presentation, which is both an advantage and a disadvantage.

XML, on the other hand, says very little about the elements and content of a document. Instead, it focuses on the structure of the document; elements must begin and end, each attribute must have a single value, and so on. The content of the document and the elements and attributes used remain up to you. You can develop your own document formatting, content, and custom specifications for representing your data. And this allows interoperability. The various furniture chains can agree upon a certain set of constraints for XML, and then exchange data in those formats; they get all the advantages of XML (like portability), as well as the ability to apply their business knowledge to the data being exchanged to make it meaningful. A billing system can include a customized format appropriate for invoices, broadcast this format, and export and import invoices from other billing systems. XML's extensibility makes it perfect for cross-application operation.
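As a rough illustration of a home-grown vocabulary, the sketch below builds a small invoice document with the standard DOM API and turns it back into text with the JDK's identity Transformer. The element names (invoice, customer, total), the currency attribute, and the InvoiceWriter class are all invented for this example and assume a current JDK; XML itself dictates only that elements nest and close properly and that attribute values are quoted. Everything else is the agreement between the billing systems.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class InvoiceWriter {

    // Builds a document in a made-up "invoice" vocabulary. XML fixes only the
    // syntax (nesting, quoting); the element and attribute names are our choice.
    static String buildInvoice(String id, String customer, String amount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element invoice = doc.createElement("invoice");
            invoice.setAttribute("id", id);
            doc.appendChild(invoice);

            Element cust = doc.createElement("customer");
            cust.setTextContent(customer);
            invoice.appendChild(cust);

            Element total = doc.createElement("total");
            total.setAttribute("currency", "USD");
            total.setTextContent(amount);
            invoice.appendChild(total);

            // Serialize the DOM tree to text; the serializer escapes characters
            // such as '&' without any help from the application.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildInvoice("INV-42", "Acme Furniture", "199.00"));
    }
}
```

A receiving billing system that knows the same element names could read this with any parser, which is the interoperability argument in miniature.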
Even more intriguing is the large number of vertical standards[1] being developed. Browse the ebXML project at http://www.ebxml.org/ and see what's going on. Here, businesses are working together to develop standards built upon XML that allow global electronic commerce. The telecommunications industry has undertaken similar efforts. Soon, vertical markets across the world will have agreed upon standards for exchanging data, all built on XML.

1.1.3 It Doesn't Matter

When all is said and done, XML matters because it doesn't matter. I said this earlier, and I want to say it again, because it's at the root of why XML is so important. Proprietary solutions for data, formats that are binary and must be decoded in certain ways, and other data solutions all matter in the final analysis. They involve communication with other companies, extensive documentation, coding efforts, and reinvention of tools for transmission. XML is so attractive because you don't need any special expertise and can spend your time doing other things. In Chapter 2, I describe in 25 or so pages most of what you'll ever need to author XML. It doesn't require documentation, because that documentation is already written. You don't need special encoders or decoders; there are APIs and parsers already written that handle all of this for you. And you don't have to incur risk; XML is now a proven technology, with millions of developers working, fixing, and extending it every day.

[1] A vertical standard, or vertical market, refers to a standard or market targeting a specific business. Instead of moving horizontally (where common functionality is preferred), the focus is on moving vertically, providing functionality for a specific audience, like shoe manufacturers or guitar makers.
XML is important because it becomes such a reliable, unimportant part of your application. Write your constraints, encode your data in XML, and forget about it. Then go on to the important things: the complex business logic and presentation that involves weeks and months of thought and hard work. Meanwhile, XML will happily chug along representing your data with nary a whimper or whine (OK, I'm getting a bit dramatic, but you get the idea). So if you've been afraid of XML, or even skeptical, jump on board now. It might be the most important decision, with the fewest side effects, that you'll ever make. The rest of this book will get you up and running with APIs, transport protocols, and more odds and ends than you can shake a stick at.

1.2 What's Important?

Once you've accepted that XML can help you out, the next question is what part of it you need. As I mentioned earlier, there are literally hundreds of applications of XML, and trying to find the right one is not an easy task. I've got to pick out twelve or thirteen key topics from these hundreds, and manage to make them all applicable to you; not an easy task! Fortunately, I've had a year to gather feedback from the first edition of this book, and have been working with XML in production applications for well over two years now. That means that I've at least got an idea of what's interesting and useful. When you boil all the various XML machinery down, you end up with just a few categories.

1.2.1 Low-Level APIs

An API is an application programming interface, and a low-level API is one that lets you deal directly with an XML document's content. In other words, there is little to no preprocessing, and you get raw XML content to work with. It is the most efficient way to deal with XML, and also the most powerful. At the same time, it requires the most knowledge about XML, and generally involves the most work to turn document content into something useful.
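To show what "raw XML content" looks like in practice, here is a minimal sketch using SAX (introduced just below) through the javax.xml.parsers.SAXParserFactory entry point available in current JDKs. The RawEvents and EventLogger names and the sample document are hypothetical. The parser reports only start tags, end tags, and character data; deciding which text belongs to which element, and what any of it means, is left entirely to the application, which is the extra work described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RawEvents {

    // Logs each SAX callback as a short string such as "start person".
    static final class EventLogger extends DefaultHandler {
        final List<String> log = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        // A parser may split one run of text across several characters()
        // calls, so buffer it and emit it when the next tag arrives.
        private void flushText() {
            String s = text.toString().trim();
            if (!s.isEmpty()) {
                log.add("text " + s);
            }
            text.setLength(0);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            log.add("start " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            log.add("end " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Streams the document through the parser and returns the recorded events.
    static List<String> events(String xml) {
        EventLogger logger = new EventLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), logger);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
        return logger.log;
    }

    public static void main(String[] args) {
        for (String event : events("<person><firstName>Brett</firstName></person>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory beyond what the handler chooses to record, which is why this style is efficient, and also why even a simple task needs hand-written bookkeeping like the text buffer.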
The two most common low-level APIs today are SAX, the Simple API for XML, and DOM, the Document Object Model. Additionally, JDOM (which is not an acronym, nor is it an extension of DOM) has gained a lot of momentum lately. All three are in some form of standardization (SAX as a de facto standard, DOM through the W3C, and JDOM through Sun's Java Community Process), and all are good bets to be long-lasting technologies. Each offers access to an XML document, in differing forms, and lets you do pretty much anything you want with the document. I'll spend quite a bit of time on these APIs, as they are the basis for everything else you'll do in XML. I've also devoted a chapter to JAXP, Sun's Java API for XML Processing, which provides a thin abstraction layer over SAX and DOM.

1.2.2 High-Level APIs

High-level APIs are the next step up the ladder. Instead of offering direct access to a document, they rely on low-level APIs to do that work for them. Additionally, these APIs present the document in a different form: one that is more user-friendly, modeled in a particular way, or otherwise different from a basic XML document structure. While these APIs are often easier to use and quicker to develop with, you may pay an additional processing cost while your data is converted to a different format. Also, you'll need to spend some time learning the API, most likely in addition to some lower-level APIs.
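As a rough illustration of what a high-level API hands you, here is a hand-written sketch of the kind of mapping a data binding tool produces. It is not the API of Castor, Quick, or Zeus (those tools generate or configure this mapping for you); the Person and PersonBinder classes are hypothetical, and the sketch assumes a modern JDK and a document with a <person> root containing <firstName> and <lastName> children. DOM does the low-level work underneath, and callers only ever see getters and setters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical class standing in for what a data binding tool would generate. */
class Person {
    private String firstName;
    private String lastName;

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}

/** Hypothetical hand-written binding layer: DOM parses, callers only see Person. */
class PersonBinder {

    static Person unmarshal(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        // The mapping is fixed: rename <person> to <employee> and this code must change
        if (!"person".equals(root.getTagName())) {
            throw new IllegalArgumentException("expected <person>, found <" + root.getTagName() + ">");
        }
        Person person = new Person();
        person.setFirstName(textOf(root, "firstName"));
        person.setLastName(textOf(root, "lastName"));
        return person;
    }

    /** Text of the first descendant element with the given name, or null if there is none. */
    private static String textOf(Element parent, String name) {
        NodeList matches = parent.getElementsByTagName(name);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Person p = unmarshal("<person><firstName>Brett</firstName><lastName>McLaughlin</lastName></person>");
        System.out.println(p.getFirstName() + " " + p.getLastName()); // Brett McLaughlin
    }
}
```

The sketch also shows the trade-offs: an extra conversion step on top of DOM, and element names baked into the code.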
In this book, the main example of a high-level API is XML data binding. Data binding takes an XML document and presents it as a Java object. Not a tree-based object, mind you, but a custom Java object. If you had elements named "person" and "firstName", you would get an object with methods like getPerson() and setFirstName(). Obviously, this is a simple way to get going with XML quickly; hardly any in-depth knowledge is required! However, you can't easily change the structure of the document (like making that "person" element become an "employee" element), so data binding is suited only to certain applications. You can find out all about data binding in Chapter 14.

1.2.3 XML-Based Applications

In addition to APIs built specifically for working with a document or its content, there are a number of applications built on XML. These applications use XML directly or indirectly, but are focused on a specific business process, like displaying stylized web content or communicating between applications. They all use XML as part of their core behavior. Some require extensive XML knowledge, some require none; but all belong in discussions about Java and XML. I've picked out the most popular and useful to discuss here.

First, I'll cover web publishing frameworks, which take XML and format it as HTML, WML (Wireless Markup Language), or binary formats like Adobe's PDF (Portable Document Format). These frameworks are typically used to serve complex, highly customized web applications to clients. Next, I'll look at XML-RPC, which provides an XML variant of remote procedure calls. This is the beginning of a complete suite of tools for application communication. Building on XML-RPC, I'll describe SOAP, the Simple Object Access Protocol, and how it expands upon what XML-RPC provides.
Then you'll get to see the emerging players in the web services field by examining UDDI (Universal Description, Discovery, and Integration) and WSDL (Web Services Description Language) in a business-to-business chapter. Putting all these tools in your toolbox will make you formidable not only in XML, but in any enterprise application environment.

And finally, in the last chapter I'll gaze into my crystal ball, point out what appears to be gathering strength in the coming months and years, and try to give you a heads-up on what is worth monitoring. This should keep you ahead of the curve, which is where any good developer should be.

1.3 The Essentials

Now you're ready to learn how to use Java and XML to best effect. What do you need? I will address that subject, give you some basics, and then let you get after it.

1.3.1 An Operating System and Java

I say this almost tongue in cheek; if you expect to get through this book with no OS (operating system) and no Java installation, you just might be in a bit over your head. Still, it's worth letting you know what I expect. I wrote the first half of this book and the examples for those chapters on a Windows 2000 machine, running both JDK 1.2 and JDK 1.3 (as well as 1.3.1). I did most of my compiling under Cygwin (from Cygnus), so I usually operate in a Unix-esque environment. The last half of the book was written on my (at the time) brand
new Macintosh G4 running OS X. That system comes with JDK 1.3, and is a beauty, for those of you who are curious. In any case, all the examples should work unchanged with Java 1.2 or above; I used no features specific to JDK 1.3. However, I did not write this code to compile under Java 1.1, as I felt using the Java 2 Collections classes was important. Additionally, if you're working with XML and are still on 1.1, you need to take a long, hard look at updating your JDK (I know some of you have no choice). If you are stuck on a 1.1 JVM, you should be able to get the collections from Sun (http://java.sun.com/), make some small modifications, and be up and running.

1.3.2 A Parser

You will need an XML parser. One of the most important layers in any XML-aware application is the XML parser. This component handles the important task of taking a raw XML document as input and making sense of the document; it ensures that the document is well-formed, and if a DTD or schema is referenced, it may be able to ensure that the document is valid. Parsing an XML document typically results in a data structure that can be manipulated and handled by other XML tools or Java APIs. I'm going to leave the detailed discussions of these APIs for later chapters. For now, just be aware that the parser is one of the core building blocks for using XML data.

Selecting an XML parser is not an easy task. There are no hard and fast rules, but two main criteria are typically used. The first is the speed of the parser. As XML documents are used more often and their complexity grows, the speed of an XML parser becomes extremely important to the overall performance of an application. The second factor is conformance to the XML specification. Because performance is often more of a priority than some of the obscure features in XML, some parsers may not conform to finer points of the XML specification in order to squeeze out additional speed.
You must decide on the proper balance between these factors based on your application's needs. In addition, most XML parsers are validating, which means they offer the option to validate your XML against a DTD or XML Schema, but some are not. Make sure you use a validating parser if your applications need that capability.

Here's a list of the most commonly used XML parsers. The list does not show whether a parser validates, as there are current efforts to add validation to several of the parsers that do not yet offer it. No overall ranking is suggested here, but there is a wealth of information on the web pages for each parser:

• Apache Xerces: http://xml.apache.org/
• IBM XML4J: http://alphaworks.ibm.com/tech/xml4j
• James Clark's XP: http://www.jclark.com/xml/xp
• Oracle XML Parser: http://technet.oracle.com/tech/xml
• Sun Microsystems Crimson: http://xml.apache.org/crimson
• Tim Bray's Lark and Larval: http://www.textuality.com/Lark
• The Mind Electric's Electric XML: http://www.themindelectric.com/products/xml/xml.html
• Microsoft's MSXML Parser: http://msdn.microsoft.com/xml/default.asp
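Whether a parser checks only well-formedness or also validity is usually a configuration switch. The sketch below uses JAXP's DocumentBuilderFactory.setValidating() to show the difference against a small internal DTD. It assumes a modern JDK's built-in parser, and ParseCheck and problems() are hypothetical names. Well-formedness errors are fatal and stop the parse, while validity errors are reported to the ErrorHandler and parsing continues, so the two are collected differently. XML Schema validation is configured differently and is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: well-formedness always, DTD validity only when asked.
class ParseCheck {

    /** Parses xml, optionally validating it against its DTD, and returns every problem found. */
    static List<String> problems(String xml, boolean validate) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validate); // DTD validation is off unless requested
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> found = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            // Validity errors are recoverable: record them and let parsing continue.
            // DefaultHandler's fatalError() rethrows, so well-formedness errors still abort.
            @Override
            public void error(SAXParseException e) {
                found.add("invalid: " + e.getMessage());
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            found.add("not well-formed: " + e.getMessage());
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE book [<!ELEMENT book (chapter+)><!ELEMENT chapter (#PCDATA)>]>";
        System.out.println(problems(dtd + "<book><chapter>Introduction</chapter></book>", true)); // []
        System.out.println(problems("<book><chapter>Introduction</book>", false)); // a "not well-formed" entry
        System.out.println(problems(dtd + "<book><title>Oops</title></book>", false)); // validity not checked
        System.out.println(problems(dtd + "<book><title>Oops</title></book>", true)); // "invalid" entries
    }
}
```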
I've included Microsoft's MSXML parser in this list in deference to Microsoft's efforts to address numerous compliance issues in its latest versions. However, the parser still tends to "do its own thing" and, because of that, is not guaranteed to work with the examples in this book. Use it if you need to, but be willing to do a little extra work if you make this decision. Throughout this book, I tend to use Apache Xerces because it is open source. This is a huge plus to me, so I'd recommend you try out Xerces if you don't already have a parser selected.

1.3.3 APIs

Once you've taken care of the parser part of the equation, you'll need the various APIs I'll be talking about (low-level and high-level). Some of these will be included with your parser download, while others need to be downloaded manually. I'll expect you to either have these on hand or be able to download them from the Web, so make sure you've got web access before getting too far into any of the chapters.

First, the low-level APIs: SAX, DOM, JDOM, and JAXP. SAX and DOM should be included with any parser you download, as those APIs are interface-based and will be implemented within the parser. You'll also get JAXP with most of these, although you may end up with an older version; hopefully by the time this book is out, most parsers will have full support for JAXP 1.1 (the latest production version). JDOM is currently a separate download, available from its web site at http://www.jdom.org/.

As for the high-level APIs, I cover a couple of alternatives in the data binding chapter. I'll look briefly at Castor and Quick, available online at http://castor.exolab.org/ and http://sourceforge.net/projects/jxquick, respectively. I'll also take some time to look at Zeus, available at http://zeus.enhydra.org/. All of these packages contain any needed dependencies within the downloaded bundles.
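Because SAX and DOM are interface-based and JAXP only supplies abstract factories, application code never has to name the parser class directly. As a minimal sketch (hypothetical class name, modern JDK assumed), this prints which implementation classes the factories actually selected:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reveals the parser implementation hidden behind JAXP.
class ShowJaxpImplementation {

    /** Reports which concrete parser classes JAXP picked behind its abstract factories. */
    static String describe() throws Exception {
        // newInstance() checks the javax.xml.parsers.* system properties first, then
        // jaxp.properties and service providers, and finally falls back to the JDK default
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        return "SAX parser:  " + saxFactory.newSAXParser().getClass().getName() + "\n"
             + "DOM builder: " + domFactory.newDocumentBuilder().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```

With no extra configuration this typically reports the JDK's built-in parser. To switch to Apache Xerces, for instance, you would put its JAR on the classpath and, if needed, set a system property such as -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl; the application code itself does not change.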
1.3.4 Application Software

Last in this list is the myriad of specific technologies I'll talk about in the chapters. These technologies include things like SOAP toolkits, WSDL validators, the Cocoon web publishing framework, and so on. Rather than try to cover each of these here, I'll address the more specific applications in the appropriate chapters, including where to get the packages, what versions are needed, installation issues, and anything else you'll need to get up and running. That way I can spare you all the ugly details here, and only bore those of you who choose to be bored (just kidding! I'll try to stay entertaining). In any case, you can follow along and learn everything you need to know.

In some cases, I do build on examples in previous chapters. For example, if you start reading Chapter 6 before going through Chapter 5, you'll probably get a bit lost. If this occurs, just back up a chapter and you'll see where the confusing code originated. As I already mentioned, you can skim Chapter 2 on XML basics, but I'd recommend you go through the rest of the book in order, as I try to logically build up concepts and knowledge.
1.4 What's Next?

Now you're probably ready to get on with it. In the next chapter, I'm going to give you a crash course in XML. If you're new to XML, or shaky on the basics, this chapter will fill in the gaps. If you're an old hand at XML, I'd recommend you skim the chapter and move on to the code in Chapter 3. In either case, get ready to dive into Java and XML; things get exciting from here on in.
Chapter 2. Nuts and Bolts

With the introductions behind us, let's get to it. Before heading straight into Java, though, some basic structures must be laid down. These address a fundamental understanding of the concepts in XML and how the extensible markup language works. In other words, you need an XML primer. If you are already an XML expert, skim through this chapter to make sure you're comfortable with the topics addressed. If you're completely new to XML, on the other hand, this chapter can get you ready for the rest of the book without hours, days, or weeks of study.

Where Did All the Chapters Go?

Readers of the first edition of Java & XML may be a little confused. In that edition, there were (count 'em!) three full chapters just on XML itself. When I worked on the first edition over a year ago, I was faced with writing a book that was part XML, part Java, and couldn't completely address either. There was no other reliable resource to direct you to for additional help. Today, books like Learning XML by Erik Ray (O'Reilly) and XML in a Nutshell by Elliotte Rusty Harold and W. Scott Means (O'Reilly) have rectified that problem. It's now enough to give you a whirlwind tour of XML in this chapter, and let you refer to one of those excellent books for more detail on "pure" XML. As a result, I was able to condense several chapters into this one, paving the way for new chapters on Java, which I'm sure is what you want! Be prepared for some radical departures from the first edition; now at least you know why.

You can use this chapter as a glossary while you read the rest of the book. In future chapters I won't spend time explaining XML concepts, so that I can deal strictly with Java and get to some more advanced concepts. So if you hit something that completely befuddles you, check this chapter for information.
And if you are still a little lost, I highly recommend reading this book with a copy of Elliotte Harold and Scott Means' excellent XML in a Nutshell (O'Reilly) open. That will give you all the information you need on XML concepts, and then I can focus on Java ones.

Finally, I'm big on examples. I'm going to load the rest of the chapters as full of them as possible. I'd rather give you too much information than barely engage you. To get started along those lines, I'll introduce several XML and related documents in this chapter to illustrate the concepts in this primer. You might want to take the time to either type these into your editor or download them from the book's web site (http://www.newinstance.com/), as they will be used in this chapter and throughout the rest of the book. It will save you time later on.

2.1 The Basics

It all begins with the XML 1.0 Recommendation, which you can read in its entirety at http://www.w3.org/TR/REC-xml. Example 2-1 shows a simple XML document that conforms to this specification. It's a portion of the XML table of contents for this book (I've only included part of it because it's long!). The complete file is included with the samples for the book, available online at http://www.oreilly.com/catalog/javaxml2 and http://www.newinstance.com/. I'll use it to illustrate several important concepts.