What is an XML parser

Q

What are the standard ways of parsing an XML document?

✍: Guest

A

An XML parser sits between the XML document and the application that wants to use it. The parser exposes a set of well-defined interfaces that the application can use to add, modify, and delete the contents of the XML document. These interfaces should be standard; otherwise, different vendors would each devise their own custom way of interacting with XML documents. There are two standard specifications that are very common and should be followed by an XML parser:
DOM: Document Object Model.
DOM is the W3C-recommended way of treating XML documents. A DOM parser loads the entire XML document into memory and lets the application manipulate its structure and data.
SAX: Simple API for XML.
SAX is an event-driven way of processing XML documents. With DOM, the whole XML document is loaded into memory and then the application manipulates it. This is not always the best way to process large XML documents that contain huge amounts of data. For instance, if you only want one element from the whole document, or you only want to check that the XML is well formed, loading the whole document into memory is quite resource intensive. SAX parsers read the XML document sequentially and emit events such as the start and end of the document, the start and end of elements, and text content. Applications interested in these events register implementations of callback interfaces. The SAX parser delivers the events to those callbacks, and the application acts only on the events it needs.
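The case of only wanting to see whether the XML is "proper" (taken here to mean well formed) can be sketched with Python's standard `xml.sax` module. This is a minimal illustration, not the only way to do it; the function name `is_well_formed` and the sample strings are assumptions made for the example.

```python
import xml.sax


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without error; no tree is built."""
    try:
        # The base ContentHandler's callbacks do nothing, so every event
        # is received and discarded; only parse errors matter here.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


print(is_well_formed("<Invoice><Quantity>5</Quantity></Invoice>"))  # True
print(is_well_formed("<Invoice><Quantity>5</Invoice>"))             # False
```

Because the document is read sequentially and nothing is kept, memory use stays small even for large inputs.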

The figure above shows how a DOM parser works. The application queries the DOM parser for the “quantity” field. The DOM parser loads the complete XML file into memory.

The DOM parser then picks the “quantity” tag out of the in-memory XML document and returns it to the application.
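The DOM flow just described might look roughly like this with Python's standard `xml.dom.minidom` module. The sample invoice document, its element names and values, and the helper `get_quantity` are assumptions for illustration only.

```python
from xml.dom import minidom

# Hypothetical invoice document; element names and values are assumed.
INVOICE_XML = "<Invoice><Amount>100</Amount><Quantity>5</Quantity></Invoice>"


def get_quantity(xml_text: str) -> str:
    """Load the whole document into memory, then look up Quantity."""
    # parseString builds the complete DOM tree before returning.
    document = minidom.parseString(xml_text)
    nodes = document.getElementsByTagName("Quantity")
    if not nodes:
        raise ValueError("no Quantity element found")
    text_node = nodes[0].firstChild
    # An empty element has no child text node.
    return text_node.data if text_node is not None else ""


print(get_quantity(INVOICE_XML))  # 5
```

Note that the entire tree exists in memory before the lookup happens, which is the cost the SAX approach avoids.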

A SAX parser does not load the whole document into memory; it takes an event-based approach and emits events while parsing the XML file. For example, in the figure above it has emitted an Invoice start tag event, an Amount tag event, a Quantity tag event, and an Invoice end tag event. Our application, however, is only interested in the quantity value. So the application registers a handler with the SAX parser that reacts only to the quantity field and not to any other field or element of the XML document. In the standard SAX APIs the parser reports every event to the registered handler, and the handler ignores the ones it does not care about, so the effect is the same as if the rest of the events were suppressed. For instance, in the figure above only the Quantity tag event reaches the application software and the rest of the events are suppressed.
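A minimal sketch of this with Python's standard `xml.sax` module follows. In this API the parser calls the handler for every event, so the "interest" in quantity is expressed by reacting only to that element name. The sample document and the class name `QuantityHandler` are assumptions for illustration.

```python
import xml.sax

# Hypothetical invoice document; element names and values are assumed.
INVOICE_XML = b"<Invoice><Amount>100</Amount><Quantity>5</Quantity></Invoice>"


class QuantityHandler(xml.sax.ContentHandler):
    """Reacts only to Quantity events; all other events are ignored."""

    def __init__(self):
        super().__init__()
        self.in_quantity = False
        self.parts = []
        self.quantity = None

    def startElement(self, name, attrs):
        if name == "Quantity":
            self.in_quantity = True
            self.parts = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self.in_quantity:
            self.parts.append(content)

    def endElement(self, name):
        if name == "Quantity":
            self.quantity = "".join(self.parts)
            self.in_quantity = False


handler = QuantityHandler()
xml.sax.parseString(INVOICE_XML, handler)
print(handler.quantity)  # 5
```

The Invoice and Amount events still fire, but the handler does nothing with them, and no in-memory tree is ever built.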

2007-10-31