Python Data Persistence - XML Parsers
XML is an acronym for eXtensible Markup Language. It is a portable, open, cross-platform markup language, much like HTML or SGML, and is recommended by the World Wide Web Consortium.
It is a well-known data interchange format, used by a large number of applications such as web services, office tools, and Service Oriented Architectures (SOA). The XML format is both machine readable and human readable.
The xml package of the Python standard library consists of the following modules for XML processing:
Sr.No. | Modules & Description |
---|---|
1 | xml.etree.ElementTree: the ElementTree API, a simple and lightweight XML processor |
2 | xml.dom: the DOM API definition |
3 | xml.dom.minidom: a minimal DOM implementation |
4 | xml.sax: SAX2 interface implementation |
5 | xml.parsers.expat: the Expat parser binding |
Data in an XML document is arranged in a tree-like hierarchical format, starting with a root element. Each element is a single node in the tree: its name is enclosed in an opening <> tag and a closing </> tag, and it may carry attributes and text. Each element may contain one or more sub-elements.
Following is a typical example of an XML document:
    <?xml version = "1.0" encoding = "iso-8859-1"?>
    <studentlist>
       <student>
          <name>Ratna</name>
          <subject>Physics</subject>
          <marks>85</marks>
       </student>
       <student>
          <name>Kiran</name>
          <subject>Maths</subject>
          <marks>100</marks>
       </student>
       <student>
          <name>Mohit</name>
          <subject>Biology</subject>
          <marks>92</marks>
       </student>
    </studentlist>
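To make the terms tag, attribute, text and sub-element concrete, here is a minimal sketch that parses a shortened copy of the sample document and inspects it with ElementTree. The variable names are illustrative only, and the document is passed as bytes so that the parser reads the encoding declaration itself.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample document, given as bytes so that the
# parser applies the declared encoding.
xml_bytes = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<studentlist>
   <student>
      <name>Ratna</name>
      <subject>Physics</subject>
      <marks>85</marks>
   </student>
</studentlist>"""

root = ET.fromstring(xml_bytes)   # root element of the tree
student = root[0]                 # first sub-element of the root
sub_tags = [child.tag for child in student]

print(root.tag, root.attrib)        # root tag and its attribute dictionary
print(sub_tags)                     # tags of the sub-elements
print(student.find("marks").text)   # element text is always a string
```

Note that the marks value comes back as the string "85", not as a number; any conversion is left to the caller.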
When using the ElementTree module, the first step is to set up the root element of the tree. Each Element has a tag and an attrib property, which is a dict object. For the root element created here, attrib is an empty dictionary.
    import xml.etree.ElementTree as xmlobj
    root = xmlobj.Element('studentList')
Now, we can add one or more elements under the root element. Each element object may have sub-elements, created with SubElement(). Each sub-element has an attrib dictionary and a text property.
    student = xmlobj.Element('student')
    nm = xmlobj.SubElement(student, 'name')
    nm.text = 'Ratna'
    subject = xmlobj.SubElement(student, 'subject')
    subject.text = 'Physics'
    marks = xmlobj.SubElement(student, 'marks')
    marks.text = '85'    # element text must be a string
This new element is appended to the root using the append() method.
    root.append(student)
Append as many elements as desired using the above method. Finally, the root element is wrapped in an ElementTree object and written to a file.
    tree = xmlobj.ElementTree(root)
    # Open in binary mode; the with block closes the file automatically.
    with open('studentlist.xml', 'wb') as file:
        tree.write(file)
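The steps above can be combined into a loop over a list of dictionaries. The following is a sketch under stated assumptions, not a definitive implementation: the names `records` and `build_tree` are hypothetical, and the file is written to a temporary directory so that the example leaves nothing behind in the working directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sample records; the list name and the helper name are illustrative only.
records = [
    {"name": "Ratna", "subject": "Physics", "marks": 85},
    {"name": "Kiran", "subject": "Maths", "marks": 100},
]

def build_tree(rows):
    """Return an ElementTree with one <student> element per dictionary."""
    root = ET.Element("studentList")
    for row in rows:
        student = ET.SubElement(root, "student")
        for key, value in row.items():
            # Element text must be a string, so numbers are converted.
            ET.SubElement(student, key).text = str(value)
    return ET.ElementTree(root)

tree = build_tree(records)
path = os.path.join(tempfile.mkdtemp(), "studentlist.xml")
# Passing a file name lets write() open and close the file itself.
tree.write(path, encoding="utf-8", xml_declaration=True)

with open(path, encoding="utf-8") as f:
    print(f.read())
```

SubElement() both creates the element and attaches it to its parent, so no separate append() call is needed in this variant.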
Next, we see how to parse the XML file. To do so, construct the document tree by passing the file name as the file parameter of the ElementTree constructor.
    tree = xmlobj.ElementTree(file='studentlist.xml')
The tree object has a getroot() method that returns the root element. The elements below it are obtained by iterating over the root or by calling list(root); the older getchildren() method was removed in Python 3.9.
    root = tree.getroot()
    children = list(root)
A dictionary object corresponding to each child node is constructed by iterating over its sub-elements and storing each tag with its text.
    for child in children:
        student = {}
        for pair in child:
            student[pair.tag] = pair.text
Each dictionary is then appended to a list, which reproduces the original list of dictionary objects.
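Putting the parsing steps together, a minimal sketch might look like the following. It reads from an in-memory stand-in for studentlist.xml so that it runs on its own, and it iterates over elements directly rather than calling getchildren(), which is not available in Python 3.9 and later. The name `students` is illustrative.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for studentlist.xml so the sketch is self-contained.
xml_file = io.BytesIO(
    b"<studentList>"
    b"<student><name>Ratna</name><subject>Physics</subject><marks>85</marks></student>"
    b"<student><name>Kiran</name><subject>Maths</subject><marks>100</marks></student>"
    b"</studentList>"
)

tree = ET.parse(xml_file)   # accepts a file name or a file object
root = tree.getroot()

students = []
for child in root:          # each <student> element
    # Map every sub-element's tag to its text.
    student = {pair.tag: pair.text for pair in child}
    students.append(student)

print(students)
```

All values are returned as strings, so the marks would need an explicit int() conversion to match numeric input data.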
SAX is a standard interface for event-driven XML parsing. Parsing XML with SAX requires a content handler, which you create by subclassing xml.sax.ContentHandler. You register callbacks for the events of interest and then let the parser proceed through the document.
SAX is useful when your documents are large or you have memory limitations: it parses the file as it reads it from disk, so the entire file is never stored in memory.
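As an illustration of the callback style, the sketch below subclasses xml.sax.ContentHandler and collects each student into a dictionary. The class name `StudentHandler` and its attributes are assumptions made for this example; only startElement(), characters() and endElement() are part of the handler interface.

```python
import xml.sax

class StudentHandler(xml.sax.ContentHandler):
    """Collect each <student> element into a dictionary."""

    def __init__(self):
        super().__init__()
        self.students = []    # finished records
        self.current = None   # record being built, if any
        self.buffer = []      # text chunks of the element being read

    def startElement(self, name, attrs):
        if name == "student":
            self.current = {}
        self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        self.buffer.append(content)

    def endElement(self, name):
        if name == "student":
            self.students.append(self.current)
            self.current = None
        elif self.current is not None:
            self.current[name] = "".join(self.buffer).strip()
        self.buffer = []

xml_bytes = (
    b"<studentlist>"
    b"<student><name>Ratna</name><subject>Physics</subject><marks>85</marks></student>"
    b"<student><name>Kiran</name><subject>Maths</subject><marks>100</marks></student>"
    b"</studentlist>"
)

handler = StudentHandler()
xml.sax.parseString(xml_bytes, handler)
print(handler.students)
```

To parse a file instead of a byte string, xml.sax.parse() takes a file name or file object together with the handler.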
The Document Object Model (DOM) API is a World Wide Web Consortium recommendation. In this case, the entire file is read into memory and stored in a hierarchical (tree-based) form that represents all the features of the XML document.
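SAX is generally not as fast as DOM at retrieving information from large files, because it passes over the document as a stream of events, whereas DOM keeps the whole tree available for direct access. On the other hand, DOM can exhaust resources, especially when it is used on many files, since every document is loaded in full. SAX is read-only, while DOM allows changes to the XML document.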
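The difference can be seen in a short sketch that uses xml.dom.minidom, the minimal DOM implementation from the standard library, to read a document and then change it in place. The `grade` attribute and the variable names are hypothetical and serve only as illustration.

```python
from xml.dom import minidom

xml_text = (
    "<studentlist>"
    "<student><name>Ratna</name><marks>85</marks></student>"
    "<student><name>Kiran</name><marks>100</marks></student>"
    "</studentlist>"
)

dom = minidom.parseString(xml_text)   # the whole document is loaded into memory

# Read: collect the text of every <name> element.
names = [node.firstChild.data for node in dom.getElementsByTagName("name")]
print(names)

# Modify: change the first student's marks and add an attribute in place.
first = dom.getElementsByTagName("student")[0]
first.getElementsByTagName("marks")[0].firstChild.data = "90"
first.setAttribute("grade", "A")      # hypothetical attribute, for illustration

updated = dom.documentElement.toxml()
print(updated)
```

Because the tree lives in memory, any node can be revisited and edited before the document is serialized again, which a SAX handler cannot do.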