Thursday, March 26, 2015

Python Module – Minidom – Parsing XML

This is the first in a series of articles on Python modules. In each article I will explain the basics of a Python module and how to use it.

Parsing XML is an important feature that every programming language needs to provide. We often need to parse XML, for example when we get a response from a REST call or when we read locally stored XML files.

xml.dom.minidom is a minimal implementation of the Document Object Model (DOM) interface. It is simpler and smaller than a full DOM implementation. As the documentation says, users who are not already proficient with the DOM should consider using the “xml.etree.ElementTree” module for their XML processing instead.
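For comparison only, here is a minimal sketch of how the staff loop from Case 1 might look with xml.etree.ElementTree instead of minidom. It embeds a copy of the sample document shown below as a string (the names SAMPLE and rows are our own) so it runs without any file; it is meant to show the different style of API, not to replace the minidom examples.

```python
import xml.etree.ElementTree as ET

# Copy of the sample document, embedded so this sketch is self-contained.
SAMPLE = """<?xml version="1.0"?>
<company>
    <name>Animal Care Enterprise</name>
    <staff id="1"><nickname>Rats</nickname><salary>100,000</salary></staff>
    <staff id="2"><nickname>Dogs</nickname><salary>200,000</salary></staff>
    <staff id="3"><nickname>Cats</nickname><salary>20,000</salary></staff>
</company>"""

root = ET.fromstring(SAMPLE)  # root is the <company> Element

rows = []
for staff in root.iter("staff"):
    # get() reads an attribute; findtext() returns the text of the first matching child
    rows.append((staff.get("id"), staff.findtext("nickname"), staff.findtext("salary")))

for row in rows:
    print("id:%s, nickname:%s, salary:%s" % row)
```

ElementTree exposes text directly through findtext() and .text, whereas minidom requires going through the Text node (firstChild.data), which is part of why it is often easier for DOM newcomers.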

In this article we will see how to process an XML file with the xml.dom.minidom module.
Consider the following sample XML content, saved as dot.xml:

<?xml version="1.0"?>
<company>
          <name>Animal Care Enterprise</name>
          <staff id="1">
                   <nickname>Rats</nickname>
                   <salary>100,000</salary>
          </staff>
          <staff id="2">
                   <nickname>Dogs</nickname>
                   <salary>200,000</salary>
          </staff>
          <staff id="3">
                   <nickname>Cats</nickname>
                   <salary>20,000</salary>
          </staff>
</company>
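All of the examples that read a file open dot.xml from the current directory. As a convenience, a small sketch like the one below (assuming you are happy to create or overwrite a file with that name; SAMPLE_XML is our own name) writes the sample document to disk so the snippets can be run as they are.

```python
# Write the sample document to dot.xml so the file-based examples can open it.
# Note: the XML declaration must be the very first thing in the file.
SAMPLE_XML = """<?xml version="1.0"?>
<company>
    <name>Animal Care Enterprise</name>
    <staff id="1">
        <nickname>Rats</nickname>
        <salary>100,000</salary>
    </staff>
    <staff id="2">
        <nickname>Dogs</nickname>
        <salary>200,000</salary>
    </staff>
    <staff id="3">
        <nickname>Cats</nickname>
        <salary>20,000</salary>
    </staff>
</company>
"""

with open("dot.xml", "w", encoding="utf-8") as f:
    f.write(SAMPLE_XML)
```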

Case 1 – Printing values
To process this file, we can write:

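from xml.dom import minidom
from xml.dom.minidom import parse, parseString  # used in the later examples

# Parse the file into a DOM Document object
dot = minidom.parse('dot.xml')
# Collect every <staff> element in the document
staffs = dot.getElementsByTagName('staff')

for staff in staffs:
    sid = staff.getAttribute("id")                        # value of the id attribute
    nickname = staff.getElementsByTagName("nickname")[0]  # first <nickname> child
    salary = staff.getElementsByTagName("salary")[0]      # first <salary> child
    # firstChild is the Text node inside each element; .data is its string value
    print("id:%s, nickname:%s, salary:%s" % (sid, nickname.firstChild.data, salary.firstChild.data))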

First we import the minidom module. To read and parse a file, we use

dot = minidom.parse('dot.xml')
Then we get the staff elements by passing the element name to the getElementsByTagName() method:

staffs = dot.getElementsByTagName('staff')

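This returns a NodeList (a list-like object) containing every staff element in document order. We then loop over it and read each element's id attribute and the text of its nickname and salary children. Running the code prints: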

id:1, nickname:Rats, salary:100,000
id:2, nickname:Dogs, salary:200,000
id:3, nickname:Cats, salary:20,000
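The values printed above are still plain strings such as "100,000". A minimal sketch of going one step further, assuming we want the data as Python values: it uses parseString() on an embedded copy of the sample (so no file is needed), collects one dictionary per staff element, and strips the thousands separators so the salaries can be summed. The helper text_of() and the names records and total are hypothetical, not part of minidom.

```python
from xml.dom.minidom import parseString

SAMPLE = """<?xml version="1.0"?>
<company>
    <name>Animal Care Enterprise</name>
    <staff id="1"><nickname>Rats</nickname><salary>100,000</salary></staff>
    <staff id="2"><nickname>Dogs</nickname><salary>200,000</salary></staff>
    <staff id="3"><nickname>Cats</nickname><salary>20,000</salary></staff>
</company>"""


def text_of(element):
    """Hypothetical helper: join the data of all direct Text children.

    firstChild.data assumes the first child is a single Text node; this
    version simply concatenates every Text child instead.
    """
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


dom = parseString(SAMPLE)
records = []
for staff in dom.getElementsByTagName("staff"):
    salary_text = text_of(staff.getElementsByTagName("salary")[0])
    records.append({
        "id": int(staff.getAttribute("id")),
        "nickname": text_of(staff.getElementsByTagName("nickname")[0]),
        # remove the thousands separators before converting to int
        "salary": int(salary_text.replace(",", "")),
    })

total = sum(r["salary"] for r in records)
print(records)
print("total salary:", total)
```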

If the XML arrives as a string, for example in the body of a REST response, we can use parseString() instead:
dom = parseString(assetXML)

This parses the string as XML. Both parse() and parseString() work by connecting an XML parser to a “DOM builder” that accepts parse events from any SAX parser and converts them into a DOM tree.
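A small sketch of both entry points, assuming a made-up response body: asset_xml stands in for the assetXML variable above, and its content is invented for illustration. parseString() accepts either str or bytes (an HTTP client such as urllib usually hands you bytes), and parse() accepts a file object as well as a file name, so an in-memory io.BytesIO works too.

```python
import io
from xml.dom.minidom import parse, parseString

# Hypothetical REST response body (bytes, as an HTTP client would return it).
asset_xml = b'<?xml version="1.0" encoding="UTF-8"?><asset id="42"><name>Router</name></asset>'

dom_from_string = parseString(asset_xml)      # parse XML held in memory
dom_from_file = parse(io.BytesIO(asset_xml))  # parse() also takes a file-like object

for dom in (dom_from_string, dom_from_file):
    asset = dom.documentElement  # the root <asset> element
    name = asset.getElementsByTagName("name")[0].firstChild.data
    print(asset.tagName, asset.getAttribute("id"), name)
```

Both calls produce an equivalent Document, so the rest of the code does not need to know where the XML came from.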

Case 2 – Adding an Element
Now that we can parse the XML document and read its details, let us add a new element to it:

dot = minidom.parse('dot.xml')
element = dot.createElement("Staff")  # create a new, empty element
dot.childNodes[0].appendChild(element)  # childNodes[0] is the root <company> element here
print(dot.toxml())

The output shows the new, empty <Staff> element appended as the last child of <company>:

<?xml version="1.0" ?>
<company>
    <name>Animal Care Enterprise</name>
    <staff id="1">
        <nickname>Rats</nickname>
        <salary>100,000</salary>
    </staff>
    <staff id="2">
        <nickname>Dogs</nickname>
        <salary>200,000</salary>
    </staff>
    <staff id="3">
        <nickname>Cats</nickname>
        <salary>20,000</salary>
    </staff>
<Staff/></company>
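XML names are case-sensitive, so an element created as "Staff" will not be returned by getElementsByTagName('staff'). A minimal sketch of a variant, assuming we want the new element to match the existing ones: it uses a lower-case name, gives it an attribute with setAttribute(), and reaches the root through documentElement, which refers to the root element regardless of its position in childNodes. The tiny inline document and the id value "4" are illustrative only.

```python
from xml.dom.minidom import parseString

dom = parseString('<company><name>Animal Care Enterprise</name>'
                  '<staff id="1"><nickname>Rats</nickname></staff></company>')

root = dom.documentElement              # the <company> element
new_staff = dom.createElement("staff")  # lower-case, like the existing elements
new_staff.setAttribute("id", "4")       # illustrative id value
root.appendChild(new_staff)             # appendChild adds it as the last child

print(dom.toxml())
```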

Case 3 – Adding a Text Node
The text of an element is stored in a separate Text node that is a child of the element. To create a Text node and attach it, we can use

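dot = minidom.parse('dot.xml')
element = dot.createElement("Staff")
txt = dot.createTextNode("hello, world!")  # the text lives in its own Text node
element.appendChild(txt)                   # make the Text node a child of <Staff>
dot.childNodes[0].appendChild(element)     # add <Staff> to the root <company>
print(dot.toxml())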

The output is:

<?xml version="1.0" ?>
<company>
    <name>Animal Care Enterprise</name>
    <staff id="1">
        <nickname>Rats</nickname>
        <salary>100,000</salary>
    </staff>
    <staff id="2">
        <nickname>Dogs</nickname>
        <salary>200,000</salary>
    </staff>
    <staff id="3">
        <nickname>Cats</nickname>
        <salary>20,000</salary>
    </staff>
<Staff>hello, world!</Staff></company>
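Combining Cases 2 and 3, a complete staff entry is just an element with an attribute and child elements that each hold a Text node. The sketch below assumes a hypothetical helper add_staff() and made-up values ("4", "Birds", "50,000"); it starts from a small inline document rather than dot.xml so it runs on its own, and prints the new element with toprettyxml().

```python
from xml.dom.minidom import parseString

dom = parseString("<company><name>Animal Care Enterprise</name></company>")


def add_staff(doc, staff_id, nickname, salary):
    """Hypothetical helper: append <staff id=".."><nickname/><salary/></staff> to the root."""
    staff = doc.createElement("staff")
    staff.setAttribute("id", staff_id)
    for tag, value in (("nickname", nickname), ("salary", salary)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(value))  # text goes into a Text node
        staff.appendChild(child)
    doc.documentElement.appendChild(staff)
    return staff


new_staff = add_staff(dom, "4", "Birds", "50,000")
print(new_staff.toprettyxml(indent="    "))
```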

Case 4 – Node Import
Minidom can import a node from one document into another with importNode(). We can use this to copy nodes between XML files, as follows:

dom1 = parse("foo.xml")
dom2 = parse("bar.xml")
# Take the 2nd child node of the "bar.xml" document and make a deep copy (True)
# owned by dom1; the indices depend on the structure of the two files
element = dom1.importNode(dom2.childNodes[1], True)
dom1.childNodes[1].appendChild(element)  # append to the children of the 2nd node in "foo.xml"
print(dom1.toxml())
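Since foo.xml and bar.xml are not shown, here is a self-contained sketch with two small stand-in documents (their content is invented for illustration). It locates the node to copy by tag name rather than by position, and shows the effect of the deep flag: True copies the whole subtree, while False copies only the element and its attributes. The source document is left unchanged because importNode() makes a copy.

```python
from xml.dom.minidom import parseString

# Stand-ins for "foo.xml" and "bar.xml".
dom1 = parseString("<company><name>Animal Care Enterprise</name></company>")
dom2 = parseString('<company><staff id="9"><nickname>Owls</nickname></staff></company>')

source = dom2.getElementsByTagName("staff")[0]

imported = dom1.importNode(source, True)  # deep copy: element, attributes and children
dom1.documentElement.appendChild(imported)

shallow = dom1.importNode(source, False)  # shallow copy: element and attributes only

print(dom1.toxml())
```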

With these examples, you can start working with minidom, the lightweight XML processing module in Python's standard library.
