This is the first in a series of articles on Python modules. Each article
will explain the basics of one Python module and how to use
it.
Parsing XML is an important capability in any programming language.
We often need to parse XML, for example when handling the response from a
REST call or reading locally stored XML files.
xml.dom.minidom is a minimal implementation of the Document Object Model
interface. It is intended to be simpler than the full DOM and also significantly smaller. As the documentation says, users who
are not already proficient with the DOM can use another XML processing module, “xml.etree.ElementTree”, instead.
In this article we will see how to process an XML file using the
xml.dom.minidom module.
Consider the following sample XML content, saved as dot.xml:
<?xml version="1.0"?>
<company>
<name>Animal Care
Enterprise</name>
<staff id="1">
<nickname>Rats</nickname>
<salary>100,000</salary>
</staff>
<staff id="2">
<nickname>Dogs</nickname>
<salary>200,000</salary>
</staff>
<staff id="3">
<nickname>Cats</nickname>
<salary>20,000</salary>
</staff>
</company>
Case 1 – Printing values
To process this file, we can write the following code:
from xml.dom import minidom
from xml.dom.minidom import parse, parseString

# Parse the XML file into a DOM document
dot = minidom.parse('dot.xml')
# Collect every <staff> element in the document
staffs = dot.getElementsByTagName('staff')
for staff in staffs:
    sid = staff.getAttribute("id")
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    # firstChild is the text node that holds the element's text
    print("id:%s, nickname:%s, salary:%s" % (sid, nickname.firstChild.data, salary.firstChild.data))
We first import the minidom module, which we need for processing XML
files. To read a file we use
dot = minidom.parse('dot.xml')
Then we get the staff elements using the getElementsByTagName() method
passing the element name.
staffs = dot.getElementsByTagName('staff')
This gives us a list (a NodeList) of all the staff elements, and we just
need to loop over it and read the values. After executing the code we see
id:1, nickname:Rats, salary:100,000
id:2, nickname:Dogs, salary:200,000
id:3, nickname:Cats, salary:20,000
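The loop above relies on firstChild being a text node, which fails for an empty element such as <nickname/>. A minimal sketch of a more defensive variant follows. The helper names get_text and staff_rows and the inline sample string are assumptions made for illustration; they are not part of the original code.

```python
from xml.dom.minidom import parseString

# Hypothetical sample with one empty <nickname/> element.
SAMPLE = """<company>
<staff id="1"><nickname>Rats</nickname><salary>100,000</salary></staff>
<staff id="2"><nickname/><salary>200,000</salary></staff>
</company>"""


def get_text(element):
    """Join the data of all text-node children; empty string if there are none."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def staff_rows(xml_text):
    """Return (id, nickname, salary) tuples for every <staff> element."""
    dom = parseString(xml_text)
    rows = []
    for staff in dom.getElementsByTagName("staff"):
        nickname = staff.getElementsByTagName("nickname")[0]
        salary = staff.getElementsByTagName("salary")[0]
        rows.append((staff.getAttribute("id"), get_text(nickname), get_text(salary)))
    return rows


for row in staff_rows(SAMPLE):
    print("id:%s, nickname:%s, salary:%s" % row)
```

Collecting the text through a helper keeps the loop from raising an AttributeError when an element has no text child.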
If we need to parse XML that was obtained from a REST response, we can use
dom = parseString(assetXML)
This parses the string as XML. What the parse() and parseString() functions do is
connect an XML parser with a “DOM builder” that can accept parse events from
any SAX parser and convert them into a DOM tree.
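As a small illustration of parseString(), the sketch below parses an XML string held in memory. The variable name asset_xml and its contents are assumed stand-ins for the body of a REST response.

```python
from xml.dom.minidom import parseString

# Stand-in for the body of a REST response (assumed content).
asset_xml = '<?xml version="1.0"?><company><name>Animal Care Enterprise</name></company>'

dom = parseString(asset_xml)

# documentElement is the root element of the parsed document.
root = dom.documentElement
name = dom.getElementsByTagName("name")[0].firstChild.data

print(root.tagName)
print(name)
```

Once the string is parsed, the resulting document can be queried with the same getElementsByTagName() calls used for a file.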
Case 2 – Adding an Element
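Once we are able to parse the XML document and read its details, we can also add a new element to it:
dot = minidom.parse('dot.xml')
# Create a new, empty <Staff> element owned by this document
element = dot.createElement("Staff")
# childNodes[0] is the root <company> element
dot.childNodes[0].appendChild(element)
print(dot.toxml())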
We can see that the <Staff> element was added as the last child of
the <company> node, as shown below
<?xml version="1.0" ?>
<company>
<name>Animal Care
Enterprise</name>
<staff id="1">
<nickname>Rats</nickname>
<salary>100,000</salary>
</staff>
<staff id="2">
<nickname>Dogs</nickname>
<salary>200,000</salary>
</staff>
<staff id="3">
<nickname>Cats</nickname>
<salary>20,000</salary>
</staff>
<Staff/></company>
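New elements usually need attributes as well. The following is a minimal sketch, assuming a small inline document instead of dot.xml, that reaches the root through documentElement and sets an id attribute with setAttribute(). The id value is made up for the example.

```python
from xml.dom.minidom import parseString

# Small inline document used instead of dot.xml (assumed content).
dom = parseString('<company><staff id="1"/></company>')

# documentElement is the root <company> element.
root = dom.documentElement

# Create a second <staff> element and give it an id attribute.
new_staff = dom.createElement("staff")
new_staff.setAttribute("id", "2")
root.appendChild(new_staff)

xml_out = root.toxml()
print(xml_out)
```

Using documentElement avoids depending on the position of the root in childNodes, which changes when a comment or DOCTYPE comes before it.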
Case 3 – Adding a Text Node
As we know, the text of an element node is stored in a text node. To
create a text node we can use
dot = minidom.parse('dot.xml')
element = dot.createElement("Staff")
# Create the text node and attach it to the new element
txt = dot.createTextNode("hello, world!")
element.appendChild(txt)
dot.childNodes[0].appendChild(element)
print(dot.toxml())
and we can see the output as,
<?xml version="1.0" ?>
<company>
<name>Animal Care
Enterprise</name>
<staff id="1">
<nickname>Rats</nickname>
<salary>100,000</salary>
</staff>
<staff id="2">
<nickname>Dogs</nickname>
<salary>200,000</salary>
</staff>
<staff id="3">
<nickname>Cats</nickname>
<salary>20,000</salary>
</staff>
<Staff>hello, world!</Staff></company>
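Combining createElement() and createTextNode() lets us build a complete staff record like the ones in the sample file. The sketch below is one way to do it. The helper add_child_with_text and the values "Birds" and "50,000" are hypothetical and only serve the illustration.

```python
from xml.dom.minidom import parseString

# Start from an empty root element (assumed for the example).
dom = parseString("<company/>")


def add_child_with_text(doc, parent, tag, text):
    """Create <tag>text</tag> under parent and return the new element."""
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(text))
    parent.appendChild(child)
    return child


staff = dom.createElement("staff")
staff.setAttribute("id", "4")
add_child_with_text(dom, staff, "nickname", "Birds")
add_child_with_text(dom, staff, "salary", "50,000")
dom.documentElement.appendChild(staff)

xml_out = dom.documentElement.toxml()
print(xml_out)
```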
Case 4 – Node Import
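Nodes can be imported with minidom. We can use this import feature to
copy nodes between multiple XML files. This can be done as follows:
from xml.dom.minidom import parse

dom1 = parse("foo.xml")
dom2 = parse("bar.xml")
# Take the 2nd top-level node in "bar.xml" and make a deep copy owned by dom1.
# This assumes the root element is the 2nd node, e.g. after a DOCTYPE or comment.
element = dom1.importNode(dom2.childNodes[1], True)
# Append the copy to the children of the 2nd top-level node in "foo.xml"
dom1.childNodes[1].appendChild(element)
print(dom1.toxml())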
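For a version of the import that runs without any files, here is a minimal sketch that uses two inline strings as stand-ins for "foo.xml" and "bar.xml". Their contents are assumed for the example, and the node is selected by tag name instead of by position.

```python
from xml.dom.minidom import parseString

# Inline stand-ins for "foo.xml" and "bar.xml" (assumed contents).
dom1 = parseString("<company><staff id='1'/></company>")
dom2 = parseString("<company><staff id='9'><nickname>Owls</nickname></staff></company>")

source = dom2.getElementsByTagName("staff")[0]

# deep=True copies the element together with all of its children.
imported = dom1.importNode(source, True)
dom1.documentElement.appendChild(imported)

xml_out = dom1.documentElement.toxml()
print(xml_out)
```

importNode() returns a copy that belongs to dom1, so the original node stays in dom2 unchanged.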
Using the above
examples we can start working with minidom, an XML processing module available in
Python.