
Using Minidom To Parse Xml

Hi, I have trouble understanding the minidom module for Python. I have XML that looks like this: Dexter7

Solution 1:

Each episode element has child-elements, including a title element. Your code, however, is looking for attributes instead.

To get text out of a minidom element, you need a helper function:

def getText(nodelist):
    # Collect the data of every text node in the given node list.
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

And then you can more easily print all the titles:

for episode in xml.getElementsByTagName('episode'):
    for title in episode.getElementsByTagName('title'):
        # Pass the element's child nodes; an element itself is not iterable.
        print(getText(title.childNodes))
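To try this without the original feed, here is a minimal self-contained sketch. The `SAMPLE` document is an assumption: its layout (a `Show` root with `episode` elements that each contain a `title` child) is guessed from the discussion, and `get_text` and `episode_titles` are hypothetical names, not part of minidom.

```python
from xml.dom.minidom import parseString

# Assumed sample data; the real feed's layout may differ.
SAMPLE = """<Show>
  <name>Dexter</name>
  <Episodelist>
    <Season no="1">
      <episode><epnum>1</epnum><title>Dexter</title></episode>
      <episode><epnum>2</epnum><title>Crocodile</title></episode>
    </Season>
  </Episodelist>
</Show>"""


def get_text(nodelist):
    """Join the data of all text nodes found directly in nodelist."""
    return ''.join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)


def episode_titles(xml_string):
    """Return the text of every <title> nested inside an <episode>."""
    doc = parseString(xml_string)
    titles = []
    for episode in doc.getElementsByTagName('episode'):
        for title in episode.getElementsByTagName('title'):
            # The text lives in the element's child nodes, not in attributes.
            titles.append(get_text(title.childNodes))
    return titles


print(episode_titles(SAMPLE))
```

Collecting the titles into a list rather than printing inside the loop keeps the function easy to reuse and check.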

Solution 2:

title is not an attribute; it's a tag (an element). An attribute is something like src in <img src="foo.jpg" />.

>>> from xml.dom.minidom import parseString
>>> # s holds the XML document as a string
>>> parsed = parseString(s)
>>> titles = [n.firstChild.data for n in parsed.getElementsByTagName('title')]
>>> titles
[u'Dexter', u'Crocodile', u'Popping Cherry']

You can extend the above to fetch other details. lxml is better suited for this, though; as the snippet above shows, minidom is not that friendly.
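As a rough sketch of fetching other details with minidom, the example below reads one attribute and two child elements per episode. The sample document, the `no` attribute on `Season`, the `epnum` element, and the helper names `child_text` and `episode_details` are all assumptions made for illustration.

```python
from xml.dom.minidom import parseString

# Assumed sample data; element and attribute names are illustrative only.
SAMPLE = """<Show><Episodelist>
  <Season no="1">
    <episode><epnum>1</epnum><title>Dexter</title></episode>
    <episode><epnum>2</epnum><title>Crocodile</title></episode>
  </Season>
</Episodelist></Show>"""


def child_text(element, tag):
    """Text of the first <tag> descendant, or '' if it is missing or empty."""
    matches = element.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return ''
    return matches[0].firstChild.data


def episode_details(xml_string):
    """Return (season number, episode number, title) tuples."""
    doc = parseString(xml_string)
    rows = []
    for season in doc.getElementsByTagName('Season'):
        # 'no' is an attribute of <Season>, so getAttribute applies here.
        season_no = season.getAttribute('no')
        for episode in season.getElementsByTagName('episode'):
            # epnum and title are child elements, so their text is read instead.
            rows.append((season_no,
                         child_text(episode, 'epnum'),
                         child_text(episode, 'title')))
    return rows


print(episode_details(SAMPLE))
```

The `firstChild is None` check guards against empty elements such as `<title></title>`, where `n.firstChild.data` would otherwise raise an AttributeError.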

Solution 3:

Thanks to Martijn Pieters, who pointed me to the ElementTree API, I solved this problem.

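import xml.etree.ElementTree as ET
from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
xml = ET.parse(urlopen("http://services.tvrage.com/feeds/episode_list.php?sid=7296"))
print('xml fetched..')
for episode in xml.iter('episode'):
    print(episode.find('title').text)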
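The same ElementTree approach can be tried offline by parsing a string instead of a URL. This is only a sketch: the `SAMPLE` document is an assumed stand-in for the feed, and `titles_with_elementtree` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed sample data standing in for the remote feed.
SAMPLE = """<Show>
  <name>Dexter</name>
  <Episodelist>
    <Season no="1">
      <episode><epnum>1</epnum><title>Dexter</title></episode>
      <episode><epnum>2</epnum><title>Crocodile</title></episode>
      <episode><epnum>3</epnum><title>Popping Cherry</title></episode>
    </Season>
  </Episodelist>
</Show>"""


def titles_with_elementtree(xml_string):
    """Return the <title> text of every <episode> at any depth."""
    root = ET.fromstring(xml_string)
    # findtext returns the default when an episode has no <title> child.
    return [ep.findtext('title', default='') for ep in root.iter('episode')]


print(titles_with_elementtree(SAMPLE))
```

Using `findtext` with a default avoids the AttributeError that `episode.find('title').text` raises when an episode lacks a title.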

Thanks
