
Parsing Google Earth KML File In Python (lxml, Namespaces)

I am trying to parse a .kml file into Python using the xml module (after failing to make this work in BeautifulSoup, which I use for HTML). As this is my first time doing this, I

Solution 1:

Here is my solution. The most important thing to do first is to read the explanation of XML namespaces posted by Tomalak. It is a really good description of namespaces and easy to understand.

We are going to use XPath to navigate the XML document. Its notation is similar to file system paths, where parents and descendants are separated by slashes (/). Note that some commands differ in the lxml implementation.

### Problem

Our goal is to extract the city name: the content of <name> which is under <Placemark>. Here's the relevant XML:

<Placemark> <name>CITY NAME</name> 

The XPath equivalent to the non-functional code I posted above is:

from lxml import etree

# 'kml document' stands for the path to the .kml file
tree = etree.parse('kml document')
result = tree.xpath('//Placemark/name/text()')

The text() part is needed to get the text contained in the nodes matched by //Placemark/name.

This doesn't work, as Tomalak pointed out, because the names of these two nodes are actually {http://www.opengis.net/kml/2.2}Placemark and {http://www.opengis.net/kml/2.2}name. The part in curly brackets is the default namespace. It is not written out on each tag in the document (which confused me), but it is declared once at the beginning of the XML document like this:

xmlns="http://www.opengis.net/kml/2.2"
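To see the effect of the default namespace, here is a minimal sketch, assuming lxml is installed. The KML string is a made-up sample, not the file from the question. It prints the tag names as lxml reports them and shows that the unprefixed query comes back empty.

```python
from lxml import etree

# Hypothetical minimal KML sample; the real file is not shown in the question.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>CITY NAME</name>
    </Placemark>
  </Document>
</kml>"""

root = etree.fromstring(KML)

# lxml reports element tags in {namespace}localname form.
root_tag = root.tag
placemark_tag = root[0][0].tag  # kml -> Document -> Placemark

# An unprefixed XPath step only matches elements in no namespace,
# so this query finds nothing in a KML document.
unqualified = root.xpath('//Placemark/name/text()')

print(root_tag)
print(placemark_tag)
print(unqualified)
```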

### Solution

We can supply namespaces to xpath by setting the namespaces argument:

xpath(X, namespaces={prefix: namespace})

This is easy enough for the namespaces that have actual prefixes, in this document for instance <gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode> where the gx prefix is defined in the document as xmlns:gx="http://www.google.com/kml/ext/2.2".

However, XPath has no notion of a default namespace (cf. the docs). Therefore, we need to trick it, as Tomalak suggested above: we invent a prefix for the default namespace and add it to our search terms. We can call it kml, for instance. This piece of code does the trick:

tree.xpath('//kml:Placemark/kml:name/text()', namespaces={"kml": "http://www.opengis.net/kml/2.2"})
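Put together, the approach might look like the sketch below. It assumes lxml is installed and uses a made-up KML sample. The NS dictionary and the variable names are illustrative choices, and the kml prefix exists only inside our queries, not in the document.

```python
from lxml import etree

# Hypothetical sample using the default KML namespace and the gx extension.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <Placemark>
      <name>CITY NAME</name>
      <Point>
        <gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
        <coordinates>0,0,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>"""

# Prefix -> namespace URI. "kml" is our invented prefix for the default
# namespace; "gx" happens to match the prefix used in the document.
NS = {
    "kml": "http://www.opengis.net/kml/2.2",
    "gx": "http://www.google.com/kml/ext/2.2",
}

root = etree.fromstring(KML)

# Every step in the path needs a prefix, including the default-namespace ones.
names = root.xpath('//kml:Placemark/kml:name/text()', namespaces=NS)
modes = root.xpath('//kml:Placemark//gx:altitudeMode/text()', namespaces=NS)

print(names)
print(modes)
```

Keeping the prefix map in one dictionary means each query can reuse it, and the prefixes in the map do not have to match the ones written in the file, since only the namespace URIs are compared.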

The tutorial mentions that there is also an ETXPath class, which works just like XPath except that one writes the namespaces out in curly brackets instead of defining them in a dictionary. Thus, the input would be of the style {http://www.opengis.net/kml/2.2}Placemark.
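For comparison, a minimal sketch of the ETXPath variant, again assuming lxml and a made-up KML sample. The names KML_NS and find_names are illustrative only.

```python
from lxml import etree

# Hypothetical minimal KML sample; the real file is not shown in the question.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>CITY NAME</name>
    </Placemark>
  </Document>
</kml>"""

KML_NS = "http://www.opengis.net/kml/2.2"

root = etree.fromstring(KML)

# ETXPath compiles an expression in which namespaces are written inline
# as {uri}localname, so no prefix dictionary is needed.
find_names = etree.ETXPath(
    "//{%s}Placemark/{%s}name/text()" % (KML_NS, KML_NS)
)

# The compiled expression is called on the element or tree to search.
names = find_names(root)

print(names)
```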

