
Loop Through Xml In Python

My data set is as follows:

Solution 1:

The looping can be done in a list comprehension that builds one dict per dept by navigating the DOM, so the following code goes straight to a DataFrame.

xml = """<depts xmlns="http://SOMELINK" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" date="2021-01-15">
<dept dept_id="00001" col_two="00001value" col_three="00001false"><owners><currentowner col_four="00001value" col_five="00001value" col_six="00001false"><addr col_seven="00001value" col_eight="00001value" col_nine="00001false"/></currentowner></owners></dept>
<dept dept_id="00002" col_two="00002value" col_three="00002value"><owners><currentowner col_four="00002value" col_five="00002value" col_six="00002false"><addr col_seven="00002value" col_eight="00002value" col_nine="00002false"/></currentowner></owners></dept>
</depts>"""

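import xml.etree.ElementTree as ET
import pandas as pd

root = ET.fromstring(xml)

# root.attrib holds the root's own attributes, here {'date': '2021-01-15'}
# every element is in the default namespace, so bind a prefix to it for find()/findall()
ns = {'ns0': 'http://SOMELINK'}

# one row per <dept>: merge the attributes of dept, its currentowner and that owner's addr
df = pd.DataFrame([{**d.attrib,
                    **d.find("ns0:owners/ns0:currentowner", ns).attrib,
                    **d.find("ns0:owners/ns0:currentowner/ns0:addr", ns).attrib}
                   for d in root.findall("ns0:dept", ns)])
print(df)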
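On Python 3.8 or later, ElementTree's find() and findall() also accept a `{*}` namespace wildcard, which means the prefix mapping can be dropped. The following is a minimal sketch of the same flattening under that assumption. The two-dept XML is a trimmed, made-up stand-in for the question's data, not the original file. The sketch uses only the standard library and builds a plain list of dicts:

```python
import xml.etree.ElementTree as ET

# trimmed, made-up stand-in for the question's data (same shape, fewer attributes)
sample = """<depts xmlns="http://SOMELINK" date="2021-01-15">
  <dept dept_id="00001" col_two="00001value">
    <owners><currentowner col_four="00001value"><addr col_seven="00001value"/></currentowner></owners>
  </dept>
  <dept dept_id="00002" col_two="00002value">
    <owners><currentowner col_four="00002value"><addr col_seven="00002value"/></currentowner></owners>
  </dept>
</depts>"""

root = ET.fromstring(sample)

# "{*}tag" matches "tag" in any namespace (Python 3.8+), so no ns dict is needed
rows = [{**d.attrib,
         **d.find("{*}owners/{*}currentowner").attrib,
         **d.find("{*}owners/{*}currentowner/{*}addr").attrib}
        for d in root.findall("{*}dept")]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` produces the same kind of frame as the version above. Like that version, this sketch still assumes every dept has exactly one currentowner with an addr.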

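Safer version

If any dept had no currentowner, or a currentowner had no addr, the version above would fail. find() returns None when nothing matches, and None has no .attrib. This version walks the DOM and treats those elements as optional. It also builds each dict key from both the element's tag and the attribute name, so attributes with the same name on different elements get separate columns. Structure the comprehensions to match your data design, considering one-to-one, one-to-optional and one-to-many relationships. As written, a dept with no currentowner produces no row at all, while a missing addr only leaves its columns empty. This really goes back to the papers Codd wrote in 1970.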

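import xml.etree.ElementTree as ET
import pandas as pd

root = ET.fromstring(xml)
ns = {'ns0': 'http://SOMELINK'}

def local(el):
    # strip the "{namespace}" prefix from a tag: "{http://SOMELINK}dept" -> "dept"
    return el.tag.split('}')[1]

# one row per (dept, currentowner) pair; addr is optional, so a missing one
# just leaves those columns empty instead of raising AttributeError
df = pd.DataFrame([{**{f"{local(d)}.{k}": v for k, v in d.items()},
                    **{f"{local(co)}.{k}": v for k, v in co.items()},
                    **{f"{local(addr)}.{k}": v
                       for addr in co.findall("ns0:addr", ns)
                       for k, v in addr.items()}}
                   for d in root.findall("ns0:dept", ns)
                   for co in d.findall("ns0:owners/ns0:currentowner", ns)])
print(df)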
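With nested `for` clauses like these, a dept that has no currentowner drops out entirely, which behaves like an inner join. To keep such depts, which is a left join in relational terms, one option is to fall back to `[None]` when findall() returns an empty list. The following is a minimal sketch of that idea. The sample XML is made up for illustration (dept 00002 has no addr, and dept 00003 has no owner), and `prefixed` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

NS = {"ns0": "http://SOMELINK"}

# made-up sample: 00002 has an owner without <addr>, 00003 has no owner at all
sample = """<depts xmlns="http://SOMELINK">
  <dept dept_id="00001"><owners><currentowner col_four="a"><addr col_seven="x"/></currentowner></owners></dept>
  <dept dept_id="00002"><owners><currentowner col_four="b"/></owners></dept>
  <dept dept_id="00003"/>
</depts>"""


def prefixed(el):
    """Attributes of el keyed as 'localtag.attr'; a missing element contributes nothing."""
    if el is None:
        return {}
    local = el.tag.split("}")[1]
    return {f"{local}.{k}": v for k, v in el.items()}


root = ET.fromstring(sample)
rows = [
    {**prefixed(d),
     **prefixed(co),
     **prefixed(co.find("ns0:addr", NS) if co is not None else None)}
    for d in root.findall("ns0:dept", NS)
    # "or [None]" turns an owner-less dept into one row instead of zero
    for co in (d.findall("ns0:owners/ns0:currentowner", NS) or [None])
]
for row in rows:
    print(row)
```

Calling `co.find` keeps only the first addr, which treats the address as one-to-optional. Several currentowners still yield several rows, which is the one-to-many case. `pd.DataFrame(rows)` fills the missing columns with NaN.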

Solution 2:

You can walk each dept depth-first and print every attribute along the way. This version reads the XML from a file, data.xml:

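from xml.etree import ElementTree

# parse the file and get the <depts> root element
root = ElementTree.parse('data.xml').getroot()
ns = {'ns0': 'http://SOMELINK'}

date_from = root.get('date')
print(f'{date_from=}')  # the "=" specifier in f-strings needs Python 3.8+

for dept in root.findall('./ns0:dept', ns):
    # attributes of the <dept> element itself
    for key, value in dept.items():
        print(f'{key}: {value}')

    # './/*' matches every descendant (owners, currentowner, addr) in document order
    for node in dept.findall('.//*'):
        for key, value in node.items():
            print(f'{key}: {value}')

    print()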
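`findall('.//*')` returns descendants in document order, which is a pre-order, depth-first traversal. The same walk written as an explicit recursion looks roughly like the following. `walk` is a hypothetical helper that also records each element's local tag, so you can tell which element an attribute came from. The XML is a trimmed, made-up stand-in for data.xml:

```python
import xml.etree.ElementTree as ET

# trimmed, made-up stand-in for data.xml
sample = """<depts xmlns="http://SOMELINK" date="2021-01-15">
  <dept dept_id="00001"><owners><currentowner col_four="a"><addr col_seven="x"/></currentowner></owners></dept>
  <dept dept_id="00002"><owners><currentowner col_four="b"><addr col_seven="y"/></currentowner></owners></dept>
</depts>"""


def walk(el, out):
    """Pre-order depth-first walk: record el's attributes, then recurse into each child."""
    local = el.tag.split("}")[-1]  # drop the "{namespace}" prefix if present
    for key, value in el.items():
        out.append((local, key, value))
    for child in el:
        walk(child, out)
    return out


root = ET.fromstring(sample)
results = [walk(dept, []) for dept in root.findall("ns0:dept", {"ns0": "http://SOMELINK"})]
for triples in results:
    for tag, key, value in triples:
        print(f"{tag}.{key}: {value}")
    print()
```

Elements without attributes, such as `<owners>`, are visited but contribute nothing to the output.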
