Loop Through Xml In Python

Question

My data set is as follows:

Solution 1:

Looping can be done in a list comprehension that builds one dict per dept by navigating the DOM. The following code goes straight to a data frame.

xml = """<depts xmlns="http://SOMELINK" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" date="2021-01-15">
  <dept dept_id="00001" col_two="00001value" col_three="00001false">
    <owners>
      <currentowner col_four="00001value" col_five="00001value" col_six="00001false">
        <addr col_seven="00001value" col_eight="00001value" col_nine="00001false"/>
      </currentowner>
    </owners>
  </dept>
  <dept dept_id="00002" col_two="00002value" col_three="00002value">
    <owners>
      <currentowner col_four="00002value" col_five="00002value" col_six="00002false">
        <addr col_seven="00002value" col_eight="00002value" col_nine="00002false"/>
      </currentowner>
    </owners>
  </dept>
</depts>"""

import xml.etree.ElementTree as ET
import pandas as pd

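root = ET.fromstring(xml)

root.attrib  # attributes of the root <depts> element (here the date)
# The default namespace has to be mapped to a prefix for find/findall
ns = {'ns0': 'http://SOMELINK'}
# One row per dept: merge the attributes of dept, currentowner and addr
pd.DataFrame([{**d.attrib,
  **d.find("ns0:owners/ns0:currentowner", ns).attrib,
  **d.find("ns0:owners/ns0:currentowner/ns0:addr", ns).attrib}
 for d in root.findall("ns0:dept", ns)
])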

Safer version

If any dept had no currentowner or currentowner/addr, find() would return None and accessing .attrib on it would fail. Instead, walk the DOM treating these elements as optional. The dict keys are now built from the tag of the element as well as the attribute name. Structure the comprehensions according to your data design: you need to consider 1 to 1, 1 to optional, and 1 to many relationships. This really goes back to the papers that Codd wrote in 1970. Note that with the nested loops below, a dept without a currentowner yields no row rather than an error, while a missing addr only drops the addr columns.

import xml.etree.ElementTree as ET
import pandas as pd

root = ET.fromstring(xml)
ns = {'ns0': 'http://SOMELINK'}
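# Keys are "<tag>.<attribute>"; split('}')[1] strips the "{namespace}" part of the tag.
# The outer loops give one row per (dept, currentowner) pair; addr is optional.
pd.DataFrame([{**{f"{d.tag.split('}')[1]}.{k}":v for k,v in d.items()},
  **{f"{co.tag.split('}')[1]}.{k}":v for k,v in co.items()},
  **{f"{addr.tag.split('}')[1]}.{k}":v for addr in co.findall("ns0:addr", ns) for k,v in addr.items()} }
 for d in root.findall("ns0:dept", ns)
 for co in d.findall("ns0:owners/ns0:currentowner", ns)
])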
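
If departments without a currentowner should still appear in the result (a 1 to optional relationship kept like a left join), one option is to look up each child with find() and treat None as an empty set of attributes. The following is only a minimal sketch under that assumption. SAMPLE, prefixed_attrs and dept_rows are hypothetical names used for illustration, and the sample XML is a cut-down stand-in for the data above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second dept has no owners/currentowner at all.
SAMPLE = """<depts xmlns="http://SOMELINK">
  <dept dept_id="00001">
    <owners>
      <currentowner col_four="a"><addr col_seven="x"/></currentowner>
    </owners>
  </dept>
  <dept dept_id="00002"/>
</depts>"""

NS = {"ns0": "http://SOMELINK"}


def prefixed_attrs(element):
    """Return the element's attributes keyed as '<local tag>.<attribute>'."""
    if element is None:  # the optional element is absent
        return {}
    local_tag = element.tag.split("}")[-1]  # drop the '{namespace}' part
    return {f"{local_tag}.{key}": value for key, value in element.items()}


def dept_rows(root):
    """Build one dict per dept; missing children simply contribute no keys."""
    rows = []
    for dept in root.findall("ns0:dept", NS):
        owner = dept.find("ns0:owners/ns0:currentowner", NS)
        addr = owner.find("ns0:addr", NS) if owner is not None else None
        rows.append({**prefixed_attrs(dept),
                     **prefixed_attrs(owner),
                     **prefixed_attrs(addr)})
    return rows


rows = dept_rows(ET.fromstring(SAMPLE))
print(rows)
```

The list of dicts can be passed to pd.DataFrame(rows) as before; keys that are missing from a row then show up as NaN in that row. This sketch assumes at most one currentowner per dept, so a 1 to many design would still need the nested loop.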

Solution 2:

You can perform a depth-first search:

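from xml.etree import ElementTree

root = ElementTree.parse('data.xml').getroot()
ns = {'ns0': 'http://SOMELINK'}

# Attribute on the root <depts> element
date_from = root.get('date')
print(f'{date_from=}')

for dept in root.findall('./ns0:dept', ns):
    # Attributes of the dept element itself
    for key, value in dept.items():
        print(f'{key}: {value}')

    # './/*' visits every descendant of dept in document (depth-first) order
    for node in dept.findall('.//*'):
        for key, value in node.items():
            print(f'{key}: {value}')

    print()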
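
If the values are needed as data rather than printed, the same depth-first walk can collect them into one flat dict per dept. This is a rough sketch, not part of the original answer: SAMPLE and flatten_dept are hypothetical names, the XML is parsed from a string instead of data.xml, and it assumes attribute names are unique within a dept (as they are in the data above), because a repeated name would overwrite the earlier value.

```python
from xml.etree import ElementTree

# Hypothetical cut-down sample with the same nesting as the data above.
SAMPLE = """<depts xmlns="http://SOMELINK" date="2021-01-15">
  <dept dept_id="00001" col_two="v2">
    <owners>
      <currentowner col_four="v4">
        <addr col_seven="v7"/>
      </currentowner>
    </owners>
  </dept>
</depts>"""

NS = {"ns0": "http://SOMELINK"}


def flatten_dept(dept):
    """Merge the attributes of dept and all of its descendants into one dict."""
    flat = {}
    # iter() yields the element itself first, then its descendants
    # in document (depth-first) order.
    for node in dept.iter():
        flat.update(node.items())
    return flat


root = ElementTree.fromstring(SAMPLE)
records = [flatten_dept(dept) for dept in root.findall("ns0:dept", NS)]
print(records)
```

Elements without attributes, such as owners, add nothing to the dict, so no special handling is needed for them.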
