Loop Through Xml In Python
My data set is as follows:
Solution 1:
Looping can be done in a list comprehension that builds one dict per dept by navigating the DOM. The following code goes straight to a pandas DataFrame.
xml = """<depts xmlns="http://SOMELINK" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" date="2021-01-15">
<dept dept_id="00001" col_two="00001value" col_three="00001false"><owners><currentowner col_four="00001value" col_five="00001value" col_six="00001false"><addr col_seven="00001value" col_eight="00001value" col_nine="00001false"/></currentowner></owners></dept>
<dept dept_id="00002" col_two="00002value" col_three="00002value"><owners><currentowner col_four="00002value" col_five="00002value" col_six="00002false"><addr col_seven="00002value" col_eight="00002value" col_nine="00002false"/></currentowner></owners></dept>
</depts>"""
import xml.etree.ElementTree as ET
import pandas as pd
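root = ET.fromstring(xml)
print(root.attrib)  # {'date': '2021-01-15'}; xmlns declarations are not kept as attributes
# the document uses a default namespace, so paths need a prefix mapped to its URI
ns = {'ns0': 'http://SOMELINK'}
# one dict per dept: its own attributes merged with those of currentowner and addr
df = pd.DataFrame([{**d.attrib,
                    **d.find("ns0:owners/ns0:currentowner", ns).attrib,
                    **d.find("ns0:owners/ns0:currentowner/ns0:addr", ns).attrib}
                   for d in root.findall("ns0:dept", ns)
                   ])
print(df)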
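Because the root declares a default namespace (xmlns="http://SOMELINK"), ElementTree stores every tag in {uri}local form, which is why the paths above need the ns0: prefix mapping. A quick stdlib-only check on a trimmed copy of the sample data (the {*} wildcard assumes Python 3.8 or later):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document: same default namespace, two depts.
xml = """<depts xmlns="http://SOMELINK" date="2021-01-15">
  <dept dept_id="00001"/>
  <dept dept_id="00002"/>
</depts>"""

root = ET.fromstring(xml)
ns = {"ns0": "http://SOMELINK"}

print(root.tag)                           # {http://SOMELINK}depts
print(len(root.findall("dept")))          # 0: a bare name does not match namespaced tags
print(len(root.findall("ns0:dept", ns)))  # 2: prefix resolved through the ns mapping
print(len(root.findall("{*}dept")))       # 2: any-namespace wildcard (Python 3.8+)
# Clark notation {uri}tag works without any mapping at all
print([d.get("dept_id") for d in root.findall("{http://SOMELINK}dept")])
```

Forgetting the namespace does not raise an error; findall() simply returns an empty list, which is an easy bug to miss.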
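Safer version:
If any dept has no currentowner, or a currentowner has no addr, the code above fails: find() returns None, and None has no .attrib. Instead, walk the DOM treating these elements as optional. The dict keys are now built from the element's tag name plus the attribute name, so identically named attributes on different elements cannot overwrite each other. In the code below, addr is optional (a missing addr just adds no columns) and currentowner is one-to-many (one row per currentowner, so a dept without one produces no row). Structure the comprehensions according to your data design: you need to consider one-to-one, one-to-optional and one-to-many relationships. This really goes back to the papers Codd wrote in 1970.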
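import xml.etree.ElementTree as ET
import pandas as pd
root = ET.fromstring(xml)
ns = {'ns0': 'http://SOMELINK'}
# keys are "<tag>.<attribute>"; tag.split('}')[1] strips the '{http://SOMELINK}' prefix
df = pd.DataFrame([{**{f"{d.tag.split('}')[1]}.{k}": v for k, v in d.items()},
                    **{f"{co.tag.split('}')[1]}.{k}": v for k, v in co.items()},
                    # findall() returns [] when there is no addr, so addr is optional
                    **{f"{addr.tag.split('}')[1]}.{k}": v
                       for addr in co.findall("ns0:addr", ns) for k, v in addr.items()}}
                   for d in root.findall("ns0:dept", ns)
                   # one row per currentowner; a dept without one yields no row
                   for co in d.findall("ns0:owners/ns0:currentowner", ns)
                   ])
print(df)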
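To see the difference between treating a child as optional and iterating it one-to-many, here is a stdlib-only sketch (no pandas) over a hypothetical variant of the data in which dept 00002 has no addr and dept 00003 has no currentowner at all. The helper name prefixed_attrs is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample: dept 00002 has no <addr>,
# dept 00003 has no <currentowner>.
xml = """<depts xmlns="http://SOMELINK">
  <dept dept_id="00001"><owners>
    <currentowner col_four="a"><addr col_seven="x"/></currentowner>
  </owners></dept>
  <dept dept_id="00002"><owners>
    <currentowner col_four="b"/>
  </owners></dept>
  <dept dept_id="00003"><owners/></dept>
</depts>"""

ns = {"ns0": "http://SOMELINK"}
root = ET.fromstring(xml)


def prefixed_attrs(el):
    """Hypothetical helper: {'tag.attr': value} for one element, {} for None."""
    if el is None:
        return {}
    local = el.tag.split("}")[-1]  # strip the '{uri}' namespace part
    return {f"{local}.{k}": v for k, v in el.items()}


# One-to-many on currentowner (as in the comprehension above):
# a dept without any currentowner yields no row at all.
rows_many = [
    {**prefixed_attrs(d), **prefixed_attrs(co),
     **prefixed_attrs(co.find("ns0:addr", ns))}
    for d in root.findall("ns0:dept", ns)
    for co in d.findall("ns0:owners/ns0:currentowner", ns)
]

# Optional (0 or 1) currentowner: every dept yields exactly one row,
# and missing children simply contribute no columns.
rows_optional = []
for d in root.findall("ns0:dept", ns):
    co = d.find("ns0:owners/ns0:currentowner", ns)
    addr = co.find("ns0:addr", ns) if co is not None else None
    rows_optional.append({**prefixed_attrs(d), **prefixed_attrs(co),
                          **prefixed_attrs(addr)})

for row in rows_many:
    print(row)
print("---")
for row in rows_optional:
    print(row)
```

Note the explicit `is None` checks: an Element with no children is falsy, so `if co:` would wrongly treat an existing but empty currentowner as missing. Either list of dicts can be passed straight to pd.DataFrame.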
Solution 2:
You can perform a depth-first search:
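from xml.etree import ElementTree

root = ElementTree.parse('data.xml').getroot()  # assumes data.xml contains the sample document
ns = {'ns0': 'http://SOMELINK'}
date_from = root.get('date')
print(f'{date_from=}')  # the "=" specifier needs Python 3.8+
for dept in root.findall('./ns0:dept', ns):
    # attributes of the dept element itself
    for key, value in dept.items():
        print(f'{key}: {value}')
    # './/*' matches every descendant, visited depth-first in document order
    for node in dept.findall('.//*'):
        for key, value in node.items():
            print(f'{key}: {value}')
    print()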
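findall('.//*') in the loop above already returns descendants in document order, which is a pre-order depth-first traversal. Written out as an explicit recursion, the same walk can also track depth, for example to indent the output. A minimal sketch, with a hypothetical walk generator and a trimmed copy of the sample:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (one dept).
xml = """<depts xmlns="http://SOMELINK" date="2021-01-15">
  <dept dept_id="00001"><owners>
    <currentowner col_four="00001value"><addr col_seven="00001value"/></currentowner>
  </owners></dept>
</depts>"""


def walk(el, depth=0):
    """Hypothetical helper: pre-order depth-first walk yielding (depth, element)."""
    yield depth, el
    for child in el:  # iterating an Element gives its children in document order
        yield from walk(child, depth + 1)


root = ET.fromstring(xml)
for depth, el in walk(root):
    local = el.tag.split("}")[-1]  # drop the '{uri}' namespace prefix
    attrs = ", ".join(f"{k}={v}" for k, v in el.items())
    print("  " * depth + f"{local} {attrs}".rstrip())
```

Skipping the first item of walk(root) gives exactly the elements root.findall('.//*') returns, in the same order.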