Python Get XML Siblings Into Dictionary
I have an xml that looks like this: 1 some text some text
Solution 1:
code.py:
#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp


FILE_NAME = "data.xml"


def convert_node(node, depth_level=0):
    #print("  " * depth_level + node.tag)
    child_nodes = list(node)
    if not child_nodes:
        return (node.text or "").strip()
    ret_dict = dict()
    child_node_tags = [item.tag for item in child_nodes]
    for child_node in child_nodes:
        tag = child_node.tag
        if child_node_tags.count(tag) > 1:
            sub_obj_dict = ret_dict.get(tag, dict())
            # Number repeated siblings per tag ("1", "2", ...), so that two different repeated tags don't share one counter
            sub_obj_dict[str(len(sub_obj_dict) + 1)] = convert_node(child_node, depth_level=depth_level + 1)
            ret_dict[tag] = sub_obj_dict
        else:
            ret_dict[tag] = convert_node(child_node, depth_level=depth_level + 1)
    return ret_dict


def main():
    tree = ET.parse(FILE_NAME)
    root_node = tree.getroot()
    converted_xml = convert_node(root_node)
    print("\nResulting dict(s):\n")
    # converted_xml should be a dictionary having only one key (in our case "G");
    # we only care about its value, to match the required output
    for key in converted_xml:
        pp(converted_xml[key])


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
Notes:
- FILE_NAME holds the name of the file that contains the input xml. Feel free to change it to match yours
- The conversion happens in convert_node. It's a recursive function that is called on each xml node and returns a Python dictionary (or a string). The algorithm:
  - For each node, get a list of its (direct) children. If the node doesn't have any (it's a leaf node, like the G# or GP# nodes), return its text
  - If the node has more than one child with a specific tag, the content of each such child is added under a key representing its index (as for the G or GP nodes), in a sub dictionary of the current dictionary stored under the child tag key
  - All the children with unique tags have their content placed under a key equal to their tag, directly in the current dictionary
- depth_level is not used (you can remove it). I used it to print the xml node tags in a tree form; it's the depth in the xml tree (root - 0, G - 1, G#, GP - 2, GP# - 3, ...)
- The code is designed to be:
  - General: notice there are no hardcoded key names
  - Scalable: if at some point the xml becomes more complex (e.g. under a GP node there will be a GPD node, let's say, and that node will have subnodes as well, so the xml gains one more depth level), the code will handle it without change
  - Python 3 and Python 2 compatible
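To make the algorithm and the scalability note concrete, here is a minimal sketch that runs a condensed convert_node on an inline xml string. The sample xml is made up for illustration (it is not the xml from the question): the <root> element, the values and the extra GPD level are assumptions, and repeated siblings are numbered per tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample (not the xml from the question): tag names follow the
# notes above; the <root> element and the GPD level are assumptions.
SAMPLE_XML = """
<root>
  <G>
    <G1>1</G1>
    <GP><GP1>1</GP1><GPD><GPD1>x</GPD1></GPD></GP>
    <GP><GP1>2</GP1></GP>
  </G>
</root>
"""


def convert_node(node):
    """Return the stripped text of a leaf node, else a dict built from its children."""
    child_nodes = list(node)
    if not child_nodes:
        return (node.text or "").strip()
    tags = [child.tag for child in child_nodes]
    result = {}
    for child in child_nodes:
        if tags.count(child.tag) > 1:
            # Repeated siblings: group them under the tag, keyed "1", "2", ...
            group = result.setdefault(child.tag, {})
            group[str(len(group) + 1)] = convert_node(child)
        else:
            # Unique tag: store the converted child directly under its tag
            result[child.tag] = convert_node(child)
    return result


converted = convert_node(ET.fromstring(SAMPLE_XML))
print(converted["G"]["GP"]["1"]["GPD"])
```

The single G and G1 children land directly under their tags, the two GP siblings are grouped under "GP" with keys "1" and "2", and the extra GPD level is converted by the same recursion without any change to the function.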
Output:
(py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045799991>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
Resulting dict(s):
{'1': {'G1': '1',
       'G2': 'some text',
       'G3': 'some text',
       'GP': {'1': {'GP1': '1', 'GP2': 'a', 'GP3': 'a'},
              '2': {'GP1': '2', 'GP2': 'b', 'GP3': 'b'},
              '3': {'GP1': '3', 'GP2': 'c', 'GP3': 'c'}}},
 '2': {'G1': '2',
       'G2': 'some text',
       'G3': 'some text',
       'GP': {'1': {'GP1': '1', 'GP2': 'aa', 'GP3': 'aa'},
              '2': {'GP1': '2', 'GP2': 'bb', 'GP3': 'bb'},
              '3': {'GP1': '3', 'GP2': 'cc', 'GP3': 'cc'}}},
 '3': {'G1': '3',
       'G2': 'some text',
       'G3': 'some text',
       'GP': {'1': {'GP1': '1', 'GP2': 'aaa', 'GP3': 'aaa'},
              '2': {'GP1': '2', 'GP2': 'bbb', 'GP3': 'bbb'},
              '3': {'GP1': '3', 'GP2': 'ccc', 'GP3': 'ccc'}}}}