Formatting Pyyaml Dump() Output

March 31, 2024

I have a list of dictionaries that I want to serialize with yaml.dump(), with an empty line between consecutive dictionaries in the output: list_of_dicts = [ { 'key_1': 'value_a', 'key_2': 'value_b'}, { 'key_1': 'value_c', 'key_2': 'value_d'}, ...
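For reference, here is a minimal sketch of what plain yaml.dump() produces for the first two dictionaries, assuming PyYAML 5.1 or later. The items follow each other directly, with no empty line between them:

```python
import yaml

# Only the first two dictionaries from the question are used here.
list_of_dicts = [
    {'key_1': 'value_a', 'key_2': 'value_b'},
    {'key_1': 'value_c', 'key_2': 'value_d'},
]

# Block style; none of the dump() arguments adds empty lines between items.
plain = yaml.dump(list_of_dicts, default_flow_style=False)
print(plain, end="")
# - key_1: value_a
#   key_2: value_b
# - key_1: value_c
#   key_2: value_d
```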

Solution 1:

There is no easy way to do this with the library (the Node objects in the dumper's node tree are passive and cannot carry this information), so I ended up post-processing the dumped string:

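import yaml
stream = yaml.dump(list_of_dicts, default_flow_style=False)
with open('output.yaml', 'w') as f:  # 'output.yaml' is a placeholder path
    f.write(stream.replace('\n- ', '\n\n- '))  # empty line before each top-level item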
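A variant of the same idea that avoids searching the dumped text: dump each element as its own one-item sequence and join the pieces with an extra newline. This is a sketch, not part of the original answer, and the helper name dump_with_blank_lines is hypothetical:

```python
import yaml


def dump_with_blank_lines(items):
    """Dump a list as a YAML block sequence with an empty line between items.

    Hypothetical helper: each element is dumped as a one-item sequence,
    so there is no need to find item boundaries in the output text.
    """
    chunks = [yaml.dump([item], default_flow_style=False) for item in items]
    # Every chunk already ends with "\n", so joining with "\n" adds one empty line.
    return "\n".join(chunks)


list_of_dicts = [
    {'key_1': 'value_a', 'key_2': 'value_b'},
    {'key_1': 'value_c', 'key_2': 'value_d'},
]
print(dump_with_blank_lines(list_of_dicts), end="")
```

One trade-off: because each item is dumped separately, an object shared between two items is written out twice instead of as a YAML anchor and alias.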

Solution 2:

The PyYAML documentation covers the dump() arguments only briefly, because there is not much to say about them. PyYAML does not provide this kind of control.
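To illustrate what the dump() arguments do control, here is a short sketch (assuming PyYAML 5.1 or later, which added sort_keys). Key order, indentation and flow style can be changed, but none of these options inserts empty lines:

```python
import yaml

data = {'b': {'x': 1}, 'a': [1, 2]}

# Default: keys are sorted, nested mappings are indented by 2 spaces,
# and a sequence inside a mapping is not indented.
sorted_default = yaml.dump(data, default_flow_style=False)

# sort_keys=False keeps the dictionary's insertion order.
ordered = yaml.dump(data, default_flow_style=False, sort_keys=False)

# indent=4 widens the indentation of the nested mapping.
indented = yaml.dump(data, default_flow_style=False, indent=4)

for text in (sorted_default, ordered, indented):
    print(text)
```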

To preserve such empty lines (and comments) when YAML is loaded and dumped again, I started developing the ruamel.yaml library: a superset of the stalled PyYAML, with YAML 1.2 compatibility, many added features, and many bug fixes. With ruamel.yaml you can do:

import sys
import ruamel.yaml

yaml_str = """\
- key_1: value_a
  key_2: value_b

- key_1: value_c
  key_2: value_d

- key_1: value_x  # a few before this were ellipsed
  key_2: value_y
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

and the output is exactly the same as the input string (including the comment).

You can also build the output that you want from scratch:

import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
list_of_dicts = yaml.seq([ { 'key_1': 'value_a', 'key_2': 'value_b'},
                           { 'key_1': 'value_c', 'key_2': 'value_d'},
                           { 'key_1': 'value_x', 'key_2': 'value_y'}  ])

for idx in range(1, len(list_of_dicts)):
    list_of_dicts.yaml_set_comment_before_after_key(idx, before='\n')

# ruamel.yaml.comments.dump_comments(list_of_dicts)  # optional debugging aid: prints the attached comment data
yaml.dump(list_of_dicts, sys.stdout)

The conversion with yaml.seq() is necessary because it creates a CommentedSeq, an object to which the empty lines can be attached through special attributes.

The library also makes it easy to preserve or set quoting and literal style on strings, and the format of integers (hex, octal, binary) and floats. It also lets you specify the indent for mappings and sequences separately (although not for individual mappings or sequences).

Solution 3:

I had the same goal as the OP. My solution is a little clunky: I subclassed yaml.Dumper and overrode its write_indent() method:

from yaml import Dumper


class MyDumper(Dumper):

    def write_indent(self):
        indent = self.indent or 0
        if (not self.indention or self.column > indent
                or (self.column == indent and not self.whitespace)):
            self.write_line_break()

        ##################################################
        # On the first level of indentation, add an extra
        # newline
        if indent == 2:
            self.write_line_break()
        ##################################################

        if self.column < indent:
            self.whitespace = True
            data = ' ' * (indent - self.column)
            self.column = indent
            if self.encoding:
                data = data.encode(self.encoding)
            self.stream.write(data)

You call it like this:

from yaml import dump
print(dump(python_dict, default_flow_style=False, width=79, Dumper=MyDumper))
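A related approach, which is a different technique from the subclass above: instead of copying write_indent(), override write_line_break() and emit one extra line break whenever the emitter is back at the top level. This sketch relies on the emitter's internal indents stack, an undocumented attribute, so it may need adjusting for other PyYAML versions. The class name BlankLineDumper is hypothetical:

```python
import yaml


class BlankLineDumper(yaml.SafeDumper):
    """Hypothetical dumper: empty line between top-level items or keys.

    self.indents is the emitter's internal stack of enclosing indents;
    it holds exactly one entry while top-level items are being written.
    """

    def write_line_break(self, data=None):
        super().write_line_break(data)
        if len(self.indents) == 1:
            # Back at the top level: add one empty line.
            super().write_line_break()


list_of_dicts = [
    {'key_1': 'value_a', 'key_2': 'value_b'},
    {'key_1': 'value_c', 'key_2': 'value_d'},
]
out = yaml.dump(list_of_dicts, Dumper=BlankLineDumper, default_flow_style=False)
print(out, end="")
```

Because the check only fires at the top level, lines inside a nested mapping stay together; a top-level mapping gets an empty line between its keys instead.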

Python Guru
