
How To Slice Numbered Lists Into Sublists

I have opened a file and used readlines() and split() with the regex '\t' to remove the tabs, which has resulted in the following lists: ['1', 'cats', '--,'] ['2', 'chase', '--,'] ['3',
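For reference, here is a minimal sketch of how lists like these can be produced from tab-separated text. It assumes each line has the form number, TAB, word, TAB, tag, and it reads from a hypothetical in-memory sample instead of a real file. `read_rows` is a hypothetical helper name. Note that `str.split('\t')` splits on a literal tab, so no regex is needed.

```python
import io

# Hypothetical sample standing in for the real file (assumed tab-separated)
sample = "1\tcats\t--,\n2\tchase\t--,\n3\tdogs\t--,\n"

def read_rows(fin):
    """Return one list of fields per non-empty line, split on tabs.

    read_rows is a hypothetical helper used only for illustration.
    """
    rows = []
    for line in fin:
        line = line.rstrip("\n")      # drop the trailing newline
        if line:                      # skip blank lines
            rows.append(line.split("\t"))
    return rows

rows = read_rows(io.StringIO(sample))
print(rows)
# [['1', 'cats', '--,'], ['2', 'chase', '--,'], ['3', 'dogs', '--,']]
```

With a real file, `with open(path) as fin: rows = read_rows(fin)` works the same way, because iterating a file object yields one line at a time just like `io.StringIO`.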

Solution 1:

Something like:

from itertools import groupby

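with open('yourfile') as fin:
    # Split each line on whitespace (tabs included), e.g. ['1', 'cats', '--,']
    lines = (line.split() for line in fin)
    # Within one sentence, line position minus word number stays constant,
    # so each run of consecutively numbered lines gets its own group key.
    # (Python 3 has no tuple unpacking in lambda parameters, so index the pair.)
    grouped = groupby(enumerate(lines), lambda pair: pair[0] - int(pair[1][0]))
    # Join the word field (index 1) of every line in each group
    sentences = [' '.join(fields[1] for _, fields in g) for _, g in grouped]
    # ['cats chase dogs', 'the car is gray']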

NB: This assumes input shaped like your example data:

example = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"]
]
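To try the grouping without a file, the same `groupby` idea can be run directly on the in-memory `example` list. This is a minimal sketch: `sentences_from_rows` is a hypothetical helper name, and it returns lists of words (the sublists the question asks for) rather than joined strings.

```python
from itertools import groupby

example = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"],
]

def sentences_from_rows(rows):
    """Group consecutive rows numbered 1, 2, 3, ... into word sublists.

    sentences_from_rows is a hypothetical helper used only for illustration.
    """
    # Position minus word number is constant inside one numbered run
    keyed = groupby(enumerate(rows), key=lambda pair: pair[0] - int(pair[1][0]))
    # Keep only the word field (index 1) of each row in a group
    return [[fields[1] for _, fields in group] for _, group in keyed]

words = sentences_from_rows(example)
sentences = [" ".join(w) for w in words]
print(words)      # [['cats', 'chase', 'dogs'], ['the', 'car', 'is', 'gray']]
print(sentences)  # ['cats chase dogs', 'the car is gray']
```

The key works because the position increases by one per row; when the numbering resets to 1, the difference jumps, so a new group starts exactly at each sentence boundary.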

Solution 2:

Choosing suitable data structures makes the job easier:

container = [["1", "cats", "--,"],
             ["2", "chase", "--,"],
             ["3", "dogs", "--,"],
             ["1", "the", "--,"],
             ["2", "car", "--,"],
             ["3", "is", "--,"],
             ["4", "gray", "--,"]]

Nest your lists in a container list, then use a dictionary to store the output lists:

from collections import defaultdict

out = defaultdict(list)              # Initialize dictionary for output
key = 0                              # Initialize key
for idx, word, _ in container:       # Unpack sublists
    if int(idx) == 1:                # Check if we are at start of new sentence
        key += 1                     # Increment key for new sentence
    out[key].append(word)            # Add word to list

Gives:

{
    1: ['cats', 'chase', 'dogs'], 
    2: ['the', 'car', 'is', 'gray']
}
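If a plain list of sublists is preferred over a dictionary keyed by sentence number, the same "restart at 1" check can open a new sublist directly. A minimal sketch, assuming as above that every sentence starts with index "1"; `split_on_restart` is a hypothetical helper name.

```python
def split_on_restart(rows):
    """Start a new sublist whenever the number field is "1".

    split_on_restart is a hypothetical helper used only for illustration.
    """
    out = []
    for idx, word, _ in rows:          # unpack [number, word, tag]
        if int(idx) == 1 or not out:   # new sentence, or data not starting at 1
            out.append([])
        out[-1].append(word)           # add word to the current sentence
    return out

container = [["1", "cats", "--,"],
             ["2", "chase", "--,"],
             ["3", "dogs", "--,"],
             ["1", "the", "--,"],
             ["2", "car", "--,"],
             ["3", "is", "--,"],
             ["4", "gray", "--,"]]

result = split_on_restart(container)
print(result)  # [['cats', 'chase', 'dogs'], ['the', 'car', 'is', 'gray']]
```

Compared with the `defaultdict` version, this drops the separate key counter; the `not out` guard keeps it from failing if the first row is not numbered 1.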
