
Deleting Certain Line Of Text File In Python

I have the following text file:

This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,456
FRUIT
DRINK
FOOD,BURGER
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR
NUM,012
FRUIT
DRINK

Solution 1:

You shouldn't modify a list while you are looping over it.
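
To see why, here is a minimal sketch using a made-up list (the names `items`, `buggy_result` and `kept` are illustrative and not from the question). Removing an element during the loop shifts the remaining elements left, so the loop steps over whatever slides into the freed position.

```python
# Hypothetical data, only to illustrate the pitfall.
items = ['a', 'b', 'b', 'c']

# Buggy: removing while iterating skips the element after each removal.
for item in items:
    if item == 'b':
        items.remove(item)
# One 'b' survives because the loop position moved past it.
buggy_result = list(items)

# Safer: build a new list instead of mutating the one being looped over.
items = ['a', 'b', 'b', 'c']
kept = [item for item in items if item != 'b']
```

Building a new list (or writing to a separate output file, as in the answers here) avoids the problem entirely.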

What you could try is to just advance the iterator on the file object when needed:

wanted = {'123', '789'}

with open("inputfile.txt", 'r') as infile, open("outfile.txt", 'w') as outfile:
    for line in infile:
        if line.startswith('NUM,'):
            unit = line.strip().split(',')[1]
            if unit not in wanted:
                # skip the four remaining lines of this record;
                # the None default avoids StopIteration at end of file
                for _ in range(4):
                    next(infile, None)
                continue

        outfile.write(line)

Also, use a set for wanted: membership tests on a set are faster than on a list.

This approach does not read the entire file into a list before processing it. It reads the file line by line, skips ahead when needed, and writes to the new file. If you prefer, you can replace outfile with a list and append to it.
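
As a sketch of that last variation, the function below collects the kept lines in a list instead of writing them to a file. The name `filter_records` and the `block_size` parameter are assumptions made for this example, and it assumes every record is a NUM line followed by four more lines, as in the sample data.

```python
import io

def filter_records(infile, wanted, block_size=5):
    """Return the lines of records whose NUM value is in `wanted`.

    Assumes each record is a 'NUM,<id>' line followed by block_size - 1 lines.
    """
    kept = []
    for line in infile:
        if line.startswith('NUM,'):
            unit = line.strip().split(',')[1]
            if unit not in wanted:
                # Advance past the rest of this record; the None default
                # avoids StopIteration if the file ends mid-record.
                for _ in range(block_size - 1):
                    next(infile, None)
                continue
        kept.append(line)
    return kept

# Usage with an in-memory file standing in for inputfile.txt.
sample = io.StringIO(
    "NUM,123\nFRUIT\nDRINK\nFOOD,BACON\nCAR\n"
    "NUM,456\nFRUIT\nDRINK\nFOOD,BURGER\nCAR\n"
)
result = filter_records(sample, {'123', '789'})
```

Any iterator over lines works here, so the same function can be called with a real file object opened via open().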


Solution 2:

There are some issues with the code: for instance, data_list is never defined. If it is a list, removing elements from it (with del or pop) while you iterate over it shifts the remaining indices, so lines get skipped. The code also mixes enumerate with direct index access on data, and readlines is not needed.

I'd suggest not keeping all the lines in memory; it isn't really needed here. Try something like this (untested):

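wanted = {'123', '789'}

with open('infile.txt') as fin, open('outfile.txt', 'w') as fout:
    for line in fin:
        # strip the newline so that '123\n' compares equal to '123'
        if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
            # skip the four remaining lines of the unwanted record
            for _ in range(4):
                next(fin, None)
        else:
            fout.write(line)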

Solution 3:

import re

# data is assumed to hold the whole file contents as a single string
# find the lines that match NUM,XYZ exactly
nums = re.compile(r'^NUM,(?:' + '|'.join(['456', '012']) + r')$', re.MULTILINE)
# match the NUM line plus the three lines after it
breaks = re.compile(r'(?:.*\n){4}')
keeper = ''
for match in nums.finditer(data):
    block = breaks.match(data, match.start())
    if block:
        keeper += block.group()

The result on the given string is:

NUM,456
FRUIT
DRINK
FOOD,BURGER

NUM,012
FRUIT
DRINK
FOOD,MEATBALL

Solution 4:

Edit: deleting items while iterating is probably not a good idea; see "Remove items from a list while iterating".

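wanted = {'123', '789'}
SKIP_LINES = 4  # number of lines that follow each NUM line in a record

with open("inputfile.txt", 'r') as infile:
    data = infile.readlines()

skip_until = -1  # index of the last line to skip

result_data = []
for current_line, line in enumerate(data):
    if current_line <= skip_until:
        continue

    if line.startswith('NUM,'):
        # strip the newline before comparing against wanted
        num = line.strip().split(',')[1]
        if num not in wanted:
            # drop this NUM line and the SKIP_LINES lines after it
            skip_until = current_line + SKIP_LINES
            continue

    result_data.append(line)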

... and result_data is what you want.


Solution 5:

If you don't mind building a list, and if your "NUM" lines reliably come every fifth line, you may want to try:

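# lines is assumed to be the list of lines, starting at the first NUM line
keep = []
for i, v in enumerate(lines[::5]):
    # strip the newline so the value compares equal to the entries in wanted
    _, current = v.strip().split(",")
    if current in wanted:
        keep.extend(lines[i*5:i*5+5])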
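
Because this relies on a fixed record length, it may be worth checking that assumption before slicing. The sketch below is one way to do that; `keep_fixed_blocks` is a hypothetical helper name, and `lines` is assumed to be a list of lines that starts directly at the first NUM line.

```python
def keep_fixed_blocks(lines, wanted, size=5):
    """Keep whole blocks of `size` lines whose leading NUM value is wanted."""
    keep = []
    for start in range(0, len(lines), size):
        header = lines[start].strip()
        # Fail loudly if the every-fifth-line assumption does not hold.
        if not header.startswith('NUM,'):
            raise ValueError('expected a NUM line at index %d' % start)
        if header.split(',')[1] in wanted:
            keep.extend(lines[start:start + size])
    return keep

# Made-up sample data shaped like the file in the question.
lines = ['NUM,123\n', 'FRUIT\n', 'DRINK\n', 'FOOD,BACON\n', 'CAR\n',
         'NUM,456\n', 'FRUIT\n', 'DRINK\n', 'FOOD,BURGER\n', 'CAR\n']
kept = keep_fixed_blocks(lines, {'123', '789'})
```

If the file has a header line or a record of a different length, the ValueError points at the first misaligned index instead of silently keeping the wrong lines.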
