Delete Rows In Csv Based On Specific Column Value Python
I have a large CSV with the following header columns: id, type, state, location, number of students, and the following values: 124, preschool, Pennsylvania, Pittsburgh, 1242 421, sec
Solution 1:
Here's a solution using the csv module:
import csv

with open('fin.csv', 'r') as fin, open('fout.csv', 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout, delimiter=',')

    # write headers
    writer.writerow(next(reader))

    # iterate and write rows based on condition
    for i in reader:
        if int(i[-1]) > 2000:
            writer.writerow(i)
Result:
id,type,state,location,number of students
213,primary school,California,Los Angeles,3213
155,secondary school,Pennsylvania,Pittsburgh,2141
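If the column position might change, the same filter can be keyed on the header name instead of `i[-1]`. The following is only a minimal sketch and not part of the original answer: the helper name `filter_rows` and its parameters are hypothetical, and it assumes every value in the chosen column parses as an integer. It uses `csv.DictReader` and `csv.DictWriter` from the standard library.

```python
import csv


def filter_rows(src_path, dst_path, column, threshold):
    """Copy rows whose integer value in `column` is greater than `threshold`.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    kept = 0
    with open(src_path, newline='') as fin, \
         open(dst_path, 'w', newline='') as fout:
        # skipinitialspace strips the blank after each comma, as in the answer
        reader = csv.DictReader(fin, skipinitialspace=True)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if int(row[column]) > threshold:
                writer.writerow(row)
                kept += 1
    return kept
```

With the file names from the answer, a call would look like `filter_rows('fin.csv', 'fout.csv', 'number of students', 2000)`.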
Solution 2:
If you just want to read the file and avoid any other processing, you can use a regex (assuming the value is in the last column and is a positive integer):
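import re

# Keep lines that end in a run of four or more digits starting with 2-9.
# Note: this matches 2000 itself and misses some larger values such as 10000.
with open('Test.txt') as f, open('Test1.txt', 'w') as f1:
    for line in f:
        match = re.search(r'[2-9][0-9]{3,}$', line)
        if match:
            f1.write(line)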
You can also do the same thing in bash:
K='[2-9][0-9]{3,}$'
while read -r line; do
    if [[ $line =~ $K ]]; then
        echo "$line"
    fi
done < Test.txt
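Because this approach matches digits as text rather than comparing numbers, it can be worth checking the pattern `[2-9][0-9]{3,}$` against a plain integer comparison before trusting it on real data. The sketch below is an illustration only: the function names are hypothetical, and the `> 2000` threshold is an assumption taken from Solution 1.

```python
import re

# The pattern used in this solution: a trailing run of 4+ digits starting with 2-9.
PATTERN = re.compile(r'[2-9][0-9]{3,}$')


def regex_keeps(line):
    """True if the regex filter would keep this line."""
    return PATTERN.search(line) is not None


def int_keeps(line, threshold=2000):
    """True if the last comma-separated field, parsed as an int, exceeds threshold."""
    return int(line.rsplit(',', 1)[-1]) > threshold


def disagreements(lines):
    """Return the lines on which the two filters give different answers."""
    return [ln for ln in lines if regex_keeps(ln) != int_keeps(ln)]


sample = ['x,1999', 'x,2000', 'x,3213', 'x,10000', 'x,12000']
print(disagreements(sample))  # ['x,2000', 'x,10000']
```

The two filters disagree on 2000 (kept by the regex, but not greater than 2000) and on 10000 (greater than 2000, but no trailing run of four digits starts with 2-9), so the regex is only a safe shortcut when the data cannot contain such values.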