Python Sorting Script For Library Book Call No. (csv File)
I am writing a Python script to find duplicate entries in a CSV list of call numbers and titles. Here is the format of the CSV file:

920.105,George Mueller
920.105,George Mueller
Solution 1:
This is based on @hiro protaginist's answer, but it also handles unsorted input: rows with the same call number do not need to be next to each other.
import csv
from io import StringIO
from collections import defaultdict

text = '''286.003,This Day in Baptist History 1
920.105,George Mueller
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
920.105,George Mueller 1
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
920.105,George Mueller
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 3'''

with StringIO(text) as in_file, StringIO() as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    grouped = defaultdict(set)  # Maps call_numbers to a set of all book_titles under that number
    for entry in reader:
        grouped[entry[0]].add(entry[1])
    # Write one row per title for every call number that has more than one distinct title
    for call_number, titles in grouped.items():
        if len(titles) > 1:
            for title in titles:
                writer.writerow((call_number, title))
    print(out_file.getvalue())  # Remove this line if actually writing to a file
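As with @hiro protaginist's answer, to work with real files replace StringIO(text) with open(filename, newline='') and StringIO() with open(outfilename, 'w', newline=''), and remove the print line. The csv module documentation recommends opening CSV files with newline=''.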
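Here is a minimal sketch of the same defaultdict approach wrapped in a function that reads and writes real files. The function name find_conflicting_titles and the file paths are hypothetical, and it assumes each row has at least two columns (call number, title). Titles are sorted before writing so the output order is stable between runs.

```python
import csv
from collections import defaultdict


def find_conflicting_titles(in_path, out_path):
    """Hypothetical helper: write every (call_number, title) pair whose
    call number is shared by more than one distinct title."""
    grouped = defaultdict(set)  # call number -> set of distinct titles
    with open(in_path, newline='') as in_file:
        for row in csv.reader(in_file):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            grouped[row[0]].add(row[1])

    conflicts = []
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for call_number, titles in grouped.items():
            if len(titles) > 1:
                for title in sorted(titles):  # sorted for a stable output order
                    writer.writerow((call_number, title))
                    conflicts.append((call_number, title))
    return conflicts
```

You would call it as find_conflicting_titles('books.csv', 'conflicts.csv'); the returned list is handy for checking the result without reopening the output file.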
Solution 2:
If your input is sorted by call number, you could use itertools.groupby:
import csv
from io import StringIO
from itertools import groupby
text = '''920.105,George Mueller
920.105,George Mueller
920.105,George Mueller 1
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 1
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 3'''

with StringIO(text) as in_file, StringIO() as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    # groupby collects consecutive rows that share the same call number
    for number, group in groupby(reader, key=lambda x: x[0]):
        titles = set(item[1] for item in group)
        if len(titles) != 1:  # more than one distinct title for this number
            writer.writerow((number, *titles))
    print(out_file.getvalue())
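which will output the following (the order of the titles within a row may differ between runs, because they come from a set):
920.105,George Mueller 1,George Mueller
286.003,This Day in Baptist History 2,This Day in Baptist History 3,This Day in Baptist History 1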
Note that I had to change your input, because the sample you posted would not have generated any output: every call number in it has only one distinct title.
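Because the sample in the question contains only identical rows, it is possible that what you actually want is exact duplicates rather than call numbers shared by different titles. If so, here is a minimal sketch using collections.Counter, which is a different technique from the answers here; the function name find_exact_duplicates is hypothetical. It counts each complete row and keeps the ones that appear more than once:

```python
import csv
from collections import Counter
from io import StringIO


def find_exact_duplicates(lines):
    """Hypothetical helper: return each row that appears more than once,
    mapped to how many times it appears."""
    # Rows are lists, so convert them to tuples to make them hashable; skip blank lines
    counts = Counter(tuple(row) for row in csv.reader(lines) if row)
    return {row: n for row, n in counts.items() if n > 1}


sample = StringIO('920.105,George Mueller\n920.105,George Mueller\n')
print(find_exact_duplicates(sample))  # {('920.105', 'George Mueller'): 2}
```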
To run the groupby script above on your actual file, replace with StringIO(text) as in_file, StringIO() as out_file: with something like with open('infile.txt', 'r', newline='') as in_file, open('outfile.txt', 'w', newline='') as out_file: and drop the final print, since a real file object has no getvalue() method.
Again: this only works if your input is sorted (or at least grouped) by call number, because groupby only merges consecutive rows that share the same key.
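If your file is not sorted, one option is to sort the rows by call number before grouping. Here is a minimal sketch of that variant, assuming the whole file fits in memory (sorted() reads every row). The function name conflicting_groups is hypothetical. It also sorts the titles so the output order is deterministic, unlike writing out a set directly.

```python
import csv
from io import StringIO
from itertools import groupby
from operator import itemgetter


def conflicting_groups(in_file):
    """Hypothetical helper: yield (call_number, titles) for every call
    number that appears with more than one distinct title."""
    # Sort by call number so equal numbers become adjacent; blank lines are skipped
    rows = sorted((row for row in csv.reader(in_file) if row), key=itemgetter(0))
    for number, group in groupby(rows, key=itemgetter(0)):
        titles = sorted({item[1] for item in group})
        if len(titles) > 1:
            yield number, titles


text = '''920.105,George Mueller
289,The Modern Tongues Movement
920.105,George Mueller 1
920.105,George Mueller'''

out_file = StringIO()
writer = csv.writer(out_file)
for number, titles in conflicting_groups(StringIO(text)):
    writer.writerow((number, *titles))
print(out_file.getvalue())
```

The call numbers are compared as strings here, so the sort is lexicographic rather than numeric; that does not matter for grouping, because identical call numbers still end up next to each other.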