Merge One Column From Variable Number Of Csv Files Into One Csv File
Novice Python programmer here. I know there are a lot of SO posts relating to this, but none of the solutions I've reviewed seem to fit my problem. I have a variable number of csv
Solution 1:
Loop through the files, keep track of which keys exist, and write all records using csv.DictReader and csv.DictWriter.
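import csv

records = list()
all_keys = set()
for fn in ["table_1.csv", "table_2.csv"]:
    with open(fn, newline="") as f:
        reader = csv.DictReader(f)
        # Collect every column name seen in any file.
        all_keys.update(set(reader.fieldnames))
        for r in reader:
            records.append(r)
# On Python 3, open in text mode with newline="" (Python 2 used "wb").
with open("table_merged.csv", "w", newline="") as f:
    # A set has no fixed order, so the column order is arbitrary here.
    writer = csv.DictWriter(f, fieldnames=list(all_keys))
    writer.writeheader()
    for r in records:
        writer.writerow(r)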
This will write an empty 'cell' for records that didn't have the column.
With your file used as both the first and the second .csv, and with the last column of the second renamed to 002 instead of 001, you would get this:
UID,Longitude,002,001,Latitude
1,45.20,,13121,-151.01
2,45.16,,15009,-151.13
3,45.09,,10067,-151.02
4,45.03,,14010,-151.33
1,45.20,13121,,-151.01
2,45.16,15009,,-151.13
3,45.09,10067,,-151.02
4,45.03,14010,,-151.33
If you want to keep the columns in a specific order, you will have to make all_keys a list, and then add only the columns in the new file that are not already in all_keys.
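all_keys = list()
...
# A set difference would scramble the order of the new columns,
# so filter the field names in their original order instead.
all_keys += [k for k in reader.fieldnames if k not in all_keys]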
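Put together, the ordered variant might look like the sketch below. The function name merge_csv_records and its parameters are made up for illustration, and it assumes every input file has a header row. Columns appear in the order they are first seen, and a record that lacks a column gets an empty cell.

```python
import csv


def merge_csv_records(paths, out_path):
    """Stack the rows of several CSV files, keeping columns in first-seen order.

    Hypothetical helper; assumes each file starts with a header row.
    """
    records = []
    all_keys = []  # a list, so the column order is preserved
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Append only the columns not seen in an earlier file.
            all_keys += [k for k in reader.fieldnames if k not in all_keys]
            records.extend(reader)
    with open(out_path, "w", newline="") as f:
        # Columns missing from a record are written as "" (the default restval).
        writer = csv.DictWriter(f, fieldnames=all_keys)
        writer.writeheader()
        writer.writerows(records)
    return all_keys
```

Calling merge_csv_records(["table_1.csv", "table_2.csv"], "table_merged.csv") would write the merged file and return the final column order.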
Solution 2:
Try the pandas approach:
import pandas as pd

file_list = ['1.csv', '2.csv', '3.csv']
df = pd.read_csv(file_list[0])
for f in file_list[1:]:
    # use only the 1st and 4th columns ...
    tmp = pd.read_csv(f, usecols=[0, 3])
    df = pd.merge(df, tmp, on='UID')
df.to_csv('output.csv', index=False)
print(df)
Output:
   UID  Latitude  Longitude    001    007  015
0    1   -151.01      45.20  13121  11111   11
1    2   -151.13      45.16  15009  22222   12
2    3   -151.02      45.09  10067  33333   13
3    4   -151.33      45.03  14010  44444   14
output.csv
UID,Latitude,Longitude,001,007,015
1,-151.01,45.2,13121,11111,11
2,-151.13,45.16,15009,22222,12
3,-151.02,45.09,10067,33333,13
4,-151.33,45.03,14010,44444,14
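Note that merging on 'UID' joins rows by key, so each UID ends up on a single row instead of being repeated once per file. If pandas is not available, roughly the same join can be sketched with the csv module alone. The function name join_on_key is hypothetical, and the sketch assumes that key values are unique within each file and that every file has a header containing the key column. Unlike pandas, it keeps all values as strings (45.20 stays 45.20) and simply ignores non-key columns it has already seen.

```python
import csv


def join_on_key(paths, out_path, key="UID"):
    """Inner-join CSV files on `key`; later files contribute only new columns.

    Hypothetical helper; assumes unique key values within each file.
    """
    with open(paths[0], newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames)
        merged = {row[key]: row for row in reader}  # key value -> combined row
    for path in paths[1:]:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Columns of this file that have not been seen before.
            new_cols = [c for c in reader.fieldnames if c not in fieldnames]
            fieldnames += new_cols
            seen = set()
            for row in reader:
                if row[key] in merged:
                    merged[row[key]].update((c, row[c]) for c in new_cols)
                    seen.add(row[key])
        # Inner join: drop keys that this file did not contain.
        merged = {k: v for k, v in merged.items() if k in seen}
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(merged.values())
    return fieldnames
```

For example, join_on_key(['1.csv', '2.csv', '3.csv'], 'output.csv') would take the first file as the base and append the new column from each later file.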