Merge One Column From Variable Number Of CSV Files Into One CSV File

Novice Python programmer here. I know there are a lot of SO posts relating to this, but none of the solutions I've reviewed seem to fit my problem. I have a variable number of csv

Solution 1:

Loop through the files with csv.DictReader, keep track of which keys (column names) exist, and write all records with csv.DictWriter.

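import csv

records = []
all_keys = set()
for fn in ["table_1.csv", "table_2.csv"]:
    # newline="" lets the csv module handle line endings itself
    with open(fn, newline="") as f:
        reader = csv.DictReader(f)
        # collect every column name seen in any file
        all_keys.update(reader.fieldnames)
        records.extend(reader)

# Python 3: open in text mode ("w"), not "wb"
with open("table_merged.csv", "w", newline="") as f:
    # all_keys is a set, so the column order in the output is arbitrary
    writer = csv.DictWriter(f, fieldnames=list(all_keys))
    writer.writeheader()
    writer.writerows(records)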

This will write an empty 'cell' for records that didn't have the column, because csv.DictWriter fills missing keys with its restval, which defaults to an empty string.

Using your file as both the first and the second .csv, with the last column of the second copy renamed from 001 to 002, you would get this:

UID,Longitude,002,001,Latitude
1,45.20,,13121,-151.01
2,45.16,,15009,-151.13
3,45.09,,10067,-151.02
4,45.03,,14010,-151.33
1,45.20,13121,,-151.01
2,45.16,15009,,-151.13
3,45.09,10067,,-151.02
4,45.03,14010,,-151.33
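
The output above has one row per input record, so every UID appears once per file. If one row per UID is wanted instead, the records can be combined in a dict keyed on UID before writing. This is only a minimal sketch, assuming every file has a UID column and that the shared columns agree between files; the in-memory strings in `sources` are hypothetical stand-ins for the real files.

```python
import csv
import io

# Hypothetical in-memory stand-ins for table_1.csv and table_2.csv.
sources = {
    "table_1.csv": "UID,Latitude,Longitude,001\n1,-151.01,45.20,13121\n2,-151.13,45.16,15009\n",
    "table_2.csv": "UID,Latitude,Longitude,002\n1,-151.01,45.20,13121\n2,-151.13,45.16,15009\n",
}

merged = {}      # UID -> combined record
fieldnames = []  # column order, first seen first

for name, text in sources.items():
    reader = csv.DictReader(io.StringIO(text))
    for key in reader.fieldnames:
        if key not in fieldnames:
            fieldnames.append(key)
    for row in reader:
        # update() lets a later file add its extra column to an existing UID
        merged.setdefault(row["UID"], {}).update(row)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(merged.values())
result = out.getvalue()
print(result)
```

With real files, `io.StringIO(text)` would be replaced by `open(name, newline="")` and `out` by the output file handle.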

If you want to keep the columns in a specific order, you will have to make all_keys a list, and then add only the columns in the new file that are not in all_keys.

all_keys = list()

...
        # append only unseen columns, in the order they appear in the file
        all_keys += [k for k in reader.fieldnames if k not in all_keys]
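
For reference, the ordered variant might look like this when put together. It is a sketch rather than a definitive version: the `inputs` strings are hypothetical stand-ins for the two files, and unseen columns are appended with a list comprehension so they stay in the order in which they first appear.

```python
import csv
import io

# Hypothetical in-memory stand-ins for table_1.csv and table_2.csv.
inputs = [
    "UID,Latitude,Longitude,001\n1,-151.01,45.20,13121\n",
    "UID,Latitude,Longitude,002\n1,-151.01,45.20,13121\n",
]

records = []
all_keys = []  # a list, so the column order is preserved
for text in inputs:
    reader = csv.DictReader(io.StringIO(text))
    # add only the columns not seen in an earlier file
    all_keys += [k for k in reader.fieldnames if k not in all_keys]
    records.extend(reader)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=all_keys, lineterminator="\n")
writer.writeheader()
writer.writerows(records)  # missing columns are written as empty cells
merged_text = out.getvalue()
print(merged_text)
```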

Solution 2:

Try the pandas approach:

import pandas as pd

file_list = ['1.csv','2.csv','3.csv']

df = pd.read_csv(file_list[0])

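for f in file_list[1:]:
    # read only the 1st and 4th columns (UID and that file's value column)
    tmp = pd.read_csv(f, usecols=[0, 3])
    # join on UID; the default inner join keeps only UIDs present in both frames
    df = pd.merge(df, tmp, on='UID')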

df.to_csv('output.csv', index=False)

print(df)

Output:

   UID  Latitude  Longitude    001    007  015
0    1   -151.01      45.20  13121  11111   11
1    2   -151.13      45.16  15009  22222   12
2    3   -151.02      45.09  10067  33333   13
3    4   -151.33      45.03  14010  44444   14

output.csv

UID,Latitude,Longitude,001,007,015
1,-151.01,45.2,13121,11111,11
2,-151.13,45.16,15009,22222,12
3,-151.02,45.09,10067,33333,13
4,-151.33,45.03,14010,44444,14
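
One thing to watch with this approach: pd.merge defaults to an inner join, so a UID that is missing from any one file disappears from the result. The sketch below, assuming you would rather keep every UID from the first file, passes how="left" and folds the merge over a list of any length with functools.reduce. The `csv_texts` strings and the `load` helper are hypothetical stand-ins for the real files.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical in-memory stand-ins for 1.csv, 2.csv, 3.csv.
# With real files the list might come from sorted(glob.glob("*.csv")).
csv_texts = [
    "UID,Latitude,Longitude,001\n1,-151.01,45.20,13121\n2,-151.13,45.16,15009\n",
    "UID,Latitude,Longitude,007\n1,-151.01,45.20,11111\n2,-151.13,45.16,22222\n",
    "UID,Latitude,Longitude,015\n1,-151.01,45.20,11\n3,-151.02,45.09,13\n",  # no UID 2
]

def load(text, first):
    # keep every column of the first file, only UID and the 4th column of the rest
    return pd.read_csv(io.StringIO(text), usecols=None if first else [0, 3])

frames = [load(text, i == 0) for i, text in enumerate(csv_texts)]

# how="left" keeps all UIDs of the first file; gaps become NaN
merged = reduce(
    lambda left, right: pd.merge(left, right, on="UID", how="left"),
    frames,
)
print(merged)
```

Here UID 2 survives even though the third file lacks it, with NaN in the 015 column; how="outer" would additionally keep UIDs that appear only in later files.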
