How To Create One Dataframe From Multiple Csv Files In A Folder
Solution 1:
Here is an option in R.
Step 1: Prepare a vector of file names. If the folder contains many files, the list.files function can build this vector for you. Here, I just created it manually. I also assume that all the files are stored in the working directory; otherwise, you will need to construct the full file paths.
file_vec <- c("A1.csv", "A2.csv", "A3.csv")
Step 2: Read all the CSV files listed in file_vec. The key is to use the lapply function to apply read.csv to every element of file_vec.
dt_list <- lapply(file_vec, read.csv, stringsAsFactors = FALSE)
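Step 3: Prepare a vector of file names without the .csv extension. The pattern escapes the dot and anchors it to the end of the name, because an unescaped . in a regular expression matches any character.
name_vec <- sub("\\.csv$", "", file_vec)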
Step 4: Create the data frame. x[nrow(x), 2] is a way to access the last value of the second column.
dt_final <- data.frame(File = name_vec,
                       Value = sapply(dt_list, function(x) x[nrow(x), 2]),
                       stringsAsFactors = FALSE)
dt_final is the final output.
Solution 2:
Here's another option using the tidyverse in R:
library(tidyverse)
# In my example, I'm using a folder with 4 Chicago Crime Datasets
setwd("INSERT/PATH/HERE")
files <- list.files()
# The column is named filename so that it matches the output shown below
tibble(filename = files) %>%
  mutate(file_contents = map(filename, ~ read_csv(.x, n_max = 10))) %>%
  unnest(file_contents) %>%
  group_by(filename) %>%
  slice(n()) %>%
  select(1:2)
Which returns:
# A tibble: 4 x 2
# Groups:   filename [4]
  filename                           X1
  <chr>                           <int>
1 Chicago_Crimes_2001_to_2004.csv  4904
2 Chicago_Crimes_2005_to_2007.csv    10
3 Chicago_Crimes_2008_to_2011.csv  5867
4 Chicago_Crimes_2012_to_2017.csv  1891
Note that the n_max = 10 argument isn't needed. I only included it because the files I was working with are pretty large.
For anyone interested, the dataset can be found here.
Also, you may want to avoid setting the working directory with setwd(). If so, you can pass the additional argument full.names = TRUE to list.files():
path <- "INSERT/PATH/HERE"
files <- list.files(path, full.names = TRUE)
I'd recommend this approach, as scripts containing a setwd() call aren't flexible: paths will change from user to user.
Solution 3:
Python Solution
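>>> import pandas as pd
>>> files = ['A1.csv', 'A2.csv', ... , 'D10.csv']
>>> # One row per file: the file name and the last value of its second column
>>> df_final = pd.DataFrame({'File': files, 'Value': [pd.read_csv(fname).iat[-1, 1] for fname in files]})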
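If pandas is not available, the same "last row, second column" lookup can be sketched with only the standard library. This is a minimal illustration rather than part of the original answer: the helper names last_value and summarize_folder are hypothetical, and the code assumes every file has a header row and at least two columns.

```python
import csv
from pathlib import Path


def last_value(csv_path, column=1):
    """Return the value in the last data row of the given column (0-based)."""
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]  # skip blank lines
    if len(rows) < 2:
        # Assumption: the first row is a header, so a data row must follow it
        raise ValueError(f"{csv_path} has no data rows")
    return rows[-1][column]


def summarize_folder(folder):
    """Build one (file name without .csv, last value) pair per CSV file."""
    paths = sorted(Path(folder).glob("*.csv"))
    return [(path.stem, last_value(path)) for path in paths]
```

Calling summarize_folder("INSERT/PATH/HERE") returns a list of pairs that could then be passed to pd.DataFrame(pairs, columns=["File", "Value"]) if a data frame is needed. The values come back as strings, since the csv module does no type conversion.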
Solution 4:
This is an easy case for bash and friends. This one-liner
for i in A*.csv B*.csv C*.csv D*.csv; do awk -F , 'END{ print $NF }' "$i"; done
extracts the bottom-right field, no matter how many rows or columns, of any number of files that follow the pattern you have given. If all the files were in one folder, and they were the only .csv files in that folder, and you wanted to save the outcome in a new file, this would do the job:
for i in *.csv; do awk -F , 'END{ print $NF }' "$i"; done > extract.txt
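For comparison, the loop above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a drop-in replacement: the function names bottom_right_field and extract_all are hypothetical, and unlike the awk version it parses quoted fields with the csv module and ignores blank lines at the end of a file.

```python
import csv
from pathlib import Path


def bottom_right_field(csv_path):
    """Return the last field of the last non-empty row of a CSV file."""
    last_row = None
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):  # stream rows; nothing is kept in memory
            if row:  # ignore blank lines, e.g. a trailing empty line
                last_row = row
    if last_row is None:
        raise ValueError(f"{csv_path} is empty")
    return last_row[-1]


def extract_all(folder, out_name="extract.txt"):
    """Write one bottom-right field per CSV file in `folder` to `out_name`."""
    folder = Path(folder)
    values = [bottom_right_field(p) for p in sorted(folder.glob("*.csv"))]
    (folder / out_name).write_text("".join(v + "\n" for v in values))
    return values
```

Because the output file ends in .txt, rerunning extract_all on the same folder does not pick it up as input.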