Wide To Long Data Table Transformation With Variables In Columns And Rows
Solution 1:
The sample data set provided by the OP suggests that all data frames within the csv file
- have the same structure, i.e., the same number, names, and positions of columns
- and the monthly columns V4 to V8 refer to the same months 1 to 5 for all "sub frames".
If this is true then we can treat the whole csv file as one data frame and convert it to the desired format by reshaping using melt() and dcast() from the data.table package:
library(data.table)
setDT(df3)[, melt(.SD, id.vars = paste0("V", 1:3), na.rm = TRUE)][
V3 != "month", dcast(.SD, V1 + V2 + rleid(variable) ~ forcats::fct_inorder(V3))][
, setnames(.SD, 1:3, c("city", "address", "month"))]
city address month x y z a b c
1: la 63 main st 1 NA NA NA 87035 345 86690
2: la 63 main st 2 NA NA NA 7467456 456 7467000
3: la 63 main st 3 NA NA NA 3363 345 3018
4: la 63 main st 4 NA NA NA 863 678 185
5: la 63 main st 5 NA NA NA 43673 345 43328
6: nyc 123 main st 1 58568 5345 53223 NA NA NA
7: nyc 123 main st 2 567567 3673 563894 NA NA NA
8: nyc 123 main st 3 567909 3453 564456 NA NA NA
9: nyc 123 main st 4 35876 3467 32409 NA NA NA
10: nyc 123 main st 5 56943 788 56155 NA NA NA
11: sf 953 main st 1 457456 NA 452111 NA 5345 NA
12: sf 953 main st 2 3455 NA -218 NA 3673 NA
13: sf 953 main st 3 345345 NA 341892 NA 3453 NA
14: sf 953 main st 4 56457 NA 52990 NA 3467 NA
15: sf 953 main st 5 3634 NA 2846 NA 788 NA
The fct_inorder() function from Hadley's forcats package is used here to order the columns by their first appearance instead of alphabetical order a, b, c, x, y, z.
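To make the difference in column order concrete, here is a minimal sketch. The vector v3 is a made-up stand-in for the values of column V3 after the "month" rows have been dropped; it is not taken from the OP's file.

```r
library(forcats)

# Hypothetical stand-in for column V3 (assumption, for illustration only)
v3 <- c("x", "y", "z", "a", "b", "c", "x", "b", "z")

# factor() sorts the levels alphabetically
alphabetical <- levels(factor(v3))

# fct_inorder() keeps the levels in order of first appearance
first_seen <- levels(fct_inorder(v3))

print(alphabetical)
print(first_seen)
```

Because dcast() creates one column per factor level, in level order, the second variant is what puts x, y, z before a, b, c in the result.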
Note that the cities have also been ordered alphabetically. If this is crucial (but I doubt it is), the original order can be preserved as well by using
forcats::fct_inorder(V1) + V2 + rleid(variable) ~ forcats::fct_inorder(V3)
as the dcast() formula.
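For readers who find the chained call hard to follow, the same idea can be unpacked into separate steps. This is only a sketch on a reduced toy data set (two blocks, two monthly columns), which is an assumption made for brevity. The names toy, long, and wide are illustrative, and the columns come out in alphabetical order because fct_inorder() is left out here.

```r
library(data.table)

# Reduced stand-in for df3 (assumption: two blocks, months 1 and 2 only)
toy <- data.table(
  V1 = c("nyc", "nyc", "nyc", "la", "la", "la"),
  V2 = c("123 main st", "123 main st", "123 main st",
         "63 main st", "63 main st", "63 main st"),
  V3 = c("month", "x", "y", "month", "a", "b"),
  V4 = c(1, 58568, 5345, 1, 87035, 345),
  V5 = c(2, 567567, 3673, 2, 7467456, 456)
)

# Step 1: wide to long; V4 and V5 become rows, labelled in column "variable"
long <- melt(toy, id.vars = c("V1", "V2", "V3"))

# Step 2: drop the header rows that only repeat the month numbers
long <- long[V3 != "month"]

# Step 3: the melted rows are grouped by monthly column, so the run id
# of "variable" is the month number (V4 -> 1, V5 -> 2)
long[, month := rleid(variable)]

# Step 4: long to wide again, one column per value of V3
wide <- dcast(long, V1 + V2 + month ~ V3, value.var = "value")
setnames(wide, c("V1", "V2"), c("city", "address"))

print(wide)
```

Variables that do not occur for a city (for example x and y for la) are filled with NA by dcast(), which matches the pattern in the full result above.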
Data
Unfortunately, the OP didn't supply the result of dput(df3), which made it unnecessarily difficult to reproduce the data set as printed in the question:
df3 <- readr::read_table(
" V1 V2 V3 V4 V5 V6 V7 V8
1 nyc 123 main st month 1 2 3 4 5
2 nyc 123 main st x 58568 567567 567909 35876 56943
3 nyc 123 main st y 5345 3673 3453 3467 788
4 nyc 123 main st z 53223 563894 564456 32409 56155
5
6 la 63 main st month 1 2 3 4 5
7 la 63 main st a 87035 7467456 3363 863 43673
8 la 63 main st b 345 456 345 678 345
9 la 63 main st c 86690 7467000 3018 185 43328
10
11 sf 953 main st month 1 2 3 4 5
12 sf 953 main st x 457456 3455 345345 56457 3634
13 sf 953 main st b 5345 3673 3453 3467 788
14 sf 953 main st z 452111 -218 341892 52990 2846"
)
library(data.table)
setDT(df3)[, V2 := paste(X3, V2)][, c("X1", "X3") := NULL]
setDF(df3)[]
V1 V2 V3 V4 V5 V6 V7 V8
1 nyc 123 main st month 1 2 3 4 5
2 nyc 123 main st x 58568 567567 567909 35876 56943
3 nyc 123 main st y 5345 3673 3453 3467 788
4 nyc 123 main st z 53223 563894 564456 32409 56155
5 NA NA NA NA NA NA
6 la 63 main st month 1 2 3 4 5
7 la 63 main st a 87035 7467456 3363 863 43673
8 la 63 main st b 345 456 345 678 345
9 la 63 main st c 86690 7467000 3018 185 43328
10 NA NA NA NA NA NA
11 sf 953 main st month 1 2 3 4 5
12 sf 953 main st x 457456 3455 345345 56457 3634
13 sf 953 main st b 5345 3673 3453 3467 788
14 sf 953 main st z 452111 -218 341892 52990 2846
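As a side note on why dput() output is preferred in questions: it prints R code that rebuilds the object exactly, including column types, so nobody has to parse a printed table. A minimal sketch with a made-up two-row data frame (small is a hypothetical example, not the OP's data):

```r
# Hypothetical example data frame (assumption, for illustration only)
small <- data.frame(
  city  = c("nyc", "la"),
  value = c(1L, 2L),
  stringsAsFactors = FALSE
)

# dput() writes an R expression that recreates the object
dput(small)

# Round trip: capture that expression as text and evaluate it again
code    <- paste(capture.output(dput(small)), collapse = "")
rebuilt <- eval(parse(text = code))

print(identical(rebuilt, small))
```

Pasting the dput() output into a question lets answerers recreate the data with a single assignment.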
Solution 2:
It would help first if your data frame had proper column names; please set them once you have read in the data.
I have used the dplyr, tidyr, and stringr packages for this analysis (gather() and spread() come from tidyr) and also renamed the first 3 columns:
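library(dplyr)   # %>%, mutate(), select()
library(tidyr)   # gather(), spread()
library(stringr) # str_sub()
df <- data.frame(stringsAsFactors=FALSE,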
city = c("nyc", "nyc", "nyc"),
address = c("123 main st", "123 main st", "123 main st"),
month = c("x", "y", "z"),
X1 = c(58568L, 5345L, 53223L),
X2 = c(567567L, 3673L, 563894L),
X3 = c(567909L, 3453L, 564456L),
X4 = c(35876L, 3467L, 32409L),
X5 = c(56943L, 788L, 56155L)
)
df %>% gather(Type, Value, -c(city:month)) %>%
spread(month, Value) %>%
mutate(month = str_sub(Type, 2, 2)) %>%
select(-Type) %>%
select(c(city, address, month, x:z))
city address month x y z
1 nyc 123 main st 1 58568 5345 53223
2 nyc 123 main st 2 567567 3673 563894
3 nyc 123 main st 3 567909 3453 564456
4 nyc 123 main st 4 35876 3467 32409
5 nyc 123 main st 5 56943 788 56155
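The tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(). The following is a sketch of how the same reshaping might look with the newer functions. It is not part of the original answer; renaming the month column to type and the name res are choices made here for readability.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  city    = c("nyc", "nyc", "nyc"),
  address = c("123 main st", "123 main st", "123 main st"),
  month   = c("x", "y", "z"),
  X1 = c(58568L, 5345L, 53223L),
  X2 = c(567567L, 3673L, 563894L),
  X3 = c(567909L, 3453L, 564456L),
  X4 = c(35876L, 3467L, 32409L),
  X5 = c(56943L, 788L, 56155L),
  stringsAsFactors = FALSE
)

res <- df %>%
  # the "month" column actually holds the variable names x, y, z
  rename(type = month) %>%
  # X1..X5 become rows; stripping the "X" prefix leaves the month number
  pivot_longer(X1:X5, names_to = "month", names_prefix = "X") %>%
  mutate(month = as.integer(month)) %>%
  # one column per variable name
  pivot_wider(names_from = type, values_from = value)

print(res)
```

Unlike str_sub(Type, 2, 2), stripping the prefix would also work for column names with two-digit months such as X10.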