
Wide To Long Data Table Transformation With Variables In Columns And Rows

I have a csv with multiple tables with variables stored in both rows and columns. About this csv: I'd want to go from 'wide' to 'long'. There are multiple 'data frames' in one csv. There...

Solution 1:

The sample data set provided by the OP suggests that all data frames within the csv file

  1. have the same structure, i.e., the same number, names, and positions of columns
  2. and the monthly columns V4 to V8 refer to the same months 1 to 5 for all "sub frames".
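
The second assumption can be checked before reshaping. The following is a minimal sketch on a small, made-up stand-in for df3 (the object names demo, headers, and same_months, as well as the reduced set of columns, are assumptions for illustration, not part of the OP's data). It compares the "month" header rows of all sub frames:

```r
# Hypothetical stand-in for df3: two sub frames separated by an empty row
demo <- data.frame(
  V1 = c("nyc", "nyc", NA, "la", "la"),
  V3 = c("month", "x", NA, "month", "a"),
  V4 = c(1, 58568, NA, 1, 87035),
  V5 = c(2, 567567, NA, 2, 7467456)
)

# Keep only the header rows that carry the month numbers;
# which() skips the NA separator rows
headers <- demo[which(demo$V3 == "month"), c("V4", "V5")]

# All sub frames refer to the same months if only one distinct header row remains
same_months <- nrow(unique(headers)) == 1
same_months
```

If same_months were FALSE, the column position would no longer identify the month, and each sub frame would need to be reshaped separately.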

If this is true then we can treat the whole csv file as one data frame and convert it to the desired format by reshaping using melt() and dcast() from the data.table package:

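library(data.table)
# melt() stacks the month columns V4..V8; na.rm = TRUE drops the empty separator rows
setDT(df3)[, melt(.SD, id.vars = paste0("V", 1:3), na.rm = TRUE)][
  # drop the "month" header rows, then spread V3 into columns; rleid(variable) numbers the month columns 1 to 5
  V3 != "month", dcast(.SD, V1 + V2 + rleid(variable) ~ forcats::fct_inorder(V3))][
    # rename the three key columns
    , setnames(.SD, 1:3, c("city", "address", "month"))]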
    city     address month      x    y      z       a    b       c
 1:   la  63 main st     1     NA   NA     NA   87035  345   86690
 2:   la  63 main st     2     NA   NA     NA 7467456  456 7467000
 3:   la  63 main st     3     NA   NA     NA    3363  345    3018
 4:   la  63 main st     4     NA   NA     NA     863  678     185
 5:   la  63 main st     5     NA   NA     NA   43673  345   43328
 6:  nyc 123 main st     1  58568 5345  53223      NA   NA      NA
 7:  nyc 123 main st     2 567567 3673 563894      NA   NA      NA
 8:  nyc 123 main st     3 567909 3453 564456      NA   NA      NA
 9:  nyc 123 main st     4  35876 3467  32409      NA   NA      NA
10:  nyc 123 main st     5  56943  788  56155      NA   NA      NA
11:   sf 953 main st     1 457456   NA 452111      NA 5345      NA
12:   sf 953 main st     2   3455   NA   -218      NA 3673      NA
13:   sf 953 main st     3 345345   NA 341892      NA 3453      NA
14:   sf 953 main st     4  56457   NA  52990      NA 3467      NA
15:   sf 953 main st     5   3634   NA   2846      NA  788      NA

The fct_inorder() function from Hadley's forcats package is used here to order the columns by their first appearance instead of alphabetical order a, b, c, x, y, z.
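
The difference between the two orderings can be seen on the variable labels alone. This is a small sketch in base R, assuming the labels appear in V3 in the order shown in the sample data; the object names labels, alphabetical, and first_seen are made up for illustration. For a character vector, factor(x, levels = unique(x)) gives the same level order as forcats::fct_inorder(x):

```r
# Variable labels in the order they appear in column V3 of the sample data
labels <- c("x", "y", "z", "a", "b", "c", "x", "b", "z")

# Default factor levels are sorted alphabetically
alphabetical <- levels(factor(labels))

# Levels in order of first appearance, the ordering fct_inorder() produces
first_seen <- levels(factor(labels, levels = unique(labels)))

alphabetical
first_seen
```

dcast() creates the new columns in the order of the factor levels, which is why wrapping V3 this way keeps x, y, z ahead of a, b, c.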

Note that the cities have also been ordered alphabetically. If this is crucial (but I doubt it is), the original order can be preserved as well by using

forcats::fct_inorder(V1) + V2 + rleid(variable) ~ forcats::fct_inorder(V3)

as dcast() formula.
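
To see what each link of the chain contributes, the call can be unpacked into separate steps. The following is a minimal sketch on a reduced, made-up stand-in for df3 (the object names demo, long, and wide and all values are assumptions for illustration). It uses plain V3 in the formula, so the new columns come out in alphabetical order:

```r
library(data.table)

# Hypothetical stand-in for df3: two sub frames, two month columns
demo <- data.table(
  V1 = c("nyc", "nyc", "nyc", "la", "la"),
  V2 = c("123 main st", "123 main st", "123 main st", "63 main st", "63 main st"),
  V3 = c("month", "x", "y", "month", "a"),
  V4 = c(1, 10, 20, 1, 30),
  V5 = c(2, 11, 21, 2, 31)
)

# Step 1: stack V4 and V5 into the columns `variable` and `value`
long <- melt(demo, id.vars = c("V1", "V2", "V3"), na.rm = TRUE)

# Step 2: drop the header rows and number the former columns 1, 2, ...
long <- long[V3 != "month"]
long[, month := rleid(variable)]

# Step 3: spread the labels in V3 into one column each
wide <- dcast(long, V1 + V2 + month ~ V3, value.var = "value")
setnames(wide, c("V1", "V2"), c("city", "address"))
wide
```

Keeping the intermediate long table around makes it easier to inspect which rows end up as NA after dcast(): a city simply has no rows for labels it never used.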

Data

Unfortunately, the OP didn't supply the result of dput(df3), which made it unnecessarily difficult to reproduce the data set as printed in the question:

df3 <- readr::read_table(
  "     V1          V2    V3     V4      V5     V6      V7    V8
  1   nyc 123 main st month      1       2      3       4     5
  2   nyc 123 main st     x  58568  567567 567909   35876 56943
  3   nyc 123 main st     y   5345    3673   3453    3467   788
  4   nyc 123 main st     z  53223  563894 564456   32409 56155
  5                                                            
  6    la  63 main st month      1       2      3       4     5
  7    la  63 main st     a  87035 7467456   3363     863 43673
  8    la  63 main st     b    345     456    345     678   345
  9    la  63 main st     c  86690 7467000   3018     185 43328
  10                                                           
  11   sf 953 main st month      1       2      3       4     5
  12   sf 953 main st     x 457456    3455 345345   56457  3634
  13   sf 953 main st     b   5345    3673   3453    3467   788
  14   sf 953 main st     z 452111    -218 341892   52990  2846"
)
library(data.table)
setDT(df3)[, V2 := paste(X3, V2)][, c("X1", "X3") := NULL]
setDF(df3)[]
    V1          V2    V3     V4      V5     V6    V7    V8
1  nyc 123 main st month      1       2      3     4     5
2  nyc 123 main st     x  58568  567567 567909 35876 56943
3  nyc 123 main st     y   5345    3673   3453  3467   788
4  nyc 123 main st     z  53223  563894 564456 32409 56155
5              NA            NA      NA     NA    NA    NA
6   la  63 main st month      1       2      3     4     5
7   la  63 main st     a  87035 7467456   3363   863 43673
8   la  63 main st     b    345     456    345   678   345
9   la  63 main st     c  86690 7467000   3018   185 43328
10             NA            NA      NA     NA    NA    NA
11  sf 953 main st month      1       2      3     4     5
12  sf 953 main st     x 457456    3455 345345 56457  3634
13  sf 953 main st     b   5345    3673   3453  3467   788
14  sf 953 main st     z 452111    -218 341892 52990  2846

Solution 2:

It would help if your data frame had proper column names; please assign them right after reading in the data.

I used the dplyr, tidyr, and stringr packages for this analysis (gather() and spread() come from tidyr) and renamed the first three columns:

library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(stringsAsFactors=FALSE,
        city = c("nyc", "nyc", "nyc"),
     address = c("123 main st", "123 main st", "123 main st"),
       month = c("x", "y", "z"),
          X1 = c(58568L, 5345L, 53223L),
          X2 = c(567567L, 3673L, 563894L),
          X3 = c(567909L, 3453L, 564456L),
          X4 = c(35876L, 3467L, 32409L),
          X5 = c(56943L, 788L, 56155L)
)

# stack X1..X5 into Type/Value, spread the x/y/z labels into columns,
# then take the month number from the digit after the "X" prefix
df %>% gather(Type, Value, -c(city:month)) %>%
        spread(month, Value) %>%
        mutate(month = str_sub(Type, 2, 2)) %>%
        select(-Type) %>%
        select(c(city, address, month, x:z))

  city     address month      x    y      z
1  nyc 123 main st     1  58568 5345  53223
2  nyc 123 main st     2 567567 3673 563894
3  nyc 123 main st     3 567909 3453 564456
4  nyc 123 main st     4  35876 3467  32409
5  nyc 123 main st     5  56943  788  56155
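
In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshaping might look as follows; this is a sketch on a two-month subset of the data above, and the intermediate column name variable and the object name result are assumptions chosen for illustration:

```r
library(dplyr)
library(tidyr)

# Two-month subset of the example data frame
df <- data.frame(
  city = "nyc",
  address = "123 main st",
  month = c("x", "y", "z"),
  X1 = c(58568L, 5345L, 53223L),
  X2 = c(567567L, 3673L, 563894L)
)

result <- df %>%
  # the original "month" column actually holds the variable labels x, y, z
  rename(variable = month) %>%
  # stack X1, X2 into rows; names_prefix strips the "X" so only the month number remains
  pivot_longer(c(X1, X2), names_to = "month", names_prefix = "X") %>%
  # one column per variable label
  pivot_wider(names_from = variable, values_from = value)

result
```

Stripping the prefix with names_prefix avoids the fixed-width str_sub() call, so it would also cope with month numbers of more than one digit. The month column is still character here and can be converted with as.integer() if needed.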
