
Spark: How To Transpose And Explode Columns With Nested Arrays

I applied an algorithm from the question below (in NOTE) to transpose and explode a nested Spark dataframe. When I define cols = ['a', 'b'] I get an empty dataframe, but when I define co…

Solution 1:

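stack() might be a better option than the transpose approach for the first step. stack(n, 'name1', col1, ..., 'nameN', colN) unpivots the n listed columns, turning each input row into n rows of (name, value) pairs, so all the nested arrays end up in a single column that can then be exploded. Note that stack() requires the stacked columns to have the same data type, so a and b must be arrays of structs with an identical schema.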
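To make the unpivot shape concrete without a Spark cluster, here is a minimal plain-Python model of what selectExpr("id", "stack(2,'a',a,'b',b)") produces for one row. This is only an illustration of the semantics, not Spark code: the function name stack_row is hypothetical, and rows are assumed to be dicts whose array-of-struct columns are lists of dicts (None standing in for a Spark null).

```python
from typing import Any, Dict, Iterator, List, Tuple


def stack_row(row: Dict[str, Any], cols: List[str]) -> Iterator[Tuple[Any, str, Any]]:
    """Hypothetical plain-Python model of stack() over the columns in cols.

    Each input row yields len(cols) output rows of (id, column name, value),
    mirroring the (col0, col1) outputs that Spark's stack() generates.
    Null (None) values are passed through unchanged, as stack() does.
    """
    for c in cols:
        yield (row["id"], c, row[c])


# One input row: 'a' holds an array of structs, 'b' is null
row = {"id": 1, "a": [{"x": 1}], "b": None}
print(list(stack_row(row, ["a", "b"])))
```

Because a null column still produces a row, the (id, 'b', None) pair above is why the answer filters out null arrays after stacking.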


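# Columns holding arrays of structs (same schema) to unpivot, as in the question
cols = ['a', 'b']

# Build a stack() expression that emits one ('name', value) row per column
expr = f"stack({len(cols)}," + \
    ",".join([f"'{c}',{c}" for c in cols]) + \
    ")"
# For cols = ['a', 'b']: expr == "stack(2,'a',a,'b',b)"

# stack() names its outputs col0, col1; rename them and drop null arrays
transpose_df = df.selectExpr("id", expr) \
    .withColumnRenamed("col0", "cols") \
    .withColumnRenamed("col1", "arrays") \
    .filter("arrays is not null")

# inline() explodes an array of structs: one row per element, one column per field
explode_df = transpose_df.selectExpr('id', 'cols', 'inline(arrays)')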
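The full stack -> filter -> inline pipeline can be sketched in plain Python to show what explode_df should contain for some sample input. This is a hedged model of the semantics only, not the Spark implementation: the function name transpose_and_explode and the sample data are hypothetical, and each listed column is assumed to be a list of dicts with the same keys (Spark array<struct>) or None (Spark null).

```python
from typing import Any, Dict, List


def transpose_and_explode(rows: List[Dict[str, Any]], cols: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical plain-Python model of the stack -> filter -> inline steps."""
    out: List[Dict[str, Any]] = []
    for row in rows:
        # stack(): one (cols, arrays) pair per listed column
        for name in cols:
            arrays = row[name]
            # filter("arrays is not null"): skip null arrays
            if arrays is None:
                continue
            # inline(arrays): one output row per struct; its fields become columns.
            # An empty array yields no rows.
            for struct in arrays:
                out.append({"id": row["id"], "cols": name, **struct})
    return out


rows = [
    {"id": 1, "a": [{"x": 1, "y": "p"}, {"x": 2, "y": "q"}], "b": None},
    {"id": 2, "a": [], "b": [{"x": 3, "y": "r"}]},
]
for r in transpose_and_explode(rows, ["a", "b"]):
    print(r)
```

With this sample data, the null b in row 1 and the empty a in row 2 both contribute no output rows, leaving three exploded rows.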
