Spark: How To Transpose And Explode Columns With Nested Arrays
I applied an algorithm from the question below (in NOTE) to transpose and explode a nested Spark DataFrame. When I define cols = ['a', 'b'] I get an empty DataFrame, but when I define co
Solution 1:
For the first step, stack might be a better option than a transpose: it turns each column listed in cols into its own (name, value) row.
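# Build the stack() expression: one ('name', column) pair per entry in cols.
expr = f"stack({len(cols)}," + \
    ",".join([f"'{c}',{c}" for c in cols]) + \
    ")"
# For cols = ['a', 'b'] this gives: expr = "stack(2,'a',a,'b',b)"
# stack() names its output columns col0 and col1; rename them and drop null arrays.
transpose_df = df.selectExpr("id", expr) \
    .withColumnRenamed("col0", "cols") \
    .withColumnRenamed("col1", "arrays") \
    .filter("arrays is not null")
# inline() explodes each array of structs into one row per struct.
explode_df = transpose_df.selectExpr('id', 'cols', 'inline(arrays)')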
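To make the two steps easier to follow, here is a minimal plain-Python sketch of what they do to the data. It does not call Spark at all: build_stack_expr, stack_rows, and inline_rows are hypothetical helper names that only imitate the stack, filter, and inline behaviour, and the sample rows (with struct fields x and y) are made up for illustration, since the original DataFrame schema is not shown.

```python
# Plain-Python illustration only; no Spark calls are made here.

def build_stack_expr(cols):
    """Build the same SQL stack() expression string as the answer does."""
    pairs = ",".join(f"'{c}',{c}" for c in cols)
    return f"stack({len(cols)},{pairs})"


def stack_rows(rows, cols):
    """Imitate stack() plus the null filter: one row per (column name, array) pair."""
    out = []
    for row in rows:
        for c in cols:
            if row.get(c) is not None:  # mirrors the "is not null" filter
                out.append({"id": row["id"], "cols": c, "arrays": row[c]})
    return out


def inline_rows(rows):
    """Imitate inline(): one row per struct in the array, struct fields become columns."""
    out = []
    for row in rows:
        for struct in row["arrays"]:
            out.append({"id": row["id"], "cols": row["cols"], **struct})
    return out


# Hypothetical sample data (assumption): column b is null for id 1.
sample = [
    {"id": 1, "a": [{"x": 10, "y": 11}, {"x": 12, "y": 13}], "b": None},
    {"id": 2, "a": [{"x": 20, "y": 21}], "b": [{"x": 30, "y": 31}]},
]

stack_expr = build_stack_expr(["a", "b"])
exploded = inline_rows(stack_rows(sample, ["a", "b"]))

print(stack_expr)
for r in exploded:
    print(r)
```

With this sample, id 1 contributes two rows from column a and none from the null column b, while id 2 contributes one row from each column. In real Spark, stack also requires the stacked columns to have compatible types, which this sketch does not check.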