How To Retrieve A Column From A PySpark Dataframe And Insert It As A New Column Within An Existing PySpark Dataframe?
The problem is: I've got a PySpark dataframe like this, df1:

+-----+
|index|
+-----+
|  121|
|  122|
|  123|
|  124|
|  125|
|  121|
|  121|
|  126
Solution 1:
Since the two dataframes share no common key, give each one a positional row_index column and join on that:
import pyspark.sql.functions as f
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([[121], [122], [123]], ["index"])
df2 = spark.createDataFrame(
    [[2.4899928731985597, -0.19775025821959014],
     [1.029654847161142, 1.4878188087911541],
     [-2.253992428312965, 0.29853121635739804]],
    ["fact1", "fact2"])

# Since there is no common column between these two dataframes,
# add a row_index to each so that they can be joined.
df1 = df1.withColumn("row_index", f.monotonically_increasing_id())
df2 = df2.withColumn("row_index", f.monotonically_increasing_id())

df2 = df2.join(df1, on=["row_index"]).sort("row_index").drop("row_index")
df2.show()
Don't forget to let us know if it solved your problem :)