How To Concatenate To A Null Column In Pyspark Dataframe
I have the dataframe below, and I want to update the rows dynamically with some values:
input_frame.show()
+----------+----------+---------+
|student_id|name      |timestamp|
+------
Solution 1:
Use concat_ws, like this:
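from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([["1", "2"], ["2", None], ["3", "4"], ["4", "5"], [None, "6"]]).toDF("a", "b")
# This won't work: concat() returns null if any input is null
df = df.withColumn("concat", concat(df.a, df.b))
# Neither will this: casting a null to string still gives null
df = df.withColumn("concat + cast", concat(df.a.cast('string'), df.b.cast('string')))
# Do it like this: concat_ws() skips null inputs
df = df.withColumn("concat_ws", concat_ws("", df.a, df.b))
df.show()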
gives:
+----+----+------+-------------+---------+
|   a|   b|concat|concat + cast|concat_ws|
+----+----+------+-------------+---------+
|   1|   2|    12|           12|       12|
|   2|null|  null|         null|        2|
|   3|   4|    34|           34|       34|
|   4|   5|    45|           45|       45|
|null|   6|  null|         null|        6|
+----+----+------+-------------+---------+
Note specifically that casting a NULL column to string doesn't help: the cast value is still NULL, so concat returns NULL for that row whenever any of its input columns is null.
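To make the null rules concrete, here is a plain-Python model of the two functions. It is not PySpark itself, and the helper names model_concat and model_concat_ws are hypothetical. It assumes the behavior shown in the table above: concat returns null if any input is null, while concat_ws skips null inputs and joins the rest.

```python
from typing import Optional


def model_concat(*values: Optional[str]) -> Optional[str]:
    """Hypothetical model of concat(): any None input makes the result None."""
    if any(v is None for v in values):
        return None
    return "".join(values)


def model_concat_ws(sep: str, *values: Optional[str]) -> str:
    """Hypothetical model of concat_ws(): None inputs are skipped."""
    return sep.join(v for v in values if v is not None)


# The rows of the example DataFrame, as (a, b) pairs
rows = [("1", "2"), ("2", None), ("3", "4"), ("4", "5"), (None, "6")]
for a, b in rows:
    print(a, b, model_concat(a, b), model_concat_ws("", a, b))
```

Note that in this model concat_ws with only null inputs gives an empty string rather than None.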
There's no neat way to handle more complicated scenarios, but you can use a when expression inside a concat if you're willing to put up with the verbosity, like this:
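from pyspark.sql.functions import concat, lit, when
# Replace each null with the placeholder '_' before concatenating
df.withColumn("concat_custom", concat(
    when(df.a.isNull(), lit('_')).otherwise(df.a),
    when(df.b.isNull(), lit('_')).otherwise(df.b))
).select("a", "b", "concat_custom").show()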
This gives, e.g.:
+----+----+-------------+
|   a|   b|concat_custom|
+----+----+-------------+
|   1|   2|           12|
|   2|null|           2_|
|   3|   4|           34|
|   4|   5|           45|
|null|   6|           _6|
+----+----+-------------+
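One reason to accept the verbosity: concat_ws drops nulls silently, so ("2", null) and (null, "2") both come out as "2", whereas a placeholder keeps each column's position visible. A plain-Python sketch of both behaviors follows; the helper names are hypothetical, and '_' is the placeholder used above.

```python
from typing import Optional


def model_concat_with_placeholder(*values: Optional[str], placeholder: str = "_") -> str:
    """Hypothetical model of concat(when(c.isNull(), lit('_')).otherwise(c), ...)."""
    return "".join(placeholder if v is None else v for v in values)


def model_concat_ws(sep: str, *values: Optional[str]) -> str:
    """Hypothetical model of concat_ws(): None inputs are skipped."""
    return sep.join(v for v in values if v is not None)


# concat_ws cannot tell which column was null...
print(model_concat_ws("", "2", None), model_concat_ws("", None, "2"))
# ...but the placeholder version can
print(model_concat_with_placeholder("2", None), model_concat_with_placeholder(None, "2"))
```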
Solution 2:
You can fill null values with empty strings:
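from pyspark.sql import SparkSession
import pyspark.sql.functions as f
from pyspark.sql.types import StringType
spark = SparkSession.builder.getOrCreate()
data = spark.createDataFrame([('s1', 't1'), ('s2', 't2')], ['col1', 'col2'])
# Add an all-null string column to reproduce the problem
data = data.withColumn('test', f.lit(None).cast(StringType()))
# na.fill('') replaces nulls in string columns with '', so concat no longer returns null
data.na.fill('').withColumn('test2', f.concat('col1', 'col2', 'test')).show()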
Is that what you were looking for?
Solution 3:
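This will resolve the issue: wrap each column in coalesce so that a null is replaced by an empty string before concatenating.
from pyspark.sql.functions import coalesce, concat, lit
df = df.withColumn("concat", concat(coalesce(df.a, lit('')), coalesce(df.b, lit(''))))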
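coalesce returns its first non-null argument, so coalesce(df.a, lit('')) means "df.a, or an empty string when df.a is null", and the outer concat never sees a null. Below is a plain-Python model of that behavior (not PySpark; the helper names are hypothetical). It also shows that coalesce accepts more than two arguments, for example falling back to a second value before the empty string.

```python
from typing import Optional


def model_coalesce(*values: Optional[str]) -> Optional[str]:
    """Hypothetical model of coalesce(): first non-None argument, or None."""
    return next((v for v in values if v is not None), None)


def model_concat(*values: Optional[str]) -> Optional[str]:
    """Hypothetical model of concat(): any None input makes the result None."""
    if any(v is None for v in values):
        return None
    return "".join(values)


rows = [("1", "2"), ("2", None), (None, "6")]
for a, b in rows:
    # Same shape as concat(coalesce(df.a, lit('')), coalesce(df.b, lit('')))
    print(model_concat(model_coalesce(a, ""), model_coalesce(b, "")))

# Several fallbacks: use the second value when the first is null, '' when both are
print(repr(model_coalesce(None, "6", "")), repr(model_coalesce(None, None, "")))
```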