
'column' Object Is Not Callable With Regex And Pyspark

I need to extract only the integers from the URL strings in the column 'Page URL' and put the extracted integers in a new column. I am using PySpark. My code below: from pyspark.s

Solution 1:

You may use

from pyspark.sql.functions import regexp_extract
spark_df_url.withColumn("new_column", regexp_extract("Page URL", r"\d+", 0))

Pass the name of the string column as the first argument to regexp_extract, and set the third argument to 0: your pattern has no capturing groups, so group 0 (the whole match) is the value you want.

Note that when you specified 1 as the third argument, you got empty results because the pattern \d+ has no group 1. As the regexp_extract documentation says:

If the regex did not match, or the specified group did not match, an empty string is returned.
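To make the group index concrete, here is a minimal sketch in plain Python using the standard re module rather than Spark itself. The helper extract_group and the example URLs are hypothetical, made up for illustration only. It imitates the behaviour quoted above (empty string when there is no match or no such group), on the assumption that an index beyond the pattern's groups also gives an empty string, as the empty results described in this answer suggest. Spark uses Java regular expressions, so treat this as an illustration of group numbering and not as a stand-in for regexp_extract.

```python
import re


def extract_group(text, pattern, idx):
    """Hypothetical helper: imitate the quoted regexp_extract behaviour with Python's re."""
    match = re.search(pattern, text)
    # No match at all, or the pattern has no group with this index: empty string.
    if match is None or idx > match.re.groups:
        return ""
    # A group that exists but did not participate in the match yields None.
    return match.group(idx) or ""


url = "https://example.com/item/12345/view"  # made-up URL for illustration

whole_match = extract_group(url, r"\d+", 0)      # group 0 is the whole match
missing_group = extract_group(url, r"\d+", 1)    # the pattern has no group 1
explicit_group = extract_group(url, r"(\d+)", 1)  # parentheses create group 1
no_digits = extract_group("https://example.com/about", r"\d+", 0)

print(repr(whole_match), repr(missing_group), repr(explicit_group), repr(no_digits))
```

So either keep the pattern \d+ and ask for group 0, or wrap it in parentheses as (\d+) and ask for group 1. Both give the same digits.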
