
PySpark: KeyError When Converting a DataFrame Column of String Type to Double

I'm trying to learn machine learning with PySpark. I have a dataset with a couple of string columns whose values are either True or False, or Yes or No. I'm working with

Solution 1:

I solved it by changing the mapping part to:

from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import DoubleType
binary_map = {'Yes': 1.0, 'No': 0.0, True: 1.0, False: 0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

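I just removed the quotes from True and False. That seemed weird at first, but when I checked the schema of the DataFrame with df.printSchema(), it showed that the field holding the True and False values is of type boolean. A boolean column passes Python booleans to the UDF, not the strings 'True' and 'False', so the quoted keys never matched.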
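The mismatch can be reproduced in plain Python, without Spark. This is a minimal sketch: `quoted_map` is an assumed reconstruction of the mapping before the fix (the same dict with True and False in quotes), and `lookup` is a hypothetical helper used only to show which lookups fail.

```python
# Assumed reconstruction of the mapping before the fix: booleans quoted as strings.
quoted_map = {'Yes': 1.0, 'No': 0.0, 'True': 1.0, 'False': 0.0}

# Mapping from the fix: real Python booleans as keys.
binary_map = {'Yes': 1.0, 'No': 0.0, True: 1.0, False: 0.0}


def lookup(mapping, value):
    """Hypothetical helper: return the mapped value, or 'KeyError' if the key is missing."""
    try:
        return mapping[value]
    except KeyError:
        return 'KeyError'


# A boolean column hands the lambda a Python bool, not the string 'True'.
churn_value = True

print(lookup(quoted_map, churn_value))   # KeyError
print(lookup(binary_map, churn_value))   # 1.0
print(lookup(binary_map, 'Yes'))         # 1.0
```

The string columns (Yes/No) worked with either mapping; only the boolean column needed the unquoted keys.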

The Schema

root
 |-- State: string (nullable = true)
 |-- Account length: integer (nullable = true)
 |-- Area code: integer (nullable = true)
 |-- International plan: string (nullable = true)
 |-- Voice mail plan: string (nullable = true)
  .
  .
  .
 |-- Customer service calls: integer (nullable = true)
 |-- Churn: boolean (nullable = true)

So that's why I had to take the quotes off. Thank you.
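One related caveat: every column in the schema is marked nullable = true, and a null reaches a Python UDF as None, which would raise the same KeyError with `binary_map[k]`. The sketch below is one possible way to guard against that, not part of the original fix; `to_num` is a hypothetical name, and it assumes that unmapped values should become null rather than fail.

```python
binary_map = {'Yes': 1.0, 'No': 0.0, True: 1.0, False: 0.0}


def to_num(value):
    """Hypothetical null-safe lookup: unmapped values (including None) give None."""
    # dict.get returns None instead of raising KeyError when the key is absent.
    return binary_map.get(value)


print(to_num('No'))    # 0.0
print(to_num(True))    # 1.0
print(to_num(None))    # None
```

Under these assumptions, `to_num` could be passed to UserDefinedFunction in place of the lambda, since a Python UDF that returns None produces a null in the result column.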
