PySpark - ValueError: Could Not Convert String To Float / Invalid Literal For Float()

February 04, 2023

I am trying to use data from a Spark DataFrame as the input for my k-means model. However, I keep getting errors (see the section after the code). My Spark DataFrame looks like this:

Solution 1:

You should maybe have continued on the same thread, since it's the same problem. For reference: Preprocessing data in pyspark

Here you need to convert Latitude and Longitude to float and remove null values with dropna before feeding the data to KMeans, because these columns seem to contain some strings that cannot be cast to a numeric value. In Spark's non-ANSI mode (the default in Spark 3.x), such strings become null after the cast, so dropna then removes those rows. Preprocess df with something like:

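from pyspark.sql.functions import col

# Cast both coordinate columns to float, then drop every row that contains a null in any column
df2 = (df
       .withColumn("Latitude", col("Latitude").cast("float"))
       .withColumn("Longitude", col("Longitude").cast("float"))
       .dropna()
       )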

spark_rdd = df2.rdd ...
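
The error in the title is the one Python's built-in float() raises when it is handed a non-numeric string, which suggests the conversion was happening row by row on raw string values. As a minimal pure-Python sketch (no Spark required) of what the cast-then-dropna step above accomplishes, the snippet below converts values defensively and discards rows that fail. The helper name to_float_or_none and the sample rows are hypothetical and only serve as illustration; they are not taken from the original question.

```python
def to_float_or_none(value):
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # float("abc") and float("") raise ValueError; float(None) raises TypeError
        return None


# Hypothetical (Latitude, Longitude) rows as raw values, for illustration only
raw_rows = [
    ("41.8917", "-87.7443"),
    ("Latitude", "Longitude"),  # a stray header row
    ("41.7569", None),          # a missing value
    ("", "-87.6278"),           # an empty string
]

# Convert every field, then keep only the rows where both fields parsed
parsed = [(to_float_or_none(lat), to_float_or_none(lon)) for lat, lon in raw_rows]
clean_rows = [row for row in parsed if None not in row]

print(clean_rows)  # only the first row survives
```

Doing the cast on the DataFrame, as in the answer above, is usually preferable to this kind of per-row handling because the bad values are removed before the data ever reaches Python code.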


