
Cannot Resolve Column Due To Data Type Mismatch Pyspark

PySpark raises the following error: pyspark.sql.utils.AnalysisException: 'cannot resolve '`result_set`.`dates`.`trackers`['token']' due to data type mismatch: argument 2 requires integra

Solution 1:

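Because dates and trackers are arrays of structs, you need to say which element of each array to access by index (0, 1, 2, and so on). In the failing expression, trackers['token'] is resolved as an array lookup, and an array position must be an integer, not a string, which matches the data type mismatch in the error message.

  • To select all elements of an array, use explode() instead of an index.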
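As a rough mental model only, the three access patterns can be mimicked with plain Python lists and dicts. This sketch does not use Spark; the row layout follows the schema in this post, and the extra sample values (a second tracker and a second date) are hypothetical.

```python
# Plain-Python stand-in for one DataFrame row (not Spark; extra values are hypothetical).
row = {
    "result_set": {
        "currency": "EUR",
        "dates": [
            {"date": "2020-03-11", "trackers": [{"token": "12345"}, {"token": "67890"}]},
            {"date": "2020-03-12", "trackers": [{"token": "abcde"}]},
        ],
    }
}
dates = row["result_set"]["dates"]

# Like result_set.dates.date: a field taken from an array of structs
# comes back as an array with one value per element.
all_dates = [d["date"] for d in dates]

# Like result_set.dates[0].trackers[0].token: indexing every array level
# yields a single scalar value.
first_token = dates[0]["trackers"][0]["token"]

# Like exploding dates and then trackers: one output row per (date, tracker) pair.
exploded = [
    (row["result_set"]["currency"], d["date"], t["token"])
    for d in dates
    for t in d["trackers"]
]

# Like result_set.dates.trackers['token']: dates.trackers is a list of lists,
# and a list can only be indexed by an integer position, not by a field name.
trackers_per_date = [d["trackers"] for d in dates]
try:
    trackers_per_date["token"]
except TypeError as exc:
    mismatch = str(exc)
```

Here all_dates is a list, first_token is a single string, and exploded holds one tuple per tracker, which mirrors the array, scalar, and one-row-per-element results of the Spark queries in the example.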

Example:

df.printSchema()
#root
# |-- result_set: struct (nullable = true)
# |    |-- currency: string (nullable = true)
# |    |-- dates: array (nullable = true)
# |    |    |-- element: struct (containsNull = true)
# |    |    |    |-- date: string (nullable = true)
# |    |    |    |-- trackers: array (nullable = true)
# |    |    |    |    |-- element: struct (containsNull = true)
# |    |    |    |    |    |-- countries: array (nullable = true)
# |    |    |    |    |    |    |-- element: struct (containsNull = true)
# |    |    |    |    |    |    |    |-- country: string (nullable = true)
# |    |    |    |    |    |    |    |-- os_names: array (nullable = true)
# |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
# |    |    |    |    |    |    |    |    |    |-- kpi_values: array (nullable = true)
# |    |    |    |    |    |    |    |    |    |    |-- element: long (containsNull = true)
# |    |    |    |    |    |    |    |    |    |-- os_name: string (nullable = true)
# |    |    |    |    |    |-- token: string (nullable = true)
# |    |-- name: string (nullable = true)
# |    |-- token: string (nullable = true)

# Accessing token and date without indexing every array level: the values come back as arrays
df.selectExpr('result_set.dates.trackers[0].token','result_set.currency', 'result_set.dates.date').show()
#+--------------------------------------------------+--------+------------+
#|result_set.dates.trackers AS trackers#194[0].token|currency|        date|
#+--------------------------------------------------+--------+------------+
#|                                           [12345]|     EUR|[2020-03-11]|
#+--------------------------------------------------+--------+------------+

# Index the first element of both the dates and trackers arrays to extract scalar date and token values
df.selectExpr('result_set.dates[0].trackers[0].token as token','result_set.currency', 'result_set.dates[0].date as date').show()
#+-----+--------+----------+
#|token|currency|      date|
#+-----+--------+----------+
#|12345|     EUR|2020-03-11|
#+-----+--------+----------+

# To select all elements of the arrays, explode each array level and then select the fields
df.selectExpr('result_set.currency', 'explode(result_set.dates)').\
select("*", "col.*").\
selectExpr("currency", "date", "explode(trackers)").\
select("currency", "date", "col.*").\
select("currency", "date", "token").\
show()

#+--------+----------+-----+
#|currency|      date|token|
#+--------+----------+-----+
#|     EUR|2020-03-11|12345|
#+--------+----------+-----+
