Cannot Resolve Column Due To Data Type Mismatch Pyspark
Error raised in PySpark: pyspark.sql.utils.AnalysisException: 'cannot resolve '`result_set`.`dates`.`trackers`['token']' due to data type mismatch: argument 2 requires integra
Solution 1:
The column being accessed is an array of structs, so we need to specify which element of the array to read by its index, i.e. 0, 1, 2, etc., before reading a field from it.
- To select all elements of the array rather than a single one, use explode().
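The same rule can be seen in a plain-Python analogy. This is only a sketch and not how Spark evaluates the expression: a Python list, like a Spark array, accepts integer positions only, so a field name has to be applied to an element rather than to the list itself. Spark is somewhat more lenient for a single level of arrays (result_set.dates.date in the example returns an array of dates), so the analogy is approximate. The result_set dictionary below is hypothetical sample data shaped like the schema in the example, including an invented second token, and first_token, explode_tokens and token_without_index are illustrative helpers, not PySpark functions.

```python
# Hypothetical sample data shaped like the schema in the example (not real output).
result_set = {
    "currency": "EUR",
    "dates": [
        {
            "date": "2020-03-11",
            "trackers": [
                {"token": "12345", "countries": []},
                {"token": "67890", "countries": []},  # invented second tracker
            ],
        }
    ],
}


def first_token(rs):
    """Analogue of result_set.dates[0].trackers[0].token: index first, then read the field."""
    return rs["dates"][0]["trackers"][0]["token"]


def explode_tokens(rs):
    """Analogue of exploding dates, then trackers: one row per array element."""
    rows = []
    for d in rs["dates"]:        # like explode(result_set.dates)
        for t in d["trackers"]:  # like explode(trackers)
            rows.append((rs["currency"], d["date"], t["token"]))
    return rows


def token_without_index(rs):
    """Analogue of the failing access: a field name applied to the array itself."""
    trackers = rs["dates"][0]["trackers"]
    return trackers["token"]  # raises TypeError because a list needs an integer index


print(first_token(result_set))
print(explode_tokens(result_set))
```

Calling token_without_index(result_set) raises a TypeError, which mirrors the complaint in the Spark message that the index argument is of the wrong type.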
Example:
df.printSchema()
#root
# |-- result_set: struct (nullable = true)
# |    |-- currency: string (nullable = true)
# |    |-- dates: array (nullable = true)
# |    |    |-- element: struct (containsNull = true)
# |    |    |    |-- date: string (nullable = true)
# |    |    |    |-- trackers: array (nullable = true)
# |    |    |    |    |-- element: struct (containsNull = true)
# |    |    |    |    |    |-- countries: array (nullable = true)
# |    |    |    |    |    |    |-- element: struct (containsNull = true)
# |    |    |    |    |    |    |    |-- country: string (nullable = true)
# |    |    |    |    |    |    |    |-- os_names: array (nullable = true)
# |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
# |    |    |    |    |    |    |    |    |    |-- kpi_values: array (nullable = true)
# |    |    |    |    |    |    |    |    |    |    |-- element: long (containsNull = true)
# |    |    |    |    |    |    |    |    |    |-- os_name: string (nullable = true)
# |    |    |    |    |    |-- token: string (nullable = true)
# |    |-- name: string (nullable = true)
# |    |-- token: string (nullable = true)
#accessing token and date from the arrays (only the trackers array is indexed, so the results are still arrays)
df.selectExpr('result_set.dates.trackers[0].token','result_set.currency', 'result_set.dates.date').show()
#+--------------------------------------------------+--------+------------+
#|result_set.dates.trackers AS trackers#194[0].token|currency|        date|
#+--------------------------------------------------+--------+------------+
#|                                           [12345]|     EUR|[2020-03-11]|
#+--------------------------------------------------+--------+------------+
#accessing the first elements of the dates and trackers arrays and extracting the date and token values
df.selectExpr('result_set.dates[0].trackers[0].token as token','result_set.currency', 'result_set.dates[0].date as date').show()
#+-----+--------+----------+
#|token|currency|      date|
#+-----+--------+----------+
#|12345|     EUR|2020-03-11|
#+-----+--------+----------+
#if you need to select all elements of the arrays, explode each array and then select the data
#explode dates first, then trackers; each explode() yields one row per array element
#(trackers is exploded only once, otherwise rows are duplicated when a date has several trackers)
df.selectExpr('result_set.currency','explode(result_set.dates)').\
select("currency","col.*").\
selectExpr("currency","date","explode(trackers)").\
select("currency","date","col.token").\
show()
#+--------+----------+-----+
#|currency|      date|token|
#+--------+----------+-----+
#|     EUR|2020-03-11|12345|
#+--------+----------+-----+