
Spark - No Schema Defined, And No Parquet Data File Or Summary File Found Under

First I started $SPARK_HOME/bin/pyspark, and then I ran this code:

sqlContext.load('jdbc', url='jdbc:mysql://IP:3306/test', driver='com.mysql.jdbc.Driver', dbtable='test.test_tb')

Solution 1:

I don't know the cause of this error, but I stumbled upon it too, and then found a way to make the same thing work.

Try this:

df = sqlContext.read.format("jdbc").options(url="jdbc:mysql://server/table?user=usr&password=secret", dbtable="table_name").load()

I suppose the sqlContext.load syntax no longer works, or at least does not work for JDBC sources. Hope this helps!
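If you would rather keep the credentials out of the URL, PySpark (1.4 and later) also exposes a read.jdbc helper that takes them as a properties dict. This is a minimal sketch, assuming the same MySQL host and test.test_tb table from the question; the host, user, and password values are placeholders, and depending on your Spark version the "driver" entry may not be needed:

df = sqlContext.read.jdbc(
    url="jdbc:mysql://IP:3306/test",      # placeholder host, as in the question
    table="test.test_tb",
    properties={"user": "usr",            # placeholder credentials
                "password": "secret",
                "driver": "com.mysql.jdbc.Driver"})
df.printSchema()  # verify the schema was inferred from the table
df.show(5)        # fetch a few rows to confirm the connection works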

By the way, I started the console with this command:

SPARK_CLASSPATH=~/progs/postgresql-9.4-1205.jdbc42.jar pyspark

My database is PostgreSQL, so I downloaded the PostgreSQL JDBC driver jar and added it to my classpath, as suggested in the documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
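Note that SPARK_CLASSPATH is deprecated in newer Spark releases; passing the jar via --jars (and, for the driver process, --driver-class-path) is the supported route. A minimal sketch, assuming the same driver jar path as above; depending on your Spark version you may need one or both flags:

pyspark --jars ~/progs/postgresql-9.4-1205.jdbc42.jar --driver-class-path ~/progs/postgresql-9.4-1205.jdbc42.jar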
