Spark - No Schema Defined, And No Parquet Data File Or Summary File Found Under
First, I started $SPARK_HOME/bin/pyspark and ran this code:

sqlContext.load('jdbc', url='jdbc:mysql://IP:3306/test', driver='com.mysql.jdbc.Driver', dbtable='test.test_tb')

which failed with the error in the title: no schema defined, and no Parquet data file or summary file found.
Solution 1:
I don't know the reason for this error, but I stumbled upon it too, and then found a way to make the same thing work.
Try this:
df = sqlContext.read.format("jdbc").options(url="jdbc:mysql://server/table?user=usr&password=secret", dbtable="table_name").load()
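Applied to the MySQL source from the original question, the same read.format("jdbc") pattern would look roughly like this (a sketch: the IP, credentials, and table names are placeholders carried over from the question):

df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://IP:3306/test",
    driver="com.mysql.jdbc.Driver",  # MySQL Connector/J driver class
    dbtable="test.test_tb",
    user="usr",          # placeholder credentials -- substitute your own
    password="secret",
).load()

df.printSchema()  # confirm the schema was inferred from the table
df.show(5)        # fetch a few rows to verify the connection works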
I suppose the .load('jdbc', ...) syntax is no longer working, or does not work for JDBC sources. Hope it works!
By the way, I started the console with this command:
SPARK_CLASSPATH=~/progs/postgresql-9.4-1205.jdbc42.jar pyspark
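If SPARK_CLASSPATH is warned about or ignored (it was deprecated in later Spark releases), passing the jar through the launcher flags should achieve the same thing; a sketch using the same jar path:

pyspark --driver-class-path ~/progs/postgresql-9.4-1205.jdbc42.jar --jars ~/progs/postgresql-9.4-1205.jdbc42.jar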
My database is Postgres, so I downloaded the Postgres JDBC driver jar and added it to my classpath as suggested in the documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases