Exception in thread "main" org.apache.spark.SparkException: Application
Category: Spark
You run a Python script on top of Hive through spark-submit, but it fails with the following error:
$ spark-submit --master yarn --deploy-mode cluster --queue ado --num-executors 60 --executor-memory 3G --executor-cores 5 --py-files argparse.py,load_iris_2.py --driver-memory 10G load_iris.py -p ado_secure.iris_places -s ado_secure.iris_places_stg -f /user/admin/iris/places/2016-11-30-place.csv
Exception in thread "main" org.apache.spark.SparkException: Application application_1476997468030_142120 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
When I checked the Spark logs, I found the following error:
16/12/22 07:35:49 WARN metadata.Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:193)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:164)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:162)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:415)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:414)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:296)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
Root Cause:
This can be caused by a known Ambari bug (BUG-56393) and by the way the Spark job is submitted in cluster mode.
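The Ambari bug leaves the hdp.version placeholder unresolved, so the driver and YARN application master JVMs never receive a concrete -Dhdp.version value. On a healthy HDP cluster, the rendered entries in /etc/spark/conf/spark-defaults.conf look roughly like the sketch below (the version string 2.4.2.0-258 is only a placeholder assumption; substitute your cluster's actual HDP build):

```
# Assumed example of correctly rendered spark-defaults.conf entries;
# 2.4.2.0-258 stands in for your real HDP version string.
spark.driver.extraJavaOptions   -Dhdp.version=2.4.2.0-258 -XX:MaxPermSize=1024m -XX:PermSize=256m
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.4.2.0-258
```

If Ambari has not rendered these correctly, you can pass the same properties per job with --conf on the spark-submit command line, as shown in the resolutions below.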
Resolutions:
You can resolve it with the following steps:
- Add spark.driver.extraJavaOptions=-Dhdp.version={{hdp_full_version}} -XX:MaxPermSize=1024m -XX:PermSize=256m and spark.yarn.am.extraJavaOptions=-Dhdp.version={{hdp_full_version}}, since the old Ambari bug was suspected.
- The custom Python script was submitted without hive-site.xml, so it could not connect to the Hive metastore. Add --files /etc/spark/conf/hive-site.xml so the job can reach the metastore.
- Add --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar to provide the DataNucleus jars that the metastore client needs.
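Putting the three fixes together, the failing submission can be rewritten as sketched below. The queue, resource sizes, script names, table names, and jar paths are taken directly from the original command and a standard HDP layout; adjust them to match your cluster:

```
# Sketch of the corrected submission; paths and jar versions follow the
# HDP layout from the resolutions above and may differ on your cluster.
spark-submit --master yarn --deploy-mode cluster \
  --queue ado --num-executors 60 --executor-memory 3G --executor-cores 5 \
  --driver-memory 10G \
  --conf "spark.driver.extraJavaOptions=-Dhdp.version={{hdp_full_version}} -XX:MaxPermSize=1024m -XX:PermSize=256m" \
  --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version={{hdp_full_version}}" \
  --files /etc/spark/conf/hive-site.xml \
  --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
  --py-files argparse.py,load_iris_2.py \
  load_iris.py -p ado_secure.iris_places -s ado_secure.iris_places_stg \
  -f /user/admin/iris/places/2016-11-30-place.csv
```

Note that in cluster mode the driver runs inside YARN, so hive-site.xml must travel with the job via --files; relying on the client machine's /etc/spark/conf is not enough.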