Exception in thread “main” org.apache.spark.SparkException: Application

December 26, 2016
2

Exception in thread “main” org.apache.spark.SparkException: Application

Tags : Failed to access metastore org.apache.spark.SparkException spark spark.yarn.am.extraJavaOptions

Category : Spark

When you run python script on top of hive but it is failing with following error :

$ spark-submit –master yarn –deploy-mode cluster –queue ado –num-executors 60 –executor-memory 3G –executor-cores 5 –py-files argparse.py,load_iris_2.py –driver-memory 10G load_iris.py -p ado_secure.iris_places -s ado_secure.iris_places_stg -f /user/admin/iris/places/2016-11-30-place.csv

Exception in thread “main” org.apache.spark.SparkException: Application application_1476997468030_142120 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

When I checked spark logs then I found following error.
16/12/22 07:35:49 WARN metadata.Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:193)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:164)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:162)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:415)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:414)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:296)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

Root Cause:

It can be because of one bug (BUG-56393) in ambari and due to the format of spark job submit in cluster mode.

Resolutions:

You can resolve it with the help of following steps:

Add spark.driver.extraJavaOptions =-Dhdp.version={{hdp_full_version}} -XX:MaxPermSize=1024m -XX:PermSize=256m and spark.yarn.am.extraJavaOptions=-Dhdp.version={{hdp_full_version}} as we were suspecting an old bug.
You were running your custom python script along without hive-site.xml due to that it was not able to connect to hive metastore. So we added –files /etc/spark/conf/hive-site.xml to make a connect to hive metastore.
Add the –jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar option and provided the path to datanucleus jars.

$ spark-submit –master yarn –deploy-mode cluster –conf “spark.driver.extraJavaOptions=-Dhdp.version=2.3.4.0-3485 -XX:MaxPermSize=1024m -XX:PermSize=256m” –conf “spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485” –queue ado –executor-memory 3G –executor-cores 5 –jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar –py-files argparse.py,load_iris_2.py –driver-memory 10G –files /etc/spark/conf/hive-site.xml load_iris.py -p ado_secure.iris_places -s ado_secure.iris_places_stg -f /user/admin/iris/places/2016-11-30-place.csv

Please feel free to reach out to us in case of any further assistance.

Exception in thread “main” org.apache.spark.SparkException: Application

2

Exception in thread “main” org.apache.spark.SparkException: Application

2 Comments

Amin Farvardin

admin

Leave a Reply

Recent Posts

Recent Comments

Archives