Category Archives: Hive

  • 1

Insert overwrite query Failed with exception Unable to move source

If you have explicitly setup hive.exec.stagingdir to some location like /tmp/ or some other location then whenever you will run insert overwrite statment then you will get following error.

ERROR exec.Task (SessionState.java:printError(989)) – Failed with exception Unable to move source hdfs://clustername/apps/finance/nest/nest_audit_log_final/
.hive-staging_hive_2017-12-12_19-15-30_008_33149322272174981-1/-ext-10000 to
destination hdfs://clustername/apps/finance/nest/nest_audit_log_final

Example: 

INSERT OVERWRITE TABLE nest.nest_audit_log_final
SELECT
project_name
, application
, module_seq_num
, module_name
, script_seq_num
, script_name
, run_session_id
, load_ts
, max_posted_date
, currency
, processor
, load_date
FROM nest.nest_audit_log_final;

Then you will get following error:

INFO common.FileUtils (FileUtils.java:mkdir(519)) - 
Creating directory if it doesn't exist: 
hdfs://clustername/apps/finance/nest/nest_audit_log_final
2017-12-12 19:21:42,508 ERROR hdfs.KeyProviderCache (KeyProviderCache.java:createKeyProviderURI(87)) 
- Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
2017-12-12 19:21:42,525 ERROR exec.Task (SessionState.java:printError(989)) 
- Failed with exception Unable to move source 
hdfs://clustername/apps/finance/nest/nest_audit_log_final/
.hive-staging_hive_2017-12-12_19-15-30_008_33149322272174981-1/-ext-10000 to 
destination hdfs://clustername/apps/finance/nest/nest_audit_log_final
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://clustername/apps/finance/nest/nest_audit_log_final/
.hive-staging_hive_2017-12-12_19-15-30_008_33149322272174981-1/-ext-10000 
to destination hdfs://clustername/apps/finance/nest/nest_audit_log_final
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2900)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3140)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1727)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:353)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1745)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1491)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:315)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:429)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:718)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.io.IOException: rename for src path: 
hdfs://clustername/apps/finance/nest/nest_audit_log_final/
.hive-staging_hive_2017-12-12_19-15-30_008_33149322272174981-1/-ext-10000/000000_0 
to dest:hdfs://clustername/apps/finance/nest/nest_audit_log_final returned false
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2849)

Root Cause: It is because of a bug (HIVE-17063). 

Workaround:

set hive.exec.stagingdir=.hive-staging;

  • 1

last access time of a table is showing zero

If you many hundreds or thousands tables and you want to know when was the last time your hive table accessed then you can run following mysql query in mysql under hive database.

mysql> use hive;

mysql> select TBL_NAME,LAST_ACCESS_TIME from TBLS where DB_ID=<db_id>;
+—————————————————————————————————-+——————+
| TBL_NAME | LAST_ACCESS_TIME |
+—————————————————————————————————-+——————+
| df_nov_4 | 0 |
| google_feed_20151111 | 0 |
| null_recos | 0 |
| taxonomy_20160107 | 0 |

Note : but unfortunately due to bug https://issues.apache.org/jira/browse/HIVE-2526 you can not do that without making some configuration changes like follow.

1. From Ambari > Hive > Advanced > Custom hive-site, edit (if it exists) or add a new property:
hive.security.authorization.sqlstd.confwhitelist=hive\.exec\.pre\.hooks

2. From Ambari > Hive > Advanced > General, edit hive.exec.pre.hooks property and append the following to the end of the field value (comma separated):
org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec

3. Restart the affected Hive services after saving the config changes in Ambari.

4. After the restart, try running the query (select TBL_NAME,LAST_ACCESS_TIME from TBLS where DB_ID=<db_id>;) again and this time you should be able to see the last access time.

 

I hope it will help you to get your work done, feel free to give your valuable feedback.


  • 1

kill hive query where application id was not created

Tags :

Category : Hive

Sometime when you run hive queries then it does not launch application or get hung due to some resources or any other reason.

Now in this case you have to kill query to resubmit it. So, please use following steps to kill hive query itself.

 

  • hive> select * from table1;
    Query ID = mapr_201804547_2ad87f0f5627
    Total jobs = 1
    Launching Job 1 out of 1
  • Use “kill query” command, which is available from HDP 2.6.3:
    KILL QUERY <queryid1>

 

Please feel free to give your feedback.


  • 0

hive jdbc in zeppelin throwing permission error to anonymous user

When users run hive query in zeppelin via jdbc interperator then it is going to some anonymous user not an actual user.

INFO [2017-11-02 03:18:20,405] ({pool-2-thread-2} RemoteInterpreter.java[pushAngularObjectRegistryToRemote]:546) – Push local angular object registry from ZeppelinServer to remote interpreter group 2CNQZ1ES5:shared_process
WARN [2017-11-02 03:18:21,825] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2058) – Job 20171031-075630_2029577092 is finished, status: ERROR, exception: null, result: %text org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Unable to fetch table ushi_gl. org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode=”/apps/hive/warehouse/adodb.db/ushi_gl”:hive:hdfs:drwxr-x— 
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:381)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:338)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:109)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4111)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:854)

 

RootCause: It is bug in zeppelin 0.7.0.2 and is going to fix in newer version of zeppelin.

Resolution:  Add your username and password in credential option in zeppelin.

 


  • 0

ERROR : Failed with exception org.apache.hadoop.security.AccessControlException: Permission denied. user=user1 is not the owner of inode=test_copy_1

If users complain that they are not able to load data into hive tables via beeline. Actually while loading data into Hive table using load data inpath ‘/tmp/test’ into table sampledb.sample1 then getting following error:
load data inpath ‘/tmp/test’ into table adodevdb.sample1;
INFO : Loading data to table adodevdb.sample1 from hdfs://m1.hdp22/tmp/test
ERROR : Failed with exception org.apache.hadoop.security.AccessControlException: Permission denied. user=user1 is not the owner of inode=test_copy_1
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:250)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:381)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:338)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1908)
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:63)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1824)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:821)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:464)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

 

Root Cause: It is because rollback() never happens in case of failure, so this problem is there since the start.  BUG-62311 is raised for the same and unfortunately there is no fix for now.

Workaround: You can fix it by applying a following workaround:

set hive.mv.files.thread=0(zero) in hive-site.xml. 


  • 0

Select does not return any row in mr execution engine but returns in tez via beeline

When I ran a select statement via setting set hive.execution.engine=mr; then select * from table is not returning any rows in beeline but when I run it in tez then it is returning result.

0: jdbc:hive2://m1.hdp22:10001/default> select * from test_db.table1 limit 25;

+————————+————————-+————————-+—————————+—————————+—————————+————————-+————————-+————————-+——————————-+————————-+–+

| cus_id  | prx_nme  | fir_nme  | mid_1_nme  | mid_2_nme  | mid_3_nme  | lst_nme  | sfx_nme  | gen_nme  | lic_st_abr_id  | dsd_idc  |

+————————+————————-+————————-+—————————+—————————+—————————+————————-+————————-+————————-+——————————-+————————-+–+

+————————+————————-+————————-+—————————+—————————+—————————+————————-+————————-+————————-+——————————-+————————-+–+

No rows selected (0.108 seconds)

If you will check HiveServer2 logs then you will see following traces :

2017-08-31 09:02:02,239 INFO [HiveServer2-HttpHandler-Pool: Thread-104]: parse.ParseDriver (ParseDriver.java:parse(185)) – Parsing command: select * from table1 limit

25 2017-08-31 09:02:02,241 INFO [HiveServer2-HttpHandler-Pool: Thread-104]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(855)) – 3: get_table : db=test_db tbl=table1

2017-08-31 09:02:02,241 INFO [HiveServer2-HttpHandler-Pool: Thread-104]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(406)) – ugi=saurkumaip=unknown-ip-addrcmd=get_table : db=test_db tbl=table1

2017-08-31 09:02:02,260 INFO [HiveServer2-HttpHandler-Pool: Thread-104]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(855)) – 3: get_table : db=test_db tbl=table1

2017-08-31 09:02:02,260 INFO [HiveServer2-HttpHandler-Pool: Thread-104]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(406)) – ugi=saurkumaip=unknown-ip-addrcmd=get_table : db=test_db tbl=table1

2017-08-31 09:02:02,269 INFO [HiveServer2-HttpHandler-Pool: Thread-104]: ql.Driver (Driver.java:getSchema(253)) – Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:table1.cus_id, type:int, comment:null), FieldSchema(name:table1.prx_nme, type:char(15), comment:null), FieldSchema(name:table1.fir_nme, type:char(15), comment:null), FieldSchema(name:table1.mid_1_nme, type:char(15), comment:null), FieldSchema(name:table1.mid_2_nme, type:char(15), comment:null), FieldSchema(name:table1.mid_3_nme, type:char(15), comment:null), FieldSchema(name:table1.lst_nme, type:char(30), comment:null), FieldSchema(name:table1.sfx_nme, type:char(5), comment:null), FieldSchema(name:table1.gen_nme, type:char(10), comment:null), FieldSchema(name:table1.lic_st_abr_id, type:char(2), comment:null), FieldSchema(name:table1.dsd_idc, type:char(1), comment:null)], properties:null)

2017-08-31 09:02:02,271 INFO [HiveServer2-Background-Pool: Thread-161143]: ql.Driver (Driver.java:execute(1411)) – Starting command(queryId=hive_20170831090202_3dbbdf1c-c061-4289-b4dd-a2934cbec04d): select * from table1 limit 25

2017-08-31 09:02:02,278 INFO [Atlas Logger 2]: hook.HiveHook (HiveHook.java:registerProcess(697)) – Skipped query select * from table1 limit 25 for processing since it is a select query

Root Cause: Actually we ran insert overwrite which replaced all part files to same name dir and created files under those dirs. And as we are aware mr execution does not search recursively and thats why it was not returning any result in case of mr execution engine.

[s0998dnz@m1.hdp22 ~]$ hadoop fs -ls hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/
Found 50 items
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:08 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000000_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:08 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000001_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:08 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000002_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:08 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000003_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:09 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000004_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:09 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000005_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:09 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000006_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:09 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000007_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:10 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000008_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:10 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000009_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:10 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000010_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:10 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000011_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:11 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000012_0
drwxr-x— – dmcraig hdfs 0 2017-08-23 12:11 hdfs://m1.hdp22:8020/apps/hive/warehouse/test_db.db/table1/000013_0

Resolution: There are two solution we have to resolve this issue

  1. You can change the file structure by removeing dir and place file under the table dir.
  2. Or you can set following property SET mapred.input.dir.recursive=true; and then run sql. This property will tell to your engine to search recursively.

Please feel free to give your valuable suggestion or feedback.


  • 0

Hive metastore critical alerts with ExecutionFailed: Execution of ‘export HIVE_CONF_DIR=’/usr/hdp/current/hive-metastore/conf

When you install Atlas and configure it then you may see following alert in Ambari Hive Service.

And once you check this alert details, you will see following error :

Metastore on m1.hdp22 failed (Traceback (most recent call last):
File “/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py”, line 200, in execute
timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,
File “/usr/lib/python2.6/site-packages/resource_management/core/base.py”, line 155, in __init__
self.env.run()
File “/usr/lib/python2.6/site-packages/resource_management/core/environment.py”, line 160, in run
self.run_action(resource, action)
File “/usr/lib/python2.6/site-packages/resource_management/core/environment.py”, line 124, in run_action
provider_action()
File “/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py”, line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File “/usr/lib/python2.6/site-packages/resource_management/core/shell.py”, line 72, in inner
result = function(command, **kwargs)
File “/usr/lib/python2.6/site-packages/resource_management/core/shell.py”, line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File “/usr/lib/python2.6/site-packages/resource_management/core/shell.py”, line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File “/usr/lib/python2.6/site-packages/resource_management/core/shell.py”, line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of ‘export HIVE_CONF_DIR=’/usr/hdp/current/hive-metastore/conf’ ; hive –hiveconf hive.metastore.uris=thrift://m1.hdp22:9083 –hiveconf hive.metastore.client.connect.retry.delay=1 –hiveconf hive.metastore.failure.retries=1 –hiveconf hive.metastore.connect.retries=1 –hiveconf hive.metastore.client.socket.timeout=14 –hiveconf hive.execution.engine=mr -e ‘show databases;” returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
Exception in thread “main” java.lang.ExceptionInInitializerError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.atlas.hive.hook.HiveHook.initialize(HiveHook.java:71)
at org.apache.atlas.hive.hook.HiveHook.<init>(HiveHook.java:41)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1386)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1370)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1598)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1291)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1148)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:315)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.NullPointerException
at org.apache.atlas.hook.AtlasHook.<clinit>(AtlasHook.java:74)
… 29 more
)

Root Cause: This happens when you have installed Atlas on that server where you do not have hive client. Actually you have org.apache.atlas.hive.hook.HiveHook in hive.exec.post.hooks hive property.

Solution :  So to get rid of this alert we need to either remove this parameters from property but as we are using Atlas so we can’t delete it then another option is installed hive client on the same server where you have atlas server.

 

Please feel free to give your valuable feedback to improve articles.


  • 1

/usr/hdp/2.6.1.0-129/atlas/hook-bin/import-hive.sh is failing with Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes

When you have installed atlas on top of your cluster and you want to sync your hive data to atlas via following method then you may see following error after sometime(~20-30 mins) running your command.

[hive@m1.hdp22 ~]$ export HADOOP_CLASSPATH=`hadoop classpath`
[hive@m1.hdp22 ~]$ export HIVE_CONF_DIR=/etc/hive/conf
[hive@m1.hdp22 ~]$ /usr/hdp/2.6.1.0-129/atlas/hook-bin/import-hive.sh
Using Hive configuration directory [/etc/hive/conf]
Log file for import is /usr/hdp/2.6.1.0-129/atlas/logs/import-hive.log
Enter username for atlas :- saurkuma
Enter password for atlas :-

Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
at org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
at org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
at org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:54)
at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:410)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:397)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:278)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:260)
at org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:630)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:613)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.createOrUpdateTableInstance(HiveMetaStoreBridge.java:488)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.createTableInstance(HiveMetaStoreBridge.java:424)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerTable(HiveMetaStoreBridge.java:505)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:289)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:272)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:143)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:134)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:647)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
… 19 more
Failed to import Hive Data Model!!!

 

Root Cause : This issue seems to be a bug . So, you need to apply hot fix on hive side. 

Resolution : To apply hot fix you can can download attached jar file ( hive-metastore-1.2.1000.2.6.1.0-129.jar) from given URl for this issue, please follow below steps to replace the jar.

https://github.com/hadoopBrogrammers/hadoop-commander/blob/master/hive-metastore-1.2.1000.2.6.1.0-129.jar

Steps to apply this hot fix:
1. Back up the hive-metastore jar from /usr/hdp/2.6.1.0-129/hive/lib to some place on hiveserver 2 and hive-metastor servers.
2. Download and copy the jar at same location .
3. restart hive-metastore,hive-server2.

 

Please feel free to give your valuable feedback or suggestion to improve article.


  • 0

Spark job run successfully in client mode but failing in cluster mode

If you build a pyspark application which can run successfully  in both the local and yarn-client modes.  However, when you try to run in cluster mode, then you may receive following errors :

  1. Error 1:  Exception: (“You must build Spark with Hive. Export ‘SPARK_HIVE=true’ and run build/sbt assembly”, Py4JJavaError(u’An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n’, JavaObject id=o52))
  2. Error 2: INFO Client: Deleting staging directory .sparkStaging/application_1476997468030_139760
    Exception in thread “main” org.apache.spark.SparkException: Application application_1476997468030_139760 finished at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
  3. Error 3: ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
  4. Error 4: INFO ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
    17/08/22 04:56:19 ERROR ApplicationMaster: Uncaught exception:
    org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:401)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:766)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
    Caused by: org.apache.spark.SparkUserAppException: User application exited with 1

Root Cause : If you are using HDP stack then you might be hitting a bug with HDP 2.3.2 with Ambari 2.2.1 :https://hortonworks.jira.com/browse/BUG-56393 where starting from Ambari 2.2.1 , it does not manage the spark version if HDP stack is < HDP 2.3.4.

If not then you are missing some drivers and hive parameters which you need to pass in command line during spark-submit in cluster mode.

Resolution : You can use following steps to solve this issue :

  • Check the hive-site.xml contents. Should be like as below for spark.
  • Add hive-site.xml to the driver-classpath so that spark can read hive configuration. Make sure —files must come before you .jar file.
  • Add the datanucleus jars using –jars option when you submit
  • Check the contents of hive-site.xml
    <configuration>
    <property>
    <name>hive.metastore.uris</name>
    <value>thrift://sandbox.hortonworks.com:9083</value>
    </property>
    </configuration>
  • The Seq. of command
    spark-submit \
    –class <Your.class.name> \
    –master yarn-cluster \
    –num-executors 1 \
    –driver-memory 1g \
    –executor-memory 1g \
    –executor-cores 1 \
    –files /usr/hdp/current/spark-client/conf/hive-site.xml \
    –jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
    target/YOUR_JAR-1.0.0-SNAPSHOT.jar “show tables”

Or complete command can be :

spark-submit --master yarn --deploy-mode cluster --queue di --jars /usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar --conf "spark.yarn.appMasterEnv.PATH=/opt/rh/rh-python34/root/usr/bin${PATH:+:${PATH}}" --conf "spark.yarn.appMasterEnv.PATH=/opt/rh/rh-python34/root/usr/bin${PATH:+:${PATH}}" --conf "spark.yarn.appMasterEnv.LD_LIBRARY_PATH=/opt/rh/rh-python34/root/usr/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" --conf "spark.yarn.appMasterEnv.MANPATH=/opt/rh/rh-python34/root/usr/share/man:${MANPATH}" --conf "spark.yarn.appMasterEnv.XDG_DATA_DIRS=/opt/rh/rh-python34/root/usr/share${XDG_DATA_DIRS:+:${XDG_DATA_DIRS}}" --conf "spark.yarn.appMasterEnv.PKG_CONFIG_PATH=/opt/rh/rh-python34/root/usr/lib64/pkgconfig${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}" --conf "spark.executorEnv.PATH=/opt/rh/rh-python34/root/usr/bin${PATH:+:${PATH}}" --conf "spark.executorEnv.PATH=/opt/rh/rh-python34/root/usr/bin${PATH:+:${PATH}}" --conf "spark.executorEnv.LD_LIBRARY_PATH=/opt/rh/rh-python34/root/usr/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" --conf "spark.executorEnv.MANPATH=/opt/rh/rh-python34/root/usr/share/man:${MANPATH}" --conf "spark.executorEnv.XDG_DATA_DIRS=/opt/rh/rh-python34/root/usr/share${XDG_DATA_DIRS:+:${XDG_DATA_DIRS}}" --conf "spark.executorEnv.PKG_CONFIG_PATH=/opt/rh/rh-python34/root/usr/lib64/pkgconfig${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}" hive.py

where hive.py has following value :

[adebatch@server1 ~]$ cat hive.py 
from pyspark import SparkContext,SparkConf
from pyspark.sql import HiveContext
import json
import sys
conf = SparkConf()
sc = SparkContext(conf=conf)
hiveCtx = HiveContext(sc)
result = hiveCtx.sql('show databases')
#result = hiveCtx.sql('select * from default.table1 limit 1')
result.show()
result.write.save('/tmp/pyspark', format='text', mode='overwrite')

Please feel free to give your valuable feedback.


  • 2

Beeline java.lang.OutOfMemoryError: Requested array size exceeds VM limit

When we run beeline jobs very heavily then sometime we can see following error :

WARNING: Use "yarn jar" to launch YARN applications.
issuing: !connect jdbc:hive2://hdpsap.lowes.com:8443/default;transportMode=http;httpPath=gateway/default/hive?hive.execution.engine=tez;tez.queue.name=di;hive.exec.parallel=true;hive.vectorized.execution.enabled=true;hive.vectorized.execution.reduce.enabled hdpdib [pass$
Connecting to jdbc:hive2://hdpsap.lowes.com:8443/default;transportMode=http;httpPath=gateway/default/hive?hive.execution.engine=tez;tez.queue.name=di;hive.exec.parallel=true;hive.vectorized.execution.enabled=true;hive.vectorized.execution.reduce.enabled
17/07/01 20:00:05 [main]: INFO jdbc.Utils: Supplied authorities: hdpsap.lowes.com:8443
17/07/01 20:00:05 [main]: INFO jdbc.Utils: Resolved authority: hdpsap.lowes.com:8443
Connected to: Apache Hive (version 1.2.1.2.3.4.75-1)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:122)
 at org.apache.hive.beeline.BeeLine.getConsoleReader(BeeLine.java:863)
 at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:804)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:773)
 at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:485)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:468)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Root Cause : By default, the history file is located under ~/.beeline/history for that user who is facing this issue and beeline will load the latest 500 rows into memory. If those queries are super big, containing lots of characters, it is possible that the history file size will reach as big as a few GBs. When beeline is trying to load such big history file into memory, it will eventually fail with OutOfMemory error.

Currently Beeline does not provide an option to limit the max size for beeline history file, in the case that each query is very big, it will flood the history file and slow down beeline on start up and shutdown.

https://issues.apache.org/jira/browse/HIVE-15166

[root@m1 ]ls -ltrh /home/hdpdib/.beeline/
total 1.1G
-rw-r--r-- 1 hdpdib hdpuser 1.1G Jul1 03:15 history

Solution : So now for time-being to we have a workaround and that is to remove or clean the ~/.beeline/history file and then run again your jobs. Now you should be good for running jobs. 

[root@m1 ~]# rm /home/hdpdib/.beeline/history

Please feel free to reach out to me or give your valuable feedback.