Hive Actions with Oozie
Category : Hive
One of my friends was trying to run some Hive .hql scripts in an Oozie workflow and kept getting errors. I decided to replicate the problem on my cluster and, after a few retries, got it working.
If you have a similar requirement to run Hive SQL via Oozie, this article will help you get it done.
Step 1: First create directories in HDFS (under your home directory) to keep all the scripts in one place and run them from there:
[hdfs@m1 ~]$ hadoop fs -mkdir -p /user/ambari-qa/tutorial/hive-oozie
[root@m1 ]# hadoop fs -mkdir -p /user/ambari-qa/tutorial/hive-input
Step 2: Now create your workflow.xml and job.properties:
[root@m1 hive_oozie_demo]# cat workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-wf">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>script.hql</script>
            <param>INPUT_PATH=${inputPath}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
[root@m1 hive_oozie_demo]# cat job.properties
nameNode=hdfs://HDPINF
jobTracker=192.168.56.42:50300
queueName=default
exampleRoot=example
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib
oozie.wf.application.path=${nameNode}/user/ambari-qa/${exampleRoot}/hive-oozie
inputPath=${nameNode}/user/ambari-qa/${exampleRoot}/hive-input/*
Step 3: Now create the Hive tables in the test database. The script in the next step reads from test.demo1 and writes into test.demo, so both tables are needed:
hive> create table test.demo(id int, name string);
hive> create table test.demo1(id int, name string);
Step 4: Now create your Hive script:
[root@m1 hive_oozie_demo]# cat script.hql
insert into test.demo select * from test.demo1;
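Note that the workflow's <param> element passes INPUT_PATH to the script, so script.hql can reference it as a Hive variable. A hypothetical parameterized variant (assuming the files under the input path match the demo table's schema) might look like this, though as Issue 3 at the end of this article shows, loading from a wildcard path can hit an unresolved Hive bug:

```sql
-- Hypothetical parameterized variant of script.hql: ${INPUT_PATH} is
-- substituted by Oozie from the <param> element in workflow.xml.
LOAD DATA INPATH '${INPUT_PATH}' INTO TABLE test.demo;
```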
Step 5: Now set up your Oozie workflow application folder. One very important file for executing a Hive action through Oozie is hive-site.xml: when Oozie executes a Hive action, it needs Hive's configuration file. You can provide multiple configuration files in a single action. You can find the Hive configuration file at "/etc/hive/conf.dist/hive-site.xml" (the default location). Copy that file into your workflow application path in HDFS.
[root@m1 hive_oozie_demo]# hadoop fs -put /etc/hive/conf/hive-site.xml /user/ambari-qa/tutorial/hive-oozie/
[root@m1 hive_oozie_demo]# hadoop fs -put script.hql /user/ambari-qa/tutorial/hive-oozie/
[root@m1 hive_oozie_demo]# hadoop fs -put workflow.xml /user/ambari-qa/tutorial/hive-oozie/
[root@m1 hive_oozie_demo]# hadoop fs -lsr /user/ambari-qa/tutorial/hive-oozie
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 3 root hdfs 19542 2016-10-08 04:36 /user/ambari-qa/tutorial/hive-oozie/hive-site.xml
-rw-r--r-- 3 root hdfs 65 2016-10-08 04:36 /user/ambari-qa/tutorial/hive-oozie/script.hql
-rw-r--r-- 3 root hdfs 878 2016-10-08 04:38 /user/ambari-qa/tutorial/hive-oozie/workflow.xml
Look at the <job-xml> tag: since I put hive-site.xml in my application path, I pass just the file name, not the full location. If you want to keep the file somewhere else in HDFS, you can pass the full HDFS path instead. In older versions of Hive, users had to supply the hive-default.xml file via the property key oozie.hive.defaults when running an Oozie Hive action, but from Hive 0.8 onward this is no longer required.
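For example, if hive-site.xml lived elsewhere in HDFS, the <job-xml> element could point at the full path instead (the path below is a hypothetical location, not one from this tutorial):

```xml
<!-- Hypothetical: referencing hive-site.xml by its full HDFS path
     instead of a file name relative to the application path -->
<job-xml>${nameNode}/user/ambari-qa/conf/hive-site.xml</job-xml>
```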
Step 6: Now submit the Oozie job to run it:
[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000/oozie -config job.properties -run
job: 0000004-161008041417432-oozie-oozi-W
Now you can check your Oozie workflow status via the Oozie web UI or the command line:
[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000/oozie -info 0000004-161008041417432-oozie-oozi-W
Job ID : 0000004-161008041417432-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive-wf
App Path      : hdfs://HDPINF/user/ambari-qa/tutorial/hive-oozie
Status        : SUCCEEDED
Run           : 0
User          : ambari-qa
Group         : -
Created       : 2016-10-08 11:02 GMT
Started       : 2016-10-08 11:02 GMT
Last Modified : 2016-10-08 11:02 GMT
Ended         : 2016-10-08 11:02 GMT
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------
ID                                             Status  Ext ID                  Ext Status  Err Code
------------------------------------------------------------------------------------------------------------------------
0000004-161008041417432-oozie-oozi-W@:start:   OK      -                       OK          -
------------------------------------------------------------------------------------------------------------------------
0000004-161008041417432-oozie-oozi-W@hive-node OK      job_1475917713796_0007  SUCCEEDED   -
------------------------------------------------------------------------------------------------------------------------
0000004-161008041417432-oozie-oozi-W@end       OK      -                       OK          -
------------------------------------------------------------------------------------------------------------------------
If it succeeds, you can query your table; it should now contain the data.
hive> select * from demo;
OK
1 saurabh
Time taken: 0.328 seconds, Fetched: 1 row(s)
I hope this article helps you run your Hive SQL in an Oozie workflow. Please feel free to reach out with any suggestions or doubts.
Common issues:
Issue 1: You may see a NameNode error if you have hard-coded a NameNode URI in your job.properties and that NameNode is in standby. (Note that the first attempt below fails with a 404 because the -oozie URL is missing the /oozie path; the second attempt, with the corrected URL, fails because of the standby NameNode.)
[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000 -config job.properties -run
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 404, message: Not Found
[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000/oozie -config job.properties -run
Error: E0501 : E0501: Could not perform authorization operation, Operation category READ is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1786)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1305)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3851)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1011)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
Resolution: Use your HA nameservice ID instead of a hard-coded NameNode URI in job.properties.
[root@m1 hive_oozie_demo]# cat job.properties
nameNode=hdfs://HDPINF
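The HDPINF nameservice ID works because HDFS HA clients resolve it through hdfs-site.xml to whichever NameNode is currently active. A sketch of the relevant client-side properties, assuming a two-NameNode setup (the host names m1.hdp22 and m2.hdp22 are assumptions for illustration, based on this cluster's naming):

```xml
<!-- Sketch of the HDFS HA client configuration (hdfs-site.xml) that
     makes hdfs://HDPINF resolve to the active NameNode. The host
     names below are assumptions for illustration. -->
<property>
  <name>dfs.nameservices</name>
  <value>HDPINF</value>
</property>
<property>
  <name>dfs.ha.namenodes.HDPINF</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDPINF.nn1</name>
  <value>m1.hdp22:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDPINF.nn2</name>
  <value>m2.hdp22:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.HDPINF</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```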
Issue 2: If you see an exit code [40000] error with the following message, check your job.properties.
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if hdfs://HDPINF/apps/hive/warehouse/test.db/demo is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://HDPINF/apps/hive/warehouse/test.db/demo, expected: hdfs://HDPINF:8020
Intercepting System.exit(40000)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [40000]
Resolution: Change your NameNode URI by removing the port from the end of the URI.
Use nameNode=hdfs://HDPINF instead of nameNode=hdfs://HDPINF:8020.
Issue 3: If you see the following error, you may need to change your Hive SQL or approach, because this issue appears to be unresolved.
FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''hdfs://HDPINF:8020/user/ambari-qa/tutorial/hive-input/*'': Move from: hdfs://HDPINF:8020/user/ambari-qa/tutorial/hive-input/* to: hdfs://HDPINF/apps/hive/warehouse/test.db/demo is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
Intercepting System.exit(10028)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10028]
Resolution: https://issues.apache.org/jira/browse/HIVE-8147