Hive Actions with Oozie

Category : Hive

A friend of mine was trying to run a Hive .hql script from an Oozie workflow and kept hitting errors. I decided to replicate the setup on my cluster, and after a few retries I got it working.

If you have a similar requirement to run Hive SQL via Oozie, this article will help you get the job done.

Step 1: First, create a couple of directories in HDFS (under your home directory) so all the scripts live in one place and can be run from there:

[hdfs@m1 ~]$ hadoop fs -mkdir -p /user/ambari-qa/tutorial/hive-oozie

[root@m1 ~]# hadoop fs -mkdir -p /user/ambari-qa/tutorial/hive-input
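An optional sanity check to confirm both directories exist:

[hdfs@m1 ~]$ hadoop fs -ls /user/ambari-qa/tutorial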

Step 2: Now create your workflow.xml and job.properties: 

[root@m1 hive_oozie_demo]# cat workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-wf">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>script.hql</script>
            <param>INPUT_PATH=${inputPath}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
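Before uploading the workflow, you can sanity-check the XML against Oozie's schema with the client's validate command (assuming the oozie CLI is installed on this node):

[root@m1 hive_oozie_demo]# oozie validate workflow.xml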

[root@m1 hive_oozie_demo]# cat job.properties
nameNode=hdfs://HDPINF
jobTracker=192.168.56.42:50300
queueName=default
exampleRoot=tutorial
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib
oozie.wf.application.path=${nameNode}/user/ambari-qa/${exampleRoot}/hive-oozie
inputPath=${nameNode}/user/ambari-qa/${exampleRoot}/hive-input/*
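One thing to double-check (an aside, not part of the original setup): the jobTracker value above is an MRv1-style JobTracker address. On a YARN cluster this property should point at the ResourceManager instead; for example, HDP's ResourceManager typically listens on port 8050 (verify the host and port on your own cluster):

jobTracker=192.168.56.42:8050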

Step 3: Now create the Hive table in your Hive database (the script in Step 4 reads from test.demo1 and writes to test.demo):

hive> create table demo(id int, name string);
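Since the script below reads from test.demo1 and writes to test.demo, in practice you need both tables inside a test database. A fuller setup might look like this (a sketch; the sample row matches the output shown at the end of this article, and INSERT ... VALUES requires Hive 0.14+):

hive> create database if not exists test;
hive> use test;
hive> create table demo(id int, name string);
hive> create table demo1(id int, name string);
hive> insert into table demo1 values (1, 'saurabh');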

Step 4: Now create your Hive script:

[root@m1 hive_oozie_demo]# cat script.hql
insert into test.demo select * from test.demo1;
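Note that the workflow passes INPUT_PATH as a <param>, but this particular script never uses it. If your script does need the value, Oozie exposes each <param> to the script as a variable; a hypothetical one-line script that consumes it (LOAD DATA simply moves the input files into the table's directory) might look like:

load data inpath '${INPUT_PATH}' into table test.demo;

Be aware that a wildcard in the path (as in the inputPath used here) can trigger Issue 3 described at the end of this article.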

Step 5: Now you need to set up your Oozie workflow application folder. One file is essential for executing a Hive action through Oozie: hive-site.xml. When Oozie executes a Hive action, it needs Hive's configuration file, and you can provide multiple configuration files in a single action. You can find your Hive configuration file at "/etc/hive/conf.dist/hive-site.xml" (the default location). Copy that file and put it inside your workflow application path in HDFS.

[root@m1 hive_oozie_demo]# hadoop fs -put /etc/hive/conf/hive-site.xml /user/ambari-qa/tutorial/hive-oozie/

[root@m1 hive_oozie_demo]# hadoop fs -put script.hql /user/ambari-qa/tutorial/hive-oozie/

[root@m1 hive_oozie_demo]# hadoop fs -put workflow.xml /user/ambari-qa/tutorial/hive-oozie/

[root@m1 hive_oozie_demo]# hadoop fs -lsr /user/ambari-qa/tutorial/hive-oozie
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r--   3 root hdfs      19542 2016-10-08 04:36 /user/ambari-qa/tutorial/hive-oozie/hive-site.xml
-rw-r--r--   3 root hdfs         65 2016-10-08 04:36 /user/ambari-qa/tutorial/hive-oozie/script.hql
-rw-r--r--   3 root hdfs        878 2016-10-08 04:38 /user/ambari-qa/tutorial/hive-oozie/workflow.xml

Look at the <job-xml> tag: since I'm putting hive-site.xml in my application path, I'm passing just the file name, not the full location. If you want to keep the file somewhere else in HDFS, you can pass the full HDFS path there instead. In older versions of Hive, users had to supply the hive-default.xml file via the property key oozie.hive.defaults when running an Oozie Hive action, but as of Hive 0.8 this is no longer required.
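For example, if you keep a shared copy of hive-site.xml elsewhere in HDFS (the path below is hypothetical), the tag could read:

<job-xml>${nameNode}/user/ambari-qa/shared-conf/hive-site.xml</job-xml>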

Step 6: Now submit the Oozie job to run it:

[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000/oozie -config job.properties -run
job: 0000004-161008041417432-oozie-oozi-W

Now you can check your Oozie workflow status via the Oozie web UI or the command line:

[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000/oozie -info 0000004-161008041417432-oozie-oozi-W
Job ID : 0000004-161008041417432-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive-wf
App Path      : hdfs://HDPINF/user/ambari-qa/tutorial/hive-oozie
Status        : SUCCEEDED
Run           : 0
User          : ambari-qa
Group         : -
Created       : 2016-10-08 11:02 GMT
Started       : 2016-10-08 11:02 GMT
Last Modified : 2016-10-08 11:02 GMT
Ended         : 2016-10-08 11:02 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000004-161008041417432-oozie-oozi-W@:start:                                  OK                              OK
------------------------------------------------------------------------------------------------------------------------------------
0000004-161008041417432-oozie-oozi-W@hive-node                                OK        job_1475917713796_0007 SUCCEEDED
------------------------------------------------------------------------------------------------------------------------------------
0000004-161008041417432-oozie-oozi-W@end                                      OK                              OK
------------------------------------------------------------------------------------------------------------------------------------
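If an action ever ends in ERROR, the job log is usually the first place to look; you can pull it straight from the CLI:

[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000/oozie -log 0000004-161008041417432-oozie-oozi-W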

If it succeeds, you can query your table; it should now contain the loaded data.

hive> select * from demo;
OK
1	saurabh
Time taken: 0.328 seconds, Fetched: 1 row(s)

I hope this article helps you run your Hive SQL in an Oozie workflow. Please feel free to reach out with any suggestions or questions.

Common issues :

Issue 1: You may see a NameNode error if you have hard-coded the NameNode URI in your job.properties and that NameNode is currently in standby. (Note that the first failure below is a separate mistake: the -oozie URL is missing the /oozie path, which is what produces the 404; the second command shows the actual standby NameNode error.)

[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000 -config job.properties -run
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 404, message: Not Found

[ambari-qa@m1 ~]$ oozie job -oozie http://m2.hdp22:11000/oozie -config job.properties -run
Error: E0501 : E0501: Could not perform authorization operation, Operation category READ is not supported in state standby
  at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
  at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1786)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1305)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3851)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1011)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)

Resolution : To resolve this issue, use your HA nameservice ID in job.properties instead of a hard-coded NameNode URI.

[root@m1 hive_oozie_demo]# cat job.properties
nameNode=hdfs://HDPINF
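For hdfs://HDPINF to resolve at all, the HDFS client configuration visible to Oozie must define that nameservice. A sketch of the relevant hdfs-site.xml properties (the host names below are placeholders; an Ambari-managed HA cluster will already have these set):

<property><name>dfs.nameservices</name><value>HDPINF</value></property>
<property><name>dfs.ha.namenodes.HDPINF</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.HDPINF.nn1</name><value>m1.hdp22:8020</value></property>
<property><name>dfs.namenode.rpc-address.HDPINF.nn2</name><value>m2.hdp22:8020</value></property>
<property><name>dfs.client.failover.proxy.provider.HDPINF</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>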

Issue 2: If you see an exit code [40000] error with the following message, then you need to look into your job.properties.
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if hdfs://HDPINF/apps/hive/warehouse/test.db/demo is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://HDPINF/apps/hive/warehouse/test.db/demo, expected: hdfs://HDPINF:8020
Intercepting System.exit(40000)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [40000]

Resolution : You need to change your NameNode URI, i.e. remove the port from the end of the URI.
Please keep nameNode=hdfs://HDPINF instead of nameNode=hdfs://HDPINF:8020.

Issue 3: If you see the following error, then you may need to change your Hive SQL or your approach, because this issue appears to be unresolved upstream.

FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''hdfs://HDPINF:8020/user/ambari-qa/tutorial/hive-input/*'': 
Move from: hdfs://HDPINF:8020/user/ambari-qa/tutorial/hive-input/* to: hdfs://HDPINF/apps/hive/warehouse/test.db/demo is not valid.
 Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
Intercepting System.exit(10028)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10028]

Resolution : https://issues.apache.org/jira/browse/HIVE-8147
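One possible workaround (a sketch of my own, not taken from the JIRA): avoid the wildcard move altogether by exposing the input directory as an external table and inserting from it. This assumes the input files are comma-delimited text whose columns match the target table:

hive> create external table test.demo_staging(id int, name string)
    >   row format delimited fields terminated by ','
    >   location '/user/ambari-qa/tutorial/hive-input/';
hive> insert into table test.demo select * from test.demo_staging;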