Category Archives: Oozie


Attempt to add *.jar multiple times to the distributed cache

When we submit a Spark2 action via Oozie, we may see the following exception in the logs and the job will fail:

java.lang.IllegalArgumentException: Attempt to add (hdfs://m1:8020/user/oozie/share/lib/lib_20171129113304/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.

The above error occurs because the same jar files exist in both locations (/user/oozie/share/lib/lib_20171129113304/oozie/ and /user/oozie/share/lib/lib_20171129113304/spark2/).

Solution:

Delete the duplicate jars from the spark2 directory so that only one copy remains, in the oozie directory.

  1. Identify the current Oozie sharelib by running:
    hdfs dfs -ls /user/oozie/share/lib/
  2. List all jar files in the oozie directory:
    hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/oozie | awk -F'/' '{print $8}' > /tmp/list
  3. Delete the jar files in the spark2 directory that also appear in the oozie directory:
    for f in $(cat /tmp/list); do echo $f; hdfs dfs -rm -skipTrash /user/oozie/share/lib/lib_<timestamp>/spark2/$f; done
  4. Restart Oozie Service.
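The delete loop above can be made safer by first computing the set of jars that actually exist in both directories, so nothing is removed from spark2 unless it really is a duplicate. A minimal sketch, assuming GNU userland; the helper names and the commented sharelib paths are illustrative, not part of Oozie:

```shell
#!/usr/bin/env bash
# Print just the file basenames from a listing (one path per line,
# or `hdfs dfs -ls` output, where the path is the last field).
basenames() {
  awk '$1 != "Found" {print $NF}' | xargs -r -n1 basename | sort
}

# Print the names that appear in BOTH listings -- the true duplicates.
common_jars() {
  comm -12 <(basenames < "$1") <(basenames < "$2")
}

# Intended usage against the sharelib (not run here):
#   hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/oozie  > /tmp/oozie.ls
#   hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/spark2 > /tmp/spark2.ls
#   for f in $(common_jars /tmp/oozie.ls /tmp/spark2.ls); do
#     hdfs dfs -rm -skipTrash "/user/oozie/share/lib/lib_<timestamp>/spark2/$f"
#   done
```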

Thanks for visiting this blog, please feel free to give your valuable feedback.



java.lang.IllegalArgumentException: stream exceeds limit [2,048]

When we run an Oozie job with an SSH action that uses <capture-output/>, it may fail with the following error.

java.lang.IllegalArgumentException: stream exceeds limit [2,048]
at org.apache.oozie.util.IOUtils.getReaderAsString(IOUtils.java:84)
at org.apache.oozie.servlet.CallbackServlet.doPost(CallbackServlet.java:117)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:304)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)

Root Cause: The value of the oozie.servlet.CallbackServlet.max.data.len property in oozie-site.xml is too small. In this case it was set to 2048, which wasn't sufficient for the captured output.
Resolution:

Option 1: If you are using an Oozie version earlier than 4.2, add the following property to oozie-site.xml (or to the custom oozie-site via Ambari):

<property>
 <name>oozie.servlet.CallbackServlet.max.data.len</name>
 <value>16000</value>
</property>

Option 2: If you are using Oozie version 4.2 or later, add the following property to oozie-site.xml (or to the custom oozie-site via Ambari):

<property>
 <name>oozie.action.max.output.data</name>
 <value>16000</value>
</property>
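Either way, it is worth confirming what is actually configured before restarting. A small sketch that pulls the configured value out of an oozie-site.xml file; the helper name is mine, and you would point it at your real config path:

```shell
#!/usr/bin/env bash
# Print the <value> configured for either output-size property
# in the given oozie-site.xml, if present.
max_output_limit() {
  grep -A1 -E 'oozie\.servlet\.CallbackServlet\.max\.data\.len|oozie\.action\.max\.output\.data' "$1" \
    | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

# Intended usage: max_output_limit /etc/oozie/conf/oozie-site.xml
```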

 

I hope it helped you. Please feel free to give your valuable feedback and suggestions.



Ssh action with oozie

If you want to run a shell script on a remote host via Oozie, the following article will help you do the job the easy way.

Follow these steps to set up an Oozie workflow using the SSH action:

1. Configure job.properties
Example:

[s0998dnz@m1.hdp22 oozie_ssh_action]$ cat job.properties
#*************************************************
# job.properties
#oozie-action for ssh
#*************************************************
nameNode=hdfs://m1.hdp22:8020
jobTracker=m2.hdp22:8050
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/ooziesshaction
appPath=${oozieProjectRoot}
oozie.wf.application.path=${appPath}
focusNodeLogin=s0998dnz@m1.hdp22
shellScriptPath=~/oozie_ssh_action/sampletest.sh

2. Configure workflow.xml

Example:


<!--******************************************-->
<!--workflow.xml -->
<!--******************************************-->
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
 <start to="sshAction"/>
 <action name="sshAction">
 <ssh xmlns="uri:oozie:ssh-action:0.1">
 <host>${focusNodeLogin}</host>
 <command>${shellScriptPath}</command>
 <capture-output/>
 </ssh>
 <ok to="end"/>
 <error to="killAction"/>
 </action>
<!-- <action name="sendEmail">
 <email xmlns="uri:oozie:email-action:0.1">
 <to>${emailToAddress}</to>
 <subject>Output of workflow ${wf:id()}</subject>
 <body>Status of the file move: ${wf:actionData('sshAction')['STATUS']}</body>
 </email>
 <ok to="end"/>
 <error to="end"/>
 </action>
 --> <kill name="killAction">
 <message>"Killed job due to error"</message>
 </kill>
 <end name="end"/>
</workflow-app>

3. Write a sample sampletest.sh script

Example:

[s0998dnz@m1.hdp22 oozie_ssh_action]$ cat sampletest.sh 
#!/bin/bash
hadoop fs -ls / > /home/s0998dnz/oozie_ssh_action/output.txt

4. Upload workflow.xml to ${appPath} defined in job.properties

[s0998dnz@m1.hdp22 oozie_ssh_action]$ hadoop fs -put workflow.xml /user/s0998dnz/ooziesshaction/

5. Log in to the Oozie host as the "oozie" user.

[oozie@m2.hdp22 ~]$

6. Generate a key pair, if one doesn't already exist, using the ssh-keygen command:

[oozie@m2.hdp22 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/oozie/.ssh/id_rsa): 
Created directory '/home/oozie/.ssh'
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/oozie/.ssh/id_rsa.
Your public key has been saved in /home/oozie/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:EW8WSDG3QnVjGf65znS8bP0AeOrgoQuteYl3hIunO8c oozie@m1.hdp22
The key's randomart image is:
+---[RSA 2048]----+
|   (randomart)   |
+----[SHA256]-----+

7. On the Oozie server node, copy ~/.ssh/id_rsa.pub and append it to the remote node's ~/.ssh/authorized_keys file (the focus node).

8. Test password-less ssh from oozie@oozie-host to <username>@<remote-host>

9. Run the Oozie workflow with the following command:

oozie job -oozie http://<oozie-server-hostname>:11000/oozie -config /path/to/job.properties -run
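On success, the CLI prints a `job: <workflow-id>` line; capturing that id makes follow-up commands such as `oozie job -info` scriptable. A small sketch; the extraction helper is mine, not part of the Oozie CLI:

```shell
#!/usr/bin/env bash
# Pull the workflow id out of the "job: <id>" line that
# `oozie job ... -run` prints on success.
run_and_get_id() {
  sed -n 's/^job: //p'
}

# Intended usage (server URL is a placeholder):
#   id=$(oozie job -oozie http://<oozie-server-hostname>:11000/oozie \
#        -config job.properties -run | run_and_get_id)
#   oozie job -oozie http://<oozie-server-hostname>:11000/oozie -info "$id"
```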

I hope it helped you do your job in quick time. Please feel free to give your valuable feedback or suggestions.



Oozie server failing with error “cannot load JDBC driver class ‘com.mysql.jdbc.Driver'”

Issue: The Oozie server is failing with the following error:

FATAL Services:514 – SERVER[m2.hdp22] E0103: Could not load service classes, Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
at org.apache.oozie.service.Services.loadServices(Services.java:309)
at org.apache.oozie.service.Services.init(Services.java:213)
at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:46)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4709)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:676)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:602)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:503)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:325)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1068)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:822)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1060)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at org.apache.catalina.core.StandardService.start(StandardService.java:525)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:759)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: <openjpa-2.2.2-r422266:1468616 fatal general error> org.apache.openjpa.persistence.PersistenceException: Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
at org.apache.openjpa.jdbc.sql.DBDictionaryFactory.newDBDictionary(DBDictionaryFactory.java:102)
at org.apache.openjpa.jdbc.conf.JDBCConfigurationImpl.getDBDictionaryInstance(JDBCConfigurationImpl.java:603)
at org.apache.openjpa.jdbc.meta.MappingRepository.endConfiguration(MappingRepository.java:1518)
at org.apache.openjpa.lib.conf.Configurations.configureInstance(Configurations.java:531)
at org.apache.openjpa.lib.conf.Configurations.configureInstance(Configurations.java:456)
at org.apache.openjpa.lib.conf.PluginValue.instantiate(PluginValue.java:120)
at org.apache.openjpa.conf.MetaDataRepositoryValue.instantiate(MetaDataRepositoryValue.java:68)
at org.apache.openjpa.lib.conf.ObjectValue.instantiate(ObjectValue.java:83)
at org.apache.openjpa.conf.OpenJPAConfigurationImpl.newMetaDataRepositoryInstance(OpenJPAConfigurationImpl.java:967)
at org.apache.openjpa.conf.OpenJPAConfigurationImpl.getMetaDataRepositoryInstance(OpenJPAConfigurationImpl.java:958)
at org.apache.openjpa.kernel.AbstractBrokerFactory.makeReadOnly(AbstractBrokerFactory.java:644)
at org.apache.openjpa.kernel.AbstractBrokerFactory.newBroker(AbstractBrokerFactory.java:203)
at org.apache.openjpa.kernel.DelegatingBrokerFactory.newBroker(DelegatingBrokerFactory.java:156)
at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:227)
at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:154)
at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:60)
at org.apache.oozie.service.JPAService.getEntityManager(JPAService.java:500)
at org.apache.oozie.service.JPAService.init(JPAService.java:201)
at org.apache.oozie.service.Services.setServiceInternal(Services.java:386)
at org.apache.oozie.service.Services.setService(Services.java:372)
at org.apache.oozie.service.Services.loadServices(Services.java:305)
… 26 more
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1429)
at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1371)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.apache.openjpa.lib.jdbc.DelegatingDataSource.getConnection(DelegatingDataSource.java:110)
at org.apache.openjpa.lib.jdbc.DecoratingDataSource.getConnection(DecoratingDataSource.java:87)
at org.apache.openjpa.jdbc.sql.DBDictionaryFactory.newDBDictionary(DBDictionaryFactory.java:91)
… 46 more
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1680)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1526)
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1420)
… 51 more

Root Cause: 

The MySQL JDBC driver jar is not on the classpath for the Oozie server to use.

Solution: Copy the MySQL JDBC driver jar to the required location:

[root@m2 oozie]# cp /usr/share/java/mysql-connector-java.jar /usr/hdp/2.3.4.0-3485/oozie/oozie-server/webapps/oozie/WEB-INF/lib/
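Before restarting, it helps to verify that a connector jar really landed in the webapp's lib directory, since a missing or misnamed jar reproduces the same error. A tiny sketch; the function name is mine and the path below is this example's install location:

```shell
#!/usr/bin/env bash
# Return success if a MySQL connector jar is present in the given lib dir.
driver_present() {
  ls "$1"/mysql-connector-java*.jar >/dev/null 2>&1
}

# Intended usage:
#   driver_present /usr/hdp/2.3.4.0-3485/oozie/oozie-server/webapps/oozie/WEB-INF/lib \
#     && echo "driver found" || echo "driver missing"
```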

Now restart the Oozie server and it should start cleanly.

I hope it helped you solve your issue. Please feel free to give your valuable feedback or suggestions.



Hive2 action with Oozie in kerberos Env

A friend of mine was trying to run a simple Hive2 action in an Oozie workflow and kept getting errors. I decided to replicate it on my cluster and, after a few retries, got it working.

If you have the same requirement of running Hive SQL via Oozie, this article will help you do the job.

There are 3 requirements for an Oozie Hive2 action against a Kerberized HiveServer2:
1. The "oozie.credentials.credentialclasses" property must be defined in /etc/oozie/conf/oozie-site.xml and must include the value "hive2=org.apache.oozie.action.hadoop.Hive2Credentials".
2. workflow.xml must include a <credentials><credential>…</credential></credentials> section containing the 2 properties "hive2.server.principal" and "hive2.jdbc.url".
3. The Hive2 action must reference the credential name defined above in the "cred=" attribute of the <action> definition.

 

Step 1: First create a directory in HDFS (under your home directory) to keep all the scripts in one place, and run the job from there:

[s0998dnz@m1 hive2_action_oozie]$ hadoop fs -mkdir -p /user/s0998dnz/hive2demo/app

Step 2: Now create your workflow.xml and job.properties:

[root@m1 hive_oozie_demo]# cat workflow.xml

<workflow-app name="hive2demo" xmlns="uri:oozie:workflow:0.4">
  <global>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
  </global>
  <credentials>
    <credential name="hs2-creds" type="hive2">
      <property>
        <name>hive2.server.principal</name>
        <value>${jdbcPrincipal}</value>
      </property>
      <property>
        <name>hive2.jdbc.url</name>
        <value>${jdbcURL}</value>
      </property>
    </credential>
  </credentials>
  <start to="hive2"/>
  <action name="hive2" cred="hs2-creds">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
      <jdbc-url>${jdbcURL}</jdbc-url>
      <script>${hivescript}</script>
    </hive2>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="End"/>
</workflow-app>

[s0998dnz@m1 hive2_action_oozie]$ cat job.properties

# job.properties file
nameNode=hdfs://HDPINF
jobTracker=m2.hdp22:8050
exampleDir=${nameNode}/user/${user.name}/hive2demo
oozie.wf.application.path=${exampleDir}/app
oozie.use.system.libpath=true
# Hive2 action
hivescript=${oozie.wf.application.path}/hivequery.hql
outputHiveDatabase=default
jdbcURL=jdbc:hive2://m2.hdp22:10000/default
jdbcPrincipal=hive/_HOST@HADOOPADMIN.COM
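Missing Kerberos-related keys in job.properties are a common cause of this workflow failing with credential errors, so a quick pre-submit check is cheap insurance. A sketch; the helper name and the key list are mine, matched to this example:

```shell
#!/usr/bin/env bash
# Warn about any expected key that is absent from a job.properties file.
check_props() {
  local rc=0 key
  for key in jdbcURL jdbcPrincipal oozie.wf.application.path; do
    grep -q "^${key}=" "$1" || { echo "missing: $key"; rc=1; }
  done
  return $rc
}

# Intended usage: check_props job.properties || echo "fix job.properties first"
```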

Step 3: Now create your Hive script:

[s0998dnz@m1 hive2_action_oozie]$ cat hivequery.hql

show databases;

Step 4: Upload hivequery.hql and workflow.xml to HDFS:
For example:

[s0998dnz@m1 hive2_action_oozie]$ hadoop fs -put workflow.xml /user/s0998dnz/hive2demo/app/

[s0998dnz@m1 hive2_action_oozie]$ hadoop fs -put hivequery.hql /user/s0998dnz/hive2demo/app/

Step 5: Run the Oozie job with the properties file (run kinit first to acquire a Kerberos ticket, if required):

[s0998dnz@m1 hive2_action_oozie]$ oozie job -oozie http://m2.hdp22:11000/oozie -config job.properties -run

job: 0000008-170221004234250-oozie-oozi-W

I hope it will help you run your Hive2 action in Oozie. Please feel free to give your valuable feedback or suggestions.



Enable ‘Job Error Log’ in oozie

In the Oozie UI, ‘Job Error Log’ is a tab introduced in HDP 2.3 with Oozie 4.2. It is disabled by default, so the following steps show how to enable it.

This is the simplest way of finding the errors for a specified Oozie job in the Oozie log file.

To enable Oozie's Job Error Log, make the following changes in the Oozie log4j properties file:

1. Add the below set of lines after log4j.appender.oozie and before log4j.appender.oozieops:

log4j.appender.oozieError=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.oozieError.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
log4j.appender.oozieError.File=${oozie.log.dir}/oozie-error.log
log4j.appender.oozieError.Append=true
log4j.appender.oozieError.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieError.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
log4j.appender.oozieError.RollingPolicy.FileNamePattern=${log4j.appender.oozieError.File}-%d{yyyy-MM-dd-HH}
log4j.appender.oozieError.RollingPolicy.MaxHistory=720
log4j.appender.oozieError.filter.1 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.1.levelToMatch = WARN
log4j.appender.oozieError.filter.2 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.2.levelToMatch = ERROR
log4j.appender.oozieError.filter.3 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.3.levelToMatch = FATAL
log4j.appender.oozieError.filter.4 = org.apache.log4j.varia.DenyAllFilter

2. Change log4j.logger.org.apache.oozie=WARN, oozie to log4j.logger.org.apache.oozie=ALL, oozie, oozieError

3. Restart the Oozie service. The job error log will be populated for new jobs launched after the restart.
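Until the new appender is in place (or on older releases), the same filtering can be done by hand against the main oozie.log. A sketch of what the Job Error Log tab effectively does; the helper function is mine:

```shell
#!/usr/bin/env bash
# Print only WARN/ERROR/FATAL lines mentioning one job id
# from an Oozie log file.
job_errors() {
  grep -E ' (WARN|ERROR|FATAL) ' "$1" | grep -F "$2"
}

# Intended usage:
#   job_errors /var/log/oozie/oozie.log 0000008-170221004234250-oozie-oozi-W
```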

I hope it helps you enable the error log in Oozie.