Monthly Archives: March 2017


HDFS snapshots

HDFS snapshots protect important enterprise data sets from user or application errors. Snapshots are read-only, point-in-time copies of the file system, and they can be taken on a subtree of the file system or on the entire file system.

To demonstrate how snapshots work, we will create a directory in HDFS, take a snapshot of it, and then remove a file from the directory. Later, we will recover the file from the snapshot.

First, we will list all the snapshottable directories where the current user has permission to take snapshots.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir

Here we can see that there is no snapshottable directory yet.

So let's create a demo directory and then enable snapshots on it.

[hdfs@m1 ~]$ hdfs dfs -mkdir /tmp/snapshot_demo
[hdfs@m1 ~]$ touch demo.txt
[hdfs@m1 ~]$ hadoop fs -put demo.txt  /tmp/snapshot_demo/
[hdfs@m1 ~]$ hdfs dfsadmin -allowSnapshot /tmp/snapshot_demo
Allowing snaphot on /tmp/snapshot_demo succeeded

Now if you check the list of snapshottable directories, you should see at least the snapshot_demo directory created above.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2017-03-30 03:31 0 65536 /tmp/snapshot_demo

Now let's create a snapshot of /tmp/snapshot_demo and then verify that it was created.

[hdfs@m1 ~]$ hdfs dfs -createSnapshot /tmp/snapshot_demo
Created snapshot /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2017-03-30 03:32 /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt 

Now let's "accidentally" delete this snapshottable directory or a file inside it.

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo
rm: The directory /tmp/snapshot_demo cannot be deleted since /tmp/snapshot_demo is snapshottable and already has snapshots
[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/demo.txt
Deleted /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/

Oops… Surprisingly or not, the file was removed! What a bad day! What a horrible accident! Do not worry too much, however; we can recover the file because we have a snapshot!

[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt
[hdfs@m1 ~]$ hadoop fs -cp /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt /tmp/snapshot_demo/
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:35 /tmp/snapshot_demo/demo.txt

This will restore the lost set of files to the working data set.

Also, you cannot delete the snapshot data with rm: because snapshots are read-only, HDFS protects the snapshot data itself against deletion by users or applications. The following operation will fail:

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/.snapshot/s20170330-033236.441
rm: Modification on a read-only snapshot is disallowed
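
If you genuinely no longer need a snapshot, the supported way to remove it is the -deleteSnapshot command rather than rm, after which the directory can be made non-snapshottable again. A minimal sketch, reusing the snapshot name from the example above:

hdfs dfs -deleteSnapshot /tmp/snapshot_demo s20170330-033236.441
hdfs dfsadmin -disallowSnapshot /tmp/snapshot_demo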

I hope this helped you understand snapshots. Please feel free to give your valuable feedback or suggestions.



SSH action with Oozie

When you want to run a shell script on a remote host via Oozie, the following article will help you get the job done the easy way.

Follow these steps to set up an Oozie workflow using the ssh action:

1. Configure job.properties
Example:

[s0998dnz@m1.hdp22 oozie_ssh_action]$ cat job.properties
#*************************************************
#  job.properties
#oozie-action for ssh
#*************************************************
nameNode=hdfs://m1.hdp22:8020
jobTracker=m2.hdp22:8050
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/ooziesshaction
appPath=${oozieProjectRoot}
oozie.wf.application.path=${appPath}
focusNodeLogin=s0998dnz@m1.hdp22
shellScriptPath=~/oozie_ssh_action/sampletest.sh

2. Configure workflow.xml

Example:


<!--******************************************-->
<!--workflow.xml -->
<!--******************************************-->
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
 <start to="sshAction"/>
 <action name="sshAction">
 <ssh xmlns="uri:oozie:ssh-action:0.1">
 <host>${focusNodeLogin}</host>
 <command>${shellScriptPath}</command>
 <capture-output/>
 </ssh>
 <ok to="end"/>
 <error to="killAction"/>
 </action>
<!-- <action name="sendEmail">
 <email xmlns="uri:oozie:email-action:0.1">
 <to>${emailToAddress}</to>
 <subject>Output of workflow ${wf:id()}</subject>
 <body>Status of the file move: ${wf:actionData('sshAction')['STATUS']}</body>
 </email>
 <ok to="end"/>
 <error to="end"/>
 </action>
 --> <kill name="killAction">
 <message>"Killed job due to error"</message>
 </kill>
 <end name="end"/>
</workflow-app>

3. Write a sample sampletest.sh script

Example:

[s0998dnz@m1.hdp22 oozie_ssh_action]$ cat sampletest.sh 
#!/bin/bash
hadoop fs -ls / > /home/s0998dnz/oozie_ssh_action/output.txt

4. Upload workflow.xml to ${appPath} defined in job.properties

[s0998dnz@m1.hdp22 oozie_ssh_action]$ hadoop fs -put workflow.xml /user/s0998dnz/ooziesshaction/

5. Log in to the Oozie host as the "oozie" user.

[oozie@m2.hdp22 ~]$

6. Generate a key pair, if it does not already exist, using the 'ssh-keygen' command:

[oozie@m2.hdp22 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/oozie/.ssh/id_rsa): 
Created directory '/home/oozie/.ssh'
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/oozie/.ssh/id_rsa.
Your public key has been saved in /home/oozie/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:EW8WSDG3QnVjGf65znS8bP0AeOrgoQuteYl3hIunO8c oozie@m1.hdp22
The key's randomart image is:
+---[RSA 2048]----+
(randomart output omitted)
+----[SHA256]-----+

7. On the Oozie server node, copy the contents of ~/.ssh/id_rsa.pub and append them to the remote node's (focus node's) ~/.ssh/authorized_keys file, for example:
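
If ssh-copy-id is available on the Oozie server, it can do this in one step; a sketch, assuming the focus node login from job.properties:

ssh-copy-id -i ~/.ssh/id_rsa.pub s0998dnz@m1.hdp22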

8. Test password-less ssh from oozie@oozie-host to <username>@<remote-host>
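
A quick check, using the same focus node login as above; it should print the remote hostname without prompting for a password:

ssh s0998dnz@m1.hdp22 hostname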

9. Run the Oozie workflow with the following command:

oozie job -oozie http://<oozie-server-hostname>:11000/oozie -config /$PATH/job.properties -run
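
After submission you can monitor the workflow with the standard Oozie CLI; a sketch, replacing <job-id> with the id returned by the -run command:

oozie job -oozie http://<oozie-server-hostname>:11000/oozie -info <job-id>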

I hope this helped you get the job done quickly. Please feel free to give your valuable feedback or suggestions.



How to remove a header from a CSV file while loading it into Hive

Sometimes our data file contains a header line that we do not want loaded into the Hive table. If you need to ignore such a header, this article will help you.

[saurkuma@m1 ~]$ cat sampledata.csv
id,Name
1,Saurabh
2,Vishal
3,Jeba
4,Sonu

Step 1: Create a table with the skip.header.line.count table property so that the header line is skipped.

hive> create table test(id int,name string) row format delimited fields terminated by ',' tblproperties("skip.header.line.count"="1");
OK
Time taken: 0.233 seconds
hive> show tables;
OK
salesdata01
table1
table2
test
tmp
Time taken: 0.335 seconds, Fetched: 5 row(s)
hive> load data local inpath '/home/saurkuma/sampledata.csv' overwrite into table test;
Loading data to table demo.test
Table demo.test stats: [numFiles=1, totalSize=41]
OK
Time taken: 0.979 seconds
hive> select * from test;
OK
1 Saurabh
2 Vishal
3 Jeba
4 Sonu
Time taken: 0.111 seconds, Fetched: 4 row(s)
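
If the table already exists, the same property can be added afterwards; a minimal sketch:

hive> alter table test set tblproperties("skip.header.line.count"="1");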

To remove the header in Pig:

A = LOAD 'sampledata.csv' USING PigStorage(',') AS (id:chararray, name:chararray);
B = FILTER A BY id != 'id'; -- keep every row except the header line

I hope this helped you do your job the easy way. Please feel free to give your valuable suggestions or feedback.



Dates inserted into Hive tables show null during select

When we create a table on top of a file (CSV or any other format) and load data into the Hive table, we may find that select queries show null values for some columns.

(screenshot: Hive_null_error)
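
The most common cause of these NULLs is declaring the column as date (or timestamp) while the file stores the value in a non-default format such as dd-MM-yyyy; Hive's text SerDe only parses yyyy-MM-dd into a DATE column, so anything else silently becomes NULL. A minimal sketch with a hypothetical table, for illustration only:

hive> create table date_demo (id int, order_date date) row format delimited fields terminated by ',';
-- a row such as 1,13-10-2010 loads without error, but order_date comes back NULL
-- on select, because only the yyyy-MM-dd format can be parsed into a DATE column

Keeping the date column as a STRING, as in the walkthrough below, and converting it at query time avoids the problem.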

You can solve it in the following ways:

[saurkuma@m1 ~]$ ll
total 584
-rw-r--r-- 1 saurkuma saurkuma 591414 Mar 16 02:31 SalesData01.csv
[saurkuma@m1 ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,file:/usr/hdp/2.3.4.0-3485/hadoop/lib/hadoop-lzo-0.6.0.2.3.4.0-3485-sources.jar!/ivysettings.xml will be used
Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
hive> show databases;
OK
default
demo
testhive
Time taken: 3.341 seconds, Fetched: 3 row(s)

hive> use demo;
OK
Time taken: 1.24 seconds
hive> create table salesdata01 (Row_ID INT, Order_ID INT, Order_date STRING, Order_Priority STRING, Order_Quantity FLOAT, Sales FLOAT, Discount FLOAT, Shipping_Mode STRING, Profit FLOAT, Unit_Price FLOAT) row format delimited fields terminated by ',';
OK
Time taken: 0.782 seconds
hive> select * from salesdata01;
OK
Time taken: 0.721 seconds
hive> load data local inpath '/home/saurkuma/SalesData01.csv' overwrite into table salesdata01;
Loading data to table demo.salesdata01
Table demo.salesdata01 stats: [numFiles=1, totalSize=591414]
OK
Time taken: 1.921 seconds

hive> select * from salesdata01 limit 10;
OK
1 3 13-10-2010 Low 6.0 261.54 0.04 Regular Air -213.25 38.94
49 293 01-10-2012 High 49.0 10123.02 0.07 Delivery Truck 457.81 208.16
50 293 01-10-2012 High 27.0 244.57 0.01 Regular Air 46.71 8.69
80 483 10-07-2011 High 30.0 4965.7593 0.08 Regular Air 1198.97 195.99
85 515 28-08-2010 Not Specified 19.0 394.27 0.08 Regular Air 30.94 21.78
86 515 28-08-2010 Not Specified 21.0 146.69 0.05 Regular Air 4.43 6.64
97 613 17-06-2011 High 12.0 93.54 0.03 Regular Air -54.04 7.3
98 613 17-06-2011 High 22.0 905.08 0.09 Regular Air 127.7 42.76
103 643 24-03-2011 High 21.0 2781.82 0.07 Express Air -695.26 138.14
107 678 26-02-2010 Low 44.0 228.41 0.07 Regular Air -226.36 4.98
Time taken: 0.143 seconds, Fetched: 10 row(s)

hive> select * from salesdata01 where Order_date='01-10-2012' limit 10;
OK
49 293 01-10-2012 High 49.0 10123.02 0.07 Delivery Truck 457.81 208.16
50 293 01-10-2012 High 27.0 244.57 0.01 Regular Air 46.71 8.69
3204 22980 01-10-2012 Not Specified 17.0 224.09 0.0 Regular Air -27.92 12.44
3205 22980 01-10-2012 Not Specified 10.0 56.05 0.06 Regular Air -27.73 4.98
2857 20579 01-10-2012 Medium 16.0 1434.086 0.1 Regular Air -26.25 110.99
145 929 01-10-2012 High 21.0 227.66 0.04 Regular Air -100.16 10.97
146 929 01-10-2012 High 39.0 84.33 0.04 Regular Air -64.29 2.08
859 6150 01-10-2012 Critical 38.0 191.14 0.06 Regular Air 82.65 4.98
Time taken: 0.506 seconds, Fetched: 8 row(s)

hive> select Row_ID, cast(to_date(from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'))) as date) from salesdata01 limit 10;
OK
1 2010-10-13
49 2012-10-01
50 2012-10-01
80 2011-07-10
85 2010-08-28
86 2010-08-28
97 2011-06-17
98 2011-06-17
103 2011-03-24
107 2010-02-26

hive> select Row_ID, from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'),'yyyy-MM-dd') from salesdata01 limit 10;
OK
1 2010-10-13
49 2012-10-01
50 2012-10-01
80 2011-07-10
85 2010-08-28
86 2010-08-28
97 2011-06-17
98 2011-06-17
103 2011-03-24
107 2010-02-26
Time taken: 0.157 seconds, Fetched: 10 row(s)

hive> select Row_ID, from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy')) from salesdata01 limit 10;
OK
1 2010-10-13 00:00:00
49 2012-10-01 00:00:00
50 2012-10-01 00:00:00
80 2011-07-10 00:00:00
85 2010-08-28 00:00:00
86 2010-08-28 00:00:00
97 2011-06-17 00:00:00
98 2011-06-17 00:00:00
103 2011-03-24 00:00:00
107 2010-02-26 00:00:00
Time taken: 0.09 seconds, Fetched: 10 row(s)

hive> select Row_ID, from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'),'dd-MM-yyyy') from salesdata01 limit 10;
OK
1 13-10-2010
49 01-10-2012
50 01-10-2012
80 10-07-2011
85 28-08-2010
86 28-08-2010
97 17-06-2011
98 17-06-2011
103 24-03-2011
107 26-02-2010
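
If you would rather store a proper DATE once instead of converting on every query, one option is to create a typed table and populate it with the converted value, reusing the same expression as above (salesdata01_typed is a hypothetical table name used only for this sketch):

hive> create table salesdata01_typed (Row_ID int, Order_date date);
hive> insert into table salesdata01_typed select Row_ID, cast(to_date(from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'))) as date) from salesdata01;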

Another example:

Another case is when you are trying to store date and timestamp values in a timestamp column in Hive, and the source file contains plain dates for some rows and full timestamps for others.

Sample Data:

[saurkuma@m1 ~]$ cat sample.txt
1,2015-04-15 00:00:00
2,2015-04-16 00:00:00
3,2015-04-17

hive> create table table1 (id int,tsstr string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
OK
Time taken: 0.241 seconds
hive> LOAD DATA LOCAL INPATH '/home/saurkuma/sample.txt' INTO TABLE table1;
Loading data to table demo.table1
Table demo.table1 stats: [numFiles=1, totalSize=57]
OK
Time taken: 0.855 seconds
hive> select * from table1;
OK
1 2015-04-15 00:00:00
2 2015-04-16 00:00:00
3 2015-04-17
Time taken: 0.097 seconds, Fetched: 3 row(s)

hive> create table table2 (id int,mytimestamp timestamp) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
OK
Time taken: 0.24 seconds

hive> INSERT INTO TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
Query ID = saurkuma_20170316032711_63d9129a-38c1-4ae8-89f4-e158218d2587
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1489644687414_0001, Tracking URL = http://m2.hdp22:8088/proxy/application_1489644687414_0001/
Kill Command = /usr/hdp/2.3.4.0-3485/hadoop/bin/hadoop job  -kill job_1489644687414_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-03-16 03:27:36,290 Stage-1 map = 0%,  reduce = 0%
2017-03-16 03:27:55,806 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.89 sec
MapReduce Total cumulative CPU time: 1 seconds 890 msec
Ended Job = job_1489644687414_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://TESTHA/apps/hive/warehouse/demo.db/table2/.hive-staging_hive_2017-03-16_03-27-11_740_404528501642205352-1/-ext-10000
Loading data to table demo.table2
Table demo.table2 stats: [numFiles=1, numRows=3, totalSize=66, rawDataSize=63]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.89 sec   HDFS Read: 4318 HDFS Write: 133 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 890 msec
OK
Time taken: 47.687 seconds

hive> select * from table2;
OK
1 2015-04-15 00:00:00
2 2015-04-16 00:00:00
3 2015-04-17 00:00:00
Time taken: 0.119 seconds, Fetched: 3 row(s)

I hope this helped you solve your problem. Please feel free to give your valuable feedback or suggestions.



Useful Unix commands

Sometimes we need a user who can do everything on our server, just as root does. We can achieve this in two ways:

  1. Create a new user with the same privileges as root
  2. Grant the same privileges as root to an existing user

Case 1: Let's say we need to add a new user and grant him root privileges.

Use the following commands to create the new user temp, grant him the same privileges as root and set him a password:

[root@m1 ~]# useradd -ou 0 -g 0 temp
[root@m1 ~]# passwd temp
Changing password for user temp.
New password:
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.

We’ve just created the user temp, with UID 0 and GID 0, so he is in the same group and has the same permissions as root.
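
You can double-check the new entry, for example with grep; the third and fourth fields (UID and GID) should both be 0:

[root@m1 ~]# grep '^temp:' /etc/passwd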

Case 2: Grant root privileges to an existing user:
Perhaps you already have a normal user, here temp1, and you would like to give him root permissions.

[root@m1 ~]# grep temp1 /etc/passwd
temp1:x:1006:1006::/home/temp1:/bin/bash

Solution 1:

Edit the /etc/passwd file and grant root permissions to the user temp1 by changing its user and group IDs to UID 0 and GID 0.

Solution 2: Create a group, assign the existing user to that group, and grant the group sudo access.

[root@m1 ~]# groupadd test
[root@m1 ~]# usermod -g test temp1
[temp2@m1 ~]$ id temp1
uid=1006(temp1) gid=1007(test) groups=1007(test)

Edit the /etc/sudoers file and add the line %test ALL=(ALL)       NOPASSWD: ALL to it.
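
A safer way to apply this change is through visudo, which validates the sudoers syntax before saving:

[root@m1 ~]# visudo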

[root@m1 ~]# grep -C4 test /etc/sudoers
# %wheel ALL=(ALL) ALL
## Same thing without a password
%wheel ALL=(ALL) NOPASSWD: ALL
%test ALL=(ALL)       NOPASSWD: ALL

[root@m1 ~]# su temp1
[temp1@m1 ~]$ sudo su - hdfs
[hdfs@m1 ~]$ exit
logout
[temp1@m1 ~]$ sudo su - root
[root@m1 ~]# exit
logout

Delete a user account with UID 0: You won't be able to delete the second root user with the userdel command while it still has UID 0.
[root@m1 ~]# userdel temp
userdel: user temp is currently used by process 1

To delete the user temp with UID 0, open the /etc/passwd file and change temp's UID to a non-zero value.
[root@m1 ~]# vi /etc/passwd
[root@m1 ~]# grep temp /etc/passwd
temp:x:1111:0::/home/temp:/bin/sh

Now you'll be able to delete the user temp with the userdel command:
[root@m1 ~]# userdel temp
[root@m1 ~]# id temp

id: temp: No such user

 

How to make sure /etc/resolv.conf never gets updated by the DHCP client in CentOS 6:

I am using GNU/Linux with the Internet Systems Consortium DHCP client. It updates my /etc/resolv.conf file each time my laptop connects to a different network or after the machine restarts. I would like to keep my existing nameservers. How do I skip the /etc/resolv.conf update on a Linux-based system?

The DHCP protocol allows a host to contact a central server which maintains a list of IP addresses that may be assigned on one or more subnets. This protocol reduces the system administration workload, allowing devices to be added to the network with little or no manual configuration. There are various methods to fix this issue, but I prefer the following one.

We have to modify the interface configuration file, for example /etc/sysconfig/network-scripts/ifcfg-eth0, and set the following options:

 

[root@m1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=dhcp
HWADDR=08:00:27:90:1E:98
DEFROUTE=yes
PEERDNS=no   ## changed from yes to no; set the DNS entries below accordingly
DNS1=192.168.56.104
DNS2=168.244.212.13
DNS3=168.244.217.13
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"

Save and close the file. Where,

1. PEERDNS=yes|no – Modify /etc/resolv.conf if the peer uses the msdns extension (PPP only) or DNS{1,2} are set, or if using dhclient. Defaults to "yes".

2. DNS{1,2}=<ip address> – Provide DNS addresses that are dropped into the resolv.conf file if PEERDNS is not set to “no”.
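
The new settings take effect the next time the interface is brought up; on CentOS 6 you can, for example, restart networking:

[root@m1 ~]# service network restart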

 

I hope this will help you. Please feel free to give your valuable suggestions or feedback.



Oozie server failing with error “cannot load JDBC driver class ‘com.mysql.jdbc.Driver'”

Issue: The Oozie server is failing with the following error:

FATAL Services:514 – SERVER[m2.hdp22] E0103: Could not load service classes, Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
at org.apache.oozie.service.Services.loadServices(Services.java:309)
at org.apache.oozie.service.Services.init(Services.java:213)
at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:46)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4709)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:676)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:602)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:503)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:325)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1068)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:822)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1060)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at org.apache.catalina.core.StandardService.start(StandardService.java:525)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:759)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: <openjpa-2.2.2-r422266:1468616 fatal general error> org.apache.openjpa.persistence.PersistenceException: Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
at org.apache.openjpa.jdbc.sql.DBDictionaryFactory.newDBDictionary(DBDictionaryFactory.java:102)
at org.apache.openjpa.jdbc.conf.JDBCConfigurationImpl.getDBDictionaryInstance(JDBCConfigurationImpl.java:603)
at org.apache.openjpa.jdbc.meta.MappingRepository.endConfiguration(MappingRepository.java:1518)
at org.apache.openjpa.lib.conf.Configurations.configureInstance(Configurations.java:531)
at org.apache.openjpa.lib.conf.Configurations.configureInstance(Configurations.java:456)
at org.apache.openjpa.lib.conf.PluginValue.instantiate(PluginValue.java:120)
at org.apache.openjpa.conf.MetaDataRepositoryValue.instantiate(MetaDataRepositoryValue.java:68)
at org.apache.openjpa.lib.conf.ObjectValue.instantiate(ObjectValue.java:83)
at org.apache.openjpa.conf.OpenJPAConfigurationImpl.newMetaDataRepositoryInstance(OpenJPAConfigurationImpl.java:967)
at org.apache.openjpa.conf.OpenJPAConfigurationImpl.getMetaDataRepositoryInstance(OpenJPAConfigurationImpl.java:958)
at org.apache.openjpa.kernel.AbstractBrokerFactory.makeReadOnly(AbstractBrokerFactory.java:644)
at org.apache.openjpa.kernel.AbstractBrokerFactory.newBroker(AbstractBrokerFactory.java:203)
at org.apache.openjpa.kernel.DelegatingBrokerFactory.newBroker(DelegatingBrokerFactory.java:156)
at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:227)
at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:154)
at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:60)
at org.apache.oozie.service.JPAService.getEntityManager(JPAService.java:500)
at org.apache.oozie.service.JPAService.init(JPAService.java:201)
at org.apache.oozie.service.Services.setServiceInternal(Services.java:386)
at org.apache.oozie.service.Services.setService(Services.java:372)
at org.apache.oozie.service.Services.loadServices(Services.java:305)
… 26 more
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1429)
at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1371)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.apache.openjpa.lib.jdbc.DelegatingDataSource.getConnection(DelegatingDataSource.java:110)
at org.apache.openjpa.lib.jdbc.DecoratingDataSource.getConnection(DecoratingDataSource.java:87)
at org.apache.openjpa.jdbc.sql.DBDictionaryFactory.newDBDictionary(DBDictionaryFactory.java:91)
… 46 more
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1680)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1526)
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1420)
… 51 more

Root Cause: 

The MySQL JDBC driver is not on the classpath of the Oozie server.

Solution: You need to copy the MySQL JDBC driver to the required location.

[root@m2 oozie]# cp /usr/share/java/mysql-connector-java.jar /usr/hdp/2.3.4.0-3485/oozie/oozie-server/webapps/oozie/WEB-INF/lib/
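
An alternative that survives a rebuild of the Oozie WAR is to drop the connector into Oozie's libext directory and regenerate the WAR with oozie-setup.sh; a sketch, where the exact paths depend on your HDP version:

[root@m2 oozie]# cp /usr/share/java/mysql-connector-java.jar /usr/hdp/2.3.4.0-3485/oozie/libext/
[root@m2 oozie]# /usr/hdp/2.3.4.0-3485/oozie/bin/oozie-setup.sh prepare-war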

Now restart your Oozie server and the error should be gone.

I hope this helped you solve your issue. Please feel free to give your valuable feedback or suggestions.