Import & Export in Hive


Category: Bigdata

When working with Hive, there are many scenarios in which we need to move data, i.e. copy tables from one cluster to another.

For example, sometimes we need to copy a production table from one cluster to another. Hive now provides two simple commands that make this easy.

From version 0.8 onwards, Hive supports EXPORT and IMPORT commands that allow us to export a table's metadata together with its data to a directory in HDFS, which can then be imported into another database or Hive instance.
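The general syntax of both commands is shown below. This is only a sketch; my_table, the partition column and the paths are placeholders you would replace with your own names:

EXPORT TABLE my_table [PARTITION (part_col='value')] TO '/tmp/my_table_export';

IMPORT [[EXTERNAL] TABLE new_table_name] FROM '/tmp/my_table_export' [LOCATION '/path/for/external/table'];

EXPORT writes a _metadata file plus a data directory under the target path, and IMPORT reads both back, optionally creating the table under a new name or as an external table.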

In the following example I copy the table cars_beeline from Cluster A to Cluster B:

Cluster A: 

hive>show databases;

OK

admintestdb

default

kmsdatabase

kmstest

samplebeelinetest

samplehivetest

sampletest

sampletest1

Time taken: 2.911 seconds, Fetched: 14 row(s)

hive> use samplebeelinetest;

OK

Time taken: 0.287 seconds

hive> show tables;

OK

cars

cars_beeline

cars_internal

i0014_itm_typ

test

Time taken: 0.295 seconds, Fetched: 5 row(s)

hive> select * from cars_beeline limit 1;

OK

Name NULL NULL NULL NULL NULL NULL NULL O

Time taken: 1.118 seconds, Fetched: 1 row(s)

hive> select * from cars_beeline limit 10;

OK

Name NULL NULL NULL NULL NULL NULL NULL O

"chevrolet chevelle malibu" 18 8 307 130 3504 12 1970-01-01 A

"buick skylark 320" 15 8 350 165 3693 12 1970-01-01 A

"plymouth satellite" 18 8 318 150 3436 11 1970-01-01 A

"amc rebel sst" 16 8 304 150 3433 12 1970-01-01 A

"ford torino" 17 8 302 140 3449 11 1970-01-01 A

Time taken: 0.127 seconds, Fetched: 6 row(s)

hive> export table cars_beeline to '/tmp/cars_beeline';

Copying data from file:/tmp/s0998dnz/0bd6949f-c28c-4113-a9ab-eeaea4dcd434/hive_2016-07-11_05-41-39_786_4427147069708259788-1/-local-10000/_metadata

Copying file: file:/tmp/s0998dnz/0bd6949f-c28c-4113-a9ab-eeaea4dcd434/hive_2016-07-11_05-41-39_786_4427147069708259788-1/-local-10000/_metadata

Copying data from hdfs://HDPCLUSTERAHA/zone_encr2/data

Copying file: hdfs://HDPCLUSTERAHA/zone_encr2/data/cars.csv

OK

Time taken: 0.52 seconds

hive> dfs -ls /tmp/cars_beeline;

Found 2 items

-rwxrwxrwx   3 s0998dnz hdfs       1701 2016-07-11 05:41 /tmp/cars_beeline/_metadata

drwxrwxrwx   - s0998dnz hdfs          0 2016-07-11 05:41 /tmp/cars_beeline/data
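Here cars_beeline is a simple, non-partitioned table, so the whole table is exported. If your source table were partitioned, you could export only the partitions you need; for example (a sketch, assuming a hypothetical table sales partitioned by dt):

hive> export table sales partition (dt='2016-07-11') to '/tmp/sales_export';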

Now use distcp to copy that directory from Cluster A to Cluster B:

[root@server1 ~]$ hadoop distcp hdfs://HDPCLUSTERAHA/tmp/cars_beeline hdfs://HDPCLUSTERBHA/tmp/cars_beeline

16/07/11 05:43:09 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://HDPCLUSTERAHA/tmp/cars_beeline], targetPath=hdfs://HDPCLUSTERAHA/tmp/cars_beeline, targetPathExists=false, preserveRawXattrs=false}

16/07/11 05:43:09 INFO impl.TimelineClientImpl: Timeline service address: http://server2:8188/ws/v1/timeline/

16/07/11 05:43:11 INFO impl.TimelineClientImpl: Timeline service address: http://server2:8188/ws/v1/timeline/

16/07/11 05:43:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

16/07/11 05:43:11 INFO mapreduce.JobSubmitter: number of splits:4

16/07/11 05:43:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468229364042_0002

16/07/11 05:43:11 INFO impl.YarnClientImpl: Submitted application application_1468229364042_0002

16/07/11 05:43:11 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1468229364042_0002/

16/07/11 05:43:11 INFO tools.DistCp: DistCp job-id: job_1468229364042_0002

16/07/11 05:43:11 INFO mapreduce.Job: Running job: job_1468229364042_0002

16/07/11 05:43:25 INFO mapreduce.Job: Job job_1468229364042_0002 running in uber mode : false

16/07/11 05:43:25 INFO mapreduce.Job:  map 0% reduce 0%

16/07/11 05:43:31 INFO mapreduce.Job:  map 75% reduce 0%

16/07/11 05:43:35 INFO mapreduce.Job:  map 100% reduce 0%

16/07/11 05:43:36 INFO mapreduce.Job: Job job_1468229364042_0002 completed successfully
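The plain distcp call above is fine for a one-off copy. If you need to re-run the copy, or want to preserve ownership and permissions on the target cluster, the standard DistCp flags -update and -p can be added, for example (a sketch, assuming you have the required privileges on Cluster B):

[root@server1 ~]$ hadoop distcp -update -p hdfs://HDPCLUSTERAHA/tmp/cars_beeline hdfs://HDPCLUSTERBHA/tmp/cars_beeline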

Run the following steps on the target Cluster B:

[root@server1ofclusterb ~]$ hadoop fs -ls /tmp/cars_beeline;

Found 2 items

-rw-r--r--   3 s0998dnz hdfs       1701 2016-07-11 05:43 /tmp/cars_beeline/_metadata

drwxr-xr-x   - s0998dnz hdfs          0 2016-07-11 05:43 /tmp/cars_beeline/data

[root@server1ofclusterb ~]$ hive

WARNING: Use "yarn jar" to launch YARN applications.

16/07/11 05:45:22 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist

16/07/11 05:45:22 WARN conf.HiveConf: HiveConf of name hive.server2.http.port does not exist

Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties

hive> create database testing;

OK

Time taken: 0.342 seconds

hive> use testing;

OK

Time taken: 0.411 seconds

hive> IMPORT TABLE cars_beeline from '/tmp/cars_beeline';

Copying data from hdfs://HDPCLUSTERBHA/tmp/cars_beeline/data

Copying file: hdfs://HDPCLUSTERBHA/tmp/cars_beeline/data/cars.csv

Loading data to table testing.cars_beeline

OK

Time taken: 1.196 seconds

hive> show tables;

OK

cars_beeline

Time taken: 0.246 seconds, Fetched: 1 row(s)

hive> select * from cars_beeline limit 10;

OK

Name NULL NULL NULL NULL NULL NULL NULL O

"chevrolet chevelle malibu" 18 8 307 130 3504 12 1970-01-01 A

"buick skylark 320" 15 8 350 165 3693 12 1970-01-01 A

"plymouth satellite" 18 8 318 150 3436 11 1970-01-01 A

"amc rebel sst" 16 8 304 150 3433 12 1970-01-01 A

"ford torino" 17 8 302 140 3449 11 1970-01-01 A

Time taken: 0.866 seconds, Fetched: 6 row(s)

hive>
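IMPORT can also rename the table or create it as an external table pointing at a location of your choice. Both variants below are sketches against the same export directory; the table names and the LOCATION path are illustrative:

hive> import table cars_beeline_copy from '/tmp/cars_beeline';

hive> import external table cars_beeline_ext from '/tmp/cars_beeline' location '/user/hive/external/cars_beeline_ext';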

I hope this helps you move tables from one cluster to another. Please feel free to share your suggestions for improving this article.

