Import & Export in Hive
Category : Bigdata
When working with Hive, we often need to move data (i.e., tables) from one cluster to another. For example, we may need to copy a production table into a different cluster. Fortunately, Hive now provides two simple commands that make this easy.
From version 0.8 onwards, Hive supports an EXPORT and IMPORT feature that lets us export both the metadata and the data of a table to a directory in HDFS, which can then be imported into another database or Hive instance.
The following example copies the table cars_beeline from cluster A to cluster B:
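The basic shape of the two statements is simple. As a sketch, the script below only assembles the EXPORT and IMPORT statements for the table used in this walkthrough (actually running them needs a live Hive CLI, so the hive -e invocations are shown as comments):

```shell
# Assemble the EXPORT/IMPORT statements used in this walkthrough.
# Table and directory names match the example below; adjust for your cluster.
TABLE="samplebeelinetest.cars_beeline"
EXPORT_DIR="/tmp/cars_beeline"

EXPORT_SQL="EXPORT TABLE ${TABLE} TO '${EXPORT_DIR}';"
IMPORT_SQL="IMPORT TABLE cars_beeline FROM '${EXPORT_DIR}';"

echo "$EXPORT_SQL"
echo "$IMPORT_SQL"

# On a live cluster you would run them with the Hive CLI, e.g.:
#   hive -e "$EXPORT_SQL"   # on the source cluster
#   hive -e "$IMPORT_SQL"   # on the target cluster, after copying the directory
```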
Cluster A:
hive>show databases;
OK
admintestdb
default
kmsdatabase
kmstest
samplebeelinetest
samplehivetest
sampletest
sampletest1
Time taken: 2.911 seconds, Fetched: 14 row(s)
hive> use samplebeelinetest;
OK
Time taken: 0.287 seconds
hive> show tables;
OK
cars
cars_beeline
cars_internal
i0014_itm_typ
test
Time taken: 0.295 seconds, Fetched: 5 row(s)
hive> select * from cars_beeline limit 1;
OK
Name NULL NULL NULL NULL NULL NULL NULL O
Time taken: 1.118 seconds, Fetched: 1 row(s)
hive> select * from cars_beeline limit 10;
OK
Name NULL NULL NULL NULL NULL NULL NULL O
"chevrolet chevelle malibu" 18 8 307 130 3504 12 1970-01-01 A
"buick skylark 320" 15 8 350 165 3693 12 1970-01-01 A
"plymouth satellite" 18 8 318 150 3436 11 1970-01-01 A
"amc rebel sst" 16 8 304 150 3433 12 1970-01-01 A
"ford torino" 17 8 302 140 3449 11 1970-01-01 A
Time taken: 0.127 seconds, Fetched: 6 row(s)
hive> export table cars_beeline to '/tmp/cars_beeline';
Copying data from file:/tmp/s0998dnz/0bd6949f-c28c-4113-a9ab-eeaea4dcd434/hive_2016-07-11_05-41-39_786_4427147069708259788-1/-local-10000/_metadata
Copying file: file:/tmp/s0998dnz/0bd6949f-c28c-4113-a9ab-eeaea4dcd434/hive_2016-07-11_05-41-39_786_4427147069708259788-1/-local-10000/_metadata
Copying data from hdfs://HDPCLUSTERAHA/zone_encr2/data
Copying file: hdfs://HDPCLUSTERAHA/zone_encr2/data/cars.csv
OK
Time taken: 0.52 seconds
hive> dfs -ls /tmp/cars_beeline;
Found 2 items
-rwxrwxrwx 3 s0998dnz hdfs 1701 2016-07-11 05:41 /tmp/cars_beeline/_metadata
drwxrwxrwx - s0998dnz hdfs 0 2016-07-11 05:41 /tmp/cars_beeline/data
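As the listing shows, the export directory always has the same two-entry layout. A small sketch of what each entry holds (paths match this example):

```shell
# Layout of a Hive EXPORT directory, matching the listing above.
META="/tmp/cars_beeline/_metadata"   # serialized table definition (schema, SerDe, properties)
DATA="/tmp/cars_beeline/data"        # copy of the table's data files (here, cars.csv)
echo "$META"
echo "$DATA"
# On a live cluster you could inspect the metadata directly (not run here):
#   hdfs dfs -cat /tmp/cars_beeline/_metadata
```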
Now use distcp to copy that directory from cluster A to cluster B:
[root@server1 ~]$ hadoop distcp hdfs://HDPCLUSTERAHA/tmp/cars_beeline hdfs://HDPCLUSTERBHA/tmp/cars_beeline
16/07/11 05:43:09 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://HDPCLUSTERAHA/tmp/cars_beeline], targetPath=hdfs://HDPCLUSTERBHA/tmp/cars_beeline, targetPathExists=false, preserveRawXattrs=false}
16/07/11 05:43:09 INFO impl.TimelineClientImpl: Timeline service address: http://server2:8188/ws/v1/timeline/
16/07/11 05:43:11 INFO impl.TimelineClientImpl: Timeline service address: http://server2:8188/ws/v1/timeline/
16/07/11 05:43:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/07/11 05:43:11 INFO mapreduce.JobSubmitter: number of splits:4
16/07/11 05:43:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468229364042_0002
16/07/11 05:43:11 INFO impl.YarnClientImpl: Submitted application application_1468229364042_0002
16/07/11 05:43:11 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1468229364042_0002/
16/07/11 05:43:11 INFO tools.DistCp: DistCp job-id: job_1468229364042_0002
16/07/11 05:43:11 INFO mapreduce.Job: Running job: job_1468229364042_0002
16/07/11 05:43:25 INFO mapreduce.Job: Job job_1468229364042_0002 running in uber mode : false
16/07/11 05:43:25 INFO mapreduce.Job: map 0% reduce 0%
16/07/11 05:43:31 INFO mapreduce.Job: map 75% reduce 0%
16/07/11 05:43:35 INFO mapreduce.Job: map 100% reduce 0%
16/07/11 05:43:36 INFO mapreduce.Job: Job job_1468229364042_0002 completed successfully
Run the following steps on the target cluster B:
[root@server1ofclusterb ~]$ hadoop fs -ls /tmp/cars_beeline;
Found 2 items
-rw-r--r-- 3 s0998dnz hdfs 1701 2016-07-11 05:43 /tmp/cars_beeline/_metadata
drwxr-xr-x - s0998dnz hdfs 0 2016-07-11 05:43 /tmp/cars_beeline/data
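Notice that the permissions on cluster B (rw-r--r--) differ from the original rwxrwxrwx on cluster A: by default, DistCp does not carry file attributes across. If you need them preserved, DistCp's -p flag does that. The sketch below only assembles such a command, since running it needs both clusters:

```shell
# Sketch: a DistCp invocation that also preserves user, group and
# permission bits (-pugp). Nameservices match the example above.
SRC="hdfs://HDPCLUSTERAHA/tmp/cars_beeline"
DST="hdfs://HDPCLUSTERBHA/tmp/cars_beeline"
CMD="hadoop distcp -pugp ${SRC} ${DST}"
echo "$CMD"
# On a real edge node you would simply run the assembled command.
```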
[root@server1ofclusterb ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
16/07/11 05:45:22 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
16/07/11 05:45:22 WARN conf.HiveConf: HiveConf of name hive.server2.http.port does not exist
Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
hive> create database testing;
OK
Time taken: 0.342 seconds
hive> use testing;
OK
Time taken: 0.411 seconds
hive> IMPORT TABLE cars_beeline from '/tmp/cars_beeline';
Copying data from hdfs://HDPCLUSTERBHA/tmp/cars_beeline/data
Copying file: hdfs://HDPCLUSTERBHA/tmp/cars_beeline/data/cars.csv
Loading data to table testing.cars_beeline
OK
Time taken: 1.196 seconds
hive> show tables;
OK
cars_beeline
Time taken: 0.246 seconds, Fetched: 1 row(s)
hive> select * from cars_beeline limit 10;
OK
Name NULL NULL NULL NULL NULL NULL NULL O
"chevrolet chevelle malibu" 18 8 307 130 3504 12 1970-01-01 A
"buick skylark 320" 15 8 350 165 3693 12 1970-01-01 A
"plymouth satellite" 18 8 318 150 3436 11 1970-01-01 A
"amc rebel sst" 16 8 304 150 3433 12 1970-01-01 A
"ford torino" 17 8 302 140 3449 11 1970-01-01 A
Time taken: 0.866 seconds, Fetched: 6 row(s)
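IMPORT is also more flexible than the example shows: you can import under a different table name, or as an external table at a location of your choice. The statements below are a sketch, with illustrative table names and paths (not run here, since they need a live cluster):

```shell
# Illustrative IMPORT variants; table names and locations are placeholders.
RENAME_SQL="IMPORT TABLE cars_beeline_copy FROM '/tmp/cars_beeline';"
EXTERNAL_SQL="IMPORT EXTERNAL TABLE cars_beeline_ext FROM '/tmp/cars_beeline' LOCATION '/user/hive/external/cars_beeline';"
echo "$RENAME_SQL"
echo "$EXTERNAL_SQL"
```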
hive>
I hope this helps you move tables from one cluster to another. Please feel free to suggest improvements to this article.