How to skip the header row of a CSV file when loading it into Hive
Category : Hive
Sometimes a data file contains a header row that we do not want loaded into our Hive table. This article shows how to make Hive ignore that header.
[saurkuma@m1 ~]$ cat sampledata.csv
id,Name
1,Saurabh
2,Vishal
3,Jeba
4,Sonu
Step 1: Create a table with the table property skip.header.line.count, which tells Hive how many lines to skip at the top of each data file.
hive> create table test(id int,name string) row format delimited fields terminated by ',' tblproperties("skip.header.line.count"="1");
OK
Time taken: 0.233 seconds
hive> show tables;
OK
salesdata01
table1
table2
test
tmp
Time taken: 0.335 seconds, Fetched: 5 row(s)
hive> load data local inpath '/home/saurkuma/sampledata.csv' overwrite into table test;
Loading data to table demo.test
Table demo.test stats: [numFiles=1, totalSize=41]
OK
Time taken: 0.979 seconds
hive> select * from test;
OK
1 Saurabh
2 Vishal
3 Jeba
4 Sonu
Time taken: 0.111 seconds, Fetched: 4 row(s)
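As a quick cross-check outside Hive, we can reproduce the same result in the shell by dropping the first line of the file. This is only a minimal sketch of what skipping one header line means. It does not touch Hive, and the temporary directory and the names workdir and noheader.csv are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory, so the real sampledata.csv is left untouched.
workdir=$(mktemp -d)

# Re-create the sample file shown at the top of the article.
cat > "$workdir/sampledata.csv" <<'EOF'
id,Name
1,Saurabh
2,Vishal
3,Jeba
4,Sonu
EOF

# tail -n +2 prints the file starting at line 2, i.e. everything after the header.
tail -n +2 "$workdir/sampledata.csv" > "$workdir/noheader.csv"

# Show the remaining data rows.
cat "$workdir/noheader.csv"
```

The rows printed should match the four rows returned by the select above, apart from the delimiter.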
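To remove the header in Pig, filter out the row whose first field is the header text (a numeric test such as $0>1 would also drop the row with id 1):
A = LOAD 'sampledata.csv' USING PigStorage(',');
B = FILTER A BY (chararray)$0 != 'id';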
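Note that the Pig filter works on field values rather than on line position: a row is kept only if its first field passes the test. The same idea can be sketched in the shell with awk, dropping any row whose first field is the literal text id. This is an illustration under assumptions (the scratch directory workdir and the output name filtered.csv are hypothetical), not part of the Pig script.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory for the illustration.
workdir=$(mktemp -d)

# Re-create the sample file shown at the top of the article.
cat > "$workdir/sampledata.csv" <<'EOF'
id,Name
1,Saurabh
2,Vishal
3,Jeba
4,Sonu
EOF

# -F, splits each line on commas; a line is printed only when its
# first field is not the header text "id".
awk -F, '$1 != "id"' "$workdir/sampledata.csv" > "$workdir/filtered.csv"

# Show the rows that survived the filter.
cat "$workdir/filtered.csv"
```

A value-based filter like this removes a matching row wherever it appears in the file, so it is only safe when no real record has id as its first field.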
I hope this makes your job a little easier. Please feel free to share any suggestions or feedback.