Category Archives: Hive

  • 3

How to use Hive Query result in a variable for other query

Category : Hive

Many time we want to store one query result into a variable and then use this variable in some other query. So now it is possible in your favorite hadoop ecosystem i.e hive.

With the help of this article you can achieve it.

[root@m1 etc]# hive

16/10/04 02:40:45 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist

16/10/04 02:40:45 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist

16/10/04 02:40:45 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist

16/10/04 02:40:45 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist

Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Step1: Check table test1 description:
hive> desc Table test1;
OK
columnA date
Time taken: 0.452 seconds, Fetched: 1 row(s)

Step2: Table test2 description:
hive> desc Table test2;
OK
columnB date
Time taken: 0.516 seconds, Fetched: 1 row(s)

Step3: Assigning subquery to variable:
hive> set hivevar:var1=(select columnB from Table test2 where columnB=’2016-09-19′);

Step4: Running main query with subquery:
hive> select * from Table test1 where columnA IN ${hivevar:var1};

 

 


  • 0

hive date time issue

Many time when we load data into hive tables and if we have a date & time field in our data then we may have seen an issue with getting data field. So to solve this issue I have created this article and explained steps in details.

I have the following sample input file(a.txt)

a,20-11-2015 22:07

b,17-08-2015 09:45

I created the table in hive

hive> create table mytime(a string, b timestamp) row format delimited fields terminated by ‘,’;

hive> load data local inpath ‘/root/a.txt’ overwrite into table mytime;

hive> select* from mytime;

7404-untitled

So now you saw we will get null value only. To overcome this situation we need an additional, temporary table to read your input file, and then some date conversion:

hive> create table tmp(a string, b string) row format delimited fields terminated by ‘,’;
hive> load data local inpath ‘a.txt’ overwrite into table tmp;
hive> create table mytime(a string, b timestamp);
hive> insert into table mytime select a, from_unixtime(unix_timestamp(b, ‘dd-MM-yyyy HH:mm’)) from tmp;
hive> select * from mytime;
a 2015-11-20 22:07:00
b 2015-08-17 09:45:00

I hope this article will help you to get your work done in effective manner.


  • 1

Change default permission of hive database

Category : Hive

When you create a database or internal tables in hive cli then by default it creates with 777 permission.Even though if you have umask in hdfs then also it will be same permission. But now you can change it with the help of following steps.

1.From the command line in the Ambari server node, edit the file

vi /var/lib/ambariserver/resources/commonservices/HIVE/0.12.0.2.0/package/scripts/hive.py

Search for hive_apps_whs_dir which should go to this block:

params.HdfsResource(params.hive_apps_whs_dir,

type=“directory”,

action=“create_on_execute”,

owner=params.hive_user,

group=params.user_group,

mode=0755

)

2. Modify the value for mode from 0777 to the desired permission, for example 0750.Save and close the file.

3. Restart the Ambari server to propagate the change to all nodes in the cluster:

ambariserver restart

4. From the Ambari UI, restart HiveServer2 to apply the new permission to the warehouse directory. If multiple HiveServer2 instances are configured, any one instance can be restarted.

hive> create database test2;

OK

Time taken: 0.156 seconds

hive> dfs -ls /apps/hive/warehouse;

Found 9 items

drwxrwxrwx   – hdpuser hdfs          0 2016-09-08 01:54 /apps/hive/warehouse/test.db

drwxr-xr-x   -hdpuser hdfs          0 2016-09-08 02:04 /apps/hive/warehouse/test1.db

drwxr-x—   -hdpuser hdfs          0 2016-09-08 02:09 /apps/hive/warehouse/test2.db

I hope this will help you to serve your purpose.


  • 0

Enable Debug mode for hive in Ambari

Category : Hive

Many time we see that during troubleshoot we do not find much information if we are just default logger. So no worries I will help you to guide how to enable debug mode in logs or on your console.

Case 1: Use the following command to start hive: Set follwoing property to turn on debug mode on console. 
hive -hiveconf hive.root.logger=ALL,console
It will log all messages to console.

Case 2: In case if you want to write you log in log file only for each and every steps which user or job does then you can use following process.

  • Login to Ambari portal and click on Hive service.
  • Goto config and search for hive.root.logger string in search box.
  • Once you will then you need to change default value(hive.root.logger=INFO,DFRA) to hive.root.logger=DEBUG,DFRA

Note: It may fill your log dir soon so please keep deleting old file or keep maintain log dir.

 


  • 2

Tez job fails with ‘vertex failure’ error

When you run your hive job on tez execution engine then you may see job failure due to ‘vertex failure’ error. Or you may see following error in your logs.

Vertex failed, vertexName=Reducer 34, vertexId=vertex_1424999265634_0222_1_23, diagnostics=[Task failed, taskId=task_1424999265634_01422_1_23_000008, diagnostics=[AttemptID:attempt_1424999265634_01422_1_23_000008_0 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: Reduce operator initialization failed 
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:564)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:553)
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:191)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:164)
… 6 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: : init not supported
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFStreamingEvaluator.init(GenericUDAFStreamingEvaluator.java:70)
at org.apache.hadoop.hive.ql.plan.PTFDeserializer.setupWdwFnEvaluator(PTFDeserializer.java:209)
at org.apache.hadoop.hive.ql.plan.PTFDeserializer.initializeWindowing(PTFDeserializer.java:130)
at org.apache.hadoop.hive.ql.plan.PTFDeserializer.initializePTFChain(PTFDeserializer.java:94)
at org.apache.hadoop.hive.ql.exec.PTFOperator.reconstructQueryDef(PTFOperator.java:145)
at org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:74)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:160)

This error is because Tez containers are not allocating enough memory to run the query.

Resolution: So now to solve this issue you have to increase memory for resources with adjusting following parameters .

tez.am.resource.memory.mb=4096 
tez.am.java.opts=-server -Xmx3276m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC 
hive.tez.container.size=4096 
hive.tez.java.opts=-server -Xmx3276m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC 

 


  • 0

heap size issue in Hive Metastore

Category : Hive

Sometime during your job running you may see job failure due to heap size. It might be because of metastore heap issue. It is encountering OutOfMemory errors, or is known to be insufficient to handle the cluster workload.

Resolution: To fix this issue you have to increase heap size for metastore in hive-end.sh (or hive-end.cmd) file. 

  • If the cluster is managed by Ambari, edit the configuration for the hive-end.sh or .cmd file in the Configuration tab for the Hive Service.
  • If the cluster is not managed by Ambari, edit the file directly and distribute it throughout the cluster
  • Locate the string that looks like this

export HADOOP_CLIENT_OPTS=”-Xmx${HADOOP_HEAPSIZE}m $HADOOP_CLIENT_OPTS ${HIVEMETASTORE_JMX_OPTS}”​ 

  • Change the last line (covering the HADOOP_CLIENT_OPTS to declare the requested heap size. For example, to allocate up to 2048m of heap, the last line would be changed to this

export HADOOP_CLIENT_OPTS=”$HADOOP_CLIENT_OPTS ${HIVEMETASTORE_JMX_OPTS} –Xmx2048m