Monthly Archives: January 2017

  • 0

Some helpful Tips

1. How-to-run-a-hive-query-using-yesterdays-date

Use from_unixtime(unix_timestamp()-1*60*60*24, ‘yyyy-MM-dd’); in your hive query.

For example:

select * from sample where date1=from_unixtime(unix_timestamp()-1*60*60*24, ‘yyyy-MM-dd’);

2. How to diff file(s) in HDFS

How to diff a file in HDFS and a file in the local filesystem:
diff <(hadoop fs -cat /path/to/file) /path/to/localfile

How to diff two files in HDFS:
diff <(hadoop fs -cat /path/to/file1) <(hadoop fs -cat /path/to/file2)


  • 0

Run Pig Script in Nifi

Category : Nifi , Pig

NiFi can interface directly with Hive, HDFS, HBase, Flume and Phoenix. And I can also trigger Spark and Flink through Kafka and Site-To-Site. Sometimes I need to run some Pig scripts. Apache Pig is very stable and has a lot of functions and tools that make for some smart processing. You can easily augment and add this piece to a larger pipeline or part of the process.

Pig Setup

I like to use Ambari to install the HDP 2.5 clients on my NiFi box to have access to all the tools I may need.

Then I can just do:

$ yum install pig

9098-pig1

9099-pig2

ExecuteProcess

We call a shell script that wraps the Pig script.

Output of script is stored to HDFS: hdfs dfs -ls /nifi-logs

Shell Script

$ export JAVA_HOME=/opt/jdk1.8.0_101/

$ pig x local l /tmp/pig.log f /opt/demo/pigscripts/test.pig

You can run in different Pig modes like local, mapreduce and tez. You can also pass in parameters or the script.

Pig Script

messages = LOAD ‘/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log’;

warns = FILTER messages BY $0 MATCHES ‘.*WARN+.*’;

DUMP warns

store warns into ‘warns.out’

This is a basic example from the internet, with the NIFI 1.0 log used as the source.

As an aside, I run a daily script with the schedule 1 * * * * ? to clean up my logs.

Simply: /bin/rm -rf /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/*2016*

PutHDFS

Hadoop Configuration: /etc/hadoop/conf/core-site.xml

Pick a directory and store away.

Result:

HadoopVersionPigVersionUserIdStartedAtFinishedAtFeatures

2.7.3.2.5.0.012450.16.0.2.5.0.01245root20161103 19:53:5720161103 19:53:59FILTER

Success!

Job Stats (time in seconds):

JobIdMapsReducesMaxMapTimeMinMapTimeAvgMapTimeMedianMapTimeMaxReduceTimeMinReduceTimeAvgReduceTimeMedianReducetimeAliasFeatureOutputs

job_local72884441_000110n/an/an/an/a0000messages,warnsMAP_ONLYfile:/tmp/temp1540654561/tmp600070101,

Input(s):

Successfully read 30469 records from: “/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log”

Output(s):

Successfully stored 1347 records in: “file:/tmp/temp1540654561/tmp-600070101”

Counters:

Total records written : 1347

Total bytes written : 0

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

Job DAG:

job_local72884441_0001

 

ref : https://community.hortonworks.com/articles/64844/running-apache-pig-scripts-from-apache-nifi-and-st.html