Category Archives: HDFS


Namenode may keep crashing due to excessive logging

The NameNode may keep crashing even after you restart all services and even though it has enough heap. In the logs you see one of the following errors.

java.io.IOException: IPC's epoch 197 is less than the last promised epoch 198

or

2017-09-28 09:16:11,371 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(851)) - Local service NameNode at m1.hdp22 entered state: SERVICE_NOT_RESPONDING

Root Cause: In my case the NameNode was logging too much for the BlockStateChange and org.apache.hadoop.hdfs.StateChange loggers. When this logging runs nonstop, the NameNode takes longer to respond to other RPC requests. To take some load off the NameNode, we need to raise the log level of these loggers above INFO (ERROR in my case).

Solution: Raise the log level for the two loggers by adding the lines below to the HDFS log4j configuration through Ambari (Ambari UI > HDFS > Configs > Advanced hdfs-log4j).

log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=ERROR



hadoop cluster Benchmarking and Stress Testing

After installing a cluster, we should run some benchmarks or stress tests. In this article I explain the built-in TestDFSIO functionality, which helps you stress test your configured cluster.

The Hadoop distribution comes with a number of benchmarks, which are bundled in hadoop-*test*.jar and hadoop-*examples*.jar.

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*test*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar' chosen.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*example*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-examples.jar' chosen.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

The TestDFSIO benchmark is a read and write test for HDFS. It is helpful for tasks such as stress testing HDFS, to discover performance bottlenecks in your network, to shake out the hardware, OS and Hadoop setup of your cluster machines (particularly the NameNode and the DataNodes) and to give you a first impression of how fast your cluster is in terms of I/O.

From the command line, run the following command to test writing 10 output files of 50 MB each, for a total of 500 MB:

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 50
17/05/29 03:29:19 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:29:19 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:29:19 INFO fs.TestDFSIO: nrBytes (MB) = 50.0
17/05/29 03:29:19 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:29:19 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:29:21 INFO fs.TestDFSIO: creating control file: 52428800 bytes, 10 files
17/05/29 03:29:23 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:29:23 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:29:23 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:29:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0142
17/05/29 03:29:24 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0142
17/05/29 03:29:24 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0142/
17/05/29 03:29:24 INFO mapreduce.Job: Running job: job_1494832799027_0142
17/05/29 03:29:31 INFO mapreduce.Job: Job job_1494832799027_0142 running in uber mode : false
17/05/29 03:29:31 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:29:46 INFO mapreduce.Job: map 30% reduce 0%
17/05/29 03:29:47 INFO mapreduce.Job: map 50% reduce 0%
17/05/29 03:29:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:29:51 INFO mapreduce.Job: map 80% reduce 0%
17/05/29 03:29:52 INFO mapreduce.Job: map 90% reduce 0%
17/05/29 03:29:53 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:29:54 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:29:54 INFO mapreduce.Job: Job job_1494832799027_0142 completed successfully
17/05/29 03:29:55 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=835
FILE: Number of bytes written=1717691
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2290
HDFS: Number of bytes written=524288077
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=103814
Total time spent by all reduces in occupied slots (ms)=7846
Total time spent by all map tasks (ms)=103814
Total time spent by all reduce tasks (ms)=3923
Total vcore-milliseconds taken by all map tasks=103814
Total vcore-milliseconds taken by all reduce tasks=3923
Total megabyte-milliseconds taken by all map tasks=212611072
Total megabyte-milliseconds taken by all reduce tasks=16068608
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=729
Map output materialized bytes=889
Input split bytes=1170
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=889
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=4456
CPU time spent (ms)=59400
Physical memory (bytes) snapshot=15627186176
Virtual memory (bytes) snapshot=43288719360
Total committed heap usage (bytes)=16284385280
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=77
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:

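From the command line, run the following command to test reading the 10 input files. The command requests 500 MB per file, but because the write test above created 50 MB files, the run reports 500 MB processed in total: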

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 500
17/05/29 03:30:29 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:30:29 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:30:29 INFO fs.TestDFSIO: nrBytes (MB) = 500.0
17/05/29 03:30:29 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:30:29 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:30:30 INFO fs.TestDFSIO: creating control file: 524288000 bytes, 10 files
17/05/29 03:30:31 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:30:32 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0143
17/05/29 03:30:32 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0143
17/05/29 03:30:32 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0143/
17/05/29 03:30:32 INFO mapreduce.Job: Running job: job_1494832799027_0143
17/05/29 03:30:39 INFO mapreduce.Job: Job job_1494832799027_0143 running in uber mode : false
17/05/29 03:30:39 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:30:47 INFO mapreduce.Job: map 10% reduce 0%
17/05/29 03:30:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:30:54 INFO mapreduce.Job: map 70% reduce 0%
17/05/29 03:30:55 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:30:56 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:30:56 INFO mapreduce.Job: Job job_1494832799027_0143 completed successfully
17/05/29 03:30:56 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=846
FILE: Number of bytes written=1717691
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=524290290
HDFS: Number of bytes written=80
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=63451
Total time spent by all reduces in occupied slots (ms)=9334
Total time spent by all map tasks (ms)=63451
Total time spent by all reduce tasks (ms)=4667
Total vcore-milliseconds taken by all map tasks=63451
Total vcore-milliseconds taken by all reduce tasks=4667
Total megabyte-milliseconds taken by all map tasks=129947648
Total megabyte-milliseconds taken by all reduce tasks=19116032
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=740
Map output materialized bytes=900
Input split bytes=1170
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=900
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1385
CPU time spent (ms)=23420
Physical memory (bytes) snapshot=15370592256
Virtual memory (bytes) snapshot=43200081920
Total committed heap usage (bytes)=16409690112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=80
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:

Check the local TestDFSIO_results.log file for metric details for tests above. The following is an example:

$ cat TestDFSIO_results.log
----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:

----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:
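
If you want to compare runs, the summary lines shown above are easy to pull into numbers. The following is only a minimal sketch: the function name parse_testdfsio_log and the sample text are my own, and it assumes the log lines look like the ones in this article.

```python
import re

# Matches lines such as "... Throughput mb/sec: 50.73566717402334".
METRIC_LINE = re.compile(
    r"(Throughput mb/sec|Average IO rate mb/sec|Test exec time sec):\s*([0-9.]+)"
)
# Matches the header that starts each run, e.g. "----- TestDFSIO ----- : write".
SECTION_LINE = re.compile(r"----- TestDFSIO ----- : (\w+)")


def parse_testdfsio_log(text):
    """Return a dict such as {"write": {...}, "read": {...}} of numeric metrics."""
    results = {}
    current = None
    for line in text.splitlines():
        section = SECTION_LINE.search(line)
        if section:
            # A repeated header for the same run reuses the same dict.
            current = results.setdefault(section.group(1), {})
            continue
        metric = METRIC_LINE.search(line)
        if metric and current is not None:
            current[metric.group(1)] = float(metric.group(2))
    return results


# Sample lines copied from the output in this article.
sample = """\
----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
"""

results = parse_testdfsio_log(sample)
for run, metrics in results.items():
    print(run, metrics)
```

In practice you would read TestDFSIO_results.log from disk and pass its contents to the function instead of the inline sample.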

Note: Observe the monitoring metrics while running these tests. If there are any issues, review the HDFS and MapReduce logs and tune or adjust the cluster accordingly.

After stress testing, please clean up the test data to avoid wasting space on your cluster.

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -clean
17/05/29 03:46:03 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:46:03 INFO fs.TestDFSIO: nrFiles = 1
17/05/29 03:46:03 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
17/05/29 03:46:03 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:46:03 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:46:04 INFO fs.TestDFSIO: Cleaning up test files
[s0998dnz@m1.hdp22 ~]$ hadoop fs -ls /benchmarks


extend your VirtualBox image size

When you first use the HDP sandbox in VirtualBox, it assigns 20 GB of your hard disk to the sandbox by default. Later this may not be enough and you will want more space. This article shows how to extend the disk space of your VirtualBox machine.

Step 1: Right-click the virtual machine whose disk you want to extend, click Settings, and then go to Storage.

Step 2: Click the + symbol next to “Controller: SATA” and click “Create New Disk”.

Step 3: Select “VDI (VirtualBox Disk Image)” and continue.

Step 4: Select “Dynamically allocated” and continue.

Step 5: Select the size you would like to add (e.g. 10 GB) and then click Create.

You will now see one more SATA disk with the name you provided.

Now start the server and log in to the shell to perform the following steps:

 

[root@m1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_m1-lv_root
                       18G  9.7G  6.7G  60% /
tmpfs                 939M     0  939M   0% /dev/shm
/dev/sda1             477M   25M  427M   6% /boot
[root@m1 ~]# sfdisk -s
/dev/sda:  20971520
/dev/sdb:  11104256
/dev/mapper/vg_m1-lv_root:  18358272
/dev/mapper/vg_m1-lv_swap:   2097152
total: 52531200 blocks
[root@m1 ~]# vgextend vg_m1 /dev/sdb
  Physical volume "/dev/sdb" successfully created
  Volume group "vg_m1" successfully extended
[root@m1 ~]# lvextend -L +10G -r /dev/mapper/vg_m1-lv_root
  Size of logical volume vg_m1/lv_root changed from 17.51 GiB (4482 extents) to 27.51 GiB (7042 extents).
  Logical volume lv_root successfully resized
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_m1-lv_root is mounted on /; on-line resizing required
old desc_blocks = 2, new_desc_blocks = 2
Performing an on-line resize of /dev/mapper/vg_m1-lv_root to 7211008 (4k) blocks.
The filesystem on /dev/mapper/vg_m1-lv_root is now 7211008 blocks long.

[root@m1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_m1-lv_root
                       27G  9.7G   16G  38% /
tmpfs                 939M     0  939M   0% /dev/shm
/dev/sda1             477M   25M  427M   6% /boot
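
As a quick sanity check, the numbers printed by lvextend and resize2fs above agree with each other. The sketch below redoes the arithmetic; it assumes a physical extent size of 4 MiB (the LVM default, which is not shown in the output above), and the helper name extents_to_gib is my own.

```python
EXTENT_MIB = 4  # assumption: LVM default physical extent size of 4 MiB


def extents_to_gib(extents, extent_mib=EXTENT_MIB):
    """Convert a number of LVM extents to GiB."""
    return extents * extent_mib / 1024


old_gib = extents_to_gib(4482)            # size before lvextend
new_gib = extents_to_gib(7042)            # size after lvextend
added_gib = extents_to_gib(7042 - 4482)   # what "-L +10G" added

# resize2fs reports the file system size in 4 KiB blocks.
fs_blocks = 7042 * EXTENT_MIB * 1024 // 4

print(round(old_gib, 2))   # 17.51
print(round(new_gib, 2))   # 27.51
print(added_gib)           # 10.0
print(fs_blocks)           # 7211008
```

If your volume group uses a different extent size (vgdisplay shows it as "PE Size"), change EXTENT_MIB accordingly.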



hadoop snapshots

HDFS snapshots protect important enterprise data sets from user or application errors. They are read-only point-in-time copies of the file system, and can be taken on a subtree of the file system or on the entire file system.

To demonstrate functionality of snapshots, we will create a directory in HDFS, will create its snapshot and will remove a file from the directory. Later, we will demonstrate how to recover the file from the snapshot.

First we will list all the snapshottable directories where the current user has permission to take snapshots.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir

Here we can see that no directory is snapshottable yet.

So let's create a demo directory and then create a snapshot of it.

[hdfs@m1 ~]$ hdfs dfs -mkdir /tmp/snapshot_demo
[hdfs@m1 ~]$ touch demo.txt
[hdfs@m1 ~]$ hadoop fs -put demo.txt  /tmp/snapshot_demo/
[hdfs@m1 ~]$ hdfs dfsadmin -allowSnapshot /tmp/snapshot_demo
Allowing snaphot on /tmp/snapshot_demo succeeded

Now if you check the list of snapshottable directories, you should see at least snapshot_demo.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2017-03-30 03:31 0 65536 /tmp/snapshot_demo

Now let's create a snapshot of /tmp/snapshot_demo and then check whether it was created.

[hdfs@m1 ~]$ hdfs dfs -createSnapshot /tmp/snapshot_demo
Created snapshot /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2017-03-30 03:32 /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt 

Now suppose we accidentally try to delete the snapshottable directory or the files in it.

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo
rm: The directory /tmp/snapshot_demo cannot be deleted since /tmp/snapshot_demo is snapshottable and already has snapshots
[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/demo.txt
Deleted /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/

Oops… The directory itself was protected, but the file was removed! What a bad day! What a horrible accident! Do not worry too much, however. We can recover this file because we have a snapshot!

[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt
[hdfs@m1 ~]$ hadoop fs -cp /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt /tmp/snapshot_demo/
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:35 /tmp/snapshot_demo/demo.txt

This will restore the lost set of files to the working data set.
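
The recovery above works because every file under a snapshottable directory has a read-only copy at a predictable path inside .snapshot. Here is a small sketch of that mapping, using a hypothetical helper snapshot_path of my own (it only builds the path string and does not talk to HDFS):

```python
import posixpath


def snapshot_path(snapshottable_dir, snapshot_name, file_path):
    """Return the path of file_path inside the given snapshot.

    Layout assumed from the listing above:
    <snapshottable_dir>/.snapshot/<snapshot_name>/<path relative to the dir>
    """
    relative = posixpath.relpath(file_path, snapshottable_dir)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{file_path} is not under {snapshottable_dir}")
    return posixpath.join(snapshottable_dir, ".snapshot", snapshot_name, relative)


source = snapshot_path(
    "/tmp/snapshot_demo",
    "s20170330-033236.441",
    "/tmp/snapshot_demo/demo.txt",
)
print(source)  # /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt
```

The printed path is the one passed to hadoop fs -cp in the recovery step.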

Also, you cannot remove snapshot data with an ordinary rm. Because snapshots are read-only, HDFS protects the snapshot data itself against user or application deletion, so the following operation will fail. (To drop a snapshot deliberately, use hdfs dfs -deleteSnapshot <path> <snapshotName>.)

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/.snapshot/s20170330-033236.441
rm: Modification on a read-only snapshot is disallowed

I hope this helped you understand snapshots. Feel free to give your valuable feedback or suggestions.



script to kill yarn application if it is running more than x mins

Sometimes we need to list all long-running YARN applications and kill those that exceed a threshold. Sometimes we also need to do this for a specific YARN queue only. In such situations the following script will help you do the job.

[root@m1.hdp22~]$ vi kill_application_after_some_time.sh

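#!/bin/bash
# Kill YARN applications in a queue that have been running longer than a threshold.

if [ "$#" -lt 1 ]; then
  echo "Usage: $0 <max_life_in_mins>"
  exit 1
fi

# Replace <queue_name> with the YARN queue you want to check.
queue_name="<queue_name>"

# Collect the IDs of all RUNNING applications in that queue.
yarn application -list 2>/dev/null | grep "$queue_name" | grep RUNNING | awk '{print $1}' > job_list.txt

for jobId in $(cat job_list.txt)
do
  finish_time=$(yarn application -status "$jobId" 2>/dev/null | grep "Finish-Time" | awk '{print $NF}')
  # A non-zero Finish-Time means the application has already ended, so skip it.
  if [ "$finish_time" -ne 0 ]; then
    echo "App $jobId is not running"
    continue
  fi

  # Start-Time is in milliseconds since the epoch; date +%s is in seconds.
  start_time_ms=$(yarn application -status "$jobId" 2>/dev/null | grep "Start-Time" | awk '{print $NF}')
  time_diff=$(( $(date +%s) - start_time_ms / 1000 ))
  time_diff_in_mins=$(( time_diff / 60 ))

  echo "App $jobId is running for $time_diff_in_mins min(s)"
  if [ "$time_diff_in_mins" -gt "$1" ]; then
    echo "Killing app $jobId"
    yarn application -kill "$jobId"
  else
    echo "App $jobId should continue to run"
  fi
done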

[yarn@m1.hdp22 ~]$ ./kill_application_after_some_time.sh 30 (pass the time limit x in minutes)

App application_1487677946023_5995 is running for 0 min(s)

App application_1487677946023_5995 should continue to run
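
The only tricky part of the script is the unit mismatch: the Start-Time reported by yarn application -status is in milliseconds, while date +%s gives seconds. The sketch below shows the same decision logic in isolation; the function names minutes_running and should_kill and the example timestamps are mine and not part of the script.

```python
import time


def minutes_running(start_time_ms, now_s=None):
    """Whole minutes elapsed since start_time_ms (milliseconds since the epoch)."""
    if now_s is None:
        now_s = int(time.time())
    return (now_s - start_time_ms // 1000) // 60


def should_kill(start_time_ms, max_life_mins, now_s=None):
    """True when the application has run for more than max_life_mins minutes."""
    return minutes_running(start_time_ms, now_s) > max_life_mins


# Hypothetical application that started 45 minutes before "now".
start_ms = 1_487_678_000_000
now = 1_487_678_000 + 45 * 60

print(minutes_running(start_ms, now))   # 45
print(should_kill(start_ms, 30, now))   # True
print(should_kill(start_ms, 60, now))   # False
```

Like the script, this uses a strict "greater than" comparison, so an application that has run for exactly the threshold is left alone.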

I hope it would help you but please feel free to give your valuable feedback or suggestion.



Cannot retrieve repository metadata (repomd.xml) for repository

When you upgrade your HDP cluster through a Satellite server or a local repository, and then start the cluster via Ambari or add new services to it, you may see the following error.

resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector' returned 1. Error: Cannot retrieve repository metadata (repomd.xml) for repository: HDP-2.3.0.0-2557. Please verify its path and try again.

[root@m1 ~]# yum -d 0 -e 0 -y install slider_2_5_3_0_37
Error: Cannot retrieve repository metadata (repomd.xml) for repository: HDP-2.3.0.0-2557. Please verify its path and try again

Root Cause: The yum repository directory contains repository files for two different HDP versions.

[root@m1 yum.repos.d]# ll
total 48
-rw-r--r-- 1 root root 286 Sep 20 14:01 ambari.repo
-rw-r--r--. 1 root root 1991 Oct 23 2014 CentOS-Base.repo
-rw-r--r--. 1 root root 647 Oct 23 2014 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 289 Oct 23 2014 CentOS-fasttrack.repo
-rw-r--r--. 1 root root 630 Oct 23 2014 CentOS-Media.repo
-rw-r--r--. 1 root root 5394 Oct 23 2014 CentOS-Vault.repo
-rw-r--r-- 1 root root 274 Oct 4 10:16 HDP-2.3.0.0-2557.repo
-rw-r--r-- 1 root root 286 Oct 4 03:50 HDP-2.3.0.0.repo
-rw-r--r-- 1 root root 234 Feb 1 11:05 HDP-2.5.3.0.repo
-rw-r--r-- 1 root root 92 Feb 3 12:29 HDP.repo
-rw-r--r-- 1 root root 135 Feb 3 12:29 HDP-UTILS.repo

 

Resolution:

Step 1: Disable the repo files for the old version, or move/delete them from the /etc/yum.repos.d directory.

[root@w1 yum.repos.d]# mv HDP-2.3.0.0* /tmp/
[root@w1 yum.repos.d]# ls -ltr
total 40
-rw-r--r--. 1 root root 5394 Oct 23 2014 CentOS-Vault.repo
-rw-r--r--. 1 root root 630 Oct 23 2014 CentOS-Media.repo
-rw-r--r--. 1 root root 289 Oct 23 2014 CentOS-fasttrack.repo
-rw-r--r--. 1 root root 647 Oct 23 2014 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 1991 Oct 23 2014 CentOS-Base.repo
-rw-r--r-- 1 root root 286 Sep 20 14:01 ambari.repo
-rw-r--r-- 1 root root 234 Feb 1 11:05 HDP-2.5.3.0.repo
-rw-r--r-- 1 root root 92 Feb 3 12:29 HDP.repo
-rw-r--r-- 1 root root 135 Feb 3 12:29 HDP-UTILS.repo
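
On a cluster with many hosts it can help to spot the leftover repo files automatically before moving them. The following is only a sketch: the helper stale_hdp_repo_files is hypothetical and it assumes the versioned files are named HDP-<version>[-<build>].repo, as in the listing above.

```python
import re

# Matches e.g. "HDP-2.3.0.0.repo" and "HDP-2.3.0.0-2557.repo",
# but not "HDP.repo" or "HDP-UTILS.repo".
HDP_REPO = re.compile(r"^HDP-(\d+(?:\.\d+)+)(?:-\d+)?\.repo$")


def stale_hdp_repo_files(file_names, current_version):
    """Return versioned HDP repo files whose version differs from current_version."""
    stale = []
    for name in file_names:
        match = HDP_REPO.match(name)
        if match and match.group(1) != current_version:
            stale.append(name)
    return stale


# File names taken from the first listing in this article.
repo_files = [
    "ambari.repo",
    "CentOS-Base.repo",
    "HDP-2.3.0.0-2557.repo",
    "HDP-2.3.0.0.repo",
    "HDP-2.5.3.0.repo",
    "HDP.repo",
    "HDP-UTILS.repo",
]

print(stale_hdp_repo_files(repo_files, "2.5.3.0"))
# ['HDP-2.3.0.0-2557.repo', 'HDP-2.3.0.0.repo']
```

To use it on a real host you could pass os.listdir("/etc/yum.repos.d") as the list of file names.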

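Step 2: Now clear the old repository metadata and refresh it from the new repositories. If cached metadata from the old repository is still present, yum clean all removes it; yum info all then downloads the metadata again: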

[root@w3 yum.repos.d]# yum info all

Loaded plugins: fastestmirror
Determining fastest mirrors
* base: mirror.vcu.edu
* extras: mirror.cs.uwp.edu
* updates: mirror.nodesdirect.com
HDP-2.5                                  | 2.9 kB     00:00
HDP-2.5/primary_db                       |  69 kB     00:00
HDP-2.5.3.0                              | 2.9 kB     00:00
HDP-2.5.3.0/primary_db                   |  69 kB     00:00
HDP-UTILS-1.1.0.21                       | 2.9 kB     00:00
HDP-UTILS-1.1.0.21/primary_db            |  33 kB     00:00
HDP-UTILS-2.5.3.0                        | 2.9 kB     00:00
HDP-UTILS-2.5.3.0/primary_db             |  33 kB     00:00
Updates-ambari-2.4.1.0                   | 2.9 kB     00:00
Updates-ambari-2.4.1.0/primary_db        | 8.3 kB     00:00
base                                     | 3.7 kB     00:00
base/primary_db                          | 4.7 MB     00:36
extras                                   | 3.4 kB     00:00
extras/primary_db                        |  37 kB     00:00
updates                                  | 3.4 kB     00:00
http://mirror.nodesdirect.com/centos/6.8/updates/x86_64/repodata/b02ecfdd926546ba78f0f52d424e06c6a9b7da60cee4b9bf83a54a892b9efd06-primary.sqlite.bz2: [Errno 12] Timeout on http://mirror.nodesdirect.com/centos/6.8/updates/x86_64/repodata/b02ecfdd926546ba78f0f52d424e06c6a9b7da60cee4b9bf83a54a892b9efd06-primary.sqlite.bz2: (28, 'Operation too slow. Less than 1 bytes/sec transfered the last 30 seconds')
Trying other mirror.
updates/primary_db                       | 4.3 MB     00:01
Installed Packages
Name        : MAKEDEV
Arch        : x86_64
Version     : 3.24
Release     : 6.el6
Size        : 222 k
Repo        : installed
From repo   : anaconda-CentOS-201410241409.x86_64
Summary     : A program used for creating device files in /dev
URL         : http://www.lanana.org/docs/device-list/
License     : GPLv2
Description : This package contains the MAKEDEV program, which makes it easier to create
            : and maintain the files in the /dev directory.  /dev directory files
            : correspond to a particular device supported by Linux (serial or printer
            : ports, scanners, sound cards, tape drives, CD-ROM drives, hard drives,
            : etc.) and interface with the drivers in the kernel.
            : You should install the MAKEDEV package because the MAKEDEV utility makes
            : it easy to manage the /dev directory device files.

 

Please feel free to give your valuable feedback.



HDFS disk space vs NameNode heap size

In HDFS, data and metadata are decoupled. Data files are split into block files that are stored, and replicated on DataNodes across the cluster. The filesystem namespace tree and associated metadata are stored on the NameNode.

Namespace objects are file inodes and blocks that point to block files on the DataNodes. These namespace objects are stored as a file system image (fsimage) in the NameNode’s memory and also persist locally. Updates to the metadata are written to an edit log. When the NameNode starts, or when a checkpoint is taken, the edits are applied, the log is cleared, and a new fsimage is created.

On DataNodes, data files are measured by disk space consumed—the actual data length—and not necessarily the full block size.

For example, a file that is 192 MB consumes 192 MB of disk space and not some integral multiple of the block size. Using the default block size of 128 MB, a file of 192 MB is split into two block files, one 128 MB file and one 64 MB file. On the NameNode, namespace objects are measured by the number of files and blocks. The same 192 MB file is represented by three namespace objects (1 file inode + 2 blocks) and consumes approximately 450 bytes of memory.

Large files split into fewer blocks generally consume less memory than small files that generate many blocks. One data file of 128 MB is represented by two namespace objects on the NameNode (1 file inode + 1 block) and consumes approximately 300 bytes of memory. By contrast, 128 files of 1 MB each are represented by 256 namespace objects (128 file inodes + 128 blocks) and consume approximately 38,400 bytes. The optimal split size, then, is some integral multiple of the block size, for memory management as well as data locality optimization.
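The arithmetic above can be reproduced with a short Python sketch. It assumes the rule of thumb used in this article of roughly 150 bytes per namespace object (450 bytes for 3 objects); the function name and its parameters are illustrative only and are not part of any Hadoop API.

```python
import math

# Rule of thumb from the article: each namespace object (file inode or block)
# costs roughly 150 bytes of NameNode heap. This is an estimate, not a measurement.
BYTES_PER_OBJECT = 150


def namespace_bytes(file_sizes_mb, block_mb=128):
    """Estimate NameNode heap used by a list of files (sizes in MB)."""
    inodes = len(file_sizes_mb)
    # Each file needs ceil(size / block size) block objects.
    blocks = sum(math.ceil(size / block_mb) for size in file_sizes_mb)
    return (inodes + blocks) * BYTES_PER_OBJECT


if __name__ == "__main__":
    print(namespace_bytes([192]))      # one 192 MB file
    print(namespace_bytes([128]))      # one 128 MB file
    print(namespace_bytes([1] * 128))  # 128 files of 1 MB each
```

Replication does not appear in this estimate because a block is counted once on the NameNode regardless of how many replicas exist on DataNodes.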

How much memory you actually need depends on your workload, especially on the number of files, directories, and blocks generated in each namespace. If all of your files are split at the block size, you could allocate 1 GB for every million files. But given the historical average of 1.5 blocks per file (2 block objects), a more conservative estimate is 1 GB of memory for every million blocks.

Example 1: Estimating NameNode Heap Memory Used
Alice, Bob, and Carl each have 1 GB (1024 MB) of data on disk, but sliced into differently sized files. Alice and Bob have files that are some integral multiple of the block size and require the least memory. Carl does not and fills the heap with unnecessary namespace objects.

Alice: 1 x 1024 MB file
  • 1 file inode
  • 8 blocks (1024 MB / 128 MB)
  • Total = 9 objects * 150 bytes = 1,350 bytes of heap memory

Bob: 8 x 128 MB files
  • 8 file inodes
  • 8 blocks
  • Total = 16 objects * 150 bytes = 2,400 bytes of heap memory

Carl: 1,024 x 1 MB files
  • 1,024 file inodes
  • 1,024 blocks
  • Total = 2,048 objects * 150 bytes = 307,200 bytes of heap memory

Example 2: Estimating NameNode Heap Memory Needed
In this example, memory is estimated by considering the capacity of a cluster. Values are rounded. Both clusters physically store 4800 TB, or approximately 36 million block files (at the default block size). Replication determines how many namespace blocks represent these block files.

Cluster A: 200 hosts of 24 TB each = 4800 TB.
  • Blocksize=128 MB, Replication=1
  • Cluster capacity in MB: 200 * 24,000,000 MB = 4,800,000,000 MB (4800 TB)
  • Disk space needed per block: 128 MB per block * 1 = 128 MB storage per block
  • Cluster capacity in blocks: 4,800,000,000 MB / 128 MB ≈ 36,000,000 blocks

At capacity, with the recommended allocation of 1 GB of memory per million blocks, Cluster A needs about 36 GB of maximum heap space.

Cluster B: 200 hosts of 24 TB each = 4800 TB.
  • Blocksize=128 MB, Replication=3
  • Cluster capacity in MB: 200 * 24,000,000 MB = 4,800,000,000 MB (4800 TB)
  • Disk space needed per block: 128 MB per block * 3 = 384 MB storage per block
  • Cluster capacity in blocks: 4,800,000,000 MB / 384 MB ≈ 12,000,000 blocks

At capacity, with the recommended allocation of 1 GB of memory per million blocks, Cluster B needs about 12 GB of maximum heap space.

Both Cluster A and Cluster B store the same number of block files. In Cluster A, however, each block file is unique and represented by one block on the NameNode; in Cluster B, only one-third are unique and two-thirds are replicas.
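The capacity-based estimate can also be scripted. The sketch below follows the same steps as Example 2 and assumes decimal units (1 TB = 1,000,000 MB) and the 1 GB per million blocks guideline; the function name and parameters are illustrative only. Note that exact division gives 37.5 million and 12.5 million blocks, which the example rounds to 36 million and 12 million.

```python
def heap_estimate(hosts, tb_per_host, block_mb=128, replication=3,
                  gb_per_million_blocks=1.0):
    """Return (namespace_blocks, heap_gb) for a cluster filled to capacity."""
    # Decimal conversion, matching the article: 24 TB -> 24,000,000 MB.
    capacity_mb = hosts * tb_per_host * 1_000_000
    # Every namespace block occupies block_mb * replication of raw disk.
    namespace_blocks = capacity_mb // (block_mb * replication)
    heap_gb = namespace_blocks / 1_000_000 * gb_per_million_blocks
    return namespace_blocks, heap_gb


if __name__ == "__main__":
    for name, repl in (("Cluster A", 1), ("Cluster B", 3)):
        blocks, heap = heap_estimate(200, 24, replication=repl)
        print(f"{name}: {blocks:,} namespace blocks, about {heap:.1f} GB heap")
```

Treat the result as a starting point only; as noted earlier, the real requirement depends on the number of files, directories, and blocks in the namespace.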



Datanode doesn’t start with error “java.net.BindException: Address already in use”

In many real-world scenarios we have seen the error “java.net.BindException: Address already in use” when starting a DataNode.

You can observe the following during this issue:

1. Datanode doesn’t start with error saying “address already in use”.
2. “netstat -anp | grep 50010” shows no result.

ROOT CAUSE:
A DataNode needs three ports when it starts, and each one produces a different error message when its address is already in use.

1. Port 50010 is already in use
2016-12-02 00:01:14,056 ERROR datanode.DataNode (DataNode.java:secureMain(2630)) – Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException

2. Port 50075 is already in use
2016-12-01 23:57:57,298 ERROR datanode.DataNode (DataNode.java:secureMain(2630)) – Exception in secureMain
java.net.BindException: Address already in use

3. Port 8010 is already in use
2016-12-02 00:09:40,422 ERROR datanode.DataNode (DataNode.java:secureMain(2630)) – Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:8010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException

Note that datanode.log contains no port information when port 50075 is already in use.

If you need to set a different port for the DataNode service, review the following properties:

dfs.datanode.address : default 50010
dfs.datanode.http.address : default 50075
dfs.datanode.ipc.address : default 8010

RESOLUTION :
Stop/kill the process which uses port 50010/50075/8010.
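
Because the log does not always name the conflicting port, it can help to probe all three ports before restarting the DataNode. The following Python sketch is one way to do that, using the default port numbers listed above; the helper name port_is_free is hypothetical. It simply tries to bind each port and treats EADDRINUSE as "busy". Run it on the DataNode host while the DataNode is stopped.

```python
import errno
import socket

# Default ports as listed in this article; adjust if your cluster overrides them.
DATANODE_PORTS = {
    "dfs.datanode.address": 50010,
    "dfs.datanode.http.address": 50075,
    "dfs.datanode.ipc.address": 8010,
}


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # SO_REUSEADDR is deliberately not set, so lingering connections
            # (for example in TIME_WAIT) may also be reported as "in use".
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. permission errors are not hidden
    return True


if __name__ == "__main__":
    for prop, port in DATANODE_PORTS.items():
        state = "free" if port_is_free(port) else "ALREADY IN USE"
        print(f"{prop} ({port}): {state}")
```

Once a busy port is identified, a tool such as netstat or lsof run against that specific port shows which process holds it.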




Standby NameNode is failing and only one is running


The standby NameNode is unable to start up. Or, once the standby NameNode is brought up, the active NameNode goes down soon afterwards, leaving only one live NameNode. The NameNode log shows:

FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) – Error: flush failed for required journal (JournalAndStream(mgr=QJM to ))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.

ROOT CAUSE: 

This can be caused by a network issue that makes the JournalNode take a long time to sync. The following snippet from the JournalNode log shows a sync that took an unusually long time:

WARN server.Journal (Journal.java:journal(384)) – Sync of transaction range 187176137-187176137 took 44461ms
WARN ipc.Server (Server.java:processResponse(1027)) – IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from Call#10890 Retry#0: output error
INFO ipc.Server (Server.java:run(2105)) – IPC Server handler 3 on 8485 caught an exception
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474)
    at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2573)
    at org.apache.hadoop.ipc.Server.access$1900(Server.java:135)
    at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:977)
    at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1042)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2094)

RESOLUTION:

Increase the values of the following JournalNode timeout properties in hdfs-site.xml:

dfs.qjournal.select-input-streams.timeout.ms = 60000 

dfs.qjournal.start-segment.timeout.ms = 60000 

dfs.qjournal.write-txns.timeout.ms = 60000
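
To judge whether 60000 ms leaves enough headroom, it may help to check how long the slow syncs in the JournalNode log actually took. The sketch below scans log lines for the "Sync of transaction range ... took ...ms" warning quoted above and lists those at or above a threshold. The 20000 ms default threshold is taken from the timeout in the NameNode error message; the function name and the sample line handling are illustrative assumptions.

```python
import re

# Matches the JournalNode warning quoted in this article, e.g.
# "Sync of transaction range 187176137-187176137 took 44461ms"
SYNC_RE = re.compile(r"Sync of transaction range (\d+)-(\d+) took (\d+)ms")


def slow_syncs(lines, threshold_ms=20000):
    """Return (first_txid, last_txid, millis) for syncs at or above threshold_ms."""
    hits = []
    for line in lines:
        match = SYNC_RE.search(line)
        if match and int(match.group(3)) >= threshold_ms:
            hits.append(tuple(int(g) for g in match.groups()))
    return hits


if __name__ == "__main__":
    sample = [
        "WARN server.Journal (Journal.java:journal(384)) - Sync of transaction "
        "range 187176137-187176137 took 44461ms",
    ]
    for first, last, millis in slow_syncs(sample):
        print(f"txids {first}-{last}: {millis} ms")
```

In practice you would pass an open log file instead of the sample list, for example slow_syncs(open(path)) with the path of your JournalNode log.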



“INSERT OVERWRITE” functional details

If the OVERWRITE keyword is used then the contents of the target table (or partition) will be deleted and replaced by the files referred to by filepath; otherwise the files referred by filepath will be added to the table.

  • Note that if the target table (or partition) already has a file whose name collides with any of the filenames contained in filepath, then the existing file will be replaced with the new file.
  • When Hive tries to “INSERT OVERWRITE” to a partition of an external table under existing directory, depending on whether the partition definition already exists in the metastore or not, Hive will behave differently:

1) If the partition definition does not exist, Hive will not try to guess where the target partition directories are (for either static or dynamic partitions), so it will not be able to delete existing files under the partitions that will be written to.

2) If the partition definition does exist, Hive will attempt to remove all files under the target partition directory before writing new data into it.

You can reproduce this issue with the following steps.

Step 1: Log in as the “hdfs” user and run the following commands:

hdfs dfs -mkdir test
hdfs dfs -mkdir test/p=p1
touch test.txt
hdfs dfs -put test.txt test/p=p1

Step 2: Confirm that there is one file under test/p=p1

hdfs dfs -ls test/p=p1
Found 1 items
-rw-r--r-- 3 hdfs supergroup 5 2015-05-04 17:30 test/p=p1/test.txt

Step 3: Start “hive” and run the following statements:

DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string) LOCATION '/user/hdfs/test';
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT <int_column> FROM <existing_table>;

The output from the above “INSERT OVERWRITE”:
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1430100146027_0004, Tracking URL = http://host-10-17-74-166.coe.cloudera.com:8088/proxy/application_1430100146027_0004/
Kill Command = /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/bin/hadoop job -kill job_1430100146027_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-05-05 00:15:35,220 Stage-1 map = 0%, reduce = 0%
2015-05-05 00:15:48,740 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.19 sec
MapReduce Total cumulative CPU time: 3 seconds 190 msec
Ended Job = job_1430100146027_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ha-test/user/hdfs/test/p=p1/.hive-staging_hive_2015-05-05_00-13-47_253_4887262776207257351-1/-ext-10000
Loading data to table default.partition_test partition (p=p1)
Partition default.partition_test{p=p1} stats: [numFiles=2, numRows=33178, totalSize=194973, rawDataSize=161787]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.19 sec HDFS Read: 2219273 HDFS Write: 195055 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 190 msec

Confirm that test.txt was not removed:

hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x 3 hdfs supergroup 194965 2015-05-05 00:15 test/p=p1/000000_0
-rw-r--r-- 3 hdfs supergroup 8 2015-05-05 00:10 test/p=p1/test.txt

Rename 000000_0 to 11111111:

hdfs dfs -mv test/p=p1/000000_0 test/p=p1/11111111

Confirm that there are now two files under test/p=p1:

hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x 3 hdfs supergroup 194965 2015-05-05 00:15 test/p=p1/11111111
-rw-r--r-- 3 hdfs supergroup 8 2015-05-05 00:10 test/p=p1/test.txt

Step 4: Run the following query again:

INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT <int_column> FROM <existing_table>;

The output from second “INSERT OVERWRITE”:

Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1430100146027_0005, Tracking URL = http://host-10-17-74-166.coe.cloudera.com:8088/proxy/application_1430100146027_0005/
Kill Command = /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/bin/hadoop job -kill job_1430100146027_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-05-05 00:23:39,298 Stage-1 map = 0%, reduce = 0%
2015-05-05 00:23:48,891 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.92 sec
MapReduce Total cumulative CPU time: 2 seconds 920 msec
Ended Job = job_1430100146027_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ha-test/user/hdfs/test/p=p1/.hive-staging_hive_2015-05-05_00-21-58_505_3688057093497278728-1/-ext-10000
Loading data to table default.partition_test partition (p=p1)
Moved: 'hdfs://ha-test/user/hdfs/test/p=p1/11111111' to trash at: hdfs://ha-test/user/hdfs/.Trash/Current
Moved: 'hdfs://ha-test/user/hdfs/test/p=p1/test.txt' to trash at: hdfs://ha-test/user/hdfs/.Trash/Current
Partition default.partition_test{p=p1} stats: [numFiles=1, numRows=33178, totalSize=194965, rawDataSize=161787]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.92 sec HDFS Read: 2219273 HDFS Write: 195055 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 920 msec

Step 5: Finally, confirm that only one file remains under the test/p=p1 directory; both 11111111 and test.txt were moved to the .Trash directory.

hdfs dfs -ls test/p=p1
Found 1 items
-rwxr-xr-x 3 hdfs supergroup 4954 2015-05-04 17:36 test/p=p1/000000_0

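The above test confirms that existing files remain in the target partition directory when the table was newly created and has no partition definitions in the metastore. Once the partition definition exists, as in the second run, “INSERT OVERWRITE” removes them.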

Resolution: To fix this issue, run the following Hive query before the “INSERT OVERWRITE” to recover the missing partition definitions:

MSCK REPAIR TABLE partition_test;

OK
Partitions not in metastore: partition_test:p=p1
Repair: Added partition to metastore partition_test:p=p1
Time taken: 0.486 seconds, Fetched: 2 row(s)
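
If the overwrite is launched from a script, the repair step can be made part of the same submission so it is never skipped. The Python sketch below only builds the two statements in the right order and, optionally, hands them to the Hive CLI with its -e option; the function names, the column a, and the source table src in the usage are hypothetical, and the sketch assumes the hive command is on the PATH.

```python
import subprocess


def overwrite_script(table, partition_spec, select_sql):
    """Build a HiveQL script that repairs partitions before overwriting one."""
    return (
        f"MSCK REPAIR TABLE {table};\n"
        f"INSERT OVERWRITE TABLE {table} PARTITION ({partition_spec}) {select_sql};"
    )


def run_with_hive_cli(script):
    """Run the script through the Hive CLI (assumes `hive` is installed)."""
    return subprocess.run(["hive", "-e", script], check=True)


if __name__ == "__main__":
    # Hypothetical source table and column, for illustration only.
    print(overwrite_script("partition_test", "p = 'p1'", "SELECT a FROM src"))
```

Inputs are interpolated into the statement as-is, so this sketch is only suitable for trusted, hard-coded table names and queries.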

Ref : http://www.ericlin.me/hive-insert-overwrite-does-not-remove-existing-data