Hadoop Cluster Benchmarking and Stress Testing

After installing a cluster, it is a good idea to benchmark and stress test it. In this article I explain the built-in TestDFSIO tool, which helps you stress test your configured cluster.

The Hadoop distribution comes with a number of benchmarks, which are bundled in hadoop-*test*.jar and hadoop-*examples*.jar. Running either jar without a program name lists the programs it contains:

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*test*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar' chosen.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*example*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-examples.jar' chosen.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

The TestDFSIO benchmark is a read and write test for HDFS. It is helpful for stress testing HDFS, discovering performance bottlenecks in your network, shaking out the hardware, OS, and Hadoop setup of your cluster machines (particularly the NameNode and the DataNodes), and getting a first impression of how fast your cluster is in terms of I/O.

From the command line, run the following command to test writing 10 output files of 50 MB each, for a total of 500 MB (-fileSize is given in MB):

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 50
17/05/29 03:29:19 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:29:19 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:29:19 INFO fs.TestDFSIO: nrBytes (MB) = 50.0
17/05/29 03:29:19 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:29:19 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:29:21 INFO fs.TestDFSIO: creating control file: 52428800 bytes, 10 files
17/05/29 03:29:23 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:29:23 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:29:23 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:29:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0142
17/05/29 03:29:24 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0142
17/05/29 03:29:24 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0142/
17/05/29 03:29:24 INFO mapreduce.Job: Running job: job_1494832799027_0142
17/05/29 03:29:31 INFO mapreduce.Job: Job job_1494832799027_0142 running in uber mode : false
17/05/29 03:29:31 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:29:46 INFO mapreduce.Job: map 30% reduce 0%
17/05/29 03:29:47 INFO mapreduce.Job: map 50% reduce 0%
17/05/29 03:29:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:29:51 INFO mapreduce.Job: map 80% reduce 0%
17/05/29 03:29:52 INFO mapreduce.Job: map 90% reduce 0%
17/05/29 03:29:53 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:29:54 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:29:54 INFO mapreduce.Job: Job job_1494832799027_0142 completed successfully
17/05/29 03:29:55 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=835
FILE: Number of bytes written=1717691
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2290
HDFS: Number of bytes written=524288077
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=103814
Total time spent by all reduces in occupied slots (ms)=7846
Total time spent by all map tasks (ms)=103814
Total time spent by all reduce tasks (ms)=3923
Total vcore-milliseconds taken by all map tasks=103814
Total vcore-milliseconds taken by all reduce tasks=3923
Total megabyte-milliseconds taken by all map tasks=212611072
Total megabyte-milliseconds taken by all reduce tasks=16068608
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=729
Map output materialized bytes=889
Input split bytes=1170
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=889
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=4456
CPU time spent (ms)=59400
Physical memory (bytes) snapshot=15627186176
Virtual memory (bytes) snapshot=43288719360
Total committed heap usage (bytes)=16284385280
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=77
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:
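
The two rate figures in the summary are easy to confuse. As far as I understand the TestDFSIO implementation, "Throughput mb/sec" is the total data divided by the summed I/O time of all map tasks, while "Average IO rate mb/sec" is the plain mean of the per-task rates. The sketch below reproduces that arithmetic with awk; the function name dfsio_metrics and the two sample tasks are made up for illustration and are not taken from the run above.

```shell
# Hypothetical helper. Each input line describes one map task:
#   <MB processed by the task> <I/O time of the task in ms>
dfsio_metrics() {
  awk '
    {
      mb  += $1                 # total data over all tasks
      ms  += $2                 # summed I/O time over all tasks
      sum += $1 * 1000 / $2     # per-task rate in MB/s
      n++
    }
    END {
      printf "throughput_mb_sec=%.2f\n", mb * 1000 / ms
      printf "avg_io_rate_mb_sec=%.2f\n", sum / n
    }'
}

# Two made-up tasks: 50 MB in 1000 ms and 50 MB in 500 ms.
printf '50 1000\n50 500\n' | dfsio_metrics
```

A slow task pulls the throughput figure down more than it pulls the average IO rate down, so a large gap between the two values (or a high standard deviation) hints that some tasks or nodes are noticeably slower than others.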

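From the command line, run the following command to test reading the 10 files created by the write test. The command asks for 500 MB per file, but because the files written above are only 50 MB each, the run still reports 500 MB processed in total: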

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 500
17/05/29 03:30:29 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:30:29 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:30:29 INFO fs.TestDFSIO: nrBytes (MB) = 500.0
17/05/29 03:30:29 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:30:29 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:30:30 INFO fs.TestDFSIO: creating control file: 524288000 bytes, 10 files
17/05/29 03:30:31 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:30:32 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0143
17/05/29 03:30:32 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0143
17/05/29 03:30:32 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0143/
17/05/29 03:30:32 INFO mapreduce.Job: Running job: job_1494832799027_0143
17/05/29 03:30:39 INFO mapreduce.Job: Job job_1494832799027_0143 running in uber mode : false
17/05/29 03:30:39 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:30:47 INFO mapreduce.Job: map 10% reduce 0%
17/05/29 03:30:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:30:54 INFO mapreduce.Job: map 70% reduce 0%
17/05/29 03:30:55 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:30:56 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:30:56 INFO mapreduce.Job: Job job_1494832799027_0143 completed successfully
17/05/29 03:30:56 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=846
FILE: Number of bytes written=1717691
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=524290290
HDFS: Number of bytes written=80
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=63451
Total time spent by all reduces in occupied slots (ms)=9334
Total time spent by all map tasks (ms)=63451
Total time spent by all reduce tasks (ms)=4667
Total vcore-milliseconds taken by all map tasks=63451
Total vcore-milliseconds taken by all reduce tasks=4667
Total megabyte-milliseconds taken by all map tasks=129947648
Total megabyte-milliseconds taken by all reduce tasks=19116032
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=740
Map output materialized bytes=900
Input split bytes=1170
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=900
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1385
CPU time spent (ms)=23420
Physical memory (bytes) snapshot=15370592256
Virtual memory (bytes) snapshot=43200081920
Total committed heap usage (bytes)=16409690112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=80
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:
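
Because the read test reads back the files that the write test created, it helps to keep the file count and file size in one place so that both runs use the same values. The following is a minimal sketch; the variable names TESTS_JAR, NR_FILES and FILE_SIZE_MB and the function dfsio_cmd are my own, and the function only prints the command line (a dry run) instead of executing it.

```shell
# Path of the tests jar used in this article; adjust it for your distribution.
TESTS_JAR="${TESTS_JAR:-/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar}"
NR_FILES=10        # number of files, and therefore of map tasks
FILE_SIZE_MB=50    # size of each file in MB

# Hypothetical helper: print the TestDFSIO command line for a mode (write or read).
dfsio_cmd() {
  printf 'hadoop jar %s TestDFSIO -%s -nrFiles %s -fileSize %s\n' \
    "$TESTS_JAR" "$1" "$NR_FILES" "$FILE_SIZE_MB"
}

dfsio_cmd write
dfsio_cmd read
```

Once the printed commands look right, they can be run by pasting them into the shell or by piping the output to sh.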

Check the local TestDFSIO_results.log file, written to the directory the tests were started from, for the metric details of the tests above. The following is an example:

$ cat TestDFSIO_results.log
----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:

----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:
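
Since every run appends to the same results file, a one-line summary per run makes it easier to compare results. The sketch below assumes each entry contains a "----- TestDFSIO ----- : <mode>" header followed by a "Throughput mb/sec:" line, as in the example above; the function name dfsio_summary is hypothetical.

```shell
# Hypothetical helper: print "<mode> <throughput>" for every run in a results log.
# Usage: dfsio_summary [path/to/TestDFSIO_results.log]
dfsio_summary() {
  awk '
    /----- TestDFSIO ----- :/ { mode = $NF }        # remember write/read
    /Throughput mb\/sec:/     { print mode, $NF }   # emit one line per run
  ' "${1:-TestDFSIO_results.log}"
}
```

Called without an argument, it reads TestDFSIO_results.log from the current directory.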

Note: Watch your monitoring metrics while these tests are running. If there are any issues, review the HDFS and MapReduce logs and tune or adjust the cluster accordingly.

After stress testing, clean up the test data to avoid wasting space on your cluster.

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -clean
17/05/29 03:46:03 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:46:03 INFO fs.TestDFSIO: nrFiles = 1
17/05/29 03:46:03 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
17/05/29 03:46:03 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:46:03 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:46:04 INFO fs.TestDFSIO: Cleaning up test files
[s0998dnz@m1.hdp22 ~]$ hadoop fs -ls /benchmarks
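
To confirm the cleanup from a script instead of reading the listing by eye, the exit code of hadoop fs -test -d can be used. This is a minimal sketch: the function name dfsio_is_clean is hypothetical, the default path is the baseDir shown in the log above, and any non-zero exit from the hadoop command (including a failure to reach HDFS) is treated as "directory absent".

```shell
# Hypothetical helper: succeed when the TestDFSIO base directory no longer exists.
# Usage: dfsio_is_clean [base_dir]
dfsio_is_clean() {
  base_dir="${1:-/benchmarks/TestDFSIO}"
  # "hadoop fs -test -d" exits with 0 when the path exists and is a directory.
  if hadoop fs -test -d "$base_dir"; then
    echo "still present: $base_dir"
    return 1
  fi
  echo "cleaned: $base_dir"
}
```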