Hadoop Cluster Benchmarking and Stress Testing
Category: HDFS
Once we have installed a cluster, we should run some benchmarking or stress testing on it. In this article I explain the built-in TestDFSIO tool, which will help you perform stress testing on your configured cluster.
The Hadoop distribution comes with a number of benchmarks, which are bundled in hadoop-*test*.jar and hadoop-*examples*.jar.
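The exact jar names and paths are version specific (the examples in this article use an HDP 2.6 layout under /usr/hdp/2.6.0.3-8), so if you are unsure where these jars live on your node, a quick find is the easiest way to locate them. The commands below are a minimal sketch and assume the HDP-style /usr/hdp install root; adjust them for your own distribution:

# Locate the bundled benchmark and example jars (paths vary by distribution and version)
find /usr/hdp/ -name 'hadoop-mapreduce-client-jobclient*tests.jar' 2>/dev/null
find /usr/hdp/ -name 'hadoop-mapreduce-examples*.jar' 2>/dev/null

Running either jar without a valid program name, as shown below, prints the list of available benchmarks.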
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*test*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar' chosen. Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*example*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-examples.jar' chosen. Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
The TestDFSIO benchmark is a read and write test for HDFS. It is helpful for tasks such as stress testing HDFS, discovering performance bottlenecks in your network, shaking out the hardware, OS, and Hadoop setup of your cluster machines (particularly the NameNode and the DataNodes), and getting a first impression of how fast your cluster is in terms of I/O.
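At a high level, every TestDFSIO run follows the same cycle: a write phase that generates the test files under /benchmarks/TestDFSIO, an optional read phase over those same files, and a clean phase that removes them. A minimal sketch of that cycle, using the jar path from this cluster (adjust the version string for your own install):

# TestDFSIO always follows the same write / read / clean cycle.
# Jar path taken from this cluster; adjust the version string for your environment.
JAR=/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar

# 1. Write phase: each of the 10 map tasks writes one 50 MB file under /benchmarks/TestDFSIO
hadoop jar $JAR TestDFSIO -write -nrFiles 10 -fileSize 50

# 2. Read phase: reads back the files created by the write phase above
hadoop jar $JAR TestDFSIO -read -nrFiles 10 -fileSize 50

# 3. Clean phase: removes the /benchmarks/TestDFSIO directory once you are done
hadoop jar $JAR TestDFSIO -clean

The sections below walk through each of these phases on this cluster and look at the numbers they report.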
From the command line, run the following command to test writing 10 output files of 50 MB each, for a total of 500 MB:
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 50
17/05/29 03:29:19 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:29:19 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:29:19 INFO fs.TestDFSIO: nrBytes (MB) = 50.0
17/05/29 03:29:19 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:29:19 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:29:21 INFO fs.TestDFSIO: creating control file: 52428800 bytes, 10 files
17/05/29 03:29:23 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:29:23 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:29:23 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:29:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0142
17/05/29 03:29:24 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0142
17/05/29 03:29:24 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0142/
17/05/29 03:29:24 INFO mapreduce.Job: Running job: job_1494832799027_0142
17/05/29 03:29:31 INFO mapreduce.Job: Job job_1494832799027_0142 running in uber mode : false
17/05/29 03:29:31 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:29:46 INFO mapreduce.Job: map 30% reduce 0%
17/05/29 03:29:47 INFO mapreduce.Job: map 50% reduce 0%
17/05/29 03:29:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:29:51 INFO mapreduce.Job: map 80% reduce 0%
17/05/29 03:29:52 INFO mapreduce.Job: map 90% reduce 0%
17/05/29 03:29:53 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:29:54 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:29:54 INFO mapreduce.Job: Job job_1494832799027_0142 completed successfully
17/05/29 03:29:55 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=835
        FILE: Number of bytes written=1717691
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2290
        HDFS: Number of bytes written=524288077
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=103814
        Total time spent by all reduces in occupied slots (ms)=7846
        Total time spent by all map tasks (ms)=103814
        Total time spent by all reduce tasks (ms)=3923
        Total vcore-milliseconds taken by all map tasks=103814
        Total vcore-milliseconds taken by all reduce tasks=3923
        Total megabyte-milliseconds taken by all map tasks=212611072
        Total megabyte-milliseconds taken by all reduce tasks=16068608
    Map-Reduce Framework
        Map input records=10
        Map output records=50
        Map output bytes=729
        Map output materialized bytes=889
        Input split bytes=1170
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=889
        Reduce input records=50
        Reduce output records=5
        Spilled Records=100
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=4456
        CPU time spent (ms)=59400
        Physical memory (bytes) snapshot=15627186176
        Virtual memory (bytes) snapshot=43288719360
        Total committed heap usage (bytes)=16284385280
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1120
    File Output Format Counters
        Bytes Written=77
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:
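As a quick sanity check after the write phase, the data sitting under the benchmark base directory should roughly match what the job reports (about 500 MB here, in line with the HDFS: Number of bytes written counter). A du against the baseDir printed in the job output shows this; the exact subdirectory layout underneath it may differ between Hadoop versions:

# Summarize the space used by the TestDFSIO test files (baseDir comes from the job output above)
hadoop fs -du -h /benchmarks/TestDFSIO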
From the command line, run the following command to test reading 10 input files; the read test reads back the files created by the preceding write run:
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 500
17/05/29 03:30:29 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:30:29 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:30:29 INFO fs.TestDFSIO: nrBytes (MB) = 500.0
17/05/29 03:30:29 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:30:29 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:30:30 INFO fs.TestDFSIO: creating control file: 524288000 bytes, 10 files
17/05/29 03:30:31 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:30:32 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0143
17/05/29 03:30:32 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0143
17/05/29 03:30:32 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0143/
17/05/29 03:30:32 INFO mapreduce.Job: Running job: job_1494832799027_0143
17/05/29 03:30:39 INFO mapreduce.Job: Job job_1494832799027_0143 running in uber mode : false
17/05/29 03:30:39 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:30:47 INFO mapreduce.Job: map 10% reduce 0%
17/05/29 03:30:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:30:54 INFO mapreduce.Job: map 70% reduce 0%
17/05/29 03:30:55 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:30:56 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:30:56 INFO mapreduce.Job: Job job_1494832799027_0143 completed successfully
17/05/29 03:30:56 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=846
        FILE: Number of bytes written=1717691
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=524290290
        HDFS: Number of bytes written=80
        HDFS: Number of read operations=53
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=63451
        Total time spent by all reduces in occupied slots (ms)=9334
        Total time spent by all map tasks (ms)=63451
        Total time spent by all reduce tasks (ms)=4667
        Total vcore-milliseconds taken by all map tasks=63451
        Total vcore-milliseconds taken by all reduce tasks=4667
        Total megabyte-milliseconds taken by all map tasks=129947648
        Total megabyte-milliseconds taken by all reduce tasks=19116032
    Map-Reduce Framework
        Map input records=10
        Map output records=50
        Map output bytes=740
        Map output materialized bytes=900
        Input split bytes=1170
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=900
        Reduce input records=50
        Reduce output records=5
        Spilled Records=100
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=1385
        CPU time spent (ms)=23420
        Physical memory (bytes) snapshot=15370592256
        Virtual memory (bytes) snapshot=43200081920
        Total committed heap usage (bytes)=16409690112
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1120
    File Output Format Counters
        Bytes Written=80
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:
Check the local TestDFSIO_results.log file for the metric details of the tests above. The following is an example:
$ cat TestDFSIO_results.log
----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:
----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:
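Each run appends its summary to this same local file, so once you have done a few write and read passes a quick grep is enough to line the headline numbers up side by side. A minimal sketch, matching the summary lines shown above:

# Pull the headline metrics out of the accumulated results file
grep -E 'TestDFSIO ----- :|Throughput|Average IO rate|Test exec time' TestDFSIO_results.log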
Note: Observe monitoring metrics while running these tests. If there are any issues, review the HDFS and MapReduce logs and tune or adjust the cluster accordingly.
After performing the stress tests, please clean up the benchmark data to avoid unwanted space utilization on your cluster.
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -clean
17/05/29 03:46:03 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:46:03 INFO fs.TestDFSIO: nrFiles = 1
17/05/29 03:46:03 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
17/05/29 03:46:03 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:46:03 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:46:04 INFO fs.TestDFSIO: Cleaning up test files
[s0998dnz@m1.hdp22 ~]$ hadoop fs -ls /benchmarks
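The -clean option removes the /benchmarks/TestDFSIO base directory, so the final hadoop fs -ls above should come back empty or report that the path no longer exists. If anything is left behind, for example after an aborted run, the directory can also be removed by hand; the -skipTrash flag below is optional and simply keeps the test data out of the HDFS trash:

# Manual cleanup in case -clean was skipped or interrupted
hadoop fs -rm -r -skipTrash /benchmarks/TestDFSIO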