Run all service checks in bulk

In this blog I explain how you can use the Ambari API to trigger all service checks with a single command.

In order to check the status and stability of any service in your cluster you need to run the service checks that are included in Ambari. Each service usually provides its own service check, and to run one you select the service (e.g. HDFS) in Ambari and click "Run Service Check" in the "Actions" dropdown menu.
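
The same action can also be triggered through the Ambari REST API. As a minimal sketch (using the m1.hdp22 host and HDPTST cluster from the example below; admin:admin is a placeholder for your own credentials), a single HDFS service check can be started like this:

curl -u admin:admin -H "X-Requested-By: ambari" -X POST -d '{"RequestInfo":{"context":"HDFS Service Check","command":"HDFS_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"HDFS"}]}' http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests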

But it is tedious to run every service check one by one when you have many services, so I created a script that uses the Ambari API to start all available service checks in one go. You only need to pass the required parameters for your environment.

Example:

[s0998dnz@m1.hdp22 ~]$ ./run_all_service_checks.sh
Enter Ambari server name : m1.hdp22
Enter Ambari admin's User Name: saurkuma
Enter Password for saurkuma : 
Your cluster name is: HDPTST
There are following running services :
FALCON
FLUME
HBASE
HDFS
HIVE
KAFKA
KNOX
MAHOUT
MAPREDUCE2
OOZIE
PIG
RANGER
RANGER_KMS
SLIDER
SPARK
SQOOP
STORM
TEZ
YARN
ZOOKEEPER
{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1315",
  "Requests" : {
    "id" : 1315,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1316",
  "Requests" : {
    "id" : 1316,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1317",
  "Requests" : {
    "id" : 1317,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1318",
  "Requests" : {
    "id" : 1318,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1319",
  "Requests" : {
    "id" : 1319,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1320",
  "Requests" : {
    "id" : 1320,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1321",
  "Requests" : {
    "id" : 1321,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1322",
  "Requests" : {
    "id" : 1322,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1323",
  "Requests" : {
    "id" : 1323,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1324",
  "Requests" : {
    "id" : 1324,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1325",
  "Requests" : {
    "id" : 1325,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1326",
  "Requests" : {
    "id" : 1326,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1327",
  "Requests" : {
    "id" : 1327,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1328",
  "Requests" : {
    "id" : 1328,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1329",
  "Requests" : {
    "id" : 1329,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1330",
  "Requests" : {
    "id" : 1330,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1331",
  "Requests" : {
    "id" : 1331,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1332",
  "Requests" : {
    "id" : 1332,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1333",
  "Requests" : {
    "id" : 1333,
    "status" : "Accepted"
  }
}{
  "href" : "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1334",
  "Requests" : {
    "id" : 1334,
    "status" : "Accepted"
  }
}

Now if you log in to your Ambari server you will see all the service checks running. You may be wondering which script does this magic, so here it is. Use it and enjoy.

[s0998dnz@m1.hdp22 ~]$ cat run_all_service_checks.sh 
#!/usr/bin/env bash
###########################
## Saurabh Singh ###
### Version 1.0 ####
###########################
echo -n "Enter Ambari server name : "
read "server"
AMBARI_HOST=$server
echo -n "Enter Ambari admin's User Name: "
read "user"
echo -n "Enter Password for $user : "
read -s "pwd"
LOGIN=$user
PASSWORD=$pwd
# Optionally override LOGIN/PASSWORD from a local config file
if [ -e "$HOME/.ambari_login" ]; then
    . "$HOME/.ambari_login"
fi

cluster_name=$(curl -s -u $LOGIN:$PASSWORD --insecure "http://$AMBARI_HOST:8080/api/v1/clusters"  | python -mjson.tool | perl -ne '/"cluster_name":.*?"(.*?)"/ && print "$1\n"')
if [ -z "$cluster_name" ]; then
    exit
fi
echo -e "\nYour cluster name is: $cluster_name"

running_services=$(curl -s -u $LOGIN:$PASSWORD --insecure "http://$AMBARI_HOST:8080/api/v1/clusters/$cluster_name/services?fields=ServiceInfo/service_name&ServiceInfo/maintenance_state=OFF" | python -mjson.tool | perl -ne '/"service_name":.*?"(.*?)"/ && print "$1\n"')
if [ -z "$running_services" ]; then
    exit
fi
echo "There are following running services :
$running_services"

post_body=
for s in $running_services; do
    if [ "$s" == "ZOOKEEPER" ]; then
        post_body="{\"RequestInfo\":{\"context\":\"$s Service Check\",\"command\":\"${s}_QUORUM_SERVICE_CHECK\"},\"Requests/resource_filters\":[{\"service_name\":\"$s\"}]}"

    else
        post_body="{\"RequestInfo\":{\"context\":\"$s Service Check\",\"command\":\"${s}_SERVICE_CHECK\"},\"Requests/resource_filters\":[{\"service_name\":\"$s\"}]}"
    fi
    curl -s -u $LOGIN:$PASSWORD --insecure -H "X-Requested-By:X-Requested-By" -X POST --data "$post_body"  "http://$AMBARI_HOST:8080/api/v1/clusters/$cluster_name/requests"
done
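
If you want to follow the progress of a triggered check without opening the UI, you can poll the request id returned in the response (a sketch; request id 1315 comes from the sample output above and admin:admin is a placeholder for your Ambari credentials):

curl -s -u admin:admin "http://m1.hdp22:8080/api/v1/clusters/HDPTST/requests/1315?fields=Requests/request_status,Requests/progress_percent"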

As always, I welcome your valuable feedback and suggestions.


Enable Debug mode in beeline

Sometimes you have to troubleshoot a Beeline issue and wonder how to get into debug mode in the Beeline shell, the way you can with the Hive CLI (-hiveconf hive.root.logger=DEBUG,console). That option does not work with Beeline.
So don't worry: the following steps will help you, and the good part is that you do not need to restart HiveServer2.

Step 1: Log in to your server and check whether you have a beeline-log4j.properties file in /etc/hive/conf/. If not, create it from the bundled template.

[s0998dnz@m1.hdp22 ~]$ ll /etc/hive/conf/beeline-log4j.properties
ls: cannot access /etc/hive/conf/beeline-log4j.properties: No such file or directory
[s0998dnz@m1.hdp22 ~]$ ll /etc/hive/conf/beeline-log4j.properties.template
-rw-r--r-- 1 root root 1139 Nov 19  2014 /etc/hive/conf/beeline-log4j.properties.template
[s0998dnz@m1.hdp22 ~]$ cp /etc/hive/conf/beeline-log4j.properties.template /etc/hive/conf/beeline-log4j.properties
cp: cannot create regular file `/etc/hive/conf/beeline-log4j.properties': Permission denied
[s0998dnz@m1.hdp22 ~]$ sudo su - hive
[hive@m1.hdp22 ~]$ cp /etc/hive/conf/beeline-log4j.properties.template /etc/hive/conf/beeline-log4j.properties
[hive@m1.hdp22 ~]$ ll /etc/hive/conf/beeline-log4j.properties
-rw-r--r-- 1 hive hadoop 1139 May 31 03:34 /etc/hive/conf/beeline-log4j.properties
[hive@m1.hdp22 ~]$ cat /etc/hive/conf/beeline-log4j.properties
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

log4j.rootLogger=WARN, console

######## console appender ########
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} [%t]: %p %c{2}: %m%n
log4j.appender.console.encoding=UTF-8
[hive@m1.hdp22 ~]$

Step 2: Now open /etc/hive/conf/beeline-log4j.properties and change log4j.rootLogger from WARN/INFO to DEBUG, console.
log4j.rootLogger=DEBUG, console

Save the changes, run the Beeline client, and the debug output will be displayed on the console.
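
For example (a sketch; the JDBC URL and user are illustrative, and since the console appender writes to System.err the debug output can be redirected to a file):

[hive@m1.hdp22 ~]$ beeline -u "jdbc:hive2://m1.hdp22:10000/default" -n saurkuma -e "show databases;" 2> beeline-debug.log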

Please feel free to give your valuable feedback or suggestion.


hadoop cluster Benchmarking and Stress Testing

After we install a cluster we should do some benchmarking or stress testing. In this article I explain the built-in TestDFSIO benchmark, which helps you perform stress testing on your configured cluster.

The Hadoop distribution comes with a number of benchmarks, which are bundled in hadoop-*test*.jar and hadoop-*examples*.jar.

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*test*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar' chosen.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-*example*.jar
Unknown program '/usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-examples.jar' chosen.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

The TestDFSIO benchmark is a read and write test for HDFS. It is helpful for tasks such as stress testing HDFS, to discover performance bottlenecks in your network, to shake out the hardware, OS and Hadoop setup of your cluster machines (particularly the NameNode and the DataNodes) and to give you a first impression of how fast your cluster is in terms of I/O.

From the command line, run the following command to test writing 10 output files of 50 MB each, for a total of 500 MB:

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 50
17/05/29 03:29:19 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:29:19 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:29:19 INFO fs.TestDFSIO: nrBytes (MB) = 50.0
17/05/29 03:29:19 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:29:19 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:29:21 INFO fs.TestDFSIO: creating control file: 52428800 bytes, 10 files
17/05/29 03:29:23 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:29:23 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:29:23 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:29:23 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:29:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0142
17/05/29 03:29:24 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0142
17/05/29 03:29:24 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0142/
17/05/29 03:29:24 INFO mapreduce.Job: Running job: job_1494832799027_0142
17/05/29 03:29:31 INFO mapreduce.Job: Job job_1494832799027_0142 running in uber mode : false
17/05/29 03:29:31 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:29:46 INFO mapreduce.Job: map 30% reduce 0%
17/05/29 03:29:47 INFO mapreduce.Job: map 50% reduce 0%
17/05/29 03:29:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:29:51 INFO mapreduce.Job: map 80% reduce 0%
17/05/29 03:29:52 INFO mapreduce.Job: map 90% reduce 0%
17/05/29 03:29:53 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:29:54 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:29:54 INFO mapreduce.Job: Job job_1494832799027_0142 completed successfully
17/05/29 03:29:55 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=835
FILE: Number of bytes written=1717691
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2290
HDFS: Number of bytes written=524288077
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=103814
Total time spent by all reduces in occupied slots (ms)=7846
Total time spent by all map tasks (ms)=103814
Total time spent by all reduce tasks (ms)=3923
Total vcore-milliseconds taken by all map tasks=103814
Total vcore-milliseconds taken by all reduce tasks=3923
Total megabyte-milliseconds taken by all map tasks=212611072
Total megabyte-milliseconds taken by all reduce tasks=16068608
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=729
Map output materialized bytes=889
Input split bytes=1170
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=889
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=4456
CPU time spent (ms)=59400
Physical memory (bytes) snapshot=15627186176
Virtual memory (bytes) snapshot=43288719360
Total committed heap usage (bytes)=16284385280
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=77
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:

From the command line, run the following command to test reading 10 input files of size 500MB:

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 500
17/05/29 03:30:29 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:30:29 INFO fs.TestDFSIO: nrFiles = 10
17/05/29 03:30:29 INFO fs.TestDFSIO: nrBytes (MB) = 500.0
17/05/29 03:30:29 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:30:29 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:30:30 INFO fs.TestDFSIO: creating control file: 524288000 bytes, 10 files
17/05/29 03:30:31 INFO fs.TestDFSIO: created control files for: 10 files
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.AHSProxy: Connecting to Application History server at m2.hdp22/172.29.90.11:10200
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/05/29 03:30:32 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/05/29 03:30:32 INFO mapred.FileInputFormat: Total input paths to process : 10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: number of splits:10
17/05/29 03:30:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494832799027_0143
17/05/29 03:30:32 INFO impl.YarnClientImpl: Submitted application application_1494832799027_0143
17/05/29 03:30:32 INFO mapreduce.Job: The url to track the job: http://m2.hdp22:8088/proxy/application_1494832799027_0143/
17/05/29 03:30:32 INFO mapreduce.Job: Running job: job_1494832799027_0143
17/05/29 03:30:39 INFO mapreduce.Job: Job job_1494832799027_0143 running in uber mode : false
17/05/29 03:30:39 INFO mapreduce.Job: map 0% reduce 0%
17/05/29 03:30:47 INFO mapreduce.Job: map 10% reduce 0%
17/05/29 03:30:48 INFO mapreduce.Job: map 60% reduce 0%
17/05/29 03:30:54 INFO mapreduce.Job: map 70% reduce 0%
17/05/29 03:30:55 INFO mapreduce.Job: map 100% reduce 0%
17/05/29 03:30:56 INFO mapreduce.Job: map 100% reduce 100%
17/05/29 03:30:56 INFO mapreduce.Job: Job job_1494832799027_0143 completed successfully
17/05/29 03:30:56 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=846
FILE: Number of bytes written=1717691
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=524290290
HDFS: Number of bytes written=80
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=63451
Total time spent by all reduces in occupied slots (ms)=9334
Total time spent by all map tasks (ms)=63451
Total time spent by all reduce tasks (ms)=4667
Total vcore-milliseconds taken by all map tasks=63451
Total vcore-milliseconds taken by all reduce tasks=4667
Total megabyte-milliseconds taken by all map tasks=129947648
Total megabyte-milliseconds taken by all reduce tasks=19116032
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=740
Map output materialized bytes=900
Input split bytes=1170
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=900
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1385
CPU time spent (ms)=23420
Physical memory (bytes) snapshot=15370592256
Virtual memory (bytes) snapshot=43200081920
Total committed heap usage (bytes)=16409690112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=80
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:

Check the local TestDFSIO_results.log file for the metric details of the tests above. The following is an example:

$ cat TestDFSIO_results.log
----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/05/29 03:29:55 INFO fs.TestDFSIO: Date & time: Mon May 29 03:29:55 EDT 2017
17/05/29 03:29:55 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:29:55 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:29:55 INFO fs.TestDFSIO: Throughput mb/sec: 50.73566717402334
17/05/29 03:29:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.77006149291992
17/05/29 03:29:55 INFO fs.TestDFSIO: IO rate std deviation: 11.648531487475152
17/05/29 03:29:55 INFO fs.TestDFSIO: Test exec time sec: 31.779
17/05/29 03:29:55 INFO fs.TestDFSIO:

----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/05/29 03:30:56 INFO fs.TestDFSIO: Date & time: Mon May 29 03:30:56 EDT 2017
17/05/29 03:30:56 INFO fs.TestDFSIO: Number of files: 10
17/05/29 03:30:56 INFO fs.TestDFSIO: Total MBytes processed: 500.0
17/05/29 03:30:56 INFO fs.TestDFSIO: Throughput mb/sec: 1945.5252918287938
17/05/29 03:30:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 1950.8646240234375
17/05/29 03:30:56 INFO fs.TestDFSIO: IO rate std deviation: 102.10763308338827
17/05/29 03:30:56 INFO fs.TestDFSIO: Test exec time sec: 24.621
17/05/29 03:30:56 INFO fs.TestDFSIO:

Note: Observe monitoring metrics while running these tests. If there are any issues, review the HDFS and MapReduce logs and tune or adjust the cluster accordingly.

After performing the stress test, clean up to avoid unnecessary space utilization on your cluster.

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8-tests.jar TestDFSIO -clean
17/05/29 03:46:03 INFO fs.TestDFSIO: TestDFSIO.1.8
17/05/29 03:46:03 INFO fs.TestDFSIO: nrFiles = 1
17/05/29 03:46:03 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
17/05/29 03:46:03 INFO fs.TestDFSIO: bufferSize = 1000000
17/05/29 03:46:03 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/05/29 03:46:04 INFO fs.TestDFSIO: Cleaning up test files
[s0998dnz@m1.hdp22 ~]$ hadoop fs -ls /benchmarks
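
Beyond TestDFSIO, the examples jar listed above also ships teragen/terasort/teravalidate, which exercise HDFS and MapReduce together. A minimal sketch (teragen writes 100-byte rows, so 100000000 rows is roughly 10 GB; the output paths are illustrative):

[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 100000000 /benchmarks/teragen
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /benchmarks/teragen /benchmarks/terasort
[s0998dnz@m1.hdp22 ~]$ hadoop jar /usr/hdp/2.6.0.3-8/hadoop-mapreduce/hadoop-mapreduce-examples.jar teravalidate /benchmarks/terasort /benchmarks/teravalidate

Remember to remove these output directories afterwards, just like the TestDFSIO data.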

Atlas Metadata Server error HTTP 503 response from http://localhost:21000/api/atlas/admin/status in 0.000s (HTTP Error 503: Service Unavailable)

If you are not able to access your Atlas portal, or you see the following error in your browser or logs:

HTTP 503 response from http://localhost:21000/api/atlas/admin/status in 0.000s (HTTP Error 503: Service Unavailable)

then check the application.log file under /var/log/atlas. If you see the following error in the logs, do not worry; follow the given steps and you will resolve it easily.

Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'userService': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private org.apache.atlas.web.dao.UserDao org.apache.atlas.web.service.UserService.userDao; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'userDao': Invocation of init method failed; nested exception is java.lang.RuntimeException: org.apache.atlas.AtlasException: /usr/hdp/current/atlas-server/conf/users-credentials.properties not found in file system or as class loader resource

or

/usr/hdp/current/atlas-server/conf/policy-store.txt not found in file system or as class loader resource

Resolution: 

Step 1: Log in as the atlas user (or sudo to atlas), then go to the /usr/hdp/current/atlas-server/conf/ directory and create these files.

[s0998dnz@m1 ~]$ sudo su - atlas

[atlas@m1 ~]$ cd /usr/hdp/current/atlas-server/conf/

[atlas@m1 conf]$ touch users-credentials.properties

[atlas@m1 conf]$ touch policy-store.txt

Step 2: Now update the users-credentials.properties file according to your requirement. The format is "username=group::sha256-password".
For example, in my case I have the following:

admin=ADMIN::e7cf3ef4f17c3999a94f2c6f612e8a888e5b1026878e4e19398b23bd38ec221a

The user's group can be ADMIN, DATA_STEWARD or DATA_SCIENTIST.

Note: the password is encoded with sha256 and can be generated with the sha256sum unix tool. For example:

echo -n "Password" | sha256sum
e7cf3ef4f17c3999a94f2c6f612e8a888e5b1026878e4e19398b23bd38ec221a  -
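
If you prefer, the whole credentials line can be generated in one step (a sketch; replace admin, ADMIN and Password with your own user, group and password):

echo "admin=ADMIN::$(echo -n 'Password' | sha256sum | awk '{print $1}')" >> /usr/hdp/current/atlas-server/conf/users-credentials.properties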

And policy-store.txt should have the following values.

The policy store file format is as follows:
Policy_Name;;User_Name:Operations_Allowed;;Group_Name:Operations_Allowed;;Resource_Type:Resource_Name

Example from my policy file:

adminPolicy;;admin:rwud;;ROLE_ADMIN:rwud;;type:*,entity:*,operation:*,taxonomy:*,term:*

Now restart Atlas and you should be good to go.


extend your VirtualBox image size

When you first use your HDP sandbox in VirtualBox, it assigns 20 GB of your hard disk to the sandbox by default. Sooner or later that is not enough and you will want to extend it. This article will help you extend your VirtualBox disk size.

Step 1: Right-click the virtual machine whose disk you want to extend, click Settings, and go to Storage.

Step 2: Click the + symbol next to "Controller: SATA" and choose "Create New Disk".

Step 3: Select "VDI (Virtual Disk Image)" and continue.

Step 4: Select dynamic allocation and continue.

Step 5: Choose the size you would like to add (e.g. 10 GB) and click Create.

You will now see one more SATA disk with the name you provided.

Now start the VM and log in to the shell to perform the following steps:

 

[root@m1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_m1-lv_root
                       18G  9.7G  6.7G  60% /
tmpfs                 939M     0  939M   0% /dev/shm
/dev/sda1             477M   25M  427M   6% /boot
[root@m1 ~]# sfdisk -s
/dev/sda:  20971520
/dev/sdb:  11104256
/dev/mapper/vg_m1-lv_root:  18358272
/dev/mapper/vg_m1-lv_swap:   2097152
total: 52531200 blocks
[root@m1 ~]# vgextend vg_m1 /dev/sdb
  Physical volume "/dev/sdb" successfully created
  Volume group "vg_m1" successfully extended
[root@m1 ~]# lvextend -L +10G -r /dev/mapper/vg_m1-lv_root
  Size of logical volume vg_m1/lv_root changed from 17.51 GiB (4482 extents) to 27.51 GiB (7042 extents).
  Logical volume lv_root successfully resized
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_m1-lv_root is mounted on /; on-line resizing required
old desc_blocks = 2, new_desc_blocks = 2
Performing an on-line resize of /dev/mapper/vg_m1-lv_root to 7211008 (4k) blocks.
The filesystem on /dev/mapper/vg_m1-lv_root is now 7211008 blocks long.

[root@m1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_m1-lv_root
                       27G  9.7G   16G  38% /
tmpfs                 939M     0  939M   0% /dev/shm
/dev/sda1             477M   25M  427M   6% /boot
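
To double-check the result, you can list the volume group and logical volume sizes (a quick sanity check, using the same vg_m1 names as above):

[root@m1 ~]# vgs vg_m1
[root@m1 ~]# lvs vg_m1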


Could not create http connection to jdbc:hive2:HTTP Response code: 413 (state=08S01,code=0)

If you are using HiveServer2 in HTTP transport mode, the authentication information is sent as part of the HTTP headers. The above error occurs when the default header buffer size is in use and is too small for the request, which is common when Kerberos is used because the authentication tokens are large.

This is a known issue and a bug (https://issues.apache.org/jira/browse/HIVE-11720) has been raised to be addressed in a future release.

Workaround:

To resolve this issue, set a bigger HTTP header size in the HiveServer2 configuration (typically both the request and the response header size):
hive.server2.thrift.http.request.header.size = 32768
hive.server2.thrift.http.response.header.size = 32768
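
After changing the properties and restarting HiveServer2, you can re-test the connection in HTTP transport mode; a sketch (the host, port 10001 and httpPath=cliservice are common defaults and may differ in your cluster):

beeline -u "jdbc:hive2://m1.hdp22:10001/default;transportMode=http;httpPath=cliservice" -n saurkuma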


Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions

If you try to connect to the Phoenix server from HBase, or you run some service checks, and you face the following error, do not worry; relax, because here you will find the solution to this problem.

Error :

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/phoenix/phoenix-4.4.0.2.3.4.0-3485-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=saurkuma, scope=default:SYSTEM.CATALOG, params=[table=default:SYSTEM.CATALOG],action=CREATE)
at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:433)
at org.apache.hadoop.hbase.security.access.AccessController.preGetTableDescriptors(AccessController.java:2447)
at org.apache.hadoop.hbase.master.MasterCoprocessorHost$75.call(MasterCoprocessorHost.java:896)

org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=saurkuma, scope=default:SYSTEM.CATALOG, params=[table=default:SYSTEM.CATALOG],action=CREATE)

Root cause: You are getting an access control exception because, by default, the user is not allowed to create a table.

Resolution: Grant either global-level or namespace-level privileges to the desired user so that the user can create tables. Out of the box, only the hbase user has permission to grant permissions, so you have to log in as the hbase user.

[s0998dnz@m1.hdp22 ~]$ sudo su - hbase
Last login: Wed Apr 5 14:52:41 EDT 2017
[hbase@m1.hdp22 ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.3.4.0-3485, ra2b7fd38f4dcca6eb81f99b5d37b2ea3beeef09e, Wed Dec 16 03:49:28 UTC 2015
hbase(main):001:0> grant 'saurkuma', 'RWCA'
0 row(s) in 1.3310 seconds

hbase(main):002:0> grant 'ambari-qa', 'RWCA'
0 row(s) in 0.0830 seconds
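
You can also review the existing grants, or scope the privilege to a single namespace instead of granting it globally (a sketch; '@default' refers to the default namespace shown in the error above):

hbase(main):003:0> user_permission '.*'
hbase(main):004:0> grant 'saurkuma', 'RWCA', '@default'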

Please feel free to give your valuable feedback or suggestion.


Ambari is showing “Add Service Wizard in Progress” or “Move Master Wizard In Progress”

If you are using Ambari 2.4.1 or 2.4.2 you may see the following message on your Ambari page, and the "Service Actions" option to restart or do anything to a service will be missing.

Root cause: This happens when more than one Ambari admin user is present. If one admin user, say "admin1", clicks "Actions" => "Add Service" in the Ambari UI and then does nothing, the other logged-in admin users will keep seeing the "Add Service Wizard in Progress" blinking bar in the Ambari UI, and the "Service Actions" dropdown will not be visible to them. This is also captured in the following JIRA:

https://issues.apache.org/jira/browse/AMBARI-18932

Solution: Take an Ambari server backup and run the command below from the Ambari node.

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d 
'{"wizard-data":"{\"userName\":\"admin\",\"controllerName\":\"addServiceController\"}"}' 
http://<ambari_server_name>.lowes.com:8080/api/v1/persist

or for “Move Master Wizard In Progress” message do the following :

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d 
'{"wizard-data":"{\"userName\":\"admin\",\"controllerName\":\"moveMasterController\"}"}' 
http://<ambari_server_name>.lowes.com:8080/api/v1/persist

 

Note: Replace the following values as per your cluster config.
userName = the user for which you are facing the issue
ambari_server_name = hostname of the Ambari node
controllerName = name of the controller for which you are making the request.

 

There are a few things to be aware of:

1. This resolves the issue for the currently logged-in user, but other users will keep seeing the "Add Service Wizard in Progress" label until the current user logs off and logs back in, at which point that user is directed to the Add Service wizard and can cancel it. After that the issue goes away for all other Ambari users.

2. As said above, after executing the API call, if the current user signs out and signs back in, the user will be redirected directly to the Add Service wizard.

Please feel free to give your valuable feedback or suggestion.


java.lang.IllegalArgumentException: stream exceeds limit [2,048]

When we run an Oozie job with an SSH action and use capture-output, it may fail with the following error.

java.lang.IllegalArgumentException: stream exceeds limit [2,048]
at org.apache.oozie.util.IOUtils.getReaderAsString(IOUtils.java:84)
at org.apache.oozie.servlet.CallbackServlet.doPost(CallbackServlet.java:117)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:304)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)

Root cause: The value of the oozie.servlet.CallbackServlet.max.data.len property in oozie-site.xml is too small. In this case it was set to 2048 bytes, which wasn't sufficient for the captured output.
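
You can confirm the value currently in effect on the Oozie server before changing it (a quick check, assuming the default HDP config location):

grep -A 1 "oozie.servlet.CallbackServlet.max.data.len" /etc/oozie/conf/oozie-site.xml
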
Resolution:

Option 1: If you are using an Oozie version before 4.2, add the following property to oozie-site.xml, or add it to the custom oozie-site via Ambari:

<property>
 <name>oozie.servlet.CallbackServlet.max.data.len</name>
 <value>16000</value>
</property>

Option 2: If you are using an Oozie version after 4.2, add the following property to oozie-site.xml, or add it to the custom oozie-site via Ambari:

<property>
 <name>oozie.action.max.output.data</name>
 <value>16000</value>
</property>

 

I hope this helped you; feel free to give your valuable feedback and suggestions.


hadoop snapshots

HDFS snapshots protect important enterprise data sets from user or application errors. They are read-only point-in-time copies of the file system and can be taken on a subtree of the file system or on the entire file system.

To demonstrate the functionality of snapshots, we will create a directory in HDFS, take a snapshot of it, and remove a file from the directory. Then we will show how to recover the file from the snapshot.

First we will list all the snapshottable directories where the current user has permission to take snapshots.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir

Here we notice that there is no snapshottable directory yet.

So now let's create a demo directory and then take a snapshot of it.

[hdfs@m1 ~]$ hdfs dfs -mkdir /tmp/snapshot_demo
[hdfs@m1 ~]$ touch demo.txt
[hdfs@m1 ~]$ hadoop fs -put demo.txt  /tmp/snapshot_demo/
[hdfs@m1 ~]$ hdfs dfsadmin -allowSnapshot /tmp/snapshot_demo
Allowing snaphot on /tmp/snapshot_demo succeeded

Now if you check the list of snapshottable directories, you should see at least the snapshot_demo directory created above.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2017-03-30 03:31 0 65536 /tmp/snapshot_demo

Now let's create a snapshot of /tmp/snapshot_demo and then check whether it was created.

[hdfs@m1 ~]$ hdfs dfs -createSnapshot /tmp/snapshot_demo
Created snapshot /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2017-03-30 03:32 /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt 

Now let's accidentally delete this snapshottable directory or a file inside it.

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo
rm: The directory /tmp/snapshot_demo cannot be deleted since /tmp/snapshot_demo is snapshottable and already has snapshots
[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/demo.txt
Deleted /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/

Oops... surprisingly or not, the file was removed! What a bad day! What a horrible accident! Do not worry too much, however: we can recover this file because we have a snapshot!

[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt
[hdfs@m1 ~]$ hadoop fs -cp /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt /tmp/snapshot_demo/
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:35 /tmp/snapshot_demo/demo.txt

This will restore the lost set of files to the working data set.

Also, because snapshots are read-only, HDFS protects the snapshot data itself against deletion by users or applications; you cannot remove files from under .snapshot with rm. The following operation will fail:

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/.snapshot/s20170330-033236.441
rm: Modification on a read-only snapshot is disallowed
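
When a snapshot is genuinely no longer needed, remove it with the dedicated snapshot commands instead (shown against the same demo directory and snapshot name used above):

[hdfs@m1 ~]$ hdfs dfs -deleteSnapshot /tmp/snapshot_demo s20170330-033236.441
[hdfs@m1 ~]$ hdfs dfsadmin -disallowSnapshot /tmp/snapshot_demo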

I hope this helped you understand snapshots; feel free to give your valuable feedback or suggestions.