Hadoop Admin most lovable commands
Category : Bigdata
If you are working on Hadoop and want to inspect or control your cluster, the following commands should come in handy. In this article I have tried to explain a few commands that will help you a lot with your day-to-day work.
1. hdfs dfsadmin -report : It gives you a summarized view of your Hadoop cluster, such as total size, live nodes, and their utilization.
[hdfs@m1]$ hdfs dfsadmin -report
Configured Capacity: 51886964736 (48.32 GB)
Present Capacity: 27887029262 (25.97 GB)
DFS Remaining: 24417319950 (22.74 GB)
DFS Used: 3469709312 (3.23 GB)
DFS Used%: 12.44%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 2
-------------------------------
Live datanodes (3):
-------------------------------
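For scripting, the report output is easy to parse. A minimal sketch (the sample line below stands in for live `hdfs dfsadmin -report` output, and the 80% threshold is an arbitrary example):

```shell
# Parse "DFS Used%" out of a dfsadmin -report line and warn above a threshold.
# The sample line is copied from the report above; on a live cluster use:
#   hdfs dfsadmin -report | grep 'DFS Used%'
report='DFS Used%: 12.44%'
used=$(printf '%s\n' "$report" | awk -F': ' '{sub("%","",$2); print $2}')
awk -v u="$used" 'BEGIN{ if (u+0 > 80) print "WARN: DFS usage high"; else print "DFS usage OK: " u "%" }'
```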
2. hdfs dfsadmin -safemode get|enter|leave : It tells you whether your NameNode (NN) is in safe mode or not. If the NN is in safe mode, you can use the leave option with the main command to bring it out.
[hdfs@m1]$ hdfs dfsadmin -safemode get
Safe mode is OFF in m1.hdp22/192.168.56.41:8020
Safe mode is OFF in m2.hdp22/192.168.56.42:8020
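Because the example above is an HA pair, a script should check both NameNodes. A minimal sketch, with sample get output embedded so it runs without a cluster (one NN is shown as ON purely for illustration):

```shell
# Count NameNodes reporting safe mode ON; sample output stands in for:
#   hdfs dfsadmin -safemode get
state='Safe mode is OFF in m1.hdp22/192.168.56.41:8020
Safe mode is ON in m2.hdp22/192.168.56.42:8020'
on=$(printf '%s\n' "$state" | grep -c 'Safe mode is ON')
echo "NameNodes in safe mode: $on"
# On a live cluster you would then run: hdfs dfsadmin -safemode leave
# (or 'hdfs dfsadmin -safemode wait' to block until safe mode turns OFF)
```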
3. hadoop version : It tells you which Hadoop version you are using:
[hdfs@m1]$ hadoop version
Hadoop 2.7.1.2.3.4.0-3485
Subversion git@github.com:hortonworks/hadoop.git -r ef0582ca14b8177a3cbb6376807545272677d730
Compiled by jenkins on 2015-12-16T03:01Z
Compiled with protoc 2.5.0
From source with checksum cf48a4c63aaec76a714c1897e2ba8be6
This command was run using /usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485.jar
4. hadoop classpath : This command prints your Hadoop classpath, which helps you locate the Hadoop JARs and the required libraries:
[hdfs@m1 ~]$ hadoop classpath
/usr/hdp/2.3.4.0-3485/hadoop/conf:/usr/hdp/2.3.4.0-3485/hadoop/lib/*:/usr/hdp/2.3.4.0-3485/hadoop/.//*:/usr/hdp/2.3.4.0-3485/hadoop-hdfs/./:/usr/hdp/2.3.4.0-3485/hadoop-hdfs/lib/*:/usr/hdp/2.3.4.0-3485/hadoop-hdfs/.//*:/usr/hdp/2.3.4.0-3485/hadoop-yarn/lib/*:/usr/hdp/2.3.4.0-3485/hadoop-yarn/.//*:/usr/hdp/2.3.4.0-3485/hadoop-mapreduce/lib/*:/usr/hdp/2.3.4.0-3485/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/hdp/2.3.4.0-3485/tez/*:/usr/hdp/2.3.4.0-3485/tez/lib/*:/usr/hdp/2.3.4.0-3485/tez/conf
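A common use of this is in scripts, e.g. `java -cp "$(hadoop classpath)" SomeTool`. As a small illustration of working with the value (a shortened sample of the classpath above; on a live node set `cp=$(hadoop classpath)` instead):

```shell
# Split the classpath into one entry per line and count the wildcard entries.
cp='/usr/hdp/2.3.4.0-3485/hadoop/conf:/usr/hdp/2.3.4.0-3485/hadoop/lib/*:/usr/hdp/2.3.4.0-3485/hadoop/.//*'
n=$(printf '%s\n' "$cp" | tr ':' '\n' | grep -c '\*$')
echo "$n wildcard entries"
```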
5. hadoop queue : This command gives you information about your YARN queues:
Usage: hadoop queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]
[hdfs@m1 ~]$ hadoop queue -list
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
16/08/09 05:44:35 INFO impl.TimelineClientImpl: Timeline service address: http://m2.hdp22:8188/ws/v1/timeline/
16/08/09 05:44:36 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
======================
Queue Name : batch
Queue State : running
Scheduling Info : Capacity: 30.000002, MaximumCapacity: 60.000004, CurrentCapacity: 0.0
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 30.000002, MaximumCapacity: 90.0, CurrentCapacity: 0.0
======================
Queue Name : user
Queue State : running
Scheduling Info : Capacity: 40.0, MaximumCapacity: 40.0, CurrentCapacity: 0.0
======================
Queue Name : ado
Queue State : running
Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0
======================
Queue Name : aodp
Queue State : running
Scheduling Info : Capacity: 40.0, MaximumCapacity: 40.0, CurrentCapacity: 0.0
======================
Queue Name : di
Queue State : running
Scheduling Info : Capacity: 20.0, MaximumCapacity: 23.0, CurrentCapacity: 0.0
Or you can get information about a specific queue.
[hdfs@m1 ~]$ hadoop queue -info ado
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
16/08/09 05:49:14 INFO impl.TimelineClientImpl: Timeline service address: http://m2.hdp22:8188/ws/v1/timeline/
16/08/09 05:49:15 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
======================
Queue Name : ado
Queue State : running
Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0
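The Scheduling Info lines are handy for monitoring. A sketch that pulls the configured capacity out of one such line (copied from the output above; on a cluster, pipe `hadoop queue -info ado` into it):

```shell
# Extract the configured Capacity value from a queue's scheduling-info line.
info='Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0'
cap=$(printf '%s\n' "$info" | sed -n 's/.* : Capacity: \([0-9.]*\),.*/\1/p')
echo "ado capacity: ${cap}%"
```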
6. mapred job -kill <job_id> : It helps you kill a running MapReduce job:
mapred job -kill job_1462173172032_31967
Or you can kill a running application with the following command:
yarn application -kill application_1462173172032_31967
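To kill in bulk, you can generate kill commands from `yarn application -list` output. A sketch using one sample listing line (the application ID is from the example above; the other columns are illustrative); pipe the result to sh on a real cluster to execute it:

```shell
# Turn a RUNNING application's listing line into a kill command.
line='application_1462173172032_31967  myjob  MAPREDUCE  hdfs  batch  RUNNING'
cmd=$(printf '%s\n' "$line" | awk '/RUNNING/{print "yarn application -kill", $1}')
echo "$cmd"
```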
7. hadoop distcp : It helps you copy files or directories recursively within a cluster or from one cluster to another:
[hdfs@m1 ~]$ hadoop distcp hdfs://HDPINFHA/user/s0998dnz/input.txt hdfs://HDPTSTHA/tmp/
Note: HDPINFHA and HDPTSTHA are the NameNode high-availability (nameservice) IDs of the two clusters.
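distcp also takes the usual flags of the standard tool, e.g. -update (copy only changed files), -overwrite, -m (number of map tasks), and -p (preserve file attributes). A sketch using the nameservice IDs from the example above (the paths are illustrative); the commands are built as strings so the snippet runs anywhere:

```shell
# Illustrative distcp variants; run the strings directly on a real cluster.
cmd='hadoop distcp -update -p hdfs://HDPINFHA/user/s0998dnz hdfs://HDPTSTHA/user/s0998dnz'
echo "$cmd"
cmd2='hadoop distcp -overwrite -m 20 hdfs://HDPINFHA/tmp/data hdfs://HDPTSTHA/tmp/data'
echo "$cmd2"
```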
8. hadoop archive -archiveName <your_archive_name>.har -p <parent_path> <dir_to_archive> <destination> : This will help you archive your HDFS files into a Hadoop archive (HAR).
[hdfs@m1 ~]$ hadoop archive -archiveName testing.har -p /user saurabh /test
It runs a MapReduce job and archives your directory.
[hdfs@m1 ~]$ hadoop fs -ls /test/
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2016-08-09 06:09 /test/testing.har
If you want to list the contents of an archive, you cannot read it with the normal ls command. You have to use -lsr (i.e. ls -R) as below:
[hdfs@m1 ~]$ hadoop fs -lsr /test/testing.har
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 3 hdfs hdfs 0 2016-08-09 06:09 /test/testing.har/_SUCCESS
-rw-r--r-- 5 hdfs hdfs 565 2016-08-09 06:09 /test/testing.har/_index
-rw-r--r-- 5 hdfs hdfs 23 2016-08-09 06:09 /test/testing.har/_masterindex
-rw-r--r-- 3 hdfs hdfs 20710951 2016-08-09 06:09 /test/testing.har/part-0
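To actually read files inside the archive, HDFS exposes it through the har:// filesystem scheme rather than requiring you to unpack it. A sketch assuming the testing.har created above (the file name input.txt is hypothetical); the commands are built as strings so the snippet runs without a cluster:

```shell
# Paths inside a .har are addressed with the har:// scheme.
ls_cmd='hadoop fs -ls har:///test/testing.har/saurabh'
cat_cmd='hadoop fs -cat har:///test/testing.har/saurabh/input.txt'
echo "$ls_cmd"
echo "$cat_cmd"
```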
9. hadoop fsck / : The fsck command is used to check the health of the HDFS file system. Different arguments can be passed to it to produce different results.
[hdfs@m1 ~]$ hadoop fsck /
Connecting to namenode via http://m1.hdp22:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /192.168.56.41 for path / at Tue Aug 09 06:23:02 EDT 2016
……………………………………………………………………………………….
…………………………………………………………………………..Status: HEALTHY
Total size: 1161798713 B (Total open files size: 2242 B)
Total dirs: 11729
Total files: 1086
Total symlinks: 0 (Files currently being written: 4)
Total blocks (validated): 1056 (avg. block size 1100188 B) (Total open file blocks (not validated): 4)
Minimally replicated blocks: 1056 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 4 (0.37878788 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9734848
Corrupt blocks: 0
Missing replicas: 18 (0.569981 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Aug 09 06:23:05 EDT 2016 in 2764 milliseconds
The filesystem under path '/' is HEALTHY
10. hadoop fsck / -files : It displays all the files in HDFS while checking.
11. hadoop fsck / -files -blocks : It displays the blocks of each file while checking.
12. hadoop fsck / -files -blocks -locations : It displays the block locations of each file while checking.
13. hadoop fsck / -files -blocks -locations -racks : It displays the network topology (racks) of the DataNode locations.
14. hadoop fsck / -delete : This command deletes the corrupted files in HDFS.
15. hadoop fsck / -move : This command moves the corrupted files to the /lost+found directory in HDFS.
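The fsck summary is easy to wire into a cron health check. A minimal sketch (the summary line is copied from the run above and stands in for live `hadoop fsck /` output):

```shell
# Flag the cluster state from the last line of an fsck run.
summary="The filesystem under path '/' is HEALTHY"
if printf '%s\n' "$summary" | grep -q 'is HEALTHY'; then
  status='HDFS OK'
else
  status='HDFS NOT HEALTHY - run hadoop fsck / -list-corruptfileblocks'
fi
echo "$status"
```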
16. hdfs dfsadmin -metasave file_name.txt : This command saves the NameNode's primary data structures to the named file in the directory given by the hadoop.log.dir property on the NameNode host (not in HDFS).
17. hdfs dfsadmin -refreshNodes : This command re-reads the include/exclude host files, refreshing the set of DataNodes that are allowed to connect to the NameNode.
18. hadoop fs -count -q /mydir : Checks the quota set on the specified directory or file.
19. hdfs dfsadmin -setSpaceQuota 10M /mydir : This command sets the space quota for a particular directory. Here we set the quota to 10 MB and can then verify it with hadoop fs -count -q /mydir.
20. hdfs dfsadmin -clrSpaceQuota /mydir : This command clears the space quota allocated to a particular directory in HDFS. Here we clear the quota we previously set and check it again.
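For monitoring, the -count -q columns can be picked apart with awk: the columns are name quota, remaining name quota, space quota, remaining space quota, directory count, file count, content size, and path. A sketch with an illustrative sample line (a 10 MB space quota, as set above):

```shell
# Pull the space-quota columns out of a 'hadoop fs -count -q' output line.
line='none  inf  10485760  10475520  2  1  10240  /mydir'
space=$(printf '%s\n' "$line" | awk '{print $3}')
remaining=$(printf '%s\n' "$line" | awk '{print $4}')
echo "space quota: $space bytes, remaining: $remaining bytes"
```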
I hope all the above commands help you control your cluster. Please feel free to give your feedback.