Check high CPU-intensive processes on your server
Category: Bigdata
When you start using your cluster heavily, you may find that a specific server is running at 100% CPU. Because many jobs and processes are running on that server at the same time, it can be very hard to identify the culprit process causing the issue. It is like finding a needle in a haystack.
I faced this exact scenario at work, so don't worry: the following script will help you find the culprit, and then you can kill it or deal with it however you want. It takes 20 snapshots, 10 seconds apart, of the four processes using the most CPU and appends them to a dated log file. All you have to do is schedule it in cron.
[hdfs@m1.hdp22 ~]$ cat cpu_Usage.sh
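#!/bin/bash
# Take 20 snapshots of the top CPU consumers, 10 seconds apart
dateTime=$(date +"%Y-%m-%d")
mkdir -p /hdptmp/Metrics
for (( i=1; i <= 20; i++ ))
do
  # --sort=-pcpu sorts numerically, highest %CPU first, and keeps the header line on top
  ps -eo pcpu,pid,user,start,etime,args --sort=-pcpu | head -5 >> /hdptmp/Metrics/CPU_Usage_"$dateTime".log
  sleep 10
done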
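Sorting matters here. A plain `sort -r` compares %CPU values as text rather than as numbers, so a process at 9.5% ranks above one at 84.9%, and `head` can then cut off the real culprit. The short comparison below illustrates this with three made-up readings; it forces the C locale so the result does not depend on the server's language settings. Sorting numerically (`sort -nr`, or letting ps sort with `--sort=-pcpu`) avoids the problem.

```shell
#!/bin/bash
# Three made-up %CPU readings, one per line
samples='84.9
9.5
10.2'

# LC_ALL=C forces byte-wise comparison so the result does not depend on the locale
text_order=$(printf '%s\n' "$samples" | LC_ALL=C sort -r -k 1)
numeric_order=$(printf '%s\n' "$samples" | LC_ALL=C sort -nr -k 1)

# Print each ordering on one line for easy comparison
echo "sort -r : $(printf '%s\n' "$text_order" | paste -s -d ' ' -)"    # sort -r : 9.5 84.9 10.2
echo "sort -nr: $(printf '%s\n' "$numeric_order" | paste -s -d ' ' -)" # sort -nr: 84.9 10.2 9.5
```

Text order puts 9.5 first because the character "9" sorts after "8" and "1"; numeric order ranks the values correctly.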
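Make the script executable (chmod +x /home/hdfs/cpu_Usage.sh) and schedule it in cron. One run takes a little over 200 seconds (20 samples, 10 seconds apart), so the entry below, which fires every day at 11:20, covers roughly 11:20 to 11:23. Set the time to when the CPU spike usually occurs. Anything the script prints, including errors, is appended to /hdptmp/error.log: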
[hdfs@m1.hdp22 ~]$ crontab -l
##CPU issue script
20 11 * * * /home/hdfs/cpu_Usage.sh >>/hdptmp/error.log 2>&1
Your output file will look like this:
[hdfs@m1.hdp22 ~]$ cat /hdptmp/Metrics/CPU_Usage_2016-08-30.log
%CPU PID USER STARTED ELAPSED COMMAND
94.5 61100 hdpbatch 11:19:59 00:02 gzip -d 14-prod_2016-08-29.tsv.gz
78.5 60220 hdpbatch 11:19:52 00:09 bzip2 20-mowprod_2016-08-29.tsv
77.2 60221 hdpbatch 11:19:52 00:09 bzip2 21-mowprod_2016-08-29.tsv
77.0 60216 hdpbatch 11:19:52 00:09 bzip2 16-mowprod_2016-08-29.tsv
%CPU PID USER STARTED ELAPSED COMMAND
84.9 60220 hdpbatch 11:19:52 00:19 bzip2 20-mowprod_2016-08-29.tsv
84.9 60216 hdpbatch 11:19:52 00:19 bzip2 16-mowprod_2016-08-29.tsv
84.8 60218 hdpbatch 11:19:52 00:19 bzip2 18-mowprod_2016-08-29.tsv
84.3 60219 hdpbatch 11:19:52 00:19 bzip2 19-mowprod_2016-08-29.tsv
%CPU PID USER STARTED ELAPSED COMMAND
89.0 62082 root 11:20:17 00:05 xz -1 /var/spool/abrt/pyhook-2016-08-30-11:20:10-61697/sosreport-corpadmin-20160830112011.tar
81.7 60220 hdpbatch 11:19:52 00:30 bzip2 20-mowprod_2016-08-29.tsv
81.5 60218 hdpbatch 11:19:52 00:30 bzip2 18-mowprod_2016-08-29.tsv
81.3 60222 hdpbatch 11:19:52 00:30 bzip2 22-mowprod_2016-08-29.tsv
%CPU PID USER STARTED ELAPSED COMMAND
94.0 62886 root 11:20:30 00:02 xz -1 /var/spool/abrt/pyhook-2016-08-30-11:20:22-62093/sosreport-corpadmin-20160830112023.tar
85.1 60218 hdpbatch 11:19:52 00:40 bzip2 18-mowprod_2016-08-29.tsv
85.0 60220 hdpbatch 11:19:52 00:40 bzip2 20-mowprod_2016-08-29.tsv
84.9 60213 hdpbatch 11:19:52 00:40 bzip2 13-mowprod_2016-08-29.tsv
%CPU PID USER STARTED ELAPSED COMMAND
88.5 60220 hdpbatch 11:19:52 00:51 bzip2 20-mowprod_2016-08-29.tsv
88.3 60213 hdpbatch 11:19:52 00:51 bzip2 13-mowprod_2016-08-29.tsv
88.1 60218 hdpbatch 11:19:52 00:51 bzip2 18-mowprod_2016-08-29.tsv
88.0 60214 hdpbatch 11:19:52 00:51 bzip2 14-mowprod_2016-08-29.tsv
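With 20 snapshots in the log, the culprit is usually the process that keeps showing up. The function below is a hypothetical helper (`summarize_cpu_log` is my own name, not a standard tool) that sketches one way to summarize a log in the format above with awk: for each PID it counts the snapshots it appears in and averages its %CPU, then lists the most persistent, busiest processes first. It assumes the column order used by the script (`pcpu,pid,user,start,etime,args`) and treats a STARTED value that begins with a three-letter month as two fields, because ps prints `Mmm dd` instead of `HH:MM:SS` for processes started more than 24 hours earlier. Keep in mind that ps reports %CPU as the CPU time a process has used divided by the time it has been running, so it is a lifetime average rather than an instantaneous reading.

```shell
#!/bin/bash
# summarize_cpu_log (hypothetical helper): summarize a CPU_Usage_*.log file per PID.
# Output columns: snapshots seen, average %CPU, PID, user, command.
summarize_cpu_log() {
  LC_ALL=C awk '
    $1 == "%CPU" { next }        # skip the header ps writes before every snapshot
    NF < 6 { next }              # skip blank or truncated lines
    {
      pid = $2
      seen[pid]++
      total[pid] += $1
      owner[pid] = $3
      # STARTED is HH:MM:SS (one field) or "Mmm dd" (two fields) for older processes
      first = ($4 ~ /^[A-Z][a-z][a-z]$/) ? 7 : 6
      cmd = $first
      for (i = first + 1; i <= NF; i++) cmd = cmd " " $i
      command[pid] = cmd
    }
    END {
      for (pid in seen)
        printf "%d %.1f %s %s %s\n", seen[pid], total[pid] / seen[pid], pid, owner[pid], command[pid]
    }
  ' "$1" | LC_ALL=C sort -k1,1nr -k2,2nr   # most snapshots first, then highest average
}
```

For example, `summarize_cpu_log /hdptmp/Metrics/CPU_Usage_2016-08-30.log | head` would list the top candidates for that day. Ranking by the number of snapshots first favors processes that stay busy for the whole sampling window over short-lived spikes.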
I hope this helps you find the culprit. Please feel free to give your feedback for any improvement.