Check for CPU-intensive processes on your server

Category : Bigdata

When you start using your cluster heavily, you may see CPU utilization hit 100% on a specific server. Because many jobs and processes may be running on that server at the time, it can be very hard to identify the culprit process that is causing the issue. It is like finding a needle in a haystack.

I have faced this scenario at work, so I wrote the following script to help you find the culprit. Once you have identified it, you can kill the process or handle it however you see fit. All you have to do is schedule the script in cron.

[hdfs@m1.hdp22 ~]$ cat cpu_Usage.sh

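#!/bin/bash

# Log the top CPU consumers every 10 seconds, 20 times (about 200 seconds in total).

dateTime=$(date +"%Y-%m-%d")

for (( i=1; i <= 20; i++ ))

do

  # --sort=-pcpu orders numerically by CPU usage, highest first; head -5 keeps the header plus the top 4 processes.

  ps -eo pcpu,pid,user,start,etime,args --sort=-pcpu | head -5 >> /hdptmp/Metrics/CPU_Usage_$dateTime.log

  sleep 10

done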
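The snapshots written to the log carry no timestamp of their own, so the sample time has to be inferred from the STARTED and ELAPSED columns. Below is a minimal sketch of one way to add a marker line before each snapshot, assuming bash. The function name `stamp_snapshot` and the `=== ... ===` marker format are hypothetical and not part of the original script; the `--sort=-pcpu` option in the usage comment assumes the procps `ps` found on most Linux systems.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): prepend a
# "=== <timestamp> ===" marker line to whatever snapshot arrives on stdin.
stamp_snapshot() {
    printf '=== %s ===\n' "$(date +"%Y-%m-%d %H:%M:%S")"
    cat
}

# Usage inside the sampling loop (kept as a comment so that sourcing
# this file has no side effects):
#   ps -eo pcpu,pid,user,start,etime,args --sort=-pcpu | head -5 \
#       | stamp_snapshot >> /hdptmp/Metrics/CPU_Usage_$dateTime.log
```

The marker line does not start with a number, so it is easy to filter out again when post-processing the log.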

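Schedule the script in cron as shown below. This entry runs it every day at 11:20 and appends any error output to /hdptmp/error.log: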

[hdfs@m1.hdp22 ~]$ crontab -l

##CPU issue script

20 11 * * * /home/hdfs/cpu_Usage.sh >>/hdptmp/error.log 2>&1

Your output file will look like this:

[hdfs@m1.hdp22 ~]$ cat /hdptmp/Metrics/CPU_Usage_2016-08-30.log

%CPU   PID USER      STARTED     ELAPSED COMMAND

94.5 61100 hdpbatch 11:19:59       00:02 gzip -d 14-prod_2016-08-29.tsv.gz

78.5 60220 hdpbatch 11:19:52       00:09 bzip2 20-mowprod_2016-08-29.tsv

77.2 60221 hdpbatch 11:19:52       00:09 bzip2 21-mowprod_2016-08-29.tsv

77.0 60216 hdpbatch 11:19:52       00:09 bzip2 16-mowprod_2016-08-29.tsv

%CPU   PID USER      STARTED     ELAPSED COMMAND

84.9 60220 hdpbatch 11:19:52       00:19 bzip2 20-mowprod_2016-08-29.tsv

84.9 60216 hdpbatch 11:19:52       00:19 bzip2 16-mowprod_2016-08-29.tsv

84.8 60218 hdpbatch 11:19:52       00:19 bzip2 18-mowprod_2016-08-29.tsv

84.3 60219 hdpbatch 11:19:52       00:19 bzip2 19-mowprod_2016-08-29.tsv

%CPU   PID USER      STARTED     ELAPSED COMMAND

89.0 62082 root     11:20:17       00:05 xz -1 /var/spool/abrt/pyhook-2016-08-30-11:20:10-61697/sosreport-corpadmin-20160830112011.tar

81.7 60220 hdpbatch 11:19:52       00:30 bzip2 20-mowprod_2016-08-29.tsv

81.5 60218 hdpbatch 11:19:52       00:30 bzip2 18-mowprod_2016-08-29.tsv

81.3 60222 hdpbatch 11:19:52       00:30 bzip2 22-mowprod_2016-08-29.tsv

%CPU   PID USER      STARTED     ELAPSED COMMAND

94.0 62886 root     11:20:30       00:02 xz -1 /var/spool/abrt/pyhook-2016-08-30-11:20:22-62093/sosreport-corpadmin-20160830112023.tar

85.1 60218 hdpbatch 11:19:52       00:40 bzip2 18-mowprod_2016-08-29.tsv

85.0 60220 hdpbatch 11:19:52       00:40 bzip2 20-mowprod_2016-08-29.tsv

84.9 60213 hdpbatch 11:19:52       00:40 bzip2 13-mowprod_2016-08-29.tsv

%CPU   PID USER      STARTED     ELAPSED COMMAND

88.5 60220 hdpbatch 11:19:52       00:51 bzip2 20-mowprod_2016-08-29.tsv

88.3 60213 hdpbatch 11:19:52       00:51 bzip2 13-mowprod_2016-08-29.tsv

88.1 60218 hdpbatch 11:19:52       00:51 bzip2 18-mowprod_2016-08-29.tsv

88.0 60214 hdpbatch 11:19:52       00:51 bzip2 14-mowprod_2016-08-29.tsv
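With 20 snapshots per run, reading the log by eye gets tedious. The following is a rough sketch of how the log could be condensed to one line per PID, showing how many snapshots the process appeared in and its highest recorded %CPU. The function name `summarize_cpu_log` is hypothetical. The sketch assumes the column layout shown above, and it guesses that a STARTED value printed as a month and day (for processes older than 24 hours) occupies two fields.

```shell
#!/bin/bash
# Hypothetical helper: summarise a CPU_Usage_*.log file by PID.
# Prints: samples  max%CPU  PID  USER  COMMAND, most frequently seen first.
summarize_cpu_log() {
    awk '
        # Skip header lines and anything else whose first field is not a number.
        $1 !~ /^[0-9]+(\.[0-9]+)?$/ { next }
        {
            pid = $2
            seen[pid]++
            if ($1 + 0 > max[pid]) max[pid] = $1 + 0
            user[pid] = $3
            # Assumption: STARTED is one field (HH:MM:SS) unless it begins
            # with a month name, in which case it takes two fields.
            first = ($4 ~ /^[A-Za-z]+$/) ? 8 : 6
            cmdline = $first
            for (i = first + 1; i <= NF; i++) cmdline = cmdline " " $i
            cmd[pid] = cmdline
        }
        END {
            for (pid in seen)
                printf "%d %.1f %s %s %s\n", seen[pid], max[pid], pid, user[pid], cmd[pid]
        }
    ' "$1" | sort -k1,1nr -k2,2nr
}
```

It would be called as `summarize_cpu_log /hdptmp/Metrics/CPU_Usage_2016-08-30.log`. In the excerpt above, PID 60220 (a bzip2 job) shows up in all five snapshots, so it would be listed first.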

I hope this helps you find the culprit. Please feel free to share feedback or suggest improvements.