Ambari shows all services down though hadoop services running



Category : Bigdata

We have often seen that our Hadoop services are up and running, but when we open Ambari it shows them all as down. In that case the services themselves have no issue; the problem lies with the ambari-agent.

The Ambari server typically learns about service availability from the Ambari agent, which in turn relies on the '*.pid' files created under /var/run.
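This mechanism can be sketched as a small shell function that walks the .pid files in a run directory and reports whether each recorded PID is still alive. This is a simplified sketch, not what the agent literally runs; also note that `kill -0` on another user's process fails, so run it as root, as in the transcripts below.

```shell
# check_pidfiles DIR: for each *.pid file in DIR, report whether the
# recorded PID still maps to a live process (kill -0 probes liveness
# without sending a signal). Run as root so kill -0 is not blocked
# by permissions.
check_pidfiles() {
    for f in "$1"/*.pid; do
        [ -e "$f" ] || continue        # no pid files present
        pid=$(cat "$f")
        if kill -0 "$pid" 2>/dev/null; then
            echo "$f: ALIVE ($pid)"
        else
            echo "$f: STALE ($pid)"
        fi
    done
}

# Example usage against the Ambari agent's default run directory:
check_pidfiles /var/run/ambari-agent
```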

Suspected problem 1:

[root@sandbox ambari-agent]# ambari-agent status

Found ambari-agent PID: 12112

ambari-agent running.

Agent PID at: /var/run/ambari-agent/ambari-agent.pid

Agent out at: /var/log/ambari-agent/ambari-agent.out

Agent log at: /var/log/ambari-agent/ambari-agent.log

Now find the PID in the process table as well and compare, as below:

[root@sandbox ambari-agent]# ps -ef | grep 'ambari_agent'

root     12104     1  0 12:32 pts/0    00:00:00 /usr/bin/python2 /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start

root     12112 12104  6 12:32 pts/0    00:01:28 /usr/bin/python2 /usr/lib/python2.6/site-packages/ambari_agent/main.py start

If the PID of the agent process matches the one in /var/run/ambari-agent/ambari-agent.pid, then the agent process itself is probably fine.
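The pid-file-versus-process comparison can be wrapped in a small helper; the `pgrep` pattern `ambari_agent/main.py` is taken from the `ps` output above:

```shell
# pid_matches PIDFILE PATTERN: succeed when the PID recorded in PIDFILE
# belongs to a live process whose command line matches PATTERN
# (pgrep -f matches against the full command line).
pid_matches() {
    recorded=$(cat "$1" 2>/dev/null) || return 1
    [ -n "$recorded" ] || return 1
    pgrep -f "$2" | grep -qx "$recorded"
}

# Usage against the paths from the transcript above:
if pid_matches /var/run/ambari-agent/ambari-agent.pid 'ambari_agent/main.py'; then
    echo "agent pid file is consistent"
else
    echo "agent pid file is stale or missing"
fi
```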

In that case the issue is likely with /var/lib/ambari-agent/data/structured-out-status.json. cat this file to review its content. Typical content looks like the following:

[root@sandbox ambari-agent]# cat structured-out-status.json

{"processes": [], "securityState": "UNKNOWN"}

or

[root@sandbox ambari-agent]# cat /var/lib/ambari-agent/data/structured-out-status.json

{"processes": [], "securityState": "UNSECURED"}

Compare the content with the same file on another node that is working fine.
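A minimal way to do that comparison, assuming you first copy the file over from a healthy node (the hostname `goodnode` below is a placeholder, not from this article):

```shell
# compare_status FILE1 FILE2: report whether two structured-out-status.json
# dumps have identical content.
compare_status() {
    if diff -q "$1" "$2" >/dev/null; then
        echo "status files match"
    else
        echo "status files differ"
    fi
}

# Hypothetical usage: pull the file from a known-good node first.
#   scp goodnode:/var/lib/ambari-agent/data/structured-out-status.json /tmp/good.json
#   compare_status /tmp/good.json /var/lib/ambari-agent/data/structured-out-status.json
```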

Resolution :

Delete this .json file, restart the ambari-agent, and then verify that the content of the regenerated file matches the expected output shown above:

[root@sandbox ambari-agent]# rm /var/lib/ambari-agent/data/structured-out-status.json

rm: remove regular file `/var/lib/ambari-agent/data/structured-out-status.json'? y

[root@sandbox ambari-agent]# ll /var/lib/ambari-agent/data/structured-out-status.json

ls: cannot access /var/lib/ambari-agent/data/structured-out-status.json: No such file or directory

[root@sandbox ambari-agent]# ambari-agent restart

Restarting ambari-agent

Verifying Python version compatibility…

Using python  /usr/bin/python2

Found ambari-agent PID: 13866

Stopping ambari-agent

Removing PID file at /var/run/ambari-agent/ambari-agent.pid

ambari-agent successfully stopped

Verifying Python version compatibility…

Using python  /usr/bin/python2

Checking for previously running Ambari Agent…

Starting ambari-agent

Verifying ambari-agent process status…

Ambari Agent successfully started

Agent PID at: /var/run/ambari-agent/ambari-agent.pid

Agent out at: /var/log/ambari-agent/ambari-agent.out

Agent log at: /var/log/ambari-agent/ambari-agent.log

[root@sandbox ambari-agent]# ll /var/lib/ambari-agent/data/structured-out-status.json

-rw-r--r-- 1 root root 73 2016-06-29 12:59 /var/lib/ambari-agent/data/structured-out-status.json

[root@sandbox ambari-agent]# cat /var/lib/ambari-agent/data/structured-out-status.json

{"processes": [], "securityState": "UNSECURED"}

Suspected Problem 2: The Ambari agent is fine, but the HDP services are still shown as down

If only a few services are shown as down, the cause could be that the /var/run/PRODUCT/product.pid file does not match the process actually running on the node.

For example, if the HiveServer2 service is shown as down in Ambari while Hive is actually working fine, check the following files:

# cd /var/run/hive
# ls -lrt
-rw-r--r-- 1 hive hadoop 6 Feb 17 07:15 hive.pid
-rw-r--r-- 1 hive hadoop 6 Feb 17 07:16 hive-server.pid

Check the content of these files. For example:

# cat hive-server.pid
31342
# ps -ef | grep 31342
hive 31342 1 0 Feb17 ? 00:14:36 /usr/jdk64/jdk1.7.0_67/bin/java -Xmx1024m -Dhdp.version=2.2.9.0-3393 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.2.9.0-3393 -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.2.9.0-3393/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/2.2.9.0-3393/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -XX:MaxPermSize=512m -Xmx1437m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.2.9.0-3393/hive/lib/hive-service-0.14.0.2.2.9.0-3393.jar org.apache.hive.service.server.HiveServer2 -hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar -hiveconf hive.metastore.uris= -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive

If the content of hive-server.pid and the PID of the running HiveServer2 process don't match, Ambari won't report the status correctly.

Also ensure that these files have the correct ownership and permissions. For example, the pid files for Hive should be owned by hive:hadoop with mode 644. If they are wrong, fix the ownership and permissions and update the file with the correct PID of the Hive process. This will ensure that Ambari shows the status correctly.
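A sketch of that repair, parameterized so the owner and mode are explicit (for Hive they would be hive:hadoop and 644); the `pgrep` pattern in the usage note is an assumption based on the `ps` output shown earlier:

```shell
# fix_pidfile FILE PID OWNER MODE: write PID into FILE and set its
# ownership and permissions. Run as root, as in the transcripts above.
fix_pidfile() {
    echo "$2" > "$1" && chown "$3" "$1" && chmod "$4" "$1"
}

# Hypothetical usage: find the live HiveServer2 PID first, and only
# overwrite the pid file when exactly one healthy process is found.
#   pid=$(pgrep -f 'org.apache.hive.service.server.HiveServer2' | head -n 1)
#   [ -n "$pid" ] && fix_pidfile /var/run/hive/hive-server.pid "$pid" hive:hadoop 644
```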

Take care while doing the above: make sure this is the only HiveServer2 process running on the system and that HiveServer2 is indeed working fine. If there are multiple HiveServer2 processes, some of them could be stray processes that need to be killed.
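To spot stray instances, you can count the processes matching the service's command line; the HiveServer2 class name below is taken from the `ps` output earlier in this article:

```shell
# count_procs PATTERN: number of live processes whose command line matches
# PATTERN. More than one match for HiveServer2 suggests stray instances
# that should be investigated before touching the pid file.
count_procs() {
    pgrep -f "$1" | wc -l
}

# Example: how many HiveServer2 processes are running right now?
count_procs 'org.apache.hive.service.server.HiveServer2'
```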

After this, if possible, restart the affected services as well and confirm that their status is shown correctly.

