Namenode may keep crashing due to excessive logging

  • 1

Namenode may keep crashing due to excessive logging

Namenode may keep crashing even if you restart all services and you have enough heap size. And you see following error in logs.

java.io.IOException: IPC’s epoch 197 is less than the last promised epoch 198

or

2017-09-28 09:16:11,371 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(851)) – Local service NameNode at m1.hdp22 entered state: SERVICE_NOT_RESPONDING 

Root Cause: In my case it was because too much logging was happening in namenode for Blockstatechange and hdfs.statechange. If the logging is constantly occurring nonstop, the NameNode takes time to respond to other rpc requests. Hence we need to increase the NN log level (from INFO to WARN) for certain classes to take some load off the namenode.

Solution: Increased the log level for two classes: Added the below in hdfs log4j using Ambari (Ambari UI > HDFS > Config > Advanced hdfs-log4j)

log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=ERROR