Monthly Archives: October 2017


Namenode may keep crashing due to excessive logging

The NameNode may keep crashing even if you restart all services and have enough heap size, and you see errors like the following in its logs:

java.io.IOException: IPC's epoch 197 is less than the last promised epoch 198

or

2017-09-28 09:16:11,371 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(851)) - Local service NameNode at m1.hdp22 entered state: SERVICE_NOT_RESPONDING

Root Cause: In my case it was because too much logging was happening in the NameNode for BlockStateChange and org.apache.hadoop.hdfs.StateChange. When this logging occurs nonstop, the NameNode takes longer to respond to other RPC requests. Hence we need to raise the NameNode log level for these classes from INFO to a higher level (ERROR below) to take some load off the NameNode.

Solution: Raise the log level for the two classes by adding the lines below to the HDFS log4j configuration in Ambari (Ambari UI > HDFS > Configs > Advanced hdfs-log4j):

log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=ERROR
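
To confirm that these two loggers are what is flooding the NameNode log (and to verify the change once Ambari restarts the affected HDFS components so the new log4j settings take effect), a quick line count is usually enough. This is only a rough check; the log path and hostname below assume a typical HDP layout, so adjust them to your environment.

# Count how many StateChange / BlockStateChange messages the NameNode has written
# (both logger names contain "StateChange", so one grep covers both)
grep -c 'StateChange' /var/log/hadoop/hdfs/hadoop-hdfs-namenode-m1.hdp22.log

If this number stops growing rapidly after the change, the NameNode is no longer spending its time on logging these events.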



ERROR : Failed with exception org.apache.hadoop.security.AccessControlException: Permission denied. user=user1 is not the owner of inode=test_copy_1

Users may complain that they are not able to load data into Hive tables via Beeline. While loading data into a Hive table with load data inpath '/tmp/test' into table adodevdb.sample1, they get the following error:
load data inpath '/tmp/test' into table adodevdb.sample1;
INFO : Loading data to table adodevdb.sample1 from hdfs://m1.hdp22/tmp/test
ERROR : Failed with exception org.apache.hadoop.security.AccessControlException: Permission denied. user=user1 is not the owner of inode=test_copy_1
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:250)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:381)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:338)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1908)
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:63)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1824)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:821)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:464)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

 

Root Cause: rollback() is never invoked when the load fails, so this problem has existed from the start. BUG-62311 has been raised for this, and unfortunately there is no fix for now.

Workaround: You can work around it as follows:

Set hive.mv.files.thread=0 (zero) in hive-site.xml.
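
For reference, this is roughly what the entry looks like in hive-site.xml (or as a custom property under Custom hive-site in Ambari); the description text is only illustrative:

<property>
  <name>hive.mv.files.thread</name>
  <value>0</value>
  <description>Workaround for BUG-62311: move files serially instead of using a thread pool during load.</description>
</property>

Restart HiveServer2 after the change so the new value is picked up, then retry the load data inpath statement.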