Monthly Archives: December 2016


Exception in thread “main” org.apache.spark.SparkException: Application

When you run a Python script on top of Hive using spark-submit, it can fail with the following error:

$ spark-submit --master yarn --deploy-mode cluster --queue ado --num-executors 60 --executor-memory 3G --executor-cores 5 --py-files argparse.py,load_iris_2.py --driver-memory 10G load_iris.py -p ado_secure.iris_places -s ado_secure.iris_places_stg -f /user/admin/iris/places/2016-11-30-place.csv

Exception in thread "main" org.apache.spark.SparkException: Application application_1476997468030_142120 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

When I checked the Spark logs, I found the following error.
16/12/22 07:35:49 WARN metadata.Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:193)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:164)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:162)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:415)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:414)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:296)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

Root Cause: 

It can be caused by a known Ambari bug (BUG-56393) and by the way the Spark job was submitted in cluster mode.

Resolutions: 

You can resolve it with the following steps:

  • Add spark.driver.extraJavaOptions=-Dhdp.version={{hdp_full_version}} -XX:MaxPermSize=1024m -XX:PermSize=256m and spark.yarn.am.extraJavaOptions=-Dhdp.version={{hdp_full_version}}, since we suspected the old Ambari bug mentioned above.
  • The custom Python script was being submitted without hive-site.xml, so it could not connect to the Hive metastore. Add --files /etc/spark/conf/hive-site.xml so that the job can connect to the Hive metastore.
  • Add the --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar option to provide the path to the DataNucleus jars. The resulting working command looks like this:
$ spark-submit --master yarn --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-Dhdp.version=2.3.4.0-3485 -XX:MaxPermSize=1024m -XX:PermSize=256m" --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485" --queue ado --executor-memory 3G --executor-cores 5 --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --py-files argparse.py,load_iris_2.py --driver-memory 10G --files /etc/spark/conf/hive-site.xml load_iris.py -p ado_secure.iris_places -s ado_secure.iris_places_stg -f /user/admin/iris/places/2016-11-30-place.csv
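
For reference, below is a minimal sketch of how such a script typically uses Hive from Spark 1.x (this is an illustrative stand-in, not the actual load_iris.py; the table name is taken from the spark-submit command above). It is the HiveContext created here that needs hive-site.xml and the DataNucleus jars shipped with the job:

from pyspark import SparkContext
from pyspark.sql import HiveContext

# Illustrative only: the real load_iris.py is not reproduced in this post.
sc = SparkContext(appName="load_iris")
sqlContext = HiveContext(sc)  # requires hive-site.xml and the DataNucleus jars

# Hypothetical query against the staging table named in the command above.
df = sqlContext.sql("SELECT COUNT(*) FROM ado_secure.iris_places_stg")
df.show()
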
Please feel free to reach out to us in case of any further assistance.


HDFS disk space vs NameNode heap size

In HDFS, data and metadata are decoupled. Data files are split into block files that are stored, and replicated on DataNodes across the cluster. The filesystem namespace tree and associated metadata are stored on the NameNode.

Namespace objects are file inodes and blocks that point to block files on the DataNodes. These namespace objects are stored as a file system image (fsimage) in the NameNode’s memory and also persist locally. Updates to the metadata are written to an edit log. When the NameNode starts, or when a checkpoint is taken, the edits are applied, the log is cleared, and a new fsimage is created.

On DataNodes, data files are measured by disk space consumed—the actual data length—and not necessarily the full block size.

For example, a file that is 192 MB consumes 192 MB of disk space and not some integral multiple of the block size. Using the default block size of 128 MB, a file of 192 MB is split into two block files, one 128 MB file and one 64 MB file. On the NameNode, namespace objects are measured by the number of files and blocks. The same 192 MB file is represented by three namespace objects (1 file inode + 2 blocks) and consumes approximately 450 bytes of memory.

Large files split into fewer blocks generally consume less memory than small files that generate many blocks. One data file of 128 MB is represented by two namespace objects on the NameNode (1 file inode + 1 block) and consumes approximately 300 bytes of memory. By contrast, 128 files of 1 MB each are represented by 256 namespace objects (128 file inodes + 128 blocks) and consume approximately 38,400 bytes. The optimal split size, then, is some integral multiple of the block size, for memory management as well as data locality optimization.

How much memory you actually need depends on your workload, especially on the number of files, directories, and blocks generated in each namespace. If all of your files are split at the block size, you could allocate 1 GB for every million files. But given the historical average of 1.5 blocks per file (2 block objects), a more conservative estimate is 1 GB of memory for every million blocks.

Example 1: Estimating NameNode Heap Memory Used
Alice, Bob, and Carl each have 1 GB (1024 MB) of data on disk, but sliced into differently sized files. Alice and Bob have files that are some integral multiple of the block size and require the least memory. Carl does not and fills the heap with unnecessary namespace objects.

Alice: 1 x 1024 MB file
  • 1 file inode
  • 8 blocks (1024 MB / 128 MB)
  • Total = 9 objects * 150 bytes = 1,350 bytes of heap memory

Bob: 8 x 128 MB files
  • 8 file inodes
  • 8 blocks
  • Total = 16 objects * 150 bytes = 2,400 bytes of heap memory

Carl: 1,024 x 1 MB files
  • 1,024 file inodes
  • 1,024 blocks
  • Total = 2,048 objects * 150 bytes = 307,200 bytes of heap memory
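
The arithmetic above can be reproduced with a short Python sketch (an illustration only, using the same assumptions: a 128 MB block size and roughly 150 bytes of heap per namespace object):

import math

BLOCK_MB = 128
BYTES_PER_OBJECT = 150  # approximate heap cost of one namespace object

def heap_bytes(num_files, file_size_mb):
    # one inode per file plus one namespace object per block
    blocks_per_file = int(math.ceil(float(file_size_mb) / BLOCK_MB))
    objects = num_files * (1 + blocks_per_file)
    return objects, objects * BYTES_PER_OBJECT

for name, files, size_mb in [("Alice", 1, 1024), ("Bob", 8, 128), ("Carl", 1024, 1)]:
    objects, heap = heap_bytes(files, size_mb)
    print("%s: %d objects, about %s bytes of heap" % (name, objects, format(heap, ",")))
# Alice: 9 objects (1,350 bytes), Bob: 16 objects (2,400 bytes), Carl: 2,048 objects (307,200 bytes)
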
Example 2: Estimating NameNode Heap Memory Needed
In this example, memory is estimated by considering the capacity of a cluster. Values are rounded. Both clusters physically store 4800 TB, or approximately 36 million block files (at the default block size). Replication determines how many namespace blocks represent these block files.

Cluster A: 200 hosts of 24 TB each = 4800 TB
  • Blocksize = 128 MB, Replication = 1
  • Cluster capacity in MB: 200 * 24,000,000 MB = 4,800,000,000 MB (4800 TB)
  • Disk space needed per block: 128 MB per block * 1 = 128 MB storage per block
  • Cluster capacity in blocks: 4,800,000,000 MB / 128 MB = 36,000,000 blocks
  • At capacity, with the recommended allocation of 1 GB of memory per million blocks, Cluster A needs 36 GB of maximum heap space.

Cluster B: 200 hosts of 24 TB each = 4800 TB
  • Blocksize = 128 MB, Replication = 3
  • Cluster capacity in MB: 200 * 24,000,000 MB = 4,800,000,000 MB (4800 TB)
  • Disk space needed per block: 128 MB per block * 3 = 384 MB storage per block
  • Cluster capacity in blocks: 4,800,000,000 MB / 384 MB = 12,000,000 blocks
  • At capacity, with the recommended allocation of 1 GB of memory per million blocks, Cluster B needs 12 GB of maximum heap space.

Both Cluster A and Cluster B store the same number of block files. In Cluster A, however, each block file is unique and represented by one block on the NameNode; in Cluster B, only one-third are unique and two-thirds are replicas.
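
The same sizing rule can be scripted. The sketch below reproduces the Example 2 arithmetic; note that the exact quotients are 37,500,000 and 12,500,000 blocks, which the rounded figures above present as roughly 36 and 12 million:

def namenode_heap_gb(hosts, tb_per_host, block_mb=128, replication=1):
    capacity_mb = hosts * tb_per_host * 1000000       # 1 TB taken as 1,000,000 MB, as in the example
    blocks = capacity_mb / float(block_mb * replication)
    return blocks, blocks / 1000000.0                 # rule of thumb: ~1 GB of heap per million blocks

for name, repl in [("Cluster A", 1), ("Cluster B", 3)]:
    blocks, heap = namenode_heap_gb(200, 24, replication=repl)
    print("%s: %.1f million blocks, about %.1f GB of maximum heap" % (name, blocks / 1000000.0, heap))
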



GC pool ‘PS MarkSweep’ had collection(s): count=6 time=26445ms

When you create a table and authorization is enforced through Ranger, the table creation fails and, shortly afterwards, the HiveServer2 process crashes.

0: jdbc:hive2://server1> CREATE EXTERNAL TABLE test (cust_id STRING, ACCOUNT_ID STRING,
 ROLE_ID STRING, ROLE_NAME STRING, START_DATE STRING, END_DATE STRING, PRIORITY STRING, 
ACTIVE_ACCOUNT_ROLE STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n' 
STORED AS TEXTFILE LOCATION '/tmp/testTable' 
TBLPROPERTIES ('serialization.null.format'=''); 
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)


When you check the HiveServer2 logs, you will see a permission denied error:
Caused by: org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException: 
Permission denied: user [saurkuma] does not have [READ] privilege on [hdfs://HDPHA/tmp/testTable]
at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges
(RangerHiveAuthorizer.java:253)

Along with the above errors, hiveserver2.log also shows repetitive GC pauses and subsequently
HiveServer2 service crashes:
2016-11-15 12:39:54,428 WARN [org.apache.hadoop.util.JvmPauseMonitor$Monitor@24197b13]:
util.JvmPauseMonitor (JvmPauseMonitor.java:run(192)) - Detected pause in JVM or host machine 
(eg GC): pause of approximately 24000ms GC pool 'PS MarkSweep' had collection(s): 
count=6 time=26445ms

Root Cause: When the process checks (read or write) permission on the path given in the query, Ranger checks permissions on that directory and all of its children. However, if the directory does not exist, it tries to check the parent directory, then that directory's parent, and so on. Eventually the table creation fails, and at the same time this operation uses too much memory and causes GC pauses.

In this case, Ranger checks for permission on /tmp/<databasename>, and since it does not exist it starts checking /tmp/ and its child directories, causing the GC pauses and the HiveServer2 service crash.

RESOLUTION:
There is no permanent solution for this issue as of now, but the following workaround is available.

WORKAROUND:
Ensure that the storage location specified in the CREATE TABLE statement already exists in HDFS.
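
If it helps, the location can be checked and pre-created before running the statement. A minimal sketch, assuming the hdfs client is on the PATH of the node where it runs, using the LOCATION from the failing statement above:

import subprocess

location = "/tmp/testTable"  # LOCATION clause from the CREATE TABLE statement above

# 'hdfs dfs -test -d' exits with 0 when the directory already exists
if subprocess.call(["hdfs", "dfs", "-test", "-d", location]) != 0:
    subprocess.call(["hdfs", "dfs", "-mkdir", "-p", location])
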


Datanode doesn’t start with error “java.net.BindException: Address already in use”

In many real-world scenarios we have seen the error “java.net.BindException: Address already in use” when starting a DataNode.

You may observe the following symptoms during this issue:

1. Datanode doesn’t start with error saying “address already in use”.
2. “netstat -anp | grep 50010” shows no result.

ROOT CAUSE:
The DataNode needs three ports when it starts, and each port produces a different error message when its address is already in use.

1. Port 50010 is already in use
2016-12-02 00:01:14,056 ERROR datanode.DataNode (DataNode.java:secureMain(2630)) - Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException

2. Port 50075 is already in use
2016-12-01 23:57:57,298 ERROR datanode.DataNode (DataNode.java:secureMain(2630)) - Exception in secureMain
java.net.BindException: Address already in use

3. Port 8010 is already in use
2016-12-02 00:09:40,422 ERROR datanode.DataNode (DataNode.java:secureMain(2630)) - Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:8010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException

Note that there is no port information in datanode.log when port 50075 is already in use. If you need to set a different port for the DataNode service, review the following properties:

dfs.datanode.address : default 50010
dfs.datanode.http.address : default 50075
dfs.datanode.ipc.address : default 8010

RESOLUTION:
Stop or kill the process that is using port 50010, 50075, or 8010, or configure different ports for the DataNode using the properties above.
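
To quickly see which of the three DataNode ports is already bound on a host, a small Python sketch such as the one below can help (the property names and default ports are the ones listed above):

import socket

ports = {"dfs.datanode.address": 50010,
         "dfs.datanode.http.address": 50075,
         "dfs.datanode.ipc.address": 8010}

for prop, port in ports.items():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("0.0.0.0", port))          # succeeds only if nothing is listening on the port
        print("%s (%d): free" % (prop, port))
    except socket.error as err:
        print("%s (%d): already in use -> %s" % (prop, port, err))
    finally:
        s.close()
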

Please feel free to give your valuable feedback.