Big Data

Every digital process and social media exchange produces it!

Hadoop

Businesses can start thinking big again when it comes to Hadoop!

Hadoop Admin

You can’t lead your troops if your troops do not trust you!

Latest Blog

Do you think file format matters in Big Data technology?

Yes, it matters a lot, for the following main reasons. By using the correct file format for your use case you can achieve the following. 1. Less storage: if we select a proper file format with a compatible compression technique, it requires less storage. 2. Faster processing of data: based on our use case, if
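As a quick illustration of the storage point, here is a minimal, hypothetical Hive sketch (table names are made up; the warehouse path assumes a default HDP layout) that stores the same data as ORC with ZLIB compression and compares the on-disk footprint:

hive> CREATE TABLE sales_orc (id INT, amount DOUBLE) STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");
hive> INSERT INTO TABLE sales_orc SELECT id, amount FROM sales_text;
$ hdfs dfs -du -s -h /apps/hive/warehouse/sales_orc /apps/hive/warehouse/sales_text   # the ORC copy is usually much smaller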

Read More

Install and configure Spark History Server (SHS) on Kubernetes (K8s)

We often struggle with how to install and configure SHS on Kubernetes with a GCS event log, so here is your solution. Create a shs-gcs.yaml deployment file, which will be used to deploy the SHS service. pvc: enablePVC: false existingClaimName: nfs-pvc eventsDir: "/" nfs: enableExampleNFS: false pvName: nfs-pv pvcName: nfs-pvc gcs: enableGCS: true secret: history-secrets key:

Read More

Install Airflow on your local MacBook

****************************** Step 1 ***************************** Create a new airflow directory anywhere on your laptop (base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % cd ~/Documents (base) saurabhkumar@Saurabhs-MacBook-Pro Documents % mkdir airflow-tutorial (base) saurabhkumar@Saurabhs-MacBook-Pro Documents % cd airflow-tutorial ************************** Step 2 ******************************* Create a Python virtual env (base) saurabhkumar@Saurabhs-MacBook-Pro airflow-tutorial % conda create --name airflow-tutorial1 python=3.7 Collecting package metadata (current_repodata.json): done

Read More

Google Container Registry (GCR) with Minikube or K8s

When you use Google Container Registry (GCR) and see the dreaded ImagePullBackOff status on your pods in Minikube/K8s, this article can help you solve that error. Error: (base) saurabhkumar@Saurabhs-MacBook-Pro ~ % kubectl describe pod airflow-postgres-694899d6fd-lqp2c -n airflow Events: Type Reason Age From Message —- —— —- —- ——- Normal Scheduled 56s default-scheduler

Read More

Insert overwrite query Failed with exception Unable to move source

If you have explicitly set hive.exec.stagingdir to some location like /tmp/ or elsewhere, then whenever you run an insert overwrite statement you will get the following error. ERROR exec.Task (SessionState.java:printError(989)) – Failed with exception Unable to move source hdfs://clustername/apps/finance/nest/nest_audit_log_final/.hive-staging_hive_2017-12-12_19-15-30_008_33149322272174981-1/-ext-10000 to destination hdfs://clustername/apps/finance/nest/nest_audit_log_final Example: INSERT OVERWRITE TABLE nest.nest_audit_log_final SELECT project_name , application , module_seq_num ,

Read More

last access time of a table is showing zero

If you have many hundreds or thousands of tables and you want to know when your Hive table was last accessed, you can run the following MySQL query under the hive database. mysql> use hive; mysql> select TBL_NAME,LAST_ACCESS_TIME from TBLS where DB_ID=<db_id>; +—————————————————————————————————-+——————+ | TBL_NAME | LAST_ACCESS_TIME | +—————————————————————————————————-+——————+ | df_nov_4 | 0 |
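If you do not know the DB_ID up front, a variant of the same metastore query can look it up by database name (a sketch against the standard TBLS/DBS metastore tables; verify the schema on your metastore version):

mysql> use hive;
mysql> SELECT d.NAME, t.TBL_NAME, t.LAST_ACCESS_TIME FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID WHERE d.NAME = 'default';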

Read More

kill hive query where application id was not created

Sometimes when you run Hive queries, the query does not launch an application or gets hung due to resource constraints or some other reason. In this case you have to kill the query to resubmit it. Please use the following steps to kill the Hive query itself. hive> select * from table1; Query ID = mapr_201804547_2ad87f0f5627

Read More

Purging history/old data in oozie database

After some period of time your Oozie DB will grow large, and it may start throwing space issues or showing slowness during Oozie UI loads. There are some properties which will help you purge your Oozie data, but sometimes the Oozie purge service does not function as expected. It results in a

Read More

Attempt to add *.jar multiple times to the distributed cache

When we submit a Spark2 action via Oozie, we may see the following exception in the logs and the job will fail: exception: Attempt to add (hdfs://m1:8020/user/oozie/share/lib/lib_20171129113304/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache. java.lang.IllegalArgumentException: Attempt to add (hdfs://m1:8020/user/oozie/share/lib/lib_20171129113304/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache. The above error occurs because the same jar file exists in both (/user/oozie/share/lib/lib_20171129113304/oozie/ and /user/oozie/share/lib/lib_20171129113304/spark2/) the
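A common workaround (a hedged sketch, not necessarily this article's exact fix) is to drop the duplicate jar from one of the two sharelib directories and refresh the sharelib; the paths below are taken from the error above, and the Oozie host is a placeholder:

$ hadoop fs -rm /user/oozie/share/lib/lib_20171129113304/spark2/aws-java-sdk-core-1.10.6.jar
$ oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate   # reload the sharelib without restarting Oozie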

Read More

hive jdbc in zeppelin throwing permission error to anonymous user

When users run a Hive query in Zeppelin via the JDBC interpreter, it runs as an anonymous user rather than the actual user. INFO [2017-11-02 03:18:20,405] ({pool-2-thread-2} RemoteInterpreter.java[pushAngularObjectRegistryToRemote]:546) – Push local angular object registry from ZeppelinServer to remote interpreter group 2CNQZ1ES5:shared_process WARN [2017-11-02 03:18:21,825] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2058) – Job 20171031-075630_2029577092 is finished, status: ERROR, exception: null, result:

Read More

Namenode may keep crashing due to excessive logging

The NameNode may keep crashing even if you restart all services and have enough heap, and you see the following errors in the logs. java.io.IOException: IPC’s epoch 197 is less than the last promised epoch 198 or 2017-09-28 09:16:11,371 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(851)) – Local service NameNode at m1.hdp22 entered state: SERVICE_NOT_RESPONDING Root Cause: In my case

Read More

ERROR : Failed with exception org.apache.hadoop.security.AccessControlException: Permission denied. user=user1 is not the owner of inode=test_copy_1

Users may complain that they are not able to load data into Hive tables via Beeline. While loading data into a Hive table using load data inpath ‘/tmp/test’ into table sampledb.sample1, they get the following error: load data inpath ‘/tmp/test’ into table adodevdb.sample1; INFO : Loading data to table adodevdb.sample1 from hdfs://m1.hdp22/tmp/test ERROR : Failed with
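Since the error says user1 is not the owner of the inode, one typical first check (a sketch, not necessarily the article's full resolution) is to make the loading user the owner of the source path before re-running the load:

$ hdfs dfs -chown -R user1:hdfs /tmp/test
$ # then re-run: load data inpath '/tmp/test' into table adodevdb.sample1;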

Read More

Select returns no rows with the MR execution engine but returns rows with Tez via Beeline

After setting hive.execution.engine=mr, a select statement returns no rows in Beeline, but when I run it with Tez it returns results. 0: jdbc:hive2://m1.hdp22:10001/default> select * from test_db.table1 limit 25; +————————+————————-+————————-+—————————+—————————+—————————+————————-+————————-+————————-+——————————-+————————-+–+ | cus_id  | prx_nme  | fir_nme  | mid_1_nme  | mid_2_nme  | mid_3_nme 

Read More

Knox is not starting, failing with error Gateway SSL Certificate is Expired

When you try to start Knox and it fails with the following error, don’t worry; this article will help you solve the problem. INFO hadoop.gateway (JettySSLService.java: logAndValidateCertificate(122)) – The Gateway SSL certificate is valid between:  FATAL hadoop.gateway (GatewayServer.java:main (120)) – Failed to start gateway: org.apache.hadoop.gateway.services. ServiceLifecycleException: Gateway SSL Certificate is Expired. Root cause: It

Read More

Hive metastore critical alerts with ExecutionFailed: Execution of ‘export HIVE_CONF_DIR=’/usr/hdp/current/hive-metastore/conf

When you install and configure Atlas, you may see the following alert in the Ambari Hive service. Once you check the alert details, you will see the following error: Metastore on m1.hdp22 failed (Traceback (most recent call last): File “/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py”, line 200, in execute timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE, File “/usr/lib/python2.6/site-packages/resource_management/core/base.py”, line 155, in __init__ self.env.run() File “/usr/lib/python2.6/site-packages/resource_management/core/environment.py”,

Read More

Sqoop import is failing after enabling atlas with ERROR security.InMemoryJAASConfiguration: Unable to add JAAS configuration

When you run a Sqoop import against Teradata or MySQL/Oracle, it might fail after installing and enabling Atlas in your cluster with the following error. 17/08/10 04:31:56 ERROR security.InMemoryJAASConfiguration: Unable to add JAAS configuration for client [KafkaClient] as it is missing param [atlas.jaas.KafkaClient.loginModuleName]. Skipping JAAS config for [KafkaClient] 17/08/10 04:31:58 INFO checking on the exit code

Read More

/usr/hdp/2.6.1.0-129/atlas/hook-bin/import-hive.sh is failing with Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes

When you have installed Atlas on top of your cluster and you want to sync your Hive data to Atlas via the following method, you may see the following error some time (~20-30 minutes) after running your command. [hive@m1.hdp22 ~]$ export HADOOP_CLASSPATH=`hadoop classpath` [hive@m1.hdp22 ~]$ export HIVE_CONF_DIR=/etc/hive/conf [hive@m1.hdp22 ~]$ /usr/hdp/2.6.1.0-129/atlas/hook-bin/import-hive.sh Using Hive configuration directory [/etc/hive/conf] Log file for

Read More

Spark job runs successfully in client mode but fails in cluster mode

You may build a PySpark application which runs successfully in both local and yarn-client modes; however, when you try to run it in cluster mode, you may receive the following errors: Error 1: Exception: (“You must build Spark with Hive. Export ‘SPARK_HIVE=true’ and run build/sbt assembly”, Py4JJavaError(u’An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n’, JavaObject id=o52))

Read More

Unable to view OS host information in the Ambari dashboard (No Data Available)

On the Ambari dashboard, the Memory Usage, Network Usage, CPU Usage and Cluster Load information are missing. The dashboard displays the following error: Root Cause: This issue occurs when there are some temporary files present in the AMS collector folder. Solution: You need to stop the AMS service via Ambari and then remove all temp files.

Read More

Beeline java.lang.OutOfMemoryError: Requested array size exceeds VM limit

When we run Beeline jobs very heavily, we can sometimes see the following error: Root Cause: By default, the history file is located under ~/.beeline/history for the user facing this issue, and Beeline will load the latest 500 rows into memory. If those queries are very big, containing lots of characters, it
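A quick hedged workaround while troubleshooting is to archive (or truncate) the oversized history file before launching Beeline; the JDBC URL below is illustrative:

$ mv ~/.beeline/history ~/.beeline/history.bak    # or: > ~/.beeline/history to truncate in place
$ beeline -u "jdbc:hive2://<hs2-host>:10000/default"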

Read More

Run all service checks in bulk

In this blog I try to explain how you can use the Ambari API to trigger all service checks with a single command. In order to check the status and stability of any service in your cluster you need to run the service checks that are included in Ambari. Usually each service provides its own
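For a single service, the request looks roughly like this (a sketch using the standard Ambari REST pattern; host, credentials and cluster name are placeholders), which presumably gets looped over all services:

$ curl -u admin:admin -H "X-Requested-By:ambari" -X POST \
  -d '{"RequestInfo":{"context":"HDFS Service Check","command":"HDFS_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"HDFS"}]}' \
  http://<ambari-host>:8080/api/v1/clusters/<cluster_name>/requests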

Read More

Enable Debug mode in beeline

Sometimes you have to troubleshoot a Beeline issue and wonder how to get into debug mode for the Beeline shell, as you can in Hive (-hiveconf hive.root.logger=DEBUG,console). The same is not going to work with Beeline, so don’t worry: the following steps will help you, and the good part is you do not need

Read More

hadoop cluster Benchmarking and Stress Testing

When we install a cluster, we should do some benchmarking or stress testing. In this article I explain the inbuilt TestDFSIO functionality, which will help you perform stress testing on your configured cluster. The Hadoop distribution comes with a number of benchmarks, which are bundled in hadoop-*test*.jar and hadoop-*examples*.jar. The TestDFSIO benchmark is
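A typical TestDFSIO write/read cycle looks like this (a sketch; the exact jar path and version vary by distribution, so adjust to your install):

$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -clean   # remove the benchmark files afterwards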

Read More

Atlas Metadata Server error HTTP 503 response from http://localhost:21000/api/atlas/admin/status in 0.000s (HTTP Error 503: Service Unavailable)

In case you are not able to access your Atlas portal, or you see the following error in your browser or logs: HTTP 503 response from http://localhost:21000/api/atlas/admin/status in 0.000s (HTTP Error 503: Service Unavailable) Then please check the application.log file in /var/log/atlas, and if you see the following error in the logs then do not worry; following the given

Read More

extend your VirtualBox image size

When you first use your HDP sandbox in VirtualBox, by default it assigns 20GB of your hard disk to the sandbox. Later this may not be enough and you will want to extend the size. This article will help you extend your VirtualBox image size. Step 1: Right click

Read More

Could not create http connection to jdbc:hive2:HTTP Response code: 413 (state=08S01,code=0)

If you are using HiveServer2 in HTTP transport mode, the authentication information is sent as part of the HTTP headers. The above error occurs when the default buffer size is in effect and the HTTP header size is insufficient, particularly when Kerberos is used. This is a known issue and a bug (https://issues.apache.org/jira/browse/HIVE-11720) has been raised

Read More

Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions

If you try to connect to the Phoenix server from HBase, or you run some service checks, and you face the following error, do not worry; relax, as here you will find the solution to this problem. Error : SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/phoenix/phoenix-4.4.0.2.3.4.0-3485-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in

Read More

Ambari is showing “Add Service Wizard in Progress” or “Move Master Wizard In Progress”

If you are using Ambari 2.4.1 or 2.4.2, you may see the following message on your Ambari page, and you will not get any “Service Action” option to restart or do anything to any service. Root Cause: If more than one Ambari admin user is present, and one of the admin users

Read More

java.lang.IllegalArgumentException: stream exceeds limit [2,048]

When we run an Oozie job with an SSH action and we use capture-output, it may fail with the following error. java.lang.IllegalArgumentException: stream exceeds limit [2,048] at org.apache.oozie.util.IOUtils.getReaderAsString(IOUtils.java:84) at org.apache.oozie.servlet.CallbackServlet.doPost(CallbackServlet.java:117) at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:304) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:86) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

Read More

hadoop snapshots

HDFS snapshots protect important enterprise data sets from user or application errors. HDFS snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are: To demonstrate the functionality of snapshots, we will create a directory in HDFS, will create

Read More

SSH action with Oozie

When you want to run your shell script via Oozie, the following article will help you do the job in an easy way. You need the following steps to set up an Oozie workflow using the ssh-action: 1. Configure job.properties Example: 2. Configure workflow.xml Example: 3. Write a sample sampletest.sh script Example: 4. Upload workflow.xml to ${appPath} defined in job.properties

Read More

How to remove header from csv during loading to hive

Sometimes we have a header in our data file and we do not want that header loaded into our Hive table, or we want to ignore the header; this article will help you. [saurkuma@m1 ~]$ cat sampledata.csv id,Name 1,Saurabh 2,Vishal 3,Jeba 4,Sonu Step 1: Create a table with table properties to ignore it. hive>
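The table property in question is likely skip.header.line.count (a standard Hive feature); a minimal sketch with the sample file above (the local path is assumed from the prompt shown):

hive> CREATE TABLE sample (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES ("skip.header.line.count"="1");
hive> LOAD DATA LOCAL INPATH '/home/saurkuma/sampledata.csv' INTO TABLE sample;
hive> SELECT * FROM sample;   -- the id,Name header row is skipped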

Read More

Insert date into hive tables shows null during select

When we create a table on files (CSV or any other format) and load data into the Hive table, we may see that select queries return NULL values. You can solve it in the following ways: [saurkuma@m1 ~]$ ll total 584 -rw-r--r-- 1 saurkuma saurkuma 591414 Mar 16 02:31 SalesData01.csv [saurkuma@m1

Read More

Useful Unix commands

Sometimes we need a user who can do everything on our server, as root does. So we may do the following: create a new user with the same privileges as root, or grant the same privileges to an existing user as root. Case 1: Let's say we need to add a new user and grant him root
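For Case 1, a minimal sketch on a RHEL/CentOS-style box (the username is illustrative; prefer visudo over editing /etc/sudoers directly):

# useradd newadmin
# passwd newadmin
# usermod -aG wheel newadmin      # or add via visudo: newadmin ALL=(ALL) ALL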

Read More

Oozie server failing with error “cannot load JDBC driver class ‘com.mysql.jdbc.Driver'”

Issue: The Oozie server is failing with the following error: FATAL Services:514 – SERVER[m2.hdp22] E0103: Could not load service classes, Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’ org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Cannot load JDBC driver class ‘com.mysql.jdbc.Driver’ at org.apache.oozie.service.Services.loadServices(Services.java:309) at org.apache.oozie.service.Services.init(Services.java:213) at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:46) at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4709) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583)

Read More

script to kill yarn application if it is running more than x mins

Sometimes we have to get a list of all long-running applications and, based on a threshold, kill them; sometimes we need to do this for a specific YARN queue. In such situations the following script will help you do the job. [root@m1.hdp22~]$ vi kill_application_after_some_time.sh #!/bin/bash if [ "$#" -lt
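The core logic is along these lines (a hedged sketch, not the article's full script; the argument layout is hypothetical and the Start-Time parsing may need adjusting for your Hadoop version):

#!/bin/bash
queue=$1; threshold_min=$2
now_ms=$(date +%s%3N)                        # GNU date, epoch milliseconds
yarn application -list -appStates RUNNING 2>/dev/null | grep "$queue" | awk '{print $1}' | \
while read app; do
  start_ms=$(yarn application -status "$app" 2>/dev/null | awk -F' : ' '/Start-Time/{print $2}')
  age_min=$(( (now_ms - start_ms) / 60000 ))
  [ "$age_min" -gt "$threshold_min" ] && yarn application -kill "$app"
done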

Read More

Hive2 action with Oozie in kerberos Env

One of my friends was trying to run a simple hive2 action in their Oozie workflow and was getting an error. I decided to replicate it on my cluster, and finally got it working after some retries. If you have the same requirement, where you have to run Hive SQL via Oozie, then this article

Read More

Enable GUI for Centos 6 on top of command line

If you have installed CentOS 6.5 and you just have a terminal with a black background, and you want to enable a GUI, this article is for you. A desktop environment is not necessary for server usage, but sometimes installing or using an application requires a desktop environment; in that case, build the desktop environment

Read More

Encrypt Database and LDAP Passwords for Ambari-Server

By default the passwords to access the Ambari database and the LDAP server are stored in a plain text configuration file. To have those passwords encrypted, you need to run a special setup command. [root@m1 ~]# cd /etc/ambari-server/conf/ [root@m1 conf]# ls -ltrh total 52K -rw-r--r-- 1 root root 2.8K Mar 31  2015 ambari.properties.rpmsave.20161004015858 -rwxrwxrwx 1

Read More

Cannot retrieve repository metadata (repomd.xml) for repository

When you upgrade your HDP cluster through a satellite server or local repository, and you start your cluster via Ambari or add some new services to the cluster, you may see the following error. resource_management.core.exceptions.Fail: Execution of ‘/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector’ returned 1. Error: Cannot retrieve repository metadata (repomd.xml) for repository: HDP-2.3.0.0-2557.

Read More

Unable to initialize Falcon Client object. Cause : Could not authenticate, Authentication failed

If you upgrade to or install HDP 2.5.0 or later without first installing the Berkeley DB file, you will get the error “Unable to initialize Falcon Client object. Cause : Could not authenticate, Authentication failed”, or HTTP ERROR: 503 Problem accessing /index.html. Reason: SERVICE_UNAVAILABLE, or the Falcon UI is unavailable. From the Falcon logs: java.lang.RuntimeException: org.apache.falcon.FalconException: Unable

Read More

Enable logging for client connections and running queries with Phoenix Query Server

Phoenix Query Server (PQS) does not log details about client connections and the queries run at the default log level of INFO. You must modify the log4j configuration for certain classes to obtain such logs. To enable logging of such messages by PQS, perform the following: On the node that runs the PQS service, edit

Read More

Some helpful Tips

1. How to run a Hive query using yesterday's date: use from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd') in your Hive query. For example: select * from sample where date1=from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd'); 2. How to diff file(s) in HDFS. How to diff a file in HDFS and a file in the local filesystem: diff <(hadoop fs -cat /path/to/file) /path/to/localfile How to diff two files in HDFS: diff <(hadoop fs -cat /path/to/file1)

Read More

Run Pig Script in Nifi

NiFi can interface directly with Hive, HDFS, HBase, Flume and Phoenix, and it can also trigger Spark and Flink through Kafka and Site-to-Site. Sometimes I need to run some Pig scripts. Apache Pig is very stable and has a lot of functions and tools that make for some smart processing. You can easily augment and

Read More

Exception in thread “main” org.apache.spark.SparkException: Application

When you run a Python script on top of Hive, it may fail with the following error: $ spark-submit --master yarn --deploy-mode cluster --queue ado --num-executors 60 --executor-memory 3G --executor-cores 5 --py-files argparse.py,load_iris_2.py --driver-memory 10G load_iris.py -p ado_secure.iris_places -s ado_secure.iris_places_stg -f /user/admin/iris/places/2016-11-30-place.csv Exception in thread “main” org.apache.spark.SparkException: Application application_1476997468030_142120 finished with failed status at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)

Read More

HDFS disk space vs NameNode heap size

In HDFS, data and metadata are decoupled. Data files are split into block files that are stored, and replicated on DataNodes across the cluster. The filesystem namespace tree and associated metadata are stored on the NameNode. Namespace objects are file inodes and blocks that point to block files on the DataNodes. These namespace objects are

Read More

GC pool ‘PS MarkSweep’ had collection(s): count=6 time=26445ms

When you create a table while authorization is enforced using Ranger, the CREATE TABLE may fail, and after that the HiveServer2 process crashes. 0: jdbc:hive2://server1> CREATE EXTERNAL TABLE test (cust_id STRING, ACCOUNT_ID STRING, ROLE_ID STRING, ROLE_NAME STRING, START_DATE STRING, END_DATE STRING, PRIORITY STRING, ACTIVE_ACCOUNT_ROLE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED

Read More

Datanode doesn’t start with error “java.net.BindException: Address already in use”

In many real scenarios we have seen the error “java.net.BindException: Address already in use” when we start the DataNode. You can observe the following things during this issue: 1. The DataNode doesn’t start, with an error saying “address already in use”. 2. “netstat -anp | grep 50010” shows no result. ROOT CAUSE: There are 3 ports

Read More

Top Hadoop interview questions

1. What are the side data distribution techniques? Side data refers to extra static small data required by MapReduce to perform a job. The main challenge is the availability of side data on the node where the map will be executed. Hadoop provides two side data distribution techniques. Using the job configuration: an arbitrary key-value pair

Read More

Installing grafana and it is failing with resource_management.core.exceptions.Fail: Ambari Metrics Grafana data source creation failed. POST request status: 401 Unauthorized

When we do a fresh install of Grafana in Ambari 2.4 and start it, it may fail with the following error. stderr: /var/lib/ambari-agent/data/errors-14517.txt Traceback (most recent call last): File “/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ metrics_grafana.py”, line 67, in <module> AmsGrafana().execute() File “/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py”, line 280, in execute method(env) File “/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py”, line 725, in restart self.start(env) File “/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/

Read More

Standby NameNode is failing and only one is running

The standby NameNode is unable to start up; or, once the standby NameNode is brought up, the active NameNode soon goes down, leaving only one live NameNode. The NameNode log shows: FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) – Error: flush failed for required journal (JournalAndStream(mgr=QJM to )) java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond. ROOT CAUSE:

Read More

Map side join in Hive

Many times we face a situation where we have very small tables in Hive, but queries against them take a long time. Here I am going to explain the map-side join and its advantages over the normal join operation in Hive. But before that, we should first understand the concept of

Read More

sql workbench connection to hadoop

Many times we do not want to run our Hive queries through Beeline or the Hive CLI, for many reasons. I am not going to discuss the reasons here, as that is a big debatable point; in this article I explain the steps to connect SQL Workbench to our Hadoop cluster. In this article

Read More

Hive Actions with Oozie

One of my friends was trying to run a Hive .hql file in their Oozie workflow and was getting an error. I decided to replicate it on my cluster, and finally got it working after some retries. If you have the same requirement, where you have to run Hive SQL via Oozie, then this article will help

Read More

Process xml file via apache pig

If you want to work with XML in Pig, the Piggybank library (a user-contributed library of useful Pig code) contains an XMLLoader. It works in a similar way to our technique and captures all of the content between a start and end tag and supplies it as a single bytearray field in a Pig tuple.

Read More

Process xml file via mapreduce

When you have a requirement to process data via Hadoop that is not in a default input format, this article will help you. Hadoop provides default input formats like TextInputFormat, NLineInputFormat, KeyValueInputFormat etc.; when you get a different type of file for processing, you have to create your own custom input format using

Read More

How to use Hive Query result in a variable for other query

Many times we want to store one query's result in a variable and then use that variable in some other query. This is possible in your favorite Hadoop ecosystem tool, i.e. Hive, and with the help of this article you can achieve it. [root@m1 etc]# hive 16/10/04 02:40:45 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist

Read More

“INSERT OVERWRITE” functional details

If the OVERWRITE keyword is used, the contents of the target table (or partition) will be deleted and replaced by the files referred to by filepath; otherwise the files referred to by filepath will be added to the table. Note that if the target table (or partition) already has a file whose name collides with

Read More

Ranger admin install fails with “007-updateBlankPolicyName.sql import failed”

If you see the following error during the Ranger install, there is no need to worry, as you can solve it with just one step. 2016-03-18 16:10:44,048 [JISQL] /usr/jdk64/jdk1.8.0_60/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/hdp/current/ranger-admin/jisql/lib/* org.apache.util.sql.Jisql -driver mysqlconj -cstring jdbc:mysql://mysqldb/ranger -u ‘user’ -p ‘********’ -noheader -trim -c \; -input /usr/hdp/current/ranger-admin/db/mysql/patches/007-updateBlankPolicyName.sql Resolution: SET GLOBAL log_bin_trust_function_creators = 1, then reinstall the Ranger service.

Read More

Enable ‘Job Error Log’ in oozie

In the Oozie UI, ‘Job Error Log’ is a tab which was introduced in HDP v2.3 on Oozie v4.2. By default it is disabled, and with the help of the following steps you can enable it.

Read More

After upgrading ambari it is not coming up (hostcomponentdesiredstate.admin_state)

If you upgrade Ambari and you see the following error, you should not worry; the following steps will help you bring your cluster back into a running state. Issue: Once you upgrade your cluster, if after restarting you don’t see any services or their metrics in Ambari, then you need the following steps. You

Read More

Hadoop Archive Files – HAR

Hadoop archive files, or HAR files, are a facility to pack HDFS files into archives. This is the best option for storing a large number of small files in HDFS, as storing a large number of small files directly in HDFS is not very efficient. The advantage of HAR files is that these files can be

Read More

Falcon MQ log files location

Sometimes we see that Falcon uses 90-100% of / space, as shown in the following example. [user1@server localhost]$ du -sh /hadoop/falcon/hadoop/falcon/embeddedmq/data/localhost/KahaDB 67M     /hadoop/falcon/hadoop/falcon/embeddedmq/data/localhost/KahaDB [users1@server localhost]$ du -sh /hadoop/falcon/embeddedmq/data/localhost/KahaDB/ 849M   /hadoop/falcon/embeddedmq/data/localhost/KahaDB/ This is because we have installed Falcon in embedded mode and have set falcon.embeddedmq.data to that location. The Falcon server starts an embedded ActiveMQ whenever we

Read More

Pig script with HCatLoader on Hive ORC table

Sometimes we have to run pig commands on Hive ORC tables; this article will help you do that. Step 1: First create a Hive ORC table: hive> CREATE TABLE ORC_Table(COL1 BIGINT, COL2 STRING) CLUSTERED BY (COL1) INTO 10 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS ORC TBLPROPERTIES ('TRANSACTIONAL'='TRUE'); Step 2:

Read More

hive date time issue

Many times when we load data into Hive tables, if we have a date & time field in our data, we may see an issue when reading that field back. To solve this issue I have created this article and explained the steps in detail. I have the following sample input file (a.txt) a,20-11-2015

Read More

Compression in Hadoop

File compression brings two major benefits: it reduces the space needed to store files, and it speeds up data transfer across the network or to or from disk. When dealing with large volumes of data, both of these savings can be significant, so it pays to carefully consider how to use compression in Hadoop. 1.

Read More

Change default permission of hive database

When you create a database or internal tables in the Hive CLI, by default they are created with 777 permissions. Even if you have a umask in HDFS, they will still get the same permissions. But you can change this with the help of the following steps. 1. From the command line on the Ambari server node, edit

Read More

Update your Capacity Scheduler through REST API

Sometimes you want to change your Capacity Scheduler through the REST API, or you have a requirement to change your Capacity Scheduler configuration frequently via some script; this article will help you do that. You can achieve it via the following command. [root@sandbox conf.server]# curl -v -u admin:admin -H "Content-Type: application/json" -H "X-Requested-By:ambari" -X PUT

Read More

Start Namenode manually

Sometimes we do not want to start all HDFS services at once, or we just want to start only the NN, DN or SNN via the command line; this article will help you do this in a very simple manner. 1. Kill the current operation if a NameNode startup is already in progress from Ambari. 2. set hadoop.root.logger=DEBUG,console

Read More

Enable Debug mode for hive in Ambari

Many times during troubleshooting we do not find much information with just the default logger. So no worries: I will guide you on how to enable debug mode in the logs or on your console. Case 1: Use the following command to start Hive: set the following property to turn on debug mode

Read More

How to integrate Ambari with ldap

By default, Ambari uses an internal database as the user store for authentication and authorization. If you wish to add LDAP external authentication for Ambari Web, you need to make some edits to the Ambari properties file. Collect the following information: ldap.primaryUrl=<ldap_server_name>:389 ldap.useSSL=false ldap.usernameAttribute=sAMAccountName ldap.baseDn=cn=Users,dc=<sreach_dir>,dc=com ldap.bindAnonymously=false ldap.managerDn=cn=ambari,cn=users,dc=<sreach_dir>,dc=com ldap.managerPassword=/etc/ambari-server/conf/ldap-password.dat ldap.userObjectClass=user ldap.groupObjectClass=group ldap.groupMembershipAttr=memberOf ldap.groupNamingAttr=cn ldap.referral=ignore
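After collecting these values, the setup itself is driven by the standard Ambari commands (a sketch; run them on the Ambari server host):

# ambari-server setup-ldap        # prompts for the properties collected above
# ambari-server restart
# ambari-server sync-ldap --all   # import LDAP users and groups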

Read More

Check high CPU Intensive process on your server

When you start utilizing your cluster heavily, you may encounter 100% CPU utilization on a specific server. As you may have many jobs and processes running on that server at the time, it can be very tough to identify the culprit process causing the issue. It is like finding a
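As a starting point, these standard commands list the heaviest CPU consumers first:

$ ps -eo pid,user,pcpu,etime,args --sort=-pcpu | head -15   # top CPU processes with runtime
$ top -b -n 1 | head -20                                    # one-shot snapshot, handy for logging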

Read More

Tez job fails with ‘vertex failure’ error

When you run your Hive job on the Tez execution engine, you may see the job fail due to a ‘vertex failure’ error, or you may see the following error in your logs. Vertex failed, vertexName=Reducer 34, vertexId=vertex_1424999265634_0222_1_23, diagnostics=[Task failed, taskId=task_1424999265634_01422_1_23_000008, diagnostics=[AttemptID:attempt_1424999265634_01422_1_23_000008_0 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: Reduce operator initialization failed  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:564) at java.security.AccessController.doPrivileged(Native Method) at

Read More

heap size issue in Hive Metastore

Sometimes while your jobs are running you may see failures due to heap size; it might be a metastore heap issue. The metastore is encountering OutOfMemory errors, or is known to be insufficient to handle the cluster workload. Resolution: To fix this issue you have to increase the heap size for the metastore in hive-env.sh (or hive-env.cmd)

Read More

Hadoop Admin most lovable commands

If you are working on Hadoop and you want to know about your cluster, or you want to control it, the following commands should be handy. In this article I have tried to explain a few commands which will help you a lot in your day-to-day work. hdfs dfsadmin
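A few standard HDFS/YARN CLI commands of this kind, for quick reference:

$ hdfs dfsadmin -report          # capacity and live/dead DataNodes
$ hdfs dfsadmin -safemode get    # check (or enter/leave) safemode
$ hdfs fsck / -files -blocks     # filesystem health
$ yarn node -list                # NodeManager status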

Read More

Rack Awareness on Hadoop

If you have a Hadoop cluster of more than 30-40 nodes, it is better to configure it with rack awareness, because communication between two DataNodes on the same rack is more efficient than between two nodes on different racks. It also helps us improve network traffic while reading/writing HDFS files; the NameNode

Read More

Namenode installation issue

When you install HDP and something goes wrong during installation with the HDFS components (like the NameNode), you may see the following errors. File “/usr/lib/python2.6/site-packages/resource_management/core/shell.py”, line 140, in _call_wrapper result = _call(command, **kwargs_copy) File “/usr/lib/python2.6/site-packages/resource_management/core/shell.py”, line 291, in _call raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of ‘yes Y | hdfs --config /usr/hdp/current/hadoop-client/conf namenode -format’ returned 127. /usr/hdp/current/hadoop-client/bin/hdfs: line 18:

Read More

How to debug distcp jobs

Sometimes when you run distcp jobs on the cluster and you see failures or performance issues, you want to debug them; you can do so using the following commands. To turn on debug mode at the job level, issue the following command before executing the distcp job: To turn on debug mode at the mapper level,
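For reference, the two levels typically look like this (a sketch; source and destination paths are placeholders):

$ export HADOOP_ROOT_LOGGER=DEBUG,console                                        # client/job level
$ hadoop distcp -Dmapreduce.map.log.level=DEBUG hdfs://src/path hdfs://dst/path  # mapper level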

Read More

How to check contents of a JAR file

Many times we have to check which packages and classes are included in a jar file, but because it is a black box (just a simple jar) we have trouble checking. With the help of the following commands you can check it. jar tf <PATH_TO_JAR> But if you are looking for a specific class or package
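For example (jar names and class names are illustrative):

$ jar tf myapp.jar                                   # list everything in the jar
$ jar tf myapp.jar | grep -i 'hbase/util/Bytes'      # look for a specific class
$ for j in /usr/hdp/current/hadoop-client/lib/*.jar; do jar tf "$j" | grep -q 'SomeClass' && echo "$j"; done   # find which jar ships a class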

Read More

If you delete /hdp/apps/ dir from hdfs

There are situations when, unfortunately and unknowingly, you delete /hdp/apps/2.3.4.0-3485 with skipTrash; then you will be in trouble and other services will be impacted. You will not be able to run hive, mapreduce or sqoop commands, and you will get the following error. [root@m1 ranger-hdfs-plugin]# hadoop fs -rmr -skipTrash /hdp/apps/2.3.4.0-3485 rmr: DEPRECATED: Please use ‘rm -r’ instead. Deleted /hdp/apps/2.3.4.0-3485 So

Read More

Application Timeline Server (ATS) issue error code: 500, message: Internal Server Error

I have seen an issue with the Application Timeline Server (ATS). The ATS uses a LevelDB database, which is stored in the location specified by yarn.timeline-service.leveldb-timeline-store.path in yarn-site.xml. All metadata is stored in *.sst files under the specified location. Due to this we may face a space issue, but it is not good practice to delete *.sst files directly. An *.sst file is a

Read More

Real time use cases of Hadoop

As data continues to grow, businesses now have access to (or generate) more data than ever before, much of which goes unused. How can you turn this data into a competitive advantage? In this article, we explore different ways businesses are capitalizing on data. We keep hearing statistics about the growth of data. For instance: Data

Read More

How to change knox heap size

Sometimes, due to heavy load, you may have a requirement to increase your Knox JVM size to handle more requests and respond in time. In that case you can change your Knox JVM size in the following way: go to /usr/hdp/current/knox-server/bin/gateway.sh and search for the APP_MEM_OPTS string. Once you find it, you can change
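The change itself is a one-liner (heap values are illustrative; pick sizes appropriate to your load):

APP_MEM_OPTS="-Xms2g -Xmx4g"    # in /usr/hdp/current/knox-server/bin/gateway.sh

Then restart Knox, e.g. via Ambari, for the new JVM options to take effect.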

Read More

Analyze your jobs running on top of Tez

Sometimes we have to analyze our jobs to tune them or to prepare reports. We can use the following method to get the running time for each step of a job on the Tez execution engine. You can achieve it by setting the hive.tez.exec.print.summary=true property. hive> select count(*) from cars_beeline; Query ID = s0998dnz_20160711080520_e282c377-5607-4cf4-bcda-bd7010918f9c Total

Read More

Import & Export in Hive

When we work on Hive, there are lots of scenarios where we need to move data (i.e. tables) from one cluster to another. For example, sometimes we need to copy a production table from one cluster to another. We now have very good functionality in Hive which gives us two

Read More

Ambari shows all services down though hadoop services running

We have seen many times that our Hadoop services are up and running, but when we open Ambari it shows them all as down. Basically this means the services do not have any issue; it is a problem with the ambari-agent. The Ambari server typically learns about service availability from the Ambari agent, and using the

Read More

How to enable debug logging for HDFS

I have seen many times that an error does not give a clear picture of the issue and can mislead us, and we waste a lot of time investigating it. I have found that enabling debug mode is an easy way to troubleshoot any Hadoop problem, as it gives us a detailed
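The quickest client-side switch is the standard HADOOP_ROOT_LOGGER variable:

$ export HADOOP_ROOT_LOGGER=DEBUG,console
$ hdfs dfs -ls /tmp     # the client now prints DEBUG-level detail to the console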

Read More

Backup and Restore of Postgres Database

How to back up a Postgres database. 1. Back up a single Postgres database. This example backs up the erp database that belongs to user geekstuff to the file mydb.sql: $ pg_dump -U geekstuff erp -f mydb.sql It prompts for a password; after authentication, mydb.sql is created with create table, alter table and copy commands for all the tables in
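For completeness, the matching restore (and an all-databases backup) use the standard PostgreSQL tools:

$ psql -U geekstuff -d erp -f mydb.sql   # restore the dump into an existing database
$ pg_dumpall > alldb.sql                 # back up every database in the cluster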

Read More

Hive Cross Cluster replication

Hive cross-cluster replication: here I try to explain cross-cluster replication with a Feed entity. This is a simple way to enforce disaster recovery policies or aggregate data from multiple clusters to a single cluster for enterprise reporting. To further illustrate Apache Falcon’s capabilities, we will use an HCatalog/Hive table as the Feed entity. Step 1:

Read More

How to read compressed data from hdfs through hadoop command

Sometimes we have a requirement to read compressed data from HDFS through the hdfs command, and there are many compression algorithms (.gz, .snappy, .lzo, .bz2 etc.). I have tried to explain how we can achieve this with the following steps: Step 1: Copy any compressed file to your HDFS
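For gzip/bzip2 the simplest approach is hdfs dfs -text, which picks a codec from the file extension and decompresses on the fly (snappy/LZO need the codec installed on the client); file names are illustrative:

$ hdfs dfs -put sample.txt.gz /tmp/
$ hdfs dfs -text /tmp/sample.txt.gz | head    # decompressed output; compare with -cat, which prints raw bytes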

Read More

How do I change an existing Ambari DB Postgres to MySQL?

By default, when you configure your Ambari server it runs on a Postgres database. If after some time you need to change it to your organization's preferred DB (like MySQL), you need the following steps. Step 1: Stop your Ambari server and then take a backup of the Postgres ambari DB (the default password

Read More

Error: java.io.IOException: java.lang.RuntimeException: serious problem (state=,code=0)

If you run your Hive query on ORC tables in HDP 2.3.4, you may encounter this issue; it occurs because ORC split generation runs on a global threadpool and doAs is not propagated to that threadpool. Threads in the threadpool are created on demand at execute time and thus execute as random users that

Read More

Ranger User sync does not work due to ERROR UserGroupSync [UnixUserSyncThread]

If we have enabled AD/LDAP user sync in Ranger and we get the error below, we need to follow the given steps to resolve it. LdapUserGroupBuilder [UnixUserSyncThread] – Updating user count: 148, userName:, groupList: [test, groups] 09 Jun 2016 09:04:34 ERROR UserGroupSync [UnixUserSyncThread] – Failed to initialize UserGroup source/sink. Will retry after 3600000 milliseconds. Error details:

Read More

How to enable Node Label in your cluster

Node labels: here we describe how to use node labels to run YARN or other applications on cluster nodes that have a specified node label. Node labels can be set as exclusive or shareable: Exclusive— access is restricted to applications running in queues associated with the node label. Shareable— if idle capacity is available on the labeled node, resources are
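Once node labels are enabled in yarn-site.xml, the labels themselves are managed with yarn rmadmin (a sketch; label and host names are made up):

$ yarn rmadmin -addToClusterNodeLabels "gpu(exclusive=true),ssd(exclusive=false)"
$ yarn rmadmin -replaceLabelsOnNode "node1.example.com=gpu"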

Read More

Run pig script though Oozie

If you have a requirement to read some file through Pig and you want to schedule your Pig script via Oozie, this article will help you do the job. Step 1: First create a directory inside HDFS (under your home directory is good). $ hadoop fs -mkdir -p /user/<user_id>/oozie-scripts/PigTest Step 2:

Read More

How To Set Up Master Slave Replication in MySQL

MySQL replication is a process that allows you to easily maintain multiple copies of MySQL data by having it copied automatically from a master to a slave database. This can be helpful for many reasons, including facilitating a backup of the data, providing a way to analyze it without using the main database, or simply as

Read More

HDFS balancer fails after every 30 minutes when you run it through Ambari

There is a bug in Ambari 2.2.0: whenever you run the balancer through Ambari and it has to balance many TBs of data, it fails after 30 minutes due to a timeout. You can see the following error in your logs: resource_management.core.exceptions.Fail: Execution of ‘ambari-sudo.sh su hdfs -l -s /bin/bash -c ‘export PATH=’”‘”‘/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin’”‘”‘ ; hdfs

Read More

Encrypt password used by Sqoop to import or export data from database.

Sqoop has developed a lot and become a very popular, much-loved tool in the Hadoop ecosystem. When we import or export data from a database through Sqoop, we have to give the password on the command line or in a file. I feel this is not a fully secure way to

Read More

Distcp between High Availability enabled cluster

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. This impacted the

Read More

How to fix corrupted or under replicated blocks issue

To find out whether the HDFS filesystem has corrupt blocks, and to fix them, we can use the steps below: [hdfs@m1 ~]$ hadoop fsck / or [hdfs@m1 ~]$ hadoop fsck hdfs://192.168.56.41:50070/ If you see any corrupt or missing blocks at the end of the output, like below: Total size: 4396621856 B (Total open files
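Useful follow-up commands once fsck reports problems (standard HDFS CLI; paths are placeholders):

[hdfs@m1 ~]$ hdfs fsck / -list-corruptfileblocks               # list files with corrupt blocks
[hdfs@m1 ~]$ hdfs fsck /path/to/file -files -blocks -locations # inspect one file in detail
[hdfs@m1 ~]$ hdfs fsck / -delete                               # last resort: delete corrupt files
[hdfs@m1 ~]$ hadoop fs -setrep 3 /path/to/file                 # re-replicate an under-replicated file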

Read More

How to disable ‘Kill Application’ button in Resource Manager web UI

In the Resource Manager UI there is an option to kill an application, and because of that all users can kill jobs. If you want to disable it, use the following steps: log in to Ambari and go to the YARN Configs page. Search for yarn.resourcemanager.webapp.ui-actions.enabled. If it exists, change the value to false. If it does not exist, clear

Read More

Rolling Upgrade HDP 2.2 to HDP 2.3

Use this procedure to perform a rolling upgrade from HDP 2.2 to HDP 2.3. It is highly recommended that you validate these steps in a test environment to adjust and account for any special configurations for your cluster. Before upgrading to HDP 2.3, you must first upgrade to Ambari 2.1. Make sure Ambari is upgraded and the cluster is

Read More

Why Learn Big Data and Hadoop?

In my experience, people who do things in their career that they are excited about and have a passion for, can go farther and faster with the self-motivation than if they did something that they didn’t like, but felt like they needed to do it for other reasons.  You are awesome in already taking initiative

Read More

How to start learning hadoop

The easiest way to get started with Hadoop is the Sandbox with VMware Player or VirtualBox. It is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials. The Sandbox includes many of the most exciting developments from the latest CDH/HDP distribution, packaged up in a virtual environment. You can start working on Hadoop

Read More

Hello world!

Welcome to hadoopadmin.co.in. This is the welcome post.

Read More
