Monthly Archives: January 2018

  • 0

Purging history/old data in oozie database

After some period of time your oozie db will be big and it may start throwing space issue or might be some slowness during oozie UI load. There are some properties which will help you to purge your oozie data but sometime, the oozie purge service does not function as expected. It result to a huge oozie database size which leads to slowdown your oozie UI.

To reduce size of the tables, you can run the below query to delete some old historical records:

  • Backup the database (highly recommend).
    mysqldump -u root -p oozie > /tmp/oozie.sql 
  • Login to the oozie database.
    mysql -u root -p <password> 
  • Run the below queries to clean up the historical records old than specific date (please adjust the date accordingly):
DELETE FROM OOZIE.WF_ACTIONS where WF_ID IN (SELECT ID from OOZIE.WF_JOBS where end_time < timestamp('2015-09-01 00:00:00'));
DELETE FROM oozie.wf_jobs where end_time < timestamp('2015-09-01 00:00:00');

DELETE from oozie.coord_actions where JOB_ID in (select ID from oozie.coord_jobs where END_TIME < timestamp('2015-09-01 00:00:00'));

DELETE from oozie.coord_jobs where END_TIME < timestamp('2015-09-01 00:00:00'); 
  • If you are using mysql then run the following command to reduce the database size:
    mysqlcheck -u root -p<password> -o oozie

And now you need to happy as you have purge old data in oozie db.

You also can update oozie configuration to auto purge:

oozie.service.PurgeService.coord.older.than = 7
oozie.service.PurgeService.bundle.older.than = 7
oozie.service.PurgeService.purge.limit = 100
oozie.service.PurgeService.older.than = 7
oozie.service.PurgeService.purge.interval = 3600
oozie.service.PurgeService.purge.old.coord.action = true

I hope this article helped you, please feel free to give your valuable feedback.


  • 0

Attempt to add *.jar multiple times to the distributed cache

When we submit Spark2 action via oozie then we may see following exception in logs and job will fail:

exception: Attempt to add (hdfs://m1:8020/user/oozie/share/lib/lib_20171129113304/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.

java.lang.IllegalArgumentException: Attempt to add (hdfs://m1:8020/user/oozie/share/lib/lib_20171129113304/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.

The above error occurs because the same jar files exists in both(/user/oozie/share/lib/lib_20171129113304/oozie/ and  /user/oozie/share/lib/lib_20171129113304/spark2/) the locations.

Solution:

You need to deleted duplicate jars from Spark2 directory and will be left with only one copy in Oozie directory.

  1. Identify the oozie sharelib run the command:
    hdfs dfs -ls /user/oozie/share/lib/
  2. Use following command to list all jar files in directory Oozie:
    hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/oozie | awk -F \/ ‘{print $8}’ > /tmp/list
  3. Use following command for deleting the jar files in Spark2 directory which matches with Oozie directory:
    for f in $(cat /tmp/list);do echo $f; hdfs dfs -rm -skipTrash /user/oozie/share/lib/lib_<timestamp>/spark2/$f;done
  4. Restart Oozie Service.

Thanks for visiting this blog, please feel free to give your valuable feedback.