Attempt to add *.jar multiple times to the distributed cache
When you submit a Spark2 action through Oozie, the job may fail with the following exception in the logs:
exception: Attempt to add (hdfs://m1:8020/user/oozie/share/lib/lib_20171129113304/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://m1:8020/user/oozie/share/lib/lib_20171129113304/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
This error occurs because the same jar files exist in both locations: /user/oozie/share/lib/lib_20171129113304/oozie/ and /user/oozie/share/lib/lib_20171129113304/spark2/.
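Before changing anything, it can help to confirm which jar names are actually present in both directories. The following is a minimal sketch, not part of the original procedure: the function names jar_names and common_jars are hypothetical helpers, and the sketch assumes you have saved the output of `hdfs dfs -ls` for each directory into a local file.

```shell
# Print the jar file names found in `hdfs dfs -ls` output read from stdin.
# The path is the last whitespace-separated field of each listing line;
# the "Found N items" header is dropped because it does not end in .jar.
jar_names() {
    awk '{print $NF}' | grep '\.jar$' | sed 's|.*/||' | sort -u
}

# Print the jar names that appear in both listings.
# $1 and $2 are local files holding saved `hdfs dfs -ls` output.
common_jars() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    jar_names < "$1" > "$tmp_a"
    jar_names < "$2" > "$tmp_b"
    comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example usage (paths under /tmp are assumptions; replace <timestamp> first):
#   hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/oozie  > /tmp/oozie.ls
#   hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/spark2 > /tmp/spark2.ls
#   common_jars /tmp/oozie.ls /tmp/spark2.ls
```

If the jar named in the exception appears in the output, the duplicate copies are the likely cause of the failure.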
Solution:
Delete the duplicate jars from the Spark2 directory so that only one copy remains, in the Oozie directory.
- Identify the current Oozie sharelib directory (the lib_<timestamp> used in the commands below) by running:
hdfs dfs -ls /user/oozie/share/lib/
- List all jar files in the Oozie directory and save their names to /tmp/list:
hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/oozie | awk -F / '{print $8}' > /tmp/list
- Delete the jar files in the Spark2 directory whose names match those in the Oozie directory:
for f in $(cat /tmp/list); do echo $f; hdfs dfs -rm -skipTrash /user/oozie/share/lib/lib_<timestamp>/spark2/$f; done
- Restart the Oozie service.
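Because -skipTrash deletes files permanently, you may want to preview the deletions first. The following is a minimal sketch of the delete step with a dry-run mode. The function name remove_spark2_duplicates and the DRY_RUN variable are hypothetical, introduced only for illustration; the hdfs command it prints or runs is the same one used above.

```shell
# Delete (or, by default, only print) the Spark2 copies of the jars in a list file.
# DRY_RUN defaults to 1 here, which prints each command instead of running it.
remove_spark2_duplicates() {
    list_file=$1      # one jar name per line, e.g. /tmp/list
    spark2_dir=$2     # e.g. /user/oozie/share/lib/lib_<timestamp>/spark2
    while IFS= read -r jar; do
        # Skip blank lines and anything that is not a jar name.
        case $jar in
            *.jar) ;;
            *) continue ;;
        esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "hdfs dfs -rm -skipTrash $spark2_dir/$jar"
        else
            # Keep hdfs from reading the list that feeds the loop.
            hdfs dfs -rm -skipTrash "$spark2_dir/$jar" < /dev/null
        fi
    done < "$list_file"
}

# Example usage (replace <timestamp> with the real directory name first):
#   remove_spark2_duplicates /tmp/list /user/oozie/share/lib/lib_<timestamp>/spark2
#   DRY_RUN=0 remove_spark2_duplicates /tmp/list /user/oozie/share/lib/lib_<timestamp>/spark2
```

Run it once with the default dry-run setting, check the printed commands, and then repeat with DRY_RUN=0 to perform the deletion.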
Thanks for visiting this blog, please feel free to give your valuable feedback.