Encrypt password used by Sqoop to import or export data from database.

  • 11

Encrypt password used by Sqoop to import or export data from database.

Sqoop became very popular and the darling tool for the industries. Sqoop has developed a lot and become very popular amongst Hadoop ecosystem. When we import or export data from database through Sqoop then we have to give password in command or in file only. I feel this is not a fully secure way to keep password in a file or pass password through command line. In this post, I will try cover the ways to specify database passwords to Sqoop in simple and secure way.

The following ways are common to pass database passwords to Sqoop:

Option 1: We can give password in command itself.

sqoop import --connect jdbc:mysql://servera.com:3306/bigdb --username dbuser --password <passwoord> --table sqoop_import -m 1 --target-dir /user/root/sanity_test/ --append

Option 2: We can keep password in a file and can pass this file in command.

sqoop import --connect jdbc:mysql://servera.com:3306/bigdb --username dbuser -password-file /home/root/.mysql.password --table sqoop_import -m 1 --target-dir /user/root/sanity_test/ --append

However, storing password in a text file is still considered not secure even though we have set the permissions.

As of Sqoop version 1.4.5, Sqoop supports the use of JKS to store passwords which would be encrypted,so that you do not need to store passwords in clear text in a file.This can be achieved using Java KeyStore.

Option 3: To generate the key:

Note: On prompt, enter the password that will be used to access the database.

[root@m1.hdp22 ~]$ hadoop credential create mydb.password.alias -provider jceks://hdfs/user/root/mysql.password.jceks
Enter password:
Enter password again:
mydb.password.alias has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

## Now you can use mydb.password.alias file in your sqoop command like below:
## Please add --Dhadoop.security.credential.provider.path=jceks://hdfs/user/root/mysql.password.jceks to the command and change the password like follows: 

sqoop import --Dhadoop.security.credential.provider.path=jceks://hdfs/user/root/mysql.password.jceks --connect jdbc:mysql://server:3306/sqoop --username sqoop --password-alias mydb.password.alias --table Test --delete-target-dir --target-dir /apps/hive/warehouse/test --split-by name

##or
sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/root/mysql.password.jceks --connect ‘jdbc:mysql://servera.com:3306/bigdb’ --tablesqoop_import --username dbuser --password-alias mydb.password.alias -m 1 --target-dir /user/root/sanity_test/ --append

This way password is hidden inside jceks://hdfs/user/root/mysql.password.jceks and no one is able to see it.

Note: the “mydb.password.alias” is the alias that we can use to pass to Sqoop when running the command, so that no password is needed.

I hope you enjoy reading this article and will help you to protect your password. Please give your valuable feedback.


11 Comments

Yury

September 28, 2016 at 3:37 pm

Hi. Thank you for your post.
I just want to ask if you tried to run hadoop.security.credential.provider API inside Oozie Sqoop action.
Also using this on non Kerberos environment is not a real protection as anybody can present himself as specific Hadoop user and get password from encrypted file. So probably local Linux password for provider itself is required. Is my understanding correct here ?
Please see
http://aajisaka.github.io/hadoop-project/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html

Thanks.
Yury

    admin

    September 29, 2016 at 6:49 am

    Hi Yury,

    I did not try to run this API in oozie sqoop action, but I will try and will let you know the outcome.
    Also you are right in non kerberos env anyone can impersonate to any hadoop user and can extract password. Local linux password is first step to login to server and then run the given API for sqoop jobs. Please feel free to reach out to me for any further clarifications.

      Yury

      September 29, 2016 at 10:39 am

      Still fighting with Oozie Sqoop action. Same issue as other lads having on
      https://community.cloudera.com/t5/Data-Ingestion-Integration/Oozie-workflow-with-Sqoop-action-and-password-alias-option-does/td-p/39226
      I can’t figure out default password to Keystore and difference between invoking credentials API through Sqoop command line and Oozie Sqoop action.
      See class:
      org.apache.hadoop.security.alias.JavaKeyStoreProvider
      (constructor initialization for password)
      ( hadoop-common:2.6.0 )

      Yury

        Yury

        September 29, 2016 at 4:13 pm

        Hi Admin (sorry can’t see your name )
        I was able to run Oozie Sqoop on my another Hadoop env ( newer version 5.8.0 )
        I think my original problem was either
        HADOOP_CREDSTORE_PASSWORD
        or
        hadoop.security.credstore.java-keystore-provider.password-file
        in my first env.

        On new env:

        $ hadoop credential create my.password -provider jceks://hdfs/user/my_path.jceks
        WARNING: You have accepted the use of the default provider password
        by not configuring a password in one of the two following locations:
        * In the environment variable HADOOP_CREDSTORE_PASSWORD
        * In a file referred to by the configuration entry
        hadoop.security.credstore.java-keystore-provider.password-file.
        Please review the documentation regarding provider passwords in
        the keystore passwords section of the Credential Provider API
        Continuing with the default provider password.

        Enter alias password:
        Enter alias password again:
        warehouse.password has been successfully created.
        See thread
        https://issues.apache.org/jira/browse/HADOOP-12942

          admin

          September 30, 2016 at 7:39 am

          Thanks for updating Yury,and I am happy that your problem resolved. Yes you are right it was a problem with older version for HADOOP_CREDSTORE_PASSWORD, they have fixed it in new version. Please feel free to reach out to me in case of any further assistance.

          Purva Sharma

          June 30, 2017 at 7:32 am

          Hi Yury and Admin,
          Thank you for the detailed blog. Its very informative.
          As mentioned above you are able to run oozie sqoop on Hadoop env newer version 5.8.0.
          Here you mean CDH not hadoop ? Can you please mention the Hadoop/Sqoop/Oozie version you are using to execute this ? I am using Hortonwork framework(hdp 2.5.3.0-37) with hadoop(2.7.3.2.5.3.0-37), oozie(4.2.0.2.5.3.0-37),Sqoop(1.4.6.2.5.3.0-37)
          Request you to please reply.

          admin

          June 30, 2017 at 9:27 am

          Hello Purva,

          Thanks for your comments.
          It is working on both env CDH and HDP, but if you are facing any issue please post your error here and will help you to debug it.

Anuj Rastogi

December 4, 2016 at 9:57 am

Hi Admin,
I tried, the steps mentioned above and it is working fine. Thanks a lot for information.

Just one question… If we can to decrypt and see the password that was encrypted using : “HADOOP CREDENTIAL PROVIDER API”, do we have any such functionality ? Kindly share ur opinion on this.

    admin

    December 4, 2016 at 11:26 am

    Hello Anuj,

    Thanks for your valuable feedback.
    I do not think that we can decrypt this Java Keystore Provider password, but I can try to find out the way to decrypt it or if you find please let me know.

fayaz

May 8, 2017 at 4:51 pm

Dear admin do we have to mention Dhadoop.security.credential.provider.path=jceks://hdfs/user/root/mysql.password.jc

this path every time we use sqoop . or is there any other way to access it ??
can we add property in conf file??

    admin

    May 9, 2017 at 6:45 am

    Hello Fayaz,

    Yes,as of now you have to mention this path everytime when you do sqoop. And I don’t see any property where we can set this path or password. I will check if some enhancement request or already available in new version.

    But if you don’t want to mention it every time then you can set this value to some variable in unix and then use that variable in sqoop import. like following.

    var1=”hadoop.security.credential.provider.path=jceks://hdfs/user/root/mysql.password.jceks”

    sqoop import –D$var1 –connect jdbc:mysql://:3306/sqoop –username sqoop –password-alias mydb.password.alias –table Test –delete-target-dir –target-dir /apps/hive/warehouse/test –split-by name

    Please feel free to reach out to me in case of any further assistance.

Leave a Reply