“It used to be expensive to make things public and cheap to make them private. Now it’s expensive to make things private and cheap to make them public” – Clay Shirky
Hadoop has begun a slow and inexorable move from prototypes and experimental projects to core enterprise data management. As the ecosystem of Hadoop projects has matured, new capabilities have filtered into Hadoop to grant it enterprise-ready status.
Most notably, Hadoop is finally getting serious capabilities to provide the level of security expected from a platform entrusted with the crown jewels of a firm’s data.
In May 2014, Hortonworks acquired XA Secure systems and began the process of open-sourcing its Hadoop security management system. The source code from the XA Secure platform has been donated to the Apache Software Foundation, where it forms the basis of the Apache Ranger project. Ranger (known as Apache Argus for a brief time) is now part of the Apache Incubator and recently celebrated its first release as an Incubator project.
Ranger provides a centralized, comprehensive platform for managing authorization, access control, auditing, administration, and data protection for data stored in Hadoop. Ranger hooks into HDFS, WebHDFS, Hive, HBase, Knox, and Storm, and it offers a central authorization provider that each of those projects can use to validate data requests. Ranger also provides a comprehensive audit log for viewing requests and their status, as well as a centralized, Web-based administration console for configuring access rights. As you’d expect for any enterprise-grade security solution, Ranger supports syncing user/group information from LDAP/Active Directory.
To provide this functionality, Ranger incorporates a stand-alone daemon module, which is responsible for syncing with LDAP/AD and distributing policies to nodes in the cluster. A lightweight agent runs embedded in the individual Hadoop components that need data protection (Hive, HBase, and so on) and uses the security hooks built into those components.
This agent also runs as part of the NameNode to provide access control for HDFS, and it gathers request details, which are stored in the audit log. Because policy enforcement happens locally on each node, Ranger has no significant performance impact at runtime.
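To illustrate why local enforcement avoids runtime overhead, here is a toy shell model, not Ranger’s actual code or cache format: the agent is assumed to keep a local file of grants (one “group resource permission” triple per line, a hypothetical layout), so each access decision is a local lookup with no network round trip to the admin server. The function name `is_allowed` is likewise hypothetical.

```shell
#!/bin/sh
# Toy model of local policy enforcement. The cache layout
# ("group resource permission", one grant per line) is hypothetical
# and is NOT the format real Ranger agents use.

# is_allowed CACHE_FILE GROUP RESOURCE PERMISSION
# Succeeds (exit 0) only if an exactly matching grant line exists in
# the locally cached policy file. No network call is made.
is_allowed() {
    grep -qxF "$2 $3 $4" "$1"
}

# Example:
#   is_allowed /tmp/policy-cache grp_nixon apac_china read && echo allowed
```

In this model a slow or unreachable admin server never blocks a request; the trade-off is that a policy change only takes effect once the local cache has been refreshed.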
Ranger furthers deep integration with Hadoop by working with standard SQL authorization in Hive 0.13, allowing the use of SQL grant/revoke functionality at the table level. In addition, Ranger provides a mode that can validate access within HDFS by using Hadoop file-level permissions. Along with the Web-based administration console, Ranger supports a REST API for policy administration.
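As an illustration of the table-level grant/revoke workflow, the sketch below builds Hive `GRANT` and `REVOKE` statements and shows, in a comment, how one could be sent to HiveServer2 with Beeline. The helper names are hypothetical, and the host `hive-host`, user `analyst`, and port 10000 are placeholders for your environment.

```shell
#!/bin/sh
# hive_grant_sql PRIVILEGE TABLE USER
# Prints a table-level GRANT statement in Hive SQL syntax.
# Hypothetical helper for illustration only.
hive_grant_sql() {
    printf 'GRANT %s ON TABLE %s TO USER %s;\n' "$1" "$2" "$3"
}

# hive_revoke_sql PRIVILEGE TABLE USER
# Prints the matching REVOKE statement.
hive_revoke_sql() {
    printf 'REVOKE %s ON TABLE %s FROM USER %s;\n' "$1" "$2" "$3"
}

# Example (not run here): send the statement to HiveServer2 via Beeline.
# "hive-host" and "analyst" are placeholders.
# beeline -u "jdbc:hive2://hive-host:10000/default" -n analyst \
#         -e "$(hive_grant_sql SELECT apac_china analyst)"
```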
Ranger interoperates with existing IdM and SSO solutions, including SiteMinder, Tivoli Access Manager, Oracle Access Management Suite, and solutions based on SAML. With Knox integration built in, Ranger can be used to protect any REST endpoint that provides perimeter access to data stored in the Hadoop cluster. Ranger even works to secure ODBC/JDBC connections through HiveServer2, as long as HiveServer2’s Thrift transport is configured to use HTTP for those calls.
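For the JDBC case, HiveServer2 runs with its Thrift transport in HTTP mode (`hive.server2.transport.mode=http` in hive-site.xml), and clients add `transportMode=http` and `httpPath` to the JDBC URL. The sketch below builds such a URL. The helper name and host are hypothetical; port 10001 and path `cliservice` are Hive’s usual HTTP-mode defaults, which your cluster may override.

```shell
#!/bin/sh
# hs2_http_url HOST [PORT] [HTTP_PATH]
# Prints a HiveServer2 JDBC URL for HTTP transport mode.
# Defaults (port 10001, path "cliservice") are Hive's usual HTTP-mode
# settings; check hive.server2.thrift.http.port and
# hive.server2.thrift.http.path on your cluster.
hs2_http_url() {
    printf 'jdbc:hive2://%s:%s/default;transportMode=http;httpPath=%s\n' \
        "$1" "${2:-10001}" "${3:-cliservice}"
}

# Example (not run here): connect with Beeline over HTTP.
# beeline -u "$(hs2_http_url hive-host)" -n analyst
```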
Long term, Ranger has stated goals to cover the following aspects of Hadoop security:
- Centralized security administration to manage all security-related tasks
- Fine-grained authorization for specific actions and/or operations within a Hadoop component or tool, managed through a central administration tool
- Standardized authorization method across all Hadoop components
- Enhanced support for different authorization methods, including role-based access control, attribute-based access control, and so on
- Enablement of tag-based global policies
- Centralized auditing of all user access and administrative actions related to security for all components of Hadoop
Despite the substantial functionality in Ranger today, open questions persist about how it will fit into the larger Hadoop security ecosystem. For example, some Ranger goals overlap with those of Apache Sentry, and there seems to be little consensus to date about how the projects may synchronize their efforts. Also, because most Hadoop subcomponents are developed as separate projects (usually within the ASF), with different groups of committers and different PMCs, it’s unclear whether all Hadoop projects will actively choose to use Ranger.
To improve its odds, the Ranger team must cultivate buy-in and adoption from the teams building most of the other Hadoop components. Fortunately, through Hortonworks, Ranger has the backing of a Hadoop vendor well positioned to support and cultivate the project into the future. Hortonworks recently filed for an IPO and has made Ranger a core element of its HDP 2.2 release.
Already, if you’re evaluating Hadoop for any deployment involving sensitive data, Ranger is likely to play a significant role in protecting that data. Assuming that the proposed integration with Falcon, Accumulo, and other Hadoop tools unfolds as planned, Ranger may well evolve into the Hadoop standard for centralized, comprehensive security management.
Audit: Ranger tracks access to the system through extensive user-access auditing in HDFS, Hive, and HBase.
Installation and Configuration:
First, let’s see which Ranger packages are available (optional):
[root@hdpcm ~]# yum search ranger
Loaded plugins: fastestmirror, priorities, security
Loading mirror speeds from cached hostfile
* base: centos.bytenet.in
* extras: centos.bytenet.in
* updates: centos.bytenet.in
================================================================= N/S Matched: ranger =================================================================
ranger.noarch : ranger HDP virtual package
ranger-admin.noarch : ranger-admin HDP virtual package
ranger-debuginfo.noarch : ranger-debuginfo HDP virtual package
ranger-hbase-plugin.noarch : ranger-hbase-plugin HDP virtual package
ranger-hdfs-plugin.noarch : ranger-hdfs-plugin HDP virtual package
ranger-hive-plugin.noarch : ranger-hive-plugin HDP virtual package
ranger-knox-plugin.noarch : ranger-knox-plugin HDP virtual package
ranger-storm-plugin.noarch : ranger-storm-plugin HDP virtual package
ranger-usersync.noarch : ranger-usersync HDP virtual package
ranger_2_2_0_0_2041-admin.x86_64 : Web Interface for Ranger
ranger_2_2_0_0_2041-debuginfo.x86_64 : Debug information for package ranger_2_2_0_0_2041
ranger_2_2_0_0_2041-hbase-plugin.x86_64 : ranger plugin for hbase
ranger_2_2_0_0_2041-hdfs-plugin.x86_64 : ranger plugin for hdfs
ranger_2_2_0_0_2041-hive-plugin.x86_64 : ranger plugin for hive
ranger_2_2_0_0_2041-knox-plugin.x86_64 : ranger plugin for knox
ranger_2_2_0_0_2041-storm-plugin.x86_64 : ranger plugin for storm
ranger_2_2_0_0_2041-usersync.x86_64 : Synchronize User/Group information from Corporate LD/AD or Unix
Name and summary matches only, use “search all” for everything.
Now let’s start.
Step 1: Install Ranger
1. yum install ranger-admin
2. yum install ranger-usersync
3. yum install ranger-hdfs-plugin
4. yum install ranger-hive-plugin
5. Set JAVA_HOME (substitute the JDK path on your system for /usr/jdk64/jdk1.7.0_67):
export JAVA_HOME=/usr/jdk64/jdk1.7.0_67
echo "export JAVA_HOME=/usr/jdk64/jdk1.7.0_67" >> ~/.bashrc
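Step 1 can also be scripted. The sketch below installs the four packages non-interactively (`yum install -y`) and appends the `JAVA_HOME` export to a profile file only if that exact line is not already present, so re-running it does not pile up duplicate lines. The function names are hypothetical, and the JDK path is the example path from above; substitute your own.

```shell
#!/bin/sh
# install_ranger_packages: installs the packages from Step 1 (run as root).
# Defined but not invoked here, since it needs an HDP yum repository.
install_ranger_packages() {
    yum install -y ranger-admin ranger-usersync \
        ranger-hdfs-plugin ranger-hive-plugin
}

# persist_java_home JDK_PATH PROFILE_FILE
# Appends "export JAVA_HOME=<JDK_PATH>" to PROFILE_FILE unless that
# exact line already exists, so repeated runs are harmless.
persist_java_home() {
    line="export JAVA_HOME=$1"
    grep -qxF "$line" "$2" 2>/dev/null || printf '%s\n' "$line" >> "$2"
}

# Example: persist_java_home /usr/jdk64/jdk1.7.0_67 ~/.bashrc
```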
Step 2: Set up the Ranger Admin UI
Run the setup script in /usr/hdp/current/ranger-admin. It will:
1. Add the ranger user and group.
2. Set up the Ranger DB (make sure you know your MySQL root password, because the script asks for it while setting up the Ranger DB).
3. Create the rangeradmin and rangerlogger MySQL users with the appropriate grants.
Besides the MySQL root password, the script prompts for passwords for the Ranger and audit databases. Enter ‘hortonworks’ or any other password you like, and remember it for later use.
[root@hdpcm ranger-admin]# pwd
[root@hdpcm ranger-admin]# ./setup.sh
[2015/03/31 15:58:41]: ——— Running XASecure PolicyManager Web Application Install Script ———
[2015/03/31 15:58:41]: [I] uname=Linux
[2015/03/31 15:58:41]: [I] hostname=hdpcm.dm.com
[2015/03/31 15:58:41]: [I] DB_FLAVOR=MYSQL
Installation of XASecure PolicyManager Web Application is completed.
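To avoid scrolling back through the installer output, you could save it to a log file and check for the completion message shown above. This is a minimal sketch: the log path and the helper `setup_completed` are hypothetical, and it relies only on the exact success line printed by the build shown here, which other versions may word differently.

```shell
#!/bin/sh
# setup_completed LOG_FILE
# Succeeds if the Ranger Admin installer log contains the completion
# message printed by setup.sh in the build shown above.
setup_completed() {
    grep -qF "Installation of XASecure PolicyManager Web Application is completed." "$1"
}

# Example (not run here). setup.sh prompts for passwords; tee keeps the
# prompts visible on screen while saving a copy of the output:
# cd /usr/hdp/current/ranger-admin && ./setup.sh 2>&1 | tee /tmp/ranger-admin-setup.log
# setup_completed /tmp/ranger-admin-setup.log && echo "Ranger Admin setup OK"
```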
Step 3: Start the ranger-admin service
Run start-ranger-admin.sh from /usr/hdp/current/ranger-admin/ews:
[root@hdpcm ews]# pwd
[root@hdpcm ews]# sh start-ranger-admin.sh
Apache Ranger Admin has started
Logs available at : /usr/hdp/current/ranger-admin/ews/logs
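Before moving on, you may want to confirm that Ranger Admin is accepting connections on port 6080, the port that appears in POLICY_MGR_URL in Step 4. The sketch below is an assumption-laden convenience, not part of the Ranger tooling: it uses bash’s `/dev/tcp` redirection with coreutils `timeout`, so it needs bash but not curl, and the function name `port_open` is hypothetical.

```shell
#!/bin/sh
# port_open HOST PORT [SECONDS]
# Succeeds if a TCP connection to HOST:PORT opens within SECONDS
# (default 5). Uses bash's /dev/tcp pseudo-device, so bash must be
# installed even if this script itself runs under another shell.
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example: port_open localhost 6080 && echo "Ranger Admin is listening"
```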
Step 4: Set up ranger-usersync
By default, usersync syncs UNIX users to the Ranger UI; it can also sync with LDAP. This article syncs UNIX users.
1. Edit the /usr/hdp/current/ranger-usersync/install.properties file.
2. Update POLICY_MGR_URL to point to your Ranger host:
POLICY_MGR_URL = http://<IP of your Ranger host>:6080
Then run /usr/hdp/current/ranger-usersync/setup.sh.
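Editing install.properties by hand works fine; if you prefer to script it, the sketch below rewrites the POLICY_MGR_URL line in place using GNU `sed -i`. The helper name and example host are hypothetical, and it assumes the key already appears once in the file, as the step above implies.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Replaces an existing "KEY = ..." or "KEY=..." line in FILE with
# "KEY = VALUE" (GNU sed -i). KEY must not contain regex
# metacharacters; VALUE must not contain "|" (the sed delimiter)
# or "&" (which sed treats as "the matched text").
set_property() {
    sed -i "s|^[[:space:]]*$2[[:space:]]*=.*|$2 = $3|" "$1"
}

# Example (not run here):
# set_property /usr/hdp/current/ranger-usersync/install.properties \
#     POLICY_MGR_URL http://ranger-host.example.com:6080
# cd /usr/hdp/current/ranger-usersync && ./setup.sh
```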
Step 5: Start the ranger-usersync service
[root@hdpcm ranger-usersync]# pwd
[root@hdpcm ranger-usersync]# sh start.sh
UnixAuthenticationService has started successfully.
Congratulations! You have installed and configured Ranger successfully.
Now log in to the Ranger Web UI by pointing your browser at your Ranger host on port 6080 (the POLICY_MGR_URL address from Step 4).
The default password for the admin user is “admin”. After you log in, you can change this password via the profile settings.
After you log in, your username is displayed on the Ranger Console home page.
To log out of the Ranger Console, click your username in the top right-hand corner of the screen, then click Logout in the drop-down menu.
In Ranger, you work with a repository for each component. Each repository is based on the underlying plug-in, or agent, that operates within that component.
Each repository has a set of policies. A policy ties together the resource you are protecting (a table, folder, or column), a group (such as administrators), and what that group is allowed to do with the resource (read, write, and so on). You give each policy a name, for example “Only grp_nixon can read the apac_china table.”
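Policies like this can also be created through the REST API mentioned earlier. The sketch below builds a JSON body for the example policy in the shape we understand the Ranger 0.4 public API (`POST /service/public/api/policy`) to expect for a Hive repository. Treat the endpoint and field names as assumptions to verify against the documentation for your Ranger release, since later releases introduced a different (v2) API. The helper name, the repository name `hivedev`, the database `default`, and the host `ranger-host` are hypothetical.

```shell
#!/bin/sh
# hive_policy_json POLICY_NAME REPO DATABASE TABLE GROUP PERMISSION
# Prints a JSON policy body granting GROUP one permission on a single
# Hive table (all columns). Field names follow our understanding of the
# Ranger 0.4 public REST API; verify them for your release.
# Arguments are inserted verbatim, so they must not contain double quotes.
hive_policy_json() {
    cat <<EOF
{
  "policyName": "$1",
  "repositoryName": "$2",
  "repositoryType": "hive",
  "databases": "$3",
  "tables": "$4",
  "columns": "*",
  "isEnabled": true,
  "isAuditEnabled": true,
  "permMapList": [
    { "groupList": ["$5"], "permList": ["$6"] }
  ]
}
EOF
}

# Example (not run here): submit the policy as the Ranger admin user.
# hive_policy_json "grp_nixon reads apac_china" hivedev default \
#     apac_china grp_nixon select |
#   curl -u admin:admin -H "Content-Type: application/json" \
#        -X POST -d @- http://ranger-host:6080/service/public/api/policy
```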
A GUI with a central view of who is allowed to do what brings much-needed simplicity to the Hadoop ecosystem, but that’s not all Ranger offers. It also provides audit logging. Although this can’t replace all the application audit logging you might want, if you simply need to know who accessed what on HDFS or which policies were enforced where, it’s probably exactly what you need.
In addition, Ranger can provide a Key Management Service (KMS) that works with HDFS’s new transparent data encryption (TDE). So if you need end-to-end encryption and a clean way to manage the associated keys, Ranger is not a bad place to start.
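As a rough sketch of the HDFS side of TDE, assuming a key provider (such as Ranger’s KMS) is already configured for the cluster: a key is created with `hadoop key create`, and an empty directory becomes an encryption zone with `hdfs crypto -createZone`. The key name and path in the example are hypothetical, the helper names are ours, and the `DRY_RUN` switch only prints the commands so you can review them before running them for real.

```shell
#!/bin/sh
# run: prints the command when DRY_RUN=1, otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_encryption_zone KEY_NAME HDFS_PATH
# Creates a key in the configured key provider, then makes HDFS_PATH
# (which must be an empty directory) an encryption zone using that key.
# Stops at the first failing step. createZone needs the HDFS superuser.
create_encryption_zone() {
    run hadoop key create "$1" &&
    run hdfs dfs -mkdir -p "$2" &&
    run hdfs crypto -createZone -keyName "$1" -path "$2" &&
    run hdfs crypto -listZones
}

# Example: DRY_RUN=1 create_encryption_zone finance_key /secure/finance
```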