To ensure that the ECS bucket can work with a secure Hadoop cluster, the bucket must have access to information about the cluster.
In a secure Hadoop cluster, each Kerberos principal must be mapped to an HDFS username, and that user must in turn be mapped to a UNIX group. Within the Hadoop cluster, the NameNode gathers this information from the Hadoop nodes themselves and from the configuration files (core-site.xml and hdfs-site.xml).
To enable the ECS nodes to determine this information and to validate client requests, the following data must be made available to the ECS nodes:
Kerberos user to UNIX user and group mapping
Superuser group
Proxy user settings
The data is made available to the ECS nodes as a set of name-value pairs held as metadata.
Kerberos users
Information about every Kerberos user (not AD users) that requires Hadoop access to a bucket must be uploaded to ECS. The following data is required:
Principal name
Principal shortname (mapped name)
Principal groups
If there are 10 Kerberos principals on a Hadoop node, you must create 30 name-value pairs in the JSON input file (three per principal: name, shortname, and groups). Every name must be unique, so you must assign a unique name to every principal name, principal shortname, and principal groups entry. ECS expects a constant prefix and suffix for the JSON entry names.
The required prefix for every Kerberos user entry is internal.kerberos.user, and the three possible suffixes are name, shortname, and groups, as shown in the following example.
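The naming scheme can be sketched in Python. This is a minimal sketch, assuming each principal gets its own identifier token between the fixed prefix and suffix (for example internal.kerberos.user.hdfs.name), that the groups value is a comma-separated list, and that each entry is a JSON object with "name" and "value" keys. The token, the comma separator, the object shape, and the sample principal are hypothetical illustrations; only the prefix and the three suffixes come from the text. The sketch emits three pairs per principal, so 10 principals yield 30 pairs, and it rejects duplicate names.

```python
import json
from dataclasses import dataclass

# Fixed prefix and the three suffixes named in the text.
PREFIX = "internal.kerberos.user"
SUFFIXES = ("name", "shortname", "groups")


@dataclass
class Principal:
    entry_id: str      # hypothetical per-principal token that keeps names unique
    name: str          # full Kerberos principal name
    shortname: str     # mapped (short) user name
    groups: list[str]  # UNIX groups; joined with commas (an assumption)


def kerberos_user_pairs(principals: list[Principal]) -> list[dict[str, str]]:
    """Build three name-value pairs per principal and reject duplicate names."""
    pairs = []
    for p in principals:
        values = (p.name, p.shortname, ",".join(p.groups))
        for suffix, value in zip(SUFFIXES, values):
            pairs.append({"name": f"{PREFIX}.{p.entry_id}.{suffix}", "value": value})
    names = [pair["name"] for pair in pairs]
    if len(names) != len(set(names)):
        raise ValueError("every entry name must be unique")
    return pairs


if __name__ == "__main__":
    # Hypothetical principal used only for illustration.
    hdfs = Principal("hdfs", "hdfs@EXAMPLE_HDFS.EMC.COM", "hdfs", ["hadoop", "hdfs"])
    print(json.dumps(kerberos_user_pairs([hdfs]), indent=2))
```

Checking uniqueness over the generated names, rather than trusting the tokens, catches the case where the same token is accidentally reused for two principals.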
Principals can map to different users. For example, the rm principal is usually mapped to the yarn user using the auth_to_local setting for the Hadoop cluster, like this:
RULE:[2:$1@$0](rm@EXAMPLE_HDFS.EMC.COM)s/.*/yarn/
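To see how this rule turns the rm principal into yarn, the following Python sketch evaluates a single rule of the form RULE:[n:format](regex)s/pattern/replacement/ against a principal. It is a simplified illustration of Hadoop's auth_to_local semantics ($0 is the realm, $1, $2, ... are the principal's components), not Hadoop's own implementation: it ignores DEFAULT, rule chaining, and options such as /L. The host name in the sample principal is hypothetical.

```python
import re
from typing import Optional

# One rule: RULE:[n:format](match-regex)s/pattern/replacement/[g]
RULE_RE = re.compile(r"RULE:\[(\d+):([^\]]+)\]\(([^)]*)\)s/([^/]*)/([^/]*)/(g?)")

RULE = "RULE:[2:$1@$0](rm@EXAMPLE_HDFS.EMC.COM)s/.*/yarn/"


def apply_rule(rule: str, principal: str) -> Optional[str]:
    """Return the mapped short name, or None if the rule does not apply."""
    m = RULE_RE.fullmatch(rule)
    if m is None:
        raise ValueError(f"unsupported rule: {rule}")
    n, fmt, match_re, pattern, replacement, global_flag = m.groups()

    # Split "rm/host@REALM" into components ["rm", "host"] and the realm.
    body, _, realm = principal.partition("@")
    components = body.split("/")
    if len(components) != int(n):
        return None  # the rule only applies to principals with n components

    # Substitute $0 (realm) and $1..$n (components) into the format string.
    fields = [realm] + components
    base = re.sub(r"\$(\d+)", lambda ref: fields[int(ref.group(1))], fmt)

    # The bracketed regex must match the formatted string in full.
    if not re.fullmatch(match_re, base):
        return None

    # sed-style substitution: first match only unless the g flag is present.
    return re.sub(pattern, replacement, base, count=0 if global_flag else 1)


if __name__ == "__main__":
    print(apply_rule(RULE, "rm/host1.example.com@EXAMPLE_HDFS.EMC.COM"))
```

Limiting the substitution to the first match matters here: in Python, replacing every match of .* would also replace the empty match at the end of the string.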
So for any principal that maps to a different user (for example, the rm principal maps to the yarn user), you must use the mapped name in the shortname value. The entry for the rm principal would therefore be:
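The following Python sketch builds the three entries for the rm principal, with the mapped name yarn as the shortname value. It assumes the same hypothetical naming scheme as a per-principal token between the prefix and suffix; the rm token, the host part of the principal, and the hadoop group are placeholders. Only the prefix, the suffixes, and the rm-to-yarn mapping come from the text.

```python
import json

PREFIX = "internal.kerberos.user"


def entries_for(entry_id: str, principal: str, mapped_user: str,
                groups: list[str]) -> list[dict[str, str]]:
    """Build the name, shortname, and groups entries for one principal.

    mapped_user must be the name produced by auth_to_local (yarn for rm),
    not the first component of the principal.
    """
    return [
        {"name": f"{PREFIX}.{entry_id}.name", "value": principal},
        {"name": f"{PREFIX}.{entry_id}.shortname", "value": mapped_user},
        {"name": f"{PREFIX}.{entry_id}.groups", "value": ",".join(groups)},
    ]


if __name__ == "__main__":
    # Hypothetical values: the host name and group are placeholders.
    rm = entries_for("rm", "rm/host1.example.com@EXAMPLE_HDFS.EMC.COM", "yarn", ["hadoop"])
    print(json.dumps(rm, indent=2))
```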
Superuser group
You must tell ECS which Linux group on the Hadoop nodes grants superuser privileges to its members. Only one entry in the JSON input file is expected for the supergroup designation. It must look like the following:
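A minimal Python sketch of the "only one entry" rule follows. The entry name dfs.permissions.supergroup is an assumption (it mirrors an HDFS permission property name) and should be checked against the ECS documentation for your release; the group value hdfs is likewise a hypothetical example.

```python
import json

# Assumed entry name; verify it against the ECS documentation for your release.
SUPERGROUP_NAME = "dfs.permissions.supergroup"


def add_supergroup(pairs: list[dict[str, str]], group: str) -> list[dict[str, str]]:
    """Return a copy of pairs with the single supergroup entry appended.

    Raises ValueError if a supergroup entry is already present, because
    only one such entry is expected in the JSON input file.
    """
    if any(p["name"] == SUPERGROUP_NAME for p in pairs):
        raise ValueError("only one supergroup entry is allowed")
    return pairs + [{"name": SUPERGROUP_NAME, "value": group}]


if __name__ == "__main__":
    # "hdfs" is a hypothetical group name used for illustration.
    print(json.dumps(add_supergroup([], "hdfs"), indent=2))
```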
Proxy user settings
For proxy support, you must identify all proxy settings that are allowed for each Hadoop application, where an application is any of the Hadoop-supported applications, such as hive.
In the following example, proxy support for the hive application is granted to users who are members of the s3users group (AD or Linux group) and can run hive on any of the hosts in the Hadoop cluster. The JSON entry for this therefore consists of two name-value pairs: one for the hosts setting and one for the groups setting.
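The two pairs for the hive example can be sketched in Python. This assumes the entries reuse Hadoop's core-site.xml proxy-user property names, hadoop.proxyuser.<application>.hosts and hadoop.proxyuser.<application>.groups, and Hadoop's * wildcard for "any host"; the text does not name the entries, so treat the names as an assumption.

```python
import json


def proxy_pairs(app: str, hosts: str, groups: str) -> list[dict[str, str]]:
    """Build the hosts and groups entries that grant proxy support to one application.

    The hadoop.proxyuser.<app>.* names follow Hadoop's core-site.xml
    convention and are assumed, not confirmed, for ECS.
    """
    return [
        {"name": f"hadoop.proxyuser.{app}.hosts", "value": hosts},
        {"name": f"hadoop.proxyuser.{app}.groups", "value": groups},
    ]


if __name__ == "__main__":
    # Members of s3users may run hive from any host ("*") in the cluster.
    print(json.dumps(proxy_pairs("hive", "*", "s3users"), indent=2))
```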
Once metadata is loaded into a bucket, it is referred to as a secure bucket, and you must have Kerberos principals to access it. A request from a non-secure Hadoop node is rejected. If metadata is not loaded, the bucket is not secure, and a request from a secure Hadoop node is rejected.
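The acceptance rule reduces to a single comparison: a request succeeds only when the client's security mode matches the bucket's. The Python sketch below models it with a hypothetical Bucket class that treats a bucket as secure once any metadata has been loaded, as the text describes. It models only this mode check, not Kerberos authentication itself.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # Name-value pairs loaded as bucket metadata; an empty list means none loaded.
    metadata: list[dict[str, str]] = field(default_factory=list)

    @property
    def is_secure(self) -> bool:
        """A bucket becomes secure once metadata has been loaded into it."""
        return bool(self.metadata)


def accepts(bucket: Bucket, client_is_secure: bool) -> bool:
    """Accept a request only when client and bucket security modes match."""
    return bucket.is_secure == client_is_secure
```

Both mismatches are rejected: a non-secure client against a secure bucket, and a secure client against a non-secure bucket.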
The following error is seen if you try to access a secure bucket from a non-secure cluster. A similar message is seen if you try to access a non-secure bucket from a secure cluster.