ECS 3.6.2 Data Access Guide

Secure the ECS bucket using metadata

To ensure that the ECS bucket can work with a secure Hadoop cluster, the bucket must have access to information about the cluster.

In a secure Hadoop cluster, the Kerberos principal must be mapped to an HDFS username. In addition, the user must be mapped to a UNIX group. Within the Hadoop cluster, the NameNode gathers this information from the Hadoop nodes themselves and from the configuration files (core-site.xml and hdfs-site.xml).

To enable the ECS nodes to determine this information and to validate client requests, the following data must be made available to the ECS nodes:

  • Kerberos user to UNIX user and group mapping
  • Superuser group
  • Proxy user settings

The data is made available to the ECS nodes as a set of name/value pairs held as metadata.

Kerberos users

Information about every Kerberos user (not AD users) that requires Hadoop access to a bucket must be uploaded to ECS. The following data is required:

  • Principal name
  • Principal shortname (mapped name)
  • Principal groups
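
You can typically find the groups for a mapped user by running the standard id command on a Hadoop node. This is only an illustrative check, not an ECS requirement; the user hdfs and the output below are examples, and your cluster's groups will differ:

[hdfs@sandbox ~]$ id -Gn hdfs
hdfs hadoop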

If there are 10 Kerberos principals on a Hadoop node, you must create 30 name/value pairs in the JSON input file. Every name must be unique, so you must assign a unique name to every principal name, principal shortname, and principal group. ECS expects a constant prefix and suffix for the JSON entry names.

The required prefix for every Kerberos user entry is internal.kerberos.user, and the three possible suffixes are name, shortname, and groups, as shown in the following example.

{
    "name": "internal.kerberos.user.hdfs.name",
    "value": "hdfs-cluster999@EXAMPLE_HDFS.EMC.COM"
},
{
    "name": "internal.kerberos.user.hdfs.shortname",
    "value": "hdfs"
},
{
    "name": "internal.kerberos.user.hdfs.groups",
    "value": "hadoop,hdfs"
},
The value between the prefix and suffix can be anything, as long as it uniquely identifies the entry. For example, you could use:
"name": "internal.kerberos.user.1.name",
"name": "internal.kerberos.user.1.shortname",
"name": "internal.kerberos.user.1.groups",
Principals can map to different users. For example, the rm principal is usually mapped to the yarn user using the auth_to_local setting for the Hadoop cluster, like this:
RULE:[2:$1@$0](rm@EXAMPLE_HDFS.EMC.COM)s/.*/yarn/
So for any principal that maps to a different user (for example, the rm principal maps to the yarn user), you must use the mapped name in the shortname value. The entries for the rm principal would be:
{
    "name": "internal.kerberos.user.rm.name",
    "value": "rm@EXAMPLE_HDFS.EMC.COM"
},
{
    "name": "internal.kerberos.user.rm.shortname",
    "value": "yarn"
},
{
    "name": "internal.kerberos.user.rm.groups",
    "value": "hadoop"
},
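The auth_to_local rules referred to above are normally defined in the hadoop.security.auth_to_local property in core-site.xml on the Hadoop cluster. A minimal sketch, using the realm and rule from the example above (your cluster's rule set will differ):

<property>
    <name>hadoop.security.auth_to_local</name>
    <value>
        RULE:[2:$1@$0](rm@EXAMPLE_HDFS.EMC.COM)s/.*/yarn/
        DEFAULT
    </value>
</property>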

Supergroup

You must tell ECS which Linux group on the Hadoop nodes is granted superuser privileges. Only one entry is expected in the JSON input file for the supergroup designation. It must be like the following:

{
    "name": "dfs.permissions.supergroup",
    "value": "hdfs"
}

Proxy settings

For proxy support, you must identify all proxy settings that are allowed for each Hadoop application, where application means one of the Hadoop-supported applications, such as hive.

In the following example, proxy support for the hive application is granted to users who are members of the s3users group (an AD or Linux group), and those users can run hive on any of the hosts in the Hadoop cluster. The JSON entry for this is two name/value pairs: one for the hosts setting and one for the groups setting.

{
    "name": "hadoop.proxyuser.hive.hosts",
    "value": "*"
},
{
    "name": "hadoop.proxyuser.hive.groups",
    "value": "s3users"
}
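
These names mirror the standard Hadoop proxyuser properties. On the Hadoop cluster itself, the equivalent settings would typically appear in core-site.xml; a sketch using the same example values:

<property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>s3users</value>
</property>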

The complete file

The three types of metadata must be combined into a single JSON file, in the format shown in the following example.

{
    "head_type": "hdfs",
    "metadata": [
    {
        "name": "METADATANAME_1",
        "value": "METADATAVALUE_1"
    },
    {
        "name": "METADATANAME_2",
        "value": "METADATAVALUE_2"
    },

        :

    {
        "name": "METADATANAME_N",
        "value": "METADATAVALUE_N"
    }
    ]
}
NOTE: The last name/value pair does not have a trailing "," character.

An example of a JSON file is shown in: Secure bucket metadata.
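
For reference, a minimal combined file built only from the entries shown in the earlier examples (one Kerberos user, the supergroup, and the hive proxy settings) would look like this:

{
    "head_type": "hdfs",
    "metadata": [
    {
        "name": "internal.kerberos.user.hdfs.name",
        "value": "hdfs-cluster999@EXAMPLE_HDFS.EMC.COM"
    },
    {
        "name": "internal.kerberos.user.hdfs.shortname",
        "value": "hdfs"
    },
    {
        "name": "internal.kerberos.user.hdfs.groups",
        "value": "hadoop,hdfs"
    },
    {
        "name": "dfs.permissions.supergroup",
        "value": "hdfs"
    },
    {
        "name": "hadoop.proxyuser.hive.hosts",
        "value": "*"
    },
    {
        "name": "hadoop.proxyuser.hive.groups",
        "value": "s3users"
    }
    ]
}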

Secure and non-secure buckets

Once metadata is loaded into a bucket, it is referred to as a secure bucket and you must have Kerberos principals to access it. A request from a non-secure Hadoop node is rejected. If metadata is not loaded, the bucket is not secure and a request from a secure Hadoop node is rejected.

The following error is seen if you try to access a secure bucket from a non-secure cluster. A similar message is seen if you try to access a non-secure bucket from a secure cluster.
[hdfs@sandbox ~]$ hadoop fs -ls -R viprfs://hdfsBucket3.s3.site1/
ls: ViPRFS internal error (ERROR_FAILED_TO_PROCESS_REQUEST).
