
ECS 3.6.1 Administration Guide

Configuration at Ambari Node

About this task

The following basic configuration properties must be added under Ambari UI > HDFS > core-site.xml to make S3A work. Other parameters can also be added on the Ambari node; see the Hadoop S3A documentation for the complete list.

NOTE

Putting S3A credentials in the Hadoop core-site file creates a security vulnerability, because any Hadoop user who can view the credentials can access the bucket. If your Hadoop cluster contains sensitive data in the S3A object bucket, use one of the two IAM methods of authorization discussed above.
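If you cannot use IAM, Hadoop's credential provider framework (referenced below) keeps the keys in an encrypted JCEKS store instead of in plain text. A minimal sketch, with an illustrative store path and namenode:

# Store the S3A keys in an encrypted JCEKS credential store (path is illustrative)
hadoop credential create fs.s3a.access.key -value <access key> \
    -provider jceks://hdfs@<namenode>/user/hdfs/s3a.jceks
hadoop credential create fs.s3a.secret.key -value <secret key> \
    -provider jceks://hdfs@<namenode>/user/hdfs/s3a.jceks

# Then point Hadoop at the store in core-site.xml instead of defining the raw keys
hadoop.security.credential.provider.path=jceks://hdfs@<namenode>/user/hdfs/s3a.jceks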

Add the following configuration parameters to core-site.xml in the Ambari UI. If you are using credential providers or IAM, do not define the access key or secret key in core-site.xml.

fs.s3a.endpoint=<ECS IP address (only one node address) or load balancer IP>:9020   # S3A does not support multiple IP addresses, so a load balancer is recommended
fs.s3a.access.key= <S3 Object User as created on ECS>
fs.s3a.secret.key=<S3 Object User Secret Key as on ECS>
fs.s3a.connection.maximum=15
fs.s3a.connection.ssl.enabled=false
fs.s3a.path.style.access=false
fs.s3a.connection.establish.timeout=5000
fs.s3a.connection.timeout=200000
fs.s3a.paging.maximum=1000
fs.s3a.threads.max=10
fs.s3a.socket.send.buffer=8192
fs.s3a.socket.recv.buffer=8192
fs.s3a.threads.keepalivetime=60
fs.s3a.max.total.tasks=5
fs.s3a.multipart.size=100M
fs.s3a.multipart.threshold=2147483647
fs.s3a.multiobjectdelete.enable=true
fs.s3a.acl.default=PublicReadWrite
fs.s3a.multipart.purge=false
fs.s3a.multipart.purge.age=86400
fs.s3a.block.size=32M
fs.s3a.readahead.range=64K
fs.s3a.buffer.dir=${hadoop.tmp.dir}/s3a
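
After saving these properties in Ambari and restarting the affected services, you can spot-check that a value was picked up by the client configuration; the key below is only an example:

hdfs getconf -confKey fs.s3a.endpoint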
Significance of each parameter:
fs.s3a.access.key - The S3 access key ID (the S3 object user created on ECS)
fs.s3a.secret.key - The secret key for that S3 object user
fs.s3a.connection.maximum - Controls how many parallel connections HttpClient spawns (default: 15)
fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 (default: true)
fs.s3a.attempts.maximum - How many times we should retry commands on transient errors (default: 10)
fs.s3a.connection.establish.timeout - Socket connection setup timeout in milliseconds (default: 5000)
fs.s3a.connection.timeout - Socket connection timeout in milliseconds (default: 200000)
fs.s3a.paging.maximum - How many keys to request from S3 at a time when listing directories (default: 5000)
fs.s3a.multipart.size - How big (in bytes) to split an upload or copy operation into (default: 100 MB)
fs.s3a.multipart.threshold - Until a file is this large (in bytes), use non-parallel upload (default: 2 GB)
fs.s3a.acl.default - Set a canned ACL on newly created/copied objects (Private | PublicRead | PublicReadWrite | AuthenticatedRead | LogDeliveryWrite | BucketOwnerRead | BucketOwnerFullControl)
fs.s3a.multipart.purge - True if you want to purge existing multipart uploads that may not have been completed/aborted correctly (default: false)
fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads to purge (default: 86400)
fs.s3a.buffer.dir - Comma-separated list of local directories used to buffer file writes (default: uses fs.s3.buffer.dir)
fs.s3a.server-side-encryption-algorithm - Name of the server-side encryption algorithm to use when writing files, for example AES256 (default: null)
For details, see the Hadoop S3A documentation.
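
Once the configuration is in place, a quick way to confirm S3A connectivity to ECS is to list the bucket; the bucket name below is a placeholder:

hadoop fs -ls s3a://<bucket name>/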

When copying a large file, add the parameter -D fs.s3a.fast.upload=true to your command line, as in the example below. For details, see the Hadoop S3A documentation.
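For example, a large-file copy into the bucket with fast upload enabled might look like the following (the local path and bucket name are placeholders):

# Upload a large local file to the S3A bucket with fast upload enabled
hadoop fs -D fs.s3a.fast.upload=true -put /path/to/largefile s3a://<bucket name>/largefile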

