AWS S3

Configuration Options

Required Options

bucket (required)

The S3 bucket name. Do not include a leading s3:// or a trailing /.

Type: string
Syntax: literal
Example: ["my-bucket"]

inputs (required)

A list of upstream source or transform IDs. Wildcards (*) are supported.

See configuration for more info.

Type: array
Syntax: literal
Example: ["my-source-or-transform-id","prefix-*"]

encoding (required)

Configures the encoding-specific sink behavior.

Type: hash
Syntax: literal
Example: []

region (required)

The AWS region of the target service. If endpoint is provided, it overrides this value, since the endpoint includes the region.

Type: string
Syntax: literal
Example: ["us-east-1"]

type (required)

The component type. This is a required field for all components and tells Vector which component to use.

Type: string
Syntax: literal
Example: ["aws_s3"]
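
Putting the required options together, a minimal sink definition might look like the sketch below. The sink ID my-sink, the upstream ID, and the bucket name are placeholders. The encoding.codec key and its "ndjson" value are assumptions, since this page only states that encoding is a hash.

```toml
[sinks.my-sink]
type = "aws_s3"                          # tells Vector which component to use
inputs = ["my-source-or-transform-id"]   # placeholder upstream ID
bucket = "my-bucket"                     # no leading s3://, no trailing /
region = "us-east-1"

# Assumed sub-option: the keys inside `encoding` are not listed on this page.
encoding.codec = "ndjson"
```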

Advanced Options

auth (optional)

Options for the authentication strategy.

Type: hash
Syntax: literal
Example: []

endpoint (optional)

Custom endpoint for use with AWS-compatible services. Providing a value for this option will make region moot.

Type: string
Syntax: literal
Example: ["127.0.0.0:5000/path/to/service"]

acl (optional)

Canned ACL to apply to the created objects. For more information, see Canned ACL.

Type: string
Syntax: literal

content_encoding (optional)

Specifies what content encodings have been applied to the object and thus what decoding mechanisms must be applied to obtain the media-type referenced by the Content-Type header field. By default, this is calculated from the compression value.

Type: string
Syntax: literal
Example: ["gzip"]

content_type (optional)

A standard MIME type describing the format of the contents.

Type: string
Syntax: literal
Default: text/x-log

filename_append_uuid (optional)

Whether or not to append a UUID v4 token to the end of the file name. This ensures there are no name collisions in high-volume use cases.

Type: bool

filename_extension (optional)

The filename extension to use in the object name.

Type: string
Syntax: literal
Default: log

filename_time_format (optional)

The format of the resulting object file name. strftime specifiers are supported.

Type: string
Syntax: strftime
Default: %s

grant_full_control (optional)

Gives the named grantee READ, READ_ACP, and WRITE_ACP permissions on the created objects.

Type: string
Syntax: literal
Example: ["79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be","person@email.com","http://acs.amazonaws.com/groups/global/AllUsers"]

grant_read (optional)

Allows the named grantee to read the created objects and their metadata.

Type: string
Syntax: literal
Example: ["79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be","person@email.com","http://acs.amazonaws.com/groups/global/AllUsers"]

grant_read_acp (optional)

Allows the named grantee to read the created objects' ACL.

Type: string
Syntax: literal
Example: ["79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be","person@email.com","http://acs.amazonaws.com/groups/global/AllUsers"]

grant_write_acp (optional)

Allows the named grantee to write the created objects' ACL.

Type: string
Syntax: literal
Example: ["79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be","person@email.com","http://acs.amazonaws.com/groups/global/AllUsers"]

key_prefix (optional)

A prefix to apply to all object key names. This should be used to partition your objects, and it's important to end this value with a / if you want this to be the root S3 "folder".

Type: string
Syntax: template
Default: date=%F/
Example: ["date=%F/","date=%F/hour=%H/","year=%Y/month=%m/day=%d/","application_id={{ application_id }}/date=%F/"]

server_side_encryption (optional)

The Server-side Encryption algorithm used when storing these objects.

Type: string
Syntax: literal

ssekms_key_id (optional)

If server_side_encryption has the value "aws:kms", this specifies the ID of the AWS Key Management Service (AWS KMS) symmetric customer managed customer master key (CMK) that will be used for the created objects. If not specified, Amazon S3 uses the AWS managed CMK in AWS to protect the data.

Type: string
Syntax: literal
Example: ["abcd1234"]

storage_class (optional)

The storage class for the created objects. See the S3 Storage Classes for more details.

Type: string
Syntax: literal

buffer (optional)

Configures the sink-specific buffer behavior.

Type: hash
Syntax: literal
Example: []

batch (optional)

Configures the sink batching behavior.

Type: hash
Example: []

compression (optional)

The compression strategy used to compress the encoded event data before transmission.

Some cloud storage API clients and browsers handle decompression transparently, so files may not always appear to be compressed, depending on how they are accessed.

Type: string
Syntax: literal
Default: gzip

healthcheck (optional)

Health check options for the sink.

Type: hash
Example: []

request (optional)

Configures the sink request behavior.

Type: hash
Example: []

proxy (optional)

Configures an HTTP(S) proxy for Vector to use. By default, the globally configured proxy is used.

Type: hash
Syntax: literal
Example: []

tags (optional)

The tag-set for the object.

Type: hash
Example: [{"Tag1":"Value1"}]

How it Works

AWS authentication

Vector checks for AWS credentials in the following order:

  1. The access_key_id and secret_access_key options.
  2. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
  3. The credential_process command in the AWS config file (usually located at ~/.aws/config).
  4. The AWS credentials file (usually located at ~/.aws/credentials).
  5. The IAM instance profile (only works if running on an EC2 instance with an instance profile/role).

If no credentials are found, Vector's health check fails and an error is logged. If your AWS credentials expire, Vector will automatically search for up-to-date credentials in the places (and order) described above.
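
As a sketch of the first item in the list above, static credentials could be set directly on the sink. Nesting the two keys under auth is assumed from the auth option described earlier, and the values are AWS's published placeholder credentials, not real ones.

```toml
[sinks.my-sink]
# Checked first in the credential search order.
# Assumption: access_key_id and secret_access_key live under `auth`.
auth.access_key_id = "AKIAIOSFODNN7EXAMPLE"
auth.secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```

Leaving these keys out lets Vector fall through to the environment variables, config file, credentials file, and instance profile, in that order.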

Cross account object writing

If you're using Vector to write objects across AWS accounts, consider setting the grant_full_control option to the bucket owner's canonical user ID. AWS provides a full tutorial for this use case. If you don't know the bucket owner's canonical ID, you can find it by following this tutorial.
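
A minimal sketch of that setting, reusing the sample canonical user ID from the grant_full_control examples above. Replace it with the bucket owner's actual ID.

```toml
[sinks.my-sink]
# Placeholder canonical user ID of the account that owns the bucket.
grant_full_control = "79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be"
```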

Object Access Control List (ACL)

AWS S3 supports access control lists (ACL) for buckets and objects. In the context of Vector, only object ACLs are relevant (Vector does not create or modify buckets). You can set the object level ACL by using one of the acl, grant_full_control, grant_read, grant_read_acp, or grant_write_acp options.
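
For illustration, a canned ACL could be applied as shown below. The value is one of the canned ACL names defined by AWS. That Vector accepts exactly this spelling is an assumption, since this page does not list the allowed values.

```toml
[sinks.my-sink]
# Assumed value spelling, taken from the AWS canned ACL names.
acl = "bucket-owner-full-control"
```

S3 treats a canned ACL and explicit grants as alternatives within one request, so choose either acl or the grant_* options for a given sink.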

Object naming

Vector uses one of two naming schemes for S3 objects, depending on compression. If compression is enabled (the compression option defaults to gzip), Vector uses this scheme:

<key_prefix><timestamp>-<uuidv4>.log.gz

If compression isn't enabled, Vector uses this scheme (only the file extension is different):

<key_prefix><timestamp>-<uuidv4>.log

Some sample S3 object names (with and without compression, respectively):

date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log.gz
date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log

Vector appends a UUID v4 token to ensure there are no naming conflicts in the unlikely event that two Vector instances write data at the same time.

You can control the resulting name via the key_prefix, filename_time_format, and filename_append_uuid options.
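
The sketch below spells out the settings that would produce the first sample name above. The values match the defaults listed in the options section, except filename_append_uuid, whose default is not shown on this page and is set explicitly here.

```toml
[sinks.my-sink]
key_prefix = "date=%F/"        # %F expands to a date such as 2019-06-18
filename_time_format = "%s"    # Unix timestamp, e.g. 1560886634
filename_append_uuid = true    # append a UUID v4 token after the timestamp
compression = "gzip"           # selects the .log.gz naming scheme

# Resulting object key, following the scheme above:
# date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log.gz
```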

Object Tags & metadata

Vector currently only supports AWS S3 object tags and does not support object metadata. If you require metadata support see issue #1694.

We believe tags are more flexible since they are separate from the actual S3 object. You can freely modify tags without modifying the object. Conversely, object metadata requires a full rewrite of the object to make changes.
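
A tag set is a plain key/value table. In this sketch, Tag1 comes from the tags example above, and Project is a hypothetical second tag added for illustration.

```toml
[sinks.my-sink.tags]
Tag1 = "Value1"
Project = "Blue"   # hypothetical tag name and value
```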

Server-Side Encryption (SSE)

AWS S3 offers server-side encryption. You can apply defaults at the bucket level or set the encryption at the object level. In the context of Vector, only the object level is relevant (Vector does not create or modify buckets), although we recommend setting defaults at the bucket level when possible. You can explicitly set the object-level encryption via the server_side_encryption option.
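
As a sketch, object-level encryption with a KMS key might be configured as follows. "aws:kms" is the identifier S3 uses for KMS-managed encryption, and the key ID is the placeholder from the ssekms_key_id example.

```toml
[sinks.my-sink]
server_side_encryption = "aws:kms"
# Placeholder key ID; omit this line to let S3 use the AWS managed key.
ssekms_key_id = "abcd1234"
```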

State

This component is stateless, meaning its behavior is consistent across each input.

Health checks

Health checks ensure that the downstream service is accessible and ready to accept data. This check is performed upon sink initialization. If the health check fails, an error is logged and Vector proceeds to start.
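
If the startup check is not wanted, it can presumably be switched off through the healthcheck hash. The enabled key in this sketch is an assumption, since this page does not list the sub-options of healthcheck.

```toml
[sinks.my-sink]
# Assumed sub-option: skip the startup health check for this sink.
healthcheck.enabled = false
```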

Partitioning

Vector supports dynamic configuration values through a simple template syntax. If an option supports templating, it will be noted with a badge and you can use event fields to create dynamic values. For example:

[sinks.my-sink]
dynamic_option = "application={{ application_id }}"

In the above example, the application_id for each event will be used to partition outgoing data.
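
For this sink, key_prefix is the option with template syntax, so the same mechanism can route events into per-application prefixes. The sketch reuses one of the key_prefix examples listed in the options section and assumes every event carries an application_id field.

```toml
[sinks.my-sink]
# Each distinct application_id value gets its own S3 prefix,
# further split by date through the %F strftime specifier.
key_prefix = "application_id={{ application_id }}/date=%F/"
```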

Rate limits & adaptive concurrency

Buffers and batches

This component buffers and batches data as shown in the diagram above. Vector treats buffers and batches as sink-specific concepts rather than global ones. This isolates sinks, ensuring that service disruptions are contained and delivery guarantees are honored.

Batches are flushed when either of two conditions is met:

  1. The batch age meets or exceeds the configured timeout_secs.
  2. The batch size meets or exceeds the configured max_size or max_events.

Buffers are controlled via the buffer.* options.
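
A sketch of how these knobs might be set. Placing timeout_secs and max_events under batch is assumed from the batch option, the buffer sub-options type and max_events are assumptions as well, and all numbers are illustrative only.

```toml
[sinks.my-sink]
# Flush a batch once it is 300 seconds old or holds 10000 events,
# whichever happens first.
batch.timeout_secs = 300
batch.max_events = 10000

# Assumed sub-options; this page only refers to them as buffer.*.
buffer.type = "memory"
buffer.max_events = 500
```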

Storage class

AWS S3 offers storage classes. You can apply defaults and rules at the bucket level or set the storage class at the object level. In the context of Vector, only the object level is relevant (Vector does not create or modify buckets). You can set the storage class via the storage_class option.
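
For example, infrequently read logs could be written straight to a cheaper class. STANDARD_IA is one of the storage class names defined by S3. That Vector accepts this exact spelling is an assumption, since the allowed values are not listed on this page.

```toml
[sinks.my-sink]
# Assumed value spelling, taken from the S3 storage class names.
storage_class = "STANDARD_IA"
```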

Retry policy

Vector will retry failed requests (status == 429, >= 500, and != 501). Other responses will not be retried. You can control the number of retry attempts and backoff rate with the request.retry_attempts and request.retry_backoff_secs options.
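
A sketch using the two options named above. The numbers are illustrative and are not the defaults.

```toml
[sinks.my-sink]
# Give up on a request after 5 retries, starting from a 2 second backoff.
request.retry_attempts = 5
request.retry_backoff_secs = 2
```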