AWS S3

Requirements

The AWS S3 source requires an SQS queue configured to receive S3 bucket notifications for the desired S3 buckets.

Configuration Options

Required Options

region (required)

The AWS region of the target service. If endpoint is provided, it overrides this value because the endpoint includes the region.

Type   | Syntax  | Default | Example
string | literal | (none)  | ["us-east-1"]

type (required)

The component type. This is a required field for all components and tells Vector which component to use.

Type   | Syntax  | Default | Example
string | literal | (none)  | ["aws_s3"]
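
As a rough sketch, the two required options might appear in a Vector TOML configuration as shown here. The component ID `s3` is an illustrative choice that matches the `inputs` value used in the transform examples on this page. With the default sqs strategy, a working source would also need the sqs options described under Advanced Options.

```toml
# Minimal sketch of the required options only.
# "s3" is a hypothetical component ID; pick any name you like.
[sources.s3]
type = "aws_s3"       # tells Vector which component to use
region = "us-east-1"  # AWS region of the target service
```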

Advanced Options

auth (optional)

Options for the authentication strategy.

Type | Syntax  | Default | Example
hash | literal | (none)  | []

endpoint (optional)

Custom endpoint for use with AWS-compatible services. Providing a value for this option overrides region, because the endpoint already includes the region.

Type   | Syntax  | Default | Example
string | literal | (none)  | ["127.0.0.0:5000/path/to/service"]

strategy (optional)

The strategy to use to consume objects from AWS S3.

Type   | Syntax  | Default | Example
string | literal | sqs     | (none)

compression (optional)

The compression format of the S3 objects.

Type   | Syntax  | Default | Example
string | literal | text    | (none)

multiline (optional)

Multiline parsing configuration. If not specified, multiline parsing is disabled.

Type | Syntax | Default | Example
hash | regex  | (none)  | []

proxy (optional)

Configures an HTTP(S) proxy for Vector to use. By default, the globally configured proxy is used.

Type | Syntax  | Default | Example
hash | literal | (none)  | []

sqs (optional)

SQS strategy options. Required if strategy=sqs.

Type | Syntax  | Default | Example
hash | literal | (none)  | []
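
The advanced options above could be combined roughly as follows. This is a sketch under stated assumptions, not a definitive configuration: this page does not list the keys nested under sqs, so `queue_url` is assumed here, the queue URL and account ID are placeholders, and `"gzip"` is an assumed compression value that should be checked against your Vector version.

```toml
# Sketch of a source that consumes S3 objects announced through SQS.
# "s3" is a hypothetical component ID.
[sources.s3]
type = "aws_s3"
region = "us-east-1"
strategy = "sqs"      # the default, written out for clarity
compression = "gzip"  # assumed value, not listed on this page

# Assumption: the queue is identified by a queue_url key under sqs.
# The URL below is a placeholder, not a real queue.
sqs.queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-bucket-events"
```

The dotted-key form (`sqs.queue_url`) follows the same style as the `types.*` keys in the transform examples on this page.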

How it Works

AWS authentication

Vector checks for AWS credentials in the following order:

  1. The access_key_id and secret_access_key options.
  2. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
  3. The credential_process command in the AWS config file (usually located at ~/.aws/config).
  4. The AWS credentials file (usually located at ~/.aws/credentials).
  5. The IAM instance profile (only works if running on an EC2 instance with an instance profile/role).

If no credentials are found, Vector's health check fails and an error is logged. If your AWS credentials expire, Vector automatically searches for up-to-date credentials in the same locations, in the order listed above.
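
The first entry in the list above, explicit credentials in the configuration, might look like the following sketch. It assumes that access_key_id and secret_access_key are nested under the auth option, and that Vector's `${VAR}` environment variable interpolation is available in your version. The variable names are hypothetical and chosen so they do not collide with the standard AWS variables in the second entry.

```toml
# Sketch of explicit credentials for the aws_s3 source.
# "s3" is a hypothetical component ID.
[sources.s3]
type = "aws_s3"
region = "us-east-1"

# Assumption: these keys live under auth. The environment variable
# names are hypothetical; avoid committing literal secrets to the file.
auth.access_key_id = "${S3_SOURCE_ACCESS_KEY_ID}"
auth.secret_access_key = "${S3_SOURCE_SECRET_ACCESS_KEY}"
```

Leaving these keys out lets Vector fall through to the remaining lookups in the order listed, which is usually preferable on EC2 instances that already have an instance profile.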

Handling events from the `aws_s3` source

Like the file source, this source outputs one event per line, unless the multiline configuration option is used.

You will commonly want to use transforms to parse the data. For example, to parse VPC flow logs sent to S3 you can chain the tokenizer transform:

[transforms.flow_logs]
type = "tokenizer" # required
inputs = ["s3"]
field_names = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]

types.srcport = "int"
types.dstport = "int"
types.packets = "int"
types.bytes = "int"
types.start = "timestamp|%s"
types.end = "timestamp|%s"

To parse AWS load balancer logs, the regex_parser transform can be used:

[transforms.elasticloadbalancing_fields_parsed]
type = "regex_parser"
inputs = ["s3"]
regex = '''(?x)^
		(?P<type>[\w]+)[ ]
		(?P<timestamp>[\w:.-]+)[ ]
		(?P<elb>[^\s]+)[ ]
		(?P<client_host>[\d.:-]+)[ ]
		(?P<target_host>[\d.:-]+)[ ]
		(?P<request_processing_time>[\d.-]+)[ ]
		(?P<target_processing_time>[\d.-]+)[ ]
		(?P<response_processing_time>[\d.-]+)[ ]
		(?P<elb_status_code>[\d-]+)[ ]
		(?P<target_status_code>[\d-]+)[ ]
		(?P<received_bytes>[\d-]+)[ ]
		(?P<sent_bytes>[\d-]+)[ ]
		"(?P<request_method>[\w-]+)[ ]
		(?P<request_url>[^\s]+)[ ]
		(?P<request_protocol>[^"\s]+)"[ ]
		"(?P<user_agent>[^"]+)"[ ]
		(?P<ssl_cipher>[^\s]+)[ ]
		(?P<ssl_protocol>[^\s]+)[ ]
		(?P<target_group_arn>[\w.:/-]+)[ ]
		"(?P<trace_id>[^\s"]+)"[ ]
		"(?P<domain_name>[^\s"]+)"[ ]
		"(?P<chosen_cert_arn>[\w:./-]+)"[ ]
		(?P<matched_rule_priority>[\d-]+)[ ]
		(?P<request_creation_time>[\w.:-]+)[ ]
		"(?P<actions_executed>[\w,-]+)"[ ]
		"(?P<redirect_url>[^"]+)"[ ]
		"(?P<error_reason>[^"]+)"'''
field = "message"
drop_failed = false

types.received_bytes = "int"
types.request_processing_time = "float"
types.sent_bytes = "int"
types.target_processing_time = "float"
types.response_processing_time = "float"

[transforms.elasticloadbalancing_url_parsed]
type = "regex_parser"
inputs = ["elasticloadbalancing_fields_parsed"]
regex = '^(?P<url_scheme>[\w]+)://(?P<url_hostname>[^\s:/?#]+)(?::(?P<request_port>[\d-]+))?-?(?:/(?P<url_path>[^\s?#]*))?(?P<request_url_query>\?[^\s#]+)?'
field = "request_url"
drop_failed = false

State

This component is stateless, meaning its behavior is consistent across each input.

Context

By default, the aws_s3 source augments events with helpful context keys.