AWS S3
The AWS S3 source requires an SQS queue configured to receive S3 bucket notifications for the desired S3 buckets.
Configuration Options
Required Options
region(required)
The AWS region of the target service. If `endpoint` is provided, it overrides this value, since the endpoint includes the region.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["us-east-1"] |
type(required)
The component type. This is a required field for all components and tells Vector which component to use.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["aws_s3"] |
Advanced Options
auth(optional)
Options for the authentication strategy.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
endpoint(optional)
Custom endpoint for use with AWS-compatible services. Providing a value for this option makes `region` moot.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["127.0.0.0:5000/path/to/service"] |
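As a rough illustration of the `endpoint` option, the fragment below points the source at an AWS-compatible service running locally. The source name `s3_compatible` and the endpoint address are hypothetical placeholders, and the fragment leaves out the SQS settings a working source would also need.

```toml
# Fragment only: the SQS strategy options are omitted here.
[sources.s3_compatible]
type = "aws_s3"
# Hypothetical address of an AWS-compatible service; replace with your own.
endpoint = "http://127.0.0.1:5000"
# Per the description above, `region` is moot once `endpoint` is set;
# it is kept here only because it is listed as a required option.
region = "us-east-1"
```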
strategy(optional)
The strategy to use to consume objects from AWS S3.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | sqs |
compression(optional)
The compression format of the S3 objects.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | text |
multiline(optional)
Multiline parsing configuration. If not specified, multiline parsing is disabled.
Type | Syntax | Default | Example |
---|---|---|---|
hash | regex | | [] |
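The sub-options of `multiline` are not listed in this section. The sketch below assumes they mirror the `file` source's multiline settings (`start_pattern`, `mode`, `condition_pattern`, `timeout_ms`); treat those names and the example patterns as assumptions and check them against the reference for your Vector version.

```toml
# Fragment only: merges indented continuation lines (for example, stack
# traces) into the event started by the preceding non-indented line.
[sources.s3]
type = "aws_s3"
region = "us-east-1"

# Assumed sub-option names, borrowed from the `file` source.
multiline.start_pattern = '^[^\s]'      # a new event starts at a non-indented line
multiline.mode = "continue_through"     # keep appending while the condition matches
multiline.condition_pattern = '^[\s]+'  # continuation lines begin with whitespace
multiline.timeout_ms = 1000             # flush a pending event after one second
```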
proxy(optional)
Configures an HTTP(S) proxy for Vector to use. By default, the globally configured proxy is used.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
sqs(optional)
SQS strategy options. Required if `strategy=sqs`.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
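Taken together, the options above might be combined as in the following minimal sketch. The source name `s3` matches the `inputs = ["s3"]` used by the transform examples on this page. The `sqs.queue_url` key and its value are assumptions, since the sub-options of `sqs` are not listed in this section; verify them against the reference for your Vector version.

```toml
[sources.s3]
type = "aws_s3"        # required: selects this component
region = "us-east-1"   # required: region of the target service
strategy = "sqs"       # optional: "sqs" is the default

# Assumed sub-option name and placeholder queue URL (not documented above).
sqs.queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-bucket-events"
```

The queue referenced here is the one that receives the S3 bucket notifications described at the top of this page.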
How it Works
AWS authentication
Vector checks for AWS credentials in the following order:
- The `access_key_id` and `secret_access_key` options.
- The `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.
- The `credential_process` command in the AWS config file (usually located at `~/.aws/config`).
- The AWS credentials file (usually located at `~/.aws/credentials`).
- The IAM instance profile (only works if running on an EC2 instance with an instance profile/role).
If no credentials are found, Vector's health check fails and an error is logged. If your AWS credentials expire, Vector will automatically search for up-to-date credentials in the places (and order) described above.
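The first step in that order, explicit options, could look like the sketch below. It assumes `access_key_id` and `secret_access_key` are set under the `auth` option described earlier and that Vector's `${VAR}` environment variable interpolation is available. The variable names are hypothetical and were chosen to differ from the standard `AWS_*` variables, which Vector would otherwise pick up on its own in the second step.

```toml
# Fragment only: other required settings (such as the SQS options) are omitted.
[sources.s3]
type = "aws_s3"
region = "us-east-1"

# Explicit credentials are checked before any other credential source.
# Hypothetical variable names; keep secrets out of the config file itself.
auth.access_key_id = "${VECTOR_S3_ACCESS_KEY_ID}"
auth.secret_access_key = "${VECTOR_S3_SECRET_ACCESS_KEY}"
```

If these keys are left out, Vector falls through to the remaining credential sources in the order listed.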
Handling events from the `aws_s3` source
This source behaves very similarly to the `file` source in that it outputs one event per line (unless the `multiline` configuration option is used).
You will commonly want to use transforms to parse the data. For example, to parse VPC flow logs sent to S3 you can chain the `tokenizer` transform:
[transforms.flow_logs]
type = "tokenizer" # required
inputs = ["s3"]
field_names = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]
types.srcport = "int"
types.dstport = "int"
types.packets = "int"
types.bytes = "int"
types.start = "timestamp|%s"
types.end = "timestamp|%s"
To parse AWS load balancer logs, the `regex_parser` transform can be used:
[transforms.elasticloadbalancing_fields_parsed]
type = "regex_parser"
inputs = ["s3"]
regex = '''(?x)^
(?P<type>[\w]+)[ ]
(?P<timestamp>[\w:.-]+)[ ]
(?P<elb>[^\s]+)[ ]
(?P<client_host>[\d.:-]+)[ ]
(?P<target_host>[\d.:-]+)[ ]
(?P<request_processing_time>[\d.-]+)[ ]
(?P<target_processing_time>[\d.-]+)[ ]
(?P<response_processing_time>[\d.-]+)[ ]
(?P<elb_status_code>[\d-]+)[ ]
(?P<target_status_code>[\d-]+)[ ]
(?P<received_bytes>[\d-]+)[ ]
(?P<sent_bytes>[\d-]+)[ ]
"(?P<request_method>[\w-]+)[ ]
(?P<request_url>[^\s]+)[ ]
(?P<request_protocol>[^"\s]+)"[ ]
"(?P<user_agent>[^"]+)"[ ]
(?P<ssl_cipher>[^\s]+)[ ]
(?P<ssl_protocol>[^\s]+)[ ]
(?P<target_group_arn>[\w.:/-]+)[ ]
"(?P<trace_id>[^\s"]+)"[ ]
"(?P<domain_name>[^\s"]+)"[ ]
"(?P<chosen_cert_arn>[\w:./-]+)"[ ]
(?P<matched_rule_priority>[\d-]+)[ ]
(?P<request_creation_time>[\w.:-]+)[ ]
"(?P<actions_executed>[\w,-]+)"[ ]
"(?P<redirect_url>[^"]+)"[ ]
"(?P<error_reason>[^"]+)"'''
field = "message"
drop_failed = false
types.received_bytes = "int"
types.request_processing_time = "float"
types.sent_bytes = "int"
types.target_processing_time = "float"
types.response_processing_time = "float"
[transforms.elasticloadbalancing_url_parsed]
type = "regex_parser"
inputs = ["elasticloadbalancing_fields_parsed"]
regex = '^(?P<url_scheme>[\w]+)://(?P<url_hostname>[^\s:/?#]+)(?::(?P<request_port>[\d-]+))?-?(?:/(?P<url_path>[^\s?#]*))?(?P<request_url_query>\?[^\s#]+)?'
field = "request_url"
drop_failed = false
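While developing a pipeline like the one above, it can help to print the parsed events and check that the named capture groups become fields. The sketch below assumes a `console` sink with JSON encoding is available in your Vector version; the sink name `debug` is a hypothetical placeholder.

```toml
# Print the output of the final transform to stdout for inspection.
[sinks.debug]
type = "console"
inputs = ["elasticloadbalancing_url_parsed"]
encoding.codec = "json"
```

Because both transforms set `drop_failed = false`, lines that do not match the regex still reach the sink, without the parsed fields.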
State
This component is stateless, meaning its behavior is consistent across each input.
Context
By default, the `aws_s3` source augments events with helpful context keys.