Elasticsearch
Elasticsearch's Data streams feature requires Vector to be configured with the `create` `bulk_action`. This is not enabled by default.
Configuration Options
Required Options
endpoint(required)
The Elasticsearch endpoint to send logs to. This should be the full URL as shown in the example.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["http://10.24.32.122:9000","https://example.com","https://user:password@example.com"] |
inputs(required)
A list of upstream source or transform IDs. Wildcards (`*`) are supported.
See configuration for more info.
Type | Syntax | Default | Example |
---|---|---|---|
array | literal | | ["my-source-or-transform-id","prefix-*"] |
encoding(required)
Configures the encoding-specific sink behavior.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
type(required)
The component type. This is a required field for all components and tells Vector which component to use.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["elasticsearch"] |
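Taken together, the required options might be combined as in the minimal sketch below. The sink name `my-sink` is a placeholder, the values are copied from the example columns above, and the `encoding` table is left out because its sub-options are not listed in this section.

```toml
# Minimal sketch of an Elasticsearch sink; every value here is illustrative.
[sinks.my-sink]
type = "elasticsearch"                  # tells Vector which component to use
inputs = ["my-source-or-transform-id"]  # upstream source or transform IDs
endpoint = "http://10.24.32.122:9000"   # full URL of the Elasticsearch endpoint
```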
Advanced Options
auth(optional)
Options for the authentication strategy.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
aws(optional)
Options for the AWS connections.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
bulk_action(optional)
Action to use when making requests to the Elasticsearch Bulk API. Currently, Vector only supports `index` and `create`. `update` and `delete` actions are not supported.
Type | Syntax | Default | Example |
---|---|---|---|
string | template | index | ["index","create","{{ action }}"] |
data_stream(optional)
Options for the data stream mode.
Type | Syntax | Default | Example |
---|---|---|---|
hash | template | | [] |
doc_type(optional)
The `doc_type` for your index data. This is only relevant for Elasticsearch <= 6.X. If you are using >= 7.0 you do not need to set this option since Elasticsearch has removed it.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | _doc | |
id_key(optional)
The name of the event key that should map to Elasticsearch's `_id` field. By default, Vector does not set the `_id` field, which allows Elasticsearch to set this automatically. You should think carefully about setting your own Elasticsearch IDs, since this can hinder performance.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["id","_id"] |
index(optional)
Index name to write events to.
Type | Syntax | Default | Example |
---|---|---|---|
string | template | vector-%F | ["application-{{ application_id }}-%Y-%m-%d","vector-%Y-%m-%d"] |
metrics(optional)
Options for metrics.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
mode(optional)
The type of index mechanism. If `data_stream` mode is enabled, the `bulk_action` is set to `create`.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | normal | ["normal","data_stream"] |
pipeline(optional)
Name of the pipeline to apply.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["pipeline-name"] |
buffer(optional)
Configures the sink-specific buffer behavior.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
batch(optional)
Configures the sink batching behavior.
Type | Syntax | Default | Example |
---|---|---|---|
hash | | | [] |
compression(optional)
The compression strategy used to compress the encoded event data before transmission.
Some cloud storage API clients and browsers will handle decompression transparently, so files may not always appear to be compressed depending on how they are accessed.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | none | |
healthcheck(optional)
Health check options for the sink.
Type | Syntax | Default | Example |
---|---|---|---|
hash | | | [] |
request(optional)
Configures the sink request behavior.
Type | Syntax | Default | Example |
---|---|---|---|
hash | | | [] |
tls(optional)
Configures the TLS options for outgoing connections.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
proxy(optional)
Configures an HTTP(S) proxy for Vector to use. By default, the globally configured proxy is used.
Type | Syntax | Default | Example |
---|---|---|---|
hash | literal | | [] |
query(optional)
Custom parameters to add to the Elasticsearch query string.
Type | Syntax | Default | Example |
---|---|---|---|
hash | | | [{"X-Powered-By":"Vector"}] |
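As a rough illustration, the example value above could be written as a TOML table like this. The sink name `my-sink` is a placeholder and the remaining sink options are omitted.

```toml
# Illustrative fragment: each key/value pair is a custom query string parameter.
[sinks.my-sink.query]
X-Powered-By = "Vector"
```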
How it Works
Conflicts
Vector batches data and flushes it to Elasticsearch's `_bulk` API endpoint. By default, all events are inserted via the `index` action, which replaces documents if an existing one has the same `id`. If `bulk_action` is configured with `create`, Elasticsearch does not replace an existing document and instead returns a conflict error.
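A minimal sketch of a configuration that opts into this conflict behavior is shown below. It assumes events carry an `id` field that should become the document `_id`; the sink name, input ID, and endpoint are placeholders taken from the examples in the options tables.

```toml
# Sketch only: values are illustrative, not recommendations.
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "http://10.24.32.122:9000"
id_key = "id"           # event key mapped to Elasticsearch's _id (assumed to exist on events)
bulk_action = "create"  # an existing document with the same _id yields a conflict error
```

With the default `index` action, the same duplicate would instead replace the existing document.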
Data streams
By default, Vector uses the `index` action with Elasticsearch's Bulk API. To use Data streams, set the `mode` to `data_stream`. Use the combination of `data_stream.type`, `data_stream.dataset` and `data_stream.namespace` instead of `index`.
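The data stream settings described here might look like the following sketch. The sink name, input ID, and endpoint are placeholders, and the `logs`, `nginx`, and `production` values are assumptions chosen purely for illustration.

```toml
# Sketch only: values are illustrative.
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "http://10.24.32.122:9000"
mode = "data_stream"  # per the mode option, bulk_action is then set to create

# These three options are used in place of index.
[sinks.my-sink.data_stream]
type = "logs"             # assumed value
dataset = "nginx"         # assumed value
namespace = "production"  # assumed value
```

Under Elasticsearch's usual `type-dataset-namespace` naming scheme, these values would correspond to a data stream named `logs-nginx-production`.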
Partial Failures
By default, Elasticsearch allows partial bulk ingestion failures. This is typically due to Elasticsearch index mapping errors, where data keys aren't consistently typed. To change this behavior, refer to the Elasticsearch `ignore_malformed` setting.
State
This component is stateless, meaning its behavior is consistent across each input.
Health checks
Health checks ensure that the downstream service is accessible and ready to accept data. This check is performed upon sink initialization. If the health check fails, an error is logged and Vector proceeds to start.
Partitioning
Vector supports dynamic configuration values through a simple template syntax. If an option supports templating, it will be noted with a badge and you can use event fields to create dynamic values. For example:
[sinks.my-sink]
dynamic_option = "application={{ application_id }}"
In the above example, the `application_id` for each event will be used to partition outgoing data.
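For this sink, the `index` option is one of the options marked as accepting templates. The sketch below reuses the example value from the `index` table; the sink name, input ID, and endpoint are placeholders.

```toml
# Sketch only: partitions events into one index per application and day.
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "http://10.24.32.122:9000"
# {{ application_id }} is read from each event; %Y-%m-%d are strftime specifiers.
index = "application-{{ application_id }}-%Y-%m-%d"
```

Assuming an event whose `application_id` is `checkout` and whose timestamp falls on 1 August 2021, this template would resolve to an index named `application-checkout-2021-08-01`.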
Rate limits & adaptive concurrency
Transport Layer Security (TLS)
Buffers and batches
This component buffers and batches data. Rather than treating buffers and batches as global concepts, Vector treats them as sink-specific concepts. This isolates sinks, ensuring service disruptions are contained and delivery guarantees are honored.
Batches are flushed when 1 of 2 conditions are met:
- The batch age meets or exceeds the configured `timeout_secs`.
- The batch size meets or exceeds the configured `max_size` or `max_events`.
Buffers are controlled via the `buffer.*` options.
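The two flush conditions above could be tuned with a fragment along these lines. It assumes that `timeout_secs` and `max_events` live under the sink's `batch` table, which this section implies but does not state outright; the sink name and numeric values are placeholders.

```toml
# Sketch only: a batch is flushed when either condition is met.
[sinks.my-sink.batch]
timeout_secs = 1   # flush once the batch age reaches this many seconds
max_events = 1000  # flush once the batch holds this many events
```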
AWS authentication
Vector checks for AWS credentials in the following order:
- The `access_key_id` and `secret_access_key` options.
- The `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.
- The `credential_process` command in the AWS config file (usually located at `~/.aws/config`).
- The AWS credentials file (usually located at `~/.aws/credentials`).
- The IAM instance profile (only works if running on an EC2 instance with an instance profile/role).
If no credentials are found, Vector's health check fails and an error is logged. If your AWS credentials expire, Vector will automatically search for up-to-date credentials in the places (and order) described above.
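The first entry in that lookup order might be configured roughly as follows. This is a sketch under assumptions: placing `access_key_id` and `secret_access_key` under the `auth` table, and selecting AWS with `strategy = "aws"`, are not spelled out in this section and should be checked against the `auth` option reference. The key values are placeholders.

```toml
# Sketch only: option placement under auth is an assumption, values are placeholders.
[sinks.my-sink.auth]
strategy = "aws"                            # assumed option for selecting AWS authentication
access_key_id = "<your-access-key-id>"      # checked first in the credential lookup order
secret_access_key = "<your-secret-access-key>"
```

Leaving both key options unset would let Vector fall through to the environment variables, config file, credentials file, and instance profile, in that order.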
Retry policy
Vector will retry failed requests (status == 429, >= 500, and != 501). Other responses will not be retried. You can control the number of retry attempts and backoff rate with the `request.retry_attempts` and `request.retry_backoff_secs` options.
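The two retry options named above could be set with a fragment like the one below. The sink name and the numeric values are placeholders, not recommended settings.

```toml
# Sketch only: applies to retryable responses (429, or >= 500 except 501).
[sinks.my-sink.request]
retry_attempts = 5      # number of retry attempts for a failed request
retry_backoff_secs = 2  # controls the backoff rate between attempts
```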