Elasticsearch

Requirements

Elasticsearch's data streams feature requires Vector to be configured with the create bulk_action. This is not the default; bulk_action defaults to index.

Configuration Options

Required Options

endpoint(required)

The Elasticsearch endpoint to send logs to. This should be the full URL as shown in the example.

Type	Syntax	Default	Example
string	literal		["http://10.24.32.122:9000","https://example.com","https://user:password@example.com"]

inputs(required)

A list of upstream source or transform IDs. Wildcards (*) are supported.

See configuration for more info.

Type	Syntax	Default	Example
array	literal		["my-source-or-transform-id","prefix-*"]

encoding(required)

Configures the encoding-specific sink behavior.

Type	Syntax	Default	Example
hash	literal		[]

type(required)

The component type. This is a required field for all components and tells Vector which component to use.

Type	Syntax	Default	Example
string	literal		["elasticsearch"]
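
As a rough sketch, the required options above could be combined as follows. The sink ID my-sink and the input ID are placeholders, and the endpoint is one of the example values from the table. The encoding option is left out because its sub-options are not listed in this section.

```toml
# Minimal sketch of an Elasticsearch sink; the IDs are placeholders.
[sinks.my-sink]
type = "elasticsearch"                  # tells Vector which component to use
inputs = ["my-source-or-transform-id"]  # upstream source or transform IDs; wildcards are supported
endpoint = "https://example.com"        # full URL of the Elasticsearch endpoint
# encoding is marked as required above, but its sub-options are not covered here.
```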

Advanced Options

auth(optional)

Options for the authentication strategy.

Type	Syntax	Default	Example
hash	literal		[]

aws(optional)

Options for the AWS connections.

Type	Syntax	Default	Example
hash	literal		[]

bulk_action(optional)

Action to use when making requests to the Elasticsearch Bulk API. Currently, Vector only supports index and create. update and delete actions are not supported.

Type	Syntax	Default	Example
string	template	index	["index","create","{{ action }}"]

data_stream(optional)

Options for the data stream mode.

Type	Syntax	Default	Example
hash	template		[]

doc_type(optional)

The doc_type for your index data. This is only relevant for Elasticsearch <= 6.X. If you are using >= 7.0 you do not need to set this option since Elasticsearch has removed it.

Type	Syntax	Default	Example
string	literal	_doc

id_key(optional)

The name of the event key that should map to Elasticsearch's _id field. By default, Vector does not set the _id field, which allows Elasticsearch to set it automatically. Think carefully before setting your own Elasticsearch IDs, since this can hinder performance.

Type	Syntax	Default	Example
string	literal		["id","_id"]

index(optional)

Index name to write events to.

Type	Syntax	Default	Example
string	template	vector-%F	["application-{{ application_id }}-%Y-%m-%d","vector-%Y-%m-%d"]

metrics(optional)

Options for metrics.

Type	Syntax	Default	Example
hash	literal		[]

mode(optional)

The type of index mechanism. If data_stream mode is enabled, the bulk_action is set to create.

Type	Syntax	Default	Example
string	literal	normal	["normal","data_stream"]

pipeline(optional)

Name of the pipeline to apply.

Type	Syntax	Default	Example
string	literal		["pipeline-name"]

buffer(optional)

Configures the sink-specific buffer behavior.

Type	Syntax	Default	Example
hash	literal		[]

batch(optional)

Configures the sink batching behavior.

Type	Syntax	Default	Example
hash			[]

compression(optional)

The compression strategy used to compress the encoded event data before transmission.

Some cloud storage API clients and browsers handle decompression transparently, so files may not always appear to be compressed, depending on how they are accessed.

Type	Syntax	Default	Example
string	literal	none

healthcheck(optional)

Health check options for the sink.

Type	Syntax	Default	Example
hash			[]

request(optional)

Configures the sink request behavior.

Type	Syntax	Default	Example
hash			[]

tls(optional)

Configures the TLS options for outgoing connections.

Type	Syntax	Default	Example
hash	literal		[]

proxy(optional)

Configures an HTTP(S) proxy for Vector to use. By default, the globally configured proxy is used.

Type	Syntax	Default	Example
hash	literal		[]

query(optional)

Custom parameters to add to the Elasticsearch query string.

Type	Syntax	Default	Example
hash			[{"X-Powered-By":"Vector"}]

How it Works

Conflicts

Vector batches data and flushes it to Elasticsearch's _bulk API endpoint. By default, all events are inserted via the index action, which replaces documents if an existing one has the same id. If bulk_action is configured with create, Elasticsearch does not replace an existing document and instead returns a conflict error.
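
A minimal sketch of a sink that refuses to overwrite existing documents, assuming each event carries a unique value under a key named id (a hypothetical field name; the sink and input IDs are placeholders as well):

```toml
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "https://example.com"
bulk_action = "create"  # an existing document with the same _id is kept; Elasticsearch returns a conflict error
id_key = "id"           # event key mapped to Elasticsearch's _id field (assumed to exist on every event)
```

With the default index action, the same configuration would instead replace any existing document that has the same _id.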

Data streams

By default, Vector uses the index action with Elasticsearch's Bulk API. To use Data streams, set the mode to data_stream. Use the combination of data_stream.type, data_stream.dataset and data_stream.namespace instead of index.
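
The following sketch shows one way this might look. The type, dataset, and namespace values are purely illustrative, and the sink and input IDs are placeholders.

```toml
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "https://example.com"
mode = "data_stream"  # switches from the normal index mechanism; bulk_action is set to create

# These three options take the place of the index option.
[sinks.my-sink.data_stream]
type = "logs"              # illustrative value
dataset = "nginx"          # illustrative value
namespace = "production"   # illustrative value
```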

Partial Failures

By default, Elasticsearch allows partial bulk ingestion failures. These failures are typically caused by Elasticsearch index mapping errors, where data keys aren't consistently typed. To change this behavior, refer to the Elasticsearch ignore_malformed setting.

State

This component is stateless, meaning its behavior is consistent across each input.

Health checks

Health checks ensure that the downstream service is accessible and ready to accept data. This check is performed upon sink initialization. If the health check fails, an error is logged and Vector still proceeds to start.

Partitioning

Vector supports dynamic configuration values through a simple template syntax. If an option supports templating, it will be noted with a badge and you can use event fields to create dynamic values. For example:

[sinks.my-sink]
dynamic_option = "application={{ application_id }}"

In the above example, the application_id for each event will be used to partition outgoing data.
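
For this sink specifically, the index option is marked as a template, so the same idea can be sketched with the example value from the options table. The sink and input IDs are placeholders, and application_id is assumed to be present on every event.

```toml
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "https://example.com"
# {{ application_id }} is filled from each event; %Y-%m-%d is a date specifier.
index = "application-{{ application_id }}-%Y-%m-%d"
```

Under these assumptions, an event whose application_id is billing and whose timestamp falls on 2024-01-15 would be written to the index application-billing-2024-01-15.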

Rate limits & adaptive concurrency

Transport Layer Security (TLS)

Vector uses OpenSSL for TLS protocols due to OpenSSL's maturity. You can enable and adjust TLS behavior using the tls.* options.
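
As a hedged illustration, a tls table might look like the sketch below. The option names ca_file and verify_certificate are not listed in this section and are assumed from Vector's general tls.* options, so check them against the reference for your Vector version. The file path is hypothetical.

```toml
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "https://example.com"

[sinks.my-sink.tls]
ca_file = "/path/to/ca.pem"  # hypothetical path to a CA certificate used to verify the server
verify_certificate = true    # assumed option name: reject servers whose certificate fails verification
```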

Buffers and batches

This component buffers & batches data as shown in the diagram above. Rather than treating buffers and batches as global concepts, Vector treats them as sink-specific concepts. This isolates sinks, ensuring service disruptions are contained and delivery guarantees are honored.

Batches are flushed when either of two conditions is met:

The batch age meets or exceeds the configured timeout_secs.
The batch size meets or exceeds the configured max_size or max_events.

Buffers are controlled via the buffer.* options.
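
The two flush conditions can be sketched as a batch table, using the option names given above. The values are illustrative rather than recommended settings, and the sink and input IDs are placeholders.

```toml
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "https://example.com"

[sinks.my-sink.batch]
max_events = 1000  # flush once the batch holds this many events...
timeout_secs = 5   # ...or once the batch is this many seconds old, whichever comes first
```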

AWS authentication

Vector checks for AWS credentials in the following order:

The access_key_id and secret_access_key options.
The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
The credential_process command in the AWS config file (usually located at ~/.aws/config).
The AWS credentials file (usually located at ~/.aws/credentials).
The IAM instance profile (only works if running on an EC2 instance with an instance profile/role).

If no credentials are found, Vector's health check fails and an error is logged. If your AWS credentials expire, Vector will automatically search for up-to-date credentials in the places (and order) described above.
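
A minimal sketch of a sink that relies on this lookup order rather than on explicit keys. The strategy and region option names are not listed in this section and are assumptions based on Vector's auth and aws option groups. The region value is illustrative.

```toml
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "https://example.com"

[sinks.my-sink.auth]
strategy = "aws"  # assumed option name: use AWS authentication
# access_key_id and secret_access_key are not set here, so Vector falls
# through to the environment variables, the AWS config and credentials
# files, and finally the IAM instance profile.

[sinks.my-sink.aws]
region = "us-east-1"  # assumed option name; illustrative value
```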

Retry policy

Vector retries failed requests whose response status is 429, or is 500 or above and not 501. Other responses are not retried. You can control the number of retry attempts and the backoff rate with the request.retry_attempts and request.retry_backoff_secs options.
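
A short sketch of tuning these two options, using the names given above. The values are illustrative only, and the sink and input IDs are placeholders.

```toml
[sinks.my-sink]
type = "elasticsearch"
inputs = ["my-source-or-transform-id"]
endpoint = "https://example.com"

[sinks.my-sink.request]
retry_attempts = 5      # give up on a request after this many retries
retry_backoff_secs = 2  # controls the backoff between retry attempts
```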