Azure Blob Storage

Configuration Options

Required Options

connection_string (required)

The Azure Blob Storage Account connection string. Only authentication with an access key is supported.

Type: string
Syntax: literal
Example: ["DefaultEndpointsProtocol=https;AccountName=mylogstorage;AccountKey=storageaccountkeybase64encoded;EndpointSuffix=core.windows.net"]

container_name (required)

The Azure Blob Storage Account container name.

Type: string
Syntax: literal
Example: ["my-logs"]

inputs (required)

A list of upstream source or transform IDs. Wildcards (*) are supported.

See configuration for more info.

Type: array
Syntax: literal
Example: ["my-source-or-transform-id","prefix-*"]

encoding (required)

Configures the encoding-specific sink behavior.

Type: hash
Syntax: literal

type (required)

The component type. This is a required field for all components and tells Vector which component to use.

Type: string
Syntax: literal
Example: ["azure_blob"]
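
The required options above could be combined into a minimal sink definition along the following lines. This is a sketch rather than a verified configuration: the sink ID `my_azure_sink` and the `encoding.codec` key and value are assumptions made for illustration, and the connection string is the placeholder from the example above.

```toml
# Minimal sketch of an azure_blob sink using only the required options.
# The sink ID "my_azure_sink" is a hypothetical name chosen for this example.
[sinks.my_azure_sink]
type = "azure_blob"
inputs = ["my-source-or-transform-id"]
connection_string = "DefaultEndpointsProtocol=https;AccountName=mylogstorage;AccountKey=storageaccountkeybase64encoded;EndpointSuffix=core.windows.net"
container_name = "my-logs"

# Assumption: the codec key and its value are not listed on this page;
# check which codecs your Vector version accepts.
encoding.codec = "ndjson"
```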

Advanced Options

blob_prefix (optional)

A prefix to apply to all object key names. Use it to partition your objects, and end the value with a / if you want it to act as the root Azure Storage "folder".

Type: string
Syntax: template
Default: blob/%F/
Example: ["date/%F/","date/%F/hour/%H/","year=%Y/month=%m/day=%d/","kubernetes/{{ metadata.cluster }}/{{ metadata.application_name }}/"]

blob_append_uuid (optional)

Whether or not to append a UUID v4 token to the end of the file name. This ensures there are no name collisions in high-volume use cases.

Type: bool

buffer (optional)

Configures the sink-specific buffer behavior.

Type: hash
Syntax: literal

batch (optional)

Configures the sink batching behavior.

Type: hash

compression (optional)

The compression strategy used to compress the encoded event data before transmission.

Some cloud storage API clients and browsers handle decompression transparently, so files may not always appear to be compressed, depending on how they are accessed.

Type: string
Syntax: literal
Default: gzip

healthcheck (optional)

Health check options for the sink.

Type: hash

request (optional)

Configures the sink request behavior.

Type: hash

blob_time_format (optional)

The format of the resulting object file name. strftime specifiers are supported.

Type: string
Syntax: strftime
Default: %s

How it Works

Object naming

By default, Vector names your blobs differently depending on whether or not they are compressed.

Here is the format without compression:

<key_prefix><timestamp>-<uuidv4>.log

Here's an example blob name without compression:

blob/2021-06-23/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log

And here is the format with compression:

<key_prefix><timestamp>-<uuidv4>.log.gz

An example blob name with compression:

blob/2021-06-23/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log.gz

Vector appends a UUIDv4 token to ensure there are no name conflicts in the unlikely event that two Vector instances write data at the same time.

You can control the resulting name via the blob_prefix, blob_time_format, and blob_append_uuid options.
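
As a rough illustration of how these three options interact, the fragment below sets each of them explicitly. It is a sketch under stated assumptions: the sink ID `my_azure_sink` is hypothetical, the required options are omitted for brevity, and the values are chosen for illustration only.

```toml
# Fragment of a sink definition; merge these keys into your own sink.
# The sink ID "my_azure_sink" is a hypothetical name.
[sinks.my_azure_sink]
blob_prefix = "date/%F/"   # the trailing "/" makes the date act as a "folder"
blob_time_format = "%s"    # strftime specifier for the timestamp part of the name
blob_append_uuid = true    # append a UUID v4 token to avoid name collisions
compression = "gzip"       # compressed blobs use the ".log.gz" suffix
```

Following the naming format described above, this should yield blob names of the form date/<date>/<timestamp>-<uuidv4>.log.gz.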

State

This component is stateless, meaning its behavior is consistent across each input.

Health checks

Health checks ensure that the downstream service is accessible and ready to accept data. This check is performed upon sink initialization. If the health check fails, an error is logged and Vector proceeds to start.
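
If the startup check is not wanted, it can presumably be switched off through the healthcheck hash. The fragment below is a sketch only: this page lists healthcheck as a hash without naming its keys, so the `enabled` key is an assumption, and the sink ID `my_azure_sink` is hypothetical.

```toml
# Fragment of a sink definition; merge into your own sink.
# Assumption: the healthcheck hash accepts an "enabled" boolean.
[sinks.my_azure_sink]
healthcheck.enabled = false
```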

Partitioning

Vector supports dynamic configuration values through a simple template syntax. If an option supports templating, it will be noted with a badge and you can use event fields to create dynamic values. For example:

[sinks.my-sink]
dynamic_option = "application={{ application_id }}"

In the above example, the application_id for each event will be used to partition outgoing data.
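
For this sink, blob_prefix is the option listed above with template syntax, so the same idea might look like the following fragment. It is a sketch: the sink ID `my_azure_sink` is hypothetical, and `application_id` is assumed to be a field present on your events.

```toml
# Fragment of a sink definition; merge into your own sink.
[sinks.my_azure_sink]
# Each distinct application_id value gets its own "folder" of blobs.
# Assumption: every event carries an application_id field.
blob_prefix = "application={{ application_id }}/"
```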

Rate limits & adaptive concurrency

Buffers and batches

This component buffers and batches data as shown in the diagram above. Rather than treating buffers and batches as global concepts, Vector treats them as sink-specific concepts. This isolates sinks, ensuring that service disruptions are contained and delivery guarantees are honored.

Batches are flushed when either of two conditions is met:

  1. The batch age meets or exceeds the configured timeout_secs.
  2. The batch size meets or exceeds the configured max_size or max_events.

Buffers are controlled via the buffer.* options.
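
The flush conditions and buffer settings above might be configured roughly as follows. This is a sketch with illustrative values, not defaults: the sink ID `my_azure_sink` is hypothetical, and the three keys under buffer are assumptions, since this page only refers to them as buffer.*.

```toml
# Fragments of a sink definition; merge into your own sink.
[sinks.my_azure_sink.batch]
timeout_secs = 300   # flush once the batch is at least this old
max_events = 1000    # or once it holds at least this many events
# max_size is the other size limit named above; it is left unset here.

# Assumption: key names below are not listed on this page.
[sinks.my_azure_sink.buffer]
type = "memory"
max_events = 500
when_full = "block"
```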

Retry policy

Vector will retry failed requests (status == 429, >= 500, and != 501). Other responses will not be retried. You can control the number of retry attempts and backoff rate with the request.retry_attempts and request.retry_backoff_secs options.
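
The two retry options named above could be set as in the fragment below. It is a sketch: the sink ID `my_azure_sink` is hypothetical and the values are illustrative rather than recommended or default settings.

```toml
# Fragment of a sink definition; merge into your own sink.
[sinks.my_azure_sink.request]
retry_attempts = 5       # give up after this many retries
retry_backoff_secs = 2   # controls the backoff rate between attempts
```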