GCP Cloud Storage (GCS)

Configuration Options

Required Options

bucket (required)

The GCS bucket name.

Type: string
Syntax: literal
Example: ["my-bucket"]

inputs (required)

A list of upstream source or transform IDs. Wildcards (*) are supported.

See configuration for more info.

Type: array
Syntax: literal
Example: ["my-source-or-transform-id","prefix-*"]

encoding (required)

Configures the encoding-specific sink behavior.

Type: hash
Syntax: literal
Example: []

type (required)

The component type. This is a required field for all components and tells Vector which component to use.

Type: string
Syntax: literal
Example: ["gcp_cloud_storage"]
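
As a minimal sketch, the four required options could be combined as shown here. The sink ID my-sink, the upstream ID, and the bucket name are placeholders, and the codec field under encoding is an assumption, since this page does not list the fields of the encoding hash.

```toml
# Hypothetical sink ID; replace "my-sink" with your own.
[sinks.my-sink]
type = "gcp_cloud_storage"               # tells Vector which component to use
inputs = ["my-source-or-transform-id"]   # upstream IDs; wildcards such as "prefix-*" are supported
bucket = "my-bucket"                     # placeholder bucket name

# Assumption: the encoding hash takes a codec field. Check the encoding
# reference for the values your Vector version accepts.
encoding.codec = "ndjson"
```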

Advanced Options

acl (optional)

Predefined ACL to apply to the created objects. For more information, see Predefined ACLs. If this is not set, GCS will apply a default ACL when the object is created.

Type: string
Syntax: literal

credentials_path (optional)

The filename for a Google Cloud service account credentials JSON file used to authenticate access to the Cloud Storage API. If this is unset, Vector checks the GOOGLE_APPLICATION_CREDENTIALS environment variable for a filename.

If no filename is provided by either method, Vector attempts to fetch the instance service account of the compute instance it is running on. If Vector is not running on a GCE instance, you must supply a credentials file as described above.

Type: string
Syntax: literal
Example: ["/path/to/credentials.json"]

filename_append_uuid (optional)

Whether to append a UUID v4 token to the end of the file name. This ensures there are no name collisions in high-volume use cases.

Type: bool

filename_extension (optional)

The filename extension to use in the object name.

Type: string
Syntax: literal
Default: log

filename_time_format (optional)

The format of the resulting object file name. strftime specifiers are supported.

Type: string
Syntax: literal
Default: %s

key_prefix (optional)

A prefix to apply to all object key names. This should be used to partition your objects, and it's important to end this value with a / if you want this to be the root GCS "folder".

Type: string
Syntax: template
Default: date=%F/
Example: ["date=%F/","date=%F/hour=%H/","year=%Y/month=%m/day=%d/","application_id={{ application_id }}/date=%F/"]

metadata (optional)

The set of metadata key:value pairs for the created objects. See the GCS custom metadata documentation for more details.

Type: string
Syntax: literal
Example: []

buffer (optional)

Configures the sink-specific buffer behavior.

Type: hash
Syntax: literal
Example: []

batch (optional)

Configures the sink batching behavior.

Type: hash
Example: []

compression (optional)

The compression strategy used to compress the encoded event data before transmission.

Some cloud storage API clients and browsers handle decompression transparently, so files may not always appear to be compressed, depending on how they are accessed.

Type: string
Syntax: literal
Default: none

healthcheck (optional)

Health check options for the sink.

Type: hash
Example: []

request (optional)

Configures the sink request behavior.

Type: hash
Example: []

tls (optional)

Configures the TLS options for outgoing connections.

Type: hash
Syntax: literal
Example: []

proxy (optional)

Configures an HTTP(S) proxy for Vector to use. By default, the globally configured proxy is used.

Type: hash
Syntax: literal
Example: []

storage_class (optional)

The storage class for the created objects. See the GCP storage classes for more details.

Type: string
Syntax: literal

How it Works

Object access control list (ACL)

GCP Cloud Storage supports access control lists (ACLs) for buckets and objects. In the context of Vector, only object ACLs are relevant (Vector does not create or modify buckets). You can set the object-level ACL with the acl option, which applies one of the predefined ACLs to each created object.
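
A short sketch of setting the acl option. It assumes that project-private, one of the predefined ACL names in GCS, is a value your Vector version accepts; the sink ID is a placeholder.

```toml
[sinks.my-sink]
# Required options (type, inputs, bucket, encoding) omitted for brevity.

# Assumption: "project-private" is an accepted predefined ACL value.
# If acl is left unset, GCS applies its default ACL to each new object.
acl = "project-private"
```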

Object naming

By default, Vector names your GCS objects in accordance with one of two formats.

If compression is enabled, this format is used:

<key_prefix><timestamp>-<uuidv4>.log.gz

Here's an example name in the compression-enabled format:

date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log.gz

If compression is not enabled, this format is used:

<key_prefix><timestamp>-<uuidv4>.log

Here's an example name in the compression-disabled format:

date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log

Vector appends a UUID v4 token so that names do not conflict in the unlikely event that two Vector instances write data at the same time.

You can control the resulting name via the key_prefix, filename_time_format, filename_extension, and filename_append_uuid options.
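
The naming options can be sketched together as follows. The sink ID is a placeholder, and the value gzip for compression is an assumption, since this page only lists none as the default.

```toml
[sinks.my-sink]
# Required options (type, inputs, bucket, encoding) omitted for brevity.
key_prefix = "date=%F/"        # ends with "/" so it acts as the root GCS "folder"
filename_time_format = "%s"    # strftime specifier: seconds since the Unix epoch
filename_append_uuid = true    # append a UUID v4 token to each object name
compression = "gzip"           # assumption: the value that enables compression

# Following the compression-enabled format described above, an object
# written with these settings would be named along the lines of:
#   date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log.gz
```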

Storage Class

GCS offers several storage classes. You can apply defaults and rules at the bucket level, or set the storage class at the object level. In the context of Vector, only the object level is relevant (Vector does not create or modify buckets). You can set the storage class via the storage_class option.
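
A sketch of setting the storage class per object. It assumes that NEARLINE, one of the GCS storage class names, is a value your Vector version accepts; the sink ID is a placeholder.

```toml
[sinks.my-sink]
# Required options (type, inputs, bucket, encoding) omitted for brevity.

# Assumption: "NEARLINE" is an accepted value. If storage_class is unset,
# the storage class is not set at the object level.
storage_class = "NEARLINE"
```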

State

This component is stateless, meaning its behavior is consistent across each input.

GCP Authentication

GCP offers a variety of authentication methods. Vector is concerned with the server-to-server methods and looks for credentials in the following order:

  1. If the credentials_path option is set.
  2. If the api_key option is set.
  3. If the GOOGLE_APPLICATION_CREDENTIALS environment variable is set.
  4. Finally, Vector will check for an instance service account.

If credentials aren't found, Vector's health checks fail and an error is logged.
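
A sketch of the first method in the list, using the example path from the credentials_path option; the sink ID is a placeholder.

```toml
[sinks.my-sink]
# Required options (type, inputs, bucket, encoding) omitted for brevity.

# Checked first in the lookup order above. Leave this unset to fall back
# to GOOGLE_APPLICATION_CREDENTIALS or to the instance service account.
credentials_path = "/path/to/credentials.json"
```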

Health checks

Health checks ensure that the downstream service is accessible and ready to accept data. This check is performed upon sink initialization. If the health check fails, an error is logged and Vector proceeds to start.
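
If the check is not wanted, it can presumably be turned off through the healthcheck hash. The sketch below assumes that the hash has an enabled field, which this page does not list; the sink ID is a placeholder.

```toml
[sinks.my-sink]
# Required options (type, inputs, bucket, encoding) omitted for brevity.

# Assumption: healthcheck accepts an "enabled" field.
healthcheck.enabled = false
```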

Partitioning

Vector supports dynamic configuration values through a simple template syntax. If an option supports templating, it will be noted with a badge and you can use event fields to create dynamic values. For example:

[sinks.my-sink]
dynamic_option = "application={{ application_id }}"

In the above example, the application_id for each event will be used to partition outgoing data.
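
For this sink, key_prefix is the option marked with the template syntax. A sketch using one of its documented example values, with a placeholder sink ID and assuming events carry an application_id field:

```toml
[sinks.my-sink]
# Required options (type, inputs, bucket, encoding) omitted for brevity.

# Each event's application_id field and the date (%F) select the object
# prefix, so objects are partitioned by application and by day.
key_prefix = "application_id={{ application_id }}/date=%F/"
```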

Rate limits & adaptive concurrency

Transport Layer Security (TLS)

Vector uses OpenSSL for TLS protocols due to OpenSSL's maturity. You can enable and adjust TLS behavior using the tls.* options.
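
A sketch of adjusting TLS behavior. The field names ca_file and verify_certificate are assumptions, since this page does not list the fields under tls, and the certificate path is hypothetical.

```toml
[sinks.my-sink]
# Required options (type, inputs, bucket, encoding) omitted for brevity.

# Assumed field names; confirm them against the tls.* reference.
tls.ca_file = "/path/to/ca.crt"   # hypothetical path to a CA certificate
tls.verify_certificate = true     # keep certificate verification on
```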

Buffers and batches

This component buffers and batches data as shown in the diagram above. Vector treats buffers and batches as sink-specific concepts rather than global ones. This isolates sinks, so that service disruptions are contained and delivery guarantees are honored.

Batches are flushed when either of two conditions is met:

  1. The batch age meets or exceeds the configured timeout_secs.
  2. The batch size meets or exceeds the configured max_size or max_events.

Buffers are controlled via the buffer.* options.
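
The flush conditions and buffer settings can be sketched as follows. The batch field names come from the list above; the buffer field names (type, max_events) and all values are illustrative assumptions, and the sink ID is a placeholder.

```toml
[sinks.my-sink.batch]
timeout_secs = 300     # flush once the batch is this many seconds old
max_size = 10485760    # or once it reaches this size (assumed to be in bytes)

[sinks.my-sink.buffer]
# Assumed field names; confirm them against the buffer.* reference.
type = "memory"
max_events = 500
```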

Tags & Metadata

Vector supports adding custom metadata to created objects. These metadata items are a way of associating extra data items with the object that are not part of the uploaded data.
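
A sketch of attaching custom metadata, assuming the metadata option accepts a table of string keys and string values. The keys and values shown are hypothetical, as is the sink ID.

```toml
# Assumption: metadata is given as key/value string pairs.
[sinks.my-sink.metadata]
team = "platform"
environment = "staging"
```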

Retry policy

Vector retries failed requests whose response status is 429, or is 500 or greater with the exception of 501. Other responses are not retried. You can control the number of retry attempts and the backoff rate with the request.retry_attempts and request.retry_backoff_secs options.
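
A sketch of tuning the two retry options named above. The values are illustrative only, and the sink ID is a placeholder.

```toml
[sinks.my-sink.request]
retry_attempts = 5       # give up after this many retries
retry_backoff_secs = 2   # backoff between retries, in seconds
```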