File
The Vector process must be able to read the files listed in include
and execute (traverse) each of their parent directories. See File
permissions for more details.
Example Configuration
Apache Access Log
Configuration:

```toml
[sources.my_source_id]
type = "file"
include = [ "/var/log/**/*.log" ]
```

Input:

```
"53.126.150.246 - - [01/Oct/2020:11:25:58 -0400] \"GET /disintermediate HTTP/2.0\" 401 20308"
```

Output:

```json
{
  "log": {
    "file": "/var/log/apache/access.log",
    "host": "my-host.local",
    "message": "53.126.150.246 - - [01/Oct/2020:11:25:58 -0400] \"GET /disintermediate HTTP/2.0\" 401 20308",
    "timestamp": "2020-10-10T17:07:36.452332Z"
  }
}
```
Configuration Options
Required Options
include(required)
Array of file patterns to include. Globbing is supported.
| Type | Syntax | Default | Example |
|---|---|---|---|
| array | literal | | ["/var/log/**/*.log"] |
type(required)
The component type. This is a required field for all components and tells Vector which component to use.
| Type | Syntax | Default | Example |
|---|---|---|---|
| string | literal | | ["file"] |
Advanced Options
acknowledgements(optional)
Controls whether the source waits for destination sinks to deliver the events before acknowledging receipt.
| Type | Syntax | Default | Example |
|---|---|---|---|
| bool | | | |
exclude(optional)
Array of file patterns to exclude. Globbing is supported. Takes precedence over the include option.
| Type | Syntax | Default | Example |
|---|---|---|---|
| array | literal | | ["/var/log/binary-file.log"] |
file_key(optional)
The key name added to each event with the full path of the file.
| Type | Syntax | Default | Example |
|---|---|---|---|
| string | literal | file | ["file"] |
fingerprint(optional)
Configuration for how the file source should identify files.
| Type | Syntax | Default | Example |
|---|---|---|---|
| hash | literal | | [] |
glob_minimum_cooldown_ms(optional)
Delay, in milliseconds, between file discovery calls. This controls the interval at which Vector searches for new files. Higher values increase the chance that short-lived files are missed between searches, while lower values increase the performance cost of file discovery.
| Type | Syntax | Default | Example |
|---|---|---|---|
| uint | | 1000 | |
host_key(optional)
The key name added to each event representing the current host. This can also be set globally via the global host_key option.
| Type | Syntax | Default | Example |
|---|---|---|---|
| string | literal | host | |
ignore_not_found(optional)
Ignore missing files when fingerprinting. This may be useful when used with source directories containing dangling symlinks.
| Type | Syntax | Default | Example |
|---|---|---|---|
| bool | | | |
ignore_older_secs(optional)
Ignore files whose data modification time is older than the specified number of seconds.
| Type | Syntax | Default | Example |
|---|---|---|---|
| uint | | | [600] |
line_delimiter(optional)
String sequence used to separate one file line from another.
| Type | Syntax | Default | Example |
|---|---|---|---|
| string | literal | "\n" | ["\r\n"] |
max_line_bytes(optional)
The maximum number of bytes a line can contain before being discarded. This protects against malformed lines or tailing incorrect files.
| Type | Syntax | Default | Example |
|---|---|---|---|
| uint | | 102400 | |
max_read_bytes(optional)
An approximate limit on the amount of data read from a single file at a given time.
| Type | Syntax | Default | Example |
|---|---|---|---|
| uint | | | [2048] |
oldest_first(optional)
Instead of balancing read capacity fairly across all watched files, prioritize draining the oldest files before moving on to read data from younger files.
| Type | Syntax | Default | Example |
|---|---|---|---|
| bool | | | |
remove_after_secs(optional)
Time, in seconds, after reaching EOF after which the file is removed from the filesystem, unless new data is written in the meantime. If not specified, files are not removed.
| Type | Syntax | Default | Example |
|---|---|---|---|
| uint | | | [0, 5, 60] |
read_from(optional)
In the absence of a checkpoint, this setting tells Vector where to start reading files that are present at startup: at the beginning or at the end.
| Type | Syntax | Default | Example |
|---|---|---|---|
| string | literal | beginning | |
multiline(optional)
Multiline parsing configuration. If not specified, multiline parsing is disabled.
| Type | Syntax | Default | Example |
|---|---|---|---|
| hash | regex | | [] |
data_dir(optional)
The directory used to persist file checkpoint positions. By default, the global data_dir option is used. Make sure the Vector process has write permissions to this directory.
| Type | Syntax | Default | Example |
|---|---|---|---|
| string | file_system_path | | ["/var/lib/vector"] |
encoding(optional)
Configures encoding-specific source behavior.
| Type | Syntax | Default | Example |
|---|---|---|---|
| hash | literal | | [] |
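The following minimal sketch reads files written in UTF-16LE. It assumes the charset sub-option of encoding, which appears in Vector's reference for recent releases but is not described in this section, so check the reference for your version. The source ID and path are placeholders.

```toml
[sources.my_source_id]
type = "file"
# Hypothetical path to files written by an application that emits UTF-16LE.
include = ["/var/log/legacy-app/*.log"]

[sources.my_source_id.encoding]
# Assumed sub-option: decode input from UTF-16LE before splitting lines.
charset = "utf-16le"
```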
ignore_checkpoints(optional)
This causes Vector to ignore existing checkpoints when determining where to start reading a file. Checkpoints are still written normally.
| Type | Syntax | Default | Example |
|---|---|---|---|
| bool | | | |
How it Works
Autodiscovery
Vector will continually look for new files matching any of your
include patterns. The frequency is controlled via the
glob_minimum_cooldown_ms option. If a new file is added that matches
any of the supplied patterns, Vector will begin tailing it. Vector
maintains a unique list of files and will not tail a file more than
once, even if it matches multiple patterns. You can read more about
how Vector identifies files in the Fingerprinting section.
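Here is a minimal sketch of the autodiscovery settings. The paths are hypothetical. A file matching both patterns, such as one under /var/log/app/ ending in .log, is still tailed only once.

```toml
[sources.my_source_id]
type = "file"
# Files matching any of these patterns are picked up as they appear.
include = ["/var/log/nginx/*.log", "/var/log/app/**/*.log"]
# Search for new files every 5 seconds instead of the default 1000 ms.
glob_minimum_cooldown_ms = 5000
```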
Compressed Files
Vector will transparently detect files which have been compressed using Gzip and decompress them for reading. This detection process looks for the unique sequence of bytes in the Gzip header and does not rely on the compressed files adhering to any kind of naming convention.
One caveat with reading compressed files is that Vector is not able to efficiently seek into them. Rather than implement a potentially expensive full scan as a seek mechanism, Vector currently will not attempt further reads from a file for which it has already stored a checkpoint in a previous run. For this reason, users should take care to let Vector fully process any compressed files before shutting the process down or moving the files to another location on disk.
File Deletion
When a watched file is deleted, Vector will maintain its open file
handle and continue reading until it reaches EOF. When a file no
longer matches the include option and the reader has reached EOF,
that file's reader is discarded.
File Read Order
By default, Vector attempts to allocate its read bandwidth fairly across all of the files it's currently watching. This prevents a single very busy file from starving other independent files from being read. In certain situations, however, this can lead to interleaved reads from files that should be read one after the other.
For example, consider a service that logs to a timestamped file, creating a new one at an interval and leaving the old one as-is. Under normal operation, Vector would follow writes as they happen to each file and there would be no interleaving. In an overload situation, however, Vector may pick up and begin tailing newer files before catching up to the latest writes from older files. This would cause writes from a single logical log stream to be interleaved in time and potentially slow down ingestion as a whole, since the fixed total read bandwidth is allocated across an increasing number of files.
To address this type of situation, Vector provides the
oldest_first option. When set, Vector will not read from any file
younger than the oldest file that it hasn't yet caught up to. In
other words, Vector will continue reading from older files as long
as there is more data to read. Only once it hits the end will it
then move on to read from younger files.
Whether or not to use the oldest_first flag depends on the
organization of the logs you're configuring Vector to tail. If your
include option contains multiple independent logical log streams
(e.g. Nginx's access.log and error.log, or logs from multiple
services), you are likely better off with the default behavior. If
you're dealing with a single logical log stream or if you value
per-stream ordering over fairness across streams, consider setting
the oldest_first option to true.
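As a minimal sketch, the following configures a single logical log stream that a service splits into timestamped files. The path is hypothetical.

```toml
[sources.my_source_id]
type = "file"
# One logical stream written to a series of timestamped files (hypothetical path).
include = ["/var/log/my-service/*.log"]
# Drain the oldest file to EOF before reading from younger files.
oldest_first = true
```

For independent streams such as separate access and error logs, leaving oldest_first at its default keeps reads fair across files.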
File Rotation
Vector supports tailing across a number of file rotation strategies.
The default behavior of logrotate is simply to move the old log
file and create a new one. This requires no special configuration of
Vector, as it will maintain its open file handle to the rotated log
until it has finished reading and it will find the newly created
file normally.
A popular alternative strategy is copytruncate, in which
logrotate will copy the old log file to a new location before
truncating the original. Vector will also handle this well out of
the box, but there are a couple of configuration options that will help
reduce the very small chance of missed data in some edge cases. We
recommend a combination of delaycompress (if applicable) on the
logrotate side and including the first rotated file in Vector's
include option. This allows Vector to find the file after rotation,
read it uncompressed to identify it, and then ensure it has all of
the data, including any written in a gap between Vector's last read
and the actual rotation event.
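Here is a minimal sketch of the copytruncate recommendation. It assumes logrotate's default numeric suffixes (no dateext), so the first rotated copy is named my-app.log.1 and stays uncompressed under delaycompress. Both paths are hypothetical.

```toml
[sources.my_source_id]
type = "file"
# The live file plus the first rotated copy, which delaycompress leaves
# uncompressed so Vector can fingerprint it and finish reading it.
include = ["/var/log/my-app.log", "/var/log/my-app.log.1"]
```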
Fingerprinting
By default, Vector identifies files by running a cyclic redundancy
check (CRC) on the first N lines of the file. This serves as a
fingerprint that uniquely identifies the file. The number of lines, N,
is set with the fingerprint.lines option, and
fingerprint.ignored_header_bytes skips a fixed number of bytes at the
start of the file before the checksum is computed.
This strategy avoids the common pitfalls of identifying files by device and inode numbers, since inode numbers can be reused across files. This enables Vector to tail files correctly across various rotation strategies.
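The following minimal sketch covers files that all begin with the same fixed-size header. The strategy key and its "checksum" value come from Vector's reference documentation rather than this section, so treat them as an assumption for your version. The 64-byte header size is hypothetical.

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/**/*.log"]

[sources.my_source_id.fingerprint]
# Assumed value selecting the CRC-based strategy described above.
strategy = "checksum"
# Skip a hypothetical 64-byte header shared by every file before checksumming.
ignored_header_bytes = 64
# Number of lines (N) included in the checksum.
lines = 1
```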
Globbing
Globbing is supported in all provided file paths. Files will be
autodiscovered continually at a rate defined by the
glob_minimum_cooldown_ms option.
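Globs can appear in both include and exclude, and exclude takes precedence. The archive path in this minimal sketch is hypothetical.

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/**/*.log"]
# Anything matched here is skipped even though it also matches include.
exclude = ["/var/log/binary-file.log", "/var/log/archive/**/*.log"]
```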
Line Delimiters
Each line is read until a line delimiter (by default \n, i.e. the
0xA byte) or EOF is found. If needed, the default line delimiter
can be overridden via the line_delimiter option.
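The following minimal sketch applies to files that use Windows-style CRLF line endings. The path is hypothetical. With the default \n delimiter, each such line would typically keep a trailing \r.

```toml
[sources.my_source_id]
type = "file"
# Hypothetical files written with CRLF line endings.
include = ["/var/log/windows-app/*.log"]
# TOML basic-string escapes: carriage return followed by line feed.
line_delimiter = "\r\n"
```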
Multiline Messages
Sometimes a single log event will appear as multiple log lines. To
handle this, Vector provides a set of multiline options. These
options were designed to handle both the simplest and the most complex
cases. Let's look at a few examples:
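A common case is a Java stack trace, where indented "at ..." frames belong to the line before them. This minimal sketch assumes the start_pattern, mode, condition_pattern, and timeout_ms sub-options from Vector's multiline reference, which this section does not list. The path is hypothetical.

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/my-java-app/*.log"]

[sources.my_source_id.multiline]
# A new event starts at any line that does not begin with whitespace.
start_pattern = '^[^\s]'
# Keep appending the following lines for as long as they match condition_pattern.
mode = "continue_through"
# Indented "at ..." stack frames continue the current event.
condition_pattern = '^[\s]+at'
# Flush a partial event if no further line arrives within 1000 ms.
timeout_ms = 1000
```

Lines such as "Caused by: ..." start without whitespace, so under these patterns they would begin a new event. The patterns would need widening to keep them grouped.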
File permissions
To be able to source events from the files, Vector must be able to read the files and execute their parent directories.
If you deployed Vector using one of our distributed packages, you
will find Vector running as the vector user. Ensure this user has
read access to the files matched by include. Strategies for this
include:
- Create a new Unix group, make it the group owner of the target files with read access, and add vector to that group.
- Use POSIX ACLs to grant the vector user access to the files.
- Grant the CAP_DAC_READ_SEARCH Linux capability. This capability bypasses file system permission checks to allow Vector to read any file. This is not recommended, as it gives Vector more permissions than it requires, but it is preferable to running Vector as root, which would grant even broader permissions. With systemd, create an override file using systemctl edit vector and add AmbientCapabilities=CAP_DAC_READ_SEARCH and CapabilityBoundingSet=CAP_DAC_READ_SEARCH to its [Service] section.
On Debian-based distributions, the vector user is
automatically added to the adm
group, if it exists, which has
permissions to read /var/log.
State
This component is stateless, meaning its behavior is consistent across each input.
Checkpointing
Vector checkpoints the current read position after each
successful read. This ensures that Vector resumes where it left
off if restarted, preventing data from being read twice. The
checkpoint positions are stored in the data directory which is
specified via the global data_dir option, but can be overridden
via the data_dir option in the file source directly.
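The following minimal sketch shows both levels: the global data_dir and a per-source override. The override path is hypothetical and must be writable by the Vector process.

```toml
# Global default location for checkpoints and other persisted state.
data_dir = "/var/lib/vector"

[sources.my_source_id]
type = "file"
include = ["/var/log/**/*.log"]
# Per-source override for this source's checkpoint positions (hypothetical path).
data_dir = "/var/lib/vector/file-checkpoints"
```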
Read Position
By default, Vector will read from the beginning of newly discovered
files. You can change this behavior by setting the read_from option to
"end".
Previously discovered files will be checkpointed, and
the read position will resume from the last checkpoint. To disable this
behavior, you can set the ignore_checkpoints option to true. This
will cause Vector to disregard existing checkpoints when determining the
starting read position of a file.
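As a minimal sketch, the two options combine as follows. With this combination, Vector would start every file at its end on each startup, because stored checkpoints are ignored and read_from then applies. Checkpoints are still written.

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/**/*.log"]
# For files without a usable checkpoint, start at the end instead of the beginning.
read_from = "end"
# Disregard stored checkpoints when choosing the start position.
ignore_checkpoints = true
```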
Context
By default, the file source augments events with helpful
context keys.
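The names of these context keys can be changed per source. In this minimal sketch, the key names path and hostname are hypothetical choices. Events from the Apache example would then carry the file path under path instead of file, and the host under hostname instead of host.

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/**/*.log"]
# Store the full file path under "path" instead of the default "file" key.
file_key = "path"
# Store the current host under "hostname" instead of the default "host" key.
host_key = "hostname"
```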