Regex Parser
This transform has been deprecated in favor of the remap
transform, which enables you to use Vector Remap Language (VRL for short) to
create transform logic of any degree of complexity. The examples below show how you can use VRL to
replace this transform's functionality.
.message = parse_regex(.message, r'(?P<number>.*?) group')
Example Configuration
Syslog 5424
[transforms.my_transform_id]
type = "regex_parser"
patterns = [
  "^(?P<host>[\\w\\.]+) - (?P<user>[\\w]+) (?P<bytes_in>[\\d]+) \\[(?P<timestamp>.*)\\] \"(?P<method>[\\w]+) (?P<path>.*)\" (?P<status>[\\d]+) (?P<bytes_out>[\\d]+)$"
]
field = "message"

  [transforms.my_transform_id.types]
  bytes_in = "int"
  timestamp = "timestamp|%d/%m/%Y:%H:%M:%S %z"
  status = "int"
  bytes_out = "int"
Input event:
{
  "log": {
    "message": "5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
  }
}
Output event:
{
  "log": {
    "bytes_in": 5667,
    "host": "5.86.210.12",
    "user": "zieme4647",
    "timestamp": "2019-06-19T17:20:49-0400",
    "method": "GET",
    "path": "/embrace/supply-chains/dynamic/vertical",
    "status": 201,
    "bytes_out": 20574
  }
}
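To see what the pattern in this example captures, it can be approximated outside Vector. The following is a minimal sketch in Python, assuming that Python's re module behaves like Vector's regex engine for this particular pattern. parse_line is a hypothetical helper written for illustration and is not part of Vector.

```python
import re
from datetime import datetime

# Same pattern as in the configuration above, with the TOML escaping removed.
PATTERN = re.compile(
    r'^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) '
    r'\[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" '
    r'(?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
)

def parse_line(message):
    """Hypothetical helper: return typed fields, or None if there is no match."""
    match = PATTERN.match(message)
    if match is None:
        return None
    fields = match.groupdict()  # every capture starts out as a string
    # Mirror the [transforms.my_transform_id.types] table.
    for name in ("bytes_in", "status", "bytes_out"):
        fields[name] = int(fields[name])
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%m/%Y:%H:%M:%S %z"
    )
    return fields

line = ('5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] '
        '"GET /embrace/supply-chains/dynamic/vertical" 201 20574')
parsed = parse_line(line)
```

The capture group is named user, so the parsed field is user as well; the key in the result always follows the group name in the pattern.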
Configuration Options
Required Options
patterns(required)
The Regular Expressions to apply. Do not include the leading or trailing /
in any of the expressions.
Type | Syntax | Default | Example |
---|---|---|---|
array | literal | | ["^(?P<timestamp>[\\w\\-:\\+]+) (?P<level>\\w+) (?P<message>.*)$"] |
inputs(required)
A list of upstream source or transform IDs. Wildcards (*) are supported.
See configuration for more info.
Type | Syntax | Default | Example |
---|---|---|---|
array | literal | | ["my-source-or-transform-id","prefix-*"] |
type(required)
The component type. This is a required field for all components and tells Vector which component to use.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["regex_parser"] |
Advanced Options
drop_failed(optional)
Whether to drop the event if parsing fails.
Type | Syntax | Default | Example |
---|---|---|---|
bool | | false | |
drop_field(optional)
Whether to drop (remove) the specified field after parsing.
Type | Syntax | Default | Example |
---|---|---|---|
bool | | | |
field(optional)
The log field to parse.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | message | ["message","parent.child"] |
overwrite_target(optional)
If target_field is set and the log contains a field of the same name as the target, it will only be overwritten if this is set to true.
Type | Syntax | Default | Example |
---|---|---|---|
bool | | | |
target_field(optional)
If this setting is present, the parsed fields will be inserted into the log as a sub-object with this name. If a field with the same name already exists, the parser will fail and produce an error unless overwrite_target is set to true.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | | ["root_field","parent.child"] |
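The interaction between target_field and overwrite_target can be sketched as follows. This is an illustrative Python model only, assuming top-level field names (nested paths such as parent.child are not handled); insert_parsed is a hypothetical helper and does not reflect Vector's internal implementation.

```python
def insert_parsed(event, parsed, target_field=None, overwrite_target=False):
    """Hypothetical helper: merge parsed fields into a copy of the event."""
    if target_field is None:
        # No target: parsed fields go to the top level of the event.
        return {**event, **parsed}
    if target_field in event and not overwrite_target:
        # The target already exists and may not be replaced: report an error.
        raise ValueError(f"target field {target_field!r} already exists")
    # Nest the parsed fields as a sub-object under the target name.
    return {**event, target_field: dict(parsed)}

event = {"message": "201 GET", "root_field": "old"}
parsed = {"status": "201", "method": "GET"}

flat = insert_parsed(event, parsed)
nested = insert_parsed(event, parsed, "root_field", overwrite_target=True)
```

Calling insert_parsed(event, parsed, "root_field") without overwrite_target raises an error in this sketch, because root_field is already present in the event.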
timezone(optional)
The name of the time zone to apply to timestamp conversions that do not contain an explicit time zone. This overrides the global timezone option.
The time zone name may be any name in the TZ database, or local to indicate system local time.
Type | Syntax | Default | Example |
---|---|---|---|
string | literal | local | ["local","America/New_York","EST5EDT"] |
types(optional)
Key/value pairs representing mapped log field names and types. This is used to coerce log fields from strings into their proper types. The available types are listed in the Types list below.
Timestamp coercions need to be prefaced with timestamp|, for example "timestamp|%F". Timestamp specifiers can use either of the following:
- One of the built-in formats listed in the Timestamp Formats table below.
- The time format specifiers from Rust's chrono library.
Types
- array
- bool
- bytes
- float
- int
- map
- null
- timestamp (see the table below for formats)
Timestamp Formats
Format | Description | Example |
---|---|---|
%F %T | YYYY-MM-DD HH:MM:SS | 2020-12-01 02:37:54 |
%v %T | DD-Mmm-YYYY HH:MM:SS | 01-Dec-2020 02:37:54 |
%FT%T | ISO 8601/[RFC 3339](https://tools.ietf.org/html/rfc3339) format without time zone | 2020-12-01T02:37:54 |
%a, %d %b %Y %T | RFC 822/2822 without time zone | Tue, 01 Dec 2020 02:37:54 |
%a %d %b %T %Y | date command output without time zone | Tue 01 Dec 02:37:54 2020 |
%a %b %e %T %Y | ctime format | Tue Dec 1 02:37:54 2020 |
%s | UNIX timestamp | 1606790274 |
%FT%TZ | ISO 8601/RFC 3339 UTC | 2020-12-01T10:37:54Z |
%+ | ISO 8601/RFC 3339 UTC with time zone | 2020-12-01T02:37:54-08:00 |
%a %d %b %T %Z %Y | date command output with time zone | Tue 01 Dec 02:37:54 PST 2020 |
%a %d %b %T %z %Y | date command output with numeric time zone | Tue 01 Dec 02:37:54 -0800 2020 |
%a %d %b %T %#z %Y | date command output with numeric time zone (minutes can be missing or present) | Tue 01 Dec 02:37:54 -08 2020 |
Note: the examples in this table are for 54 seconds after 2:37 am on December 1st, 2020 in Pacific Standard Time.
Type | Syntax | Default | Example |
---|---|---|---|
hash | | | [{"status":"int","duration":"float","success":"bool","timestamp_iso8601":"timestamp|%F","timestamp_custom":"timestamp|%a %b %e %T %Y","timestamp_unix":"timestamp|%s","parent":{"child":"int"}}] |
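The coercion step described for the types option can be sketched as a small function that reads a type spec and converts a captured string. This is a rough Python model, not Vector's implementation: coerce is a hypothetical helper, the set of strings accepted as true is an assumption, and the timestamp format is written with expanded specifiers because Python's strptime may not accept chrono shorthand such as %F and %T.

```python
from datetime import datetime

def coerce(value, spec):
    """Hypothetical helper: convert a captured string according to a type spec."""
    # "timestamp|%Y-%m-%d" splits into kind "timestamp" and format "%Y-%m-%d".
    kind, _, fmt = spec.partition("|")
    if kind == "int":
        return int(value)
    if kind == "float":
        return float(value)
    if kind == "bool":
        # Assumption: only these spellings are treated as true.
        return value.strip().lower() in ("true", "t", "yes", "y", "1")
    if kind == "timestamp":
        return datetime.strptime(value, fmt)
    return value  # other types are left as strings in this sketch

types = {"status": "int", "duration": "float", "success": "bool",
         "timestamp_custom": "timestamp|%Y-%m-%d %H:%M:%S"}
captured = {"status": "201", "duration": "0.25", "success": "true",
            "timestamp_custom": "2020-12-01 02:37:54"}
typed = {name: coerce(text, types[name]) for name, text in captured.items()}
```

Splitting on the first | keeps the format string intact even when it contains spaces or further punctuation.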
How it Works
Failed Parsing
By default, if the input message text does not match any of the configured regular expression patterns, this transform will log an error message but leave the log event unchanged. If you instead wish to have this transform drop the event, set drop_failed = true.
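The two failure behaviors can be modeled in a few lines. The sketch below is in Python and makes several assumptions: patterns are tried in the order listed, the parsed field is always message, and the drop_field option is ignored. apply_patterns is a hypothetical helper, not a Vector API.

```python
import re

def apply_patterns(event, patterns, drop_failed=False):
    """Hypothetical helper: try each pattern against event["message"]."""
    for pattern in patterns:
        match = re.search(pattern, event["message"])
        if match:
            # First matching pattern wins; add its named captures to the event.
            return {**event, **match.groupdict()}
    # No pattern matched. Vector would log an error at this point.
    return None if drop_failed else event

patterns = [r'^(?P<level>\w+): (?P<text>.*)$']
ok = apply_patterns({"message": "warn: disk full"}, patterns)
kept = apply_patterns({"message": "no separator"}, patterns)
dropped = apply_patterns({"message": "no separator"}, patterns, drop_failed=True)
```

Here kept is the original event, unchanged, while dropped is None, which stands in for the event being discarded.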
Flags
Regex flags can be toggled with the (?flags) syntax. The available flags are:
Flag | Description |
---|---|
i | case-insensitive: letters match both upper and lower case |
m | multi-line mode: ^ and $ match begin/end of line |
s | allow . to match \n |
U | swap the meaning of x* and x*? |
u | Unicode support (enabled by default) |
x | ignore whitespace and allow line comments (starting with # ) |
For example, to enable the case-insensitive flag you can write:
(?i)Hello world
More info can be found in the Regex grouping and flags documentation.
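The effect of a few of these flags can be checked quickly outside Vector. The following sketch uses Python's re module as a stand-in, which is an assumption: it accepts the same inline syntax for i, m, and s, but it is not Vector's engine and it does not support the U flag.

```python
import re

# (?i) at the start of the pattern turns on case-insensitive matching.
insensitive = re.search(r'(?i)Hello world', 'HELLO WORLD')
sensitive = re.search(r'Hello world', 'HELLO WORLD')

# (?m) makes ^ and $ match at the beginning and end of each line.
lines = re.findall(r'(?m)^\w+$', 'one\ntwo')

# (?s) lets . match a newline character.
dotall = re.search(r'(?s)a.b', 'a\nb')
```

In this sketch insensitive and dotall are match objects, sensitive is None, and lines contains one entry per input line.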
Named Captures
You can name Regex captures with the (?P<name>...) syntax. For example:
^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$
This will capture timestamp, level, and message. All values are extracted as string values and must be coerced with the types table.
More info can be found in the Regex grouping and flags documentation.
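That every named capture is extracted as a string can be seen with a short experiment. The sketch below again uses Python's re module as an assumed stand-in for Vector's engine; the sample log line is made up for illustration.

```python
import re

pattern = r'^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$'
captures = re.match(pattern, '1606790274 info user logged in').groupdict()

# The timestamp capture is the string "1606790274", not a number,
# so it has to be converted explicitly before numeric use.
as_int = int(captures["timestamp"])
```

In Vector, that explicit conversion is what an entry in the types table provides.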
Regex Debugger
If you are having difficulty with your regular expression not matching text, you may try debugging your patterns at Regex 101. This site includes a regular expression tester and debugger. The regular expression engine used by Vector is most similar to the "Go" implementation, so make sure that is selected in the "Flavor" menu.
State
This component is stateless, meaning its behavior is consistent across each input.
Regex Syntax
Vector uses the Rust standard regular expression engine for pattern matching. Its syntax shares most of the features of Perl-style regular expressions, with a few exceptions. You can find examples of patterns in the Rust regex module documentation.