regex
Explore a comprehensive collection of regex patterns and examples for log parsing, data extraction, and more. Enhance your development workflow with practical regex solutions.
Regex Patterns and Examples
Regex Tools and Resources
Regular expressions (regex) are powerful tools for pattern matching and text manipulation. Below are some useful resources and examples to help you master regex.
Online Regex Testers
- Regex101.com - An interactive regex debugger and tester.
Regex Sources and Guides
- Fluentd Regex Discussion - A forum thread discussing regex patterns for log parsing.
- MDN Web Docs: Regular Expressions - Comprehensive guide to JavaScript regular expressions.
- Regular-Expressions.info - A vast resource for learning and referencing regular expressions.
Common Regex Patterns and Examples
Log Entry Example
A sample log line:
10.0.2.2 - - [19/Jul/2019 10:02:48] "GET /?ccnum=1234 HTTP/1.1" 200 -
Matching `ccnum` Value
To match the `ccnum` parameter together with its numeric value (the match is the whole `ccnum=1234` pair, not just the digits):
ccnum=\d+
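The pattern above matches the key and the digits together. As a minimal sketch in Python (the variable names are illustrative, not from the original), wrapping `\d+` in a capture group is one way to isolate the value itself:

```python
import re

log_line = '10.0.2.2 - - [19/Jul/2019 10:02:48] "GET /?ccnum=1234 HTTP/1.1" 200 -'

# ccnum=\d+ matches the key and the digits as one string.
full_match = re.search(r"ccnum=\d+", log_line)

# Wrapping \d+ in a capture group isolates the numeric value.
value_match = re.search(r"ccnum=(\d+)", log_line)

print(full_match.group(0))   # ccnum=1234
print(value_match.group(1))  # 1234
```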
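Matching the Log Line Through the `ccnum` Value
A pattern that matches from the start of the log line through `ccnum=` and its digits. The dots in the IP address and the `?` before `ccnum` are escaped so they match literally; an unescaped `.` matches any character, and an unescaped `?` only makes the preceding `/` optional:
\d+\.\d+\.\d+\.\d+ .* \[\d{2}\/\w+\/\d{4}.*\d{2}:\d{2}:\d{2}\].*"\w+.*\/\?ccnum=\d+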
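The following Python sketch shows why escaping matters in a pattern like this one. It uses a variant that escapes the dots and the literal `?` and stops right after `ccnum=`; that variant is an assumption about the intent, not the original author's exact pattern.

```python
import re

log_line = '10.0.2.2 - - [19/Jul/2019 10:02:48] "GET /?ccnum=1234 HTTP/1.1" 200 -'

# Dots and the literal question mark are escaped; the pattern ends at "ccnum=".
prefix_pattern = re.compile(
    r'\d+\.\d+\.\d+\.\d+ .* \[\d{2}/\w+/\d{4}.*\d{2}:\d{2}:\d{2}\].*"\w+.*/\?ccnum='
)
prefix = prefix_pattern.match(log_line)
print(prefix.group(0))            # everything up to and including "ccnum="
print(log_line[prefix.end():])    # the remainder, starting with the value

# An unescaped dot accepts any character between the digit groups.
loose = re.fullmatch(r"\d+.\d+.\d+.\d+", "10a0b2c2")
strict = re.fullmatch(r"\d+\.\d+\.\d+\.\d+", "10a0b2c2")
print(loose is not None, strict is not None)  # True False
```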
Detailed Log Entry Example
A more complex log entry:
2020-04-21 08:37:04 172.16.1.1 - - [21/Apr/2020:08:37:04 +0200] "POST /path?foo=bar HTTP/1.1" 200 540 "http://localhost/bar/" "Mozilla/5.0 (Linux; Android 10; One Build/One-Boo; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/81.0.4044.111 Mobile Safari/537.36" "1.1.1.1"
Extracting Date and Time
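Two equivalent patterns that match a `YYYY-MM-DD HH:MM:SS` timestamp, first with explicit character classes and then with `\d` and quantifiers:
[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]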
\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
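Once the timestamp is matched, it is usually converted to a real date object. A minimal Python sketch, assuming the shortened sample line below and the standard `datetime.strptime` format codes:

```python
import re
from datetime import datetime

log_line = (
    '2020-04-21 08:37:04 172.16.1.1 - - [21/Apr/2020:08:37:04 +0200] '
    '"POST /path?foo=bar HTTP/1.1" 200 540'
)

timestamp_pattern = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

# search() finds the first timestamp anywhere in the line.
found = timestamp_pattern.search(log_line)

# The regex only checks the shape; strptime also rejects impossible dates.
parsed = datetime.strptime(found.group(0), "%Y-%m-%d %H:%M:%S")
print(parsed.isoformat())  # 2020-04-21T08:37:04
```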
Matching IP Addresses
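A common pattern for matching IPv4 addresses. It checks only the shape of the address, so out-of-range values such as `999.999.999.999` also match: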
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
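Because `\d{1,3}` accepts any one to three digits, a second validation step is often useful. The sketch below pairs the pattern with Python's standard `ipaddress` module; the helper name `find_ipv4` and the sample text are hypothetical.

```python
import ipaddress
import re

ipv4_candidate = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def find_ipv4(text):
    """Return regex candidates that are also valid IPv4 addresses."""
    valid = []
    for candidate in ipv4_candidate.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # shape matched, but an octet is out of range
        valid.append(candidate)
    return valid

addresses = find_ipv4("client 172.16.1.1 forwarded for 1.1.1.1, bogus 999.1.1.1")
print(addresses)  # ['172.16.1.1', '1.1.1.1']
```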
Apache / Nginx Log Parsing
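Example log lines, each followed by a regex pattern that parses it. In the patterns that end with `"?`, the `?` applies only to the final quote character, not to the whole last field: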
172.128.80.109 - Bins5273 656 [2019-05-03T13:11:48-04:00] "PUT /mesh" 406 10272
^([\w\.]+) - ([\w]+) ([\d]+) \[(.*)\] "([\w]+) (.*)" ([\d]+) ([\d]+)$
127.0.0.1 - - [21/Apr/2020:11:47:07 +0000] "GET / HTTP/1.1" 200 612 "http://" "curl/7.58.0"
^([\w\.]+) ([^ ]*) ([^ ]*) \[(.*)\] "(\S+)(?: +([^ ]*) +\S*)?" ([\d]+) ([\d]+) "([^"]*)" "([^\"]*)"?
127.0.0.1 - - [21/Apr/2020:11:47:07 +0000] "GET / HTTP/1.1" 200 612 "http://" "curl/7.58.0"
^([\w\.]+) - ([^ ]*) \[(.*)\] "([^ ]*) ([^ ]*) ([^ ]*)" ([\d]+) ([\d]+) "([^"]*)" "([^\"]*)"?
127.0.0.1 - - [21/Apr/2020:11:47:07 +0000] "GET / HTTP/1.1" 200 612 "http://" "curl/7.58.0" "10.20.30.1"
^([\w\.]+) - ([^ ]*) \[(.*)\] "([^ ]*) ([^ ]*) ([^ ]*)" ([\d]+) ([\d]+) "([^"]*)" "([^\"]*)" "([\w\.]+)"?
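A quick way to check patterns like these is to run each one against its sample line and inspect the captured groups. The Python sketch below does that for the four pairs above; the names `EXAMPLES` and `parse_all` are illustrative only.

```python
import re

# (sample line, pattern) pairs copied from the examples above.
EXAMPLES = [
    (
        '172.128.80.109 - Bins5273 656 [2019-05-03T13:11:48-04:00] "PUT /mesh" 406 10272',
        r'^([\w\.]+) - ([\w]+) ([\d]+) \[(.*)\] "([\w]+) (.*)" ([\d]+) ([\d]+)$',
    ),
    (
        '127.0.0.1 - - [21/Apr/2020:11:47:07 +0000] "GET / HTTP/1.1" 200 612 "http://" "curl/7.58.0"',
        r'^([\w\.]+) ([^ ]*) ([^ ]*) \[(.*)\] "(\S+)(?: +([^ ]*) +\S*)?" ([\d]+) ([\d]+) "([^"]*)" "([^\"]*)"?',
    ),
    (
        '127.0.0.1 - - [21/Apr/2020:11:47:07 +0000] "GET / HTTP/1.1" 200 612 "http://" "curl/7.58.0"',
        r'^([\w\.]+) - ([^ ]*) \[(.*)\] "([^ ]*) ([^ ]*) ([^ ]*)" ([\d]+) ([\d]+) "([^"]*)" "([^\"]*)"?',
    ),
    (
        '127.0.0.1 - - [21/Apr/2020:11:47:07 +0000] "GET / HTTP/1.1" 200 612 "http://" "curl/7.58.0" "10.20.30.1"',
        r'^([\w\.]+) - ([^ ]*) \[(.*)\] "([^ ]*) ([^ ]*) ([^ ]*)" ([\d]+) ([\d]+) "([^"]*)" "([^\"]*)" "([\w\.]+)"?',
    ),
]

def parse_all(examples):
    """Return the captured groups for each pair, or None when a pattern fails."""
    results = []
    for line, pattern in examples:
        match = re.match(pattern, line)
        results.append(match.groups() if match else None)
    return results

parsed_examples = parse_all(EXAMPLES)
for groups in parsed_examples:
    print(groups)
```

Note that the second pattern drops the HTTP version (it is matched by an uncaptured `\S*`), while the third and fourth capture it as its own group.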
Structured Logging with Named Capture Groups (Loki/Promtail)
A regex pattern using named capture groups for structured log parsing, suitable for tools like Loki and Promtail:
^(?P<remote_ip>[\w\.]+) - (?P<user>[^ ]*) \[(?P<timestamp>.*)\] "(?P<method>[^ ]*) (?P<request_url>[^ ]*) (?P<request_http_protocol>[^ ]*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<client_ip>[\w\.]+)"?
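Python's `re` module accepts the same `(?P<name>...)` syntax, so the pattern can be tried locally before it goes into a pipeline. This is only a sanity-check sketch: Loki and Promtail use Go's RE2 engine, which may behave differently in edge cases. The sample line is the one with a trailing client IP from the previous section.

```python
import re

# The named-group pattern from above, split across lines for readability.
ACCESS_LOG = re.compile(
    r'^(?P<remote_ip>[\w\.]+) - (?P<user>[^ ]*) \[(?P<timestamp>.*)\] '
    r'"(?P<method>[^ ]*) (?P<request_url>[^ ]*) (?P<request_http_protocol>[^ ]*)" '
    r'(?P<status>[\d]+) (?P<bytes_out>[\d]+) '
    r'"(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<client_ip>[\w\.]+)"?'
)

log_line = (
    '127.0.0.1 - - [21/Apr/2020:11:47:07 +0000] "GET / HTTP/1.1" 200 612 '
    '"http://" "curl/7.58.0" "10.20.30.1"'
)

# groupdict() returns the named groups as a field-name -> value mapping.
fields = ACCESS_LOG.match(log_line).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```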
Extracting Content Within Brackets
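To match the word immediately before a closing square bracket, use a positive lookahead. The lookahead checks for `]` without consuming it; the opening bracket is not checked:
this is [foo] bar
\w+(?=\])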
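A short Python sketch of the lookahead in action. It uses the simplified form `\w+(?=\])`, which is an assumption about the intent; the second call shows that the lookahead only inspects the closing bracket.

```python
import re

# \w+ must be followed by "]", but the bracket is not part of the match.
before_bracket = re.search(r"\w+(?=\])", "this is [foo] bar")
print(before_bracket.group(0))  # foo

# No opening bracket is required for the lookahead to succeed.
no_opening = re.search(r"\w+(?=\])", "foo] bar")
print(no_opening.group(0))  # foo
```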
Using Positive Lookbehind
Extracting content after an opening bracket using positive lookbehind:
this is [foo] bar
(?<=\[)[\w+.-]*
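The sketch below runs the lookbehind pattern in Python, then shows one possible way to require both brackets by combining a lookbehind with a lookahead. The combined pattern is an illustration, not part of the original examples.

```python
import re

# The lookbehind requires "[" just before the match without including it.
after_open = re.search(r"(?<=\[)[\w+.-]*", "this is [foo] bar")
print(after_open.group(0))  # foo

# The character class also allows "+", "." and "-" inside the match.
versioned = re.search(r"(?<=\[)[\w+.-]*", "build [foo-1.2+b] ok")
print(versioned.group(0))  # foo-1.2+b

# Lookbehind plus lookahead: everything between "[" and the next "]".
bracketed = re.findall(r"(?<=\[)[^\]]*(?=\])", "a [foo] b [bar baz] c")
print(bracketed)  # ['foo', 'bar baz']
```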
Matching Up To a Specific String (Excluding It)
Match everything up to the first `abc`. The overall match still includes `abc`; capture group 1 holds the text before it:
/^(.*?)abc/
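The `?` after `.*` makes the quantifier lazy, which is what stops the match at the first `abc`. A small Python comparison (Python has no `/.../` regex literals, so the delimiters are dropped):

```python
import re

text = "xyzabcabc"

# Lazy: group 1 stops before the first "abc".
lazy = re.match(r"^(.*?)abc", text)
print(lazy.group(0))  # xyzabc
print(lazy.group(1))  # xyz

# Greedy: group 1 runs up to the last "abc".
greedy = re.match(r"^(.*)abc", text)
print(greedy.group(1))  # xyzabc
```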
Matching Up To the First Number (Excluding It)
Match everything up to the first digit. The overall match includes that digit; capture group 1 holds the text before it:
/^(.*?)[0-9]/
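If the digit should stay out of the match entirely, a lookahead is one alternative to reading group 1. A Python sketch with a made-up sample string:

```python
import re

text = "order id 42 shipped"

# Capture-group form: the match includes the digit, group 1 does not.
with_group = re.match(r"^(.*?)[0-9]", text)
print(repr(with_group.group(0)))  # 'order id 4'
print(repr(with_group.group(1)))  # 'order id '

# Lookahead form: the digit is checked but never consumed.
with_lookahead = re.match(r"^.*?(?=[0-9])", text)
print(repr(with_lookahead.group(0)))  # 'order id '
```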
Advanced Log Parsing Example (Loki)
A complex log line and a corresponding regex for parsing with Loki:
1.2.3.4 - - [23/Nov/2020:17:31:00 +0200] "POST /foo/bar?token=x.x HTTP/1.1" 201 83 "http://localhost/" "Mozilla/5.0 (Linux; Android 10; Nokia 6.1 Build/x.x.x; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/x.0.x.110 Mobile Safari/537.36" "1.2.3.4"
| regexp "(?P<ip>\\d+\\.\\d+\\.\\d+\\.\\d+) (.*) (.*) (?P<date>\\[(.*)\\]) (\")(?P<verb>(\\w+)) (?P<request_path>([^\"]*)) (?P<http_ver>([^\"]*))(\") (?P<status_code>\\d+) (?P<bytes>\\d+) (\")(?P<referrer>(([^\"]*)))(\") (\")(?P<user_agent>(([^\"]*)))(\")"
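Inside a double-quoted LogQL string, `\\d` reaches the regex engine as `\d` and `\"` as `"`. The Python sketch below applies the unescaped pattern to the sample line as a local sanity check. It escapes the dots in the IP group, and Python's `re` is not the RE2 engine that Loki uses, so treat it as an approximation rather than a guarantee of how the query behaves.

```python
import re

# The LogQL pattern with string escaping removed and IP dots escaped.
LOKI_PATTERN = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) (.*) (.*) (?P<date>\[(.*)\]) '
    r'(")(?P<verb>(\w+)) (?P<request_path>([^"]*)) (?P<http_ver>([^"]*))(") '
    r'(?P<status_code>\d+) (?P<bytes>\d+) '
    r'(")(?P<referrer>(([^"]*)))(") (")(?P<user_agent>(([^"]*)))(")'
)

log_line = (
    '1.2.3.4 - - [23/Nov/2020:17:31:00 +0200] "POST /foo/bar?token=x.x HTTP/1.1" '
    '201 83 "http://localhost/" "Mozilla/5.0 (Linux; Android 10; Nokia 6.1 '
    'Build/x.x.x; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 '
    'Chrome/x.0.x.110 Mobile Safari/537.36" "1.2.3.4"'
)

# Only the named groups become labels; unnamed groups are ignored here.
labels = LOKI_PATTERN.search(log_line).groupdict()
for name, value in labels.items():
    print(f"{name}: {value}")
```

The `date` group keeps the surrounding square brackets because they sit inside the named group.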