Normalization
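Normalization replaces the dynamic parts of a string (IDs, numbers, timestamps, addresses and so on) with placeholders. For example, with the built-in patterns, `user 42 logged in from 1.2.3.4` becomes `user <int> logged in from <ip>`.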
Patterns
A pattern is a pair consisting of a placeholder and a regular expression.
Each pattern also has a priority. When a part of the string matches several patterns at once, the priority decides which placeholder is used.
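A minimal sketch of how a priority can settle such a conflict. It uses Go's standard `regexp` package rather than lexmachine, assumes that a lower number means a higher priority (as the numbering of the built-in patterns suggests), and uses hypothetical, simplified expressions for three of the built-in patterns; the real expressions may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Pattern pairs a placeholder with a regular expression.
// Assumption: a lower Priority number means a higher priority.
type Pattern struct {
	Priority    int
	Placeholder string
	Expr        *regexp.Regexp
}

// classify returns the placeholder of the highest-priority pattern that
// matches the whole token, or the token itself when no pattern matches.
func classify(token string, patterns []Pattern) string {
	sorted := append([]Pattern(nil), patterns...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })
	for _, p := range sorted {
		if p.Expr.MatchString(token) {
			return p.Placeholder
		}
	}
	return token
}

// Hypothetical, simplified expressions, anchored so they must match the whole token.
var patterns = []Pattern{
	{18, "<int>", regexp.MustCompile(`^-?[0-9]+$`)},
	{12, "<hash>", regexp.MustCompile(`^(?:[0-9a-fA-F]{64}|[0-9a-fA-F]{40}|[0-9a-fA-F]{32})$`)},
	{17, "<float>", regexp.MustCompile(`^-?[0-9]+\.[0-9]+$`)},
}

func main() {
	// 32 decimal digits match both <hash> and <int>; priority 12 beats 18.
	fmt.Println(classify("12345678901234567890123456789012", patterns))
	fmt.Println(classify("100", patterns))
	fmt.Println(classify("-4.56", patterns))
}
```

Running it prints `<hash>`, `<int>` and `<float>`, one per line: the 32-digit token qualifies as both a hash and an integer, and the higher-priority `<hash>` wins.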
Built-in patterns
The following patterns are supported out of the box, listed in priority order:
| Priority | Pattern ID | Placeholder | Examples |
|---|---|---|---|
| 1 | `curly_bracketed` | `<curly_bracketed>` | `{a:"1",b:"2"}` |
| 2 | `square_bracketed` | `<square_bracketed>` | `[bla1, bla2]` |
| 3 | `parenthesized` | `<parenthesized>` | `(bla bla)` |
| 4 | `double_quoted` | `<double_quoted>` | `"bla bla"`, `"""bla bla"""` |
| 5 | `single_quoted` | `<single_quoted>` | `'bla bla'`, `'''bla bla'''` |
| 6 | `grave_quoted` | `<grave_quoted>` | `` `bla bla` ``, ```` ```bla bla``` ```` |
| 7 | `email` | `<email>` | `test@host1.host2.com` |
| 8 | `url` | `<url>` | `https://some.host.com/page1?a=1`, `ws://some.host1.host2.net`, `ftp://login:pass@serv.example.com:21/` |
| 9 | `host` | `<host>` | `www.weather.jp` |
| 10 | `filepath` | `<filepath>` | `/home/user/photos` |
| 11 | `uuid` | `<uuid>` | `7c1811ed-e98f-4c9c-a9f9-58c757ff494f` |
| 12 | `hash` | `<hash>` | `48757ec9f04efe7faacec8722f3476339b125a6b6172b8a69ff3aa329e0bd0ff`, `a94a8fe5ccb19ba61c4c0873d391e987982fbbd3`, `098f6bcd4621d373cade4e832627b4f6` |
| 13 | `datetime` | `<datetime>` | `2025-01-13T10:20:40.999999Z`, `2025-01-13T10:20:40+04:00`, `2025-01-13 10:20:40` |
| 14 | `ip` | `<ip>` | IPv4: `1.2.3.4`, IPv6: `2001:db8:3333:4444:5555:6666:1.2.3.4` |
| 15 | `duration` | `<duration>` | `-1m5s`, `1w2d3h4m5s6ms7us8ns` |
| 16 | `hex` | `<hex>` | `0x13eb85e69dfbc0758b12acdaae36287d`, `0X553026A59C` |
| 17 | `float` | `<float>` | `100.23`, `-4.56` |
| 18 | `int` | `<int>` | `100`, `-200` |
| 19 | `bool` | `<bool>` | `TRUE`, `false` |
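A sketch of how a whole string could be normalized with a subset of these patterns. It is written against Go's standard `regexp` package, not lexmachine, and every expression in it is a hypothetical simplification. It assumes lexer-style matching: at each position the longest match wins and ties are broken by priority. The catch-all word pattern is a hypothetical addition (not a built-in pattern) that keeps digits inside identifiers such as `host1` from being replaced on their own.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Pattern pairs a placeholder with a regular expression. A lower Priority
// number wins when two patterns match text of the same length.
// An empty Placeholder means "keep the matched text as-is".
type Pattern struct {
	Priority    int
	Placeholder string
	re          *regexp.Regexp
}

// newPattern anchors expr at the start of the input and switches the regexp
// to leftmost-longest semantics so each pattern reports its longest match.
func newPattern(priority int, placeholder, expr string) Pattern {
	re := regexp.MustCompile(`^(?:` + expr + `)`)
	re.Longest()
	return Pattern{Priority: priority, Placeholder: placeholder, re: re}
}

// Hypothetical, simplified expressions for a few of the built-in patterns.
var patterns = []Pattern{
	newPattern(7, "<email>", `[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+`),
	newPattern(11, "<uuid>", `[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`),
	newPattern(14, "<ip>", `[0-9]{1,3}(\.[0-9]{1,3}){3}`),
	newPattern(17, "<float>", `-?[0-9]+\.[0-9]+`),
	newPattern(18, "<int>", `-?[0-9]+`),
	newPattern(19, "<bool>", `(?i:true|false)`),
	// Hypothetical catch-all for plain words; lowest priority, text is kept.
	newPattern(100, "", `[A-Za-z_][A-Za-z0-9_]*`),
}

// Normalize scans s left to right. At each position it takes the longest
// match among all patterns, breaking ties by priority, and emits that
// pattern's placeholder. Bytes that no pattern matches are copied as-is.
func Normalize(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		bestLen, best := 0, -1
		for j, p := range patterns {
			loc := p.re.FindStringIndex(s[i:])
			if loc == nil || loc[1] == 0 {
				continue
			}
			if loc[1] > bestLen || (loc[1] == bestLen && p.Priority < patterns[best].Priority) {
				bestLen, best = loc[1], j
			}
		}
		if best < 0 {
			b.WriteByte(s[i])
			i++
			continue
		}
		if patterns[best].Placeholder == "" {
			b.WriteString(s[i : i+bestLen])
		} else {
			b.WriteString(patterns[best].Placeholder)
		}
		i += bestLen
	}
	return b.String()
}

func main() {
	fmt.Println(Normalize("user test@host1.host2.com failed 3 times from 1.2.3.4, ok=TRUE"))
}
```

Running it prints `user <email> failed <int> times from <ip>, ok=<bool>`. Note how `1.2.3.4` also matches `<float>` (`1.2`) and `<int>` (`1`), but the longer `<ip>` match wins, while `TRUE` matches both the word pattern and `<bool>` with the same length, so priority picks `<bool>`.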
Limitations of the RE language
Tokens are matched against the patterns above (lexical analysis) using the lexmachine package.
lexmachine does not support the full regular expression syntax. For details, see the lexmachine readme section and its grammar file.