regexp2

Version: v2.1.0 (latest) | Published: May 22, 2026 | License: MIT | Imports: 17 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/dlclark/regexp2

README ¶

regexp2 - full featured regular expressions for Go

Regexp2 is a feature-rich RegExp engine for Go. It doesn't have the linear-time guarantee of the built-in regexp package, but it allows backtracking and is compatible with Perl5 and .NET. You'll likely be better off with the RE2 engine from the regexp package and should only use this if you need to write very complex patterns or require compatibility with .NET.

Basis of the engine

The engine is ported from the .NET framework's System.Text.RegularExpressions.Regex engine. That engine was open sourced in 2015 under the MIT license. There are some fundamental differences between .NET strings and Go strings that required a bit of borrowing from the Go framework regex engine as well. I cleaned up a couple of the dirtier bits during the port (regexcharclass.cs was terrible), but the parse tree, code emitted, and therefore patterns matched should be identical.

New Code Generation

For extra performance use regexp2 with regexp2cg. It is a code generation utility for regexp2 and you can likely improve your regexp runtime performance by 3-10x in hot code paths. As always you should benchmark your specifics to confirm the results. Give it a try!

Installing

This is a go-gettable library, so install is easy:

go get github.com/dlclark/regexp2/v2@latest

Changes in v2

Version 2 includes changes that may affect compatibility with existing v1 users:

The module path is now github.com/dlclark/regexp2/v2, so imports need to use the /v2 suffix.
The minimum supported Go version is now Go 1.26.
Changes to support https://github.com/dlclark/regexp2cg are merged in to support generated regex engines.
Regexp.Split is now available for splitting strings with regexp matches.
The new compat sub-package provides a regexp compatibility adapter with the same Find* and Match* method signatures as regexp.Regexp, plus a compat.Matcher interface that is implemented by both *regexp.Regexp and the adapter.
The parser, optimizer, and runner internals have changed significantly to support generated regexes and additional matching optimizations.
Compile and MustCompile now use variadic compile options for regex behavior and memory/performance tuning. See Compile options for more details.
Moved regexp2.Debug and regexp2.Compile to new regexp2.OptionDebug() and regexp2.OptionIsCodeGen() compile options.
Some types and constants in the syntax package have been exported or changed to support code generation.
Conceptually changed the goal of the regexp2.ECMAScript option to be closer to the ECMAScript standard rather than C#'s ECMAScript behavior.
Renamed the fields Capture.Index and Capture.Length to Capture.RuneIndex and Capture.RuneLength to be more clear that we're dealing with rune offsets.
Added Capture.ByteRange() to return the byte offset index and length of the captured text. This requires some additional processing to be done behind the scenes the first time it's called for a given capture to convert the native rune offsets to byte offsets.

Usage

Usage is similar to the Go regexp package. Just like in regexp, you start by converting a regex into a state machine via the Compile or MustCompile functions. Both compile the pattern; Compile returns an error if the regex is invalid, while MustCompile panics. You can then use the returned Regexp struct to find matches repeatedly. A Regexp struct is safe to use across goroutines.

re := regexp2.MustCompile(`Your pattern`)
if isMatch, _ := re.MatchString(`Something to match`); isMatch {
    //do something
}

The only error that the *Match* methods should return is a Timeout if you set the re.MatchTimeout field. Any other error is a bug in the regexp2 package. If you need more details about capture groups in a match then use the FindStringMatch method, like so:

if m, _ := re.FindStringMatch(`Something to match`); m != nil {
    // the whole match is always group 0
    fmt.Printf("Group 0: %v\n", m.String())

    // you can get all the groups too
    gps := m.Groups()

    // a group can be captured multiple times, so each cap is separately addressable
    fmt.Printf("Group 1, first capture: %v\n", gps[1].Captures[0].String())
    fmt.Printf("Group 1, second capture: %v\n", gps[1].Captures[1].String())
}

Group 0 is embedded in the Match. Group 0 is an automatically-assigned group that encompasses the whole pattern. This means that m.String() is the same as m.Group.String() and m.Groups()[0].String().

The last capture is embedded in each group, so g.String() will return the same thing as g.Capture.String() and g.Captures[len(g.Captures)-1].String().
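The embedding described above can be sketched with simplified stand-in types. The Capture, Group, and Match definitions below are illustrative only: they mirror the shape shown in the package documentation, but the text field is a made-up placeholder for the real unexported fields.

```go
package main

import "fmt"

// Capture is a simplified stand-in; the real regexp2.Capture has
// additional unexported fields. The text field is hypothetical.
type Capture struct{ text string }

// String returns the captured text.
func (c *Capture) String() string { return c.text }

// Group embeds its last capture, so Capture's methods are promoted onto Group.
type Group struct {
	Capture            // last capture of this group
	Captures []Capture // every capture of this group, in order
}

// Match embeds group 0, so Group's (and Capture's) methods are promoted onto Match.
type Match struct {
	Group
}

func main() {
	g := Group{Capture: Capture{"b"}, Captures: []Capture{{"a"}, {"b"}}}
	m := Match{Group: g}

	// Both comparisons hold because of method promotion through embedding.
	fmt.Println(m.String() == m.Group.String())
	fmt.Println(g.String() == g.Captures[len(g.Captures)-1].String())
}
```

Because the methods are promoted, callers rarely need to spell out the embedded field name.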

If you want to find multiple matches from a single input string you should use the FindNextMatch method. For example, to implement a function similar to regexp.FindAllString:

func regexp2FindAllString(re *regexp2.Regexp, s string) []string {
	var matches []string
	m, _ := re.FindStringMatch(s)
	for m != nil {
		matches = append(matches, m.String())
		m, _ = re.FindNextMatch(m)
	}
	return matches
}

FindNextMatch is optimized so that it re-uses the underlying string/rune slice.

The internals of regexp2 always operate on []rune so RuneIndex and RuneLength data in a Match always reference a position in runes rather than bytes (even if the input was given as a string). ByteRange() provides UTF-8 byte offsets, matching the original string input for string APIs. It's advisable to use the provided String() methods when you do not need explicit offsets. ByteRange() lazily caches byte offsets on the shared match text, so the first call on captures from the same match is not safe to run concurrently with other ByteRange() calls on that match.
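To make the rune-versus-byte distinction concrete, here is a minimal standard-library sketch of the conversion involved. The helper runeRangeToByteRange is hypothetical and is not part of regexp2; it only illustrates why a rune offset and a byte offset differ once the input contains multi-byte characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeRangeToByteRange is a hypothetical helper (not a regexp2 API). It
// converts a rune index and rune length within s into a UTF-8 byte index
// and byte length. It assumes the rune range lies within s.
func runeRangeToByteRange(s string, runeIndex, runeLength int) (byteIndex, byteLength int) {
	pos := 0
	// Skip runeIndex runes to find the starting byte.
	for i := 0; i < runeIndex; i++ {
		_, size := utf8.DecodeRuneInString(s[pos:])
		pos += size
	}
	byteIndex = pos
	// Advance over runeLength runes to find the ending byte.
	for i := 0; i < runeLength; i++ {
		_, size := utf8.DecodeRuneInString(s[pos:])
		pos += size
	}
	return byteIndex, pos - byteIndex
}

func main() {
	s := "héllo wörld"
	// "wörld" starts at rune 6 and is 5 runes long.
	idx, n := runeRangeToByteRange(s, 6, 5)
	fmt.Println(idx, n, s[idx:idx+n]) // 7 6 wörld
}
```

Slicing the original string with rune offsets would cut through the middle of "é" here, which is why the String() methods or ByteRange() are the safer choice for string inputs.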

`regexp` compatibility adapter

The github.com/dlclark/regexp2/v2/compat package provides an adapter for callers that want the same Find* and Match* method signatures as the standard library's regexp.Regexp, while still using the regexp2 engine.

import (
	"github.com/dlclark/regexp2/v2"
	"github.com/dlclark/regexp2/v2/compat"
)

re := compat.MustCompile(`Your pattern`, regexp2.RE2)
if re.MatchString(`Something to match`) {
	// do something
}

matches := re.FindAllString(`abc axbc`, -1)
_ = matches

You can also wrap an existing compiled regexp:

base := regexp2.MustCompile(`Your pattern`)
re := compat.Wrap(base)

The adapter includes the standard-library matching surface: Match, MatchString, MatchReader, and all Find(All)?(String)?(Submatch)?(Index)? methods. Index-returning methods use UTF-8 byte offsets like regexp, not regexp2's rune offsets.

The package also defines compat.Matcher, a common interface implemented by both *regexp.Regexp and *compat.Regexp. Use it when code should accept either the standard library engine or a regexp2-backed adapter:

func findWords(re compat.Matcher, input string) []string {
	return re.FindAllString(input, -1)
}

Because those standard-library method signatures do not return errors, the adapter panics if the wrapped regexp2 matcher returns an error such as a match timeout. Use the main regexp2 APIs directly when you need to handle timeouts as errors.

Compile options

Compile and MustCompile take variadic compile options. Most users can omit them and get default regex behavior plus bounded shared pools for rune buffers and replacement output buffers, plus per-regexp caches for parsed replacement patterns and ASCII character class bitmaps.

Regex option constants can be passed directly, individually or as a bitmask:

re := regexp2.MustCompile(`Your pattern`, regexp2.IgnoreCase, regexp2.Singleline)
re = regexp2.MustCompile(`Your pattern`, regexp2.IgnoreCase|regexp2.Singleline)

Performance tuning options override the default cache settings:

re := regexp2.MustCompile(`Your pattern`,
	regexp2.IgnoreCase,
	regexp2.OptionMaxCachedRuneBufferLength(64*1024),
	regexp2.OptionMaxCachedReplacerDataEntries(8),
)

Compile-only options configure behavior that is not settable from the pattern:

re := regexp2.MustCompile(`(?<first>This) (is)`, regexp2.OptionMaintainCaptureOrder())

The defaults are intentionally bounded:

Option	Default	Used by	Working-set growth	Tradeoffs
`OptionMaintainCaptureOrder()`	false	Parser capture-slot assignment for mixed named and unnamed captures.	None at match time. This changes compile-time capture numbering only.	Keeps named and unnamed captures in pattern order instead of appending named captures after unnamed captures. This can change numeric backreference meaning, so it is caller-controlled rather than an inline regex option.
`OptionDebug()`	false	Compile dumps and runner tracing.	Debug output volume only.	Useful for diagnostics, but it can produce noisy output and slower traced matching.
`OptionIsCodeGen()`	false	Compile-time find-optimization analysis for `regexp2cg`.	Per compiled regexp, during `Compile` or `MustCompile`.	Enables more expensive analysis intended for generated engines. Do not use it for normal interpreter execution; the interpreter defaults intentionally avoid this extra compile-time cost.
`OptionMaxCachedRuneBufferLength(n)`	256K runes	String APIs that run through pooled runners, such as `MatchString` and replacement-pattern `Replace`, when converting input strings to the engine's internal `[]rune` representation.	Process-wide shared `sync.Pool` retention by size class. This does not grow per compiled regexp or per input string; the practical working set follows recent and concurrent use across all regexps and can be dropped by GC.	Raising this lets calls use larger pooled rune buffers and can reduce allocations for repeated matches against large strings. Lowering it prevents larger buffers from being borrowed or returned, so large inputs allocate directly.
`OptionMaxCachedReplaceBufferLength(n)`	256 KB	Replacement-pattern `Replace` calls that build output through a shared byte buffer.	Process-wide shared `sync.Pool` retention by size class after replacement-pattern `Replace` runs. It does not grow from evaluator-based `ReplaceFunc` output and is shared across compiled regexps.	Raising this lets larger replacement outputs use pooled buffers and can reduce allocations. Lowering it prevents larger output buffers from being retained, so large replacements allocate directly.
`OptionMaxCachedReplacerDataEntries(n)`	`16`	`Replace` with replacement pattern strings, after the replacement pattern is parsed into reusable replacement data.	Per compiled regexp. The cache grows as distinct cacheable replacement strings are used with `Replace`, up to this entry count.	Raising this helps when a single compiled regexp is used with many recurring replacement patterns. It increases per-regexp cache memory and lock-protected cache bookkeeping. Setting it to `0` disables this cache.
`OptionMaxCachedReplacerDataBytes(n)`	4 KB	The parsed replacement-pattern cache. Replacement strings longer than this are parsed for the call but not retained.	Per compiled regexp, combined with `OptionMaxCachedReplacerDataEntries`. Only replacement strings whose source text is at or below this size can add parsed data to the cache.	Raising this helps if large replacement patterns are reused. It can retain more memory per cached replacement. Lowering it avoids keeping unusual large replacement patterns around.
`OptionDisableCharClassASCIIBitmap()`	false	Compile-time preparation of character classes and first-character prefix sets. By default, character classes with ASCII membership get a small bitmap used by `CharIn`.	Per compiled regexp, during `Compile` or `MustCompile`. Each eligible character class can hold one small bitmap; this does not scale with match concurrency or input size.	Leaving this false speeds up ASCII-heavy character class checks at the cost of a small amount of per-char-class memory and compile-time work. Setting to true can reduce memory for large numbers of compiled char classes in regexps, but ASCII character class matching may be slower.

For pooled buffer cache options, set n to 0 to disable pooling, or -1 to allow all built-in size classes. The rune buffer classes are 1K, 4K, 16K, 64K, and 256K runes. The replacement byte buffer classes are 4 KB, 16 KB, 64 KB, 256 KB, and 1 MB. By default the 1 MB pool is unused. For replacement data byte-size cache options, -1 means unbounded. For entry-count cache options, set n to 0 to disable the cache.
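The effect of n on the pooled size classes can be sketched as follows. This is an illustration under a stated assumption, not regexp2's implementation: enabledClasses is a hypothetical helper, and it assumes a class stays in use when its size is at most n, which is consistent with the 1 MB replacement class being unused at the 256 KB default.

```go
package main

import "fmt"

// Size classes as listed above: rune counts and byte counts respectively.
var (
	runeClasses    = []int{1 << 10, 4 << 10, 16 << 10, 64 << 10, 256 << 10}
	replaceClasses = []int{4 << 10, 16 << 10, 64 << 10, 256 << 10, 1 << 20}
)

// enabledClasses is a hypothetical helper, not a regexp2 API. It returns the
// size classes that a limit of n would leave in use, ASSUMING a class is
// usable when its size is at most n.
func enabledClasses(classes []int, n int) []int {
	switch {
	case n == 0:
		return nil // pooling disabled
	case n < 0:
		return classes // -1 allows all built-in classes
	}
	var out []int
	for _, c := range classes {
		if c <= n {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Default replacement limit of 256 KB: the 1 MB class is left out.
	fmt.Println(enabledClasses(replaceClasses, 256<<10))
	// A 64K rune limit, as in the tuning example earlier in this section.
	fmt.Println(enabledClasses(runeClasses, 64<<10))
	fmt.Println(enabledClasses(runeClasses, -1))
	fmt.Println(enabledClasses(runeClasses, 0))
}
```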

Compare `regexp` and `regexp2`

Category	regexp	regexp2
Catastrophic backtracking possible	no, execution time is guaranteed linear in the input size	yes, if your pattern is at risk you can use the `re.MatchTimeout` field
Python-style capture groups `(?P<name>re)`	yes	no (yes in RE2 compat mode)
.NET-style capture groups `(?<name>re)` or `(?'name're)`	partial (`(?<name>re)` since Go 1.22; `(?'name're)` is not supported)	yes
comments `(?#comment)`	no	yes
branch numbering reset `(?|a|b)`	no	no
possessive match `(?>re)`	no	yes
positive lookahead `(?=re)`	no	yes
negative lookahead `(?!re)`	no	yes
positive lookbehind `(?<=re)`	no	yes
negative lookbehind `(?<!re)`	no	yes
back reference `\1`	no	yes
named back reference `\k'name'`	no	yes
Python-style named back reference `(?P=name)`	no	no (yes in RE2 compat mode)
named ascii character class `[[:foo:]]`	yes	no (yes in RE2 compat mode)
conditionals `(?(expr)yes|no)`	no	yes

RE2 compatibility mode

The default behavior of regexp2 is to match the .NET regexp engine; however, the RE2 option is provided to change the parsing to increase compatibility with RE2. Using the RE2 option when compiling a regexp will not take away any features, but will change the following behaviors:

add support for named ascii character classes (e.g. [[:foo:]])
add support for python-style capture groups (e.g. (?P<name>re))
add support for python-style named backreferences (e.g. (?P=name))
change singleline behavior for $ to only match end of string (like RE2) (see #24)
change the character classes \d \s and \w to match the same characters as RE2. NOTE: if you also use the ECMAScript option then this will change the \s character class to match ECMAScript instead of RE2. ECMAScript allows more whitespace characters in \s than RE2 (but still fewer than the default behavior).
allow character escape sequences to have defaults. For example, by default \_ isn't a known character escape and will fail to compile, but in RE2 mode it will match the literal character _

re := regexp2.MustCompile(`Your RE2-compatible pattern`, regexp2.RE2)
if isMatch, _ := re.MatchString(`Something to match`); isMatch {
    //do something
}

This feature is a work in progress and I'm open to ideas for more things to put here (maybe more relaxed character escaping rules?).

Catastrophic Backtracking and Timeouts

regexp2 supports features that can lead to catastrophic backtracking. Regexp.MatchTimeout can be set to limit the impact of such behavior; the match will fail with an error after approximately MatchTimeout. No timeout checks are done by default.

Timeout checking is not free. The current timeout checking implementation starts a background worker that updates a clock value approximately once every 100 milliseconds. The matching code compares this value against the precomputed deadline for the match. The performance impact is as follows.

A match with a timeout runs almost as fast as a match without a timeout.
If any live matches have a timeout, there will be a background CPU load (~0.15% currently on a modern machine). This load will remain constant regardless of the number of matches done including matches done in parallel.
If no live matches are using a timeout, the background load will remain until the longest deadline (match timeout + the time when the match started) is reached. E.g., if you set a timeout of one minute the load will persist for approximately a minute even if the match finishes quickly.

See PR #58 for more details and alternatives considered.

Goroutine leak error

If you're using a library during unit tests (e.g. https://github.com/uber-go/goleak) that validates that all goroutines have exited, then you'll likely get an error if you or any of your dependencies use regexes with a MatchTimeout. To remedy the problem you'll need to tell the unit test to wait until the background timeout goroutine has exited.

func TestSomething(t *testing.T) {
    defer goleak.VerifyNone(t)
    defer regexp2.StopTimeoutClock()

    // ... test
}

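// or (this variant also needs the fmt and os imports)

func TestMain(m *testing.M) {
    // setup
    // ...

    // run
    code := m.Run()

    // tear down: stop the clock first, then check for leaked goroutines.
    // TestMain has no *testing.T, so use goleak.Find instead of VerifyNone.
    regexp2.StopTimeoutClock()
    if err := goleak.Find(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    os.Exit(code)
}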

This will add ~100ms runtime to each test (or TestMain). If that's too much time you can set the clock cycle rate of the timeout goroutine in an init function in a test file. regexp2.SetTimeoutCheckPeriod isn't thread-safe, so it must be set up before starting any regexes with timeouts.

func init() {
	//speed up testing by making the timeout clock 1ms
	regexp2.SetTimeoutCheckPeriod(time.Millisecond)
}

ECMAScript compatibility mode

In this mode the engine attempts to match the regex engine described in the ECMAScript specification as closely as reasonably possible within regexp2's API and implementation.

This flag should not be treated as compatibility with C#'s RegexOptions.ECMAScript. regexp2's ECMAScript behavior prioritizes ECMAScript specification behavior over matching the C# regex engine's interpretation of that option.

Additionally, a Unicode option is provided. The \u{CodePoint} syntax is parsed only when both the ECMAScript and Unicode options are set.

Potential bugs

I've run a battery of tests against regexp2 from various sources and found the debug output matches the .NET engine, but .NET and Go handle strings very differently. I've attempted to handle these differences, but most of my testing deals with basic ASCII with a little bit of multi-byte Unicode. There's a chance that there are bugs in the string handling related to character sets with supplementary Unicode chars. Right-to-Left support is coded, but not well tested either.

Find a bug?

I'm open to new issues and pull requests with tests if you find something odd!

Documentation ¶

Overview ¶

Package regexp2 is a regexp package that has an interface similar to Go's framework regexp engine but uses a more full-featured regex engine behind the scenes.

It doesn't have the linear-time guarantee of the regexp package, but it allows backtracking and is compatible with Perl5 and .NET. You'll likely be better off with the RE2 engine from the regexp package and should only use this if you need to write very complex patterns or require compatibility with .NET.

Index ¶

Constants
Variables
func Escape(input string) string
func RegisterEngine(pattern string, engine RuntimeEngineData, options ...CompileOption)
func SetTimeoutCheckPeriod(d time.Duration)
func StopTimeoutClock()
func Unescape(input string) (string, error)
type Capture
- func (c *Capture) ByteRange() (index, length int)
- func (c *Capture) Runes() []rune
- func (c *Capture) String() string
type CompileOption
- func OptionDebug() CompileOption
- func OptionDisableCharClassASCIIBitmap() CompileOption
- func OptionIsCodeGen() CompileOption
- func OptionMaintainCaptureOrder() CompileOption
- func OptionMaxCachedReplaceBufferLength(n int) CompileOption
- func OptionMaxCachedReplacerDataBytes(n int) CompileOption
- func OptionMaxCachedReplacerDataEntries(n int) CompileOption
- func OptionMaxCachedRuneBufferLength(n int) CompileOption
type Group
type Match
- func (m *Match) GroupByName(name string) *Group
- func (m *Match) GroupByNumber(num int) *Group
- func (m *Match) GroupCount() int
- func (m *Match) Groups() []Group
type MatchEvaluator
type OptimizationOptions
type RegexOptions
type Regexp
- func Compile(expr string, options ...CompileOption) (*Regexp, error)
- func MustCompile(str string, options ...CompileOption) *Regexp
- func (re *Regexp) Debug() bool
- func (re *Regexp) FindAllRunesIndex(r []rune, n int) ([][]int, error)
- func (re *Regexp) FindAllStringIndex(s string, n int) ([][]int, error)
- func (re *Regexp) FindNextMatch(m *Match) (*Match, error)
- func (re *Regexp) FindRunesMatch(r []rune) (*Match, error)
- func (re *Regexp) FindRunesMatchStartingAt(r []rune, startAt int) (*Match, error)
- func (re *Regexp) FindStringMatch(s string) (*Match, error)
- func (re *Regexp) FindStringMatchStartingAt(s string, startAt int) (*Match, error)
- func (re *Regexp) GetGroupNames() []string
- func (re *Regexp) GetGroupNumbers() []int
- func (re *Regexp) GroupNameFromNumber(i int) string
- func (re *Regexp) GroupNumberFromName(name string) int
- func (re *Regexp) MarshalText() ([]byte, error)
- func (re *Regexp) MatchRunes(r []rune) (bool, error)
- func (re *Regexp) MatchString(s string) (bool, error)
- func (re *Regexp) Replace(input, replacement string, startAt, count int) (string, error)
- func (re *Regexp) ReplaceFunc(input string, evaluator MatchEvaluator, startAt, count int) (string, error)
- func (re *Regexp) RightToLeft() bool
- func (re *Regexp) Split(input string, count int) ([]string, error)
- func (re *Regexp) String() string
- func (re *Regexp) UnmarshalText(text []byte) error
type Runner
- func (r *Runner) Capture(capnum, start, end int)
- func (r *Runner) CheckTimeout() error
- func (r *Runner) Crawlpos() int
- func (r *Runner) IsBoundary(index int) bool
- func (r *Runner) IsECMABoundary(index int) bool
- func (r *Runner) IsMatched(cap int) bool
- func (r *Runner) LastIndexOfRune(startIndex int, endIndex int, find rune) int
- func (r *Runner) MatchIndex(cap int) int
- func (r *Runner) MatchLength(cap int) int
- func (r *Runner) StackPop() int
- func (r *Runner) StackPush(val int)
- func (r *Runner) StackPush2(val1, val2 int)
- func (r *Runner) StackPush3(val1, val2, val3 int)
- func (r *Runner) StackPush4(val1, val2, val3, val4 int)
- func (r *Runner) StackPush5(val1, val2, val3, val4, val5 int)
- func (r *Runner) StackPushN(vals ...int)
- func (r *Runner) UncaptureUntil(capturePos int)
type RuntimeEngineData
type StringPrefixFilter

Constants ¶

const DefaultClockPeriod = 100 * time.Millisecond

Variables ¶

var (
	// DefaultUnmarshalOptions used when unmarshaling a regex from text
	DefaultUnmarshalOptions = None
	// DefaultOptimizationOptions controls the default memory/performance trade-offs used by Compile.
	DefaultOptimizationOptions = OptimizationOptions{
		MaxCachedRuneBufferLength:    256 << 10,
		MaxCachedReplaceBufferLength: 256 << 10,
		MaxCachedReplacerDataEntries: 16,
		MaxCachedReplacerDataBytes:   4 << 10,
		DisableCharClassASCIIBitmap:  false,
	}
)

var (
	// DefaultMatchTimeout used when running regexp matches -- "forever"
	DefaultMatchTimeout = time.Duration(math.MaxInt64)
)

Functions ¶

func Escape ¶

func Escape(input string) string

Escape adds backslashes to any special characters in the input string

func RegisterEngine ¶

func RegisterEngine(pattern string, engine RuntimeEngineData, options ...CompileOption)

func SetTimeoutCheckPeriod ¶

func SetTimeoutCheckPeriod(d time.Duration)

SetTimeoutCheckPeriod is a debug function that sets the frequency of the timeout goroutine's sleep cycle. Defaults to 100ms. The only benefit of setting this lower is that the single background goroutine that manages timeouts may exit slightly sooner after all the timeouts have expired. See GitHub issue #63.

func StopTimeoutClock ¶

func StopTimeoutClock()

StopTimeoutClock should only be used in unit tests to prevent the timeout clock goroutine from appearing like a leaking goroutine

func Unescape ¶

func Unescape(input string) (string, error)

Unescape removes any backslashes from previously-escaped special characters in the input string

Types ¶

type Capture ¶

type Capture struct {

	// RuneIndex is the position in the underlying rune slice where the first character of
	// captured substring was found. Even if you pass in a string this will be in Runes.
	RuneIndex int
	// RuneLength is the number of runes in the captured substring.
	RuneLength int
	// contains filtered or unexported fields
}

Capture is a single capture of text within the larger original string

func (*Capture) ByteRange ¶

func (c *Capture) ByteRange() (index, length int)

ByteRange returns the UTF-8 byte index and byte length of the captured substring. The first call lazily caches byte offsets on shared match text, so it is not safe to call concurrently with ByteRange on another capture from the same match until the cache has been initialized.

func (*Capture) Runes ¶

func (c *Capture) Runes() []rune

Runes returns the captured text as a rune slice

func (*Capture) String ¶

func (c *Capture) String() string

String returns the captured text as a string

type CompileOption ¶

type CompileOption interface {
	// contains filtered or unexported methods
}

CompileOption configures Compile and MustCompile.

func OptionDebug ¶

func OptionDebug() CompileOption

OptionDebug enables debug output and runner tracing for the compiled regexp.

func OptionDisableCharClassASCIIBitmap ¶

func OptionDisableCharClassASCIIBitmap() CompileOption

OptionDisableCharClassASCIIBitmap disables compile-time ASCII bitmaps for character classes.

func OptionIsCodeGen ¶

func OptionIsCodeGen() CompileOption

OptionIsCodeGen enables more expensive compile-time analysis intended for regexp2cg generated engines.

func OptionMaintainCaptureOrder ¶

func OptionMaintainCaptureOrder() CompileOption

OptionMaintainCaptureOrder assigns named and unnamed capture slots in pattern order.

func OptionMaxCachedReplaceBufferLength ¶

func OptionMaxCachedReplaceBufferLength(n int) CompileOption

OptionMaxCachedReplaceBufferLength limits retained replacement output buffers in the shared size-classed pool.

func OptionMaxCachedReplacerDataBytes ¶

func OptionMaxCachedReplacerDataBytes(n int) CompileOption

OptionMaxCachedReplacerDataBytes skips caching replacement patterns longer than n bytes.

func OptionMaxCachedReplacerDataEntries ¶

func OptionMaxCachedReplacerDataEntries(n int) CompileOption

OptionMaxCachedReplacerDataEntries limits parsed replacement patterns cached per Regexp.

func OptionMaxCachedRuneBufferLength ¶

func OptionMaxCachedRuneBufferLength(n int) CompileOption

OptionMaxCachedRuneBufferLength limits retained string-to-rune buffers in the shared size-classed pool.

type Group ¶

type Group struct {
	Capture // the last capture of this group is embedded for ease of use

	Name     string    // group name
	Captures []Capture // captures of this group
}

Group is an explicit or implicit (group 0) matched group within the pattern

type Match ¶

type Match struct {
	Group // embedded group 0
	// contains filtered or unexported fields
}

Match is a single regex result match that contains groups and repeated captures

	- Groups
	  - Capture

func (*Match) GroupByName ¶

func (m *Match) GroupByName(name string) *Group

GroupByName returns a group based on the name of the group, or nil if the group name does not exist

func (*Match) GroupByNumber ¶

func (m *Match) GroupByNumber(num int) *Group

GroupByNumber returns a group based on the number of the group, or nil if the group number does not exist

func (*Match) GroupCount ¶

func (m *Match) GroupCount() int

GroupCount returns the number of groups this match has matched

func (*Match) Groups ¶

func (m *Match) Groups() []Group

Groups returns all the capture groups, starting with group 0 (the full match)

type MatchEvaluator ¶

type MatchEvaluator func(Match) string

MatchEvaluator is a function that takes a match and returns a replacement string to be used

type OptimizationOptions ¶

type OptimizationOptions struct {
	// MaxCachedRuneBufferLength limits retained string-to-rune buffers in the shared size-classed pool.
	MaxCachedRuneBufferLength int
	// MaxCachedReplaceBufferLength limits retained replacement output buffers in the shared size-classed pool.
	MaxCachedReplaceBufferLength int
	// MaxCachedReplacerDataEntries limits the number of parsed replacement patterns cached per Regexp.
	MaxCachedReplacerDataEntries int
	// MaxCachedReplacerDataBytes skips caching replacement patterns longer than this many bytes.
	MaxCachedReplacerDataBytes int
	// DisableCharClassASCIIBitmap disables compile-time ASCII bitmap construction for character classes.
	DisableCharClassASCIIBitmap bool
}

OptimizationOptions controls optional runtime caches and compile-time fast paths.

For replacement data cache size fields, 0 disables persistent retention and -1 means unbounded. For pooled buffer cache size fields, 0 disables pooling and -1 allows all built-in size classes. Defaults are intentionally bounded so Compile is safe for mixed-cardinality inputs.

type RegexOptions ¶

type RegexOptions int32

RegexOptions impact the runtime and parsing behavior for each specific regex. They are settable in code as well as in the regex pattern itself.

const (
	None                    RegexOptions = 0x0
	IgnoreCase              RegexOptions = 0x0001 // "i"
	Multiline               RegexOptions = 0x0002 // "m"
	ExplicitCapture         RegexOptions = 0x0004 // "n"
	Singleline              RegexOptions = 0x0010 // "s"
	IgnorePatternWhitespace RegexOptions = 0x0020 // "x"
	RightToLeft             RegexOptions = 0x0040 // "r"
	// ECMAScript attempts to follow ECMAScript regex behavior rather than C# RegexOptions.ECMAScript compatibility.
	ECMAScript RegexOptions = 0x0100 // "e"
	RE2        RegexOptions = 0x0200 // RE2 (regexp package) compatibility mode
	Unicode    RegexOptions = 0x0400 // "u"
)
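Because each option occupies its own bit, options combine with bitwise OR and can be tested with bitwise AND. The sketch below copies the type and three constant values from the listing above so that it compiles without importing regexp2; the has helper is hypothetical and is not part of the package.

```go
package main

import "fmt"

// RegexOptions and the constants below are copied from the listing above so
// that this sketch is self-contained.
type RegexOptions int32

const (
	IgnoreCase RegexOptions = 0x0001
	Multiline  RegexOptions = 0x0002
	Singleline RegexOptions = 0x0010
)

// has is a hypothetical helper (not a regexp2 API). It reports whether every
// bit in flag is set in opts.
func has(opts, flag RegexOptions) bool { return opts&flag == flag }

func main() {
	opts := IgnoreCase | Singleline

	fmt.Println(opts == 0x0011)        // true: the two bits are simply merged
	fmt.Println(has(opts, IgnoreCase)) // true
	fmt.Println(has(opts, Multiline))  // false
}
```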

type Regexp ¶

type Regexp struct {
	// A match will time out if it takes (approximately) more than
	// MatchTimeout. This is a safety check in case the match
	// encounters catastrophic backtracking.  The default value
	// (DefaultMatchTimeout) causes all time out checking to be
	// suppressed.
	MatchTimeout time.Duration
	// contains filtered or unexported fields
}

Regexp is the representation of a compiled regular expression. A Regexp is safe for concurrent use by multiple goroutines.

func Compile ¶

func Compile(expr string, options ...CompileOption) (*Regexp, error)

Compile parses a regular expression and returns, if successful, a Regexp object that can be used to match against text.

func MustCompile ¶

func MustCompile(str string, options ...CompileOption) *Regexp

MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.

func (*Regexp) Debug ¶

func (re *Regexp) Debug() bool

func (*Regexp) FindAllRunesIndex ¶ added in v2.1.0

func (re *Regexp) FindAllRunesIndex(r []rune, n int) ([][]int, error)

FindAllRunesIndex returns a slice of rune index pairs identifying all successive matches in r.

func (*Regexp) FindAllStringIndex ¶ added in v2.1.0

func (re *Regexp) FindAllStringIndex(s string, n int) ([][]int, error)

FindAllStringIndex returns a slice of byte index pairs identifying all successive matches in s.

func (*Regexp) FindNextMatch ¶

func (re *Regexp) FindNextMatch(m *Match) (*Match, error)

FindNextMatch returns the next match in the same input string as the match parameter. Will return nil if there is no next match or if given a nil match.

func (*Regexp) FindRunesMatch ¶

func (re *Regexp) FindRunesMatch(r []rune) (*Match, error)

FindRunesMatch searches the input rune slice for a Regexp match

func (*Regexp) FindRunesMatchStartingAt ¶

func (re *Regexp) FindRunesMatchStartingAt(r []rune, startAt int) (*Match, error)

FindRunesMatchStartingAt searches the input rune slice for a Regexp match starting at the startAt index

func (*Regexp) FindStringMatch ¶

func (re *Regexp) FindStringMatch(s string) (*Match, error)

FindStringMatch searches the input string for a Regexp match

func (*Regexp) FindStringMatchStartingAt ¶

func (re *Regexp) FindStringMatchStartingAt(s string, startAt int) (*Match, error)

FindStringMatchStartingAt searches the input string for a Regexp match starting at the startAt index

func (*Regexp) GetGroupNames ¶

func (re *Regexp) GetGroupNames() []string

GetGroupNames returns the set of strings used to name capturing groups in the expression.

func (*Regexp) GetGroupNumbers ¶

func (re *Regexp) GetGroupNumbers() []int

GetGroupNumbers returns the integer group numbers that correspond to the group names in the expression.

func (*Regexp) GroupNameFromNumber ¶

func (re *Regexp) GroupNameFromNumber(i int) string

GroupNameFromNumber retrieves the group name that corresponds to a group number. It returns "" for an unknown group number. Unnamed groups automatically receive a name that is the decimal string equivalent of their number, except in ECMAScript mode, where unnamed groups have no name.

func (*Regexp) GroupNumberFromName ¶

func (re *Regexp) GroupNumberFromName(name string) int

GroupNumberFromName returns the group number that corresponds to a group name. It returns -1 if the name is not a recognized group name. Numbered groups automatically get a group name that is the decimal string equivalent of their number, except in ECMAScript mode, where unnamed groups have no name.

func (*Regexp) MarshalText ¶

func (re *Regexp) MarshalText() ([]byte, error)

MarshalText implements encoding.TextMarshaler. The output matches that of calling the Regexp.String method.

func (*Regexp) MatchRunes ¶

func (re *Regexp) MatchRunes(r []rune) (bool, error)

MatchRunes returns true if the rune slice matches the regex. The error is set if a timeout occurs.

func (*Regexp) MatchString ¶

func (re *Regexp) MatchString(s string) (bool, error)

MatchString returns true if the string matches the regex. The error is set if a timeout occurs.

func (*Regexp) Replace ¶

func (re *Regexp) Replace(input, replacement string, startAt, count int) (string, error)

Replace searches the input string and replaces each match found with the replacement text. count limits the number of matches attempted, and startAt skips past possible matches at the start of the input (the left or right end, depending on the RightToLeft option). Set startAt and count to -1 to process the whole string.

func (*Regexp) ReplaceFunc ¶

func (re *Regexp) ReplaceFunc(input string, evaluator MatchEvaluator, startAt, count int) (string, error)

ReplaceFunc searches the input string and replaces each match found with the string returned by the evaluator. count limits the number of matches attempted, and startAt skips past possible matches at the start of the input (the left or right end, depending on the RightToLeft option). Set startAt and count to -1 to process the whole string.

func (*Regexp) RightToLeft ¶

func (re *Regexp) RightToLeft() bool

func (*Regexp) Split ¶

func (re *Regexp) Split(input string, count int) ([]string, error)

Split splits the given input string using the pattern and returns a slice of the parts. count limits the number of matches to process. If count is -1, the input is processed fully. If count is 0, Split returns nil. If count is 1, it returns the original input. The only expected error is a timeout, if one is set.

If capturing parentheses are used in the pattern, any captured text is included in the resulting slice. For example, splitting "a-b" with the pattern "-" returns ["a", "b"], but splitting it with the pattern "(-)" returns ["a", "-", "b"].

func (*Regexp) String ¶

func (re *Regexp) String() string

String returns the source text used to compile the regular expression.

func (*Regexp) UnmarshalText ¶

func (re *Regexp) UnmarshalText(text []byte) error

UnmarshalText implements encoding.TextUnmarshaler by calling Compile on the encoded value.

type Runner ¶

type Runner struct {
	Runtextstart int // starting point for search

	Runtext    []rune // text to search
	Runtextpos int    // current position in text
	Runtextend int

	Runtrackpos int

	Runstackpos int
	// contains filtered or unexported fields
}

func (*Runner) Capture ¶

func (r *Runner) Capture(capnum, start, end int)

Capture captures a subexpression. Note that the capnum used here has already been mapped to a non-sparse index (by the code generator RegexWriter).

func (*Runner) CheckTimeout ¶

func (r *Runner) CheckTimeout() error

func (*Runner) Crawlpos ¶

func (r *Runner) Crawlpos() int

Crawlpos returns the height of the stack.

func (*Runner) IsBoundary ¶

func (r *Runner) IsBoundary(index int) bool

IsBoundary decides whether the position at the specified index is a boundary. It's just not worth emitting inline code for this logic.

func (*Runner) IsECMABoundary ¶

func (r *Runner) IsECMABoundary(index int) bool

func (*Runner) IsMatched ¶

func (r *Runner) IsMatched(cap int) bool

func (*Runner) LastIndexOfRune ¶

func (r *Runner) LastIndexOfRune(startIndex int, endIndex int, find rune) int

func (*Runner) MatchIndex ¶

func (r *Runner) MatchIndex(cap int) int

func (*Runner) MatchLength ¶

func (r *Runner) MatchLength(cap int) int

func (*Runner) StackPop ¶

func (r *Runner) StackPop() int

func (*Runner) StackPush ¶

func (r *Runner) StackPush(val int)

func (*Runner) StackPush2 ¶

func (r *Runner) StackPush2(val1, val2 int)

func (*Runner) StackPush3 ¶

func (r *Runner) StackPush3(val1, val2, val3 int)

func (*Runner) StackPush4 ¶

func (r *Runner) StackPush4(val1, val2, val3, val4 int)

func (*Runner) StackPush5 ¶

func (r *Runner) StackPush5(val1, val2, val3, val4, val5 int)

func (*Runner) StackPushN ¶

func (r *Runner) StackPushN(vals ...int)

func (*Runner) UncaptureUntil ¶

func (r *Runner) UncaptureUntil(capturePos int)

UncaptureUntil undoes captures until it reaches the specified capture position.

type RuntimeEngineData ¶

type RuntimeEngineData struct {
	Caps               map[int]int        // capnum->index
	CapNames           map[string]int     // cap group name -> index
	CapsList           []string           // sorted list of capture group names
	CapSize            int                // size of the capture array
	FindFirstChar      func(*Runner) bool // generated candidate search
	Execute            func(*Runner) error
	StringPrefixFilter StringPrefixFilter // optional pre-decode candidate search for string input
}

type StringPrefixFilter ¶

type StringPrefixFilter func(input string, startAt int) (candidateByteIndex int, ok bool)

StringPrefixFilter optionally searches string input before the engine decodes it to runes. It returns a byte index for a candidate match start, or ok=false if the regex cannot match. The filter must be conservative: false positives are allowed, false negatives are not.
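As an illustration of the conservative contract, the sketch below builds a filter for a pattern that can only match text beginning with a fixed literal. The type is redeclared locally so the example compiles on its own. The constructor name literalPrefixFilter is hypothetical, and the sketch assumes startAt is a byte offset and that matching proceeds left to right; neither is stated above.

```go
package main

import (
	"fmt"
	"strings"
)

// StringPrefixFilter mirrors the signature documented above so that this
// sketch is self-contained.
type StringPrefixFilter func(input string, startAt int) (candidateByteIndex int, ok bool)

// literalPrefixFilter is a hypothetical constructor. It assumes every match
// of the regex must begin with prefix, so the first occurrence of prefix at
// or after startAt is the earliest possible match start.
func literalPrefixFilter(prefix string) StringPrefixFilter {
	return func(input string, startAt int) (int, bool) {
		if startAt < 0 {
			startAt = 0
		}
		if startAt > len(input) {
			return 0, false
		}
		i := strings.Index(input[startAt:], prefix)
		if i < 0 {
			// The required literal never occurs, so the regex cannot match.
			return 0, false
		}
		// A candidate only: the full regex may still reject it (false positive).
		return startAt + i, true
	}
}

func main() {
	filter := literalPrefixFilter("foo")
	fmt.Println(filter("xx foobar foo", 0)) // 3 true
	fmt.Println(filter("xx foobar foo", 4)) // 10 true
	fmt.Println(filter("bar", 0))           // 0 false
}
```

The filter never reports ok=false while the literal is still present after startAt, which is what keeps it free of false negatives under the stated assumption.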

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
compat	Package compat provides a regexp2 adapter with regexp.Regexp-compatible matching method signatures.
helpers
syntax

