Documentation ¶
Overview ¶
Package regexp2 is a regexp package that has an interface similar to Go's standard library regexp package but uses a more full-featured regex engine behind the scenes.
It doesn't have linear-time guarantees, but it allows backtracking and is compatible with Perl5 and .NET. You'll likely be better off with the RE2 engine from the regexp package, and should only use this package if you need to write very complex patterns or require compatibility with .NET.
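One rough way to judge whether a pattern needs a backtracking engine at all is to check whether the standard library rejects it. The sketch below uses only the standard regexp package; `rejectedByStdlib` is a hypothetical helper name, and a rejection can also mean the pattern is simply malformed, so treat the result as a hint rather than a verdict.

```go
package main

import (
	"fmt"
	"regexp"
)

// rejectedByStdlib reports whether the standard library regexp package
// fails to compile the pattern. Hypothetical helper, not part of regexp2.
func rejectedByStdlib(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err != nil
}

func main() {
	patterns := []string{
		`\d+-\d+`,    // plain RE2 syntax
		`(\w)\1`,     // backreference, not supported by RE2
		`foo(?=bar)`, // lookahead, not supported by RE2
	}
	for _, p := range patterns {
		fmt.Printf("%q rejected by stdlib regexp: %v\n", p, rejectedByStdlib(p))
	}
}
```

Patterns that compile with the standard library are usually better served by it, as the overview recommends.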
Index ¶
- Constants
- Variables
- func Escape(input string) string
- func RegisterEngine(pattern string, engine RuntimeEngineData, options ...CompileOption)
- func SetTimeoutCheckPeriod(d time.Duration)
- func StopTimeoutClock()
- func Unescape(input string) (string, error)
- type Capture
- type CompileOption
- func OptionDebug() CompileOption
- func OptionDisableCharClassASCIIBitmap() CompileOption
- func OptionIsCodeGen() CompileOption
- func OptionMaintainCaptureOrder() CompileOption
- func OptionMaxCachedReplaceBufferLength(n int) CompileOption
- func OptionMaxCachedReplacerDataBytes(n int) CompileOption
- func OptionMaxCachedReplacerDataEntries(n int) CompileOption
- func OptionMaxCachedRuneBufferLength(n int) CompileOption
- type Group
- type Match
- type MatchEvaluator
- type OptimizationOptions
- type RegexOptions
- type Regexp
- func (re *Regexp) Debug() bool
- func (re *Regexp) FindAllRunesIndex(r []rune, n int) ([][]int, error)
- func (re *Regexp) FindAllStringIndex(s string, n int) ([][]int, error)
- func (re *Regexp) FindNextMatch(m *Match) (*Match, error)
- func (re *Regexp) FindRunesMatch(r []rune) (*Match, error)
- func (re *Regexp) FindRunesMatchStartingAt(r []rune, startAt int) (*Match, error)
- func (re *Regexp) FindStringMatch(s string) (*Match, error)
- func (re *Regexp) FindStringMatchStartingAt(s string, startAt int) (*Match, error)
- func (re *Regexp) GetGroupNames() []string
- func (re *Regexp) GetGroupNumbers() []int
- func (re *Regexp) GroupNameFromNumber(i int) string
- func (re *Regexp) GroupNumberFromName(name string) int
- func (re *Regexp) MarshalText() ([]byte, error)
- func (re *Regexp) MatchRunes(r []rune) (bool, error)
- func (re *Regexp) MatchString(s string) (bool, error)
- func (re *Regexp) Replace(input, replacement string, startAt, count int) (string, error)
- func (re *Regexp) ReplaceFunc(input string, evaluator MatchEvaluator, startAt, count int) (string, error)
- func (re *Regexp) RightToLeft() bool
- func (re *Regexp) Split(input string, count int) ([]string, error)
- func (re *Regexp) String() string
- func (re *Regexp) UnmarshalText(text []byte) error
- type Runner
- func (r *Runner) Capture(capnum, start, end int)
- func (r *Runner) CheckTimeout() error
- func (r *Runner) Crawlpos() int
- func (r *Runner) IsBoundary(index int) bool
- func (r *Runner) IsECMABoundary(index int) bool
- func (r *Runner) IsMatched(cap int) bool
- func (r *Runner) LastIndexOfRune(startIndex int, endIndex int, find rune) int
- func (r *Runner) MatchIndex(cap int) int
- func (r *Runner) MatchLength(cap int) int
- func (r *Runner) StackPop() int
- func (r *Runner) StackPush(val int)
- func (r *Runner) StackPush2(val1, val2 int)
- func (r *Runner) StackPush3(val1, val2, val3 int)
- func (r *Runner) StackPush4(val1, val2, val3, val4 int)
- func (r *Runner) StackPush5(val1, val2, val3, val4, val5 int)
- func (r *Runner) StackPushN(vals ...int)
- func (r *Runner) UncaptureUntil(capturePos int)
- type RuntimeEngineData
- type StringPrefixFilter
Constants ¶
const DefaultClockPeriod = 100 * time.Millisecond
Variables ¶
var (
	// DefaultUnmarshalOptions used when unmarshaling a regex from text
	DefaultUnmarshalOptions = None
	// DefaultOptimizationOptions controls the default memory/performance trade-offs used by Compile.
	DefaultOptimizationOptions = OptimizationOptions{
		MaxCachedRuneBufferLength:    256 << 10,
		MaxCachedReplaceBufferLength: 256 << 10,
		MaxCachedReplacerDataEntries: 16,
		MaxCachedReplacerDataBytes:   4 << 10,
		DisableCharClassASCIIBitmap:  false,
	}
)
var (
	// DefaultMatchTimeout used when running regexp matches -- "forever"
	DefaultMatchTimeout = time.Duration(math.MaxInt64)
)
Functions ¶
func RegisterEngine ¶
func RegisterEngine(pattern string, engine RuntimeEngineData, options ...CompileOption)
func SetTimeoutCheckPeriod ¶
SetTimeoutCheckPeriod is a debug function that sets the frequency of the timeout goroutine's sleep cycle. Defaults to 100ms. The only benefit of setting this lower is that the single background goroutine that manages timeouts may exit slightly sooner after all the timeouts have expired. See GitHub issue #63.
func StopTimeoutClock ¶
func StopTimeoutClock()
StopTimeoutClock should only be used in unit tests to prevent the timeout clock goroutine from appearing like a leaking goroutine
Types ¶
type Capture ¶
type Capture struct {
// RuneIndex is the position in the underlying rune slice where the first character of
// the captured substring was found. Even if you pass in a string, this is measured in runes.
RuneIndex int
// RuneLength is the number of runes in the captured substring.
RuneLength int
// contains filtered or unexported fields
}
Capture is a single capture of text within the larger original string
func (*Capture) ByteRange ¶
ByteRange returns the UTF-8 byte index and byte length of the captured substring. The first call lazily caches byte offsets on shared match text, so it is not safe to call concurrently with ByteRange on another capture from the same match until the cache has been initialized.
type CompileOption ¶
type CompileOption interface {
// contains filtered or unexported methods
}
CompileOption configures Compile and MustCompile.
func OptionDebug ¶
func OptionDebug() CompileOption
OptionDebug enables debug output and runner tracing for the compiled regexp.
func OptionDisableCharClassASCIIBitmap ¶
func OptionDisableCharClassASCIIBitmap() CompileOption
OptionDisableCharClassASCIIBitmap disables compile-time ASCII bitmaps for character classes.
func OptionIsCodeGen ¶
func OptionIsCodeGen() CompileOption
OptionIsCodeGen enables more expensive compile-time analysis intended for regexp2cg generated engines.
func OptionMaintainCaptureOrder ¶
func OptionMaintainCaptureOrder() CompileOption
OptionMaintainCaptureOrder assigns named and unnamed capture slots in pattern order.
func OptionMaxCachedReplaceBufferLength ¶
func OptionMaxCachedReplaceBufferLength(n int) CompileOption
OptionMaxCachedReplaceBufferLength limits retained replacement output buffers in the shared size-classed pool.
func OptionMaxCachedReplacerDataBytes ¶
func OptionMaxCachedReplacerDataBytes(n int) CompileOption
OptionMaxCachedReplacerDataBytes skips caching replacement patterns longer than n bytes.
func OptionMaxCachedReplacerDataEntries ¶
func OptionMaxCachedReplacerDataEntries(n int) CompileOption
OptionMaxCachedReplacerDataEntries limits parsed replacement patterns cached per Regexp.
func OptionMaxCachedRuneBufferLength ¶
func OptionMaxCachedRuneBufferLength(n int) CompileOption
OptionMaxCachedRuneBufferLength limits retained string-to-rune buffers in the shared size-classed pool.
type Group ¶
type Group struct {
Capture // the last capture of this group is embedded for ease of use
Name string // group name
Captures []Capture // captures of this group
}
Group is an explicit or implicit (group 0) matched group within the pattern
type Match ¶
type Match struct {
Group // embedded group 0
// contains filtered or unexported fields
}
Match is a single regex result match that contains groups and repeated captures: a Match holds Groups, and each Group holds its Captures.
func (*Match) GroupByName ¶
GroupByName returns a group based on the name of the group, or nil if the group name does not exist
func (*Match) GroupByNumber ¶
GroupByNumber returns a group based on the number of the group, or nil if the group number does not exist
func (*Match) GroupCount ¶
GroupCount returns the number of groups this match has matched
type MatchEvaluator ¶
MatchEvaluator is a function that takes a match and returns a replacement string to be used
type OptimizationOptions ¶
type OptimizationOptions struct {
// MaxCachedRuneBufferLength limits retained string-to-rune buffers in the shared size-classed pool.
MaxCachedRuneBufferLength int
// MaxCachedReplaceBufferLength limits retained replacement output buffers in the shared size-classed pool.
MaxCachedReplaceBufferLength int
// MaxCachedReplacerDataEntries limits the number of parsed replacement patterns cached per Regexp.
MaxCachedReplacerDataEntries int
// MaxCachedReplacerDataBytes skips caching replacement patterns longer than this many bytes.
MaxCachedReplacerDataBytes int
// DisableCharClassASCIIBitmap disables compile-time ASCII bitmap construction for character classes.
DisableCharClassASCIIBitmap bool
}
OptimizationOptions controls optional runtime caches and compile-time fast paths.
For replacement data cache size fields, 0 disables persistent retention and -1 means unbounded. For pooled buffer cache size fields, 0 disables pooling and -1 allows all built-in size classes. Defaults are intentionally bounded so Compile is safe for mixed-cardinality inputs.
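The 0 / -1 / positive convention described above can be read as a small decision rule. The following sketch spells it out for a single size limit; `withinLimit` is a hypothetical helper and the handling of negative values other than -1 is an assumption, since the text does not define them.

```go
package main

import "fmt"

// withinLimit applies the documented convention to one cache limit:
// 0 disables retention, -1 means unbounded, and a positive limit admits
// sizes up to and including the limit. Hypothetical helper.
func withinLimit(limit, size int) bool {
	switch {
	case limit == 0:
		return false // caching disabled
	case limit == -1:
		return true // unbounded
	default:
		// Assumption: other negative limits admit nothing.
		return size >= 0 && size <= limit
	}
}

func main() {
	const maxBytes = 4 << 10 // mirrors the default MaxCachedReplacerDataBytes
	fmt.Println(withinLimit(maxBytes, 100))
	fmt.Println(withinLimit(maxBytes, 5000))
	fmt.Println(withinLimit(0, 100))
	fmt.Println(withinLimit(-1, 1<<30))
}
```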
type RegexOptions ¶
type RegexOptions int32
RegexOptions impact the runtime and parsing behavior for each specific regex. They are settable in code as well as in the regex pattern itself.
const (
	None                    RegexOptions = 0x0
	IgnoreCase              RegexOptions = 0x0001 // "i"
	Multiline               RegexOptions = 0x0002 // "m"
	ExplicitCapture         RegexOptions = 0x0004 // "n"
	Singleline              RegexOptions = 0x0010 // "s"
	IgnorePatternWhitespace RegexOptions = 0x0020 // "x"
	RightToLeft             RegexOptions = 0x0040 // "r"
	// ECMAScript attempts to follow ECMAScript regex behavior rather than C# RegexOptions.ECMAScript compatibility.
	ECMAScript RegexOptions = 0x0100 // "e"
	RE2        RegexOptions = 0x0200 // RE2 (regexp package) compatibility mode
	Unicode    RegexOptions = 0x0400 // "u"
)
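Since each option occupies its own bit, options combine with bitwise OR and can be tested with bitwise AND. The sketch below uses a local `Options` type that mirrors three of the listed values so it runs without the package; the `Has` method is hypothetical and is not claimed to exist on RegexOptions.

```go
package main

import "fmt"

// Options is a local mirror of a few RegexOptions values, for illustration.
type Options int32

const (
	IgnoreCase Options = 0x0001
	Multiline  Options = 0x0002
	Singleline Options = 0x0010
)

// Has reports whether every bit of flag is set in o. Hypothetical method.
func (o Options) Has(flag Options) bool {
	return flag != 0 && o&flag == flag
}

func main() {
	opts := IgnoreCase | Multiline // combine flags with bitwise OR
	fmt.Println("IgnoreCase set:", opts.Has(IgnoreCase))
	fmt.Println("Singleline set:", opts.Has(Singleline))
	fmt.Printf("combined value: %#x\n", int32(opts))
}
```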
type Regexp ¶
type Regexp struct {
// A match will time out if it takes (approximately) more than
// MatchTimeout. This is a safety check in case the match
// encounters catastrophic backtracking. The default value
// (DefaultMatchTimeout) causes all time out checking to be
// suppressed.
MatchTimeout time.Duration
// contains filtered or unexported fields
}
Regexp is the representation of a compiled regular expression. A Regexp is safe for concurrent use by multiple goroutines.
func Compile ¶
func Compile(expr string, options ...CompileOption) (*Regexp, error)
Compile parses a regular expression and returns, if successful, a Regexp object that can be used to match against text.
func MustCompile ¶
func MustCompile(str string, options ...CompileOption) *Regexp
MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.
func (*Regexp) FindAllRunesIndex ¶ added in v2.1.0
FindAllRunesIndex returns a slice of rune index pairs identifying all successive matches in r.
func (*Regexp) FindAllStringIndex ¶ added in v2.1.0
FindAllStringIndex returns a slice of byte index pairs identifying all successive matches in s.
func (*Regexp) FindNextMatch ¶
FindNextMatch returns the next match in the same input string as the match parameter. Will return nil if there is no next match or if given a nil match.
func (*Regexp) FindRunesMatch ¶
FindRunesMatch searches the input rune slice for a Regexp match
func (*Regexp) FindRunesMatchStartingAt ¶
FindRunesMatchStartingAt searches the input rune slice for a Regexp match starting at the startAt index
func (*Regexp) FindStringMatch ¶
FindStringMatch searches the input string for a Regexp match
func (*Regexp) FindStringMatchStartingAt ¶
FindStringMatchStartingAt searches the input string for a Regexp match starting at the startAt index
func (*Regexp) GetGroupNames ¶
GetGroupNames returns the set of strings used to name capturing groups in the expression.
func (*Regexp) GetGroupNumbers ¶
GetGroupNumbers returns the integer group numbers corresponding to a group name.
func (*Regexp) GroupNameFromNumber ¶
GroupNameFromNumber retrieves a group name that corresponds to a group number. It will return "" for an unknown group number. Unnamed groups automatically receive a name that is the decimal string equivalent of their number, except in ECMAScript mode where unnamed groups have no name.
func (*Regexp) GroupNumberFromName ¶
GroupNumberFromName returns a group number that corresponds to a group name. Returns -1 if the name is not a recognized group name. Numbered groups automatically get a group name that is the decimal string equivalent of their number, except in ECMAScript mode where unnamed groups have no name.
func (*Regexp) MarshalText ¶
MarshalText implements encoding.TextMarshaler. The output matches that of calling the Regexp.String method.
func (*Regexp) MatchRunes ¶
MatchRunes returns true if the runes match the regex. The error will be set if a timeout occurs.
func (*Regexp) MatchString ¶
MatchString returns true if the string matches the regex. The error will be set if a timeout occurs.
func (*Regexp) Replace ¶
Replace searches the input string and replaces each match found with the replacement text. The count parameter limits the number of matches attempted, and startAt skips past possible matches at the start of the input (left or right depending on the RightToLeft option). Set startAt and count to -1 to go through the whole string.
func (*Regexp) ReplaceFunc ¶
func (re *Regexp) ReplaceFunc(input string, evaluator MatchEvaluator, startAt, count int) (string, error)
ReplaceFunc searches the input string and replaces each match found using the string returned by the evaluator. The count parameter limits the number of matches attempted, and startAt skips past possible matches at the start of the input (left or right depending on the RightToLeft option). Set startAt and count to -1 to go through the whole string.
func (*Regexp) RightToLeft ¶
func (*Regexp) Split ¶
Split splits the given input string using the pattern and returns a slice of the parts. Count limits the number of matches to process. If Count is -1, the input is processed fully. If Count is 0, Split returns nil. If Count is 1, it returns the original input. The only expected error is a timeout, if one is set.
If capturing parentheses are used in the pattern, any captured text is included in the resulting slice. For example, splitting "a-b" with the pattern "-" returns ["a", "b"], but splitting it with the pattern "(-)" returns ["a", "-", "b"].
func (*Regexp) UnmarshalText ¶
UnmarshalText implements encoding.TextUnmarshaler by calling Compile on the encoded value.
type Runner ¶
type Runner struct {
Runtextstart int // starting point for search
Runtext []rune // text to search
Runtextpos int // current position in text
Runtextend int
Runtrackpos int
Runstackpos int
// contains filtered or unexported fields
}
func (*Runner) Capture ¶
Capture captures a subexpression. Note that the capnum used here has already been mapped to a non-sparse index (by the code generator RegexWriter).
func (*Runner) CheckTimeout ¶
func (*Runner) IsBoundary ¶
IsBoundary decides whether the position at the specified index is a boundary or not. It's just not worth emitting inline code for this logic.
func (*Runner) IsECMABoundary ¶
func (*Runner) LastIndexOfRune ¶
func (*Runner) MatchIndex ¶
func (*Runner) MatchLength ¶
func (*Runner) StackPush2 ¶
func (*Runner) StackPush3 ¶
func (*Runner) StackPush4 ¶
func (*Runner) StackPush5 ¶
func (*Runner) StackPushN ¶
func (*Runner) UncaptureUntil ¶
UncaptureUntil undoes captures until it reaches the specified capture position.
type RuntimeEngineData ¶
type RuntimeEngineData struct {
Caps map[int]int // capnum->index
CapNames map[string]int // cap group name -> index
CapsList []string // sorted list of capture group names
CapSize int // size of the capture array
FindFirstChar func(*Runner) bool // generated candidate search
Execute func(*Runner) error
StringPrefixFilter StringPrefixFilter // optional pre-decode candidate search for string input
}
type StringPrefixFilter ¶
StringPrefixFilter optionally searches string input before the engine decodes it to runes. It returns a byte index for a candidate match start, or ok=false if the regex cannot match. The filter must be conservative: false positives are allowed, false negatives are not.
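To illustrate what "conservative" means here, the sketch below builds a literal-prefix filter with strings.Index. The type's real signature is not shown in this page, so `prefixFilter` and its `func(string) (int, bool)` shape are assumptions, as is `literalFilter`. The idea is that for a pattern such as `foo\d+`, every match must begin with "foo", so reporting the first "foo" can produce a false positive but never hides a real match.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixFilter is a hypothetical stand-in for the filter shape: it returns
// a candidate byte index, or ok=false when no match is possible.
type prefixFilter func(input string) (index int, ok bool)

// literalFilter builds a filter for patterns whose matches must start with
// the literal lit. Hypothetical helper.
func literalFilter(lit string) prefixFilter {
	return func(input string) (int, bool) {
		i := strings.Index(input, lit)
		return i, i >= 0
	}
}

func main() {
	filter := literalFilter("foo")
	for _, input := range []string{"xxfoo12", "food", "bar"} {
		idx, ok := filter(input)
		// "food" is a false positive for foo\d+; the engine would reject it.
		fmt.Printf("%q -> index=%d ok=%v\n", input, idx, ok)
	}
}
```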