matcher

package
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 4, 2026 License: BSD-3-Clause Imports: 6 Imported by: 0

Documentation

Overview

Package matcher composes DNS rule matchers into a single match interface.

By default the package uses a hybrid representation: patterns without wildcards are stored in a hash-based suffix map for O(k) lookup (k = number of DNS labels), while patterns containing '*' are compiled into a minimized DFA for O(n) matching (n = input length). This keeps compilation fast for the vast majority of rules (typically 99%+ are literal) while retaining full wildcard support.

Callers can also request a pure DFA representation that compiles every rule into one automaton. In both modes the ||domain^ anchoring semantics are preserved: a rule for "example.com" matches both the exact domain and any subdomain such as "www.example.com".

DFA patterns are compiled in reversed form and matched against the input walked from right to left. The reversal naturally anchors the pattern at the domain suffix, and the automaton.DFA.MatchDomain method records a hit whenever an accepting state coincides with a DNS label boundary ('.' or start-of-name). This replaces the earlier approach of generating synthetic "*.<pattern>" variants, which caused exponential DFA state growth when the original pattern already contained wildcards.

Example usage:

m, err := matcher.CompileRules(rules, matcher.CompileOptions{MaxStates: 200000})
if err != nil { ... }
matched, ruleIDs := m.Match("ads.example.com")

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CompileOptions

type CompileOptions struct {
	MaxStates      int
	CompileTimeout time.Duration
	Minimize       *bool
	Mode           Mode
	Logger         Logger
}

CompileOptions controls compilation behavior.

MaxStates limits the DFA state count during subset construction (0 = no limit). CompileTimeout aborts compilation if the deadline is exceeded. Minimize can be set to false to skip Hopcroft minimization (defaults to true). Mode selects the runtime representation for compiled rules. The zero value defaults to ModeHybrid. Logger receives progress messages; nil silences output.

type Logger

type Logger interface {
	Infof(format string, args ...any)
}

Logger receives progress messages during compilation. Callers typically pass their watcher or plugin logger here.

type Matcher

type Matcher struct {
	// contains filtered or unexported fields
}

Matcher holds the compiled runtime structures for one ruleset. Depending on the selected mode it may use a suffix map, a DFA, or both. It is safe for concurrent reads after construction.

func CompileRules

func CompileRules(rules []listparser.Rule, opts CompileOptions) (*Matcher, error)

CompileRules compiles parsed rules into the runtime representation selected by opts.Mode.

The rules parameter contains parsed filter list entries. Each rule's Pattern field is inspected and lowercased. In the default hybrid mode, patterns without '*' are treated as literal domain suffixes and stored in the suffix map, while patterns containing '*' are compiled through the automaton package. In pure DFA mode, every rule is compiled through the automaton. Rule indices are used as rule IDs for match attribution.

All DFA patterns are stored in reversed form so that automaton.DFA.MatchDomain can walk the input backwards and check acceptance at DNS label boundaries, preserving ||domain^ semantics without extra synthetic patterns.

Returns an error if DFA compilation fails (e.g. state limit exceeded or timeout). An empty rule set produces a valid Matcher that never matches.

func (*Matcher) LiteralCount

func (m *Matcher) LiteralCount() int

LiteralCount returns the number of distinct literal domain patterns in the compiled rule set, regardless of whether they are backed by the suffix map or expanded into the DFA.

func (*Matcher) Match

func (m *Matcher) Match(input string) (matched bool, ruleIDs []uint32)

Match checks input against both the literal suffix map and the wildcard DFA.

The input parameter should be a domain name without trailing dot. Returns true if any stored pattern matches, along with all matching rule IDs. The suffix map is checked first; if both literal and wildcard patterns match, all rule IDs are combined.

In pure DFA mode (no suffix map) the [byteIndex] table inside the automaton handles case-insensitivity, so strings.ToLower is skipped and the automaton.DFA.MatchDomain result is returned directly — zero allocation.

The DFA was compiled from reversed patterns, so Match delegates to automaton.DFA.MatchDomain which walks the input backwards and checks acceptance at DNS label boundaries.

func (*Matcher) StateCount

func (m *Matcher) StateCount() int

StateCount returns the number of DFA states held by the matcher. Returns 0 when no DFA was compiled.

type Mode

type Mode string

Mode selects how the matcher represents parsed rules at runtime.

ModeHybrid stores literal rules in a suffix map and compiles only wildcard rules into a DFA. ModeDFA compiles every rule into one DFA. Both modes preserve ||domain^ suffix semantics through reversed-pattern DFA matching; see automaton.DFA.MatchDomain.

const (
	// ModeHybrid keeps literal domains in the suffix map and uses the DFA only
	// for wildcard patterns. This is the default mode.
	ModeHybrid Mode = "hybrid"

	// ModeDFA compiles all rules into a single DFA, including literal domain
	// rules expanded to cover both the exact host and its subdomains.
	ModeDFA Mode = "dfa"
)

func ParseMode

func ParseMode(value string) (Mode, error)

ParseMode validates a textual matcher mode and returns its canonical form.

Accepted values are "hybrid", "dfa", and "pure_dfa". The latter is kept as a compatibility alias and normalizes to ModeDFA.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL