Documentation
¶
Overview ¶
Package automaton compiles domain filter patterns into a cache-optimized, array-based deterministic finite automaton (DFA) with rule attribution.
Compilation Pipeline ¶
The pipeline follows classic automaton theory in five stages:
- Thompson NFA construction — each pattern becomes a small NFA (nfa.go)
- NFA combination — individual NFAs merge via epsilon fan-out (nfa.go)
- Subset (powerset) construction — NFA → deterministic DFA (subset.go)
- Hopcroft minimization — merge equivalent DFA states (minimize.go)
- Runtime DFA — compact transition tables plus exported state graph (dfa.go)
Pattern Language ¶
The supported pattern language is intentionally small:
- Literal characters: a-z, 0-9, '-', '.'
- Wildcard: '*' matches zero or more DNS characters
- Patterns are implicitly anchored (full match required)
Reversed-Pattern Domain Matching ¶
Callers that need ||domain^ anchoring semantics (matching a domain and all of its subdomains) should store patterns in reversed form and query the DFA through DFA.MatchDomain instead of DFA.Match. MatchDomain walks the input name from right to left and records a hit whenever an accepting state coincides with a DNS label boundary ('.' or start-of-name). This replaces the older approach of generating synthetic "*.<pattern>" variants, which caused exponential DFA state growth when the pattern already contained wildcards.
Performance Design ¶
Every design choice in the runtime DFA favors O(n) match-time performance:
- Transition entries embed the target's accept flag in bit 31, avoiding a separate accept-array load at domain-boundary positions
- [byteIndex] maps both upper- and lowercase ASCII letters to the same index, making Match/MatchDomain inherently case-insensitive
- Accept flags and rule IDs are split into dense side arrays for locality
- DFA.Match and DFA.MatchDomain use byte-indexed hot loops
- NFA states use bit-packed flags to improve cache utilization
File Organization ¶
- alphabet.go — DNS character ↔ transition-index mapping
- nfa.go — Thompson NFA types and ε-closure
- subset.go — powerset (subset) construction
- minimize.go — Hopcroft partition-refinement minimization
- dfa.go — exported DFA type, DFA.Match and DFA.MatchDomain
- dot.go — Graphviz DOT export ([DFA.DumpDot])
- compile.go — pipeline orchestration (Compile)
Example ¶
dfa, err := automaton.Compile(patterns, automaton.CompileOptions{})
matched, ruleIDs := dfa.Match("ads.example.com")
Index ¶
Constants ¶
const AlphabetSize = 38
AlphabetSize is the number of distinct characters in the DNS transition alphabet: the 26 lowercase letters (a-z), 10 digits (0-9), hyphen ('-'), and dot ('.').
Every runtime transition row and every intermediate state during compilation is sized to exactly AlphabetSize slots, so changing this constant reshapes the entire automaton. The value 38 keeps each transition row small and dense, which helps the DFA's flat tables remain cache-friendly.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CompileOptions ¶
type CompileOptions struct {
// MaxStates limits the number of intermediate DFA states during subset
// construction. When exceeded, Compile returns an error. 0 means no limit.
MaxStates int
// Minimize enables Hopcroft minimization after subset construction.
// nil (default) and *true both enable it; only *false disables it.
Minimize *bool
// CompileTimeout is the maximum wall-clock time allowed for the entire
// compilation. Zero means no timeout. Checked between pipeline stages
// and inside tight loops.
CompileTimeout time.Duration
// Logger receives progress messages during compilation. May be nil.
Logger Logger
}
CompileOptions controls the DFA compilation pipeline.
Zero-value defaults are safe: minimization is enabled, no state cap, no timeout, and no logging.
type DFA ¶
type DFA struct {
// contains filtered or unexported fields
}
DFA is a compiled deterministic finite automaton for domain name matching.
The runtime layout encodes each transition as a uint32 with the target state index in bits 0–30 and the target's accept flag in bit 31. [noTransitionState] (0xFFFFFFFF) represents "no transition". Embedding the accept flag lets DFA.MatchDomain check acceptance from the same value it already loaded for the transition — no separate array access needed at domain-boundary positions.
[byteIndex] maps both upper- and lowercase ASCII letters to the same alphabet index, making DFA.Match and DFA.MatchDomain inherently case-insensitive. Callers need not lowercase their input.
Create a DFA with Compile; query it with DFA.Match.
func Compile ¶
func Compile(patterns []Pattern, opts CompileOptions) (*DFA, error)
Compile transforms a set of filter patterns into a minimized, pointer-based DFA ready for repeated DFA.Match calls.
Each Pattern carries a lowercase expression string and a caller-assigned rule ID that is preserved in accept states for match attribution.
The pipeline is:
- Build one Thompson NFA per pattern ([buildPatternNFA])
- Combine all NFAs via epsilon fan-out ([combineNFAs])
- Subset (powerset) construction → deterministic DFA ([subsetConstruction])
- Optionally minimize via Hopcroft's algorithm ([hopcroftMinimize])
- Convert to the exported pointer-based DFA
On failure Compile returns an error describing invalid patterns, timeout exhaustion, or MaxStates violations. An empty pattern slice produces a valid but empty DFA that matches nothing.
func (*DFA) Match ¶
Match checks whether input is accepted by the compiled DFA and returns the matching rule IDs.
The input parameter should be a DNS domain name. Case is handled transparently by [byteIndex]. Match traverses the DFA one byte at a time in O(n) where n = len(input). It returns false immediately when a byte falls outside the DNS alphabet or leads to a dead end.
A nil or empty DFA always returns (false, nil).
func (*DFA) MatchDomain ¶
MatchDomain checks whether the DFA — compiled from reversed patterns — matches the given domain name with ||domain^ anchoring semantics.
Instead of reversing the input string (which would allocate), MatchDomain walks the input bytes from right to left, feeding them into the DFA one character at a time. This is equivalent to feeding the reversed input in forward order.
A match is recorded whenever the DFA reaches an accepting state at a domain boundary, defined as:
- the beginning of the input (i == 0, meaning the entire name matched), or
- immediately after a '.' label separator (input[i-1] == '.').
This boundary logic replaces the old approach of generating synthetic "*.<pattern>" variants for every rule, which caused exponential DFA state explosion when the original pattern already contained wildcards.
Each d.trans entry stores the target state index in bits 0–30 and the accept flag in bit 31, so boundary accept checks read the already-loaded transition value — no separate d.accept array access required.
The input parameter should be a DNS domain name without trailing dot. Case is handled transparently by [byteIndex]. A nil or empty DFA always returns (false, nil).
Typical callers are [matcher.Matcher.Match] and other high-level entry points that compile reversed patterns via Compile. For raw exact-match semantics on non-reversed patterns, use DFA.Match instead.
func (*DFA) StateCount ¶
StateCount reports how many DFA states are currently allocated.
Returns 0 for a nil receiver or an empty DFA. Callers typically use this for metrics, diagnostics, and capacity planning.
type Logger ¶
Logger receives progress messages during DFA compilation.
Callers such as the watcher and CoreDNS plugin pass their logger here so operators see what is happening during potentially long compilations. A nil Logger is safe — Compile silences output in that case.
type Pattern ¶
type Pattern struct {
Expr string // canonical pattern (lowercase DNS chars and '*')
RuleID uint32 // caller-assigned identifier for match attribution
}
Pattern pairs a canonical filter expression with a caller-assigned rule ID.
Expr is a lowercase DNS pattern using the supported alphabet (a-z, 0-9, '-', '.', '*'). RuleID is preserved through compilation into the DFA's accept states so callers can trace a match back to its originating filter rule.