Documentation ¶
Index ¶
- Constants
- Variables
- func PIIEntitiesToCanonical(entities []PIIEntity) []*entity.CanonicalEntity
- func RecordEnrichmentAttempt(ctx context.Context, entityType string)
- func RecordEnrichmentAttribute(ctx context.Context, attrName, attrValue string)
- func RecordEnrichmentFallbackUnknown(ctx context.Context, entityType string)
- func RecordPIIDetection(ctx context.Context, piiType, direction, action string)
- func RecordPIIRedaction(ctx context.Context, piiType, direction string)
- func WithPIIDirection(ctx context.Context, direction string) context.Context
- type Classification
- type EnrichmentConfig
- type EnrichmentPolicy
- type LanguageContext
- type PIIEntity
- type PIIPattern
- type PatternConfig
- type RecognizerConfig
- type RecognizerFile
- type Scanner
- type ScannerOption
- func WithCustomRecognizers(recognizers []RecognizerConfig) ScannerOption
- func WithDisabledEntities(entities []string) ScannerOption
- func WithEnabledEntities(entities []string) ScannerOption
- func WithMinScore(score float64) ScannerOption
- func WithPatternFile(path string) ScannerOption
- func WithSemanticEnrichment(enricher enrich.Enricher, config *EnrichmentConfig, policy EnrichmentPolicy) ScannerOption
Constants ¶
const (
// PIIDirectionRequest marks PII scanning on inbound/request content.
PIIDirectionRequest = "request"
// PIIDirectionResponse marks PII scanning on outbound/response content.
PIIDirectionResponse = "response"
)
const (
// DefaultMinScore is the Presidio-compatible minimum confidence threshold.
// Matches below this score are discarded unless boosted by context words.
DefaultMinScore = 0.5
// ContextSimilarityFactor is the score boost applied when context words are
// found near a match. Matches Presidio's default context_similarity_factor.
ContextSimilarityFactor = 0.35
// ContextWindowChars is the number of characters to search before and after
// a match when looking for context words.
ContextWindowChars = 100
)
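The interaction between these three constants can be sketched as follows. This is only an illustration, assuming an additive boost capped at 1.0 and a case-insensitive substring search; `hasContext` and `boostedScore` are hypothetical helpers, not functions of this package, and the scanner's real logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Values copied from the documented constants.
const (
	defaultMinScore         = 0.5
	contextSimilarityFactor = 0.35
	contextWindowChars      = 100
)

// hasContext reports whether any context word appears within window bytes
// before or after text[start:end]. Using byte offsets is a simplification.
func hasContext(text string, start, end int, words []string, window int) bool {
	lo := start - window
	if lo < 0 {
		lo = 0
	}
	hi := end + window
	if hi > len(text) {
		hi = len(text)
	}
	around := strings.ToLower(text[lo:start] + " " + text[end:hi])
	for _, w := range words {
		if strings.Contains(around, strings.ToLower(w)) {
			return true
		}
	}
	return false
}

// boostedScore adds the context boost to the base score, capped at 1.0
// (the cap is an assumption of this sketch).
func boostedScore(base float64, contextFound bool) float64 {
	if !contextFound {
		return base
	}
	s := base + contextSimilarityFactor
	if s > 1.0 {
		s = 1.0
	}
	return s
}

func main() {
	text := "Passport number: X1234567"
	start := strings.Index(text, "X1234567")
	end := start + len("X1234567")
	found := hasContext(text, start, end, []string{"passport"}, contextWindowChars)
	score := boostedScore(0.3, found)
	fmt.Printf("context=%v accepted=%v\n", found, score >= defaultMinScore)
}
```

In this sketch a weak pattern with base score 0.3 is kept only because a context word sits inside the window; without it the match would fall below the threshold.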
Variables ¶
var EUPatterns []PIIPattern
EUPatterns is the compiled default pattern set, built at init time from the embedded YAML. Kept for backward compatibility with code that references this variable directly.
var IBANLengths = map[string]int{
"AT": 20, "BE": 16, "BG": 22, "CY": 28, "CZ": 24,
"DE": 22, "DK": 18, "EE": 20, "ES": 24, "FI": 18,
"FR": 27, "GB": 22, "GR": 27, "HR": 21, "HU": 28,
"IE": 22, "IT": 27, "LT": 20, "LU": 20, "LV": 21,
"MT": 31, "NL": 18, "PL": 28, "PT": 25, "RO": 24,
"SE": 24, "SI": 19, "SK": 24,
}
IBANLengths maps EU+UK country codes to their exact IBAN character length (ISO 13616). Used by ValidateIBAN to reject strings that match the IBAN regex but have the wrong length for their country (e.g. VAT IDs like DE123456789 are 11 chars, not DE's 22).
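A minimal sketch of the two gates described here (country length, then the ISO 13616 MOD-97 check) is shown below. `validIBAN` and the trimmed `ibanLengths` map are illustrative stand-ins, not the package's own implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// ibanLengths is a small excerpt of the documented IBANLengths map.
var ibanLengths = map[string]int{"DE": 22, "GB": 22, "NL": 18}

// validIBAN checks the country-specific length and then the MOD-97 checksum.
func validIBAN(s string) bool {
	s = strings.ToUpper(strings.ReplaceAll(s, " ", ""))
	if len(s) < 4 {
		return false
	}
	want, ok := ibanLengths[s[:2]]
	if !ok || len(s) != want {
		return false
	}
	// Move the country code and check digits to the end, map letters to
	// 10..35, and compute the remainder mod 97 digit by digit.
	rearranged := s[4:] + s[:4]
	rem := 0
	for _, c := range rearranged {
		switch {
		case c >= '0' && c <= '9':
			rem = (rem*10 + int(c-'0')) % 97
		case c >= 'A' && c <= 'Z':
			rem = (rem*100 + int(c-'A') + 10) % 97
		default:
			return false
		}
	}
	return rem == 1
}

func main() {
	for _, s := range []string{
		"DE89 3704 0044 0532 0130 00", // well-known example IBAN
		"DE123456789",                 // VAT-style ID: wrong length for DE
	} {
		fmt.Printf("%q valid=%v\n", s, validIBAN(s))
	}
}
```

The length gate runs first because it is cheap and rejects VAT-style identifiers before any checksum arithmetic is done.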
Functions ¶
func PIIEntitiesToCanonical ¶
func PIIEntitiesToCanonical(entities []PIIEntity) []*entity.CanonicalEntity
PIIEntitiesToCanonical converts a slice of PIIEntity from the scanner to detector-agnostic canonical entities for the enrichment pipeline. IDs are assigned sequentially (1-based). Source is set to entity.SourceCustom.
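func RecordEnrichmentAttempt ¶
func RecordEnrichmentAttempt(ctx context.Context, entityType string)
RecordEnrichmentAttempt records one enrichment attempt for an entity type.
func RecordEnrichmentAttribute ¶
func RecordEnrichmentAttribute(ctx context.Context, attrName, attrValue string)
RecordEnrichmentAttribute records one attribute emitted (e.g. gender=female, scope=city).
func RecordEnrichmentFallbackUnknown ¶
func RecordEnrichmentFallbackUnknown(ctx context.Context, entityType string)
RecordEnrichmentFallbackUnknown records fallback to unknown for an attribute.
func RecordPIIDetection ¶
func RecordPIIDetection(ctx context.Context, piiType, direction, action string)
RecordPIIDetection increments the PII detection counter per entity type.
func RecordPIIRedaction ¶
func RecordPIIRedaction(ctx context.Context, piiType, direction string)
RecordPIIRedaction increments the PII redaction counter per entity type.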
Types ¶
type Classification ¶
type Classification struct {
HasPII bool `json:"has_pii"`
Entities []PIIEntity `json:"entities"`
Tier int `json:"tier"` // 0-2
Redacted string `json:"redacted,omitempty"`
}
Classification holds the result of PII scanning.
type EnrichmentConfig ¶
type EnrichmentConfig struct {
Enabled bool
Mode string // off | shadow | enforce
AllowedAttributes []string // e.g. ["gender", "scope"]
ConfidenceThreshold float64
EmitUnknownAttributes bool
DefaultPersonGender string
DefaultLocationScope string
PreserveTitles bool
}
EnrichmentConfig holds semantic enrichment settings. Callers (e.g. the runner) populate it from policy; the classifier does not depend on the policy package.
type EnrichmentPolicy ¶
type EnrichmentPolicy interface {
EmitAttributes(ctx context.Context, mode string, allowed []string, entityType string, attrs map[string]string) []string
}
EnrichmentPolicy is implemented by the caller (e.g. a policy engine adapter) to decide which attributes may be emitted for an entity. The classifier does not import the policy package.
type LanguageContext ¶
type LanguageContext struct {
Language string `yaml:"language" json:"language"`
Context []string `yaml:"context,omitempty" json:"context,omitempty"`
}
LanguageContext holds context words for a specific language.
type PIIEntity ¶
type PIIEntity struct {
Type string `json:"type"`
Value string `json:"value"`
Position int `json:"position"`
Confidence float64 `json:"confidence"`
Sensitivity int `json:"sensitivity"` // 1-3 from recognizer; 0 means unset (treated as 1 for tiering)
}
PIIEntity represents a detected PII instance.
type PIIPattern ¶
type PIIPattern struct {
Name string
Type string
Pattern *regexp.Regexp
Countries []string
Sensitivity int // 1-3, higher = more sensitive
Score float64 // base confidence from YAML (Presidio-compatible)
ContextWords []string // merged from all supported_languages[].context
ValidateLuhn bool // Talon extension: ISO/IEC 7812 checksum gate
ValidateIBAN bool // Talon extension: ISO 13616 MOD-97 + country length gate
ValidateBSN bool // Talon extension: Dutch BSN 11-test
ValidatePESEL bool // Talon extension: Polish PESEL check digit
}
PIIPattern represents a compiled, ready-to-use PII detection pattern.
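The Luhn and BSN gates named in the struct comments are standard checksum algorithms. The sketch below shows textbook versions of both; `luhnValid` and `bsnValid` are hypothetical names, and the package's own validators may handle separators or edge cases differently.

```go
package main

import "fmt"

// luhnValid applies the Luhn checksum, ignoring spaces and dashes.
func luhnValid(number string) bool {
	sum, digits, double := 0, 0, false
	for i := len(number) - 1; i >= 0; i-- {
		c := number[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
		digits++
	}
	return digits > 1 && sum%10 == 0
}

// bsnValid applies the Dutch 11-test: nine digits weighted 9,8,...,2,-1
// must sum to a multiple of 11.
func bsnValid(bsn string) bool {
	if len(bsn) != 9 {
		return false
	}
	weights := [9]int{9, 8, 7, 6, 5, 4, 3, 2, -1}
	sum := 0
	for i := 0; i < 9; i++ {
		c := bsn[i]
		if c < '0' || c > '9' {
			return false
		}
		sum += weights[i] * int(c-'0')
	}
	return sum%11 == 0
}

func main() {
	fmt.Println(luhnValid("4111 1111 1111 1111"), luhnValid("4111111111111112"))
	fmt.Println(bsnValid("111222333"), bsnValid("123456789"))
}
```

Gates like these run after the regex matches, so a digit string that merely looks like a card number or BSN is dropped before scoring.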
func CompilePIIPatterns ¶
func CompilePIIPatterns(recognizers []RecognizerConfig) ([]PIIPattern, error)
CompilePIIPatterns converts a list of recognizer configs into the compiled []PIIPattern slice used by the Scanner at runtime. Disabled recognizers are skipped. Each regex pattern in a recognizer produces one PIIPattern entry, with the entity type normalized to the lower_snake_case used internally.
type PatternConfig ¶
type PatternConfig struct {
Name string `yaml:"name" json:"name"`
Regex string `yaml:"regex" json:"regex"`
Score *float64 `yaml:"score,omitempty" json:"score,omitempty"`
}
PatternConfig is a single regex pattern within a recognizer. Score is optional; when omitted (nil), DefaultMinScore is used at compile time so that custom patterns are not filtered out by the scanner's minScore threshold.
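The pointer-typed Score lets the compiler distinguish "not set" from an explicit 0. A minimal sketch of that defaulting, using a local stand-in type and a hypothetical `effectiveScore` helper:

```go
package main

import "fmt"

const defaultMinScore = 0.5 // mirrors the documented DefaultMinScore

// patternConfig is a local stand-in for PatternConfig.
type patternConfig struct {
	Name  string
	Regex string
	Score *float64
}

// effectiveScore returns the explicit score, or the default when Score is nil.
func effectiveScore(p patternConfig) float64 {
	if p.Score == nil {
		return defaultMinScore
	}
	return *p.Score
}

func main() {
	explicit := 0.2
	fmt.Println(effectiveScore(patternConfig{Name: "no-score"}))
	fmt.Println(effectiveScore(patternConfig{Name: "weak", Score: &explicit}))
}
```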
type RecognizerConfig ¶
type RecognizerConfig struct {
Name string `yaml:"name" json:"name"`
SupportedEntity string `yaml:"supported_entity" json:"supported_entity"`
Enabled *bool `yaml:"enabled,omitempty" json:"enabled,omitempty"`
Patterns []PatternConfig `yaml:"patterns,omitempty" json:"patterns,omitempty"`
SupportedLanguages []LanguageContext `yaml:"supported_languages,omitempty" json:"supported_languages,omitempty"`
DenyList []string `yaml:"deny_list,omitempty" json:"deny_list,omitempty"`
DenyListScore float64 `yaml:"deny_list_score,omitempty" json:"deny_list_score,omitempty"`
// Talon extensions (safe to include — Presidio ignores unknown fields)
Sensitivity int `yaml:"sensitivity,omitempty" json:"sensitivity,omitempty"`
Countries []string `yaml:"countries,omitempty" json:"countries,omitempty"`
ValidateLuhn bool `yaml:"validate_luhn,omitempty" json:"validate_luhn,omitempty"`
ValidateIBAN bool `yaml:"validate_iban,omitempty" json:"validate_iban,omitempty"`
ValidateBSN bool `yaml:"validate_bsn,omitempty" json:"validate_bsn,omitempty"`
ValidatePESEL bool `yaml:"validate_pesel,omitempty" json:"validate_pesel,omitempty"`
// Injection-specific extension (used by attachment scanner only)
Severity int `yaml:"severity,omitempty" json:"severity,omitempty"`
}
RecognizerConfig mirrors Presidio's YAML recognizer schema with Talon extensions.
func DefaultRecognizers ¶
func DefaultRecognizers() ([]RecognizerConfig, error)
DefaultRecognizers returns the built-in PII recognizers parsed from the embedded pii_eu.yaml file. This is the first layer in the merge chain.
func FilterByEntities ¶
func FilterByEntities(recognizers []RecognizerConfig, enabledEntities, disabledEntities []string) []RecognizerConfig
FilterByEntities applies enabled/disabled entity filters to a recognizer list. If enabledEntities is non-empty, only recognizers with matching supported_entity are kept (whitelist). Then any recognizer in disabledEntities is removed (blacklist).
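The whitelist-then-blacklist order can be sketched as below. The `recognizer` type, the entity labels, and exact-match comparison are assumptions of this example, not confirmed behavior of FilterByEntities.

```go
package main

import "fmt"

// recognizer is a trimmed stand-in for RecognizerConfig.
type recognizer struct {
	Name            string
	SupportedEntity string
}

func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, it := range items {
		set[it] = true
	}
	return set
}

// filterByEntities keeps only enabled entities (when any are listed),
// then drops every disabled entity.
func filterByEntities(recs []recognizer, enabled, disabled []string) []recognizer {
	allow, deny := toSet(enabled), toSet(disabled)
	var out []recognizer
	for _, r := range recs {
		if len(allow) > 0 && !allow[r.SupportedEntity] {
			continue
		}
		if deny[r.SupportedEntity] {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	recs := []recognizer{
		{"Email", "EMAIL_ADDRESS"},
		{"IBAN", "IBAN_CODE"},
		{"Phone", "PHONE_NUMBER"},
	}
	kept := filterByEntities(recs, []string{"EMAIL_ADDRESS", "IBAN_CODE"}, []string{"IBAN_CODE"})
	fmt.Println(kept)
}
```

Because the blacklist is applied second, an entity listed in both filters ends up disabled.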
func MergeRecognizers ¶
func MergeRecognizers(layers ...[]RecognizerConfig) []RecognizerConfig
MergeRecognizers performs a 3-layer merge: defaults, then global overrides, then per-agent overrides. Later layers override earlier ones by matching on the recognizer Name field. New recognizers are appended.
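A name-keyed layered merge of this kind might look like the sketch below. It assumes a later entry replaces the earlier one wholesale and keeps its original position; the real merge may combine fields differently.

```go
package main

import "fmt"

// recognizer is a trimmed stand-in for RecognizerConfig.
type recognizer struct {
	Name        string
	Sensitivity int
}

// mergeRecognizers applies layers in order: same-name entries are replaced
// in place, new names are appended.
func mergeRecognizers(layers ...[]recognizer) []recognizer {
	var merged []recognizer
	index := make(map[string]int)
	for _, layer := range layers {
		for _, r := range layer {
			if i, ok := index[r.Name]; ok {
				merged[i] = r
				continue
			}
			index[r.Name] = len(merged)
			merged = append(merged, r)
		}
	}
	return merged
}

func main() {
	defaults := []recognizer{{"email", 1}, {"iban", 2}}
	global := []recognizer{{"iban", 3}}
	agent := []recognizer{{"employee_id", 2}}
	fmt.Println(mergeRecognizers(defaults, global, agent))
}
```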
type RecognizerFile ¶
type RecognizerFile struct {
Recognizers []RecognizerConfig `yaml:"recognizers"`
}
RecognizerFile is the top-level YAML structure for a recognizer config file. Mirrors Presidio's recognizer registry YAML format.
func LoadRecognizerFile ¶
func LoadRecognizerFile(path string) (*RecognizerFile, error)
LoadRecognizerFile reads and parses a recognizer YAML file from disk. If the file does not exist, it returns a nil *RecognizerFile and a nil error, so callers can treat a missing global config as a no-op.
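The "missing file is not an error" convention is commonly built on `errors.Is` with `fs.ErrNotExist`. The sketch below covers only the read step, with a hypothetical `readOptional` helper; YAML parsing is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// readOptional returns the file contents, or (nil, nil) when the file is
// absent. Any other failure (permissions, I/O) is still reported.
func readOptional(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, fmt.Errorf("reading %s: %w", path, err)
	}
	return data, nil
}

func main() {
	data, err := readOptional("/nonexistent/dir/patterns.yaml")
	fmt.Println(data == nil, err == nil)
}
```

With this convention, callers must check for a nil result before dereferencing it.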
func ParseRecognizerFile ¶
func ParseRecognizerFile(data []byte) (*RecognizerFile, error)
ParseRecognizerFile parses recognizer YAML bytes into a RecognizerFile.
type Scanner ¶
type Scanner struct {
// contains filtered or unexported fields
}
Scanner detects PII in text using configurable regex patterns. Semantic enrichment is optional: when an Enricher, EnrichmentConfig, and EnrichmentPolicy are set, config.Enabled is true, and config.Mode is not "off", Redact uses enriched placeholders.
func MustNewScanner ¶
func MustNewScanner(opts ...ScannerOption) *Scanner
MustNewScanner is like NewScanner but panics on error. Useful for zero-config startup where the embedded defaults are expected to always compile.
func NewScanner ¶
func NewScanner(opts ...ScannerOption) (*Scanner, error)
NewScanner creates a PII scanner. Without options it uses the embedded EU defaults. Options layer global overrides and per-agent customization on top.
func (*Scanner) Redact ¶
Redact replaces PII with type-based placeholders (e.g. "[EMAIL]"). Uses Scan() for validated detection, then position-based replacement to handle overlapping patterns correctly.
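Position-based replacement with overlap handling can be sketched as follows. The `span` type, the uppercase-type placeholder format, and the "earliest, then longest span wins" rule are assumptions of this example, not the documented behavior of Redact.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// span marks a detected entity as the byte range text[Start:End].
type span struct {
	Type       string
	Start, End int
}

// redact replaces each span with "[TYPE]". Spans are sorted by start
// (longest first on ties); a span that overlaps an already-redacted region
// emits no extra placeholder, but its tail is still covered.
func redact(text string, spans []span) string {
	sorted := append([]span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Start != sorted[j].Start {
			return sorted[i].Start < sorted[j].Start
		}
		return sorted[i].End > sorted[j].End
	})
	var b strings.Builder
	cursor := 0
	for _, s := range sorted {
		if s.Start < cursor {
			if s.End > cursor {
				cursor = s.End // extend coverage so no PII tail leaks
			}
			continue
		}
		b.WriteString(text[cursor:s.Start])
		b.WriteString("[" + strings.ToUpper(s.Type) + "]")
		cursor = s.End
	}
	b.WriteString(text[cursor:])
	return b.String()
}

func main() {
	text := "mail a@b.eu now"
	spans := []span{
		{"domain", 7, 11}, // overlaps the email span
		{"email", 5, 11},
	}
	fmt.Println(redact(text, spans))
}
```

Working from positions instead of repeated string replacement avoids redacting the same region twice when two patterns match overlapping text.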
func (*Scanner) Scan ¶
func (s *Scanner) Scan(ctx context.Context, text string) *Classification
Scan analyzes text for PII and returns a classification result. Each match goes through hard validation gates (IBAN checksum/length, Luhn) and then Presidio-style score-based context filtering before being accepted.
type ScannerOption ¶
type ScannerOption func(*scannerConfig)
ScannerOption configures a Scanner via the functional options pattern.
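For readers unfamiliar with functional options, the sketch below shows the general shape. The fields of `scannerConfig` are unexported and not documented here, so the ones used (`minScore`, `disabled`) are invented for illustration.

```go
package main

import "fmt"

// scannerConfig is a hypothetical stand-in for the unexported config type.
type scannerConfig struct {
	minScore float64
	disabled []string
}

// scannerOption mutates the config, mirroring the ScannerOption signature.
type scannerOption func(*scannerConfig)

func withMinScore(score float64) scannerOption {
	return func(c *scannerConfig) { c.minScore = score }
}

func withDisabledEntities(entities []string) scannerOption {
	return func(c *scannerConfig) { c.disabled = entities }
}

// newConfig starts from defaults and applies each option in order.
func newConfig(opts ...scannerOption) scannerConfig {
	cfg := scannerConfig{minScore: 0.5}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newConfig(withMinScore(0.7), withDisabledEntities([]string{"PHONE_NUMBER"}))
	fmt.Printf("%+v\n", cfg)
}
```

Calling the constructor with no options yields the defaults, which matches the zero-config behavior described for NewScanner.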
func WithCustomRecognizers ¶
func WithCustomRecognizers(recognizers []RecognizerConfig) ScannerOption
WithCustomRecognizers adds per-agent custom recognizer definitions.
func WithDisabledEntities ¶
func WithDisabledEntities(entities []string) ScannerOption
WithDisabledEntities sets a blacklist of entity types to exclude.
func WithEnabledEntities ¶
func WithEnabledEntities(entities []string) ScannerOption
WithEnabledEntities sets a whitelist of entity types. When non-empty, only recognizers with a matching supported_entity will be active.
func WithMinScore ¶
func WithMinScore(score float64) ScannerOption
WithMinScore overrides the default minimum confidence threshold for matches.
func WithPatternFile ¶
func WithPatternFile(path string) ScannerOption
WithPatternFile loads additional recognizers from a global patterns.yaml file. If the file does not exist, it is silently skipped.
func WithSemanticEnrichment ¶
func WithSemanticEnrichment(enricher enrich.Enricher, config *EnrichmentConfig, policy EnrichmentPolicy) ScannerOption
WithSemanticEnrichment enables semantic enrichment of PII placeholders (e.g. gender, scope). When it is set and config.Mode is "enforce", Redact may produce placeholders of the form <PII type="..." id="..." .../>.
Directories ¶
| Path | Synopsis |
|---|---|
| | Package entity provides a detector-agnostic canonical representation of PII entities for use by the semantic enricher and placeholder renderer. |