Documentation ¶
Overview ¶
Package pii provides PII redaction for text before it is stored in memory and vector stores, plus context helpers for rehydrating redacted values.
It delegates to pii-shield's entropy-based scanner, which combines Shannon entropy analysis, English bigram scoring, Luhn credit card validation, context-aware key detection, and deterministic HMAC hashing. The combination is intended to be more robust than static regex matching alone.
Index ¶
- func ContainsPII(text string) bool
- func Mask(s string, keepChars int) string
- func Redact(text string) string
- func RedactMap(metadata map[string]string) map[string]string
- func RedactWithPairs(text string) (redacted string, pairs []string)
- func RedactWithReplacer(text string) (redacted string, replacer *strings.Replacer)
- func ReplacerFromContext(ctx context.Context) *strings.Replacer
- func WithReplacer(ctx context.Context, r *strings.Replacer) context.Context
- type Config
- type CustomRegexRule
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ContainsPII ¶
ContainsPII returns true if redaction would modify the text.
func Mask ¶
Mask replaces the middle of s with asterisks, keeping the first keepChars and the last keepChars characters visible.
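The masking described above can be illustrated with a minimal sketch. maskSketch is a hypothetical stand-in, not the package's implementation; in particular, fully masking strings that are too short to leave a hidden middle is an assumption, since the documentation does not say how Mask treats them.

```go
package main

import (
	"fmt"
	"strings"
)

// maskSketch is a hypothetical illustration of the documented behavior:
// keep the first and last keepChars characters and replace the rest with '*'.
func maskSketch(s string, keepChars int) string {
	r := []rune(s) // work on runes so multi-byte characters are not split
	if keepChars < 0 {
		keepChars = 0
	}
	// Assumption: a string with no hidden middle left is masked entirely.
	if len(r) <= 2*keepChars {
		return strings.Repeat("*", len(r))
	}
	middle := strings.Repeat("*", len(r)-2*keepChars)
	return string(r[:keepChars]) + middle + string(r[len(r)-keepChars:])
}

func main() {
	fmt.Println(maskSketch("4111111111111111", 4))
}
```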
func RedactWithPairs ¶
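RedactWithPairs redacts PII and returns the redacted string plus a flat slice of (placeholder, original) pairs, so callers can merge the pairs from multiple messages into one Replacer for rehydration. The slice is laid out in the argument order strings.NewReplacer expects: placeholder1, original1, placeholder2, original2, ...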
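As a sketch of the merging step, the snippet below concatenates pair slices from two messages and builds a single strings.Replacer. The helper mergePairs, the placeholder hashes, and the email addresses are made up for illustration; the slices only imitate the shape of the second return value of RedactWithPairs.

```go
package main

import (
	"fmt"
	"strings"
)

// mergePairs is a hypothetical helper: it concatenates flat
// (placeholder, original) slices and builds one Replacer from them.
// strings.NewReplacer panics on an odd number of arguments, so every
// input slice must hold complete pairs.
func mergePairs(perMessage ...[]string) *strings.Replacer {
	var all []string
	for _, pairs := range perMessage {
		all = append(all, pairs...)
	}
	return strings.NewReplacer(all...)
}

func main() {
	// Made-up values shaped like the pairs RedactWithPairs returns.
	msg1 := []string{"[HIDDEN:a1b2c3]", "alice@example.com"}
	msg2 := []string{"[HIDDEN:d4e5f6]", "bob@example.com"}

	r := mergePairs(msg1, msg2)
	fmt.Println(r.Replace("Send it to [HIDDEN:a1b2c3] and cc [HIDDEN:d4e5f6]."))
}
```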
func RedactWithReplacer ¶
RedactWithReplacer redacts PII in text and returns the redacted string plus a *strings.Replacer that can reverse individual [HIDDEN:hash] → original mappings. Call replacer.Replace(llmOutput) in AfterModel to rehydrate.
It works by diffing the original and redacted texts positionally: both are split on whitespace/punctuation boundaries and matched token-by-token. Where a token changed to a [HIDDEN:*] placeholder, that mapping is recorded.
func ReplacerFromContext ¶
ReplacerFromContext returns the replacer from ctx, or nil. Used by toolwrap to rehydrate tool-call arguments before execution, and by AfterModel to rehydrate assistant response content.
func WithReplacer ¶
WithReplacer stores the given replacer in ctx. Used by model BeforeModel callbacks so tool execution and AfterModel can rehydrate [HIDDEN:hash] back to original values (e.g. email addresses in email_send arguments).
Types ¶
type Config ¶
type Config struct {
	// Salt is the HMAC key used for deterministic hashing of redacted values.
	// Same input + same salt → same [HIDDEN:hash] output, enabling log
	// correlation without exposing PII. Must be ≥16 bytes for security.
	// If empty, a cryptographically random salt is generated at startup
	// (hashes will differ across restarts).
	Salt string `yaml:"salt,omitempty" toml:"salt,omitempty"`

	// EntropyThreshold is the Shannon entropy score above which a token is
	// considered a potential secret. Lower = more aggressive (more redaction,
	// more false positives). Higher = more permissive. Default: 4.2.
	// Range: 2.0 (very aggressive) to 5.0 (very permissive).
	EntropyThreshold float64 `yaml:"entropy_threshold,omitempty" toml:"entropy_threshold,omitempty,omitzero"`

	// MinSecretLength is the minimum character length for a token to be
	// considered as a potential secret. Tokens shorter than this are never
	// redacted (unless they are values of sensitive keys). Default: 12.
	MinSecretLength int `yaml:"min_secret_length,omitempty" toml:"min_secret_length,omitempty,omitzero"`

	// SensitiveKeys is a list of key names whose values should always be
	// redacted regardless of entropy score. Case-insensitive matching.
	// Default: ["pass", "secret", "token", "key", "cvv", "cvc", "auth",
	// "sign", "password", "passwd", "api_key", "apikey",
	// "access_token", "client_secret"]
	SensitiveKeys []string `yaml:"sensitive_keys,omitempty" toml:"sensitive_keys,omitempty"`

	// CustomRegexes is a list of custom regex patterns for deterministic
	// PII detection. Each rule has a pattern and a name. Matched tokens
	// are redacted as [HIDDEN:name].
	// Example: [{"pattern": "\\bGHSA-[A-Za-z0-9-]+\\b", "name": "github_advisory"}]
	CustomRegexes []CustomRegexRule `yaml:"custom_regexes,omitempty" toml:"custom_regexes,omitempty"`

	// SafeRegexes is an allowlist of regex patterns. Tokens matching any
	// of these are never redacted, even if they exceed the entropy threshold.
	// Useful for known-safe patterns like version strings or build hashes.
	SafeRegexes []CustomRegexRule `yaml:"safe_regexes,omitempty" toml:"safe_regexes,omitempty"`
}
Config holds PII redaction configuration that maps to pii-shield's scanner.Config. Only the fields that make sense for application-level tuning are exposed. Advanced internal fields (bigram scores, adaptive baseline samples) are left at their defaults.
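The limits stated in the field comments (a salt of at least 16 bytes, an entropy threshold between 2.0 and 5.0) can be checked before a configuration is used. The sketch below is hypothetical: tuning is a local mirror of three Config fields so the example compiles on its own, checkTuning is not part of the package, and treating zero values as "use the default" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// tuning mirrors three fields of the documented Config (hypothetical local type).
type tuning struct {
	Salt             string
	EntropyThreshold float64
	MinSecretLength  int
}

// checkTuning reports the first value that falls outside the documented limits.
// Assumption: empty or zero fields mean "use the default" and are accepted.
func checkTuning(t tuning) error {
	if t.Salt != "" && len(t.Salt) < 16 {
		return fmt.Errorf("salt is %d bytes, want at least 16", len(t.Salt))
	}
	if t.EntropyThreshold != 0 && (t.EntropyThreshold < 2.0 || t.EntropyThreshold > 5.0) {
		return fmt.Errorf("entropy_threshold %.1f is outside 2.0 to 5.0", t.EntropyThreshold)
	}
	if t.MinSecretLength < 0 {
		return errors.New("min_secret_length must not be negative")
	}
	return nil
}

func main() {
	fmt.Println(checkTuning(tuning{Salt: "short", EntropyThreshold: 4.2, MinSecretLength: 12}))
	fmt.Println(checkTuning(tuning{Salt: "0123456789abcdef", EntropyThreshold: 4.2, MinSecretLength: 12}))
}
```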
func DefaultConfig ¶
func DefaultConfig() Config
type CustomRegexRule ¶
type CustomRegexRule struct {
	Pattern string `yaml:"pattern,omitempty" toml:"pattern,omitempty" json:"pattern"`
	Name    string `yaml:"name,omitempty" toml:"name,omitempty" json:"name"`
}
CustomRegexRule represents a named regex pattern for PII detection.
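To show how a named rule could turn matches into [HIDDEN:name] placeholders, here is a minimal sketch built on the standard regexp package. The rule type and applyRules function are hypothetical and do not reproduce how pii-shield evaluates custom rules; the pattern is the example from the Config comments, and the advisory ID in main is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a local stand-in for CustomRegexRule (hypothetical).
type rule struct {
	Pattern string
	Name    string
}

// applyRules replaces every match of each rule with [HIDDEN:<name>].
// It returns an error if a pattern does not compile.
func applyRules(text string, rules []rule) (string, error) {
	for _, r := range rules {
		re, err := regexp.Compile(r.Pattern)
		if err != nil {
			return "", fmt.Errorf("rule %q: %w", r.Name, err)
		}
		// Literal replacement avoids interpreting '$' in the placeholder.
		text = re.ReplaceAllLiteralString(text, "[HIDDEN:"+r.Name+"]")
	}
	return text, nil
}

func main() {
	rules := []rule{{Pattern: `\bGHSA-[A-Za-z0-9-]+\b`, Name: "github_advisory"}}
	out, err := applyRules("See GHSA-abcd-1234-wxyz for details.", rules)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```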