pii

package
v0.1.7 Latest
Warning: This package is not in the latest version of its module.
Published: Mar 10, 2026 | License: Apache-2.0 | Imports: 6 | Imported by: 0

Documentation

Overview

Package pii provides PII redaction for text before it is stored in memory and vector stores, along with context helpers for rehydrating redacted values afterwards.

Redaction is delegated to pii-shield's entropy-based scanner, which combines Shannon entropy analysis, English bigram scoring, Luhn credit card validation, context-aware key detection, and deterministic HMAC hashing. This combination is significantly more robust than matching against static regular expressions alone.
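Of the techniques listed above, Luhn validation is the easiest to show in isolation. The following is a minimal, standard-library-only sketch of the Luhn checksum; the function name luhnValid is hypothetical, and pii-shield's own implementation may differ in how it tokenizes and filters candidates.

```go
package main

import "fmt"

// luhnValid reports whether the digits in s pass the Luhn checksum.
// Spaces and hyphens are skipped; any other non-digit makes s invalid.
// Hypothetical helper for illustration, not part of package pii.
func luhnValid(s string) bool {
	sum, count := 0, 0
	double := false
	// Walk from the rightmost digit, doubling every second digit.
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
		count++
	}
	return count > 1 && sum%10 == 0
}

func main() {
	fmt.Println(luhnValid("4111 1111 1111 1111")) // widely used test card number
	fmt.Println(luhnValid("4111 1111 1111 1112")) // last digit altered
}
```

A checksum like this lets a scanner reject most random digit runs that merely look like card numbers, which a length-only pattern cannot do.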

Constants

This section is empty.

Variables

This section is empty.

Functions

func ContainsPII

func ContainsPII(text string) bool

ContainsPII returns true if redaction would modify the text.

func Mask

func Mask(s string, keepChars int) string

Mask replaces the middle of a string with asterisks, keeping the first and last keepChars characters visible.
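To make the masking behavior concrete, here is a small stand-in written against the description above. maskSketch is a hypothetical name, not the package's code, and its handling of strings too short to have a middle (masking them entirely) is an assumption that the source does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// maskSketch keeps the first and last keepChars runes of s and replaces
// everything in between with asterisks.
// Assumption: when s has no middle section left, the whole string is masked.
func maskSketch(s string, keepChars int) string {
	r := []rune(s)
	if keepChars < 0 {
		keepChars = 0
	}
	if len(r) <= 2*keepChars {
		return strings.Repeat("*", len(r))
	}
	middle := strings.Repeat("*", len(r)-2*keepChars)
	return string(r[:keepChars]) + middle + string(r[len(r)-keepChars:])
}

func main() {
	fmt.Println(maskSketch("4111111111111111", 4)) // prints 4111********1111
	fmt.Println(maskSketch("abc", 2))              // prints ***
}
```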

func Redact

func Redact(text string) string

Redact replaces PII and secrets in text with deterministic HMAC hashes.
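The determinism mentioned here comes from keyed hashing: the same value and the same salt always produce the same placeholder. The sketch below illustrates that property with the standard library. The choice of HMAC-SHA256, the 8-character truncation, and the name placeholderSketch are all assumptions made for illustration; the actual hash function and placeholder length are decided by pii-shield.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// placeholderSketch derives a [HIDDEN:hash] placeholder from value, keyed by salt.
// Assumptions: HMAC-SHA256, truncated to the first 8 hex characters.
func placeholderSketch(salt, value string) string {
	mac := hmac.New(sha256.New, []byte(salt))
	mac.Write([]byte(value))
	sum := hex.EncodeToString(mac.Sum(nil))
	return "[HIDDEN:" + sum[:8] + "]"
}

func main() {
	salt := "example-salt-of-16+bytes" // illustrative only; use a real secret
	a := placeholderSketch(salt, "alice@example.com")
	b := placeholderSketch(salt, "alice@example.com")
	fmt.Println(a == b) // same salt and input give the same placeholder
	fmt.Println(a)
}
```

Because the output is stable for a given salt, two log lines that mention the same redacted value can still be correlated without exposing it.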

func RedactMap

func RedactMap(metadata map[string]string) map[string]string

RedactMap applies redaction to all string values in a metadata map.
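A value-wise redaction over a map can be sketched as follows. Both redactMapSketch and toyRedact are hypothetical stand-ins, and returning a fresh map instead of mutating the input is an assumption; the source does not say which the real function does.

```go
package main

import (
	"fmt"
	"strings"
)

// redactMapSketch applies redact to every value and returns a new map.
// Keys are copied unchanged and the input map is not modified.
func redactMapSketch(metadata map[string]string, redact func(string) string) map[string]string {
	out := make(map[string]string, len(metadata))
	for k, v := range metadata {
		out[k] = redact(v)
	}
	return out
}

// toyRedact stands in for a real redactor; it only knows one address.
func toyRedact(s string) string {
	return strings.ReplaceAll(s, "alice@example.com", "[HIDDEN:a1b2c3]")
}

func main() {
	in := map[string]string{"user": "alice@example.com", "topic": "billing"}
	fmt.Println(redactMapSketch(in, toyRedact))
	fmt.Println(in["user"]) // the input map still holds the original value
}
```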

func RedactWithPairs

func RedactWithPairs(text string) (redacted string, pairs []string)

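RedactWithPairs redacts PII and returns the redacted string plus a flat slice of (placeholder, original) pairs, so callers can merge the pairs from multiple messages into a single Replacer for rehydration. The slice is laid out as old1, new1, old2, new2, ..., where each old is a [HIDDEN:hash] placeholder and each new is the original value, which is the argument order strings.NewReplacer expects.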
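Merging pairs from several messages might look like the sketch below. The pair slices are hand-written stand-ins for what RedactWithPairs would return, and rehydrate is a hypothetical helper; only strings.NewReplacer and Replace are real standard-library calls.

```go
package main

import (
	"fmt"
	"strings"
)

// rehydrate concatenates the pair slices from several messages into one
// strings.Replacer and applies it to text. Each slice must be laid out as
// placeholder1, original1, placeholder2, original2, ...
func rehydrate(text string, pairSets ...[]string) string {
	var all []string
	for _, p := range pairSets {
		all = append(all, p...)
	}
	return strings.NewReplacer(all...).Replace(text)
}

func main() {
	// Hypothetical pairs, shaped like the output for two separate messages.
	msg1 := []string{"[HIDDEN:a1b2c3]", "alice@example.com"}
	msg2 := []string{"[HIDDEN:d4e5f6]", "+1-555-0100"}
	fmt.Println(rehydrate("Mail [HIDDEN:a1b2c3] or call [HIDDEN:d4e5f6].", msg1, msg2))
	// prints: Mail alice@example.com or call +1-555-0100.
}
```

Note that strings.NewReplacer panics when given an odd number of arguments, so a truncated pair slice should be rejected before merging.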

func RedactWithReplacer

func RedactWithReplacer(text string) (redacted string, replacer *strings.Replacer)

RedactWithReplacer redacts PII in text and returns the redacted string plus a *strings.Replacer that can reverse individual [HIDDEN:hash] → original mappings. Call replacer.Replace(llmOutput) in AfterModel to rehydrate.

It works by diffing the original and redacted texts positionally: both are split on whitespace/punctuation boundaries and matched token-by-token. Where a token changed to a [HIDDEN:*] placeholder, that mapping is recorded.
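The positional diff described above can be approximated in a few lines. This sketch is a simplification: it splits on whitespace only (the description also mentions punctuation boundaries), gives up when the token counts differ, and uses the hypothetical name pairsFromDiff. It is not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// pairsFromDiff matches tokens of original and redacted by position and
// records (placeholder, original) wherever a token became a [HIDDEN:*]
// placeholder. Simplification: tokens are split on whitespace only.
func pairsFromDiff(original, redacted string) []string {
	orig := strings.Fields(original)
	red := strings.Fields(redacted)
	if len(orig) != len(red) {
		return nil // token counts diverged; positional matching is unsafe
	}
	var pairs []string
	for i, tok := range red {
		isPlaceholder := strings.HasPrefix(tok, "[HIDDEN:") && strings.HasSuffix(tok, "]")
		if tok != orig[i] && isPlaceholder {
			pairs = append(pairs, tok, orig[i])
		}
	}
	return pairs
}

func main() {
	pairs := pairsFromDiff(
		"send the invoice to alice@example.com today",
		"send the invoice to [HIDDEN:a1b2c3] today",
	)
	fmt.Println(pairs)

	// The pairs feed straight into a Replacer for rehydration.
	r := strings.NewReplacer(pairs...)
	fmt.Println(r.Replace("Sent to [HIDDEN:a1b2c3]."))
}
```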

func ReplacerFromContext

func ReplacerFromContext(ctx context.Context) *strings.Replacer

ReplacerFromContext returns the replacer from ctx, or nil. Used by toolwrap to rehydrate tool-call arguments before execution, and by AfterModel to rehydrate assistant response content.

func WithReplacer

func WithReplacer(ctx context.Context, r *strings.Replacer) context.Context

WithReplacer stores the given replacer in ctx. Used by model BeforeModel callbacks so tool execution and AfterModel can rehydrate [HIDDEN:hash] back to original values (e.g. email addresses in email_send arguments).
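Storing a replacer in a context and reading it back typically follows the standard context.WithValue pattern with an unexported key type. The sketch below shows that pattern under the assumption that the package does something similar; replacerKey and the functions with a Sketch suffix are hypothetical names, not the package's identifiers.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// replacerKey is an unexported key type, so no other package can collide with it.
type replacerKey struct{}

// withReplacerSketch returns a child context carrying r.
func withReplacerSketch(ctx context.Context, r *strings.Replacer) context.Context {
	return context.WithValue(ctx, replacerKey{}, r)
}

// replacerFromContextSketch returns the stored replacer, or nil if none is set.
func replacerFromContextSketch(ctx context.Context) *strings.Replacer {
	r, _ := ctx.Value(replacerKey{}).(*strings.Replacer)
	return r
}

// roundTrip stores a replacer before the model call and uses it afterwards.
func roundTrip(llmOutput string) string {
	r := strings.NewReplacer("[HIDDEN:a1b2c3]", "alice@example.com")
	ctx := withReplacerSketch(context.Background(), r)
	if got := replacerFromContextSketch(ctx); got != nil {
		return got.Replace(llmOutput)
	}
	return llmOutput
}

// missingIsNil reports whether a context without a replacer yields nil.
func missingIsNil() bool {
	return replacerFromContextSketch(context.Background()) == nil
}

func main() {
	fmt.Println(roundTrip("Drafted a reply to [HIDDEN:a1b2c3]."))
	fmt.Println(missingIsNil())
}
```

Callers should always check for nil before calling Replace, since a context that never passed through the storing callback carries no replacer.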

Types

type Config

type Config struct {
	// Salt is the HMAC key used for deterministic hashing of redacted values.
	// Same input + same salt → same [HIDDEN:hash] output, enabling log
	// correlation without exposing PII. Must be ≥16 bytes for security.
	// If empty, a cryptographically random salt is generated at startup
	// (hashes will differ across restarts).
	Salt string `yaml:"salt,omitempty" toml:"salt,omitempty"`

	// EntropyThreshold is the Shannon entropy score above which a token is
	// considered a potential secret. Lower = more aggressive (more redaction,
	// more false positives). Higher = more permissive. Default: 4.2.
	// Range: 2.0 (very aggressive) to 5.0 (very permissive).
	EntropyThreshold float64 `yaml:"entropy_threshold,omitempty" toml:"entropy_threshold,omitempty,omitzero"`

	// MinSecretLength is the minimum character length for a token to be
	// considered as a potential secret. Tokens shorter than this are never
	// redacted (unless they are values of sensitive keys). Default: 12.
	MinSecretLength int `yaml:"min_secret_length,omitempty" toml:"min_secret_length,omitempty,omitzero"`

	// SensitiveKeys is a list of key names whose values should always be
	// redacted regardless of entropy score. Case-insensitive matching.
	// Default: ["pass", "secret", "token", "key", "cvv", "cvc", "auth",
	//           "sign", "password", "passwd", "api_key", "apikey",
	//           "access_token", "client_secret"]
	SensitiveKeys []string `yaml:"sensitive_keys,omitempty" toml:"sensitive_keys,omitempty"`

	// CustomRegexes is a list of custom regex patterns for deterministic
	// PII detection. Each rule has a pattern and a name. Matched tokens
	// are redacted as [HIDDEN:name].
	// Example: [{"pattern": "\\bGHSA-[A-Za-z0-9-]+\\b", "name": "github_advisory"}]
	CustomRegexes []CustomRegexRule `yaml:"custom_regexes,omitempty" toml:"custom_regexes,omitempty"`

	// SafeRegexes is an allowlist of regex patterns. Tokens matching any
	// of these are never redacted, even if they exceed the entropy threshold.
	// Useful for known-safe patterns like version strings or build hashes.
	SafeRegexes []CustomRegexRule `yaml:"safe_regexes,omitempty" toml:"safe_regexes,omitempty"`
}

Config holds PII redaction configuration that maps to pii-shield's scanner.Config. Only the fields that make sense for application-level tuning are exposed. Advanced internal fields (bigram scores, adaptive baseline samples) are left at their defaults.
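To build intuition for EntropyThreshold, the sketch below computes plain per-character Shannon entropy, H = -sum(p * log2(p)), over the characters of a token. This is an assumption about the general shape of the score: pii-shield combines entropy with bigram scoring and other signals, so its numbers will not necessarily match. The name shannonEntropy is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per character, computed
// from the relative frequency of each rune in s.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	if total == 0 {
		return 0
	}
	var h float64
	for _, n := range counts {
		p := float64(n) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Printf("%.2f\n", shannonEntropy("aaaa")) // prints 0.00: one repeated symbol
	fmt.Printf("%.2f\n", shannonEntropy("abcd")) // prints 2.00: four equally likely symbols
	fmt.Printf("%.2f\n", shannonEntropy("correct horse battery staple"))
}
```

Under this plain definition a token with n distinct characters can score at most log2(n), so exceeding the default of 4.2 would take at least 19 distinct characters. That gives a rough sense of why lowering the threshold catches more tokens.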

func DefaultConfig

func DefaultConfig() Config

func (Config) Apply

func (c Config) Apply()

Apply pushes this config into pii-shield's global scanner state. Call it once during application startup, after the configuration has been loaded. For any zero-valued field, pii-shield's default is preserved.

type CustomRegexRule

type CustomRegexRule struct {
	Pattern string `yaml:"pattern,omitempty" toml:"pattern,omitempty" json:"pattern"`
	Name    string `yaml:"name,omitempty" toml:"name,omitempty" json:"name"`
}

CustomRegexRule represents a named regex pattern for PII detection.
