redactor

package
v0.13.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 8, 2026 License: MIT Imports: 7 Imported by: 0

README

pkg/redactor

JSON-aware sensitive data redaction engine. Applies regex patterns to transcript content before upload, preserving JSON structure.

Files

File Role
redactor.go Core redaction engine: Redactor, Redact, RedactJSONL, JSON walking
types.go Config, Pattern type definitions

Two Pattern Modes

Value-based patterns

A regex applied to all JSON string values. Used for secrets with distinctive formats (e.g., sk-ant-api03-...). No FieldPattern set.

Field-based patterns

A regex on field names (e.g., password|secret|api_key). When a field name matches, the field's value is redacted. Optionally combined with a value Pattern for more precise matching.

Key API

redactor := redactor.NewFromConfig(redactionConfig)  // from pkg/config types
redacted := redactor.RedactJSONL(rawBytes)            // JSON-aware, line-by-line
redacted := redactor.Redact(plainText)                // text-only fallback
  • NewFromConfig(cfg) — Creates redactor from config. Includes default patterns if use_default_patterns is true. Returns nil if no patterns (callers must nil-check).
  • NewRedactor(cfg) — Lower-level constructor from redactor's own config type.
  • RedactJSONL([]byte) — Processes JSONL: parses each line as JSON, recursively walks the structure, redacts string values, re-serializes. Falls back to text-mode Redact() for invalid JSON lines.
  • Redact(input) — Plain text redaction. Only applies value-based patterns (field-based patterns need JSON context).

How to Extend

Adding a new built-in redaction pattern: Add to GetDefaultRedactionPatterns() in pkg/config/upload.go (not here). Patterns are part of the config system; this package is the engine. Order matters — more specific patterns should come before general ones to avoid partial matches.

Adding a new pattern type: Add a field to Pattern in types.go, handle it in redactStringValue(). Consider whether the existing value/field modes are truly insufficient.

Invariants

  • JSON structure must never be corrupted by redaction. The parse-walk-redact-serialize pipeline ensures this. Never apply regex replacement directly to raw JSON strings.
  • Redaction markers use [REDACTED:TYPE] format. The TYPE comes from the pattern's Type field. Must be consistent — the backend may parse these markers.
  • Field-based patterns only work in JSON context. They're skipped in plain text Redact() because there's no field name to match against.
  • Must handle lines up to 10MB. Uses a local maxLineSize constant (10MB, same value as types.MaxJSONLLineSize). Large tool results in transcripts can approach this limit.
  • Capture group redaction uses submatch byte indices, not string replacement, to avoid replacing repeated text elsewhere in the match.

Design Decisions

JSON-aware redaction. Naive regex replacement on raw JSON can break structure — e.g., replacing a value containing a quote character corrupts the JSON. The parse-walk-serialize approach is more work but guarantees valid output.

Patterns defined in config, not redactor. Default patterns live in pkg/config/upload.go because they're part of the user-facing configuration system. The redactor is a pure engine that takes compiled patterns as input. This separation allows custom patterns to be added via config without modifying the engine.

NewFromConfig returns nil when empty. Callers that receive nil skip redaction entirely, which is the correct behavior when redaction is disabled or has no patterns.

Testing

go test ./pkg/redactor/...
  • redactor_test.go — Core engine: JSON walking, value/field patterns, capture groups, edge cases
  • patterns_test.go — Verifies default patterns match 20+ secret formats (API keys, tokens, credentials)
  • config_test.go — Pattern compilation, use_default_patterns flag, config round-trip

Dependencies

Uses: pkg/config (redaction config types, default patterns)

Used by: pkg/sync/ (via FileTracker.ReadChunk), cmd/ (redaction-test command)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	Patterns []Pattern `json:"patterns"`
}

Config represents the redaction configuration

type Pattern

type Pattern struct {
	Name         string `json:"name"`
	Pattern      string `json:"pattern,omitempty"`
	Type         string `json:"type"`
	CaptureGroup int    `json:"capture_group,omitempty"`
	// FieldPattern is a regex that matches JSON field names. When set, only
	// values of matching fields are considered for redaction.
	FieldPattern string `json:"field_pattern,omitempty"`
}

Pattern represents a single redaction pattern.

There are two modes of operation:

  1. Value-based (FieldPattern empty): The Pattern regex is applied to all string values in the JSON. Use this for secrets with distinctive formats that can appear anywhere (API keys, tokens, etc.).
  1. Field-based (FieldPattern set): Only values of fields whose names match FieldPattern are redacted. The Pattern regex (if set) is applied to the field value; if Pattern is empty, the entire value is redacted.

type Redactor

type Redactor struct {
	// contains filtered or unexported fields
}

Redactor handles redaction of sensitive data

func NewFromConfig

func NewFromConfig(cfg *config.RedactionConfig) (*Redactor, error)

NewFromConfig creates a new Redactor from a config.RedactionConfig. Returns nil if cfg is nil or if no patterns are configured. Note: This function does NOT check cfg.Enabled - callers should check that. If UseDefaultPatterns is true (default), default patterns are included. Custom patterns from cfg.Patterns are added after default patterns.

func NewRedactor

func NewRedactor(cfg Config) (*Redactor, error)

NewRedactor creates a new Redactor from a config

func (*Redactor) Redact

func (r *Redactor) Redact(input string) string

Redact redacts sensitive data from a string using value-based patterns only. Field-based patterns are skipped since plain text has no field context.

func (*Redactor) RedactJSONL

func (r *Redactor) RedactJSONL(input []byte) []byte

RedactJSONL redacts sensitive data from JSONL content by parsing each line, recursively redacting string values, and re-serializing. This ensures JSON structure is never corrupted by redaction patterns.

func (*Redactor) RedactJSONLine

func (r *Redactor) RedactJSONLine(line string) string

RedactJSONLine redacts a single JSON line, parsing it and applying redaction to string values only. Returns the redacted JSON. If the input is not valid JSON, falls back to text-based redaction.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL