package redact

v0.123.16

Published: Dec 10, 2025 License: Apache-2.0 Imports: 25 Imported by: 12

Documentation

Constants

const (
	MASK_TEXT = "***HIDDEN***"
)

Variables

var NEW_LINE = []byte{'\n'}

Functions

func DisableTokenization added in v0.123.0

func DisableTokenization()

DisableTokenization disables tokenization on the global tokenizer.

func EnableTokenization added in v0.123.0

func EnableTokenization()

EnableTokenization enables tokenization on the global tokenizer.
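
A minimal sketch of toggling the global tokenizer around a redaction pass. It assumes redaction consults the global tokenizer when it is enabled; the actual redaction run is elided:

redact.EnableTokenization()
defer redact.DisableTokenization()

// ... run redactors here; with tokenization enabled, matched secrets are
// expected to become deterministic tokens rather than static mask text ...

fmt.Println("tokens generated:", redact.GetGlobalTokenizer().GetTokenCount())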

func Redact

func Redact(input io.Reader, path string, additionalRedactors []*troubleshootv1beta2.Redact) (io.Reader, error)
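
A hedged usage sketch, assuming an import of github.com/replicatedhq/troubleshoot/pkg/redact. Passing nil for additionalRedactors is assumed to apply only the package's built-in redactors; the file path is illustrative:

f, err := os.Open("cluster-info/config.json") // illustrative path
if err != nil {
    return err
}
defer f.Close()

// nil additionalRedactors: rely on the built-in redactors only.
redacted, err := redact.Redact(f, "cluster-info/config.json", nil)
if err != nil {
    return err
}
if _, err := io.Copy(os.Stdout, redacted); err != nil {
    return err
}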

func ResetGlobalTokenizer added in v0.123.0

func ResetGlobalTokenizer()

ResetGlobalTokenizer resets the global tokenizer instance (useful for testing).

func ResetRedactionList added in v0.9.34

func ResetRedactionList()

func ValidateRedactionMapFile added in v0.123.0

func ValidateRedactionMapFile(filePath string) error

ValidateRedactionMapFile validates the structure and integrity of a redaction map file.
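
A one-line sketch; the file name is illustrative:

if err := redact.ValidateRedactionMapFile("redaction-map.json"); err != nil {
    return fmt.Errorf("invalid redaction map: %w", err)
}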

Types

type CacheStats added in v0.123.0

type CacheStats struct {
	Hits   int64 `json:"hits"`   // cache hits
	Misses int64 `json:"misses"` // cache misses
	Total  int64 `json:"total"`  // total lookups
}

CacheStats tracks tokenizer cache performance.

type CorrelationGroup added in v0.123.0

type CorrelationGroup struct {
	Pattern     string    `json:"pattern"`     // correlation pattern identifier
	Description string    `json:"description"` // human-readable description
	Tokens      []string  `json:"tokens"`      // tokens involved in correlation
	Files       []string  `json:"files"`       // files where correlation was found
	Confidence  float64   `json:"confidence"`  // confidence score (0.0-1.0)
	DetectedAt  time.Time `json:"detectedAt"`  // when correlation was detected
}

CorrelationGroup represents correlated secret patterns across files.

type DuplicateGroup added in v0.123.0

type DuplicateGroup struct {
	SecretHash string    `json:"secretHash"` // hash of the normalized secret
	Token      string    `json:"token"`      // the token used for this secret
	SecretType string    `json:"secretType"` // classified type of the secret
	Locations  []string  `json:"locations"`  // file paths where this secret was found
	Count      int       `json:"count"`      // total occurrences
	FirstSeen  time.Time `json:"firstSeen"`  // when first detected
	LastSeen   time.Time `json:"lastSeen"`   // when last detected
}

DuplicateGroup represents a group of identical secrets found in different locations.

type FileStats added in v0.123.0

type FileStats struct {
	FilePath     string         `json:"filePath"`
	SecretsFound int            `json:"secretsFound"`
	TokensUsed   int            `json:"tokensUsed"`
	SecretTypes  map[string]int `json:"secretTypes"`
	ProcessedAt  time.Time      `json:"processedAt"`
}

FileStats tracks statistics per file.

type LineReader added in v0.123.16

type LineReader struct {
	// contains filtered or unexported fields
}

LineReader reads lines from an io.Reader while tracking whether each line ended with a newline character. This is essential for preserving the exact structure of input files during redaction: binary files and text files without trailing newlines should not have newlines added to them.

Unlike bufio.Scanner which strips newlines and requires the caller to add them back, LineReader explicitly tracks the presence of newlines so callers can conditionally restore them only when they were originally present.

func NewLineReader added in v0.123.16

func NewLineReader(r io.Reader) *LineReader

NewLineReader creates a new LineReader that reads from the given io.Reader. The reader is wrapped in a bufio.Reader for efficient byte-by-byte reading.

func (*LineReader) ReadLine added in v0.123.16

func (lr *LineReader) ReadLine() ([]byte, bool, error)

ReadLine reads the next line from the reader and returns:

  • line content (without the newline character if present)
  • whether the line ended with a newline (\n)
  • any error encountered

Return values:

  • (content, true, nil) - line ended with \n, more content may follow
  • (content, false, io.EOF) - last line without \n (file doesn't end with newline)
  • (nil, false, io.EOF) - reached EOF with no content (empty file or end of file)
  • (content, false, error) - encountered a non-EOF error

The function respects constants.SCANNER_MAX_SIZE and returns an error if a single line exceeds this limit. This prevents memory exhaustion on files with extremely long lines or binary files without newlines that are larger than the limit.

Example usage:

lr := NewLineReader(input)
for {
    line, hadNewline, err := lr.ReadLine()
    if err == io.EOF && len(line) == 0 {
        break // End of file, no more content
    }

    // Process line...
    fmt.Print(string(line))
    if hadNewline {
        fmt.Print("\n")
    }

    if err == io.EOF {
        break // Last line processed
    }
    if err != nil {
        return err
    }
}

type LineRedactor added in v0.71.0

type LineRedactor struct {
	// contains filtered or unexported fields
}

type MultiLineRedactor

type MultiLineRedactor struct {
	// contains filtered or unexported fields
}

func NewMultiLineRedactor

func NewMultiLineRedactor(re1 LineRedactor, re2 string, maskText, path, name string, isDefault bool) (*MultiLineRedactor, error)

func (*MultiLineRedactor) Redact

func (r *MultiLineRedactor) Redact(input io.Reader, path string) io.Reader

Redact processes the input reader in pairs of lines, applying redaction patterns. Unlike the previous implementation using bufio.Reader with readLine(), this now uses LineReader to preserve the exact newline structure of the input file.

The MultiLineRedactor works by:

  1. Reading pairs of lines (line1, line2)
  2. If line1 matches the selector pattern (re1), redacting line2 using re2
  3. Writing both lines with their original newline structure preserved

This ensures binary files and text files without trailing newlines are not corrupted.

type Redaction added in v0.9.34

type Redaction struct {
	RedactorName      string `json:"redactorName" yaml:"redactorName"`
	CharactersRemoved int    `json:"charactersRemoved" yaml:"charactersRemoved"`
	Line              int    `json:"line" yaml:"line"`
	File              string `json:"file" yaml:"file"`
	IsDefaultRedactor bool   `json:"isDefaultRedactor" yaml:"isDefaultRedactor"`
}

type RedactionList added in v0.9.34

type RedactionList struct {
	ByRedactor map[string][]Redaction `json:"byRedactor" yaml:"byRedactor"`
	ByFile     map[string][]Redaction `json:"byFile" yaml:"byFile"`
}

Redactions are indexed both by the file affected and by the name of the redactor.

func GetRedactionList added in v0.9.34

func GetRedactionList() RedactionList
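
A hedged sketch of inspecting recorded redactions after a pass. It assumes redactions are recorded as the returned reader is consumed, so the output is drained first; the input is illustrative:

out, err := redact.Redact(strings.NewReader("password: hunter2\n"), "config.yaml", nil)
if err != nil {
    return err
}
// Drain the reader so all redactions are applied and recorded.
if _, err := io.Copy(io.Discard, out); err != nil {
    return err
}

list := redact.GetRedactionList()
for file, redactions := range list.ByFile {
    fmt.Printf("%s: %d redactions\n", file, len(redactions))
}
redact.ResetRedactionList() // clear recorded redactions between bundles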

type RedactionMap added in v0.123.0

type RedactionMap struct {
	Tokens        map[string]string   `json:"tokens"`       // token -> original value
	Stats         RedactionStats      `json:"stats"`        // redaction statistics
	Timestamp     time.Time           `json:"timestamp"`    // when redaction was performed
	Profile       string              `json:"profile"`      // profile used
	BundleID      string              `json:"bundleId"`     // unique bundle identifier
	SecretRefs    map[string][]string `json:"secretRefs"`   // token -> list of file paths where found
	Duplicates    []DuplicateGroup    `json:"duplicates"`   // groups of identical secrets
	Correlations  []CorrelationGroup  `json:"correlations"` // correlated secret patterns
	EncryptionKey []byte              `json:"-"`            // encryption key (not serialized)
	IsEncrypted   bool                `json:"isEncrypted"`  // whether the mapping is encrypted
}

RedactionMap represents the mapping between tokens and original values.

func LoadRedactionMapFile added in v0.123.0

func LoadRedactionMapFile(filePath string, encryptionKey []byte) (RedactionMap, error)

LoadRedactionMapFile loads and optionally decrypts a redaction mapping file.
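
A minimal sketch; passing a nil key assumes the file was written unencrypted, and the path is illustrative:

rm, err := redact.LoadRedactionMapFile("redaction-map.json", nil) // nil key: unencrypted file
if err != nil {
    return err
}
fmt.Printf("bundle %s: %d tokens, %d duplicate groups\n",
    rm.BundleID, len(rm.Tokens), len(rm.Duplicates))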

type RedactionStats added in v0.123.0

type RedactionStats struct {
	TotalSecrets      int                  `json:"totalSecrets"`
	UniqueSecrets     int                  `json:"uniqueSecrets"`
	TokensGenerated   int                  `json:"tokensGenerated"`
	SecretsByType     map[string]int       `json:"secretsByType"`
	ProcessingTimeMs  int64                `json:"processingTimeMs"`
	FilesCovered      int                  `json:"filesCovered"`
	DuplicateCount    int                  `json:"duplicateCount"`
	CorrelationCount  int                  `json:"correlationCount"`
	NormalizationHits int                  `json:"normalizationHits"`
	CacheHits         int                  `json:"cacheHits"`
	CacheMisses       int                  `json:"cacheMisses"`
	FileCoverage      map[string]FileStats `json:"fileCoverage"`
}

RedactionStats contains statistics about the redaction process.

type Redactor

type Redactor interface {
	Redact(input io.Reader, path string) io.Reader
}
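
Any type with this method can participate in redaction. A hedged sketch of a custom implementation (the hostname pattern is illustrative) that uses LineReader, described above, to preserve the input's newline structure:

type hostnameRedactor struct{}

func (hostnameRedactor) Redact(input io.Reader, path string) io.Reader {
    pr, pw := io.Pipe()
    go func() {
        re := regexp.MustCompile(`host-[a-z0-9]+`) // illustrative pattern
        lr := redact.NewLineReader(input)
        for {
            line, hadNewline, err := lr.ReadLine()
            if len(line) > 0 {
                pw.Write(re.ReplaceAll(line, []byte(redact.MASK_TEXT)))
            }
            if hadNewline {
                pw.Write(redact.NEW_LINE) // restore the newline only if it was present
            }
            if err != nil {
                if err == io.EOF {
                    err = nil // EOF is the normal end of input
                }
                pw.CloseWithError(err)
                return
            }
        }
    }()
    return pr
}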

type SingleLineRedactor

type SingleLineRedactor struct {
	// contains filtered or unexported fields
}

func NewSingleLineRedactor

func NewSingleLineRedactor(re LineRedactor, maskText, path, name string, isDefault bool) (*SingleLineRedactor, error)

func (*SingleLineRedactor) Redact

func (r *SingleLineRedactor) Redact(input io.Reader, path string) io.Reader

Unlike the previous implementation using bufio.Scanner, this now uses LineReader to preserve the exact newline structure of the input file. Lines that originally ended with \n will have \n added back, while lines without \n (like the last line of a file without a trailing newline, or binary files) will not have \n added. This ensures binary files and text files without trailing newlines are not corrupted.

type TokenPrefix added in v0.123.0

type TokenPrefix string

TokenPrefix represents different types of secrets for token generation.

const (
	TokenPrefixPassword   TokenPrefix = "PASSWORD"
	TokenPrefixAPIKey     TokenPrefix = "APIKEY"
	TokenPrefixDatabase   TokenPrefix = "DATABASE"
	TokenPrefixEmail      TokenPrefix = "EMAIL"
	TokenPrefixIP         TokenPrefix = "IP"
	TokenPrefixToken      TokenPrefix = "TOKEN"
	TokenPrefixSecret     TokenPrefix = "SECRET"
	TokenPrefixKey        TokenPrefix = "KEY"
	TokenPrefixCredential TokenPrefix = "CREDENTIAL"
	TokenPrefixAuth       TokenPrefix = "AUTH"
	TokenPrefixGeneric    TokenPrefix = "GENERIC"
)

type Tokenizer added in v0.123.0

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer handles deterministic secret tokenization.

func GetGlobalTokenizer added in v0.123.0

func GetGlobalTokenizer() *Tokenizer

GetGlobalTokenizer returns the global tokenizer instance.

func NewTokenizer added in v0.123.0

func NewTokenizer(config TokenizerConfig) *Tokenizer

NewTokenizer creates a new tokenizer with the given configuration.

func (*Tokenizer) GenerateRedactionMapFile added in v0.123.0

func (t *Tokenizer) GenerateRedactionMapFile(profile, outputPath string, encrypt bool) error

GenerateRedactionMapFile creates a redaction mapping file with optional encryption.
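
A short sketch; the profile name and output path are illustrative:

t := redact.GetGlobalTokenizer()
if err := t.GenerateRedactionMapFile("default", "redaction-map.json", true); err != nil { // true: encrypt the mapping
    return err
}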

func (*Tokenizer) GetBundleID added in v0.123.0

func (t *Tokenizer) GetBundleID() string

GetBundleID returns the unique bundle identifier.

func (*Tokenizer) GetCacheStats added in v0.123.0

func (t *Tokenizer) GetCacheStats() CacheStats

GetCacheStats returns cache performance statistics.
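
For example, computing a hit rate from the returned CacheStats:

stats := redact.GetGlobalTokenizer().GetCacheStats()
if stats.Total > 0 {
    fmt.Printf("tokenizer cache hit rate: %.1f%%\n",
        float64(stats.Hits)/float64(stats.Total)*100)
}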

func (*Tokenizer) GetDuplicateGroups added in v0.123.0

func (t *Tokenizer) GetDuplicateGroups() []DuplicateGroup

GetDuplicateGroups returns all duplicate secret groups.

func (*Tokenizer) GetFileStats added in v0.123.0

func (t *Tokenizer) GetFileStats(filePath string) (FileStats, bool)

GetFileStats returns statistics for a specific file.

func (*Tokenizer) GetRedactionMap added in v0.123.0

func (t *Tokenizer) GetRedactionMap(profile string) RedactionMap

GetRedactionMap returns the current redaction map.

func (*Tokenizer) GetTokenCount added in v0.123.0

func (t *Tokenizer) GetTokenCount() int

GetTokenCount returns the number of tokens generated.

func (*Tokenizer) IsEnabled added in v0.123.0

func (t *Tokenizer) IsEnabled() bool

IsEnabled returns whether tokenization is enabled.

func (*Tokenizer) Reset added in v0.123.0

func (t *Tokenizer) Reset()

Reset clears all tokens and mappings (useful for testing).

func (*Tokenizer) TokenizeValue added in v0.123.0

func (t *Tokenizer) TokenizeValue(value, context string) string

TokenizeValue generates or retrieves a token for a secret value.

func (*Tokenizer) TokenizeValueWithPath added in v0.123.0

func (t *Tokenizer) TokenizeValueWithPath(value, context, filePath string) string

TokenizeValueWithPath generates or retrieves a token for a secret value with file path tracking.
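
A hedged sketch: because tokenization is deterministic, the same value seen in two files should yield the same token, and the path tracking should surface the duplicate via GetDuplicateGroups. Values and paths are illustrative:

t := redact.GetGlobalTokenizer()
tok1 := t.TokenizeValueWithPath("s3cr3t", "password", "secrets/a.json")
tok2 := t.TokenizeValueWithPath("s3cr3t", "password", "secrets/b.json")
fmt.Println(tok1 == tok2) // expected: true (deterministic per bundle)

for _, g := range t.GetDuplicateGroups() {
    fmt.Printf("%s found in %d locations\n", g.Token, len(g.Locations))
}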

func (*Tokenizer) ValidateToken added in v0.123.0

func (t *Tokenizer) ValidateToken(token string) bool

ValidateToken checks if a token matches the expected format.

type TokenizerConfig added in v0.123.0

type TokenizerConfig struct {
	// Enable tokenization (defaults to checking TROUBLESHOOT_TOKENIZATION env var)
	Enabled bool

	// Salt for deterministic token generation per bundle
	Salt []byte

	// Default token prefix when type cannot be determined
	DefaultPrefix TokenPrefix

	// Token format template (must include %s for prefix and %s for hash)
	TokenFormat string

	// Hash length in characters (default 6)
	HashLength int
}

TokenizerConfig holds configuration for the tokenizer.
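
A minimal construction sketch; the salt and token format are illustrative choices, not the package defaults:

cfg := redact.TokenizerConfig{
    Enabled:       true,
    Salt:          []byte("per-bundle-salt"), // illustrative; use a random per-bundle salt
    DefaultPrefix: redact.TokenPrefixGeneric,
    TokenFormat:   "***%s_%s***", // hypothetical format: first %s is the prefix, second %s the hash
    HashLength:    6,
}

t := redact.NewTokenizer(cfg)
token := t.TokenizeValue("hunter2", "password")
fmt.Println(token, t.ValidateToken(token))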

type YamlRedactor added in v0.9.31

type YamlRedactor struct {
	// contains filtered or unexported fields
}

func NewYamlRedactor added in v0.9.31

func NewYamlRedactor(yamlPath, filePath, name string) *YamlRedactor

func (*YamlRedactor) Redact added in v0.9.31

func (r *YamlRedactor) Redact(input io.Reader, path string) io.Reader
