Documentation ¶
Index ¶
- Constants
- Variables
- func DisableTokenization()
- func EnableTokenization()
- func Redact(input io.Reader, path string, ...) (io.Reader, error)
- func ResetGlobalTokenizer()
- func ResetRedactionList()
- func ValidateRedactionMapFile(filePath string) error
- type CacheStats
- type CorrelationGroup
- type DuplicateGroup
- type FileStats
- type LineReader
- type LineRedactor
- type MultiLineRedactor
- type Redaction
- type RedactionList
- type RedactionMap
- type RedactionStats
- type Redactor
- type SingleLineRedactor
- type TokenPrefix
- type Tokenizer
- func (t *Tokenizer) GenerateRedactionMapFile(profile, outputPath string, encrypt bool) error
- func (t *Tokenizer) GetBundleID() string
- func (t *Tokenizer) GetCacheStats() CacheStats
- func (t *Tokenizer) GetDuplicateGroups() []DuplicateGroup
- func (t *Tokenizer) GetFileStats(filePath string) (FileStats, bool)
- func (t *Tokenizer) GetRedactionMap(profile string) RedactionMap
- func (t *Tokenizer) GetTokenCount() int
- func (t *Tokenizer) IsEnabled() bool
- func (t *Tokenizer) Reset()
- func (t *Tokenizer) TokenizeValue(value, context string) string
- func (t *Tokenizer) TokenizeValueWithPath(value, context, filePath string) string
- func (t *Tokenizer) ValidateToken(token string) bool
- type TokenizerConfig
- type YamlRedactor
Constants ¶
const (
    MASK_TEXT = "***HIDDEN***"
)
Variables ¶
var NEW_LINE = []byte{'\n'}
Functions ¶
func DisableTokenization ¶ added in v0.123.0
func DisableTokenization()
DisableTokenization disables tokenization on the global tokenizer
func EnableTokenization ¶ added in v0.123.0
func EnableTokenization()
EnableTokenization enables tokenization on the global tokenizer
func ResetGlobalTokenizer ¶ added in v0.123.0
func ResetGlobalTokenizer()
ResetGlobalTokenizer resets the global tokenizer instance (useful for testing)
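Example (illustrative; assumes this package is imported as redact):

redact.EnableTokenization()
t := redact.GetGlobalTokenizer()
fmt.Println(t.IsEnabled()) // true while tokenization is enabled

redact.DisableTokenization()

// In tests, restore a clean global state afterwards.
redact.ResetGlobalTokenizer()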
func ResetRedactionList ¶ added in v0.9.34
func ResetRedactionList()
func ValidateRedactionMapFile ¶ added in v0.123.0
func ValidateRedactionMapFile(filePath string) error
ValidateRedactionMapFile validates the structure and integrity of a redaction map file
Types ¶
type CacheStats ¶ added in v0.123.0
type CacheStats struct {
    Hits   int64 `json:"hits"`   // cache hits
    Misses int64 `json:"misses"` // cache misses
    Total  int64 `json:"total"`  // total lookups
}
CacheStats tracks tokenizer cache performance
type CorrelationGroup ¶ added in v0.123.0
type CorrelationGroup struct {
    Pattern     string    `json:"pattern"`     // correlation pattern identifier
    Description string    `json:"description"` // human-readable description
    Tokens      []string  `json:"tokens"`      // tokens involved in correlation
    Files       []string  `json:"files"`       // files where correlation was found
    Confidence  float64   `json:"confidence"`  // confidence score (0.0-1.0)
    DetectedAt  time.Time `json:"detectedAt"`  // when correlation was detected
}
CorrelationGroup represents correlated secret patterns across files
type DuplicateGroup ¶ added in v0.123.0
type DuplicateGroup struct {
    SecretHash string    `json:"secretHash"` // hash of the normalized secret
    Token      string    `json:"token"`      // the token used for this secret
    SecretType string    `json:"secretType"` // classified type of the secret
    Locations  []string  `json:"locations"`  // file paths where this secret was found
    Count      int       `json:"count"`      // total occurrences
    FirstSeen  time.Time `json:"firstSeen"`  // when first detected
    LastSeen   time.Time `json:"lastSeen"`   // when last detected
}
DuplicateGroup represents a group of identical secrets found in different locations
type FileStats ¶ added in v0.123.0
type FileStats struct {
    FilePath     string         `json:"filePath"`
    SecretsFound int            `json:"secretsFound"`
    TokensUsed   int            `json:"tokensUsed"`
    SecretTypes  map[string]int `json:"secretTypes"`
    ProcessedAt  time.Time      `json:"processedAt"`
}
FileStats tracks statistics per file
type LineReader ¶ added in v0.123.16
type LineReader struct {
    // contains filtered or unexported fields
}
LineReader reads lines from an io.Reader while tracking whether each line ended with a newline character. This is essential for preserving the exact structure of input files during redaction - binary files and text files without trailing newlines should not have newlines added to them.
Unlike bufio.Scanner which strips newlines and requires the caller to add them back, LineReader explicitly tracks the presence of newlines so callers can conditionally restore them only when they were originally present.
func NewLineReader ¶ added in v0.123.16
func NewLineReader(r io.Reader) *LineReader
NewLineReader creates a new LineReader that reads from the given io.Reader. The reader is wrapped in a bufio.Reader for efficient byte-by-byte reading.
func (*LineReader) ReadLine ¶ added in v0.123.16
func (lr *LineReader) ReadLine() ([]byte, bool, error)
ReadLine reads the next line from the reader and returns:
- line content (without the newline character if present)
- whether the line ended with a newline (\n)
- any error encountered
Return values:
- (content, true, nil) - line ended with \n, more content may follow
- (content, false, io.EOF) - last line without \n (file doesn't end with newline)
- (nil, false, io.EOF) - reached EOF with no content (empty file or end of file)
- (content, false, error) - encountered a non-EOF error
The function respects constants.SCANNER_MAX_SIZE and returns an error if a single line exceeds this limit. This prevents memory exhaustion on files with extremely long lines or binary files without newlines that are larger than the limit.
Example usage:
lr := NewLineReader(input)
for {
    line, hadNewline, err := lr.ReadLine()
    if err == io.EOF && len(line) == 0 {
        break // End of file, no more content
    }
    // Process line...
    fmt.Print(string(line))
    if hadNewline {
        fmt.Print("\n")
    }
    if err == io.EOF {
        break // Last line processed
    }
    if err != nil {
        return err
    }
}
type LineRedactor ¶ added in v0.71.0
type LineRedactor struct {
    // contains filtered or unexported fields
}
type MultiLineRedactor ¶
type MultiLineRedactor struct {
    // contains filtered or unexported fields
}
func NewMultiLineRedactor ¶
func NewMultiLineRedactor(re1 LineRedactor, re2 string, maskText, path, name string, isDefault bool) (*MultiLineRedactor, error)
func (*MultiLineRedactor) Redact ¶
Redact processes the input reader in pairs of lines, applying redaction patterns. Unlike the previous implementation using bufio.Reader with readLine(), this now uses LineReader to preserve the exact newline structure of the input file.
The MultiLineRedactor works by:
1. Reading pairs of lines (line1, line2)
2. If line1 matches the selector pattern (re1), redacting line2 using re2
3. Writing both lines with their original newline structure preserved
This ensures binary files and text files without trailing newlines are not corrupted.
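The pair-based approach can be illustrated with a standalone sketch (not this package's API; the patterns are invented, and bufio.Scanner is used for brevity even though the real implementation uses LineReader to preserve newline structure):

selector := regexp.MustCompile(`"name": *"PASSWORD"`) // plays the role of re1
target := regexp.MustCompile(`("value": *")[^"]*(")`) // plays the role of re2

input := "{\"name\": \"PASSWORD\",\n\"value\": \"abc123\"}"

prevMatched := false
scanner := bufio.NewScanner(strings.NewReader(input))
for scanner.Scan() {
    line := scanner.Text()
    if prevMatched {
        // The previous line matched the selector, so redact this one.
        line = target.ReplaceAllString(line, "${1}***HIDDEN***${2}")
    }
    prevMatched = selector.MatchString(line)
    fmt.Println(line)
}
// Output:
// {"name": "PASSWORD",
// "value": "***HIDDEN***"}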
type Redaction ¶ added in v0.9.34
type Redaction struct {
    RedactorName      string `json:"redactorName" yaml:"redactorName"`
    CharactersRemoved int    `json:"charactersRemoved" yaml:"charactersRemoved"`
    Line              int    `json:"line" yaml:"line"`
    File              string `json:"file" yaml:"file"`
    IsDefaultRedactor bool   `json:"isDefaultRedactor" yaml:"isDefaultRedactor"`
}
type RedactionList ¶ added in v0.9.34
type RedactionList struct {
    ByRedactor map[string][]Redaction `json:"byRedactor" yaml:"byRedactor"`
    ByFile     map[string][]Redaction `json:"byFile" yaml:"byFile"`
}
Redactions are indexed both by the file affected and by the name of the redactor
func GetRedactionList ¶ added in v0.9.34
func GetRedactionList() RedactionList
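Example (illustrative; assumes this package is imported as redact):

list := redact.GetRedactionList()
for file, redactions := range list.ByFile {
    for _, r := range redactions {
        fmt.Printf("%s:%d redacted by %q (%d characters removed)\n",
            file, r.Line, r.RedactorName, r.CharactersRemoved)
    }
}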
type RedactionMap ¶ added in v0.123.0
type RedactionMap struct {
    Tokens        map[string]string   `json:"tokens"`       // token -> original value
    Stats         RedactionStats      `json:"stats"`        // redaction statistics
    Timestamp     time.Time           `json:"timestamp"`    // when redaction was performed
    Profile       string              `json:"profile"`      // profile used
    BundleID      string              `json:"bundleId"`     // unique bundle identifier
    SecretRefs    map[string][]string `json:"secretRefs"`   // token -> list of file paths where found
    Duplicates    []DuplicateGroup    `json:"duplicates"`   // groups of identical secrets
    Correlations  []CorrelationGroup  `json:"correlations"` // correlated secret patterns
    EncryptionKey []byte              `json:"-"`            // encryption key (not serialized)
    IsEncrypted   bool                `json:"isEncrypted"`  // whether the mapping is encrypted
}
RedactionMap represents the mapping between tokens and original values
func LoadRedactionMapFile ¶ added in v0.123.0
func LoadRedactionMapFile(filePath string, encryptionKey []byte) (RedactionMap, error)
LoadRedactionMapFile loads and optionally decrypts a redaction mapping file
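Example (illustrative; the file name is hypothetical, and passing a nil key for an unencrypted map is an assumption):

m, err := redact.LoadRedactionMapFile("redaction-map.json", nil)
if err != nil {
    return err
}
for token, original := range m.Tokens {
    fmt.Printf("%s -> %s\n", token, original)
}
fmt.Println("unique secrets:", m.Stats.UniqueSecrets)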
type RedactionStats ¶ added in v0.123.0
type RedactionStats struct {
    TotalSecrets      int                  `json:"totalSecrets"`
    UniqueSecrets     int                  `json:"uniqueSecrets"`
    TokensGenerated   int                  `json:"tokensGenerated"`
    SecretsByType     map[string]int       `json:"secretsByType"`
    ProcessingTimeMs  int64                `json:"processingTimeMs"`
    FilesCovered      int                  `json:"filesCovered"`
    DuplicateCount    int                  `json:"duplicateCount"`
    CorrelationCount  int                  `json:"correlationCount"`
    NormalizationHits int                  `json:"normalizationHits"`
    CacheHits         int                  `json:"cacheHits"`
    CacheMisses       int                  `json:"cacheMisses"`
    FileCoverage      map[string]FileStats `json:"fileCoverage"`
}
RedactionStats contains statistics about the redaction process
type SingleLineRedactor ¶
type SingleLineRedactor struct {
    // contains filtered or unexported fields
}
func NewSingleLineRedactor ¶
func NewSingleLineRedactor(re LineRedactor, maskText, path, name string, isDefault bool) (*SingleLineRedactor, error)
func (*SingleLineRedactor) Redact ¶
Unlike the previous implementation using bufio.Scanner, this now uses LineReader to preserve the exact newline structure of the input file. Lines that originally ended with \n will have \n added back, while lines without \n (like the last line of a file without a trailing newline, or binary files) will not have \n added. This ensures binary files and text files without trailing newlines are not corrupted.
type TokenPrefix ¶ added in v0.123.0
type TokenPrefix string
TokenPrefix represents different types of secrets for token generation
const (
    TokenPrefixPassword   TokenPrefix = "PASSWORD"
    TokenPrefixAPIKey     TokenPrefix = "APIKEY"
    TokenPrefixDatabase   TokenPrefix = "DATABASE"
    TokenPrefixEmail      TokenPrefix = "EMAIL"
    TokenPrefixIP         TokenPrefix = "IP"
    TokenPrefixToken      TokenPrefix = "TOKEN"
    TokenPrefixSecret     TokenPrefix = "SECRET"
    TokenPrefixKey        TokenPrefix = "KEY"
    TokenPrefixCredential TokenPrefix = "CREDENTIAL"
    TokenPrefixAuth       TokenPrefix = "AUTH"
    TokenPrefixGeneric    TokenPrefix = "GENERIC"
)
type Tokenizer ¶ added in v0.123.0
type Tokenizer struct {
    // contains filtered or unexported fields
}
Tokenizer handles deterministic secret tokenization
func GetGlobalTokenizer ¶ added in v0.123.0
func GetGlobalTokenizer() *Tokenizer
GetGlobalTokenizer returns the global tokenizer instance
func NewTokenizer ¶ added in v0.123.0
func NewTokenizer(config TokenizerConfig) *Tokenizer
NewTokenizer creates a new tokenizer with the given configuration
func (*Tokenizer) GenerateRedactionMapFile ¶ added in v0.123.0
func (t *Tokenizer) GenerateRedactionMapFile(profile, outputPath string, encrypt bool) error
GenerateRedactionMapFile creates a redaction mapping file with optional encryption
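Example (illustrative; the profile name and output path are hypothetical):

t := redact.GetGlobalTokenizer()
if err := t.GenerateRedactionMapFile("default", "redaction-map.json", false); err != nil {
    return err
}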
func (*Tokenizer) GetBundleID ¶ added in v0.123.0
func (t *Tokenizer) GetBundleID() string
GetBundleID returns the unique bundle identifier
func (*Tokenizer) GetCacheStats ¶ added in v0.123.0
func (t *Tokenizer) GetCacheStats() CacheStats
GetCacheStats returns cache performance statistics
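For example, computing a hit rate from the returned counters:

stats := t.GetCacheStats()
if stats.Total > 0 {
    fmt.Printf("cache hit rate: %.1f%% (%d/%d)\n",
        float64(stats.Hits)/float64(stats.Total)*100, stats.Hits, stats.Total)
}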
func (*Tokenizer) GetDuplicateGroups ¶ added in v0.123.0
func (t *Tokenizer) GetDuplicateGroups() []DuplicateGroup
GetDuplicateGroups returns all duplicate secret groups
func (*Tokenizer) GetFileStats ¶ added in v0.123.0
func (t *Tokenizer) GetFileStats(filePath string) (FileStats, bool)
GetFileStats returns statistics for a specific file
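Example (illustrative; the path is hypothetical):

if fs, ok := t.GetFileStats("cluster-resources/pods.json"); ok {
    fmt.Printf("%s: %d secrets, %d tokens\n", fs.FilePath, fs.SecretsFound, fs.TokensUsed)
}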
func (*Tokenizer) GetRedactionMap ¶ added in v0.123.0
func (t *Tokenizer) GetRedactionMap(profile string) RedactionMap
GetRedactionMap returns the current redaction map
func (*Tokenizer) GetTokenCount ¶ added in v0.123.0
func (t *Tokenizer) GetTokenCount() int
GetTokenCount returns the number of tokens generated
func (*Tokenizer) IsEnabled ¶ added in v0.123.0
func (t *Tokenizer) IsEnabled() bool
func (*Tokenizer) Reset ¶ added in v0.123.0
func (t *Tokenizer) Reset()
Reset clears all tokens and mappings (useful for testing)
func (*Tokenizer) TokenizeValue ¶ added in v0.123.0
func (t *Tokenizer) TokenizeValue(value, context string) string
TokenizeValue generates or retrieves a token for a secret value
func (*Tokenizer) TokenizeValueWithPath ¶ added in v0.123.0
func (t *Tokenizer) TokenizeValueWithPath(value, context, filePath string) string
TokenizeValueWithPath generates or retrieves a token for a secret value with file path tracking
func (*Tokenizer) ValidateToken ¶ added in v0.123.0
func (t *Tokenizer) ValidateToken(token string) bool
ValidateToken checks if a token matches the expected format
type TokenizerConfig ¶ added in v0.123.0
type TokenizerConfig struct {
    // Enable tokenization (defaults to checking TROUBLESHOOT_TOKENIZATION env var)
    Enabled bool
    // Salt for deterministic token generation per bundle
    Salt []byte
    // Default token prefix when type cannot be determined
    DefaultPrefix TokenPrefix
    // Token format template (must include %s for prefix and %s for hash)
    TokenFormat string
    // Hash length in characters (default 6)
    HashLength int
}
TokenizerConfig holds configuration for the tokenizer
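Example (illustrative; the salt and token format below are made-up values, not documented defaults):

t := redact.NewTokenizer(redact.TokenizerConfig{
    Enabled:       true,
    Salt:          []byte("per-bundle-salt"), // illustrative; use a random per-bundle salt
    DefaultPrefix: redact.TokenPrefixGeneric,
    TokenFormat:   "***%s_%s***", // must include %s for prefix and %s for hash
    HashLength:    6,
})

tok := t.TokenizeValue("s3cr3t-value", "password") // same value and salt yield the same token
fmt.Println(tok, t.ValidateToken(tok))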
type YamlRedactor ¶ added in v0.9.31
type YamlRedactor struct {
    // contains filtered or unexported fields
}
func NewYamlRedactor ¶ added in v0.9.31
func NewYamlRedactor(yamlPath, filePath, name string) *YamlRedactor