secretscan

package
v3.32.0
Published: Jun 14, 2026 License: AGPL-3.0 Imports: 16 Imported by: 0

Documentation

Overview

Package secretscan provides binary file inspection and git history scanning helpers used by the secrets detection stage.

The high-fidelity secrets scanner needs to look for credentials everywhere they can hide: source files, compiled binaries, container images, embedded strings, document metadata (EXIF, PDF info, Office core properties), and the entire git history of a repository. This package provides the supporting primitives for those scans.

Constants

const (
	// PathBinaryPrefix marks a file path whose value is the printable-strings
	// extraction from a binary file.
	PathBinaryPrefix = "__binary_strings__/"
	// PathEXIFPrefix marks a file path whose value is the EXIF/IPTC/XMP
	// metadata of an image file.
	PathEXIFPrefix = "__exif__/"
	// PathGitHistoryPrefix marks a file path whose value is the contents of
	// a file as it appeared in a past git commit.
	PathGitHistoryPrefix = "__git_history__/"
)

These synthetic path prefixes are prepended to the file path when non-text file content is injected into the secrets scan input. Any rule match on a path that begins with one of them came from a binary, EXIF, or git history source, not from the live working tree.

Rules can use these prefixes to filter ("only fire on real source code") or to enrich reporting ("this came from .git history of a deleted file").
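As a rough illustration of the filtering idea, the sketch below classifies a scan-input key by its prefix. The constants are redeclared locally only so the snippet compiles on its own, and classifyPath is a hypothetical helper, not part of this package.

```go
package main

import "strings"

// These constants mirror the ones documented above; they are redeclared here
// only so the sketch is self-contained.
const (
	PathBinaryPrefix     = "__binary_strings__/"
	PathEXIFPrefix       = "__exif__/"
	PathGitHistoryPrefix = "__git_history__/"
)

// classifyPath is a hypothetical helper: it reports which synthetic source a
// scan-input key came from, or "worktree" when no synthetic prefix matches.
func classifyPath(key string) string {
	switch {
	case strings.HasPrefix(key, PathBinaryPrefix):
		return "binary"
	case strings.HasPrefix(key, PathEXIFPrefix):
		return "exif"
	case strings.HasPrefix(key, PathGitHistoryPrefix):
		return "git-history"
	default:
		return "worktree"
	}
}
```

A rule that should only fire on real source code could skip any key for which this helper returns something other than "worktree".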

const StringMin = 4

StringMin is the minimum length of a printable run we consider a "string" in the unix `strings` sense. 4 is the default of GNU `strings`.

Functions

func ExtractStrings

func ExtractStrings(data []byte, min int) string

ExtractStrings returns the printable ASCII / UTF-8 runs of at least min characters from data, joined with newlines. It faithfully re-implements the default behaviour of the unix `strings` command (that is, without --print-file-name) and is intended to surface credentials embedded in compiled binaries, container image layers, or opaque blobs.

We deliberately do NOT bound the output: a 50 MB binary can contain tens of thousands of strings. Callers that need to bound memory should pre-truncate the input. The return value is a string suitable for feeding to OPA as file content.
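The extraction described above can be sketched roughly as follows. This is an ASCII-only illustration, not the package's implementation: the name extractASCIIStrings and the choice to treat tab as printable are assumptions, and the real function also handles UTF-8 runs.

```go
package main

import "strings"

// extractASCIIStrings is an illustrative, ASCII-only stand-in for
// ExtractStrings. It collects runs of printable bytes that are at least min
// bytes long and joins them with newlines.
func extractASCIIStrings(data []byte, min int) string {
	var out []string
	start := -1 // index where the current printable run began, or -1

	// flush closes the current run at end and keeps it if it is long enough.
	flush := func(end int) {
		if start >= 0 && end-start >= min {
			out = append(out, string(data[start:end]))
		}
		start = -1
	}

	for i, b := range data {
		// Assumption: tab counts as printable, as it does in GNU strings.
		printable := b == '\t' || (b >= 0x20 && b <= 0x7e)
		if printable {
			if start < 0 {
				start = i
			}
			continue
		}
		flush(i)
	}
	flush(len(data))
	return strings.Join(out, "\n")
}
```

Because the output is unbounded, a caller that wants a memory ceiling would slice the input first, for example passing data[:limit] when len(data) exceeds a limit of its choosing.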

func IsBinary

func IsBinary(data []byte) bool

IsBinary reports whether data appears to be a binary file. A file is considered binary if it contains NUL bytes within the probe window at the start of the file or has a high ratio of non-printable characters.

The probe is intentionally small (8 KiB) so we do not have to read the whole file just to decide.
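A minimal sketch of this kind of heuristic is shown below. The 8 KiB probe comes from the text above; the 30% threshold, the name looksBinary, and the decision to count bytes at or above 0x80 as printable (so UTF-8 text is not flagged) are assumptions made for illustration.

```go
package main

// probeSize is the 8 KiB probe window mentioned above.
const probeSize = 8 * 1024

// maxNonPrintableRatio is an assumed threshold; the package's real value is
// not stated in this documentation.
const maxNonPrintableRatio = 0.30

// looksBinary is an illustrative stand-in for IsBinary.
func looksBinary(data []byte) bool {
	if len(data) > probeSize {
		data = data[:probeSize] // only inspect the probe window
	}
	if len(data) == 0 {
		return false
	}
	nonPrintable := 0
	for _, b := range data {
		switch {
		case b == 0:
			return true // a NUL byte is treated as decisive
		case b == '\t' || b == '\n' || b == '\r':
			// ordinary text whitespace, not counted
		case b < 0x20 || b == 0x7f:
			nonPrintable++
		}
	}
	return float64(nonPrintable)/float64(len(data)) > maxNonPrintableRatio
}
```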

Types

type BinaryInsight

type BinaryInsight struct {
	// StringsKey → printable-strings extraction (joined with newlines).
	StringsKey string
	StringsVal string
	// EXIFKey → EXIF/IPTC/XMP key=value extraction.
	EXIFKey string
	EXIFVal string
	// HadEXIF reports whether the file actually carried EXIF metadata.
	HadEXIF bool
}

BinaryInsight holds the synthetic file-content values that InspectBinary produced for a given file. Both extractions (strings and EXIF) are optional, so either key/value pair may be empty.

The keys are the synthetic paths that should be used in the secrets scan input.file_contents map. They are constructed from the original file path so that rules can correlate the artifact back to the source file.
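One way a caller might fold a BinaryInsight into the scan input is sketched below. The struct is redeclared locally so the snippet compiles on its own, and mergeInsight is a hypothetical helper; how the real pipeline treats empty values is an assumption.

```go
package main

// BinaryInsight mirrors the struct documented above so the sketch is
// self-contained.
type BinaryInsight struct {
	StringsKey, StringsVal string
	EXIFKey, EXIFVal       string
	HadEXIF                bool
}

// mergeInsight is a hypothetical helper that copies the populated
// extractions into the file_contents map and returns how many entries it
// added. Empty extractions are skipped.
func mergeInsight(fileContents map[string]string, in BinaryInsight) int {
	added := 0
	if in.StringsKey != "" && in.StringsVal != "" {
		fileContents[in.StringsKey] = in.StringsVal
		added++
	}
	if in.HadEXIF && in.EXIFKey != "" {
		fileContents[in.EXIFKey] = in.EXIFVal
		added++
	}
	return added
}
```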

func InspectBinary

func InspectBinary(path string, data []byte, opts InspectOptions) BinaryInsight

InspectBinary analyses a binary file and returns the synthetic file-content entries that should be added to the secrets scan input. The file extension is used to decide whether EXIF extraction applies — only images and TIFF containers are processed.

type EXIF

type EXIF struct {
	Make        string
	Model       string
	Software    string
	DateTime    string
	Artist      string
	Copyright   string
	ImageDescr  string
	UserComment string
	GPS         map[string]string
	Other       map[string]string
	// Raw holds the full serialised key=value list, useful for regex rules
	// that want to match against any metadata field.
	Raw string
}

EXIF is the subset of EXIF/IPTC/XMP metadata relevant to secrets detection. It is intentionally permissive: anything that can carry a credential is surfaced as a key, even if the underlying tag is not formally recognised.

func ExtractEXIF

func ExtractEXIF(data []byte) (exif EXIF, ok bool)

ExtractEXIF returns a best-effort EXIF extraction from data. It supports JPEG (APP1 EXIF, IPTC, XMP) and TIFF containers. Other formats return a zero-value EXIF and ok=false.

This is intentionally a minimal parser — we do not need the full spec, we need to surface the strings that secrets detection can match against.
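To give a sense of what a minimal parser involves, the sketch below locates the APP1 EXIF segment in a JPEG by walking its marker segments. It is not the package's parser: findEXIFPayload is a hypothetical name, it ignores IPTC and XMP, and it stops at the first start-of-scan marker.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// exifHeader is the identifier that opens an EXIF APP1 payload.
var exifHeader = []byte("Exif\x00\x00")

// findEXIFPayload is an illustrative helper. It returns the TIFF-structured
// bytes that follow the "Exif\0\0" header of the first APP1 segment.
func findEXIFPayload(data []byte) ([]byte, bool) {
	if len(data) < 4 || data[0] != 0xFF || data[1] != 0xD8 {
		return nil, false // no SOI marker, so not a JPEG
	}
	pos := 2
	for pos+4 <= len(data) {
		if data[pos] != 0xFF {
			return nil, false // lost marker synchronisation
		}
		marker := data[pos+1]
		if marker == 0xDA || marker == 0xD9 {
			return nil, false // start of scan or end of image reached
		}
		// The 2-byte big-endian length includes the length field itself.
		segLen := int(binary.BigEndian.Uint16(data[pos+2 : pos+4]))
		end := pos + 2 + segLen
		if segLen < 2 || end > len(data) {
			return nil, false // truncated or malformed segment
		}
		payload := data[pos+4 : end]
		if marker == 0xE1 && bytes.HasPrefix(payload, exifHeader) {
			return payload[len(exifHeader):], true
		}
		pos = end
	}
	return nil, false
}
```

The returned bytes would still need TIFF IFD parsing to recover individual tags such as Artist or UserComment.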

type GitHistoryEntry

type GitHistoryEntry struct {
	// Key is the synthetic path that should be added to the secrets scan
	// input.file_contents map. Format: "__git_history__/<short-sha>/<path>".
	Key string
	// Value is the file contents as a string.
	Value string
	// CommitHash is the full commit hash the version was taken from.
	CommitHash string
	// FilePath is the file path inside the commit tree.
	FilePath string
	// IsDelete reports whether this entry represents the deletion of a file
	// (i.e. the commit removed it). We still emit a sentinel so the rule
	// can correlate, but Value is empty.
	IsDelete bool
}

GitHistoryEntry is a single file version extracted from git history.
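The Key format documented in the struct can be built and taken apart as sketched below. Both helpers are hypothetical, and the 7-character short SHA is an assumption, since the documentation does not state the length.

```go
package main

import "strings"

// PathGitHistoryPrefix mirrors the package constant so the sketch is
// self-contained.
const PathGitHistoryPrefix = "__git_history__/"

// shortSHALen is an assumption; the documented format only says <short-sha>.
const shortSHALen = 7

// historyKey builds "__git_history__/<short-sha>/<path>".
func historyKey(commitHash, filePath string) string {
	short := commitHash
	if len(short) > shortSHALen {
		short = short[:shortSHALen]
	}
	return PathGitHistoryPrefix + short + "/" + filePath
}

// splitHistoryKey recovers the short SHA and file path from a key, so a
// report can point back at the commit and file.
func splitHistoryKey(key string) (sha, path string, ok bool) {
	if !strings.HasPrefix(key, PathGitHistoryPrefix) {
		return "", "", false
	}
	rest := strings.TrimPrefix(key, PathGitHistoryPrefix)
	return strings.Cut(rest, "/")
}
```

Keys produced by ScanLooseGitObjects use "loose/<sha>" after the prefix, so a splitter like this one would report the literal "loose" as the first component for those entries.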

func ScanGitHistory

func ScanGitHistory(repoRoot string, opts GitHistoryOptions) ([]GitHistoryEntry, error)

ScanGitHistory walks the git history of the repository rooted at repoRoot and returns a list of file versions suitable for the secrets scan input.

The walk is breadth-first and newest-first, bounded by opts. The function returns immediately with an empty slice and a nil error when repoRoot is not a git repository or when the .git directory is unreadable. Because err is nil in both cases, callers that want to treat "no history" as a fatal condition must check for it separately rather than rely on err.
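For callers that do want a missing repository to be fatal, one possible pre-check is sketched below. requireGitDir is a hypothetical helper, not part of this package, and it only tests that a .git entry is accessible; it does not validate the repository.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// requireGitDir is a hypothetical pre-check. It returns an error when
// repoRoot has no accessible .git entry. The entry may be a directory or,
// for worktrees and submodules, a regular file, so both are accepted.
func requireGitDir(repoRoot string) error {
	if _, err := os.Stat(filepath.Join(repoRoot, ".git")); err != nil {
		return fmt.Errorf("no git history at %q: %w", repoRoot, err)
	}
	return nil
}
```

Running a check like this before ScanGitHistory lets the caller separate "no repository" from "repository with nothing to report".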

func ScanLooseGitObjects

func ScanLooseGitObjects(repoRoot string, maxBytes int64) ([]GitHistoryEntry, error)

ScanLooseGitObjects is a fallback for the case where the repository's pack files are corrupt or unavailable: it walks the loose objects in .git/objects and decompresses any that are blobs. This is slower than ScanGitHistory but does not depend on the go-git walker.

The returned entries use the synthetic path "__git_history__/loose/<sha>". The file path inside the commit/tree cannot be reconstructed for a loose blob on its own, so we surface the blob as a single best-effort extraction.
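Decoding a single loose object might look like the sketch below. A loose object is a zlib stream whose inflated form is "<type> <size>", a NUL byte, then the payload. Both function names are hypothetical, and treating maxBytes as a cap on inflated bytes per object is an assumption about what that parameter means.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// decodeLooseBlob inflates one loose object and returns its payload when the
// object is a blob. At most maxBytes of inflated data are read, so a larger
// payload comes back truncated.
func decodeLooseBlob(compressed []byte, maxBytes int64) ([]byte, bool, error) {
	zr, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, false, fmt.Errorf("open zlib stream: %w", err)
	}
	defer zr.Close()

	raw, err := io.ReadAll(io.LimitReader(zr, maxBytes))
	if err != nil {
		return nil, false, fmt.Errorf("inflate object: %w", err)
	}
	// Split "<type> <size>" from the payload at the first NUL byte.
	header, payload, found := bytes.Cut(raw, []byte{0})
	if !found {
		return nil, false, fmt.Errorf("missing object header terminator")
	}
	if !bytes.HasPrefix(header, []byte("blob ")) {
		return nil, false, nil // commit, tree, or tag: skipped
	}
	return payload, true, nil
}

// encodeLooseObject builds a compressed loose object. It exists only so the
// sketch can be exercised without a real repository.
func encodeLooseObject(kind string, payload []byte) []byte {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	fmt.Fprintf(zw, "%s %d\x00", kind, len(payload))
	zw.Write(payload)
	zw.Close()
	return buf.Bytes()
}
```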

type GitHistoryOptions

type GitHistoryOptions struct {
	// MaxCommits caps how many commits are walked. Zero means no cap. The
	// walk is performed newest-first.
	MaxCommits int
	// MaxFileBytes caps the size of any single file extracted from history.
	// Files larger than this are skipped. Defaults to 4 MiB.
	MaxFileBytes int64
	// MaxFiles caps how many file versions are extracted in total. Zero
	// means no cap.
	MaxFiles int
}

GitHistoryOptions configures the git history scan.
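The cap semantics in the field comments can be sketched as follows. The struct is redeclared locally, withDefaults and underCap are hypothetical helpers, and applying the 4 MiB default when MaxFileBytes is zero or negative is an assumption about how the default is triggered.

```go
package main

// GitHistoryOptions mirrors the struct documented above so the sketch is
// self-contained.
type GitHistoryOptions struct {
	MaxCommits   int
	MaxFileBytes int64
	MaxFiles     int
}

// defaultMaxFileBytes is the documented 4 MiB default.
const defaultMaxFileBytes = 4 << 20

// withDefaults fills in MaxFileBytes when the caller left it unset.
func withDefaults(o GitHistoryOptions) GitHistoryOptions {
	if o.MaxFileBytes <= 0 {
		o.MaxFileBytes = defaultMaxFileBytes
	}
	return o
}

// underCap reports whether count is still below limit, where a limit of
// zero means "no cap" as documented for MaxCommits and MaxFiles.
func underCap(count, limit int) bool {
	return limit == 0 || count < limit
}
```

A walker built on these helpers would stop visiting commits once underCap(commitsSeen, opts.MaxCommits) turns false, and would skip any file larger than opts.MaxFileBytes.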

type InspectOptions

type InspectOptions struct {
	// IncludeStrings controls whether printable strings are extracted from
	// the binary. When false, only EXIF (for image files) is extracted.
	IncludeStrings bool
	// MinStringLength is the minimum length of a printable run to surface
	// from strings extraction. Defaults to 4 (unix `strings` default).
	MinStringLength int
}

InspectOptions configures InspectBinary.
