Documentation ¶
Overview ¶
Package secretscan provides binary file inspection and git history scanning helpers used by the secrets detection stage.
The high-fidelity secrets scanner needs to look for credentials everywhere they can hide: source files, compiled binaries, container images, embedded strings, document metadata (EXIF, PDF info, Office core properties), and the entire git history of a repository. This package provides the supporting primitives for those scans.
Index ¶
Constants ¶
const (
	// PathBinaryPrefix marks a file path whose value is the printable-strings
	// extraction from a binary file.
	PathBinaryPrefix = "__binary_strings__/"
	// PathEXIFPrefix marks a file path whose value is the EXIF/IPTC/XMP
	// metadata of an image file.
	PathEXIFPrefix = "__exif__/"
	// PathGitHistoryPrefix marks a file path whose value is the contents of
	// a file as it appeared in a past git commit.
	PathGitHistoryPrefix = "__git_history__/"
)
SyntheticPathPrefixes are the prefixes we use when injecting non-text file content into the secrets scan input. Any rule match on a path that begins with one of these prefixes came from a binary, EXIF, or git history source, not from the live working tree.
Rules can use these prefixes to filter ("only fire on real source code") or to enrich reporting ("this came from .git history of a deleted file").
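As a rough illustration of the filtering described above, the following sketch classifies a scan path by its synthetic prefix. The constants are copied from this page; classifyPath is a hypothetical helper that is not part of the package, and the assumption that the remainder after the prefix is the original path is ours.

```go
package main

import (
	"fmt"
	"strings"
)

// Constants copied from the package documentation.
const (
	PathBinaryPrefix     = "__binary_strings__/"
	PathEXIFPrefix       = "__exif__/"
	PathGitHistoryPrefix = "__git_history__/"
)

// classifyPath is a hypothetical helper (not part of the package). It reports
// which kind of source a scan path came from and returns the path with any
// synthetic prefix removed.
func classifyPath(p string) (kind, rest string) {
	switch {
	case strings.HasPrefix(p, PathBinaryPrefix):
		return "binary", strings.TrimPrefix(p, PathBinaryPrefix)
	case strings.HasPrefix(p, PathEXIFPrefix):
		return "exif", strings.TrimPrefix(p, PathEXIFPrefix)
	case strings.HasPrefix(p, PathGitHistoryPrefix):
		return "git_history", strings.TrimPrefix(p, PathGitHistoryPrefix)
	default:
		// No synthetic prefix: the path refers to the live working tree.
		return "worktree", p
	}
}

func main() {
	for _, p := range []string{
		"cmd/main.go",
		"__binary_strings__/bin/tool",
		"__exif__/img/logo.jpg",
		"__git_history__/0123456/config/.env",
	} {
		kind, rest := classifyPath(p)
		fmt.Printf("%-12s %s\n", kind, rest)
	}
}
```

A rule that should only fire on real source code would keep matches whose kind is "worktree" and drop the rest.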
const StringMin = 4
StringMin is the minimum length of a printable run we consider a "string" in the unix `strings` sense. 4 is the default of GNU `strings`.
Variables ¶
This section is empty.
Functions ¶
func ExtractStrings ¶
ExtractStrings returns the printable ASCII / UTF-8 runs of at least min characters from data, joined with newlines. This is a re-implementation of the default behaviour of the unix `strings` command (no file names are printed) and is intended to surface credentials embedded in compiled binaries, container image layers, or opaque blobs.
We deliberately do NOT bound the output: a 50 MB binary can contain tens of thousands of strings. Callers that need to bound memory should pre-truncate the input. The return value is a string suitable for feeding to OPA as file content.
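The extraction described above can be sketched as a single pass over the bytes. This is a minimal illustration, not the package's implementation: printableRuns is a hypothetical stand-in for ExtractStrings, and it only recognises printable ASCII and tab, whereas the real function also handles UTF-8 runs.

```go
package main

import (
	"fmt"
	"strings"
)

// printableRuns is a hypothetical stand-in for ExtractStrings. It collects
// runs of printable ASCII (0x20..0x7e) and tab that are at least min bytes
// long and joins them with newlines. UTF-8 is out of scope for this sketch.
func printableRuns(data []byte, min int) string {
	if min <= 0 {
		min = 4 // mirrors StringMin
	}
	var out []string
	start := -1 // index where the current run began, or -1 when not in a run

	// flush closes the current run at end and keeps it if it is long enough.
	flush := func(end int) {
		if start >= 0 && end-start >= min {
			out = append(out, string(data[start:end]))
		}
		start = -1
	}

	for i, b := range data {
		if b == '\t' || (b >= 0x20 && b <= 0x7e) {
			if start < 0 {
				start = i
			}
			continue
		}
		flush(i)
	}
	flush(len(data)) // a run may extend to the end of the input
	return strings.Join(out, "\n")
}

func main() {
	blob := []byte("\x7fELF\x00\x01AKIAEXAMPLEKEY\x00ab\x00token=abc123\xff")
	fmt.Println(printableRuns(blob, 4))
}
```

Because the output is unbounded, a caller concerned about memory would slice data before calling, as the text above recommends.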
func IsBinary ¶
IsBinary reports whether data appears to be a binary file. Data is considered binary if the leading probe window contains a NUL byte or a high ratio of non-printable characters.
The probe is intentionally small (8 KiB) so we do not have to read the whole file just to decide.
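A heuristic of this shape might look like the sketch below. looksBinary is a hypothetical name, and the 30% threshold is an assumption on our part: the documentation only says "a high ratio". Only the 8 KiB probe size comes from the text above.

```go
package main

import "fmt"

// probeSize is the 8 KiB probe window mentioned in the documentation.
const probeSize = 8 * 1024

// maxNonPrintableRatio is an assumed threshold; the package's actual value
// is not documented.
const maxNonPrintableRatio = 0.30

// looksBinary is a hypothetical sketch of the IsBinary heuristic. It inspects
// at most probeSize bytes and reports true on any NUL byte or when control
// characters exceed maxNonPrintableRatio of the probe.
func looksBinary(data []byte) bool {
	probe := data
	if len(probe) > probeSize {
		probe = probe[:probeSize]
	}
	if len(probe) == 0 {
		return false // empty input is treated as text
	}
	nonPrintable := 0
	for _, b := range probe {
		switch {
		case b == 0:
			return true // NUL essentially never appears in text files
		case b == '\n' || b == '\r' || b == '\t':
			// ordinary text whitespace, not counted
		case b < 0x20 || b == 0x7f:
			nonPrintable++
		}
		// Bytes >= 0x80 are not counted because they may be part of UTF-8 text.
	}
	return float64(nonPrintable)/float64(len(probe)) > maxNonPrintableRatio
}

func main() {
	fmt.Println(looksBinary([]byte("package main\n")))
	fmt.Println(looksBinary([]byte("hello\x00world")))
}
```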
Types ¶
type BinaryInsight ¶
type BinaryInsight struct {
	// StringsKey → printable-strings extraction (newlines joined).
	StringsKey string
	StringsVal string
	// EXIFKey → EXIF/IPTC/XMP key=value extraction.
	EXIFKey string
	EXIFVal string
	// HadEXIF reports whether the file actually carried EXIF metadata.
	HadEXIF bool
}
BinaryInsight holds the synthetic file-content values that InspectBinary produced for a given file. Both entries (strings and EXIF) are optional, so either key/value pair may be empty.
The keys are the synthetic paths that should be used in the secrets scan input.file_contents map. They are constructed from the original file path so that rules can correlate the artifact back to the source file.
func InspectBinary ¶
func InspectBinary(path string, data []byte, opts InspectOptions) BinaryInsight
InspectBinary analyses a binary file and returns the synthetic file-content entries that should be added to the secrets scan input. The file extension is used to decide whether EXIF extraction applies — only images and TIFF containers are processed.
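To show how a caller might consume the result, the sketch below merges a BinaryInsight into a file_contents map. The struct is re-declared locally so the example compiles on its own; mergeInsight is a hypothetical helper, and skipping empty entries is our assumption about sensible caller behaviour.

```go
package main

import "fmt"

// BinaryInsight mirrors the struct documented on this page so that the
// sketch is self-contained.
type BinaryInsight struct {
	StringsKey string
	StringsVal string
	EXIFKey    string
	EXIFVal    string
	HadEXIF    bool
}

// mergeInsight is a hypothetical helper (not part of the package). It copies
// the populated entries of in into fileContents and returns how many entries
// were added.
func mergeInsight(fileContents map[string]string, in BinaryInsight) int {
	added := 0
	if in.StringsKey != "" && in.StringsVal != "" {
		fileContents[in.StringsKey] = in.StringsVal
		added++
	}
	// Only add the EXIF entry when the file actually carried metadata.
	if in.HadEXIF && in.EXIFKey != "" {
		fileContents[in.EXIFKey] = in.EXIFVal
		added++
	}
	return added
}

func main() {
	contents := map[string]string{"cmd/main.go": "package main"}
	insight := BinaryInsight{
		StringsKey: "__binary_strings__/bin/tool",
		StringsVal: "token=abc123",
	}
	n := mergeInsight(contents, insight)
	fmt.Println(n, len(contents))
}
```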
type EXIF ¶
type EXIF struct {
	Make        string
	Model       string
	Software    string
	DateTime    string
	Artist      string
	Copyright   string
	ImageDescr  string
	UserComment string
	GPS         map[string]string
	Other       map[string]string
	// Raw holds the full serialised key=value list, useful for regex rules
	// that want to match against any metadata field.
	Raw string
}
EXIF is the subset of EXIF/IPTC/XMP metadata relevant to secrets detection. It is intentionally permissive: anything that can carry a credential is surfaced as a key, even if the underlying tag is not formally recognised.
func ExtractEXIF ¶
ExtractEXIF returns a best-effort EXIF extraction from data. It supports JPEG (APP1 EXIF, IPTC, XMP) and TIFF containers. Other formats return a zero-value EXIF and ok=false.
This is intentionally a minimal parser — we do not need the full spec, we need to surface the strings that secrets detection can match against.
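As an illustration of how small such a parser can start, the sketch below locates the APP1 EXIF segment in a JPEG byte stream. It is not the package's parser: findEXIFSegment and segment are hypothetical helpers, and the sketch ignores fill bytes and standalone markers that a production parser would need to tolerate.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// exifHeader is the identifier that opens an EXIF payload inside APP1.
var exifHeader = []byte("Exif\x00\x00")

// findEXIFSegment is a hypothetical helper. It walks the JPEG segment list
// and returns the TIFF-formatted payload of the first APP1 EXIF segment.
func findEXIFSegment(data []byte) ([]byte, bool) {
	if len(data) < 4 || data[0] != 0xFF || data[1] != 0xD8 {
		return nil, false // missing SOI marker: not a JPEG
	}
	pos := 2
	for pos+4 <= len(data) {
		if data[pos] != 0xFF {
			return nil, false // lost sync with the segment structure
		}
		marker := data[pos+1]
		if marker == 0xDA || marker == 0xD9 {
			return nil, false // start of scan or end of image: no more metadata
		}
		// The 16-bit big-endian length counts its own two bytes.
		size := int(binary.BigEndian.Uint16(data[pos+2 : pos+4]))
		if size < 2 || pos+2+size > len(data) {
			return nil, false // truncated or malformed segment
		}
		body := data[pos+4 : pos+2+size]
		if marker == 0xE1 && bytes.HasPrefix(body, exifHeader) {
			return body[len(exifHeader):], true
		}
		pos += 2 + size
	}
	return nil, false
}

// segment builds a JPEG segment with the given marker; used for the demo.
func segment(marker byte, body []byte) []byte {
	seg := []byte{0xFF, marker, 0, 0}
	binary.BigEndian.PutUint16(seg[2:], uint16(len(body)+2))
	return append(seg, body...)
}

func main() {
	jpeg := []byte{0xFF, 0xD8}
	jpeg = append(jpeg, segment(0xE0, []byte("JFIF\x00"))...)
	jpeg = append(jpeg, segment(0xE1, append([]byte("Exif\x00\x00"), "II*\x00"...))...)
	jpeg = append(jpeg, 0xFF, 0xD9)

	payload, ok := findEXIFSegment(jpeg)
	fmt.Printf("%q %v\n", payload, ok)
}
```

The returned payload is a TIFF structure, which is why a single tag reader can serve both JPEG and TIFF inputs.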
type GitHistoryEntry ¶
type GitHistoryEntry struct {
	// Key is the synthetic path that should be added to the secrets scan
	// input.file_contents map. Format: "__git_history__/<short-sha>/<path>".
	Key string
	// Value is the file contents as a string.
	Value string
	// CommitHash is the full commit hash the version was taken from.
	CommitHash string
	// FilePath is the file path inside the commit tree.
	FilePath string
	// IsDelete reports whether this entry represents the deletion of a file
	// (i.e. the commit removed it). We still emit a sentinel so the rule
	// can correlate, but Value is empty.
	IsDelete bool
}
GitHistoryEntry is a single file version extracted from git history.
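The Key format documented above can be built and parsed as in the following sketch. historyKey and parseHistoryKey are hypothetical helpers, and the 7-character short SHA is an assumption: the documentation does not state the abbreviation length.

```go
package main

import (
	"fmt"
	"strings"
)

// PathGitHistoryPrefix is copied from the package constants.
const PathGitHistoryPrefix = "__git_history__/"

// shortSHALen is an assumed abbreviation length; the package may use another.
const shortSHALen = 7

// historyKey is a hypothetical helper that builds a key of the form
// "__git_history__/<short-sha>/<path>".
func historyKey(commitHash, filePath string) string {
	short := commitHash
	if len(short) > shortSHALen {
		short = short[:shortSHALen]
	}
	return PathGitHistoryPrefix + short + "/" + filePath
}

// parseHistoryKey is the hypothetical inverse: it splits a key back into the
// short SHA and the file path, reporting ok=false for any other shape.
func parseHistoryKey(key string) (shortSHA, filePath string, ok bool) {
	if !strings.HasPrefix(key, PathGitHistoryPrefix) {
		return "", "", false
	}
	rest := key[len(PathGitHistoryPrefix):]
	i := strings.IndexByte(rest, '/')
	if i <= 0 || i == len(rest)-1 {
		return "", "", false // missing SHA or missing path
	}
	return rest[:i], rest[i+1:], true
}

func main() {
	key := historyKey("0123456789abcdef0123456789abcdef01234567", "config/.env")
	fmt.Println(key)
	fmt.Println(parseHistoryKey(key))
}
```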
func ScanGitHistory ¶
func ScanGitHistory(repoRoot string, opts GitHistoryOptions) ([]GitHistoryEntry, error)
ScanGitHistory walks the git history of the repository rooted at repoRoot and returns a list of file versions suitable for the secrets scan input.
The walk is breadth-first and newest-first, bounded by opts. The function returns immediately with an empty slice and a nil error when repoRoot is not a git repository or when the .git directory is unreadable. Because err is nil in both cases, callers that want to treat "no history" as a fatal condition must detect it themselves, for example by checking for an empty result.
func ScanLooseGitObjects ¶
func ScanLooseGitObjects(repoRoot string, maxBytes int64) ([]GitHistoryEntry, error)
ScanLooseGitObjects is a fallback for the case where the repository's pack files are corrupt or unavailable: it walks the loose objects in .git/objects and decompresses any that are blobs. This is slower than ScanGitHistory but does not depend on the go-git walker.
The returned entries use the synthetic path "__git_history__/loose/<sha>". The file path inside the commit/tree cannot be reconstructed for a loose blob on its own, so we surface the blob as a single best-effort extraction.
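For context, a loose git object is a zlib stream whose inflated form is a header of the shape "<type> <size>" followed by a NUL byte and the content. The sketch below decodes one such object and keeps it only if it is a blob within a size cap. decodeLooseBlob and encodeLooseObject are hypothetical helpers written for illustration, not the package's code.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"strconv"
)

// headerAllowance bounds the "<type> <size>\x00" header for blobs.
const headerAllowance = 32

// decodeLooseBlob is a hypothetical helper. It inflates a loose object and
// returns its content when the object is a blob no larger than maxBytes.
// Non-blobs and oversized blobs yield ok=false with a nil error.
func decodeLooseBlob(raw []byte, maxBytes int64) ([]byte, bool, error) {
	zr, err := zlib.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, false, err
	}
	defer zr.Close()

	// Cap the read so an oversized object is never inflated in full.
	inflated, err := io.ReadAll(io.LimitReader(zr, maxBytes+headerAllowance))
	if err != nil {
		return nil, false, err
	}
	nul := bytes.IndexByte(inflated, 0)
	if nul < 0 {
		return nil, false, fmt.Errorf("loose object: missing header terminator")
	}
	header := inflated[:nul]
	sp := bytes.IndexByte(header, ' ')
	if sp < 0 {
		return nil, false, fmt.Errorf("loose object: malformed header %q", header)
	}
	size, err := strconv.ParseInt(string(header[sp+1:]), 10, 64)
	if err != nil {
		return nil, false, fmt.Errorf("loose object: bad size: %w", err)
	}
	if string(header[:sp]) != "blob" || size > maxBytes {
		return nil, false, nil // skipped, not an error
	}
	body := inflated[nul+1:]
	if int64(len(body)) != size {
		return nil, false, fmt.Errorf("loose object: size mismatch")
	}
	return body, true, nil
}

// encodeLooseObject builds a loose object in memory; used for the demo.
func encodeLooseObject(kind string, body []byte) []byte {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	fmt.Fprintf(zw, "%s %d\x00", kind, len(body))
	zw.Write(body)
	zw.Close()
	return buf.Bytes()
}

func main() {
	raw := encodeLooseObject("blob", []byte("API_KEY=example\n"))
	content, ok, err := decodeLooseBlob(raw, 1<<20)
	fmt.Printf("%q %v %v\n", content, ok, err)
}
```

On disk, the object's SHA is the two-character directory name under .git/objects joined with the file name, which is what the "loose/<sha>" key would carry.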
type GitHistoryOptions ¶
type GitHistoryOptions struct {
	// MaxCommits caps how many commits are walked. Zero means no cap. The
	// walk is performed newest-first.
	MaxCommits int
	// MaxFileBytes caps the size of any single file extracted from history.
	// Files larger than this are skipped. Defaults to 4 MiB.
	MaxFileBytes int64
	// MaxFiles caps how many file versions are extracted in total. Zero
	// means no cap.
	MaxFiles int
}
GitHistoryOptions configures the git history scan.
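The interaction of the three caps can be sketched over an in-memory commit list. The options struct is re-declared locally so the example is self-contained; fileVersion and selectVersions are hypothetical, and treating a non-positive MaxFileBytes as "use the 4 MiB default" is our reading of the field comment rather than documented behaviour.

```go
package main

import "fmt"

// GitHistoryOptions mirrors the struct documented on this page.
type GitHistoryOptions struct {
	MaxCommits   int
	MaxFileBytes int64
	MaxFiles     int
}

// defaultMaxFileBytes is the documented 4 MiB default.
const defaultMaxFileBytes = 4 << 20

// fileVersion is a hypothetical stand-in for a file found in a commit.
type fileVersion struct {
	Path string
	Size int64
}

// selectVersions is a hypothetical helper showing how the caps combine.
// commits must be ordered newest-first; each element lists that commit's files.
func selectVersions(commits [][]fileVersion, opts GitHistoryOptions) []fileVersion {
	maxBytes := opts.MaxFileBytes
	if maxBytes <= 0 {
		maxBytes = defaultMaxFileBytes // assumed: unset means default
	}
	var out []fileVersion
	for i, files := range commits {
		if opts.MaxCommits > 0 && i >= opts.MaxCommits {
			break // commit cap reached; zero means no cap
		}
		for _, f := range files {
			if f.Size > maxBytes {
				continue // oversized files are skipped, not truncated
			}
			if opts.MaxFiles > 0 && len(out) >= opts.MaxFiles {
				return out // total file cap reached
			}
			out = append(out, f)
		}
	}
	return out
}

func main() {
	commits := [][]fileVersion{
		{{"config/.env", 120}, {"assets/video.bin", 50 << 20}},
		{{"config/.env", 98}},
		{{"README.md", 2048}},
	}
	got := selectVersions(commits, GitHistoryOptions{MaxCommits: 2})
	fmt.Println(len(got), got)
}
```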
type InspectOptions ¶
type InspectOptions struct {
	// IncludeStrings controls whether printable strings are extracted from
	// the binary. When false, only EXIF (for image files) is extracted.
	IncludeStrings bool
	// MinStringLength is the minimum length of a printable run to surface
	// from strings extraction. Defaults to 4 (unix `strings` default).
	MinStringLength int
}
InspectOptions configures InspectBinary.