hashutil

package
v0.2.0
Warning

This package is not in the latest version of its module.

Published: May 11, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func HashBytes

func HashBytes(b []byte) [32]byte

HashBytes returns the raw 32-byte digest of b.

func HashFile

func HashFile(path string) (string, error)

HashFile is a convenience wrapper that opens path, streams it through HashReader, and returns the hex digest. If opening fails, it returns the underlying os error unwrapped. Do not wrap it: the current callers of oracle.ContentHash check os.IsNotExist on the returned error.

func HashHex

func HashHex(b []byte) string

HashHex returns the digest of b as a lowercase hex string. Use this for on-disk cache keys, fingerprints, and any user-visible identifier.

func HashReader

func HashReader(r io.Reader) (string, error)

HashReader streams r into the active hasher and returns the lowercase hex digest. Used by the oracle content-hash path to avoid reading whole files into memory.

func HasherName

func HasherName() string

HasherName returns the Name() of the active ContentHasher. Embedded in cache version tokens so an algorithm swap auto-invalidates every cache that keys on content hash.
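A sketch of how a cache might embed the hasher name in its version token. `cacheVersionToken`, `usable`, the `parse-cache/vN/...` layout, and the names `algo-a` / `algo-b` are all hypothetical; the point is only that any change to the name turns every previously stored entry into a miss.

```go
package main

import "fmt"

// cacheVersionToken is a hypothetical helper: a cache folds its own schema
// version and the hasher name into one token stored alongside its entries.
func cacheVersionToken(schema int, hasherName string) string {
	return fmt.Sprintf("parse-cache/v%d/%s", schema, hasherName)
}

// usable reports whether entries written under stored can be reused. Any
// change to the schema or the hasher name makes every old entry a miss.
func usable(stored string, schema int, hasherName string) bool {
	return stored == cacheVersionToken(schema, hasherName)
}

func main() {
	// "algo-a" and "algo-b" are made-up names standing in for HasherName().
	tok := cacheVersionToken(3, "algo-a")
	fmt.Println(tok)                      // parse-cache/v3/algo-a
	fmt.Println(usable(tok, 3, "algo-a")) // true
	fmt.Println(usable(tok, 3, "algo-b")) // false: hasher swapped
}
```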

func ResetDefault

func ResetDefault()

ResetDefault clears the shared Memo. The CLI calls this at the start of each scan invocation so memoized hashes do not bleed into a subsequent run where files may have changed.

func SetContentHasher

func SetContentHasher(h ContentHasher)

SetContentHasher swaps the active hasher. It is intended for benchmarks and future algorithm experiments; production code should leave the default (xxh3-128 widened to 256 bits) in place. The swap is NOT safe for concurrent use: call it during process init, or from TestMain or a test's setup code, before any hashing runs.

Types

type ContentHasher

type ContentHasher interface {
	// Name is a stable identifier embedded in cache version tokens so
	// an algorithm swap automatically invalidates prior entries.
	Name() string
	// Sum returns the 32-byte digest of b.
	Sum(b []byte) [32]byte
	// New returns a streaming hash.Hash whose Sum(nil) matches
	// Sum(b)[:] when fed the same bytes. Size() is 32.
	New() hash.Hash
}

ContentHasher abstracts the hash function used for content fingerprints (parse cache, cross-file cache, oracle cache, incremental cache). Implementations must return a 32-byte digest so that existing layouts such as the [32]byte FileHash in store.Key keep working.
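As an illustration of the contract, here is a hypothetical SHA-256 implementation (SHA-256 already yields exactly 32 bytes, so it needs no widening), plus a check of the two documented invariants: `Size()` is 32, and the streaming `Sum(nil)` equals the one-shot `Sum(b)[:]`. `fileKey` is a made-up stand-in for `store.Key`.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"hash"
)

// ContentHasher mirrors the interface documented above.
type ContentHasher interface {
	Name() string
	Sum(b []byte) [32]byte
	New() hash.Hash
}

// sha256Hasher is an illustrative implementation, not the package default.
type sha256Hasher struct{}

var _ ContentHasher = sha256Hasher{} // compile-time interface check

func (sha256Hasher) Name() string          { return "sha256" }
func (sha256Hasher) Sum(b []byte) [32]byte { return sha256.Sum256(b) }
func (sha256Hasher) New() hash.Hash        { return sha256.New() }

// fileKey is a hypothetical stand-in for store.Key; its fixed-size array is
// why every implementation must produce exactly 32 bytes.
type fileKey struct {
	Path     string
	FileHash [32]byte
}

// checkContract verifies the documented invariants for one input.
func checkContract(h ContentHasher, b []byte) bool {
	sum := h.Sum(b)
	s := h.New()
	s.Write(b)
	return s.Size() == 32 && bytes.Equal(s.Sum(nil), sum[:])
}

func main() {
	var h ContentHasher = sha256Hasher{}
	k := fileKey{Path: "main.go", FileHash: h.Sum([]byte("package main"))}
	fmt.Printf("%s %x\n", k.Path, k.FileHash[:4])
	fmt.Println(checkContract(h, []byte("package main"))) // true
}
```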

The default is xxh3-128 widened to 256 bits (two 128-bit digests computed under distinct seeds). It is non-cryptographic but SIMD-accelerated on both amd64 (AVX / AVX-512) and arm64 (NEON), so it runs roughly 5× faster than hardware SHA-256 on Apple Silicon and roughly 10× faster than software SHA-256 on typical Linux CI hardware. The interface leaves room for swapping in blake3, wyhash, or a successor via SetContentHasher without disturbing any callers.

func Hasher

func Hasher() ContentHasher

Hasher returns the currently installed ContentHasher. Subsystems that need streaming semantics (e.g. oracle closureFingerprint) call Hasher().New() instead of importing a specific algorithm.
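A sketch of a streaming caller in the spirit of closureFingerprint. Its real logic is not documented here, so `closureFingerprint` below is a hypothetical reconstruction and `hasher()` returns a SHA-256 stand-in for `Hasher()`. It sorts names for determinism and length-prefixes each part so that different file sets cannot produce the same byte stream.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash"
	"sort"
)

// streamingHasher is the slice of ContentHasher this sketch needs.
type streamingHasher interface{ New() hash.Hash }

type sha256Hasher struct{}

func (sha256Hasher) New() hash.Hash { return sha256.New() }

// hasher stands in for hashutil.Hasher(); SHA-256 is an assumption.
func hasher() streamingHasher { return sha256Hasher{} }

// closureFingerprint hashes a set of files (name -> content) into one
// digest through a single streaming hash.
func closureFingerprint(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for n := range files {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for stability

	h := hasher().New()
	var lenBuf [8]byte
	for _, n := range names {
		for _, part := range [][]byte{[]byte(n), files[n]} {
			// Length-prefix each part so ("a","bc") and ("ab","c") differ.
			binary.LittleEndian.PutUint64(lenBuf[:], uint64(len(part)))
			h.Write(lenBuf[:])
			h.Write(part)
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := closureFingerprint(map[string][]byte{"a": []byte("bc")})
	b := closureFingerprint(map[string][]byte{"ab": []byte("c")})
	fmt.Println(a != b) // true
}
```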

type Memo

type Memo struct {
	// contains filtered or unexported fields
}

Memo memoizes file content hashes for the duration of a single run. The cache is keyed by (path, size, mtime); a file whose stat fingerprint changes is re-hashed on the next lookup. A Memo is safe for concurrent use.

Memo is deliberately scoped to a single invocation: callers build one, pass it to every subsystem that would otherwise hash the same file independently, and discard it when the run completes. Persisting a Memo across runs could return stale hashes, because a file can be modified between invocations without changing its (path, size, mtime) triple.

A nil *Memo is a valid, disabled memo: every method falls through to the unmemoized hashutil helpers and no entries are retained. This keeps callers that have not yet been wired to a shared Memo working unchanged.
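The nil-receiver pattern behind this can be sketched as follows. The content-keyed `memo`, `parseFile`, and the SHA-256 `hashHex` are hypothetical simplifications; the technique is that each pointer-receiver method checks `m == nil` first and falls through to the plain helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

func hashHex(b []byte) string {
	s := sha256.Sum256(b)
	return hex.EncodeToString(s[:])
}

// memo is a hypothetical, content-keyed simplification of hashutil.Memo.
type memo struct {
	mu      sync.Mutex
	entries map[string]string
}

func newMemo() *memo { return &memo{entries: make(map[string]string)} }

// hashContent works on a nil *memo: it computes the digest and stores nothing.
func (m *memo) hashContent(key string, b []byte) string {
	if m == nil {
		return hashHex(b)
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if d, ok := m.entries[key]; ok {
		return d
	}
	d := hashHex(b)
	m.entries[key] = d
	return d
}

// count reports retained entries; a nil memo retains none.
func (m *memo) count() int {
	if m == nil {
		return 0
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.entries)
}

// parseFile is a hypothetical caller that accepts an optional memo.
func parseFile(m *memo, path string, src []byte) string {
	return m.hashContent(path, src)
}

func main() {
	var disabled *memo // not yet wired to a shared memo
	enabled := newMemo()
	src := []byte("package main")
	fmt.Println(parseFile(disabled, "a.go", src) == parseFile(enabled, "a.go", src)) // true
	fmt.Println(disabled.count(), enabled.count())                                   // 0 1
}
```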

func Default

func Default() *Memo

Default returns the process-scoped shared Memo. Subsystems that need to cooperate on file hashing should use Default() rather than instantiating a private Memo, so all redundant content-hash computations within a single run collapse to one per unique file.
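A sketch of the process-scoped default together with ResetDefault, under stated assumptions: the memo is keyed by path only (the real one keys on stat data), counts misses, and hashes with SHA-256. The lowercase helpers and `runScan` are hypothetical. Two subsystems sharing `Default()` pay for one hash per file, and the reset at the start of each scan keeps one run's entries from leaking into the next.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// memo is a hypothetical path-keyed simplification that counts misses.
type memo struct {
	mu      sync.Mutex
	entries map[string]string
	misses  int
}

func newMemo() *memo { return &memo{entries: make(map[string]string)} }

func (m *memo) hashContent(path string, b []byte) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if d, ok := m.entries[path]; ok {
		return d
	}
	m.misses++
	s := sha256.Sum256(b)
	d := hex.EncodeToString(s[:])
	m.entries[path] = d
	return d
}

// clear drops entries and counters, mirroring Memo.Clear.
func (m *memo) clear() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries = make(map[string]string)
	m.misses = 0
}

var defaultMemo = newMemo()

func Default() *memo { return defaultMemo }
func ResetDefault()  { defaultMemo.clear() }

// runScan simulates one CLI invocation in which two subsystems hash one file.
func runScan(src []byte) int {
	ResetDefault()                        // start of each scan
	Default().hashContent("main.go", src) // e.g. the parse cache: a miss
	Default().hashContent("main.go", src) // e.g. the oracle cache: a hit
	return Default().misses
}

func main() {
	fmt.Println(runScan([]byte("v1"))) // 1
	fmt.Println(runScan([]byte("v2"))) // 1
}
```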

func NewMemo

func NewMemo() *Memo

NewMemo returns an empty Memo.

func (*Memo) Clear

func (m *Memo) Clear()

Clear drops all entries and resets hit/miss counters.

func (*Memo) HashContent

func (m *Memo) HashContent(path string, content []byte) string

HashContent returns the hex digest of content under the active ContentHasher. If path is non-empty and a stat of path succeeds, it also memoizes the result, so later m.HashFile calls for the same unchanged path return that digest without re-reading or re-hashing. Use this from callers that already hold the file bytes (e.g. after reading the file once into memory for parsing).

func (*Memo) HashFile

func (m *Memo) HashFile(path string, provider func() ([]byte, error)) (string, error)

HashFile returns the lowercase hex digest of the file at path using the active ContentHasher. The returned digest is memoized against the file's (size, mtime); a later call for the same unchanged file hits the cache. If provider is non-nil and a hash actually needs to be computed, it is invoked to obtain the bytes instead of re-reading from disk. This is useful for callers that already have the file content in memory (e.g. the parse and cross-file caches).

A nil *Memo falls through to an unmemoized hash.

func (*Memo) HashFileRaw

func (m *Memo) HashFileRaw(path string, provider func() ([]byte, error)) ([32]byte, error)

HashFileRaw is like HashFile but returns the raw 32-byte digest.

func (*Memo) Len

func (m *Memo) Len() int

Len returns the number of cached entries.

func (*Memo) Stats

func (m *Memo) Stats() (hits, misses uint64)

Stats returns the current hit and miss counts.
