hashutil

package

Version: v0.2.0 (latest) | Published: May 11, 2026 | License: MIT | Imports: 8 | Imported by: 0

Repository

github.com/kaeawc/krit

Documentation ¶

Index ¶

func HashBytes(b []byte) [32]byte
func HashFile(path string) (string, error)
func HashHex(b []byte) string
func HashReader(r io.Reader) (string, error)
func HasherName() string
func ResetDefault()
func SetContentHasher(h ContentHasher)
type ContentHasher
- func Hasher() ContentHasher
type Memo
- func Default() *Memo
- func NewMemo() *Memo
- func (m *Memo) Clear()
- func (m *Memo) HashContent(path string, content []byte) string
- func (m *Memo) HashFile(path string, provider func() ([]byte, error)) (string, error)
- func (m *Memo) HashFileRaw(path string, provider func() ([]byte, error)) ([32]byte, error)
- func (m *Memo) Len() int
- func (m *Memo) Stats() (hits, misses uint64)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func HashBytes ¶

func HashBytes(b []byte) [32]byte

HashBytes returns the raw 32-byte digest of b.

func HashFile ¶

func HashFile(path string) (string, error)

HashFile is a convenience wrapper that opens path, streams it through HashReader, and returns the hex digest. On open failure it returns the underlying os error unwrapped, because current callers of oracle.ContentHash check os.IsNotExist on the returned error.
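The sketch below illustrates why the open error has to stay unwrapped. It does not call hashutil; probe is a hypothetical helper that opens a path assumed to be missing and reports how three checks classify the error, using only the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// probe is a hypothetical helper (not part of hashutil). It opens path, which
// is assumed not to exist, and reports how the error is classified:
//   raw:         os.IsNotExist on the error exactly as os.Open returned it
//   wrapped:     os.IsNotExist after wrapping the error with %w
//   viaErrorsIs: errors.Is on the wrapped error
func probe(path string) (raw, wrapped, viaErrorsIs bool) {
	f, err := os.Open(path)
	if err == nil {
		f.Close()
		return false, false, false
	}
	w := fmt.Errorf("hash %s: %w", path, err)
	return os.IsNotExist(err), os.IsNotExist(w), errors.Is(w, fs.ErrNotExist)
}

func main() {
	raw, wrapped, viaErrorsIs := probe("no-such-file-for-hashutil-demo.txt")
	fmt.Println(raw, wrapped, viaErrorsIs)
}
```

os.IsNotExist predates errors.Is and does not look through errors wrapped with fmt.Errorf, so a wrapped error would silently stop matching for callers that still rely on it.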

func HashHex ¶

func HashHex(b []byte) string

HashHex returns the digest of b as a lowercase hex string. Use this for on-disk cache keys, fingerprints, and any user-visible identifier.

func HashReader ¶

func HashReader(r io.Reader) (string, error)

HashReader streams r into the active hasher and returns the lowercase hex digest. Used by the oracle content-hash path to avoid reading whole files into memory.

func HasherName ¶

func HasherName() string

HasherName returns the Name() of the active ContentHasher. Embedded in cache version tokens so an algorithm swap auto-invalidates every cache that keys on content hash.
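One way a cache might fold the hasher name into its version token is sketched below. cacheVersion, the schema number, and the token layout are all hypothetical, and the name strings are placeholders rather than values confirmed by the package.

```go
package main

import "fmt"

// cacheVersion is a hypothetical helper: it combines a cache schema number
// with the hasher name so that changing either yields a different token.
func cacheVersion(schema int, hasherName string) string {
	return fmt.Sprintf("v%d-%s", schema, hasherName)
}

func main() {
	// Placeholder names; in the real package the second argument would come
	// from hashutil.HasherName().
	before := cacheVersion(3, "xxh3-256")
	after := cacheVersion(3, "sha256")
	fmt.Println(before, after, before != after)
}
```

Because entries written under the old token are never looked up under the new one, stale digests from a previous algorithm cannot be mistaken for current ones.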

func ResetDefault ¶

func ResetDefault()

ResetDefault clears the shared Memo. The CLI calls this at the start of each scan invocation so memoized hashes do not bleed into a subsequent run where files may have changed.

func SetContentHasher ¶

func SetContentHasher(h ContentHasher)

SetContentHasher swaps the active hasher. It is intended for benchmarks and future algorithm experiments; production code should leave the default (xxh3-256) in place. The swap is NOT safe for concurrent use: call it during process init or in test setup (for example, TestMain) before any hashing runs.

Types ¶

type ContentHasher ¶

type ContentHasher interface {
	// Name is a stable identifier embedded in cache version tokens so
	// an algorithm swap automatically invalidates prior entries.
	Name() string
	// Sum returns the 32-byte digest of b.
	Sum(b []byte) [32]byte
	// New returns a streaming hash.Hash whose Sum(nil) matches
	// Sum(b)[:] when fed the same bytes. Size() is 32.
	New() hash.Hash
}

ContentHasher abstracts the hash function used for content fingerprints (parse cache, cross-file cache, oracle cache, incremental cache). Implementations must return a 32-byte digest so existing store.Key FileHash [32]byte layouts keep working.

The default is xxh3-128 widened to 256 bits via two distinct seeds. It is non-cryptographic but SIMD-accelerated on both amd64 (AVX / AVX-512) and arm64 (NEON), so it outpaces hardware SHA-256 on Apple Silicon (~5×) and software SHA-256 on typical Linux CI hardware (~10×). The interface leaves room for swapping in blake3, wyhash, or a successor via SetContentHasher without disturbing any callers.
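As an illustration of the contract, here is a sketch of an alternative implementation backed by SHA-256. The interface is redeclared locally so the example compiles on its own; sha256Hasher, its name string, and the consistent helper are hypothetical and not part of the package.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"hash"
)

// ContentHasher mirrors the interface documented above, redeclared here so
// the sketch is self-contained.
type ContentHasher interface {
	Name() string
	Sum(b []byte) [32]byte
	New() hash.Hash
}

// sha256Hasher is a hypothetical implementation used only for illustration.
type sha256Hasher struct{}

func (sha256Hasher) Name() string          { return "sha256" }
func (sha256Hasher) Sum(b []byte) [32]byte { return sha256.Sum256(b) }
func (sha256Hasher) New() hash.Hash        { return sha256.New() }

// consistent checks the documented contract: the streaming hash reports a
// 32-byte size and its Sum(nil) equals the one-shot Sum(b) for the same bytes.
func consistent(h ContentHasher, b []byte) bool {
	s := h.New()
	s.Write(b) // hash.Hash writes never return an error
	oneShot := h.Sum(b)
	return s.Size() == 32 && bytes.Equal(s.Sum(nil), oneShot[:])
}

func main() {
	var h ContentHasher = sha256Hasher{}
	fmt.Println(h.Name(), consistent(h, []byte("package main")))
}
```

In the real package a value like this would presumably be installed with SetContentHasher during process init, before any hashing runs.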

func Hasher ¶

func Hasher() ContentHasher

Hasher returns the currently installed ContentHasher. Subsystems that need streaming semantics (e.g. oracle closureFingerprint) call Hasher().New() instead of importing a specific algorithm.

type Memo ¶

type Memo struct {
	// contains filtered or unexported fields
}

Memo memoizes file content hashes for the duration of a single run. The cache is keyed by (path, size, mtime); a file whose stat fingerprint changes is re-hashed on the next lookup. A Memo is safe for concurrent use.

Memo is deliberately scoped to a single invocation — callers build one, pass it to every subsystem that would otherwise hash the same file independently, and discard it when the run completes. Persisting a Memo across runs would return stale hashes if a file was modified between invocations under the same (path, size, mtime) triple.

The nil *Memo is a valid disabled memo: every method falls through to the unmemoized hashutil helpers and no entries are retained. This keeps callers that haven't yet been wired to a shared Memo working unchanged.
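The nil-receiver behavior can be sketched with a simplified stand-in. The memo type below is hypothetical: it keys on path alone and uses SHA-256, whereas the real Memo keys on (path, size, mtime) and uses the active ContentHasher.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// memo is a hypothetical, simplified stand-in for hashutil.Memo.
type memo struct {
	mu      sync.Mutex
	entries map[string]string // path -> hex digest
}

func newMemo() *memo { return &memo{entries: make(map[string]string)} }

// digest hashes content and, on a non-nil memo, remembers it under path.
func (m *memo) digest(path string, content []byte) string {
	sum := sha256.Sum256(content) // SHA-256 stands in for the active hasher
	d := hex.EncodeToString(sum[:])
	if m == nil {
		return d // disabled memo: compute the hash, retain nothing
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries[path] = d
	return d
}

// size reports the number of retained entries; zero for a nil memo.
func (m *memo) size() int {
	if m == nil {
		return 0
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.entries)
}

func main() {
	var disabled *memo // nil: methods are still safe to call
	shared := newMemo()

	a := disabled.digest("a.txt", []byte("same bytes"))
	b := shared.digest("a.txt", []byte("same bytes"))
	fmt.Println(a == b, disabled.size(), shared.size())
}
```

Checking the receiver for nil inside each method lets callers pass a *Memo through unconditionally, without guarding every call site.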

func Default ¶

func Default() *Memo

Default returns the process-scoped shared Memo. Subsystems that need to cooperate on file hashing should use Default() rather than instantiating a private Memo, so all redundant content-hash computations within a single run collapse to one per unique file.

func NewMemo ¶

func NewMemo() *Memo

NewMemo returns an empty Memo.

func (*Memo) Clear ¶

func (m *Memo) Clear()

Clear drops all entries and resets hit/miss counters.

func (*Memo) HashContent ¶

func (m *Memo) HashContent(path string, content []byte) string

HashContent returns the hex digest of content under the active ContentHasher and, if path is non-empty and a stat succeeds, memoizes the result so subsequent HashFile(path) calls within this Memo return the same digest without re-reading or re-hashing. Use this from callers that already hold the file bytes (e.g. after reading once into memory for parsing).

func (*Memo) HashFile ¶

func (m *Memo) HashFile(path string, provider func() ([]byte, error)) (string, error)

HashFile returns the lowercase hex digest of the file at path using the active ContentHasher. The returned digest is memoized against the file's (size, mtime); a later call for the same unchanged file hits the cache. If provider is non-nil and a hash actually needs to be computed, it is invoked to obtain the bytes instead of re-reading from disk — useful for callers that already have the file content in memory (e.g. the parse / cross-file caches).

A nil *Memo falls through to an unmemoized hash.
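A rough sketch of the lookup described above, under stated assumptions: fileMemo, statKey, and demo are hypothetical names, SHA-256 replaces the active hasher, and the real implementation may differ in locking and key layout. It shows the provider being consulted only on a cache miss.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"sync"
)

// statKey is a hypothetical cache key built from the stat fingerprint.
type statKey struct {
	path  string
	size  int64
	mtime int64 // modification time in Unix nanoseconds
}

// fileMemo is a hypothetical, simplified stand-in for hashutil.Memo.
type fileMemo struct {
	mu           sync.Mutex
	entries      map[statKey]string
	hits, misses uint64
}

func (m *fileMemo) hashFile(path string, provider func() ([]byte, error)) (string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	key := statKey{path: path, size: info.Size(), mtime: info.ModTime().UnixNano()}

	m.mu.Lock()
	if d, ok := m.entries[key]; ok {
		m.hits++
		m.mu.Unlock()
		return d, nil // unchanged file: no read, no hash, provider untouched
	}
	m.misses++
	m.mu.Unlock()

	var data []byte
	if provider != nil {
		data, err = provider() // caller already holds the bytes
	} else {
		data, err = os.ReadFile(path)
	}
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	d := hex.EncodeToString(sum[:])

	m.mu.Lock()
	m.entries[key] = d
	m.mu.Unlock()
	return d, nil
}

// demo hashes one temp file twice and reports how often the provider ran.
func demo() (calls int, hits, misses uint64, err error) {
	f, err := os.CreateTemp("", "memo-*.txt")
	if err != nil {
		return 0, 0, 0, err
	}
	defer os.Remove(f.Name())
	content := []byte("content")
	if _, err = f.Write(content); err != nil {
		f.Close()
		return 0, 0, 0, err
	}
	if err = f.Close(); err != nil {
		return 0, 0, 0, err
	}

	m := &fileMemo{entries: make(map[statKey]string)}
	provider := func() ([]byte, error) { calls++; return content, nil }
	for i := 0; i < 2; i++ {
		if _, err = m.hashFile(f.Name(), provider); err != nil {
			return calls, m.hits, m.misses, err
		}
	}
	return calls, m.hits, m.misses, nil
}

func main() {
	fmt.Println(demo())
}
```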

func (*Memo) HashFileRaw ¶

func (m *Memo) HashFileRaw(path string, provider func() ([]byte, error)) ([32]byte, error)

HashFileRaw is like HashFile but returns the raw 32-byte digest.

func (*Memo) Len ¶

func (m *Memo) Len() int

Len returns the number of cached entries.

func (*Memo) Stats ¶

func (m *Memo) Stats() (hits, misses uint64)

Stats returns the current hit and miss counts.
