Documentation ¶
Overview ¶
Package weightsource is the pluggable source layer for weight imports.
A Source is a stateful provider bound at construction time to a specific URI. It exposes two capabilities: Inventory lists the files the source offers (with sizes, per-file digests, and a source-identity Fingerprint), and Open streams one file's bytes. The packer drives the import one file at a time so that sources larger than local disk can be imported without full materialization.
Implementations exist for file:// (local directory) and hf:// (HuggingFace Hub).
Constants ¶
const FileScheme = "file"
FileScheme is the URI scheme for local filesystem sources.
const HFScheme = "hf"
HFScheme is the short URI scheme for HuggingFace Hub sources.
const HFSchemeLong = "huggingface"
HFSchemeLong is the long-form URI scheme alias for HuggingFace Hub.
Functions ¶
func DirHash ¶
func DirHash[F Dirhashable](files []F) string
DirHash computes a content-addressable digest of a file set per spec §2.4:
sha256(join(sort("<hex>  <path>"), "\n"))
where each line is the file's sha256 hex digest and relative path joined by two spaces (matching sha256sum output). DirHash sorts the lines itself, so the caller's input order does not affect the result.
The result is in "sha256:<hex>" form. This formula computes the weight set digest stored in weights.lock (WeightLockEntry.SetDigest). file:// sources also use it as their Fingerprint: for a content-addressed source, the fingerprint and the dirhash coincide. Other schemes (hf://, s3://, http://) use scheme-native identifiers (commit SHA, ETag, etc.) for their Fingerprint instead.
func NormalizeURI ¶
NormalizeURI returns the canonical form of a weight source URI.
Each scheme has its own normalization rules:
- file:// and bare paths → canonical file:// form (see normalizeFileURI)
- hf:// and huggingface:// → canonical hf:// form (see normalizeHFURI)
Empty strings and unsupported schemes return an error.
Types ¶
type DirhashPart ¶
DirhashPart is the atomic input to DirHash: the pair of fields that uniquely identify a file's contribution to the dirhash. Path is the relative path (forward slashes) and Digest is the file's sha256 content digest in "sha256:<hex>" form.
func (DirhashPart) String ¶
func (p DirhashPart) String() string
String returns the canonical identity of a single file: "path\x00digest". This is the primitive that any code comparing files across layers, plans, or lockfile entries should use. DirHash composes over this (sorted, then hashed); layer keys join these (preserving individual file identity so two files with identical content but different paths remain distinguishable).
type Dirhashable ¶
type Dirhashable interface {
DirhashParts() DirhashPart
}
Dirhashable is implemented by types that can participate in DirHash. Both weightsource.InventoryFile and lockfile.WeightLockFile implement it, letting the two call sites share one digest implementation.
type FileSource ¶
type FileSource struct {
// contains filtered or unexported fields
}
FileSource is the Source implementation for file:// URIs and bare paths.
URIs take one of these forms:
file:///abs/path — absolute path
file://./rel/path — canonical relative path (explicit ./)
/abs/path — bare absolute path (normalized to file://)
./rel/path — bare relative path (normalized to file://)
rel/path — bare relative path, no ./ prefix (normalized)
The lockfile stores only the normalized form (see NormalizeURI); the absolute on-disk path is resolved once at construction time so the Source methods do not re-resolve on every call.
func NewFileSource ¶
func NewFileSource(uri, projectDir string) (*FileSource, error)
NewFileSource constructs a FileSource bound to uri, resolving relative URIs against projectDir. It validates that the resolved path exists and is a directory.
func (*FileSource) Inventory ¶
func (s *FileSource) Inventory(ctx context.Context) (Inventory, error)
Inventory walks the source directory and returns per-file path / size / content digest plus the source fingerprint (sha256 of the sorted file set, spec §2.4).
The .cog state directory is skipped. Non-regular entries (symlinks, devices, FIFOs, sockets) are rejected per spec §1.3, because silently dropping them would let a user ship a model that is missing files they expected. Replace such entries with regular files (for example, by resolving symlinks) before importing.
func (*FileSource) Open ¶
func (s *FileSource) Open(ctx context.Context, path string) (io.ReadCloser, error)
Open returns a reader for a single file in the source, identified by its inventory path (relative to the source root, using forward slashes). The caller closes the returned reader.
type Fingerprint ¶
type Fingerprint string
Fingerprint is a source's version identity, carrying its algorithm (or source-native identifier type) as a scheme prefix.
Examples:
sha256:<hex> — content hash (file:// sources)
commit:<sha> — git commit (hf:// repos pinned to a commit)
etag:<value> — HTTP ETag (http:// sources)
md5:<hex> — MD5 hash (s3:// objects)
timestamp:<rfc3339> — last-modified timestamp (fallback for systems
that expose nothing stronger)
The prefix makes two fingerprints from different sources unambiguously unequal even when the opaque values happen to collide. The empty string is not a valid Fingerprint — callers that want to express "no fingerprint known" should use a separate sentinel.
func (Fingerprint) Scheme ¶
func (f Fingerprint) Scheme() string
Scheme returns the fingerprint's algorithm or identifier prefix (the part before the first colon). Returns "" if the fingerprint is malformed (no colon).
func (Fingerprint) String ¶
func (f Fingerprint) String() string
String returns the fingerprint in its canonical "<scheme>:<value>" form.
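A minimal sketch of the prefix handling, assuming Scheme simply splits at the first colon as described. The lowercase `fingerprint` type and `scheme` method are local stand-ins, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// fingerprint is a hypothetical stand-in for Fingerprint.
type fingerprint string

// scheme returns the part before the first colon, or "" if there is none.
func (f fingerprint) scheme() string {
	s, _, ok := strings.Cut(string(f), ":")
	if !ok {
		return ""
	}
	return s
}

func main() {
	a := fingerprint("sha256:abc123")
	b := fingerprint("commit:abc123")
	// The opaque values collide, but the prefixed fingerprints do not.
	fmt.Println(a.scheme(), b.scheme(), a == b) // prints: sha256 commit false

	// Only the first colon matters, so values may contain colons themselves.
	fmt.Println(fingerprint("timestamp:2024-01-02T03:04:05Z").scheme()) // prints: timestamp

	// A value without a colon is malformed and has no scheme.
	fmt.Printf("%q\n", fingerprint("abc123").scheme()) // prints: ""
}
```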
type HFSource ¶
type HFSource struct {
// contains filtered or unexported fields
}
HFSource is the Source implementation for hf:// URIs.
URI forms:
hf://org/repo — follows main branch
hf://org/repo@ref — ref is a branch, tag, or 40-char commit sha
The source resolves the ref to a full commit sha at Inventory time and uses that pinned sha for all subsequent Open calls. Callers should call Inventory before Open so that content is pinned to a specific commit; without a prior Inventory call, Open falls back to the original, unresolved ref.
func NewHFSource ¶
NewHFSource constructs an HFSource bound to the given hf:// URI. It parses the URI and looks up auth from env vars but does not make any network calls — validation happens at Inventory time.
func (*HFSource) Inventory ¶
Inventory calls the HuggingFace Hub API to list files and resolve the ref to a pinned commit sha. For LFS/xet-tracked files the sha256 digest comes from the API response (free, no download). Inline files (small, git-tracked) are fetched and hashed.
The fingerprint is "commit:<full-sha>".
func (*HFSource) Open ¶
Open returns a reader that streams the file from the HuggingFace CDN. It follows the redirect from the resolve endpoint to the appropriate backend (LFS CDN, xet cas-bridge, or inline git blob).
Open uses the commit sha resolved during Inventory, so file content is pinned to the same revision that was inventoried. If Inventory has not been called, Open falls back to the original ref.
type Inventory ¶
type Inventory struct {
Files []InventoryFile
Fingerprint Fingerprint
}
Inventory is the result of Source.Inventory: everything needed to plan an import without transferring payload bytes.
Fingerprint is the source's version identity for the currently bound URI. Files is the list of content-addressed entries that make up the source; the packer consumes this list to produce tar layers.
func FilterInventory ¶
FilterInventory applies include/exclude glob patterns to an inventory's file list and returns a new inventory with only the matching files. The returned inventory shares the original's Fingerprint (which is the upstream version identity, not affected by filtering).
Semantics:
- If include is non-empty, a file must match at least one include pattern.
- If a file matches any exclude pattern, it is excluded (even if it also matches an include pattern — exclude wins).
- If both lists are empty/nil, all files pass through unchanged.
Pattern matching uses gitignore-style globs via go-gitignore: bare patterns float across directories ("*.bin" matches any depth), path-shaped patterns anchor ("onnx/*.bin" matches direct children of onnx/), and "**" matches any number of path segments.
Returns an error (see ZeroSurvivorsError) if the filter yields zero files, since an empty weight set is almost always a mistake and should surface immediately.
type InventoryFile ¶
type InventoryFile struct {
// Path is the file path relative to the source root, using forward
// slashes regardless of the host OS.
Path string
// Size is the uncompressed file size in bytes.
Size int64
// Digest is the SHA-256 content digest with the "sha256:" prefix.
Digest string
}
InventoryFile is one entry in an Inventory: a file's relative path, size, and content digest. For file:// the digest is computed by walking and hashing; for remote sources it is read from a source-side index where one is available (HFSource, for example, fetches and hashes small inline files itself).
func (InventoryFile) DirhashParts ¶
func (f InventoryFile) DirhashParts() DirhashPart
DirhashParts implements Dirhashable so InventoryFile slices can be passed directly to DirHash.
type Source ¶
type Source interface {
// Inventory returns the file list and version identity for the
// bound source. For file:// this walks and hashes (unavoidable for
// a local directory). For future remote sources it is expected to
// be cheap — HuggingFace Hub exposes per-file sha256 via its API,
// OCI sources read them from the source manifest's config blob.
Inventory(ctx context.Context) (Inventory, error)
// Open returns a reader for a single file in the source, identified
// by its inventory path (relative to the source root). Called on
// demand during packing. The caller closes the returned reader.
Open(ctx context.Context, path string) (io.ReadCloser, error)
}
Source is the provider for a weight-source scheme, bound at construction time to a specific URI.
Implementations translate a scheme-specific URI (file://, hf://, s3://, http://, ...) into (a) an inventory of what the source contains, and (b) an on-demand byte stream for any one file in that inventory. The weights subsystem drives the import pipeline off these two capabilities — there is deliberately no "materialize the whole source to disk" step, so sources whose contents do not fit on local disk can still flow through the packer one file at a time.
A Source instance is bound to one URI for its entire lifetime. Callers construct a Source via For(uri, projectDir). Methods are expected to be context-cancellable and safe to call concurrently for different paths.
func For ¶
For returns the Source implementation for the given URI's scheme, bound to uri and projectDir.
The scheme is the substring before the first "://". Bare paths (no scheme) are treated as file:// — this accepts both absolute ("/data") and relative ("./weights") forms as a convenience at the interface boundary.
Unknown schemes return a clear error listing the currently supported schemes. This is the only place where scheme → implementation dispatch happens; adding s3:// or http:// is a single case here plus the matching Source implementation.
For validates that the source exists and is usable. A file:// URI that points at a missing path or at a non-directory returns an error here, not at Inventory time.
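The dispatch rule described for For can be approximated as below. This is a sketch, not the real function: `schemeOf` and `dispatch` are hypothetical, the returned strings merely name the implementation that would be chosen, and accepting "huggingface" alongside "hf" at this point is an assumption based on the HFSchemeLong alias.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// schemeOf returns the substring before the first "://".
// Bare paths (no "://") are treated as the file scheme.
func schemeOf(uri string) string {
	scheme, _, ok := strings.Cut(uri, "://")
	if !ok {
		return "file"
	}
	return scheme
}

// dispatch names the implementation a URI would map to.
// Hypothetical: the real For also binds projectDir and validates the source.
func dispatch(uri string) (string, error) {
	if uri == "" {
		return "", errors.New("empty weight source URI")
	}
	switch s := schemeOf(uri); s {
	case "file":
		return "FileSource", nil
	case "hf", "huggingface": // long form accepted here as an assumption
		return "HFSource", nil
	default:
		return "", fmt.Errorf("unsupported scheme %q (supported: file, hf, huggingface)", s)
	}
}

func main() {
	for _, uri := range []string{"/data", "./weights", "file:///abs/path", "hf://org/repo", "s3://bucket/key"} {
		impl, err := dispatch(uri)
		fmt.Println(uri, impl, err)
	}
}
```

Keeping the switch in one place means a new scheme needs only one extra case plus its Source implementation.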
type ZeroSurvivorsError ¶
ZeroSurvivorsError is returned when include/exclude filtering removes all files from an inventory.
func (*ZeroSurvivorsError) Error ¶
func (e *ZeroSurvivorsError) Error() string