weightsource

package
v0.19.3
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package weightsource is the pluggable source layer for weight imports.

A Source is a stateful provider bound at construction time to a specific URI. It exposes two capabilities: Inventory lists the files the source offers (with sizes, per-file digests, and a source-identity Fingerprint), and Open streams one file's bytes. The packer drives the import one file at a time so that sources larger than local disk can be imported without full materialization.

Implementations exist for file:// (local directory) and hf:// (HuggingFace Hub).

Constants

const FileScheme = "file"

FileScheme is the URI scheme for local filesystem sources.

const HFScheme = "hf"

HFScheme is the short URI scheme for HuggingFace Hub sources.

const HFSchemeLong = "huggingface"

HFSchemeLong is the long-form URI scheme alias for HuggingFace Hub.

Functions

func DirHash

func DirHash[F Dirhashable](files []F) string

DirHash computes a content-addressable digest of a file set per spec §2.4:

sha256(join(sort("<hex>  <path>"), "\n"))

where each line is the file's sha256 hex digest and relative path joined by two spaces (matching sha256sum output). DirHash sorts the lines itself, so the caller's input order does not affect the result.

The result is in "sha256:<hex>" form. This formula computes the weight set digest stored in weights.lock (WeightLockEntry.SetDigest). file:// sources also use it as their Fingerprint: for these content-addressed sources the fingerprint and the dirhash coincide. Other schemes (hf://, s3://, http://) use scheme-native identifiers (commit SHA, ETag, etc.) for their Fingerprint instead.
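The formula above can be sketched in a few lines. This is an illustrative re-implementation, not the package's code: the `part` type and `dirHashSketch` function are hypothetical stand-ins, and the sketch assumes that the "sha256:" prefix is stripped from each digest before the line is built and that the joined text has no trailing newline.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// part is a local stand-in for the documented DirhashPart fields.
type part struct {
	Path   string // relative path, forward slashes
	Digest string // "sha256:<hex>"
}

// dirHashSketch applies sha256(join(sort("<hex>  <path>"), "\n")) and
// returns the result in "sha256:<hex>" form.
func dirHashSketch(parts []part) string {
	lines := make([]string, 0, len(parts))
	for _, p := range parts {
		// Assumption: the line uses the bare hex digest, as sha256sum does.
		hexDigest := strings.TrimPrefix(p.Digest, "sha256:")
		lines = append(lines, hexDigest+"  "+p.Path)
	}
	sort.Strings(lines) // sorting here makes the caller's order irrelevant
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	a := part{Path: "config.json", Digest: "sha256:" + strings.Repeat("a", 64)}
	b := part{Path: "model.safetensors", Digest: "sha256:" + strings.Repeat("b", 64)}
	// Both orderings hash the same sorted lines.
	fmt.Println(dirHashSketch([]part{a, b}) == dirHashSketch([]part{b, a})) // prints true
}
```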

func NormalizeURI

func NormalizeURI(uri string) (string, error)

NormalizeURI returns the canonical form of a weight source URI.

Each scheme has its own normalization rules:

  • file:// and bare paths → canonical file:// form (see normalizeFileURI)
  • hf:// and huggingface:// → canonical hf:// form (see normalizeHFURI)

Empty strings and unsupported schemes return an error.

Types

type DirhashPart

type DirhashPart struct {
	Path   string
	Digest string
}

DirhashPart is the atomic input to DirHash: the pair of fields that uniquely identify a file's contribution to the dirhash. Path is the relative path (forward slashes) and Digest is the file's sha256 content digest in "sha256:<hex>" form.

func (DirhashPart) String

func (p DirhashPart) String() string

String returns the canonical identity of a single file: "path\x00digest". This is the primitive that any code comparing files across layers, plans, or lockfile entries should use. DirHash composes over this (sorted, then hashed); layer keys join these (preserving individual file identity so two files with identical content but different paths remain distinguishable).
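As a small illustration of the identity string described above, the following sketch mirrors the documented "path\x00digest" layout with a hypothetical local `part` type. It shows that two files with the same content digest but different paths keep distinct identities.

```go
package main

import "fmt"

// part is a local stand-in for DirhashPart, used only for illustration.
type part struct {
	Path   string
	Digest string
}

// String joins path and digest with a NUL byte, following the documented
// "path\x00digest" form.
func (p part) String() string {
	return p.Path + "\x00" + p.Digest
}

func main() {
	a := part{Path: "a/model.bin", Digest: "sha256:abc"}
	b := part{Path: "b/model.bin", Digest: "sha256:abc"}
	// Same content, different paths: the identities differ.
	fmt.Println(a.String() == b.String()) // prints false
}
```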

type Dirhashable

type Dirhashable interface {
	DirhashParts() DirhashPart
}

Dirhashable is implemented by types that can participate in DirHash. Both weightsource.InventoryFile and lockfile.WeightLockFile implement it, letting the two call sites share one digest implementation.

type FileSource

type FileSource struct {
	// contains filtered or unexported fields
}

FileSource is the Source implementation for file:// URIs and bare paths.

URIs take one of these forms:

file:///abs/path      — absolute path
file://./rel/path     — canonical relative path (explicit ./)
/abs/path             — bare absolute path (normalized to file://)
./rel/path            — bare relative path (normalized to file://)
rel/path              — bare relative path, no ./ prefix (normalized)

The lockfile stores only the normalized form (see NormalizeURI); the absolute on-disk path is resolved once at construction time so the Source methods do not re-resolve on every call.
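The mapping from the accepted forms to the normalized form can be sketched as follows. `normalizeFileSketch` is a hypothetical helper written for this example, not the package's normalizeFileURI; it assumes POSIX-style paths, does no path cleaning, and does not reject other schemes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizeFileSketch maps the documented input forms onto
// file:///abs/path or file://./rel/path.
func normalizeFileSketch(uri string) (string, error) {
	p := strings.TrimPrefix(uri, "file://")
	if p == "" {
		return "", errors.New("empty file URI")
	}
	if strings.HasPrefix(p, "/") {
		return "file://" + p, nil // absolute path
	}
	// Relative path: make the leading ./ explicit exactly once.
	return "file://./" + strings.TrimPrefix(p, "./"), nil
}

func main() {
	inputs := []string{
		"file:///abs/path",
		"file://./rel/path",
		"/abs/path",
		"./rel/path",
		"rel/path",
	}
	for _, in := range inputs {
		out, err := normalizeFileSketch(in)
		fmt.Println(in, "->", out, err)
	}
}
```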

func NewFileSource

func NewFileSource(uri, projectDir string) (*FileSource, error)

NewFileSource constructs a FileSource bound to uri, resolving relative URIs against projectDir. It validates that the resolved path exists and is a directory.

func (*FileSource) Inventory

func (s *FileSource) Inventory(ctx context.Context) (Inventory, error)

Inventory walks the source directory and returns per-file path / size / content digest plus the source fingerprint (sha256 of the sorted file set, spec §2.4).

The .cog state directory is skipped. Non-regular entries (symlinks, devices, FIFOs, sockets) are rejected per spec §1.3 — silently dropping them would let a user ship a model missing files they expected. Resolve to regular files before importing.
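A walk with these rules might look like the sketch below. It is not the package's implementation: `listRegular` and `demo` are hypothetical names, hashing is omitted, and the sketch assumes that a directory named .cog is skipped at any depth.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// listRegular returns the slash-separated relative paths of all regular
// files under root. It skips .cog directories and fails on the first
// non-regular entry instead of silently dropping it.
func listRegular(root string) ([]string, error) {
	var paths []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if d.Name() == ".cog" {
				return filepath.SkipDir
			}
			return nil
		}
		if !d.Type().IsRegular() {
			return fmt.Errorf("%s: not a regular file", p)
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		paths = append(paths, filepath.ToSlash(rel))
		return nil
	})
	return paths, err
}

// demo builds a throwaway directory tree and lists it.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "weights")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, f := range []string{"a.bin", "sub/b.bin", ".cog/state"} {
		full := filepath.Join(root, filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(full, []byte("x"), 0o644); err != nil {
			return nil, err
		}
	}
	return listRegular(root)
}

func main() {
	fmt.Println(demo()) // prints [a.bin sub/b.bin] <nil>
}
```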

func (*FileSource) Open

func (s *FileSource) Open(ctx context.Context, path string) (io.ReadCloser, error)

Open returns a reader for a single file in the source, identified by its inventory path (relative to the source root, using forward slashes). The caller closes the returned reader.

type Fingerprint

type Fingerprint string

Fingerprint is a source's version identity, carrying its algorithm (or source-native identifier type) as a scheme prefix.

Examples:

sha256:<hex>            — content hash (file:// sources)
commit:<sha>            — git commit (hf:// repos pinned to a commit)
etag:<value>            — HTTP ETag (http:// sources)
md5:<hex>               — MD5 hash (s3:// objects)
timestamp:<rfc3339>     — last-modified timestamp (fallback for systems
                           that expose nothing stronger)

The prefix makes two fingerprints from different sources unambiguously unequal even when the opaque values happen to collide. The empty string is not a valid Fingerprint — callers that want to express "no fingerprint known" should use a separate sentinel.

func (Fingerprint) Scheme

func (f Fingerprint) Scheme() string

Scheme returns the fingerprint's algorithm or identifier prefix (the part before the first colon). Returns "" if the fingerprint is malformed (no colon).
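The prefix rule can be illustrated with a hypothetical local `fingerprint` type. The sketch only mirrors the documented behavior (text before the first colon, empty string when there is no colon); the example values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// fingerprint is a local stand-in for the Fingerprint type.
type fingerprint string

// scheme returns the text before the first colon, or "" if there is none.
func (f fingerprint) scheme() string {
	prefix, _, found := strings.Cut(string(f), ":")
	if !found {
		return ""
	}
	return prefix
}

func main() {
	fmt.Println(fingerprint("commit:0123abcd").scheme())
	// Only the first colon counts, so colons inside the value are harmless.
	fmt.Println(fingerprint("timestamp:2026-05-04T00:00:00Z").scheme())
	// Equal opaque values under different schemes are still unequal.
	fmt.Println(fingerprint("sha256:abc") == fingerprint("md5:abc"))
	fmt.Printf("%q\n", fingerprint("no-colon").scheme())
}
```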

func (Fingerprint) String

func (f Fingerprint) String() string

String returns the fingerprint in its canonical "<scheme>:<value>" form.

type HFSource

type HFSource struct {
	// contains filtered or unexported fields
}

HFSource is the Source implementation for hf:// URIs.

URI forms:

hf://org/repo         — follows main branch
hf://org/repo@ref     — ref is a branch, tag, or 40-char commit sha

The source resolves the ref to a full commit sha at Inventory time and uses that pinned sha for all subsequent Open calls. Callers should call Inventory before Open so that content is pinned to a specific commit; without a prior Inventory call, Open reads from the original ref, which may move.
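Splitting the two URI forms might look like this sketch. `hfTarget` and `parseHFURI` are hypothetical names for illustration; the package's real parser may validate more and may also accept the huggingface:// alias, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"strings"
)

// hfTarget holds the pieces of an hf:// URI.
type hfTarget struct {
	Repo string // "org/repo"
	Ref  string // branch, tag, or commit sha; "main" when omitted
}

// parseHFURI splits hf://org/repo[@ref] into repo and ref.
func parseHFURI(uri string) (hfTarget, error) {
	if !strings.HasPrefix(uri, "hf://") {
		return hfTarget{}, fmt.Errorf("%q is not an hf:// URI", uri)
	}
	repo, ref, hasRef := strings.Cut(strings.TrimPrefix(uri, "hf://"), "@")
	if !hasRef {
		ref = "main" // the documented default: follow the main branch
	}
	org, name, ok := strings.Cut(repo, "/")
	if !ok || org == "" || name == "" || ref == "" {
		return hfTarget{}, fmt.Errorf("%q: want hf://org/repo[@ref]", uri)
	}
	return hfTarget{Repo: repo, Ref: ref}, nil
}

func main() {
	for _, uri := range []string{"hf://org/repo", "hf://org/repo@v1.0", "hf://repo-only"} {
		target, err := parseHFURI(uri)
		fmt.Printf("%s -> %+v, err=%v\n", uri, target, err)
	}
}
```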

func NewHFSource

func NewHFSource(uri string) (*HFSource, error)

NewHFSource constructs an HFSource bound to the given hf:// URI. It parses the URI and looks up auth from env vars but does not make any network calls — validation happens at Inventory time.

func (*HFSource) Inventory

func (s *HFSource) Inventory(ctx context.Context) (Inventory, error)

Inventory calls the HuggingFace Hub API to list files and resolve the ref to a pinned commit sha. For LFS/xet-tracked files the sha256 digest comes from the API response (free, no download). Inline files (small, git-tracked) are fetched and hashed.

The fingerprint is "commit:<full-sha>".

func (*HFSource) Open

func (s *HFSource) Open(ctx context.Context, path string) (io.ReadCloser, error)

Open returns a reader that streams the file from the HuggingFace CDN. It follows the redirect from the resolve endpoint to the appropriate backend (LFS CDN, xet cas-bridge, or inline git blob).

Open uses the commit sha resolved during Inventory, so file content is pinned to the same revision that was inventoried. If Inventory has not been called, Open falls back to the original ref.

type Inventory

type Inventory struct {
	Files       []InventoryFile
	Fingerprint Fingerprint
}

Inventory is the result of Source.Inventory: everything needed to plan an import without transferring payload bytes.

Fingerprint is the source's version identity for the currently bound URI. Files is the list of content-addressed entries that make up the source; the packer consumes this list to produce tar layers.

func FilterInventory

func FilterInventory(inv Inventory, include, exclude []string) (Inventory, error)

FilterInventory applies include/exclude glob patterns to an inventory's file list and returns a new inventory with only the matching files. The returned inventory shares the original's Fingerprint (which is the upstream version identity, not affected by filtering).

Semantics:

  • If include is non-empty, a file must match at least one include pattern.
  • If a file matches any exclude pattern, it is excluded (even if it also matches an include pattern — exclude wins).
  • If both lists are empty/nil, all files pass through unchanged.

Pattern matching uses gitignore-style globs via go-gitignore: bare patterns float across directories ("*.bin" matches any depth), path-shaped patterns anchor ("onnx/*.bin" matches direct children of onnx/), and "**" matches any number of path segments.

Returns an error if the filter yields zero files — an empty weight set is almost always a mistake and should surface immediately.
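The include/exclude rules above can be sketched as below. To stay dependency-free the sketch swaps go-gitignore for the standard library's path.Match applied to each file's base name, which only approximates floating bare patterns such as "*.bin" and does not implement anchoring or "**". The names `filterPaths`, `matchBase`, and `errNoSurvivors` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

var errNoSurvivors = errors.New("filter removed every file")

// matchBase is a stand-in matcher, NOT gitignore semantics: it matches the
// pattern against the file's base name only.
func matchBase(pattern, file string) bool {
	ok, err := path.Match(pattern, path.Base(file))
	return err == nil && ok
}

func matchAny(patterns []string, file string) bool {
	for _, p := range patterns {
		if matchBase(p, file) {
			return true
		}
	}
	return false
}

// filterPaths keeps a file if it matches some include pattern (when any
// are given) and matches no exclude pattern. Exclude wins over include.
func filterPaths(files, include, exclude []string) ([]string, error) {
	var kept []string
	for _, f := range files {
		if len(include) > 0 && !matchAny(include, f) {
			continue
		}
		if matchAny(exclude, f) {
			continue
		}
		kept = append(kept, f)
	}
	if len(kept) == 0 {
		return nil, errNoSurvivors
	}
	return kept, nil
}

func main() {
	files := []string{"model.safetensors", "onnx/model.bin", "README.md"}
	kept, err := filterPaths(files, []string{"*.bin", "*.safetensors"}, []string{"model.bin"})
	fmt.Println(kept, err) // prints [model.safetensors] <nil>

	_, err = filterPaths(files, []string{"*.gguf"}, nil)
	fmt.Println(err) // prints filter removed every file
}
```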

type InventoryFile

type InventoryFile struct {
	// Path is the file path relative to the source root, using forward
	// slashes regardless of the host OS.
	Path string
	// Size is the uncompressed file size in bytes.
	Size int64
	// Digest is the SHA-256 content digest with the "sha256:" prefix.
	Digest string
}

InventoryFile is one entry in an Inventory: a file's relative path, size, and content digest. For file:// the digest is computed by walking and hashing; for remote sources it is read from a source-side index.

func (InventoryFile) DirhashParts

func (f InventoryFile) DirhashParts() DirhashPart

DirhashParts implements Dirhashable so InventoryFile slices can be passed directly to DirHash.

type Source

type Source interface {
	// Inventory returns the file list and version identity for the
	// bound source. For file:// this walks and hashes (unavoidable for
	// a local directory). For future remote sources it is expected to
	// be cheap — HuggingFace Hub exposes per-file sha256 via its API,
	// OCI sources read them from the source manifest's config blob.
	Inventory(ctx context.Context) (Inventory, error)

	// Open returns a reader for a single file in the source, identified
	// by its inventory path (relative to the source root). Called on
	// demand during packing. The caller closes the returned reader.
	Open(ctx context.Context, path string) (io.ReadCloser, error)
}

Source is the provider for a weight-source scheme, bound at construction time to a specific URI.

Implementations translate a scheme-specific URI (file://, hf://, s3://, http://, ...) into (a) an inventory of what the source contains, and (b) an on-demand byte stream for any one file in that inventory. The weights subsystem drives the import pipeline off these two capabilities — there is deliberately no "materialize the whole source to disk" step, so sources whose contents do not fit on local disk can still flow through the packer one file at a time.

A Source instance is bound to one URI for its entire lifetime. Callers construct a Source via For(uri, projectDir). Methods are expected to be context-cancellable and safe to call concurrently for different paths.
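To make the two-capability contract concrete, here is a toy in-memory source and a loop that consumes it one file at a time. Everything in it is a hypothetical mirror written for this example (`source`, `memSource`, `verifyAll`, and the "mem:demo" fingerprint are not part of the package); the loop verifies each stream against the inventory rather than packing tar layers.

```go
package main

import (
	"bytes"
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"sort"
)

type inventoryFile struct {
	Path   string
	Size   int64
	Digest string
}

type inventory struct {
	Files       []inventoryFile
	Fingerprint string
}

// source mirrors the shape of the documented Source interface.
type source interface {
	Inventory(ctx context.Context) (inventory, error)
	Open(ctx context.Context, path string) (io.ReadCloser, error)
}

// memSource serves files from a map; it stands in for a real backend.
type memSource map[string][]byte

func (m memSource) Inventory(ctx context.Context) (inventory, error) {
	inv := inventory{Fingerprint: "mem:demo"} // made-up scheme for the demo
	for p, data := range m {
		sum := sha256.Sum256(data)
		inv.Files = append(inv.Files, inventoryFile{
			Path: p, Size: int64(len(data)),
			Digest: "sha256:" + hex.EncodeToString(sum[:]),
		})
	}
	sort.Slice(inv.Files, func(i, j int) bool { return inv.Files[i].Path < inv.Files[j].Path })
	return inv, nil
}

func (m memSource) Open(ctx context.Context, path string) (io.ReadCloser, error) {
	data, ok := m[path]
	if !ok {
		return nil, fmt.Errorf("%s: not in inventory", path)
	}
	return io.NopCloser(bytes.NewReader(data)), nil
}

// verifyAll streams each inventoried file through a hasher, one at a
// time, and returns the total number of bytes read.
func verifyAll(ctx context.Context, src source) (int64, error) {
	inv, err := src.Inventory(ctx)
	if err != nil {
		return 0, err
	}
	var total int64
	for _, f := range inv.Files {
		rc, err := src.Open(ctx, f.Path)
		if err != nil {
			return total, err
		}
		h := sha256.New()
		n, copyErr := io.Copy(h, rc)
		closeErr := rc.Close()
		if copyErr != nil {
			return total, copyErr
		}
		if closeErr != nil {
			return total, closeErr
		}
		if got := "sha256:" + hex.EncodeToString(h.Sum(nil)); got != f.Digest || n != f.Size {
			return total, fmt.Errorf("%s: content does not match inventory", f.Path)
		}
		total += n
	}
	return total, nil
}

// demoTotal runs verifyAll over a two-file in-memory source.
func demoTotal() (int64, error) {
	src := memSource{"config.json": []byte("{}"), "weights/a.bin": []byte("abc")}
	return verifyAll(context.Background(), src)
}

func main() {
	fmt.Println(demoTotal()) // prints 5 <nil>
}
```

Only one file's reader is open at any point in the loop, which is the property that lets a source larger than local disk flow through a consumer.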

func For

func For(uri, projectDir string) (Source, error)

For returns the Source implementation for the given URI's scheme, bound to uri and projectDir.

The scheme is the substring before the first "://". Bare paths (no scheme) are treated as file:// — this accepts both absolute ("/data") and relative ("./weights") forms as a convenience at the interface boundary.

Unknown schemes return a clear error listing the currently supported schemes. This is the only place where scheme → implementation dispatch happens; adding s3:// or http:// is a single case here plus the matching Source implementation.

For validates that the source exists and is usable. A file:// URI that points at a missing path or at a non-directory returns an error here, not at Inventory time.
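The dispatch rule described above can be sketched as a single switch. `implFor` is a hypothetical helper that returns the name of the implementation instead of constructing it, and it assumes that the huggingface alias is accepted alongside hf, as the package constants suggest.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// implFor names the implementation that would handle uri. The scheme is
// the text before the first "://"; a URI without one is a bare path.
func implFor(uri string) (string, error) {
	if uri == "" {
		return "", errors.New("empty URI")
	}
	scheme, _, found := strings.Cut(uri, "://")
	if !found {
		scheme = "file" // bare paths are treated as file://
	}
	switch scheme {
	case "file":
		return "FileSource", nil
	case "hf", "huggingface":
		return "HFSource", nil
	default:
		return "", fmt.Errorf("unsupported scheme %q (supported: file, hf, huggingface)", scheme)
	}
}

func main() {
	for _, uri := range []string{"./weights", "/data", "hf://org/repo", "s3://bucket/key"} {
		impl, err := implFor(uri)
		fmt.Printf("%s -> %q, err=%v\n", uri, impl, err)
	}
}
```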

type ZeroSurvivorsError

type ZeroSurvivorsError struct {
	InventorySize int
	Include       []string
	Exclude       []string
}

ZeroSurvivorsError is returned when include/exclude filtering removes all files from an inventory.

func (*ZeroSurvivorsError) Error

func (e *ZeroSurvivorsError) Error() string
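Because Error has a pointer receiver, a caller would typically detect this error with errors.As and a pointer target. The sketch below uses a hypothetical local mirror of the struct; the message format and the wrapping in `importDemo` are invented for illustration and are not taken from the package.

```go
package main

import (
	"errors"
	"fmt"
)

// zeroSurvivorsError mirrors the documented fields.
type zeroSurvivorsError struct {
	InventorySize int
	Include       []string
	Exclude       []string
}

// Error uses a made-up message format.
func (e *zeroSurvivorsError) Error() string {
	return fmt.Sprintf("filters removed all %d files (include=%v, exclude=%v)",
		e.InventorySize, e.Include, e.Exclude)
}

// importDemo stands in for a caller that wraps the filtering error.
func importDemo() error {
	err := &zeroSurvivorsError{InventorySize: 3, Include: []string{"*.gguf"}}
	return fmt.Errorf("import weights: %w", err)
}

// survivorsReport unwraps err and reports the original inventory size if
// the failure was a zero-survivors error.
func survivorsReport(err error) (int, bool) {
	var zs *zeroSurvivorsError
	if errors.As(err, &zs) {
		return zs.InventorySize, true
	}
	return 0, false
}

func main() {
	n, ok := survivorsReport(importDemo())
	fmt.Println(n, ok) // prints 3 true
}
```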
