lfs

package
v0.7.1
Published: May 3, 2026 | License: MIT | Imports: 27 | Imported by: 0

Documentation

Overview

Package lfs provides a pure-HTTP client for the Git LFS Batch API, which it uses for blob upload and download. No git-lfs binary is required.

The client implements the Git LFS Batch API spec: https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md

Package lfs implements Git LFS pointer handling and Batch API transfers in pure Go.

Architectural rule (read before editing any file in this package)

ox never depends on the `git-lfs` binary being installed on the user's machine. All LFS operations — pointer detection, pointer parsing, pointer writing, blob upload, blob download, hydration — are implemented natively in this package. Do NOT shell out to `git lfs ...`. Do NOT call exec.LookPath("git-lfs"). Do NOT write .gitattributes files that would trigger git-lfs smudge/clean filters on checkout.

The rationale:

  • Users of ox almost never have the git-lfs binary installed; requiring it would break the CLI for the majority of coworkers.
  • Committing .gitattributes with filter=lfs entries would auto-hydrate LFS content on every `git checkout`, which we explicitly do not want — dehydrated clones are the default.
  • ox uses the LFS *concept* (content-addressed pointer files referencing blobs stored out-of-band) but implements the transport via pure-HTTP calls to GitLab's Git LFS Batch API in client.go and transfer.go.

What lives where

  • pointer.go — FormatPointer / ParsePointer / IsPointerFile / WritePointerFile (canonical spec-compliant pointer I/O, no binary dep)
  • client.go — Pure-HTTP Git LFS Batch API client (upload/download)
  • transfer.go — Upload/download flows using the Batch API
  • meta.go — SessionMeta / FileRef: the parallel OID manifest stored in meta.json alongside pointer files

If you need to work with LFS from another package

  • Detect a pointer file: lfs.IsPointerFile(path)
  • Parse a pointer file: lfs.ParsePointer(content) / lfs.ReadPointerFile(path)
  • Write a pointer file: lfs.WritePointerFile(path, FileRef{...})
  • Upload/download blob content: lfs.NewClient(repoURL, user, token).Batch(...)

If you find yourself writing `exec.Command("git", "lfs", ...)` anywhere in the ox codebase — stop and come back here. The answer is always "use this package directly." See .claude/rules/lfs-no-git-lfs-binary.md for the full rationale and the list of banned patterns.

Constants

const (
	StorageLFS = "lfs"
	StorageGit = "git"
)

FileRef storage backends. Use these constants rather than string literals.

const DefaultMaxObjectSize int64 = 5 * 1024 * 1024 * 1024 // 5 GiB

DefaultMaxObjectSize is the upper bound for LFS objects we accept. Prevents malicious pointers from triggering unbounded disk writes. Override with OX_LFS_MAX_OBJECT_SIZE env var for legitimate large files.

const MaxSummaryAttempts = 3

MaxSummaryAttempts caps how many daemon LLM summarization passes may end in a failure stub for a single session before the daemon flips SummaryStatus to "unrecoverable" and stops retrying. Three is enough to absorb a transient LLM hiccup without burning unbounded tokens on a structurally broken session.

Variables

ContentFiles lists the session content files eligible for LFS upload. These are the files that get uploaded to LFS blob storage and replaced with pointer files in the git commit.

Derived from pipeline.LedgerContentFiles — the canonical source of truth for "what counts as a session artifact." Adding a new artifact there automatically makes it eligible for LFS upload; the two call sites cannot drift out of sync.

var LeakySummaryPrefixes = []string{
	"Summary failed content validation",
	"Summary failed richness validation",
	"Summary generation failed",
}

LeakySummaryPrefixes lists the known sentinel strings that have, at various points, leaked from validators / fallback stubs into the user-visible meta.Summary or meta.Title fields. ValidateUserVisible rejects any meta whose Title or Summary begins with one of these.

Why this exists as a hard guard:

The ox-qqka audit found 14 sessions on the SageOx Internal ledger where a validator failure had been written verbatim into meta.Summary, then surfaced through the api-go list handler and rendered by the web UI as the row title. The bug slipped past every per-layer test because no test asserted the cross-layer invariant "user-visible fields never carry an internal error message". This list is that invariant, encoded.

Belt-and-suspenders: even after the producer-side fixes (ox-qqka, ox-wstd) ship, the writer rejects any future regression at the boundary. New leak shapes get appended here as we find them.

Functions

func ComputeOID

func ComputeOID(content []byte) string

ComputeOID computes the SHA256 hex digest of content (the LFS OID).
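As a minimal sketch of the digest described above, using only the standard library: computeOID is a hypothetical stand-in for this package's function, and it assumes the result is bare lowercase hex without the "sha256:" prefix.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// computeOID is a hypothetical stand-in for lfs.ComputeOID. It returns the
// lowercase hex SHA-256 digest of content (assumed: no "sha256:" prefix).
func computeOID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(computeOID([]byte("hello")))
	// prints 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
}
```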

func DownloadAndVerifyObject added in v0.6.0

func DownloadAndVerifyObject(action *Action, expectedOID string) ([]byte, error)

DownloadAndVerifyObject downloads a blob and verifies its SHA256 matches the expected OID. Accepts both bare hex and canonical "sha256:<hex>" OID formats.

func DownloadObject

func DownloadObject(action *Action) ([]byte, error)

DownloadObject downloads a single blob using the action href.

func DownloadToFile added in v0.6.1

func DownloadToFile(action *Action, dst io.Writer, verify bool, expectedOID string) error

DownloadToFile streams a blob directly to dst, hashing incrementally when verify is true. Avoids buffering the entire object in memory.
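The incremental hashing mentioned above could look roughly like the following. This is a sketch over a plain io.Reader rather than an HTTP response; copyAndVerify and verifyBytes are hypothetical names, not part of this package.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// copyAndVerify streams src into dst while hashing the same bytes, then
// compares the digest with expectedOID (bare hex or "sha256:<hex>").
func copyAndVerify(dst io.Writer, src io.Reader, expectedOID string) error {
	h := sha256.New()
	// MultiWriter feeds every chunk to both the destination and the hash,
	// so the object is never held in memory as a whole.
	if _, err := io.Copy(io.MultiWriter(dst, h), src); err != nil {
		return fmt.Errorf("copy blob: %w", err)
	}
	got := hex.EncodeToString(h.Sum(nil))
	want := strings.TrimPrefix(expectedOID, "sha256:")
	if got != want {
		return fmt.Errorf("oid mismatch: got %s, want %s", got, want)
	}
	return nil
}

// verifyBytes is a small helper for trying the sketch on in-memory data.
func verifyBytes(content []byte, expectedOID string) error {
	return copyAndVerify(io.Discard, bytes.NewReader(content), expectedOID)
}

func main() {
	const helloOID = "sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
	fmt.Println(verifyBytes([]byte("hello"), helloOID))        // prints <nil>
	fmt.Println(verifyBytes([]byte("hullo"), helloOID) != nil) // prints true
}
```

With this shape the mismatch is only known after the bytes have already reached dst, so a caller would typically stream into a temporary file and discard it on error.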

func EnsureSessionsGitignore added in v0.6.0

func EnsureSessionsGitignore(sessionsDir string) error

EnsureSessionsGitignore ensures that sessions/.gitignore exists in the ledger. LFS pointer files and meta.json are committed to git; pointer files (~130 bytes) reference uploaded LFS objects by OID to prevent garbage collection. Overwrites any legacy .gitignore that excluded content file extensions.

func FindPointerStubsWithMissingBlobs added in v0.6.1

func FindPointerStubsWithMissingBlobs(client *Client, sessionPath string, logger *slog.Logger) []string

FindPointerStubsWithMissingBlobs checks which content files in sessionPath are LFS pointer stubs whose backing blobs do NOT exist in the remote LFS store. Returns the filenames of files whose blobs are missing. Returns nil if all blobs exist or if no pointer stubs are present.

Use this before committing pointer stubs to the ledger — committing stubs with missing remote blobs causes GitLab's pre-receive hook to reject the entire push, blocking all sessions until the bad commits are removed.

func FormatPointer added in v0.3.0

func FormatPointer(oid string, size int64) string

FormatPointer returns canonical LFS pointer file content for the given OID and size. OID must include the "sha256:" prefix (matching FileRef.OID convention).

Per spec: version line first, then remaining keys in alphabetical order. "oid" < "size" lexicographically, so the ordering is: version, oid, size. Each line is "key SP value LF" with Unix line endings (\n, not \r\n).
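For illustration, the layout described above can be produced with a single format string. formatPointer is a hypothetical stand-in for this package's function; the version URL is the one defined by the Git LFS pointer spec.

```go
package main

import "fmt"

// formatPointer is a hypothetical sketch of the canonical pointer layout.
// oid is expected to already carry the "sha256:" prefix.
func formatPointer(oid string, size int64) string {
	// version first, then keys in alphabetical order (oid, size), LF endings.
	return fmt.Sprintf("version https://git-lfs.github.com/spec/v1\noid %s\nsize %d\n", oid, size)
}

func main() {
	fmt.Print(formatPointer("sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824", 5))
}
```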

func IsLeakySummaryString added in v0.7.0

func IsLeakySummaryString(s string) bool

IsLeakySummaryString reports whether s matches one of the known validator/error string patterns that leaked into user-visible fields. Exported so the retro-cleanup tool (ox-l4mj) can use the same definition the writer rejects on. Empty strings are NOT leaky (an empty user-visible field is legal — see ValidateUserVisible).
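A minimal sketch of the check, assuming it is a plain prefix match against the LeakySummaryPrefixes list shown earlier (the real function may normalize its input first). isLeaky and leakyPrefixes are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// leakyPrefixes mirrors the documented LeakySummaryPrefixes values.
var leakyPrefixes = []string{
	"Summary failed content validation",
	"Summary failed richness validation",
	"Summary generation failed",
}

// isLeaky reports whether s begins with a known sentinel string.
// Empty strings are never leaky.
func isLeaky(s string) bool {
	if s == "" {
		return false
	}
	for _, p := range leakyPrefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isLeaky("Summary generation failed: timeout")) // prints true
	fmt.Println(isLeaky("Refactor LFS client retries"))        // prints false
	fmt.Println(isLeaky(""))                                   // prints false
}
```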

func IsPointerFile added in v0.3.0

func IsPointerFile(path string) bool

IsPointerFile reports whether the file at path is an LFS pointer. Returns false for missing files, content files, or read errors. Detection is by content format (version + oid + size), not by filename or .gitattributes — matching how git-lfs itself identifies pointers.

func MaxObjectSize added in v0.7.1

func MaxObjectSize() int64

MaxObjectSize returns the configured maximum LFS object size. Reads OX_LFS_MAX_OBJECT_SIZE env var, falling back to DefaultMaxObjectSize.
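The env-var fallback could be sketched as below. The value format is an assumption: this sketch treats OX_LFS_MAX_OBJECT_SIZE as a plain positive byte count and falls back to the default on anything else. parseMaxObjectSize and maxObjectSize are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const defaultMaxObjectSize int64 = 5 * 1024 * 1024 * 1024 // 5 GiB

// parseMaxObjectSize interprets raw as a byte count (assumed format).
// Empty, malformed, or non-positive values yield the default.
func parseMaxObjectSize(raw string) int64 {
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || n <= 0 {
		return defaultMaxObjectSize
	}
	return n
}

// maxObjectSize reads the override from the environment.
func maxObjectSize() int64 {
	return parseMaxObjectSize(os.Getenv("OX_LFS_MAX_OBJECT_SIZE"))
}

func main() {
	fmt.Println(parseMaxObjectSize(""))     // prints 5368709120
	fmt.Println(parseMaxObjectSize("1024")) // prints 1024
	fmt.Println(maxObjectSize() > 0)        // prints true
}
```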

func MutateSessionMeta added in v0.7.0

func MutateSessionMeta(ctx context.Context, sessionPath string, mutate func(*SessionMeta) (*SessionMeta, error)) error

MutateSessionMeta runs an exclusive read-modify-write under an advisory flock on meta.json. Any code path where the daemon and CLI both mutate the manifest (session_finalize summary write, session_upload artifact registration) MUST go through this so they serialize at the FS level.

The mutator receives a fresh copy of the on-disk SessionMeta to mutate in place. If the file does not exist, the mutator receives nil and may return a freshly constructed *SessionMeta to write; returning nil instead writes nothing (useful for "only update if exists" guards). Returning an error aborts the write.

func ParsePointer added in v0.3.0

func ParsePointer(content string) (oid string, size int64, err error)

ParsePointer parses LFS pointer file content and returns the OID and size. Returns an error if the content is not a valid LFS pointer.

Per spec: version line must appear first; remaining keys ("oid", "size") are in alphabetical order. Unknown keys (e.g. "ext-0-*") are silently ignored, allowing forward compatibility with spec extensions.
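The parsing rules above might be sketched as follows. parsePointer is a hypothetical stand-in; it assumes the returned OID keeps its "sha256:" prefix and that a pointer without oid or size is rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const specVersionLine = "version https://git-lfs.github.com/spec/v1"

// parsePointer extracts oid and size from pointer content. The version
// line must come first; unknown keys are skipped.
func parsePointer(content string) (oid string, size int64, err error) {
	lines := strings.Split(strings.TrimSuffix(content, "\n"), "\n")
	if lines[0] != specVersionLine {
		return "", 0, errors.New("not an LFS pointer: missing version line")
	}
	haveSize := false
	for _, line := range lines[1:] {
		key, value, ok := strings.Cut(line, " ")
		if !ok {
			return "", 0, fmt.Errorf("malformed pointer line %q", line)
		}
		switch key {
		case "oid":
			oid = value
		case "size":
			size, err = strconv.ParseInt(value, 10, 64)
			if err != nil {
				return "", 0, fmt.Errorf("invalid size: %w", err)
			}
			haveSize = true
		}
		// Any other key (for example an "ext-0-*" extension) is ignored.
	}
	if !strings.HasPrefix(oid, "sha256:") || !haveSize || size < 0 {
		return "", 0, errors.New("not an LFS pointer: missing oid or size")
	}
	return oid, size, nil
}

func main() {
	oid, size, err := parsePointer(specVersionLine + "\noid sha256:abc123\nsize 12\n")
	fmt.Println(oid, size, err) // prints sha256:abc123 12 <nil>
}
```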

func PreservedSessionID added in v0.7.0

func PreservedSessionID(sessionDir string) (string, error)

PreservedSessionID reads meta.json at sessionDir and returns the SessionID found there. It is the canonical way for any republish path (CLI session stop, daemon recovery, orphan retry) to look up a previously-stamped ID before building a fresh meta.

Three return shapes:

  • ("ses_...", nil): meta.json exists with a populated SessionID. Caller MUST chain .SessionID(returned) onto its builder so the republish does not rotate the ID.
  • ("", nil): meta.json is genuinely absent (NotExist) OR exists but has no SessionID (legacy pre-rollout file). Caller may mint fresh via sessionid.GenerateSessionID() (already done by sessionMetaBase).
  • ("", non-nil err): meta.json exists but cannot be read or parsed (corrupted, IO failure, permission). Caller MUST treat this as fatal and surface the error — silently minting a fresh SessionID here would rotate an ID that may already be cached by the server or by other coworkers, breaking dedup.

The strict "non-NotExist error is fatal" rule exists because there is no safe heuristic for "meta.json exists but I couldn't read it" — we don't know whether it had a SessionID we'd overwrite. Refusing to proceed is the only conservative choice.
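The three return shapes can be illustrated with a small standard-library sketch. preservedSessionID here is a hypothetical re-implementation that only decodes the session_id field; it is meant to show the error classification, not the package's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// preservedSessionID returns the stored ID, ("", nil) when meta.json is
// absent or has no ID, and a non-nil error for any other failure.
func preservedSessionID(sessionDir string) (string, error) {
	data, err := os.ReadFile(filepath.Join(sessionDir, "meta.json"))
	if errors.Is(err, fs.ErrNotExist) {
		return "", nil // genuinely absent: caller may mint a fresh ID
	}
	if err != nil {
		return "", fmt.Errorf("read meta.json: %w", err) // fatal for the caller
	}
	var meta struct {
		SessionID string `json:"session_id"`
	}
	if err := json.Unmarshal(data, &meta); err != nil {
		return "", fmt.Errorf("parse meta.json: %w", err) // fatal for the caller
	}
	return meta.SessionID, nil
}

func main() {
	dir, err := os.MkdirTemp("", "session-*")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	write := func(body string) {
		if err := os.WriteFile(filepath.Join(dir, "meta.json"), []byte(body), 0o644); err != nil {
			panic(err)
		}
	}

	id, err := preservedSessionID(dir)
	fmt.Printf("%q %v\n", id, err) // prints "" <nil>

	write(`{"session_id":"ses_example"}`)
	id, err = preservedSessionID(dir)
	fmt.Printf("%q %v\n", id, err) // prints "ses_example" <nil>

	write(`{not json`)
	_, err = preservedSessionID(dir)
	fmt.Println(err != nil) // prints true
}
```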

func ResolveContentPath added in v0.7.0

func ResolveContentPath(sessionDir, cacheDir, filename string) string

ResolveContentPath returns a path that holds REAL session content (not an LFS pointer stub) for filename, choosing in this order:

  1. cacheDir/filename — the canonical hydrated location for content owned by other team members. Cache files are full content by definition.
  2. sessionDir/filename — only when it exists as real content (not a pointer). This case applies to a coworker's own freshly-recorded session before LFS upload; for any session synced from the ledger, the in-place file MUST be a pointer.

Returns "" when neither location has hydrated content (caller must hydrate).
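The resolution order above can be sketched as follows. This is an illustration under simplifying assumptions: pointer detection only checks for the spec version line (the package's IsPointerFile checks version, oid, and size), and all names are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

const pointerPrefix = "version https://git-lfs.github.com/spec/v1\n"

func isRegularFile(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

// looksLikePointer reads only the first bytes of the file.
func looksLikePointer(path string) bool {
	f, err := os.Open(path)
	if err != nil {
		return false
	}
	defer f.Close()
	buf := make([]byte, len(pointerPrefix))
	n, _ := io.ReadFull(f, buf) // short files simply yield fewer bytes
	return string(buf[:n]) == pointerPrefix
}

// resolveContentPath prefers the cache, then real in-place content.
func resolveContentPath(sessionDir, cacheDir, filename string) string {
	if cached := filepath.Join(cacheDir, filename); isRegularFile(cached) {
		return cached
	}
	inPlace := filepath.Join(sessionDir, filename)
	if isRegularFile(inPlace) && !looksLikePointer(inPlace) {
		return inPlace
	}
	return "" // caller must hydrate into the cache
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	session, err := os.MkdirTemp("", "session-*")
	must(err)
	defer os.RemoveAll(session)
	cache, err := os.MkdirTemp("", "cache-*")
	must(err)
	defer os.RemoveAll(cache)

	// The in-place file is a pointer stub and the cache is empty.
	stub := pointerPrefix + "oid sha256:abc\nsize 3\n"
	must(os.WriteFile(filepath.Join(session, "raw.jsonl"), []byte(stub), 0o644))
	fmt.Printf("%q\n", resolveContentPath(session, cache, "raw.jsonl")) // prints ""

	// After hydration into the cache, the cache path wins.
	must(os.WriteFile(filepath.Join(cache, "raw.jsonl"), []byte("{}\n"), 0o644))
	fmt.Println(resolveContentPath(session, cache, "raw.jsonl") == filepath.Join(cache, "raw.jsonl")) // prints true
}
```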

CACHE-ONLY DESIGN — DO NOT WRITE TO sessionDir/filename

This resolver enforces a load-bearing invariant: the in-place git-tracked file MUST stay as an LFS pointer for any session synced from the ledger. The cache is where hydrated content lives. Two failure modes if this invariant is broken:

  • commitAndPushLedger globs *.jsonl/*.html/*.md inside the session dir and stages whatever is there. A hydrated in-place raw.jsonl gets committed as a regular git blob, replacing the LFS pointer reference and breaking LFS linkage. The ledger then rejects future pushes for any session whose meta.json references the now-orphaned OID.

  • The daemon's session-finalize anti-entropy skips sessions whose raw.jsonl IS a pointer (internal/daemon/agentwork/session_finalize.go). When in-place is full content, the skip doesn't apply and the daemon can re-finalize already-finalized sessions, racing with concurrent regen and clobbering good summaries with failure-marker stubs.

Both failures were observed in the 2026-04-25 Phase 2 regen: 31 of 71 sessions had their summaries clobbered, 2 had raw.jsonl committed as full git blobs. See bd ox-4ncz for the post-mortem.

All readers (regenerate, view, lint, token-optimize) MUST consult this resolver. Hydration paths (downloadFileFromLFS, hydrateFromLedger) MUST write only to cacheDir.

func UpdateMetaSummary added in v0.2.0

func UpdateMetaSummary(sessionPath, title string) error

UpdateMetaSummary reads meta.json from sessionPath, updates BOTH the Title and Summary fields with the given string, and re-writes atomically. The caller always passes an AI-generated title — both fields get set so meta.json stays consistent regardless of which consumer reads which field.

Why both fields

meta.Title is the canonical short descriptor (5-10 word session name). meta.Summary historically held a short descriptive string too, before meta.Title existed. Some consumers (older ox versions, tools downstream) read meta.Summary for display; newer ones read meta.Title. Setting both closes the ox-g5zw distiller bug where meta.Title was left empty because this function only touched Summary despite its callers always passing a Title. Result: 91/155 sessions shipped with empty titles on the ox team's ledger before a mass backfill. Fixing at source ensures the bug can't recur on new sessions.

If a separate short-summary string is ever needed alongside the title, add a distinct UpdateMetaSummaryOnly function; don't reintroduce the single-field ambiguity.

func UploadObject

func UploadObject(action *Action, content []byte) error

UploadObject uploads a single blob using the action href from the batch response.

func UploadSessionFiles added in v0.6.0

func UploadSessionFiles(client *Client, sessionPath string, logger *slog.Logger) (map[string]FileRef, error)

UploadSessionFiles uploads session content files to LFS and returns the filename->FileRef manifest for inclusion in meta.json.

The caller provides the LFS client (credential resolution is caller's responsibility — CLI and daemon resolve credentials differently).

Flow:

  1. Read all content files from session dir
  2. Compute SHA256 OIDs + sizes
  3. Call LFS batch API to get upload actions
  4. Upload all blobs in parallel
  5. Return filename->FileRef map for meta.json

func ValidateRelativePath added in v0.7.1

func ValidateRelativePath(name string) error

ValidateRelativePath rejects filenames that could escape a directory boundary. Call this on any filename from meta.json, import manifests, or other trust-boundary-crossing paths before using it in filepath.Join.
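A minimal sketch of such a check, assuming the rules are "non-empty, not absolute, and does not climb out via ..". The package's actual rule set may be stricter; validateRelativePath is a hypothetical stand-in.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// validateRelativePath rejects names that could resolve outside the
// directory they are later joined onto.
func validateRelativePath(name string) error {
	if name == "" {
		return errors.New("empty path")
	}
	if filepath.IsAbs(name) {
		return fmt.Errorf("absolute path not allowed: %q", name)
	}
	clean := filepath.Clean(name)
	if clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator)) {
		return fmt.Errorf("path escapes its directory: %q", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"raw.jsonl", "../meta.json", "a/../../b"} {
		fmt.Printf("%q ok=%v\n", name, validateRelativePath(name) == nil)
	}
}
```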

func ValidateUserVisible added in v0.7.0

func ValidateUserVisible(meta *SessionMeta) error

ValidateUserVisible reports an error when meta carries a known leaky string in a user-visible field (Title or Summary). It is the invariant we want enforced at every write — see WriteSessionMeta / WriteSessionMetaOnly which both call it.

Empty Title and Summary are LEGAL (a session with no successful summary yet). What is illegal is a non-empty value that is actually a validator/diagnostic string disguised as a title.

A nil meta yields a nil error so callers can chain validation without a separate nil check; a nil meta is rejected later in WriteSessionMeta.

func VerifyObject added in v0.6.0

func VerifyObject(action *Action, oid string, size int64) error

VerifyObject confirms a blob was received by the server by POSTing to the verify action href per the Git LFS batch API spec. The server responds 200 if the object exists with matching OID and size.

func WritePointerFile added in v0.3.0

func WritePointerFile(path string, ref FileRef) error

WritePointerFile writes a standard LFS pointer file at path. Replaces any existing file (content is already uploaded to LFS).

func WritePointerFiles added in v0.3.0

func WritePointerFiles(dir string, files map[string]FileRef) ([]string, error)

WritePointerFiles writes LFS pointer files for each LFS-stored entry in files. Keys are filenames written as dir/<key>. Returns sorted absolute paths of written files. Both sessions and imports use this to create the standard git-lfs pointer files that prevent garbage collection.

Entries with Storage=git (committed directly to git, e.g. summary.json) are skipped — writing a pointer file there would clobber the real content with empty bytes. Legacy entries (no Storage field) are treated as LFS per FileRef.EffectiveStorage().

func WriteSessionMeta

func WriteSessionMeta(sessionPath string, meta *SessionMeta) error

WriteSessionMeta writes meta.json to the given session directory. When meta.Files is populated, also replaces content files with LFS pointer files (standard git-lfs naming). Pointer write failures are non-fatal — session data is safe in LFS + meta.json regardless.

Callers that need to push content files to git BEFORE replacing them with pointer stubs should use WriteSessionMetaOnly followed by WritePointerFiles after a successful push.

func WriteSessionMetaOnly added in v0.6.0

func WriteSessionMetaOnly(sessionPath string, meta *SessionMeta) error

WriteSessionMetaOnly writes meta.json without replacing content files with LFS pointer stubs. Use this when content files must remain intact until a successful git push — call WritePointerFiles separately after the push so that push failure never leaves a session with pointer stubs but no remote copy.

The on-disk write is atomic via fileutil.AtomicWriteBytes (random temp + fsync + rename + parent dir fsync). The previous implementation used the literal "meta.json.tmp" as the temp path, which raced with concurrent writers: both renamed the same temp inode, and one writer saw ENOENT. A random suffix per write closes that loophole.
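The temp + fsync + rename + parent fsync sequence can be sketched with the standard library as below. atomicWrite is a hypothetical illustration, not the fileutil.AtomicWriteBytes implementation, and the directory fsync assumes a POSIX-like filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a uniquely named temp file in the target
// directory, syncs it, renames it over path, then syncs the directory.
func atomicWrite(path string, data []byte, perm os.FileMode) error {
	dir := filepath.Dir(path)
	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".*.tmp") // random suffix per write
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		return err
	}
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync() // persist the rename itself
}

func main() {
	dir, err := os.MkdirTemp("", "meta-*")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "meta.json")
	if err := atomicWrite(target, []byte(`{"version":"1.0"}`), 0o644); err != nil {
		panic(err)
	}
	got, err := os.ReadFile(target)
	fmt.Println(string(got), err) // prints {"version":"1.0"} <nil>
}
```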

Callers that mutate meta.json (read → modify → write) MUST do the entire RMW under MutateSessionMeta so the daemon and CLI don't lose each other's fields. WriteSessionMetaOnly itself is unlocked for backwards compat; MutateSessionMeta is the safe path.

Types

type Action

type Action struct {
	Href      string            `json:"href"`
	Header    map[string]string `json:"header,omitempty"`
	ExpiresIn int               `json:"expires_in,omitempty"` // seconds
	ExpiresAt string            `json:"expires_at,omitempty"` // RFC3339
}

Action is a single LFS action with an href and optional headers.

type Actions

type Actions struct {
	Upload   *Action `json:"upload,omitempty"`
	Download *Action `json:"download,omitempty"`
	Verify   *Action `json:"verify,omitempty"`
}

Actions contains the upload/download actions returned by the batch API.

type BatchObject

type BatchObject struct {
	OID  string `json:"oid"`  // SHA256 hex digest
	Size int64  `json:"size"` // bytes
}

BatchObject identifies a single LFS object by its SHA256 OID and size.

type BatchResponse

type BatchResponse struct {
	Transfer string                `json:"transfer"` // "basic"
	Objects  []BatchResponseObject `json:"objects"`
}

BatchResponse is the server response from the batch API.

type BatchResponseObject

type BatchResponseObject struct {
	OID           string       `json:"oid"`
	Size          int64        `json:"size"`
	Authenticated bool         `json:"authenticated,omitempty"`
	Actions       *Actions     `json:"actions,omitempty"`
	Error         *ObjectError `json:"error,omitempty"`
}

BatchResponseObject is a single object in the batch response.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client communicates with a Git LFS Batch API server (e.g., GitLab).

func NewClient

func NewClient(repoURL, username, token string) *Client

NewClient creates an LFS client for the given git repo URL. repoURL should be the git clone URL (e.g., https://git.sageox.io/sageox/ledger.git). Auth uses HTTP Basic per the Git LFS spec: username:token base64-encoded.

func NewClientFromLedger added in v0.6.0

func NewClientFromLedger(ledgerPath, endpointURL string) (*Client, error)

NewClientFromLedger creates an LFS client using the ledger's git remote URL and credentials loaded for the given endpoint. This is a convenience constructor for callers that already have the ledger path (e.g., daemon session finalization).

func (*Client) BatchDownload

func (c *Client) BatchDownload(objects []BatchObject) (*BatchResponse, error)

BatchDownload requests download URLs for the given objects.

func (*Client) BatchUpload

func (c *Client) BatchUpload(objects []BatchObject) (*BatchResponse, error)

BatchUpload requests upload URLs for the given objects.
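For orientation, a batch request as defined by the Git LFS Batch API spec might be assembled like this. The sketch assumes the endpoint is the clone URL plus /info/lfs/objects/batch, as the spec's server discovery describes; newBatchRequest and the lowercase types are hypothetical and do not reflect how Client is implemented internally.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

type batchObject struct {
	OID  string `json:"oid"`  // bare SHA-256 hex digest
	Size int64  `json:"size"` // bytes
}

type batchRequest struct {
	Operation string        `json:"operation"` // "upload" or "download"
	Transfers []string      `json:"transfers"`
	Objects   []batchObject `json:"objects"`
}

// newBatchRequest builds (but does not send) a Batch API request.
func newBatchRequest(repoURL, username, token, operation string, objects []batchObject) (*http.Request, error) {
	body, err := json.Marshal(batchRequest{
		Operation: operation,
		Transfers: []string{"basic"},
		Objects:   objects,
	})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimSuffix(repoURL, "/") + "/info/lfs/objects/batch"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.git-lfs+json")
	req.Header.Set("Content-Type", "application/vnd.git-lfs+json")
	req.SetBasicAuth(username, token) // HTTP Basic: base64(username:token)
	return req, nil
}

func main() {
	req, err := newBatchRequest("https://git.example.com/team/ledger.git", "user", "pass",
		"download", []batchObject{{OID: "abc123", Size: 5}})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(req.Header.Get("Authorization")) // prints Basic dXNlcjpwYXNz
}
```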

type DownloadResult

type DownloadResult struct {
	OID     string
	Content []byte
	Error   error
}

DownloadResult tracks the outcome of a single download.

func DownloadAll

func DownloadAll(resp *BatchResponse, maxConcurrent int) []DownloadResult

DownloadAll downloads multiple blobs in parallel. Returns results for every object so callers can see all errors, not just the first.
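One common way to get bounded parallelism while still reporting every outcome is a semaphore channel plus a pre-sized result slice. The sketch below shows that pattern with a caller-supplied fetch function; fetchAll, result, and errNotFound are hypothetical names, and this is not claimed to be how DownloadAll is written.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotFound = errors.New("object not found")

type result struct {
	OID     string
	Content []byte
	Err     error
}

// fetchAll runs fetch for every OID with at most maxConcurrent calls in
// flight. Results keep input order and include failures.
func fetchAll(oids []string, maxConcurrent int, fetch func(oid string) ([]byte, error)) []result {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	results := make([]result, len(oids))
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	for i, oid := range oids {
		wg.Add(1)
		go func(i int, oid string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			content, err := fetch(oid)
			results[i] = result{OID: oid, Content: content, Err: err} // each goroutine owns index i
		}(i, oid)
	}
	wg.Wait()
	return results
}

func main() {
	res := fetchAll([]string{"a", "b", "c"}, 2, func(oid string) ([]byte, error) {
		if oid == "b" {
			return nil, errNotFound
		}
		return []byte("blob-" + oid), nil
	})
	for _, r := range res {
		fmt.Println(r.OID, string(r.Content), r.Err)
	}
}
```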

type FileRef

type FileRef struct {
	Storage string `json:"storage,omitempty"` // "lfs" | "git"; empty == "lfs" for legacy reads
	OID     string `json:"oid,omitempty"`     // "sha256:<hex>" — populated only for Storage=="lfs"
	Size    int64  `json:"size"`              // bytes (always populated)
}

FileRef identifies a session content file by storage backend, OID (for LFS files), and size.

Storage tag

The Storage field declares which backend holds the bytes:

  • StorageLFS — content is in the LFS blob store, identified by OID. The in-place git-tracked file is a ~130-byte pointer.
  • StorageGit — content is committed directly to git as a regular blob (small JSON, e.g. summary.json). OID is empty.

Backwards compatibility

Pre-Storage meta.json files have FileRef{OID, Size} and no Storage field. JSON unmarshalling leaves Storage="" on those entries; the reader's canonical helper FileRef.EffectiveStorage() promotes empty to StorageLFS (the only legal value at the time those files were written). All call sites MUST go through EffectiveStorage() rather than reading f.Storage directly. Writers set Storage explicitly for new entries; legacy entries stay untouched on disk until something rewrites the manifest.

See ADR-016 (delegation) and meta.json manifest refactor (bd ox-9mrk).

func NewFileRef

func NewFileRef(content []byte) FileRef

NewFileRef creates a FileRef for an LFS-stored file from its content bytes. Computes the OID and stamps Storage=lfs explicitly so future readers don't need to fall back to the empty-means-lfs legacy rule.

func NewGitFileRef added in v0.7.0

func NewGitFileRef(size int64) FileRef

NewGitFileRef creates a FileRef for a file committed directly to git (no LFS). Used for small artifacts like summary.json that are not worth indirecting through LFS. OID is intentionally empty.

func ReadPointerFile added in v0.3.0

func ReadPointerFile(path string) (FileRef, error)

ReadPointerFile reads and parses an LFS pointer file, returning the FileRef.

func (FileRef) BareOID

func (f FileRef) BareOID() string

BareOID returns the hex digest without the "sha256:" prefix.

func (FileRef) EffectiveStorage added in v0.7.0

func (f FileRef) EffectiveStorage() string

EffectiveStorage returns the storage backend for this FileRef, promoting empty (legacy meta.json with no storage tag) to StorageLFS. All readers that branch on storage MUST use this helper.
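The legacy promotion rule is small enough to show in full. The following sketch mirrors the documented FileRef fields on a hypothetical lowercase type and shows how a pre-Storage manifest entry decodes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	storageLFS = "lfs"
	storageGit = "git"
)

type fileRef struct {
	Storage string `json:"storage,omitempty"`
	OID     string `json:"oid,omitempty"`
	Size    int64  `json:"size"`
}

// effectiveStorage promotes the empty legacy value to storageLFS.
func (f fileRef) effectiveStorage() string {
	if f.Storage == "" {
		return storageLFS
	}
	return f.Storage
}

func main() {
	// A legacy entry written before the storage tag existed.
	var legacy fileRef
	if err := json.Unmarshal([]byte(`{"oid":"sha256:abc","size":5}`), &legacy); err != nil {
		panic(err)
	}
	fmt.Printf("%q %q\n", legacy.Storage, legacy.effectiveStorage()) // prints "" "lfs"

	direct := fileRef{Storage: storageGit, Size: 120}
	fmt.Println(direct.effectiveStorage()) // prints git
}
```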

func (FileRef) IsGit added in v0.7.0

func (f FileRef) IsGit() bool

IsGit reports whether this FileRef is stored directly in git (no LFS).

func (FileRef) IsLFS added in v0.7.0

func (f FileRef) IsLFS() bool

IsLFS reports whether this FileRef is stored in LFS (including legacy entries with no Storage field).

type HydrationStatus

type HydrationStatus string

HydrationStatus describes whether a session's content files are present locally.

const (
	// HydrationStatusHydrated means all content files are present locally.
	HydrationStatusHydrated HydrationStatus = "hydrated"
	// HydrationStatusDehydrated means no content files are present (only meta.json).
	HydrationStatusDehydrated HydrationStatus = "dehydrated"
	// HydrationStatusPartial means some content files are present.
	HydrationStatusPartial HydrationStatus = "partial"
)

func CheckHydrationStatus

func CheckHydrationStatus(sessionPath string, meta *SessionMeta) HydrationStatus

CheckHydrationStatus checks which content files are present as real content (not LFS pointers) for a session. Files that are missing or contain only an LFS pointer are considered dehydrated.

func CheckHydrationStatusWithCache added in v0.6.1

func CheckHydrationStatusWithCache(sessionPath, cachePath string, meta *SessionMeta) HydrationStatus

CheckHydrationStatusWithCache checks hydration across the primary session path and a cache path. A file counts as hydrated if it exists as real content (not a pointer) in the primary path OR exists in the cache path (cache never has pointers).

type MetaRepairOutcome added in v0.7.1

type MetaRepairOutcome struct {
	SessionName       string
	Skipped           bool   // meta.json was already healthy or terminal
	RecoveredFromJSON bool   // pulled a clean title out of summary.json
	BumpedAttempts    bool   // no recovery available; SummaryAttempts incremented
	FlippedTerminal   bool   // hit MaxSummaryAttempts; status set to unrecoverable
	Error             string // non-fatal; meta.json was not modified
}

MetaRepairOutcome reports what RecoverEmptyTitleMeta did to a single session's meta.json. Designed to be cheap to produce on every call — the common path (nothing to repair) returns Skipped=true and the caller can ignore the rest.

func RecoverEmptyTitleMeta added in v0.7.1

func RecoverEmptyTitleMeta(sessionDir string, dryRun bool) MetaRepairOutcome

RecoverEmptyTitleMeta inspects one session's meta.json for the post-Apr-27 empty-title failure shape (meta.title=="" with status != unrecoverable). When summary.json carries a real title, promotes it back into meta and stamps SummaryStatus=ok. Otherwise increments SummaryAttempts and, at MaxSummaryAttempts, flips status to unrecoverable so future calls early-exit.

This is the daemon-safe core of the empty-title repair flow. Both the CLI `ox session repair-meta-summary` tool and the autofix scheduler delegate to this so the on-disk behavior is identical.

Idempotency contract: running this repeatedly on a healthy meta is a no-op (Skipped=true, no write). Running it repeatedly on an unrecoverable meta is also a no-op. Running it on a fixable meta applies the fix once and then early-exits on subsequent calls.

dryRun=true returns the outcome without writing meta.json.
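The decision flow described above can be modeled as a small in-memory state machine. This sketch leaves out file I/O and dryRun, treats any non-empty summary title as "real", and assumes the terminal flip happens once attempts reach the cap; all names are hypothetical.

```go
package main

import "fmt"

const maxSummaryAttempts = 3

type meta struct {
	Title           string
	SummaryStatus   string
	SummaryAttempts int
}

type outcome struct {
	Skipped           bool
	RecoveredFromJSON bool
	BumpedAttempts    bool
	FlippedTerminal   bool
}

// repairStep applies one repair pass to m. summaryTitle is the title found
// in summary.json, or "" when none is available.
func repairStep(m *meta, summaryTitle string) outcome {
	if m.Title != "" || m.SummaryStatus == "unrecoverable" {
		return outcome{Skipped: true} // healthy or terminal: nothing to do
	}
	if summaryTitle != "" {
		m.Title = summaryTitle
		m.SummaryStatus = "ok"
		return outcome{RecoveredFromJSON: true}
	}
	m.SummaryAttempts++
	out := outcome{BumpedAttempts: true}
	if m.SummaryAttempts >= maxSummaryAttempts {
		m.SummaryStatus = "unrecoverable"
		out.FlippedTerminal = true
	}
	return out
}

func main() {
	m := &meta{}
	for i := 0; i < 4; i++ {
		fmt.Printf("%+v\n", repairStep(m, ""))
	}
	// Passes 1-2 bump attempts, pass 3 also flips terminal, pass 4 is skipped.
}
```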

type ObjectError

type ObjectError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

ObjectError is returned when the server cannot process an object.

type ReconcileResult added in v0.6.2

type ReconcileResult struct {
	ScannedPointers int      // total pointer files found in sessions/
	MissingOnRemote int      // pointers whose LFS OIDs are not in the remote store
	Replaced        int      // pointers replaced with empty stubs
	Squashed        bool     // whether unpushed history was squashed
	ReplacedFiles   []string // relative paths of replaced files
}

ReconcileResult describes what ReconcileUnpushedPointers found and fixed.

func ReconcileUnpushedPointers added in v0.6.2

func ReconcileUnpushedPointers(ctx context.Context, ledgerPath, endpointURL string, logger *slog.Logger) (*ReconcileResult, error)

ReconcileUnpushedPointers scans the working tree under sessions/ for LFS pointer files whose backing blobs are missing from the remote LFS store, replaces them with empty stubs, and squashes all unpushed commits into one so the poisoned pointer blobs no longer appear in the push pack.

This is the repair mechanism for ledgers whose push is blocked by "GitLab: LFS objects are missing" — regardless of HOW the bad pointer got committed (daemon murmur, user manual commit, unscoped git add, etc.).

Safe to call on clean repos — returns immediately if no pointer files exist or all pointers have valid backing objects.

The squash is necessary because GitLab's pre-receive hook scans ALL commits in the push pack, not just HEAD. Replacing pointers in the working tree and committing on top is not sufficient — the old commits still reference the missing OIDs.

type SessionMeta

type SessionMeta struct {
	Version     string `json:"version"` // "1.0"
	SessionName string `json:"session_name"`
	Username    string `json:"username"` // privacy-safe display name — via identity.AttributionDisplayName(). Shared in ledger. NOT an email.
	UserID      string `json:"user_id,omitempty"`
	AgentID     string `json:"agent_id"`

	// SessionID is the globally unique, content-bound identifier for THIS
	// specific recording. Format: "ses_<UUIDv7>". Populated at session
	// creation time and never regenerated. Independent of path/name so
	// renames, moves, and re-imports do not change identity.
	//
	// Do NOT confuse with OxSID (per-agent-instance, reused across many
	// recordings during a 24h prime window) or AgentID (per-agent, reused
	// across all of that agent's recordings).
	//
	// # Backwards compatibility
	//
	// Pre-existing meta.json files do not carry this field. The compat model:
	//
	//   - JSON tag is `omitempty` — older readers see no schema change;
	//     newer readers see "" for legacy sessions.
	//   - Version is NOT bumped — additive optional field, no breaking
	//     change to the on-disk format.
	//   - Legacy sessions on disk are NEVER backfilled automatically. The
	//     deterministic EffectiveSessionID() helper synthesizes a stable
	//     ses_<UUIDv5> from (RepoID, SessionName) on every read, so
	//     consumers always get a ses_-prefixed value without writing to
	//     old meta.json. Doctor offers an opt-in backfill (FixLevelSuggested).
	//   - All consumers MUST go through EffectiveSessionID() rather than
	//     reading SessionID directly. Direct reads return "" for legacy
	//     and silently break dedup/lookup.
	SessionID string `json:"session_id,omitempty"`

	AgentType           string    `json:"agent_type"` // "claude-code", "cursor", etc.
	Model               string    `json:"model,omitempty"`
	Title               string    `json:"title,omitempty"`
	CreatedAt           time.Time `json:"created_at"`
	EntryCount          int       `json:"entry_count,omitempty"`
	Summary             string    `json:"summary,omitempty"`
	StopReason          string    `json:"stop_reason,omitempty"` // how session ended: "stopped", "aborted", "recovered", ""
	RepoID              string    `json:"repo_id,omitempty"`
	SageoxScore         *float64  `json:"sageox_score,omitempty"`          // agent's self-reported contribution score (0.0-1.0)
	SageoxScoreCategory string    `json:"sageox_score_category,omitempty"` // named category: none, minor, moderate, significant, critical
	SageoxScoreReason   string    `json:"sageox_score_reason,omitempty"`   // detailed explanation of SageOx influence

	// SummaryStatus and ValidationError mirror the same-named fields on
	// pkg/sessionsummary.SummarizeResponse. SummaryStatus is the
	// structured signal — readers should prefer it over sniffing
	// Summary for sentinel error strings. ValidationError is ops-only;
	// it MUST NEVER be rendered as a user-visible session title or
	// summary. See ox-qqka for the leak this prevents.
	//
	// Both are omitempty so older readers and older on-disk meta.json
	// files keep working unchanged.
	SummaryStatus   string `json:"summary_status,omitempty"`
	ValidationError string `json:"validation_error,omitempty"`

	// SummaryAttempts counts how many daemon-side LLM summarization
	// attempts have produced a failure stub for this session. Used by
	// the daemon to cap retries: after MaxSummaryAttempts the status
	// is flipped to "unrecoverable" and the daemon stops re-finalizing.
	// Without this, an LLM that consistently fails on a given session
	// (e.g. raw.jsonl is corrupt, prompt is too large, model is having
	// a bad day) ends up burning tokens on every anti-entropy cycle
	// and overwriting whatever local state existed with the same
	// failure-stub shape. omitempty so older meta.json files keep
	// working unchanged.
	SummaryAttempts int `json:"summary_attempts,omitempty"`

	Files map[string]FileRef `json:"files"` // OID manifest: filename -> ref
}

SessionMeta is the git-tracked metadata and OID manifest for a session. It is stored as meta.json in each session folder. When Files is populated, WriteSessionMeta also writes LFS pointer files (standard git-lfs naming) in place of the content files, which prevents LFS garbage collection.
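The omitempty tags on the newer fields are what let old and new meta.json files coexist. A minimal sketch of that behavior follows, using miniMeta as a hypothetical trimmed stand-in for SessionMeta (it is not part of this package) that keeps only three of the documented JSON tags:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// miniMeta is a HYPOTHETICAL trimmed stand-in for SessionMeta. Only the
// JSON tags are copied from the documented struct.
type miniMeta struct {
	AgentType       string `json:"agent_type"`
	SummaryStatus   string `json:"summary_status,omitempty"`
	SummaryAttempts int    `json:"summary_attempts,omitempty"`
}

// roundTrip decodes a meta.json payload and re-encodes it, the way a
// read-modify-write path would.
func roundTrip(old []byte) ([]byte, error) {
	var m miniMeta
	if err := json.Unmarshal(old, &m); err != nil {
		return nil, err
	}
	return json.Marshal(m)
}

func main() {
	// An older file that predates summary_status and summary_attempts.
	out, err := roundTrip([]byte(`{"agent_type":"claude-code"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints {"agent_type":"claude-code"}
}
```

Because the zero values are omitted on encode, rewriting a legacy file does not introduce keys that an older reader has never seen.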

func ReadSessionMeta

func ReadSessionMeta(sessionPath string) (*SessionMeta, error)

ReadSessionMeta reads meta.json from the given session directory.

func (*SessionMeta) EffectiveSessionID added in v0.7.0

func (m *SessionMeta) EffectiveSessionID() string

EffectiveSessionID returns the canonical "ses_"-prefixed identifier for this recording, regardless of whether the recording predates the SessionID field.

  • If meta.SessionID is non-empty (post-rollout), it is returned verbatim.
  • Otherwise the result is a deterministic ses_<UUIDv5> derived from (RepoID, SessionName) using legacySessionNamespace.

Deterministic: calling EffectiveSessionID twice on the same legacy session always returns the same value.

Why UUIDv5 over (RepoID, SessionName) and not OxSID

OxSID is per-prime, not per-recording (cmd/ox/agent_prime.go:514; reused at 540, 746, 1614, 1630, 1650). Two recordings produced by the same prime share an OxSID and would collide. SessionName is the only per-recording entropy already present in meta.json; using it here also avoids LFS hydration of raw.jsonl on dehydrated clones.

All call sites that need a stable per-recording handle MUST go through this helper. Reading m.SessionID directly returns "" for legacy recordings and silently breaks dedup/lookup.
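The fallback described above can be sketched with the standard library alone. This is an illustration under assumptions, not the package's code: the namespace bytes below are the RFC 4122 DNS namespace used as a placeholder (the real legacySessionNamespace value is not shown here), and joining RepoID and SessionName with "/" is a guess at how the name is formed.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// placeholderNamespace is a HYPOTHETICAL stand-in for
// legacySessionNamespace (here: the RFC 4122 DNS namespace UUID).
var placeholderNamespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 computes an RFC 4122 version-5 UUID: SHA-1 over namespace||name,
// truncated to 16 bytes, with the version and variant bits set.
func uuidV5(ns [16]byte, name string) string {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant

	s := hex.EncodeToString(u[:])
	return s[0:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:32]
}

// effectiveSessionID mirrors the documented rule: a stored SessionID is
// returned verbatim, otherwise a ses_<UUIDv5> is derived. The "/" separator
// between repoID and sessionName is an ASSUMPTION.
func effectiveSessionID(sessionID, repoID, sessionName string) string {
	if sessionID != "" {
		return sessionID
	}
	return "ses_" + uuidV5(placeholderNamespace, repoID+"/"+sessionName)
}

func main() {
	a := effectiveSessionID("", "repo_123", "2026-01-02-demo")
	b := effectiveSessionID("", "repo_123", "2026-01-02-demo")
	fmt.Println(a == b) // prints true: same inputs, same ID
	fmt.Println(effectiveSessionID("ses_existing", "repo_123", "ignored"))
}
```

Because the ID is a pure function of fields already in meta.json, a legacy recording resolves to the same handle on every machine without hydrating any LFS content.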

func (*SessionMeta) Validate added in v0.7.0

func (m *SessionMeta) Validate() error

Validate runs the full structural invariant check on a SessionMeta. Today that's just ValidateUserVisible; if more invariants are added, this is where they go.

type SessionMetaBuilder added in v0.2.0

type SessionMetaBuilder struct {
	// contains filtered or unexported fields
}

SessionMetaBuilder constructs SessionMeta with required fields and optional setters.

func NewSessionMeta added in v0.2.0

func NewSessionMeta(sessionName, username, agentID, agentType string, createdAt time.Time) *SessionMetaBuilder

NewSessionMeta creates a builder with required fields pre-filled.

func (*SessionMetaBuilder) Build added in v0.2.0

func (b *SessionMetaBuilder) Build() *SessionMeta

Build returns the constructed SessionMeta.

func (*SessionMetaBuilder) EntryCount added in v0.2.0

func (b *SessionMetaBuilder) EntryCount(n int) *SessionMetaBuilder

func (*SessionMetaBuilder) Model added in v0.2.0

func (*SessionMetaBuilder) RepoID added in v0.2.0

func (*SessionMetaBuilder) SageoxScore added in v0.6.2

func (b *SessionMetaBuilder) SageoxScore(score float64, category, reason string) *SessionMetaBuilder

func (*SessionMetaBuilder) SessionID added in v0.7.0

func (b *SessionMetaBuilder) SessionID(id string) *SessionMetaBuilder

SessionID stamps the per-recording ses_<UUIDv7>. The caller is expected to pass sessionid.GenerateSessionID() at session creation time. The ID is never regenerated: read-modify-write paths built on MutateSessionMeta preserve it through the JSON round-trip.

func (*SessionMetaBuilder) StopReason added in v0.5.0

func (b *SessionMetaBuilder) StopReason(reason string) *SessionMetaBuilder

func (*SessionMetaBuilder) Summary added in v0.2.0

func (*SessionMetaBuilder) SummaryAttempts added in v0.7.1

func (b *SessionMetaBuilder) SummaryAttempts(n int) *SessionMetaBuilder

SummaryAttempts stamps the daemon's failure-stub retry counter. Used by the daemon path to cap how many times a structurally-broken session is re-finalized before being marked unrecoverable. Inline (CLI) writers should leave this at zero.
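The retry cap can be sketched as a small state transition. Everything here is illustrative: the cap value of 3 is hypothetical (the value of MaxSummaryAttempts is not stated in this documentation), the stub type is a local stand-in, and using "failed_validation" as the status between failures is an assumption.

```go
package main

import "fmt"

// maxSummaryAttempts is a HYPOTHETICAL cap; the real MaxSummaryAttempts
// value is not given in the docs.
const maxSummaryAttempts = 3

// stub is a HYPOTHETICAL stand-in holding the two relevant meta fields.
type stub struct {
	SummaryStatus   string
	SummaryAttempts int
}

// recordFailure bumps the counter after a failed summarization attempt and
// flips the status to "unrecoverable" once the cap is reached.
func recordFailure(s stub) stub {
	s.SummaryAttempts++
	if s.SummaryAttempts >= maxSummaryAttempts {
		s.SummaryStatus = "unrecoverable"
	} else {
		s.SummaryStatus = "failed_validation" // ASSUMED intermediate status
	}
	return s
}

// shouldRetry reports whether another finalization attempt is worthwhile.
func shouldRetry(s stub) bool {
	return s.SummaryStatus != "ok" && s.SummaryStatus != "unrecoverable"
}

func main() {
	s := stub{}
	for shouldRetry(s) {
		s = recordFailure(s) // simulate an attempt that always fails
	}
	fmt.Println(s.SummaryAttempts, s.SummaryStatus) // prints 3 unrecoverable
}
```

Persisting the counter in meta.json is what makes the cap survive restarts: each anti-entropy cycle reads the previous count instead of starting from zero.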

func (*SessionMetaBuilder) SummaryStatus added in v0.7.0

func (b *SessionMetaBuilder) SummaryStatus(status string) *SessionMetaBuilder

SummaryStatus stamps the lifecycle status (ok / pending / failed_validation / unrecoverable). Use the SummaryStatus* constants from pkg/sessionsummary, mirrored here.

func (*SessionMetaBuilder) Title added in v0.2.0

func (*SessionMetaBuilder) UserID added in v0.2.0

func (*SessionMetaBuilder) ValidationError added in v0.7.0

func (b *SessionMetaBuilder) ValidationError(msg string) *SessionMetaBuilder

ValidationError records the ops-facing validator diagnostic. Callers must never put this string into Title or Summary; it is engineer-visible only. See ox-qqka for the leak this prevents.

func (*SessionMetaBuilder) WithFiles added in v0.2.0

func (b *SessionMetaBuilder) WithFiles(f map[string]FileRef) *SessionMetaBuilder

type UploadResult

type UploadResult struct {
	OID   string
	Error error
}

UploadResult tracks the outcome of a single upload.

func UploadAll

func UploadAll(resp *BatchResponse, files map[string][]byte, maxConcurrent int) []UploadResult

UploadAll uploads multiple blobs in parallel. files maps OID -> content. Uses objects from the batch response to find upload actions.
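One common way to bound parallel uploads is a buffered channel used as a semaphore. The sketch below shows that technique with hypothetical local names (uploadResult, uploadAll, and the put callback standing in for the HTTP transfer). It does not use BatchResponse and should not be read as how UploadAll is implemented.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUpload is a sample error used by the demo put function.
var errUpload = errors.New("upload failed")

// uploadResult mirrors the shape of UploadResult for illustration.
type uploadResult struct {
	OID   string
	Error error
}

// uploadAll calls put once per OID with at most maxConcurrent calls in
// flight. put is a HYPOTHETICAL stand-in for the real HTTP upload.
func uploadAll(files map[string][]byte, maxConcurrent int, put func(oid string, data []byte) error) []uploadResult {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make([]uploadResult, 0, len(files))
		sem     = make(chan struct{}, maxConcurrent)
	)
	for oid, data := range files {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent uploads are running
		go func(oid string, data []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			err := put(oid, data)
			mu.Lock()
			results = append(results, uploadResult{OID: oid, Error: err})
			mu.Unlock()
		}(oid, data)
	}
	wg.Wait()
	return results
}

func main() {
	files := map[string][]byte{"oid-a": []byte("A"), "oid-b": []byte("B")}
	res := uploadAll(files, 2, func(oid string, _ []byte) error {
		if oid == "oid-b" {
			return errUpload
		}
		return nil
	})
	fmt.Println(len(res)) // prints 2: one result per OID, in completion order
}
```

Each blob gets its own result entry, so one failed upload does not hide the outcome of the others.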
