corpus

package
v2.0.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 18, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package corpus defines the neutral record unit that Stroma indexes.

Constants

const (
	// DefaultKind is used when callers omit a more specific record kind.
	DefaultKind = "artifact"

	// FormatMarkdown stores Markdown bodies that can be heading-chunked.
	FormatMarkdown = "markdown"

	// FormatPlaintext stores plain text bodies as a single chunk.
	FormatPlaintext = "plaintext"
)

Variables

This section is empty.

Functions

func Fingerprint

func Fingerprint(records []Record) (string, error)

Fingerprint summarizes a set of records deterministically and returns an error if any record fails Normalize. Prior versions silently skipped invalid records, so a corpus containing an invalid record produced the same digest as the same corpus without that record. That masked reuse of a snapshot that was missing data the caller thought it had. Failing loudly forces callers to surface the problem instead.
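The difference between the old skip-on-invalid behavior and the current fail-loud behavior can be illustrated with a small standalone sketch. Everything in it is hypothetical: rec stands in for Record, valid is an assumed validity rule, and digest is an invented layout, not the digest the package actually computes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// rec is a hypothetical two-field stand-in for corpus.Record.
type rec struct{ Ref, Body string }

var errInvalid = errors.New("invalid record")

// valid is an assumed rule for this sketch: Ref must be non-empty after trimming.
func valid(r rec) bool { return strings.TrimSpace(r.Ref) != "" }

// digest hashes the sorted per-record lines. The layout is invented for
// illustration; the real digest format is not described in this page.
func digest(recs []rec) string {
	lines := make([]string, 0, len(recs))
	for _, r := range recs {
		lines = append(lines, fmt.Sprintf("%q=%q", r.Ref, r.Body))
	}
	sort.Strings(lines)
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:])
}

// lenientFingerprint mimics the old behavior: invalid records are dropped silently.
func lenientFingerprint(recs []rec) string {
	kept := make([]rec, 0, len(recs))
	for _, r := range recs {
		if valid(r) {
			kept = append(kept, r)
		}
	}
	return digest(kept)
}

// strictFingerprint mimics the documented behavior: any invalid record is an error.
func strictFingerprint(recs []rec) (string, error) {
	for i, r := range recs {
		if !valid(r) {
			return "", fmt.Errorf("record %d: %w", i, errInvalid)
		}
	}
	return digest(recs), nil
}

func main() {
	good := []rec{{"a", "alpha"}, {"b", "beta"}}
	bad := append([]rec{{"  ", "orphan"}}, good...)

	// The lenient variant cannot tell the two corpora apart.
	fmt.Println(lenientFingerprint(bad) == lenientFingerprint(good)) // true

	// The strict variant refuses to produce a digest at all.
	_, err := strictFingerprint(bad)
	fmt.Println(err != nil) // true
}
```

With the strict variant, a caller comparing digests to decide whether a snapshot can be reused sees an error rather than a false match.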

func FingerprintFromPairs

func FingerprintFromPairs(pairs []RefHash) (string, error)

FingerprintFromPairs returns the same digest as Fingerprint([]Record) for already-normalized (Ref, ContentHash) inputs, meaning both fields are non-empty after trimming and ContentHash has already been computed. A pair with an empty Ref or ContentHash returns an error. Unlike Fingerprint([]Record), this helper cannot apply Normalize() defaults or regenerate ContentHash via HashRecord from other fields, so its output matches Fingerprint only when the inputs already satisfy that invariant. Callers reading persisted rows must enforce the invariant at read time (as index.loadCurrentRefHashes does) or use Fingerprint instead.
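A read-time guard of the kind described above might look like the following sketch. RefHash is redeclared locally so the example compiles on its own, and checkPairs is a hypothetical helper name; how index.loadCurrentRefHashes actually enforces the invariant is not shown in this page.

```go
package main

import (
	"fmt"
	"strings"
)

// RefHash mirrors the documented struct; it is redeclared here so the
// sketch is self-contained.
type RefHash struct {
	Ref         string
	ContentHash string
}

// checkPairs is a hypothetical guard that rejects any pair whose Ref or
// ContentHash is empty after trimming, before the pairs are fingerprinted.
func checkPairs(pairs []RefHash) error {
	for i, p := range pairs {
		if strings.TrimSpace(p.Ref) == "" {
			return fmt.Errorf("pair %d: empty ref", i)
		}
		if strings.TrimSpace(p.ContentHash) == "" {
			return fmt.Errorf("pair %d (%q): empty content hash", i, p.Ref)
		}
	}
	return nil
}

func main() {
	// Rows as they might come back from storage; the second one is incomplete.
	rows := []RefHash{
		{Ref: "doc/1", ContentHash: "ab12"},
		{Ref: "doc/2"},
	}
	fmt.Println(checkPairs(rows))     // pair 1 ("doc/2"): empty content hash
	fmt.Println(checkPairs(rows[:1])) // <nil>
}
```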

func HashRecord

func HashRecord(r Record) string

HashRecord returns a deterministic content hash for a normalized record.

Each field contributes a `%q=%q` pair so the encoding is injective: no combination of key/value strings (including ones that contain `=` or newline characters) can produce the same serialized prefix as a different field. The serialized parts are then sorted, joined, and hashed with SHA-256 in fingerprint. Changing this encoding requires a schema_version bump and a migration that rewrites persisted content_hash values (see migrateV3ToV4).
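To see why quoting matters, compare a naive `%s=%s` encoding with `%q=%q` on two inputs that differ only in where the `=` falls. The helper names naivePair and quotedPair are illustrative and are not part of this package.

```go
package main

import "fmt"

// naivePair joins key and value with a bare "=", so an "=" inside either
// side is indistinguishable from the separator.
func naivePair(k, v string) string { return fmt.Sprintf("%s=%s", k, v) }

// quotedPair uses %q on both sides; embedded quotes, "=" and newlines are
// escaped or enclosed, so the separator position stays unambiguous.
func quotedPair(k, v string) string { return fmt.Sprintf("%q=%q", k, v) }

func main() {
	// Two different key/value pairs that collide under the naive encoding.
	fmt.Println(naivePair("a", "b=c") == naivePair("a=b", "c")) // true

	// The quoted encoding keeps them distinct.
	fmt.Println(quotedPair("a", "b=c")) // "a"="b=c"
	fmt.Println(quotedPair("a=b", "c")) // "a=b"="c"
}
```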

Types

type Record

type Record struct {
	Ref         string
	Kind        string
	Title       string
	SourceRef   string
	BodyFormat  string
	BodyText    string
	ContentHash string
	Metadata    map[string]string
}

Record is the neutral corpus unit Stroma indexes.

func NewRecord

func NewRecord(ref, title, body string) Record

NewRecord constructs a Record for the common path: a reference, a human title, and a Markdown body. All other fields default through Normalize at the point the record enters a Rebuild / Update call. Callers with non-default kinds, custom metadata, or plaintext bodies should build the struct directly and pass it through Normalize.

func (Record) Normalize

func (r Record) Normalize() (Record, error)

Normalize returns a trimmed, validated record with safe defaults applied: missing Kind defaults to DefaultKind, Title and SourceRef default to Ref, BodyFormat defaults to FormatMarkdown, and ContentHash is regenerated from HashRecord when empty. The returned record satisfies Validate by construction.

This is the entry point callers should use before invoking Validate. Calling Validate on a raw Record will reject records that would have been accepted after defaults — for example Record{Ref:"x"} fails Validate (no kind) but Normalize fills in DefaultKind and succeeds.
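The defaulting order described above can be sketched with a reduced local type. This is an approximation under stated assumptions, not the package's implementation: record omits ContentHash and Metadata, and validate checks only Ref and Kind, which is a guess at the real rule set.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	defaultKind    = "artifact" // same value as DefaultKind
	formatMarkdown = "markdown" // same value as FormatMarkdown
)

// record is a reduced stand-in for corpus.Record.
type record struct {
	Ref, Kind, Title, SourceRef, BodyFormat, BodyText string
}

// validate uses an assumed rule set: Ref and Kind must be present.
func (r record) validate() error {
	if r.Ref == "" {
		return errors.New("missing ref")
	}
	if r.Kind == "" {
		return errors.New("missing kind")
	}
	return nil
}

// normalize trims the identifying fields, applies the documented defaults,
// then validates the result.
func (r record) normalize() (record, error) {
	r.Ref = strings.TrimSpace(r.Ref)
	r.Kind = strings.TrimSpace(r.Kind)
	r.Title = strings.TrimSpace(r.Title)
	r.SourceRef = strings.TrimSpace(r.SourceRef)
	r.BodyFormat = strings.TrimSpace(r.BodyFormat)
	if r.Kind == "" {
		r.Kind = defaultKind
	}
	if r.Title == "" {
		r.Title = r.Ref
	}
	if r.SourceRef == "" {
		r.SourceRef = r.Ref
	}
	if r.BodyFormat == "" {
		r.BodyFormat = formatMarkdown
	}
	if err := r.validate(); err != nil {
		return record{}, err
	}
	return r, nil
}

func main() {
	raw := record{Ref: " x "}
	fmt.Println(raw.validate()) // missing kind

	n, err := raw.normalize()
	fmt.Println(n.Kind, n.Title, n.BodyFormat, err) // artifact x markdown <nil>
}
```

Because defaults are applied before validation, the same raw value that fails validate on its own passes once it has gone through normalize.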

func (Record) Normalized deprecated

func (r Record) Normalized() (Record, error)

Normalized is a deprecated alias for Normalize, retained so that downstream callers who upgrade without switching names immediately keep compiling. New code should use Normalize. Scheduled for removal one release after adoption settles.

Deprecated: use Normalize.

func (Record) Validate

func (r Record) Validate() error

Validate reports whether the record is complete enough to persist. Callers constructing a Record by hand should prefer Normalize, which applies safe defaults before validation — calling Validate directly on a raw struct will reject records that Normalize would have accepted (e.g. a missing Kind).

type RefHash

type RefHash struct {
	Ref         string
	ContentHash string
}

RefHash is the minimal (Ref, ContentHash) pair needed to compute a corpus fingerprint without materializing full record bodies.
