corpus

package
v1.0.0
Published: Apr 18, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package corpus defines the neutral record unit that Stroma indexes.

Constants

const (
	// DefaultKind is used when callers omit a more specific record kind.
	DefaultKind = "artifact"

	// FormatMarkdown stores Markdown bodies that can be heading-chunked.
	FormatMarkdown = "markdown"

	// FormatPlaintext stores plain text bodies as a single chunk.
	FormatPlaintext = "plaintext"
)

Variables

This section is empty.

Functions

func Fingerprint

func Fingerprint(records []Record) (string, error)

Fingerprint summarizes a set of records deterministically and returns an error if any record fails Normalized(). Earlier versions silently skipped invalid records, so a corpus containing an invalid record produced the same digest as the same corpus without that record. That let callers reuse a snapshot that was missing data they believed was present, with no signal that anything was wrong. Failing loudly forces callers to surface the problem instead.
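The difference between the two behaviors can be illustrated with a minimal, self-contained sketch. It does not use Stroma's code: `rec`, `digest`, `skipInvalid`, and `failLoud` are hypothetical stand-ins, and the validity rule (a non-blank Ref) and the digest layout are assumptions chosen only to make the point visible.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// rec is a hypothetical two-field stand-in for Record.
type rec struct{ Ref, Body string }

var errInvalid = errors.New("invalid record")

// valid is an assumed rule for this sketch: Ref must not be blank.
func (r rec) valid() bool { return strings.TrimSpace(r.Ref) != "" }

// digest hashes the sorted, quoted (Ref, Body) pairs with SHA-256.
func digest(recs []rec) string {
	lines := make([]string, 0, len(recs))
	for _, r := range recs {
		lines = append(lines, fmt.Sprintf("%q=%q", r.Ref, r.Body))
	}
	sort.Strings(lines)
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:])
}

// skipInvalid models the older behavior: invalid records vanish silently.
func skipInvalid(recs []rec) string {
	kept := make([]rec, 0, len(recs))
	for _, r := range recs {
		if r.valid() {
			kept = append(kept, r)
		}
	}
	return digest(kept)
}

// failLoud models the current behavior: any invalid record is an error.
func failLoud(recs []rec) (string, error) {
	for i, r := range recs {
		if !r.valid() {
			return "", fmt.Errorf("record %d: %w", i, errInvalid)
		}
	}
	return digest(recs), nil
}

func main() {
	with := []rec{{Ref: "a", Body: "x"}, {Ref: " ", Body: "lost"}}
	without := []rec{{Ref: "a", Body: "x"}}

	// Same digest: the dropped record leaves no trace.
	fmt.Println(skipInvalid(with) == skipInvalid(without)) // true

	_, err := failLoud(with)
	fmt.Println(err != nil) // true
}
```

With the skipping variant, a cache keyed on the digest would treat both corpora as identical; the failing variant makes the caller decide what to do with the bad record.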

func FingerprintFromPairs added in v1.0.0

func FingerprintFromPairs(pairs []RefHash) (string, error)

FingerprintFromPairs returns the same digest as Fingerprint([]Record) when given already-normalized (Ref, ContentHash) pairs: both fields non-empty after trimming, and ContentHash already computed. A pair with an empty Ref or ContentHash returns an error. Unlike Fingerprint([]Record), this helper cannot apply Normalized() defaults or regenerate ContentHash via HashRecord from other fields, so its output matches Fingerprint only when the inputs already satisfy that invariant. Callers reading persisted rows must enforce the invariant at read time (as index.loadCurrentRefHashes does) or use Fingerprint instead.
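A read-time guard of the kind described above might look like the following sketch. `checkPairs` is a hypothetical helper, not part of this package, and it is not a copy of index.loadCurrentRefHashes, whose implementation is not shown here. The local `RefHash` type mirrors the documented struct only so the example compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// RefHash mirrors the documented struct so this sketch is self-contained.
type RefHash struct {
	Ref         string
	ContentHash string
}

// checkPairs is a hypothetical guard: it rejects any row whose Ref or
// ContentHash is empty after trimming, before the rows are fingerprinted.
func checkPairs(rows []RefHash) error {
	for i, row := range rows {
		if strings.TrimSpace(row.Ref) == "" {
			return fmt.Errorf("row %d: empty ref", i)
		}
		if strings.TrimSpace(row.ContentHash) == "" {
			return fmt.Errorf("row %d (%q): empty content hash", i, row.Ref)
		}
	}
	return nil
}

func main() {
	rows := []RefHash{
		{Ref: "docs/intro", ContentHash: "abc123"},
		{Ref: "docs/setup", ContentHash: "   "}, // hash never computed
	}
	if err := checkPairs(rows); err != nil {
		fmt.Println("refusing to fingerprint:", err)
	}
}
```

Rows that fail the guard would need to go through the full Fingerprint path, where Normalized() and HashRecord can fill in what is missing.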

func HashRecord

func HashRecord(r Record) string

HashRecord returns a deterministic content hash for a normalized record.

Each field contributes a `%q=%q` pair so the encoding is injective: no combination of key/value strings (including ones that contain `=` or newline characters) can produce the same serialized prefix as a different field. The serialized parts are then sorted, joined, and hashed with SHA-256 in fingerprint. Changing this encoding requires a schema_version bump and a migration that rewrites persisted content_hash values (see migrateV3ToV4).
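To see why quoting matters, the sketch below compares an unquoted `%s=%s` encoding with the `%q=%q` form. `naivePair`, `quotedPair`, and `hashFields` are hypothetical names for illustration; `hashFields` is not HashRecord, and its newline separator and hex output are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// naivePair is the ambiguous encoding that quoting avoids.
func naivePair(key, value string) string {
	return fmt.Sprintf("%s=%s", key, value)
}

// quotedPair uses the `%q=%q` form: both sides become Go-quoted strings,
// so an `=` or newline inside a key or value is never mistaken for a delimiter.
func quotedPair(key, value string) string {
	return fmt.Sprintf("%q=%q", key, value)
}

// hashFields serializes each field, sorts the parts so map iteration order
// does not matter, and hashes the joined result with SHA-256.
func hashFields(fields map[string]string) string {
	parts := make([]string, 0, len(fields))
	for k, v := range fields {
		parts = append(parts, quotedPair(k, v))
	}
	sort.Strings(parts)
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both naive encodings are "a=b=c", so two different fields collide.
	fmt.Println(naivePair("a", "b=c") == naivePair("a=b", "c")) // true

	// Quoted: "a"="b=c" versus "a=b"="c".
	fmt.Println(quotedPair("a", "b=c") == quotedPair("a=b", "c")) // false

	fmt.Println(hashFields(map[string]string{"title": "Intro", "kind": "artifact"}))
}
```

Because `%q` escapes embedded newlines as `\n`, joining the parts with a literal newline stays unambiguous in this sketch.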

Types

type Record

type Record struct {
	Ref         string
	Kind        string
	Title       string
	SourceRef   string
	BodyFormat  string
	BodyText    string
	ContentHash string
	Metadata    map[string]string
}

Record is the neutral corpus unit Stroma indexes.

func (Record) Normalized

func (r Record) Normalized() (Record, error)

Normalized returns a trimmed, validated record with safe defaults applied.
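As a rough illustration of "trimmed, validated, with safe defaults", the sketch below handles only two fields. It is not the package's implementation: `record` and `normalized` are hypothetical, the required-Ref rule is an assumption, and the only default shown is the documented DefaultKind value.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// defaultKind carries the same value as the documented DefaultKind constant.
const defaultKind = "artifact"

// record is a hypothetical two-field subset of Record.
type record struct{ Ref, Kind string }

// normalized trims both fields, rejects a blank Ref (assumed rule), and
// falls back to defaultKind when no kind was supplied.
func (r record) normalized() (record, error) {
	r.Ref = strings.TrimSpace(r.Ref)
	r.Kind = strings.TrimSpace(r.Kind)
	if r.Ref == "" {
		return record{}, errors.New("record ref is required")
	}
	if r.Kind == "" {
		r.Kind = defaultKind
	}
	return r, nil
}

func main() {
	got, err := record{Ref: "  docs/intro  "}.normalized()
	fmt.Println(got.Ref, got.Kind, err) // docs/intro artifact <nil>
}
```

Using a value receiver means the caller's record is left untouched and the cleaned copy is returned, which matches the documented signature.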

func (Record) Validate

func (r Record) Validate() error

Validate reports whether the record is complete enough to persist.

type RefHash added in v1.0.0

type RefHash struct {
	Ref         string
	ContentHash string
}

RefHash is the minimal (Ref, ContentHash) pair needed to compute a corpus fingerprint without materializing full record bodies.
