descriptor

package

v0.11.0 (latest) | Published: May 27, 2026 | License: MIT | Imports: 17 | Imported by: 0

Repository

github.com/frankbardon/pulse

Documentation ¶

Overview ¶

Package descriptor provides self-description, manifest, and predict functionality for pulse.

Index ¶

Constants
func ValidateLabels(env *Envelope, bindings []*types.LabelBinding, schema *encoding.Schema, ...) (augmentNames map[string]bool)
type ChainValidationResult
type CohortFieldType
type Command
type CommandAnnotations
type Components
type DefaultApplied
- func ResolveDefaults(req *types.Request, schema *encoding.Schema) []DefaultApplied
type DictionaryInfo
type DistributionMeta
type Envelope
- func Inspect(fileData io.ReadSeeker, opts *InspectOptions) *Envelope
- func InspectFromBytes(data []byte, opts *InspectOptions) *Envelope
- func NewEnvelope(data any) *Envelope
- func Predict(fileData io.ReadSeeker, req *types.Request, opts *PredictOptions) *Envelope
- func PredictFromBytes(data []byte, req *types.Request, opts *PredictOptions) *Envelope
- func ValidateChain(fileData io.ReadSeeker, req *types.ChainRequest) *Envelope
- func ValidateChainFromBytes(data []byte, req *types.ChainRequest) *Envelope
- func ValidateFacet(fileData io.ReadSeeker, req *types.FacetRequest) *Envelope
- func ValidateFacetFromBytes(data []byte, req *types.FacetRequest) *Envelope
- func ValidateFacetWithExtensions(fileData io.ReadSeeker, req *types.FacetRequest, snap *ExtensionsSnapshot) *Envelope
- func ValidateJoin(leftData, rightData io.ReadSeeker, req *types.Request) *Envelope
- func ValidateJoinFromBytes(left, right []byte, req *types.Request) *Envelope
- func (e *Envelope) AddError(code, message string, details map[string]any)
- func (e *Envelope) AddWarning(code, message string, details map[string]any)
- func (e *Envelope) MarshalJSON() ([]byte, error)
type EnvelopeEntry
- func CategoricalAggregationIssues(req *types.Request, schema *encoding.Schema) []*EnvelopeEntry
type ExprFunctionMeta
type ExtensionsManifest
type ExtensionsSnapshot
type FacetCapability
type FacetValidationResult
type InspectField
type InspectOptions
type InspectResult
type JoinCapability
type JoinValidationResult
type LabelTableMeta
type LookupTableMeta
type MCPTool
type Manifest
- func BuildManifest() *Manifest
- func BuildManifestWithExtensions(snap *ExtensionsSnapshot) *Manifest
- func SlimManifest(m *Manifest) *Manifest
type Operator
type OperatorMeta
type OperatorParamMeta
type Param
type PredictOptions
type PredictResult
type PredictSchemaInfo
type ProcessChainCapability
type RegressionMeta
type RegressionModifier
type ShardInfo
type SkillMeta
type Suggestion
type TestMeta

Constants ¶

const DefaultDictionaryLimit = 100

DefaultDictionaryLimit is the default max entries shown for categorical dictionaries.

Variables ¶

This section is empty.

Functions ¶

func ValidateLabels ¶ added in v0.10.1

func ValidateLabels(
	env *Envelope,
	bindings []*types.LabelBinding,
	schema *encoding.Schema,
	snap *ExtensionsSnapshot,
	extraFields map[string]bool,
) (augmentNames map[string]bool)

ValidateLabels walks a slice of LabelBindings against the resolved schema and the extension snapshot, surfacing per-binding failures as envelope errors. ValidateLabels never reads record data — it is safe to call from any no-execute predict path.

Validation order is deterministic: bindings are inspected in input order; the first failure per binding is recorded but later bindings are still examined so callers see every problem in one pass.

`extraFields` lists output-only field names that count as occupying the schema namespace for augment-mode collision detection (e.g. aggregation output labels, attribute labels). Pass nil when only raw schema fields matter.

Returns the set of (output) field names occupied by augment-mode sibling columns so callers can fold them into downstream collision checks (group keys, sort keys against the augmented schema).

Types ¶

type ChainValidationResult ¶ added in v0.10.0

type ChainValidationResult struct {
	// Valid mirrors envelope.Errors emptiness.
	Valid bool `json:"valid"`

	// Request echoes the input chain request unchanged.
	Request *types.ChainRequest `json:"request"`

	// SchemaInfo summarises the source cohort schema used for stage 0.
	SchemaInfo *PredictSchemaInfo `json:"schema_info,omitempty"`

	// StageSchemas exposes the inferred output schema per stage as a
	// list of field names. Index i carries the columns produced by
	// stage i (input to stage i+1). Useful for debugging chain breaks
	// when stage N+1 cannot find a field stage N was expected to emit.
	StageSchemas [][]string `json:"stage_schemas,omitempty"`
}

ChainValidationResult is the structured output of ValidateChain. Mirrors FacetValidationResult: the resolved request echoes back so callers can forward it after inspection, and per-stage schemas expose the inferred lineage for debugging.
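
As a rough illustration of how StageSchemas can help when debugging a chain break, the sketch below checks whether a field survives to a given stage. It works on a plain [][]string shaped like StageSchemas; the helper name stageHasField and the sample field names are assumptions made for this example, not part of the package.

```go
package main

import "fmt"

// stageHasField reports whether the output schema of the given stage (as it
// would appear in ChainValidationResult.StageSchemas[stage]) lists the field.
// Hypothetical helper for illustration only.
func stageHasField(stageSchemas [][]string, stage int, field string) bool {
	if stage < 0 || stage >= len(stageSchemas) {
		return false
	}
	for _, name := range stageSchemas[stage] {
		if name == field {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical schemas: stage 0 emits region and total, stage 1 only total.
	schemas := [][]string{{"region", "total"}, {"total"}}
	fmt.Println(stageHasField(schemas, 0, "region")) // true
	fmt.Println(stageHasField(schemas, 1, "region")) // false
}
```

If a later stage references region, the second lookup shows that stage 1 no longer emits it.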

type CohortFieldType ¶

type CohortFieldType struct {
	Name                  string   `json:"name"`
	Categorical           bool     `json:"categorical"`
	ShardedCapable        bool     `json:"sharded_capable"`
	CompatibleAggregators []string `json:"compatible_aggregators,omitempty"`
	CompatibleAttributes  []string `json:"compatible_attributes,omitempty"`
	CompatibleFilterers   []string `json:"compatible_filterers,omitempty"`
	CompatibleGroupers    []string `json:"compatible_groupers,omitempty"`
	CompatibleWindows     []string `json:"compatible_windows,omitempty"`
	CompatibleFeatures    []string `json:"compatible_features,omitempty"`
}

CohortFieldType describes a field type available in .pulse files and the operator catalog that accepts it. The Compatible* slices are derived from the per-operator AcceptsTypes declarations and let an LLM look up "what can I do with a date field" in one place.

ShardedCapable reports whether the type participates in a shard archive without restriction. Every built-in field type is sharded-capable today; the flag exists for forward compatibility with future types that might not work across the union of shards (e.g. types whose semantics depend on per-shard locality). Embedders should treat the flag as advisory.

type Command ¶

type Command struct {
	Name        string `json:"name"`
	Description string `json:"description"`

	// Annotations carries the three per-command capability hints
	// (streamable / deterministic / expensive). Embedders use these to
	// decide whether to wrap a command in caching or invoke it directly.
	// Always populated for built-in commands.
	Annotations CommandAnnotations `json:"annotations"`
}

Command describes a CLI leaf command in the manifest.

type CommandAnnotations ¶ added in v0.11.0

type CommandAnnotations struct {
	Streamable    bool `json:"streamable"`
	Deterministic bool `json:"deterministic"`
	Expensive     bool `json:"expensive"`
}

CommandAnnotations carries three capability flags per command.

Streamable: the command has a streaming variant. Callers can invoke the streaming form for incremental output.
Deterministic: the command produces the same output given the same inputs (including the source file's content hash). Callers can safely cache results keyed by the request hash.
Expensive: the command is worth caching. Cheap operations may not be worth the cache machinery; expensive ones (regression, filter-to-file, profile) typically are. Hint to consumers, not a hard constraint.
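
One way an embedder might combine these hints is sketched below. The struct is a local mirror of CommandAnnotations so the example compiles on its own, and shouldCache is a hypothetical policy, not something the package provides.

```go
package main

import "fmt"

// CommandAnnotations mirrors the documented struct so this sketch compiles
// without importing the descriptor package.
type CommandAnnotations struct {
	Streamable    bool `json:"streamable"`
	Deterministic bool `json:"deterministic"`
	Expensive     bool `json:"expensive"`
}

// shouldCache is a hypothetical embedder policy: cache only when the result
// is reproducible (Deterministic) and worth the machinery (Expensive).
func shouldCache(a CommandAnnotations) bool {
	return a.Deterministic && a.Expensive
}

func main() {
	fmt.Println(shouldCache(CommandAnnotations{Deterministic: true, Expensive: true}))  // true
	fmt.Println(shouldCache(CommandAnnotations{Deterministic: true, Expensive: false})) // false
}
```

Because Expensive is only a hint, a consumer is free to choose a different policy.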

type Components ¶

type Components struct {
	Aggregators []Operator `json:"aggregators"`
	Attributes  []Operator `json:"attributes"`
	Filterers   []Operator `json:"filterers"`
	Groupers    []Operator `json:"groupers"`
	Windows     []Operator `json:"windows"`
	Features    []Operator `json:"features"`
}

Components lists every registered processing component grouped by category. Each slice carries one Operator entry per component so LLM-side authoring has access to per-operator params, accepted field types, emit type, and streamability without further discovery round-trips.

type DefaultApplied ¶ added in v0.5.0

type DefaultApplied struct {
	Path     []string `json:"path"`
	Field    string   `json:"field"`
	Type     string   `json:"type"`
	Category string   `json:"category"`
	Reason   string   `json:"reason"`
}

DefaultApplied describes a single defaulted operator slot. Returned by ResolveDefaults so callers (predict, service, MCP transcripts) can echo exactly which slots had their Type inferred from the schema. The shape is JSON-serializable and lands on PredictResult.DefaultsApplied.

Path uses JSON-style segments e.g. ["Aggregations", "0", "Type"] so the caller can address the slot in the original request. Field names the schema column that drove inference; Type is the operator string that was filled in; Category is "aggregation" or "grouper"; Reason carries a short human-readable rule trace, e.g. "f64 → AGG_SUM (rule: numeric default)".

func ResolveDefaults ¶ added in v0.5.0

func ResolveDefaults(req *types.Request, schema *encoding.Schema) []DefaultApplied

ResolveDefaults inspects every aggregation and grouper slot in req, fills in the Type for slots that name a field but omit the Type, and returns one DefaultApplied entry per change made. Mutates req in place.

Rules:

Defaults never override an explicit Type.
Defaults never cross categories (a missing aggregator does not insert a grouper, and vice versa).
Tests (req.Tests, req.PostTests) are not defaulted; hypothesis tests are intent-bearing.
Field types absent from defaultRules (or with no rule for the slot's category) are left untouched and contribute no entry.

Returns a nil slice when nothing changed.
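
The rules above can be sketched for the aggregation category as follows. This is a simplified illustration, not the package's implementation: aggSlot, the fieldTypes map, and the rules map are hypothetical stand-ins for the request, the schema, and defaultRules, and the only rule used (f64 to AGG_SUM) is the one quoted in the Reason example.

```go
package main

import (
	"fmt"
	"strconv"
)

// aggSlot is a hypothetical stand-in for one aggregation slot of a request.
type aggSlot struct {
	Field string
	Type  string
}

// DefaultApplied mirrors the documented struct.
type DefaultApplied struct {
	Path     []string `json:"path"`
	Field    string   `json:"field"`
	Type     string   `json:"type"`
	Category string   `json:"category"`
	Reason   string   `json:"reason"`
}

// resolveAggDefaults fills in a missing Type for slots that name a field,
// mutating slots in place. fieldTypes maps field name to field type and
// rules maps field type to a default operator (both hypothetical inputs).
func resolveAggDefaults(slots []aggSlot, fieldTypes, rules map[string]string) []DefaultApplied {
	var applied []DefaultApplied // stays nil when nothing changes
	for i := range slots {
		s := &slots[i]
		if s.Field == "" || s.Type != "" {
			continue // defaults never override an explicit Type
		}
		fieldType := fieldTypes[s.Field]
		op, ok := rules[fieldType]
		if !ok {
			continue // no rule for this field type: leave the slot untouched
		}
		s.Type = op
		applied = append(applied, DefaultApplied{
			Path:     []string{"Aggregations", strconv.Itoa(i), "Type"},
			Field:    s.Field,
			Type:     op,
			Category: "aggregation",
			Reason:   fieldType + " → " + op,
		})
	}
	return applied
}

func main() {
	slots := []aggSlot{{Field: "revenue"}, {Field: "revenue", Type: "AGG_PERCENTILE"}, {Field: "region"}}
	fieldTypes := map[string]string{"revenue": "f64", "region": "categorical_u16"}
	rules := map[string]string{"f64": "AGG_SUM"}
	applied := resolveAggDefaults(slots, fieldTypes, rules)
	fmt.Println(len(applied), applied[0].Path, slots[0].Type)
}
```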

type DictionaryInfo ¶

type DictionaryInfo struct {
	TotalEntries int      `json:"total_entries"`
	Truncated    bool     `json:"truncated"`
	Values       []string `json:"values"`
}

DictionaryInfo describes the categorical dictionary for a field.

type DistributionMeta ¶ added in v0.5.0

type DistributionMeta struct {
	// Name is the distribution kind identifier (e.g. "lognormal").
	Name string `json:"name"`

	// Description is a one-sentence prose summary.
	Description string `json:"description"`

	// AppliesTo lists the FieldSpec types the distribution can drive.
	// Values are coarse families: "numeric", "categorical", "date",
	// "bool", "any".
	AppliesTo []string `json:"applies_to"`

	// Params lists the distribution-specific parameters.
	Params []Param `json:"params"`
}

DistributionMeta describes a synth distribution entry. One entry per synth.AllDistributions() value.

type Envelope ¶

type Envelope struct {
	FormatVersion string           `json:"format_version"`
	Data          any              `json:"data"`
	Errors        []*EnvelopeEntry `json:"errors"`
	Warnings      []*EnvelopeEntry `json:"warnings"`
}

Envelope is the standard JSON output wrapper for all descriptor operations. All --json output follows this shape.
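
To make the wire shape concrete, the sketch below marshals a local mirror of Envelope with the standard encoding/json encoder. It does not use the package's own MarshalJSON, and the format version, data payload, and warning code are placeholder values chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EnvelopeEntry and Envelope mirror the documented structs so the sketch
// compiles without importing the descriptor package.
type EnvelopeEntry struct {
	Code    string         `json:"code"`
	Message string         `json:"message"`
	Details map[string]any `json:"details,omitempty"`
}

type Envelope struct {
	FormatVersion string           `json:"format_version"`
	Data          any              `json:"data"`
	Errors        []*EnvelopeEntry `json:"errors"`
	Warnings      []*EnvelopeEntry `json:"warnings"`
}

// render encodes the envelope with the default encoder (an assumption; the
// real type supplies its own MarshalJSON).
func render(env *Envelope) string {
	out, err := json.Marshal(env)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	env := &Envelope{
		FormatVersion: "1", // placeholder value
		Data:          map[string]any{"field_count": 2},
		Errors:        []*EnvelopeEntry{},
		Warnings:      []*EnvelopeEntry{{Code: "EXAMPLE_WARNING", Message: "illustrative only"}},
	}
	fmt.Println(render(env))
}
```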

func Inspect ¶

func Inspect(fileData io.ReadSeeker, opts *InspectOptions) *Envelope

Inspect reads a .pulse file header and schema, returning structured field information. It never reads record data. This entry point handles single-file cohorts only; archive-backed cohorts route through InspectFromBytes which can magic-detect the zip container.

func InspectFromBytes ¶

func InspectFromBytes(data []byte, opts *InspectOptions) *Envelope

InspectFromBytes inspects either a single-file .pulse cohort or a Pulse shard archive. Detection is by the first four bytes: zip magic PK\x03\x04 → archive path; PULSE magic → single-file path. Other prefixes return the standard ENCODING_INVALID envelope from the single-file path.

For archive-backed cohorts, the canonical schema and dictionaries come from the reserved _schema.pulse entry; Shards enumerates every non-reserved entry in central-directory order with per-shard RecordCount populated by peeking each shard's header; the envelope-level RecordCount is the cumulative sum across shards.
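
The prefix check described above can be approximated as follows. This is a minimal sketch that tests only for the zip magic; isShardArchive is a hypothetical name, and everything that is not a zip prefix is simply treated as belonging to the single-file path.

```go
package main

import (
	"bytes"
	"fmt"
)

// zipMagic is the local-file-header signature that opens a zip container.
var zipMagic = []byte("PK\x03\x04")

// isShardArchive reports whether data starts with the zip magic. Inputs
// shorter than four bytes cannot match and report false.
func isShardArchive(data []byte) bool {
	return bytes.HasPrefix(data, zipMagic)
}

func main() {
	fmt.Println(isShardArchive([]byte("PK\x03\x04rest-of-archive"))) // true
	fmt.Println(isShardArchive([]byte("PULSE header bytes")))         // false
	fmt.Println(isShardArchive([]byte("PK")))                         // false
}
```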

func NewEnvelope ¶

func NewEnvelope(data any) *Envelope

NewEnvelope creates an envelope with the given data and no errors/warnings.

func Predict ¶

func Predict(fileData io.ReadSeeker, req *types.Request, opts *PredictOptions) *Envelope

Predict validates a request against a .pulse file without executing it. It reads only the header and schema, never record data. The returned Envelope contains the PredictResult in Data and any errors/warnings encountered.

func PredictFromBytes ¶

func PredictFromBytes(data []byte, req *types.Request, opts *PredictOptions) *Envelope

PredictFromBytes routes data to either the single-file predict path or the archive predict path based on the first four bytes. Archive detection is by the zip magic PK\x03\x04; single-file (or any other prefix) falls through to Predict, which surfaces the standard ENCODING_INVALID envelope on malformed input.

For archive-backed cohorts: the canonical schema is read from the reserved _schema.pulse entry and used for every validation step (field existence, type compatibility, streamability). The cumulative record total and per-shard list are added to the PredictResult; the request is validated against the canonical schema and inherits the same streamability gates that a single-file cohort with the same schema would produce.

func ValidateChain ¶ added in v0.10.0

func ValidateChain(fileData io.ReadSeeker, req *types.ChainRequest) *Envelope

ValidateChain validates a ChainRequest against a .pulse cohort header + schema. It never reads record data — the no-execute contract holds. Errors land in the envelope as SERVICE_VALIDATION or PULSE_CHAIN_NOT_MERGEABLE / PULSE_CHAIN_EMPTY; each stage's inferred output schema is propagated forward so the next stage's field references can be checked.

The validator does not import service/processing. The structural import ban that governs predict is applied, in spirit, to the broader descriptor surface as well.

func ValidateChainFromBytes ¶ added in v0.10.0

func ValidateChainFromBytes(data []byte, req *types.ChainRequest) *Envelope

ValidateChainFromBytes is a convenience wrapper that creates a reader from bytes.

func ValidateFacet ¶ added in v0.7.0

func ValidateFacet(fileData io.ReadSeeker, req *types.FacetRequest) *Envelope

ValidateFacet validates a FacetRequest against a .pulse cohort header + schema. It never reads record data — predict's "no execution" contract applies here too. Errors land in the envelope as SERVICE_VALIDATION; advisory issues (percentiles on a non-numeric field, a low DiscreteTopK, IncludeHistogram on a discrete-only request) become warnings.

Callers can use this to gate a UI before issuing the (potentially expensive) FacetSchema call.

func ValidateFacetFromBytes ¶ added in v0.7.0

func ValidateFacetFromBytes(data []byte, req *types.FacetRequest) *Envelope

ValidateFacetFromBytes is a convenience wrapper that creates a reader from bytes.

func ValidateFacetWithExtensions ¶ added in v0.10.1

func ValidateFacetWithExtensions(fileData io.ReadSeeker, req *types.FacetRequest, snap *ExtensionsSnapshot) *Envelope

ValidateFacetWithExtensions extends ValidateFacet with the embedder-registered extension snapshot so label-table references can be resolved. Pass nil to opt out (same behavior as ValidateFacet).

func ValidateJoin ¶ added in v0.10.0

func ValidateJoin(leftData, rightData io.ReadSeeker, req *types.Request) *Envelope

ValidateJoin validates a Request whose Joins slot carries one JoinSpec against the headers + schemas of the left and right cohorts. It never reads record data — descriptor's no-execute contract applies. Errors land as SERVICE_VALIDATION or PULSE_JOIN_*; emits JoinedFields when the join would succeed at runtime.

func ValidateJoinFromBytes ¶ added in v0.10.0

func ValidateJoinFromBytes(left, right []byte, req *types.Request) *Envelope

ValidateJoinFromBytes is a convenience wrapper that accepts the left and right cohorts as byte slices instead of readers.

func (*Envelope) AddError ¶

func (e *Envelope) AddError(code, message string, details map[string]any)

AddError appends a structured error to the envelope.

func (*Envelope) AddWarning ¶

func (e *Envelope) AddWarning(code, message string, details map[string]any)

AddWarning appends a structured warning to the envelope.

func (*Envelope) MarshalJSON ¶

func (e *Envelope) MarshalJSON() ([]byte, error)

MarshalJSON produces deterministic JSON output.

type EnvelopeEntry ¶

type EnvelopeEntry struct {
	Code    string         `json:"code"`
	Message string         `json:"message"`
	Details map[string]any `json:"details,omitempty"`
}

EnvelopeEntry represents a single error or warning in the envelope.

func CategoricalAggregationIssues ¶ added in v0.8.4

func CategoricalAggregationIssues(req *types.Request, schema *encoding.Schema) []*EnvelopeEntry

CategoricalAggregationIssues returns one EnvelopeEntry per (Aggregation slot referencing a categorical field, aggregator from numericAggregations) pair found in req. Each entry carries the PULSE_AGG_NOT_MEANINGFUL_FOR_CATEGORICAL code, a human-readable message, and the {field, aggregation} details map. Strict-mode promotion is the caller's responsibility — the helper itself is non-judgmental and returns nil when the request, schema, or Aggregations slice is empty / nil.

Used by:

validateRequestFields (predict path) — emits to env.Warnings / env.Errors based on PredictOptions.Strict.
service.Process — wraps the first entry as a coded error when Service is configured strict; non-strict emission flows through the envelope at the CLI boundary.
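
Since strict-mode promotion is left to the caller, a caller-side routing step might look like the sketch below. The types are trimmed local mirrors, routeIssues is a hypothetical helper, and the field name in the details map is made up for the example.

```go
package main

import "fmt"

// EnvelopeEntry and Envelope are trimmed local mirrors of the documented
// structs, kept to the fields this sketch needs.
type EnvelopeEntry struct {
	Code    string
	Message string
	Details map[string]any
}

type Envelope struct {
	Errors   []*EnvelopeEntry
	Warnings []*EnvelopeEntry
}

// routeIssues is a hypothetical caller-side helper: in strict mode the
// issues become errors, otherwise they are recorded as warnings.
func routeIssues(env *Envelope, issues []*EnvelopeEntry, strict bool) {
	if strict {
		env.Errors = append(env.Errors, issues...)
		return
	}
	env.Warnings = append(env.Warnings, issues...)
}

func main() {
	issue := &EnvelopeEntry{
		Code:    "PULSE_AGG_NOT_MEANINGFUL_FOR_CATEGORICAL",
		Message: "numeric aggregation applied to a categorical field",
		Details: map[string]any{"field": "region", "aggregation": "AGG_SUM"},
	}
	lenient, strict := &Envelope{}, &Envelope{}
	routeIssues(lenient, []*EnvelopeEntry{issue}, false)
	routeIssues(strict, []*EnvelopeEntry{issue}, true)
	fmt.Println(len(lenient.Warnings), len(lenient.Errors)) // 1 0
	fmt.Println(len(strict.Warnings), len(strict.Errors))   // 0 1
}
```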

type ExprFunctionMeta ¶ added in v0.7.0

type ExprFunctionMeta struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	Signature   string `json:"signature,omitempty"`
	Pure        bool   `json:"pure,omitempty"`
}

ExprFunctionMeta is the manifest projection of an embedder-registered expression function.

type ExtensionsManifest ¶ added in v0.7.0

type ExtensionsManifest struct {
	Aggregators        []OperatorMeta     `json:"aggregators"`
	Attributes         []OperatorMeta     `json:"attributes"`
	Filterers          []OperatorMeta     `json:"filterers"`
	Groupers           []OperatorMeta     `json:"groupers"`
	Windows            []OperatorMeta     `json:"windows"`
	Features           []OperatorMeta     `json:"features"`
	Tests              []OperatorMeta     `json:"tests"`
	SynthDistributions []OperatorMeta     `json:"synth_distributions"`
	ExprFunctions      []ExprFunctionMeta `json:"expr_functions"`
	LookupTables       []LookupTableMeta  `json:"lookup_tables"`
	LabelTables        []LabelTableMeta   `json:"label_tables"`
}

ExtensionsManifest is the manifest-side projection of embedder-registered operators and expression-side state. Built-in operators continue to live in Manifest.Components; this block lists everything the host process adds on top, so a reviewer can see at a glance which subset is Pulse-shipped vs registered by the embedder.

All slices are sorted by Name for deterministic output. An empty Extensions block emits as nested empty slices (not null) for JSON-stability across releases.
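
The distinction between empty slices and null matters because Go's encoding/json renders a nil slice as null and an allocated empty slice as []. The sketch below demonstrates that with a one-field stand-in struct (block is a hypothetical type, not the package's).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// block is a hypothetical one-field stand-in for an extensions category.
type block struct {
	Aggregators []string `json:"aggregators"`
}

// encode marshals b with the standard encoder and returns the JSON text.
func encode(b block) string {
	out, err := json.Marshal(b)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(encode(block{}))                          // {"aggregators":null}
	fmt.Println(encode(block{Aggregators: []string{}}))   // {"aggregators":[]}
}
```

Initialising every category to an empty, non-nil slice is one way to obtain the stable `[]` form.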

type ExtensionsSnapshot ¶ added in v0.7.0

type ExtensionsSnapshot struct {
	Aggregators        []OperatorMeta
	Attributes         []OperatorMeta
	Filterers          []OperatorMeta
	Groupers           []OperatorMeta
	Windows            []OperatorMeta
	Features           []OperatorMeta
	Tests              []OperatorMeta
	SynthDistributions []OperatorMeta
	ExprFunctions      []ExprFunctionMeta
	LookupTables       []LookupTableMeta
	LabelTables        []LabelTableMeta
}

ExtensionsSnapshot is the immutable read-only view that the service hands to descriptor.BuildManifestWithExtensions and predict. It carries everything the manifest + predict surfaces need without importing the live processing registry — keeping TestPredictNoExecutionImports satisfied.

type FacetCapability ¶ added in v0.7.0

type FacetCapability struct {
	// Name is the canonical entry identifier ("facet_schema").
	Name string `json:"name"`

	// SupportsDiscrete reports whether per-value count summaries are
	// available for categorical / boolean / geo fields.
	SupportsDiscrete bool `json:"supports_discrete"`

	// SupportsNumeric reports whether streaming statistics
	// (count, sum, min, max, mean, stddev) are produced for numeric
	// fields.
	SupportsNumeric bool `json:"supports_numeric"`

	// SupportsPercentiles reports whether NumericPercentiles are
	// computed via a buffered second-stage sort.
	SupportsPercentiles bool `json:"supports_percentiles"`

	// SupportsHistogram reports whether IncludeHistogram +
	// HistogramRange + HistogramBins are honoured.
	SupportsHistogram bool `json:"supports_histogram"`

	// SupportsAdditive reports whether AdditiveFields contribution
	// counts are computed.
	SupportsAdditive bool `json:"supports_additive"`

	// StreamableConditions lists the human-readable rules that govern
	// which requests run in a single pass vs. force the buffered
	// secondary sort. The order is stable and intended for LLM-side
	// reasoning, not strict parsing.
	StreamableConditions []string `json:"streamable_conditions"`
}

FacetCapability describes the rich-facet endpoint surfaced by pulse.FacetSchema / pulse_facet_schema. The manifest carries one FacetCapability entry under Manifest.Facet so LLM clients can detect the endpoint's feature set without inspecting the source.

type FacetValidationResult ¶ added in v0.7.0

type FacetValidationResult struct {
	// Valid mirrors envelope.Errors emptiness.
	Valid bool `json:"valid"`

	// Request echoes the input request unchanged.
	Request *types.FacetRequest `json:"request"`

	// SchemaInfo summarises the cohort schema used for validation.
	SchemaInfo *PredictSchemaInfo `json:"schema_info,omitempty"`
}

FacetValidationResult is the structured ValidateFacet output. Mirrors the predict surface: the resolved request is echoed back verbatim (callers may want to forward it to FacetSchema after inspection) and Warnings carry advisory issues that do not block execution.

type InspectField ¶

type InspectField struct {
	Name              string          `json:"name"`
	Type              string          `json:"type"`
	ByteOffset        int             `json:"byte_offset"`
	BitPosition       int             `json:"bit_position"`
	Description       string          `json:"description"`
	DescriptionSource string          `json:"description_source"`
	Categorical       bool            `json:"categorical"`
	Dictionary        *DictionaryInfo `json:"dictionary,omitempty"`
	// Precision is the decimal128 precision (1-38). Present only for
	// decimal128 / nullable_decimal128 fields.
	Precision *uint8 `json:"precision,omitempty"`
	// Scale is the decimal128 scale (0-precision). Present only for
	// decimal128 / nullable_decimal128 fields.
	Scale *uint8 `json:"scale,omitempty"`
}

InspectField describes a single field in the inspect output.

type InspectOptions ¶

type InspectOptions struct {
	// FullDict disables dictionary truncation when true.
	FullDict bool
	// DictionaryLimit overrides the default truncation limit.
	// Zero means use DefaultDictionaryLimit.
	DictionaryLimit int
}

InspectOptions controls inspect behavior.
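
The interaction between FullDict, DictionaryLimit, and DefaultDictionaryLimit might be sketched like this. The types are local mirrors, truncateDictionary is a hypothetical helper, and treating nil options or a negative limit as "use the default" is an assumption of this sketch rather than documented behavior.

```go
package main

import "fmt"

const DefaultDictionaryLimit = 100

// InspectOptions and DictionaryInfo mirror the documented structs.
type InspectOptions struct {
	FullDict        bool
	DictionaryLimit int
}

type DictionaryInfo struct {
	TotalEntries int      `json:"total_entries"`
	Truncated    bool     `json:"truncated"`
	Values       []string `json:"values"`
}

// truncateDictionary applies the documented options to a value list.
// Assumption: nil opts and non-positive limits fall back to the default.
func truncateDictionary(values []string, opts *InspectOptions) DictionaryInfo {
	info := DictionaryInfo{TotalEntries: len(values), Values: values}
	if opts != nil && opts.FullDict {
		return info // truncation disabled
	}
	limit := DefaultDictionaryLimit
	if opts != nil && opts.DictionaryLimit > 0 {
		limit = opts.DictionaryLimit
	}
	if len(values) > limit {
		info.Values = values[:limit]
		info.Truncated = true
	}
	return info
}

func main() {
	values := []string{"a", "b", "c", "d"}
	info := truncateDictionary(values, &InspectOptions{DictionaryLimit: 2})
	fmt.Println(info.TotalEntries, info.Truncated, info.Values) // 4 true [a b]
}
```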

type InspectResult ¶

type InspectResult struct {
	FieldCount  int             `json:"field_count"`
	Fields      []*InspectField `json:"fields"`
	Shards      []ShardInfo     `json:"shards"`
	RecordCount int64           `json:"record_count"`
}

InspectResult holds the schema inspection output.

Shards is populated when the inspected file is a Pulse shard archive (first four bytes match the zip magic PK\x03\x04). Entries are listed in zip central-directory order, which equals shard insertion order. Single-file cohorts leave Shards as an empty slice (never nil) so the JSON envelope emits "shards": [] rather than null.

For archive-backed cohorts, FieldCount and Fields reflect the canonical schema carried in the reserved _schema.pulse entry. RecordCount carries the cumulative total across shards, populated from metadata (when present) together with the counts read from per-shard headers; see ShardInfo.

type JoinCapability ¶ added in v0.10.0

type JoinCapability struct {
	// Name is the canonical identifier ("hash_join").
	Name string `json:"name"`

	// MaxJoinsPerRequest is the upper bound on Request.Joins length.
	// 1 today; lifts when multi-join chains land.
	MaxJoinsPerRequest int `json:"max_joins_per_request"`

	// Kinds lists supported JoinSpec.Kind values. "inner" today;
	// "left", "outer", "anti" land once the null-bitmap correctness
	// path is fully wired.
	Kinds []string `json:"kinds"`

	// SpillBytes reports the in-memory threshold beyond which the
	// build side would spill. Zero today (no spill — the build side
	// is always materialised in RAM); set when the spill path lands.
	SpillBytes int64 `json:"spill_bytes"`

	// SpillEnv names the environment variable that overrides the
	// spill threshold. Set when the spill path lands.
	SpillEnv string `json:"spill_env,omitempty"`

	// Limitations lists the v1 envelope notes for LLM clients.
	Limitations []string `json:"limitations"`
}

JoinCapability describes the pushdown hash-join endpoint surfaced by Request.Joins. The manifest carries one JoinCapability entry under Manifest.Join so LLM clients can detect the v1 envelope (inner-only, single-join) without inspecting the source.
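
A client could use the capability entry to pre-check a request before sending it. The sketch below assumes a trimmed local mirror of JoinCapability and a hypothetical checkJoins helper that receives the join kinds as plain strings; the capability values reflect the "inner-only, single-join" envelope described above.

```go
package main

import "fmt"

// JoinCapability is a trimmed local mirror of the documented struct.
type JoinCapability struct {
	MaxJoinsPerRequest int
	Kinds              []string
}

// checkJoins is a hypothetical client-side pre-check: it rejects requests
// with too many joins or with a kind the capability does not list.
func checkJoins(jc JoinCapability, kinds []string) error {
	if len(kinds) > jc.MaxJoinsPerRequest {
		return fmt.Errorf("request carries %d joins, capability allows %d", len(kinds), jc.MaxJoinsPerRequest)
	}
	for _, k := range kinds {
		supported := false
		for _, allowed := range jc.Kinds {
			if k == allowed {
				supported = true
				break
			}
		}
		if !supported {
			return fmt.Errorf("join kind %q is not supported", k)
		}
	}
	return nil
}

func main() {
	jc := JoinCapability{MaxJoinsPerRequest: 1, Kinds: []string{"inner"}}
	fmt.Println(checkJoins(jc, []string{"inner"}))
	fmt.Println(checkJoins(jc, []string{"left"}))
	fmt.Println(checkJoins(jc, []string{"inner", "inner"}))
}
```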

type JoinValidationResult ¶ added in v0.10.0

type JoinValidationResult struct {
	Valid        bool               `json:"valid"`
	Request      *types.Request     `json:"request"`
	LeftSchema   *PredictSchemaInfo `json:"left_schema,omitempty"`
	RightSchema  *PredictSchemaInfo `json:"right_schema,omitempty"`
	JoinedFields []string           `json:"joined_fields,omitempty"`
}

JoinValidationResult is the structured output of ValidateJoin. Echoes the request unchanged and exposes the inferred output schema as a list of field names.

type LabelTableMeta ¶ added in v0.10.1

type LabelTableMeta struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	HasRowsData bool   `json:"has_rows_data"`
}

LabelTableMeta is the manifest projection of a registered string-valued label table. HasRowsData distinguishes the static Rows-backed table from the function-driven Lookup table without exposing the embedder's data.

type LookupTableMeta ¶ added in v0.7.0

type LookupTableMeta struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	HasRowsData bool   `json:"has_rows_data"`
}

LookupTableMeta is the manifest projection of a registered lookup table. HasRowsData distinguishes the static Rows-backed table from the function-driven Lookup table without exposing the embedder's data.

type MCPTool ¶ added in v0.5.0

type MCPTool struct {
	// Name is the tool identifier (e.g. "pulse_predict").
	Name string `json:"name"`

	// Description mirrors the description string registered with the
	// MCP server.
	Description string `json:"description"`
}

MCPTool describes a single registered MCP tool. One entry per internal/mcp.RegisteredTools() value.

type Manifest ¶

type Manifest struct {
	FormatVersion string    `json:"format_version"`
	Commands      []Command `json:"commands"`

	// Operations enumerates library-only entry points that do not back a
	// CLI leaf (today: filter_to_file, watch, process_stream). Each entry
	// carries the same CommandAnnotations as a CLI leaf so consumers can
	// reason uniformly about caching / streaming. The slice is sorted by
	// Name for determinism.
	Operations         []Command          `json:"operations"`
	Components         Components         `json:"components"`
	Tests              []TestMeta         `json:"tests"`
	PostTests          []TestMeta         `json:"post_tests"`
	Regressions        []RegressionMeta   `json:"regressions"`
	SynthDistributions []DistributionMeta `json:"synth_distributions"`
	// ErrorCodesCount is the total number of registered error codes.
	ErrorCodesCount int `json:"error_codes_count"`
	// ErrorDomains is the alphabetized list of distinct domain prefixes
	// (e.g. "CLI", "DATA", "ENCODING", "PROCESSING", "PULSE",
	// "SERVICE"). One entry per domain, six entries in v1.
	ErrorDomains []string `json:"error_domains"`
	// ErrorCodes is the alphabetized list of code identifiers.
	// Per-code Message + Fixup prose lives behind the
	// `pulse_errors_lookup` MCP tool / `pulse errors lookup CODE` CLI
	// leaf — depth-on-demand, not common-path.
	ErrorCodes        []string          `json:"error_codes"`
	MCPTools          []MCPTool         `json:"mcp_tools"`
	CohortTypes       []CohortFieldType `json:"cohort_types"`
	Skills            []SkillMeta       `json:"skills"`
	ExamplesCount     int               `json:"examples_count"`
	ExampleCategories []string          `json:"example_categories"`
	ExampleTags       []string          `json:"example_tags"`
	// Extensions enumerates embedder-registered operators + expression
	// state. Built-in operators continue to live in Components; this
	// block is the additive layer registered via
	// pulse.Options.Extensions. Empty slices on every field for a
	// host with no extensions.
	Extensions ExtensionsManifest `json:"extensions"`

	// Facet is the rich-facet endpoint capability descriptor. One
	// entry today (facet_schema); future variants land under a slice
	// when added.
	Facet FacetCapability `json:"facet"`

	// ProcessChain is the source-rooted linear chain endpoint
	// capability descriptor (one entry today: process_chain).
	// Carries the mergeable-operator allowlist and rejection rules
	// so LLM clients can route between chain and per-stage fallback.
	ProcessChain ProcessChainCapability `json:"process_chain"`

	// Join is the pushdown hash-join capability descriptor (one
	// entry today: hash_join). Carries the kind allowlist, spill
	// envelope, and v1 limitations.
	Join JoinCapability `json:"join"`
}

Manifest is the root self-description of the Pulse system. One bootstrap call returns every fact an LLM needs to author a valid Pulse request: CLI command list, per-operator capabilities, per-test metadata (tier-1 and tier-2 as peer slices), synth distribution catalog, MCP tool list, cohort field-type catalog with operator cross-references, and embedded skill index. Error coverage is name-only — fetch per-code prose via the `pulse_errors_lookup` MCP tool or `pulse errors lookup CODE` CLI leaf on demand to keep the bootstrap payload lean.

The payload is deterministic and free of cohort data. Clients cache it for a session.

func BuildManifest ¶

func BuildManifest() *Manifest

BuildManifest constructs a deterministic Manifest from the current registries and capability tables. The result is safe to cache and share across goroutines; callers must not mutate the returned slices.

func BuildManifestWithExtensions ¶ added in v0.7.0

func BuildManifestWithExtensions(snap *ExtensionsSnapshot) *Manifest

BuildManifestWithExtensions constructs a Manifest that includes the embedder-registered extension surface. A nil snapshot is equivalent to BuildManifest — the Extensions block becomes the empty manifest (every category is `[]`, not `null`).

descriptor stays free of service / processing imports; the snapshot is the only way the live ExtensionRegistry reaches this layer.

func SlimManifest ¶ added in v0.5.0

func SlimManifest(m *Manifest) *Manifest

SlimManifest returns a copy of m with every prose Description field blanked. Structural metadata (names, params, types, tiers, accept-type lists, streamability flags, cross-references, fixups) is preserved.

The slim variant is intended for size-sensitive MCP/CLI clients that would rather pay a few extra discovery round-trips than transfer the full ~70 kB descriptive payload at session start. The default manifest surface remains the full one; --slim is opt-in.

SlimManifest never mutates the input; it shallow-copies the top-level struct, then per-slice shallow-copies and zeroes the Description fields.
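
The copy-then-blank approach can be illustrated on a reduced manifest with a single slice. This is a sketch of the general technique under simplified types, not the package's code; Manifest and Command here are trimmed mirrors, slim is a hypothetical name, and returning nil for a nil input is an assumption.

```go
package main

import "fmt"

// Command and Manifest are trimmed local mirrors of the documented structs.
type Command struct {
	Name        string
	Description string
}

type Manifest struct {
	FormatVersion string
	Commands      []Command
}

// slim shallow-copies the top-level struct, copies the slice so the input's
// backing array is not shared, then blanks each Description in the copy.
func slim(m *Manifest) *Manifest {
	if m == nil {
		return nil // assumption: nil in, nil out
	}
	out := *m
	out.Commands = make([]Command, len(m.Commands))
	copy(out.Commands, m.Commands)
	for i := range out.Commands {
		out.Commands[i].Description = ""
	}
	return &out
}

func main() {
	full := &Manifest{FormatVersion: "1", Commands: []Command{{Name: "inspect", Description: "Reads the header."}}}
	s := slim(full)
	fmt.Printf("%q %q\n", s.Commands[0].Description, full.Commands[0].Description)
}
```

Copying the slice before zeroing is what keeps the input untouched; blanking through the original slice header would modify the shared backing array.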

type Operator ¶ added in v0.5.0

type Operator struct {
	// Name is the operator identifier (e.g. "AGG_PERCENTILE").
	Name string `json:"name"`

	// Category is the family this operator belongs to. One of:
	// "aggregator", "attribute", "filterer", "grouper", "window",
	// "feature".
	Category string `json:"category"`

	// Description is a one-sentence prose summary for LLM-side selection.
	Description string `json:"description"`

	// Params lists every parameter the operator reads from its Params
	// blob in a request. Required and optional are both listed.
	Params []Param `json:"params"`

	// AcceptsTypes lists the cohort field types this operator can be
	// applied to. Values are field type name strings (e.g. "f64",
	// "categorical_u16", "date"). Empty means "no field input".
	AcceptsTypes []string `json:"accepts_types"`

	// EmitsType is the field type produced for single-output operators.
	// Empty when the operator's emit type is conditional on input or
	// when it does not emit a typed column (e.g. an aggregator emits a
	// scalar).
	EmitsType string `json:"emits_type,omitempty"`

	// EmitsTypeNote provides context when EmitsType is empty or
	// conditional (e.g. "matches input field type", "scalar float64").
	EmitsTypeNote string `json:"emits_type_note,omitempty"`

	// Streamable mirrors types.X.Streamable() for the operator's type.
	// Source of truth for the runtime gate.
	Streamable bool `json:"streamable"`

	// StreamableHint suggests the closest streaming-capable alternative
	// when Streamable is false (e.g. AGG_MEDIAN -> "Use AGG_AVERAGE for
	// streaming, or accept the buffered path."). Empty when Streamable
	// is true or no near-equivalent exists.
	StreamableHint string `json:"streamable_hint,omitempty"`
}

Operator describes a single registered processing component (aggregator, attribute, filterer, grouper, window operator, or feature operator). The manifest exposes one Operator entry per registered component. LLM clients use this metadata at session start to author valid requests without further discovery round-trips.
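As a rough illustration of how a client might use these entries, the sketch below filters operators by accepted field type and, optionally, by streamability. The `operator` type copies only three of the documented fields, and the sample entries are placeholders rather than values read from a real manifest.

```go
package main

import "fmt"

// operator mirrors the subset of the documented Operator fields used here.
type operator struct {
	Name         string
	AcceptsTypes []string
	Streamable   bool
}

// applicable returns the names of operators that accept fieldType.
// When streamingOnly is true, non-streamable operators are skipped.
func applicable(ops []operator, fieldType string, streamingOnly bool) []string {
	names := []string{}
	for _, o := range ops {
		if streamingOnly && !o.Streamable {
			continue
		}
		for _, t := range o.AcceptsTypes {
			if t == fieldType {
				names = append(names, o.Name)
				break
			}
		}
	}
	return names
}

func main() {
	// Illustrative entries; the type lists are placeholders.
	ops := []operator{
		{Name: "AGG_AVERAGE", AcceptsTypes: []string{"f64"}, Streamable: true},
		{Name: "AGG_MEDIAN", AcceptsTypes: []string{"f64"}, Streamable: false},
	}
	fmt.Println(applicable(ops, "f64", true)) // [AGG_AVERAGE]
}
```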

type OperatorMeta ¶ added in v0.7.0

type OperatorMeta struct {
	Name        string              `json:"name"`
	Namespace   string              `json:"namespace"`
	Description string              `json:"description,omitempty"`
	Streamable  bool                `json:"streamable"`
	Accepts     []string            `json:"accepts,omitempty"`
	Emits       string              `json:"emits,omitempty"`
	Mode        string              `json:"mode,omitempty"`
	Tier        string              `json:"tier,omitempty"`
	Params      []OperatorParamMeta `json:"params,omitempty"`
}

OperatorMeta is the manifest projection for an embedder-registered operator. The shape mirrors descriptor.Operator but adds Namespace (parsed from the registered name) and Mode (attribute-only) so reviewers can group operators by source without re-parsing.

type OperatorParamMeta ¶ added in v0.7.0

type OperatorParamMeta struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	JSONType    string `json:"json_type"`
	Required    bool   `json:"required,omitempty"`
	Default     any    `json:"default,omitempty"`
}

OperatorParamMeta is the manifest-friendly mirror of pulse.ParamMeta.

type Param ¶ added in v0.5.0

type Param struct {
	// Name is the JSON key inside the operator's Params blob.
	Name string `json:"name"`

	// Type is the parameter's value type. One of:
	//   "float", "int", "string", "bool", "field", "enum", "list", "object".
	// "field" means the value names a cohort field; "enum" means the
	// value is one of EnumValues; "list" means a JSON array.
	Type string `json:"type"`

	// Required is true when the parameter must be supplied. Optional
	// parameters with a default carry Required=false and Default set.
	Required bool `json:"required"`

	// Default is the operator's default value when Required is false.
	// Omitted from JSON when nil.
	Default any `json:"default,omitempty"`

	// Description is a one-sentence prose explanation.
	Description string `json:"description"`

	// EnumValues lists the allowed values when Type=="enum".
	EnumValues []string `json:"enum_values,omitempty"`

	// FieldFilter constrains acceptable field types when Type=="field".
	// One of: "numeric", "categorical", "date", "any".
	FieldFilter string `json:"field_filter,omitempty"`
}

Param describes a single parameter accepted by an operator, test, or synth distribution.
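A client could use these descriptors to pre-check a params blob before sending a request. The sketch below is a minimal client-side check, not the validator pulse runs: it covers only missing required keys and out-of-range enum values, and the parameter names `p` and `method` are invented for the example.

```go
package main

import "fmt"

// Param copies the subset of the documented Param fields this sketch needs.
type Param struct {
	Name       string
	Type       string
	Required   bool
	EnumValues []string
}

// contains reports whether x occurs in xs.
func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// checkParams reports missing required keys and enum values that are
// not listed in EnumValues. Other types are not checked.
func checkParams(spec []Param, blob map[string]any) []string {
	var problems []string
	for _, p := range spec {
		v, ok := blob[p.Name]
		if !ok {
			if p.Required {
				problems = append(problems, "missing required param "+p.Name)
			}
			continue
		}
		if p.Type != "enum" {
			continue
		}
		s, isString := v.(string)
		if !isString || !contains(p.EnumValues, s) {
			problems = append(problems, "invalid enum value for "+p.Name)
		}
	}
	return problems
}

func main() {
	// Hypothetical parameter schema.
	spec := []Param{
		{Name: "p", Type: "float", Required: true},
		{Name: "method", Type: "enum", EnumValues: []string{"linear", "nearest"}},
	}
	for _, problem := range checkParams(spec, map[string]any{"method": "cubic"}) {
		fmt.Println(problem)
	}
}
```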

type PredictOptions ¶

type PredictOptions struct {
	// Strict upgrades warnings to errors.
	Strict bool

	// Extensions is the read-only snapshot of embedder-registered
	// operators + expression-side state. Nil takes the built-in-only
	// path. The snapshot adds every custom operator name to the
	// validator's known-types set so predict does not flag
	// embedder-registered ops as unknown, and feeds streamability
	// overrides into computeStreamable.
	Extensions *ExtensionsSnapshot
}

PredictOptions controls predict behavior.

type PredictResult ¶

type PredictResult struct {
	Valid      bool               `json:"valid"`
	Request    *types.Request     `json:"request"`
	SchemaInfo *PredictSchemaInfo `json:"schema_info,omitempty"`
	// RecordCount reports the cumulative record total across all shards
	// when the cohort is a shard archive, and zero for single-file
	// cohorts (predict reads only headers/schemas, so single-file
	// counts are not computed by this no-execute path). The count is
	// derived by peeking each shard's own header.
	RecordCount int64 `json:"record_count"`
	// Shards mirrors InspectResult.Shards for archive-backed cohorts.
	// Empty (but non-nil) for single-file cohorts. Listed in zip
	// central-directory order.
	Shards []ShardInfo `json:"shards"`
	// Streamable reports whether ProcessStream / process --stream can
	// emit rows without buffering the entire result. False whenever the
	// request uses groups, attributes, windows, decimal fields, or any
	// non-streamable operator. Computed via per-type Streamable() methods
	// plus schema-aware checks.
	//
	// Streamability is a property of the request and the canonical
	// schema, not of the cohort shape — archive-backed cohorts inherit
	// the same streamability as a single-file cohort with the same
	// schema.
	Streamable bool `json:"streamable"`
	// StreamableReasons lists the gates that forced Streamable=false. Empty
	// when Streamable=true. Useful for users debugging why their request
	// is buffering.
	StreamableReasons []string `json:"streamable_reasons,omitempty"`
	// Suggestions enumerates structured next-actions the caller can apply
	// to repair (or improve) the request. Suggestions fire on validation
	// issues — field-name typos, operator/type mismatches, date misuse,
	// missing required params — and on non-streamable but otherwise valid
	// requests (streamable-substitute hints). May be empty; never nil in
	// JSON output.
	Suggestions []Suggestion `json:"suggestions"`
	// DefaultsApplied lists every operator slot whose Type was inferred
	// from the named field's schema type. Predict computes this on a
	// clone of the request, so the echoed Request reflects exactly what
	// the engine would run; the DefaultsApplied list shows what would
	// have been filled in. Empty when no defaults fire; never nil in
	// JSON output.
	DefaultsApplied []DefaultApplied `json:"defaults_applied"`
}

PredictResult holds the validated request and any diagnostics.
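A caller that only needs to know whether a request will stream can decode a few of these fields. The sketch below assumes the payload is the bare PredictResult object (the surrounding Envelope shape is not covered here) and uses an invented reason string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// predictResult mirrors three of the documented PredictResult JSON fields.
type predictResult struct {
	Valid             bool     `json:"valid"`
	Streamable        bool     `json:"streamable"`
	StreamableReasons []string `json:"streamable_reasons,omitempty"`
}

// summarize decodes a PredictResult payload and returns a one-line status.
func summarize(payload []byte) (string, error) {
	var r predictResult
	if err := json.Unmarshal(payload, &r); err != nil {
		return "", err
	}
	switch {
	case !r.Valid:
		return "invalid request", nil
	case r.Streamable:
		return "valid, streamable", nil
	default:
		return fmt.Sprintf("valid, buffered (%d reason(s))", len(r.StreamableReasons)), nil
	}
}

func main() {
	// Sample payload; the reason text is made up for illustration.
	payload := []byte(`{"valid":true,"streamable":false,"streamable_reasons":["request uses groups"]}`)
	status, err := summarize(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(status) // valid, buffered (1 reason(s))
}
```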

type PredictSchemaInfo ¶

type PredictSchemaInfo struct {
	FieldCount int      `json:"field_count"`
	Fields     []string `json:"fields"`
}

PredictSchemaInfo summarizes the schema used for prediction.

type ProcessChainCapability ¶ added in v0.10.0

type ProcessChainCapability struct {
	// Name is the canonical entry identifier ("process_chain").
	Name string `json:"name"`

	// MaxStages is the upper bound on Stages length. Zero indicates
	// no compile-time cap; runtime memory is bounded by the largest
	// intermediate response.Data slice.
	MaxStages int `json:"max_stages"`

	// MergeableAggregators lists the aggregator names that pass the
	// chain gate (mergeable + single-scalar emit). Alphabetically
	// sorted, deterministic across calls.
	MergeableAggregators []string `json:"mergeable_aggregators"`

	// MergeableGroupers lists the grouper names that pass the chain
	// gate.
	MergeableGroupers []string `json:"mergeable_groupers"`

	// RowLocalAttributes lists the attribute names that pass the
	// chain gate (row-local only).
	RowLocalAttributes []string `json:"row_local_attributes"`

	// RejectionRules names the operator categories the chain gate
	// rejects today. Intended for LLM-side reasoning and fallback
	// routing; not a strict schema.
	RejectionRules []string `json:"rejection_rules"`
}

ProcessChainCapability describes the source-rooted linear chain endpoint surfaced by pulse.ProcessChain / pulse_process_chain. The manifest carries one ProcessChainCapability entry under Manifest.ProcessChain so LLM clients can detect the chain gate and choose between the chained or per-stage fallback path.
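One way a client might choose between the chained and per-stage paths is sketched below. It checks only the stage cap and the aggregator list; a real gate also involves groupers and attributes. The aggregator names in the sample capability are placeholders, not the actual mergeable set.

```go
package main

import (
	"fmt"
	"sort"
)

// processChainCapability mirrors two of the documented fields.
type processChainCapability struct {
	MaxStages            int
	MergeableAggregators []string
}

// canChain reports whether the stage count is within MaxStages (zero
// means no cap) and every aggregator appears in MergeableAggregators.
func canChain(c processChainCapability, stages int, aggregators []string) bool {
	if c.MaxStages > 0 && stages > c.MaxStages {
		return false
	}
	for _, a := range aggregators {
		// The list is documented as alphabetically sorted, so a
		// binary search is sufficient.
		i := sort.SearchStrings(c.MergeableAggregators, a)
		if i == len(c.MergeableAggregators) || c.MergeableAggregators[i] != a {
			return false
		}
	}
	return true
}

func main() {
	// Placeholder capability; AGG_COUNT is a hypothetical name.
	c := processChainCapability{
		MaxStages:            0,
		MergeableAggregators: []string{"AGG_AVERAGE", "AGG_COUNT"},
	}
	fmt.Println(canChain(c, 3, []string{"AGG_AVERAGE"})) // true
	fmt.Println(canChain(c, 3, []string{"AGG_MEDIAN"}))  // false
}
```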

type RegressionMeta ¶ added in v0.6.0

type RegressionMeta struct {
	// Name is the operator identifier (REG_OLS, REG_GLM,
	// REG_BAYES_LINEAR).
	Name string `json:"name"`

	// Description is a one-sentence prose summary for LLM-side
	// operator selection.
	Description string `json:"description"`

	// AcceptsTypes lists the schema field types valid as Target /
	// Predictors entries (numeric only in v1).
	AcceptsTypes []string `json:"accepts_types"`

	// EmitsTypeNote describes the structured output (RegressionResult)
	// since no scalar field type captures the fit summary.
	EmitsTypeNote string `json:"emits_type_note"`

	// Streamable mirrors types.RegressionType.Streamable(): true for
	// closed-form fits (REG_OLS, REG_BAYES_LINEAR), false for iterative
	// fits (REG_GLM). Spec-level modifiers (Resample, Selection) can
	// downgrade this further per request — see RegressionSpec.Streamable.
	Streamable bool `json:"streamable"`

	// StreamableHint flags the modifier downgrade rule so clients know
	// non-empty Resample / Selection forces the buffered path even on
	// streamable operators.
	StreamableHint string `json:"streamable_hint,omitempty"`

	// Params lists the per-operator parameter schema, including the
	// regularization, family, prior, resample, and selection knobs that
	// modify the underlying fit.
	Params []Param `json:"params"`

	// Modifiers names the spec-level orthogonal wrappers any regression
	// type supports. Each entry is a top-level RegressionSpec field
	// (Resample, Selection) plus its enum values; the field is
	// duplicated under Params for completeness but exposed at the top
	// level here so request authors can discover the composition story
	// without parsing the full param list.
	Modifiers []RegressionModifier `json:"modifiers,omitempty"`
}

RegressionMeta describes a registered REG_* operator in the manifest. The shape mirrors Operator and TestMeta enough that LLM clients can reuse the same authoring path, but regression-specific knobs (family, link, penalty, modifier slots) live alongside the shared name + streamable + params trio so the catalog stays one fetch deep.

type RegressionModifier ¶ added in v0.6.0

type RegressionModifier struct {
	Name        string   `json:"name"`
	Description string   `json:"description"`
	EnumValues  []string `json:"enum_values"`
}

RegressionModifier describes a spec-level wrapper (Resample, Selection) that composes with any regression operator. Each modifier downgrades streamability when set; clients combine its EnumValues with their chosen Type to author the request.
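The downgrade rule can be written out as a small predicate. The sketch below assumes a simplified `regressionSpec` with string-valued Resample and Selection fields; the real RegressionSpec is not shown on this page, and the `bootstrap` value is hypothetical.

```go
package main

import "fmt"

// regressionSpec is a simplified, hypothetical view of a request's
// regression entry.
type regressionSpec struct {
	Type      string
	Resample  string
	Selection string
}

// typeStreamable follows the documented per-type rule: closed-form fits
// stream, iterative fits do not.
func typeStreamable(t string) bool {
	switch t {
	case "REG_OLS", "REG_BAYES_LINEAR":
		return true
	}
	return false
}

// specStreamable applies the modifier downgrade: any non-empty Resample
// or Selection forces the buffered path.
func specStreamable(s regressionSpec) bool {
	return typeStreamable(s.Type) && s.Resample == "" && s.Selection == ""
}

func main() {
	fmt.Println(specStreamable(regressionSpec{Type: "REG_OLS"}))                        // true
	fmt.Println(specStreamable(regressionSpec{Type: "REG_OLS", Resample: "bootstrap"})) // false
	fmt.Println(specStreamable(regressionSpec{Type: "REG_GLM"}))                        // false
}
```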

type ShardInfo ¶ added in v0.8.0

type ShardInfo struct {
	Filename    string `json:"filename"`
	RecordCount int64  `json:"record_count"`
}

ShardInfo is one shard inside a Pulse shard archive, surfaced by Inspect for archive-backed cohorts. Mirrors the public shape of service.ShardEntry without importing it — descriptor/ is header-only and must not depend on the execution layer.

Filename is the basename of the shard inside the archive (e.g. "20190101.pulse"). RecordCount is the number of records carried by the shard, computed by peeking that shard's own header (the per-shard headers are authoritative; the canonical _schema.pulse aggregate is only a sanity check).
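For reference, a client can recompute the cumulative record total from a decoded shard list. The sketch below uses the documented JSON keys; the second filename and both counts are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shardInfo mirrors the documented ShardInfo JSON shape.
type shardInfo struct {
	Filename    string `json:"filename"`
	RecordCount int64  `json:"record_count"`
}

// totalRecords sums RecordCount across all shards.
func totalRecords(shards []shardInfo) int64 {
	var total int64
	for _, s := range shards {
		total += s.RecordCount
	}
	return total
}

func main() {
	// Sample shard list with invented counts.
	raw := `[{"filename":"20190101.pulse","record_count":1200},
	         {"filename":"20190102.pulse","record_count":800}]`
	var shards []shardInfo
	if err := json.Unmarshal([]byte(raw), &shards); err != nil {
		panic(err)
	}
	fmt.Println(len(shards), totalRecords(shards)) // 2 2000
}
```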

type SkillMeta ¶

type SkillMeta struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

SkillMeta describes a bundled skill.

type Suggestion ¶ added in v0.5.0

type Suggestion struct {
	Path       []string `json:"path"`
	Reason     string   `json:"reason"`
	Current    any      `json:"current,omitempty"`
	Proposed   []any    `json:"proposed,omitempty"`
	Confidence float64  `json:"confidence"`
}

Suggestion is a structured next-action attached to PredictResult. Predict computes suggestions inline so callers can repair a request without an additional inspect round-trip.

Path points at the offending request location using JSON-style segments — e.g. ["Aggregations", "0", "Field"] addresses the Field of the first aggregation.

Proposed is a ranked list of candidate values. Empty when no concrete proposal applies (e.g. ATTR_PERCENTILE has no streamable peer); the caller should treat empty Proposed as advisory.

Confidence is a static heuristic in [0, 1], with one fixed value per suggestion kind: 0.9 for high-certainty single-candidate swaps and for matches at Levenshtein distance 1; 0.8 for streamability substitutes; 0.7 for matches at Levenshtein distance 2; 0.6 for multi-candidate type-class swaps; 0.5 for missing-param fallbacks that hand the user a list to pick from.
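The distance-based part of the heuristic can be sketched with a standard Levenshtein implementation. This is an illustration, not the package's code: returning no confidence for distances other than 1 and 2 is an assumption of the sketch, and the field names are hypothetical.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, counting
// single-rune insertions, deletions, and substitutions.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// typoConfidence maps an edit distance to the documented confidence:
// 0.9 at distance 1 and 0.7 at distance 2. Reporting ok=false for any
// other distance is an assumption of this sketch.
func typoConfidence(distance int) (conf float64, ok bool) {
	switch distance {
	case 1:
		return 0.9, true
	case 2:
		return 0.7, true
	}
	return 0, false
}

func main() {
	d := levenshtein("revnue", "revenue") // hypothetical field names
	conf, ok := typoConfidence(d)
	fmt.Println(d, conf, ok) // 1 0.9 true
}
```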

type TestMeta ¶ added in v0.5.0

type TestMeta struct {
	// Name is the canonical entry identifier. For tier-1 this is the
	// TestType string (e.g. "TEST_ANOVA_WELCH"). For tier-2 this is the
	// TestType plus the variant suffix
	// (e.g. "TEST_ANOVA_WELCH/welch_one_way_post").
	Name string `json:"name"`

	// Family is the canonical TestType the variant belongs to
	// (e.g. "TEST_PEARSON_R"). Tier-1 entries always have
	// Family == Name. Tier-2 entries set Family to the underlying
	// TestType so clients can pair siblings.
	Family string `json:"family"`

	// Tier is 1 for row tests in Request.Tests, 2 for post tests in
	// Request.PostTests.
	Tier int `json:"tier"`

	// Variant is the algorithm flavour for tier-2 entries
	// (e.g. "welch_one_way_post"). Empty for tier-1 entries.
	Variant string `json:"variant,omitempty"`

	// Description is a one-sentence prose summary.
	Description string `json:"description"`

	// Streamable mirrors types.TestType.Streamable() for tier-1 entries.
	// Always false for tier-2 entries.
	Streamable bool `json:"streamable"`

	// Params lists the operator-specific parameters (alpha,
	// success_value, etc.).
	Params []Param `json:"params"`

	// Requires lists the top-level Test fields that must be set for the
	// test to run (e.g. "Field", "Field2", "SplitBy", "Rows", "Cols").
	// Drives request authoring directly.
	Requires []string `json:"requires,omitempty"`
}

TestMeta describes a statistical test entry in the manifest. Tier-1 and tier-2 tests share this shape; they live in separate top-level slices (Manifest.Tests for tier-1, Manifest.PostTests for tier-2). The Family field ties variants of the same underlying test together across tiers so clients can filter by family.
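Filtering by family might look like the sketch below, which indexes both slices by Family and splits a tier-2 name into its parts. The `/` separator is inferred from the documented example name and is an assumption; `testMeta` mirrors only three fields.

```go
package main

import (
	"fmt"
	"strings"
)

// testMeta mirrors the subset of the documented TestMeta fields used here.
type testMeta struct {
	Name   string
	Family string
	Tier   int
}

// splitName separates a tier-2 name into family and variant. For a
// tier-1 name with no separator, variant is empty.
func splitName(name string) (family, variant string) {
	family, variant, _ = strings.Cut(name, "/")
	return family, variant
}

// byFamily indexes tier-1 and tier-2 entry names by their Family.
func byFamily(tests, postTests []testMeta) map[string][]string {
	out := make(map[string][]string)
	for _, t := range tests {
		out[t.Family] = append(out[t.Family], t.Name)
	}
	for _, t := range postTests {
		out[t.Family] = append(out[t.Family], t.Name)
	}
	return out
}

func main() {
	tests := []testMeta{{Name: "TEST_ANOVA_WELCH", Family: "TEST_ANOVA_WELCH", Tier: 1}}
	post := []testMeta{{Name: "TEST_ANOVA_WELCH/welch_one_way_post", Family: "TEST_ANOVA_WELCH", Tier: 2}}

	fmt.Println(byFamily(tests, post)["TEST_ANOVA_WELCH"])
	fmt.Println(splitName(post[0].Name))
}
```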
