descriptor

package
v0.5.0
Published: May 13, 2026 License: MIT Imports: 16 Imported by: 0

Documentation

Overview

Package descriptor provides self-description, manifest, and predict functionality for Pulse.

Constants

const DefaultDictionaryLimit = 100

DefaultDictionaryLimit is the default max entries shown for categorical dictionaries.

Variables

This section is empty.

Functions

This section is empty.

Types

type CohortFieldType

type CohortFieldType struct {
	Name                  string   `json:"name"`
	Categorical           bool     `json:"categorical"`
	CompatibleAggregators []string `json:"compatible_aggregators,omitempty"`
	CompatibleAttributes  []string `json:"compatible_attributes,omitempty"`
	CompatibleFilterers   []string `json:"compatible_filterers,omitempty"`
	CompatibleGroupers    []string `json:"compatible_groupers,omitempty"`
	CompatibleWindows     []string `json:"compatible_windows,omitempty"`
	CompatibleFeatures    []string `json:"compatible_features,omitempty"`
}

CohortFieldType describes a field type available in .pulse files and the operator catalog that accepts it. The Compatible* slices are derived from the per-operator AcceptsTypes declarations and let an LLM look up "what can I do with a date field" in one place.
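As a rough illustration of the one-place lookup described above, the sketch below searches a catalog for a field type by name. It declares a local struct that mirrors a subset of CohortFieldType instead of importing the package. The helper findType, the sample entries, and the operator name GROUP_EXAMPLE are hypothetical; real entries would come from Manifest.CohortTypes.

```go
package main

import "fmt"

// CohortFieldType mirrors only the documented fields this sketch needs.
type CohortFieldType struct {
	Name                  string
	Categorical           bool
	CompatibleAggregators []string
	CompatibleGroupers    []string
}

// findType returns the catalog entry with the given name, or nil if absent.
func findType(catalog []CohortFieldType, name string) *CohortFieldType {
	for i := range catalog {
		if catalog[i].Name == name {
			return &catalog[i]
		}
	}
	return nil
}

// sampleCatalog returns made-up entries for illustration only.
func sampleCatalog() []CohortFieldType {
	return []CohortFieldType{
		{Name: "f64", CompatibleAggregators: []string{"AGG_SUM", "AGG_AVERAGE"}},
		{Name: "date", CompatibleGroupers: []string{"GROUP_EXAMPLE"}}, // hypothetical operator
	}
}

func main() {
	if ft := findType(sampleCatalog(), "date"); ft != nil {
		fmt.Println(ft.Name, "groupers:", ft.CompatibleGroupers)
	}
}
```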

type Command

type Command struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

Command describes a CLI leaf command in the manifest.

type Components

type Components struct {
	Aggregators []Operator `json:"aggregators"`
	Attributes  []Operator `json:"attributes"`
	Filterers   []Operator `json:"filterers"`
	Groupers    []Operator `json:"groupers"`
	Windows     []Operator `json:"windows"`
	Features    []Operator `json:"features"`
}

Components lists every registered processing component grouped by category. Each slice carries one Operator entry per component so LLM-side authoring has access to per-operator params, accepted field types, emit type, and streamability without further discovery round-trips.

type DefaultApplied added in v0.5.0

type DefaultApplied struct {
	Path     []string `json:"path"`
	Field    string   `json:"field"`
	Type     string   `json:"type"`
	Category string   `json:"category"`
	Reason   string   `json:"reason"`
}

DefaultApplied describes a single defaulted operator slot. ResolveDefaults returns these entries so callers (predict, service, MCP transcripts) can echo exactly which slots had their Type inferred from the schema. The shape is JSON-serializable and is surfaced as PredictResult.DefaultsApplied.

Path uses JSON-style segments e.g. ["Aggregations", "0", "Type"] so the caller can address the slot in the original request. Field names the schema column that drove inference; Type is the operator string that was filled in; Category is "aggregation" or "grouper"; Reason carries a short human-readable rule trace, e.g. "f64 → AGG_SUM (rule: numeric default)".

func ResolveDefaults added in v0.5.0

func ResolveDefaults(req *types.Request, schema *encoding.Schema) []DefaultApplied

ResolveDefaults inspects every aggregation and grouper slot in req, fills in the Type for slots that name a field but omit the Type, and returns one DefaultApplied entry per change made. Mutates req in place.

Rules:

  • Defaults never override an explicit Type.
  • Defaults never cross categories (a missing aggregator does not insert a grouper, and vice versa).
  • Tests (req.Tests, req.PostTests) are not defaulted; hypothesis tests are intent-bearing.
  • Field types absent from defaultRules (or with no rule for the slot's category) are left untouched and contribute no entry.

Returns a nil slice when nothing changed.
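The rules above can be sketched with simplified stand-in types. This is not the package's implementation: slot, request, applied, the Groupers field name, and the rules table are assumptions made for illustration, and only the f64 to AGG_SUM pairing is taken from the Reason example in the DefaultApplied documentation.

```go
package main

import (
	"fmt"
	"strconv"
)

// slot, request and applied are simplified stand-ins for the real
// types.Request slots and DefaultApplied; their layout is an assumption.
type slot struct{ Field, Type string }

type request struct {
	Aggregations []slot
	Groupers     []slot // hypothetical field name
}

type applied struct {
	Path                  []string
	Field, Type, Category string
}

// rules maps schema type -> category -> default operator.
var rules = map[string]map[string]string{
	"f64": {"aggregation": "AGG_SUM"},
}

// fill defaults one category of slots in place and records each change.
func fill(slots []slot, root, category string, schema map[string]string, out []applied) []applied {
	for i := range slots {
		s := &slots[i]
		if s.Type != "" || s.Field == "" {
			continue // an explicit Type is never overridden
		}
		op, ok := rules[schema[s.Field]][category]
		if !ok {
			continue // no rule for this type and category: leave untouched
		}
		s.Type = op
		out = append(out, applied{
			Path:  []string{root, strconv.Itoa(i), "Type"},
			Field: s.Field, Type: op, Category: category,
		})
	}
	return out
}

// resolveDefaults mutates req and returns nil when nothing changed.
func resolveDefaults(req *request, schema map[string]string) []applied {
	var out []applied
	out = fill(req.Aggregations, "Aggregations", "aggregation", schema, out)
	out = fill(req.Groupers, "Groupers", "grouper", schema, out)
	return out
}

func main() {
	req := &request{Aggregations: []slot{{Field: "revenue"}, {Field: "revenue", Type: "AGG_AVERAGE"}}}
	changes := resolveDefaults(req, map[string]string{"revenue": "f64"})
	fmt.Println(len(changes), req.Aggregations[0].Type, req.Aggregations[1].Type)
	// prints: 1 AGG_SUM AGG_AVERAGE
}
```

Because each category is filled from its own rule column, a missing aggregator can never be replaced by a grouper, which is the "never cross categories" rule in miniature.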

type DictionaryInfo

type DictionaryInfo struct {
	TotalEntries int      `json:"total_entries"`
	Truncated    bool     `json:"truncated"`
	Values       []string `json:"values"`
}

DictionaryInfo describes the categorical dictionary for a field.

type DistributionMeta added in v0.5.0

type DistributionMeta struct {
	// Name is the distribution kind identifier (e.g. "lognormal").
	Name string `json:"name"`

	// Description is a one-sentence prose summary.
	Description string `json:"description"`

	// AppliesTo lists the FieldSpec types the distribution can drive.
	// Values are coarse families: "numeric", "categorical", "date",
	// "bool", "any".
	AppliesTo []string `json:"applies_to"`

	// Params lists the distribution-specific parameters.
	Params []Param `json:"params"`
}

DistributionMeta describes a synth distribution entry. One entry per synth.AllDistributions() value.

type Envelope

type Envelope struct {
	FormatVersion string           `json:"format_version"`
	Data          any              `json:"data"`
	Errors        []*EnvelopeEntry `json:"errors"`
	Warnings      []*EnvelopeEntry `json:"warnings"`
}

Envelope is the standard JSON output wrapper for all descriptor operations. All --json output follows this shape.
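To make the wire shape concrete, the following sketch marshals a locally declared mirror of Envelope and EnvelopeEntry with encoding/json. The format version "1", the code EXAMPLE_CODE, and the choice to emit an empty errors array instead of null are assumptions; the package's own MarshalJSON may differ in those details.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry and envelope copy the documented JSON tags; they are local
// mirrors, not the package's types.
type entry struct {
	Code    string         `json:"code"`
	Message string         `json:"message"`
	Details map[string]any `json:"details,omitempty"`
}

type envelope struct {
	FormatVersion string   `json:"format_version"`
	Data          any      `json:"data"`
	Errors        []*entry `json:"errors"`
	Warnings      []*entry `json:"warnings"`
}

// render marshals the envelope, returning "" on failure.
func render(e envelope) string {
	b, err := json.Marshal(e)
	if err != nil {
		return ""
	}
	return string(b)
}

// sample builds an envelope with placeholder values.
func sample() envelope {
	return envelope{
		FormatVersion: "1", // placeholder version string
		Data:          map[string]int{"field_count": 3},
		Errors:        []*entry{}, // empty slice marshals as [], nil would be null
		Warnings:      []*entry{{Code: "EXAMPLE_CODE", Message: "illustrative warning"}},
	}
}

func main() {
	fmt.Println(render(sample()))
}
```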

func Inspect

func Inspect(fileData io.ReadSeeker, opts *InspectOptions) *Envelope

Inspect reads a .pulse file header and schema, returning structured field information. It never reads record data.

func InspectFromBytes

func InspectFromBytes(data []byte, opts *InspectOptions) *Envelope

InspectFromBytes is a convenience wrapper that creates a reader from bytes.

func NewEnvelope

func NewEnvelope(data any) *Envelope

NewEnvelope creates an envelope with the given data and no errors/warnings.

func Predict

func Predict(fileData io.ReadSeeker, req *types.Request, opts *PredictOptions) *Envelope

Predict validates a request against a .pulse file without executing it. It reads only the header and schema, never record data. The returned Envelope contains the PredictResult in Data and any errors/warnings encountered.

func PredictFromBytes

func PredictFromBytes(data []byte, req *types.Request, opts *PredictOptions) *Envelope

PredictFromBytes is a convenience wrapper that creates a reader from bytes.

func (*Envelope) AddError

func (e *Envelope) AddError(code, message string, details map[string]any)

AddError appends a structured error to the envelope.

func (*Envelope) AddWarning

func (e *Envelope) AddWarning(code, message string, details map[string]any)

AddWarning appends a structured warning to the envelope.

func (*Envelope) MarshalJSON

func (e *Envelope) MarshalJSON() ([]byte, error)

MarshalJSON produces deterministic JSON output.

type EnvelopeEntry

type EnvelopeEntry struct {
	Code    string         `json:"code"`
	Message string         `json:"message"`
	Details map[string]any `json:"details,omitempty"`
}

EnvelopeEntry represents a single error or warning in the envelope.

type InspectField

type InspectField struct {
	Name              string          `json:"name"`
	Type              string          `json:"type"`
	ByteOffset        int             `json:"byte_offset"`
	BitPosition       int             `json:"bit_position"`
	Description       string          `json:"description"`
	DescriptionSource string          `json:"description_source"`
	Categorical       bool            `json:"categorical"`
	Dictionary        *DictionaryInfo `json:"dictionary,omitempty"`
	// Precision is the decimal128 precision (1-38). Present only for
	// decimal128 / nullable_decimal128 fields.
	Precision *uint8 `json:"precision,omitempty"`
	// Scale is the decimal128 scale (0-precision). Present only for
	// decimal128 / nullable_decimal128 fields.
	Scale *uint8 `json:"scale,omitempty"`
	// H3Resolution is the native cell resolution (0-15). Present only
	// for h3_cell fields where the import recorded a resolution.
	H3Resolution *uint8 `json:"h3_resolution,omitempty"`
}

InspectField describes a single field in the inspect output.

type InspectOptions

type InspectOptions struct {
	// FullDict disables dictionary truncation when true.
	FullDict bool
	// DictionaryLimit overrides the default truncation limit.
	// Zero means use DefaultDictionaryLimit.
	DictionaryLimit int
}

InspectOptions controls inspect behavior.
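One plausible reading of how FullDict and DictionaryLimit interact with DictionaryInfo is sketched below using local mirror types. The truncate helper is hypothetical, and treating a negative limit like zero is an assumption the documentation does not state.

```go
package main

import "fmt"

const defaultDictionaryLimit = 100 // mirrors DefaultDictionaryLimit

// inspectOptions and dictionaryInfo mirror the documented structs.
type inspectOptions struct {
	FullDict        bool
	DictionaryLimit int
}

type dictionaryInfo struct {
	TotalEntries int
	Truncated    bool
	Values       []string
}

// truncate applies the limit unless FullDict is set.
func truncate(values []string, opts inspectOptions) dictionaryInfo {
	limit := opts.DictionaryLimit
	if limit <= 0 { // zero means default; negative handled the same way (assumption)
		limit = defaultDictionaryLimit
	}
	info := dictionaryInfo{TotalEntries: len(values), Values: values}
	if !opts.FullDict && len(values) > limit {
		info.Values = values[:limit]
		info.Truncated = true
	}
	return info
}

func main() {
	values := make([]string, 250)
	for i := range values {
		values[i] = fmt.Sprintf("value_%d", i)
	}
	info := truncate(values, inspectOptions{})
	fmt.Println(info.TotalEntries, info.Truncated, len(info.Values))
	// prints: 250 true 100
}
```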

type InspectResult

type InspectResult struct {
	FieldCount int             `json:"field_count"`
	Fields     []*InspectField `json:"fields"`
}

InspectResult holds the schema inspection output.

type MCPTool added in v0.5.0

type MCPTool struct {
	// Name is the tool identifier (e.g. "pulse_predict").
	Name string `json:"name"`

	// Description mirrors the description string registered with the
	// MCP server.
	Description string `json:"description"`
}

MCPTool describes a single registered MCP tool. One entry per internal/mcp.RegisteredTools() value.

type Manifest

type Manifest struct {
	FormatVersion      string             `json:"format_version"`
	Commands           []Command          `json:"commands"`
	Components         Components         `json:"components"`
	Tests              []TestMeta         `json:"tests"`
	PostTests          []TestMeta         `json:"post_tests"`
	SynthDistributions []DistributionMeta `json:"synth_distributions"`
	// ErrorCodesCount is the total number of registered error codes.
	ErrorCodesCount int `json:"error_codes_count"`
	// ErrorDomains is the alphabetized list of distinct domain prefixes
	// (e.g. "CLI", "DATA", "ENCODING", "PROCESSING", "PULSE",
	// "SERVICE"). One entry per domain, six entries in v1.
	ErrorDomains []string `json:"error_domains"`
	// ErrorCodes is the alphabetized list of code identifiers.
	// Per-code Message + Fixup prose lives behind the
	// `pulse_errors_lookup` MCP tool / `pulse errors lookup CODE` CLI
	// leaf — depth-on-demand, not common-path.
	ErrorCodes        []string          `json:"error_codes"`
	MCPTools          []MCPTool         `json:"mcp_tools"`
	CohortTypes       []CohortFieldType `json:"cohort_types"`
	Skills            []SkillMeta       `json:"skills"`
	ExamplesCount     int               `json:"examples_count"`
	ExampleCategories []string          `json:"example_categories"`
	ExampleTags       []string          `json:"example_tags"`
}

Manifest is the root self-description of the Pulse system. One bootstrap call returns every fact an LLM needs to author a valid Pulse request: CLI command list, per-operator capabilities, per-test metadata (tier-1 and tier-2 as peer slices), synth distribution catalog, MCP tool list, cohort field-type catalog with operator cross-references, and embedded skill index. Error coverage is name-only — fetch per-code prose via the `pulse_errors_lookup` MCP tool or `pulse errors lookup CODE` CLI leaf on demand to keep the bootstrap payload lean.

The payload is deterministic and free of cohort data. Clients cache it for a session.

func BuildManifest

func BuildManifest() *Manifest

BuildManifest constructs a deterministic Manifest from the current registries and capability tables. The result is safe to cache and share across goroutines; callers must not mutate the returned slices.

func SlimManifest added in v0.5.0

func SlimManifest(m *Manifest) *Manifest

SlimManifest returns a copy of m with every prose Description field blanked. Structural metadata (names, params, types, tiers, accept-type lists, streamability flags, cross-references, fixups) is preserved.

The slim variant is intended for size-sensitive MCP/CLI clients that would rather pay a few extra discovery round-trips than transfer the full ~70 kB descriptive payload at session start. The default manifest surface remains the full one; --slim is opt-in.

SlimManifest never mutates the input: it shallow-copies the top-level struct, then copies each slice and zeroes the Description fields in the copies.
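The copy-then-blank approach can be sketched on a reduced manifest with a single operator slice. The types below are local stand-ins, and the assumption that nested Param descriptions are blanked as well follows from "every prose Description field" but is not confirmed by the source.

```go
package main

import "fmt"

// param, operator and manifest are reduced stand-ins for the real types.
type param struct{ Name, Description string }

type operator struct {
	Name, Description string
	Params            []param
}

type manifest struct {
	FormatVersion string
	Aggregators   []operator
}

// slim returns a copy of m with descriptions blanked, leaving m untouched.
func slim(m *manifest) *manifest {
	out := *m // shallow copy: slices still share backing arrays at this point
	out.Aggregators = make([]operator, len(m.Aggregators))
	copy(out.Aggregators, m.Aggregators)
	for i := range out.Aggregators {
		op := &out.Aggregators[i]
		op.Description = ""
		// Copy the nested slice before blanking so the input stays intact.
		ps := make([]param, len(op.Params))
		copy(ps, op.Params)
		for j := range ps {
			ps[j].Description = ""
		}
		op.Params = ps
	}
	return &out
}

func main() {
	full := &manifest{FormatVersion: "1", Aggregators: []operator{
		{Name: "AGG_SUM", Description: "Sums a numeric field."},
	}}
	s := slim(full)
	fmt.Printf("%q %q\n", s.Aggregators[0].Description, full.Aggregators[0].Description)
}
```

Skipping the per-slice copy would blank the caller's manifest too, since a shallow struct copy shares the slice backing arrays.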

type Operator added in v0.5.0

type Operator struct {
	// Name is the operator identifier (e.g. "AGG_PERCENTILE").
	Name string `json:"name"`

	// Category is the family this operator belongs to. One of:
	// "aggregator", "attribute", "filterer", "grouper", "window",
	// "feature".
	Category string `json:"category"`

	// Description is a one-sentence prose summary for LLM-side selection.
	Description string `json:"description"`

	// Params lists every parameter the operator reads from its Params
	// blob in a request. Required and optional are both listed.
	Params []Param `json:"params"`

	// AcceptsTypes lists the cohort field types this operator can be
	// applied to. Values are field type name strings (e.g. "f64",
	// "categorical_u16", "date"). Empty means "no field input".
	AcceptsTypes []string `json:"accepts_types"`

	// EmitsType is the field type produced for single-output operators.
	// Empty when the operator's emit type is conditional on input or
	// when it does not emit a typed column (e.g. an aggregator emits a
	// scalar).
	EmitsType string `json:"emits_type,omitempty"`

	// EmitsTypeNote provides context when EmitsType is empty or
	// conditional (e.g. "matches input field type", "scalar float64").
	EmitsTypeNote string `json:"emits_type_note,omitempty"`

	// Streamable mirrors types.X.Streamable() for the operator's type.
	// Source of truth for the runtime gate.
	Streamable bool `json:"streamable"`

	// StreamableHint suggests the closest streaming-capable alternative
	// when Streamable is false (e.g. AGG_MEDIAN -> "Use AGG_AVERAGE for
	// streaming, or accept the buffered path."). Empty when Streamable
	// is true or no near-equivalent exists.
	StreamableHint string `json:"streamable_hint,omitempty"`
}

Operator describes a single registered processing component (aggregator, attribute, filterer, grouper, window operator, or feature operator). The manifest exposes one Operator entry per registered component. LLM clients use this metadata at session start to author valid requests without further discovery round-trips.
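A client might use this metadata to shortlist streaming-capable operators for a given field type. The sketch below is one way to do that with a local mirror of a few Operator fields. The helper names and the AcceptsTypes values in the sample are assumptions; the AGG_MEDIAN hint text is the example from the StreamableHint comment.

```go
package main

import "fmt"

// operator mirrors the subset of Operator fields used here.
type operator struct {
	Name           string
	AcceptsTypes   []string
	Streamable     bool
	StreamableHint string
}

// accepts reports whether op lists fieldType in AcceptsTypes.
func accepts(op operator, fieldType string) bool {
	for _, t := range op.AcceptsTypes {
		if t == fieldType {
			return true
		}
	}
	return false
}

// streamableFor returns the names of streamable operators that accept fieldType.
func streamableFor(ops []operator, fieldType string) []string {
	var names []string
	for _, op := range ops {
		if op.Streamable && accepts(op, fieldType) {
			names = append(names, op.Name)
		}
	}
	return names
}

// sampleOperators returns illustrative entries; accepted types are assumed.
func sampleOperators() []operator {
	return []operator{
		{Name: "AGG_AVERAGE", AcceptsTypes: []string{"f64"}, Streamable: true},
		{Name: "AGG_MEDIAN", AcceptsTypes: []string{"f64"},
			StreamableHint: "Use AGG_AVERAGE for streaming, or accept the buffered path."},
	}
}

func main() {
	fmt.Println(streamableFor(sampleOperators(), "f64"))
}
```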

type Param added in v0.5.0

type Param struct {
	// Name is the JSON key inside the operator's Params blob.
	Name string `json:"name"`

	// Type is the parameter's value type. One of:
	//   "float", "int", "string", "bool", "field", "enum", "list", "object".
	// "field" means the value names a cohort field; "enum" means the
	// value is one of EnumValues; "list" means a JSON array.
	Type string `json:"type"`

	// Required is true when the parameter must be supplied. Optional
	// parameters with a default carry Required=false and Default set.
	Required bool `json:"required"`

	// Default is the operator's default value when Required is false.
	// Omitted from JSON when nil.
	Default any `json:"default,omitempty"`

	// Description is a one-sentence prose explanation.
	Description string `json:"description"`

	// EnumValues lists the allowed values when Type=="enum".
	EnumValues []string `json:"enum_values,omitempty"`

	// FieldFilter constrains acceptable field types when Type=="field".
	// One of: "numeric", "categorical", "date", "geo", "any".
	FieldFilter string `json:"field_filter,omitempty"`
}

Param describes a single parameter accepted by an operator, test, or synth distribution.

type PredictOptions

type PredictOptions struct {
	// Strict upgrades warnings to errors.
	Strict bool
}

PredictOptions controls predict behavior.

type PredictResult

type PredictResult struct {
	Valid      bool               `json:"valid"`
	Request    *types.Request     `json:"request"`
	SchemaInfo *PredictSchemaInfo `json:"schema_info,omitempty"`
	// Streamable reports whether ProcessStream / process --stream can
	// emit rows without buffering the entire result. False whenever the
	// request uses groups, attributes, windows, geo aggregations, decimal
	// fields, or any non-streamable operator. Computed via per-type
	// Streamable() methods plus schema-aware checks.
	Streamable bool `json:"streamable"`
	// StreamableReasons lists the gates that forced Streamable=false. Empty
	// when Streamable=true. Useful for users debugging why their request
	// is buffering.
	StreamableReasons []string `json:"streamable_reasons,omitempty"`
	// Suggestions enumerates structured next-actions the caller can apply
	// to repair (or improve) the request. Suggestions fire on validation
	// issues — field-name typos, operator/type mismatches, date misuse,
	// missing required params — and on non-streamable but otherwise valid
	// requests (streamable-substitute hints). May be empty; never nil in
	// JSON output.
	Suggestions []Suggestion `json:"suggestions"`
	// DefaultsApplied lists every operator slot whose Type was inferred
	// from the named field's schema type. Predict computes this on a
	// clone of the request, so the echoed Request reflects exactly what
	// the engine would run; the DefaultsApplied list shows what would
	// have been filled in. Empty when no defaults fire; never nil in
	// JSON output.
	DefaultsApplied []DefaultApplied `json:"defaults_applied"`
}

PredictResult holds the validated request and any diagnostics.

type PredictSchemaInfo

type PredictSchemaInfo struct {
	FieldCount int      `json:"field_count"`
	Fields     []string `json:"fields"`
}

PredictSchemaInfo summarizes the schema used for prediction.

type SkillMeta

type SkillMeta struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

SkillMeta describes a bundled skill.

type Suggestion added in v0.5.0

type Suggestion struct {
	Path       []string `json:"path"`
	Reason     string   `json:"reason"`
	Current    any      `json:"current,omitempty"`
	Proposed   []any    `json:"proposed,omitempty"`
	Confidence float64  `json:"confidence"`
}

Suggestion is a structured next-action attached to PredictResult. Predict computes suggestions inline so callers can repair a request without an additional inspect round-trip.

Path points at the offending request location using JSON-style segments — e.g. ["Aggregations", "0", "Field"] addresses the Field of the first aggregation.

Proposed is a ranked list of candidate values. Empty when no concrete proposal applies (e.g. ATTR_PERCENTILE has no streamable peer); the caller should treat empty Proposed as advisory.

Confidence is a static heuristic in [0, 1]:

  • 0.9 for high-certainty single-candidate swaps and Levenshtein distance 1.
  • 0.8 for streamability substitutes.
  • 0.7 for Levenshtein distance 2.
  • 0.6 for multi-candidate type-class swaps.
  • 0.5 for missing-param fallbacks that hand the user a list to pick from.
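For the field-name typo case, the distance-to-confidence mapping could look like the sketch below. It covers only the two Levenshtein tiers named above; the function names are hypothetical, and returning no suggestion beyond distance 2 is an assumption.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance between a and b over runes,
// keeping only two rows of the dynamic-programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// typoConfidence maps an edit distance to the documented confidence.
// The second result is false when no tier applies (assumed cutoff).
func typoConfidence(distance int) (float64, bool) {
	switch distance {
	case 1:
		return 0.9, true
	case 2:
		return 0.7, true
	}
	return 0, false
}

func main() {
	for _, typo := range []string{"revenu", "rvenu"} {
		d := levenshtein(typo, "revenue")
		c, ok := typoConfidence(d)
		fmt.Println(typo, d, c, ok)
	}
}
```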

type TestMeta added in v0.5.0

type TestMeta struct {
	// Name is the canonical entry identifier. For tier-1 this is the
	// TestType string (e.g. "TEST_ANOVA_WELCH"). For tier-2 this is the
	// TestType plus the variant suffix
	// (e.g. "TEST_ANOVA_WELCH/welch_one_way_post").
	Name string `json:"name"`

	// Family is the canonical TestType the variant belongs to
	// (e.g. "TEST_PEARSON_R"). Tier-1 entries always have
	// Family == Name. Tier-2 entries set Family to the underlying
	// TestType so clients can pair siblings.
	Family string `json:"family"`

	// Tier is 1 for row tests in Request.Tests, 2 for post tests in
	// Request.PostTests.
	Tier int `json:"tier"`

	// Variant is the algorithm flavour for tier-2 entries
	// (e.g. "welch_one_way_post"). Empty for tier-1 entries.
	Variant string `json:"variant,omitempty"`

	// Description is a one-sentence prose summary.
	Description string `json:"description"`

	// Streamable mirrors types.TestType.Streamable() for tier-1 entries.
	// Always false for tier-2 entries.
	Streamable bool `json:"streamable"`

	// Params lists the operator-specific parameters (alpha,
	// success_value, etc.).
	Params []Param `json:"params"`

	// Requires lists the top-level Test fields that must be set for the
	// test to run (e.g. "Field", "Field2", "SplitBy", "Rows", "Cols").
	// Drives request authoring directly.
	Requires []string `json:"requires,omitempty"`
}

TestMeta describes a statistical test entry in the manifest. Tier-1 and tier-2 tests share this shape; they live in separate top-level slices (Manifest.Tests for tier-1, Manifest.PostTests for tier-2). The Family field ties variants of the same underlying test together across tiers so clients can filter by family.
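Pairing tier-1 entries with their tier-2 variants by Family might look like the following sketch. It uses a local mirror of four TestMeta fields and the example names from the field comments; byFamily and sampleTests are hypothetical helpers.

```go
package main

import "fmt"

// testMeta mirrors the subset of TestMeta fields used for grouping.
type testMeta struct {
	Name, Family string
	Tier         int
	Variant      string
}

// byFamily groups tier-1 and tier-2 entries under their shared Family.
func byFamily(tests, postTests []testMeta) map[string][]testMeta {
	out := make(map[string][]testMeta)
	for _, t := range tests {
		out[t.Family] = append(out[t.Family], t)
	}
	for _, t := range postTests {
		out[t.Family] = append(out[t.Family], t)
	}
	return out
}

// sampleTests returns one tier-1 entry and one tier-2 variant of it.
func sampleTests() (tests, postTests []testMeta) {
	tests = []testMeta{{Name: "TEST_ANOVA_WELCH", Family: "TEST_ANOVA_WELCH", Tier: 1}}
	postTests = []testMeta{{
		Name:    "TEST_ANOVA_WELCH/welch_one_way_post",
		Family:  "TEST_ANOVA_WELCH",
		Tier:    2,
		Variant: "welch_one_way_post",
	}}
	return tests, postTests
}

func main() {
	tests, postTests := sampleTests()
	for _, t := range byFamily(tests, postTests)["TEST_ANOVA_WELCH"] {
		fmt.Println(t.Tier, t.Name)
	}
}
```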
