lattice

package
v0.5.2 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 1, 2026 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Constants

const MaxConcepts = 10000

MaxConcepts is the safety cap on concept enumeration. If the lattice has more concepts than this, enumeration stops early.

Variables

This section is empty.

Functions

func AnalyzeFields

func AnalyzeFields(records []any) map[string]*FieldStats

AnalyzeFields examines all sampled records to gather field statistics.

func InferGreedy added in v0.2.0

func InferGreedy(records []any, config ProjectConfig) *api.Topology

InferGreedy performs schema inference using a greedy entropy-based partitioning algorithm.
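As a rough illustration of the entropy measure such a partitioner could rely on, the sketch below computes the Shannon entropy of a value histogram shaped like FieldStats.Values. The function name entropy is hypothetical, and how InferGreedy actually scores or selects fields is not stated in this documentation.

```go
package main

import (
	"fmt"
	"math"
)

// entropy returns the Shannon entropy, in bits, of a value -> count histogram.
// Hypothetical helper: it is not part of the lattice package API.
func entropy(counts map[string]int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	// Two equally likely values carry one bit of information.
	fmt.Println(entropy(map[string]int{"high": 2, "low": 2})) // 1
	// A constant field carries none, so splitting on it separates nothing.
	fmt.Println(entropy(map[string]int{"only": 4})) // 0
}
```

A greedy scheme would typically evaluate a score like this for each candidate field at every level and split on the preferred one, up to MaxDepth levels.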

func Project

func Project(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology

Project walks the concept lattice and emits an api.Topology.

Projection rules:

  1. Universal attributes (top concept intent) identify fields present in ALL records.
  2. Identifier field = highest-cardinality universal string field → directory name template.
  3. Shard levels = date-scaled attributes with 2-100 distinct groups → directory levels.
  4. Leaf files = remaining universal scalar fields → file content templates.
  5. raw.json is always included.
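Rules 2 and 3 above can be sketched as two small selection functions. This is a minimal sketch over a hypothetical field summary type, not the package's implementation: the names field, pickIdentifier, and shardLevels are invented for illustration, and treating the 2-100 range as inclusive is an assumption.

```go
package main

import "fmt"

// field is a hypothetical per-field summary used only for this sketch.
type field struct {
	Name        string
	Count       int  // records that contain the field
	Cardinality int  // distinct values
	IsString    bool // values are strings
	DateGroups  int  // distinct groups after date scaling, 0 if not a date
}

// pickIdentifier applies rule 2: among string fields present in all records,
// return the one with the most distinct values.
func pickIdentifier(fields []field, total int) (string, bool) {
	best, found := field{}, false
	for _, f := range fields {
		if f.Count != total || !f.IsString {
			continue
		}
		if !found || f.Cardinality > best.Cardinality {
			best, found = f, true
		}
	}
	return best.Name, found
}

// shardLevels applies rule 3: keep date-scaled fields with 2 to 100 groups
// (bounds assumed inclusive).
func shardLevels(fields []field) []string {
	var out []string
	for _, f := range fields {
		if f.DateGroups >= 2 && f.DateGroups <= 100 {
			out = append(out, f.Name)
		}
	}
	return out
}

func main() {
	fields := []field{
		{Name: "item.cve.id", Count: 1000, Cardinality: 1000, IsString: true},
		{Name: "item.severity", Count: 1000, Cardinality: 4, IsString: true},
		{Name: "item.published.year", Count: 1000, Cardinality: 12, DateGroups: 12},
	}
	id, _ := pickIdentifier(fields, 1000)
	fmt.Println(id)                  // item.cve.id
	fmt.Println(shardLevels(fields)) // [item.published.year]
}
```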

func ProjectAST added in v0.1.1

func ProjectAST(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology

ProjectAST converts Formal Concepts from a flattened AST into a recursive schema.

Strategy:

  1. Identify "Container" types: nodes that have a 'name' (identifier) and a 'body' (block), e.g. function_definition, class_definition.
  2. Create a schema node for each container type.
  3. Make the schema recursive: every container can contain every other container type. This covers classes inside functions, functions inside classes, etc.
  4. Default to mapping "source" -> content of the node.

func WalkFieldPaths

func WalkFieldPaths(v any) []string

WalkFieldPaths extracts all leaf field paths from a JSON-like value. Returns sorted, unique paths using dot notation (e.g., "item.cve.id").
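The traversal can be pictured with the self-contained sketch below. walkPaths is a hypothetical stand-in written for this example, not the package's code; in particular, letting array elements share their parent's path is an assumption about how arrays are handled.

```go
package main

import (
	"fmt"
	"sort"
)

// walkPaths collects the dot-notation path of every leaf in a decoded JSON
// value and returns the paths sorted and de-duplicated.
func walkPaths(v any) []string {
	seen := map[string]struct{}{}
	var visit func(prefix string, v any)
	visit = func(prefix string, v any) {
		switch t := v.(type) {
		case map[string]any:
			for k, child := range t {
				p := k
				if prefix != "" {
					p = prefix + "." + k
				}
				visit(p, child)
			}
		case []any:
			// Assumption: array elements reuse the path of the array itself.
			for _, child := range t {
				visit(prefix, child)
			}
		default:
			if prefix != "" {
				seen[prefix] = struct{}{}
			}
		}
	}
	visit("", v)

	paths := make([]string, 0, len(seen))
	for p := range seen {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}

func main() {
	record := map[string]any{
		"item": map[string]any{
			"cve":       map[string]any{"id": "CVE-2024-0001"},
			"published": "2024-01-02",
		},
	}
	fmt.Println(walkPaths(record)) // [item.cve.id item.published]
}
```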

Types

type Attribute

type Attribute struct {
	Name  string        // e.g., "item.cve.id" or "item.published.year=2024"
	Kind  AttributeKind // Presence or ScaledValue
	Field string        // original field path (for ScaledValue, the source field)
}

Attribute is a named binary property in the formal context.

func DecideScaling

func DecideScaling(stats map[string]*FieldStats, totalRecords int) []Attribute

DecideScaling determines which attributes to create from field statistics.

type AttributeKind

type AttributeKind int

AttributeKind classifies how a JSON field is converted to binary attributes.

const (
	// Presence means the field path exists in the record.
	Presence AttributeKind = iota
	// ScaledValue means the attribute represents a specific value (e.g., year=2024).
	ScaledValue
)

type Concept

type Concept struct {
	Extent *roaring.Bitmap // object indices
	Intent *roaring.Bitmap // attribute indices
}

Concept is a maximal rectangle in the incidence table: a pair (Extent, Intent) where Extent' = Intent and Intent' = Extent.

func NextClosure

func NextClosure(ctx *FormalContext) []Concept

NextClosure enumerates all formal concepts using Ganter's algorithm. Concepts are produced in lectic order of their intents. The running time is output-polynomial: O(|concepts| × |M| × |G|), where G is the set of objects and M is the set of attributes.
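For readers unfamiliar with Ganter's algorithm, here is a minimal sketch on a toy context that uses uint64 bitmasks in place of roaring bitmaps. The names closure and nextClosure and the limit parameter (which mimics a MaxConcepts-style cap) are assumptions made for this example; attribute 0 is treated as the smallest in the lectic order.

```go
package main

import "fmt"

// closure computes B'' where cols[m] is the bitmask of objects that have
// attribute m and objects is the number of objects in the context.
func closure(cols []uint64, objects int, attrs uint64) uint64 {
	extent := uint64(1)<<objects - 1 // start from all objects
	for m, col := range cols {
		if attrs&(uint64(1)<<m) != 0 {
			extent &= col
		}
	}
	var intent uint64
	for m, col := range cols {
		if extent&^col == 0 { // every object in extent has attribute m
			intent |= uint64(1) << m
		}
	}
	return intent
}

// nextClosure lists concept intents in lectic order, stopping at limit.
func nextClosure(cols []uint64, objects, limit int) []uint64 {
	n := len(cols)
	all := uint64(1)<<n - 1
	a := closure(cols, objects, 0)
	intents := []uint64{a}
	for a != all && len(intents) < limit {
		for i := n - 1; i >= 0; i-- {
			bit := uint64(1) << i
			if a&bit != 0 {
				a &^= bit // drop attribute i and try a smaller one
				continue
			}
			b := closure(cols, objects, a|bit)
			// Accept b only if it adds no attribute smaller than i.
			if (b&^a)&(bit-1) == 0 {
				a = b
				intents = append(intents, a)
				break
			}
		}
	}
	return intents
}

func main() {
	// Two objects, two attributes: attribute 0 on object 0, attribute 1 on object 1.
	for _, in := range nextClosure([]uint64{0b01, 0b10}, 2, 10000) {
		fmt.Printf("%02b\n", in) // prints 00, 10, 01, 11 on separate lines
	}
}
```

Each intent printed is a closed attribute set; its extent can be recovered by intersecting the columns of the attributes it contains.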

type FieldStats

type FieldStats struct {
	Count       int            // how many records have this field
	Cardinality int            // number of distinct values
	IsDate      bool           // whether values match ISO date pattern
	Values      map[string]int // distinct value → count
}

FieldStats holds statistics about a single field across all sampled records.
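To make the fields concrete, the sketch below gathers the same four statistics over the top-level keys of a few records. It is an illustration only: fieldStats mirrors the exported fields of FieldStats, while analyze, the isoDate pattern, and the rule that a field is a date only if every value matches are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldStats mirrors the exported fields of FieldStats for illustration.
type fieldStats struct {
	Count       int
	Cardinality int
	IsDate      bool
	Values      map[string]int
}

// isoDate is an assumed pattern; the package's actual date check may differ.
var isoDate = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}`)

// analyze gathers statistics over the top-level keys of each record.
func analyze(records []map[string]any) map[string]*fieldStats {
	out := map[string]*fieldStats{}
	for _, rec := range records {
		for name, v := range rec {
			s, ok := out[name]
			if !ok {
				s = &fieldStats{IsDate: true, Values: map[string]int{}}
				out[name] = s
			}
			text := fmt.Sprint(v)
			s.Count++
			s.Values[text]++
			// Assumption: one non-matching value disqualifies the field.
			if !isoDate.MatchString(text) {
				s.IsDate = false
			}
		}
	}
	for _, s := range out {
		s.Cardinality = len(s.Values)
	}
	return out
}

func main() {
	stats := analyze([]map[string]any{
		{"id": "a", "published": "2024-01-02"},
		{"id": "b", "published": "2024-03-04"},
		{"id": "c"},
	})
	id, pub := stats["id"], stats["published"]
	fmt.Println(id.Count, id.Cardinality, id.IsDate)    // 3 3 false
	fmt.Println(pub.Count, pub.Cardinality, pub.IsDate) // 2 2 true
}
```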

type FormalContext

type FormalContext struct {
	ObjectCount int
	Attributes  []Attribute

	Stats map[string]*FieldStats
	// contains filtered or unexported fields
}

FormalContext is a bitmap-based incidence table for Formal Concept Analysis. Column-major storage: each attribute has a bitmap of which objects possess it.

func BuildContext

func BuildContext(records []any, attrs []Attribute) *FormalContext

BuildContext constructs a FormalContext from records using the given attributes.

func BuildContextFromRecords

func BuildContextFromRecords(records []any) *FormalContext

BuildContextFromRecords is a convenience function that analyzes fields and then builds a context from the result.

func NewFormalContext

func NewFormalContext(objectCount int, attrNames []string, incidence [][]bool) *FormalContext

NewFormalContext creates a FormalContext from a pre-built incidence table. Used for unit tests with known cross-tables.

func (*FormalContext) AttrDeriv

func (ctx *FormalContext) AttrDeriv(attrs *roaring.Bitmap) *roaring.Bitmap

AttrDeriv computes B' — the set of objects that have ALL attributes in B.

func (*FormalContext) Closure

func (ctx *FormalContext) Closure(attrs *roaring.Bitmap) *roaring.Bitmap

Closure computes B'' = (B')'.

func (*FormalContext) ObjectDeriv

func (ctx *FormalContext) ObjectDeriv(objs *roaring.Bitmap) *roaring.Bitmap

ObjectDeriv computes A' — the set of attributes common to ALL objects in A.
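The three operators AttrDeriv, Closure, and ObjectDeriv fit together as in the toy model below. It keeps the column-major layout described for FormalContext but substitutes uint64 bitmasks for roaring bitmaps; the type context and its lower-case methods are hypothetical and exist only for this sketch.

```go
package main

import "fmt"

// context is a toy column-major incidence table: cols[m] is the bitmask of
// objects that have attribute m.
type context struct {
	objects int
	cols    []uint64
}

// attrDeriv returns B': the objects that have every attribute in attrs.
func (c context) attrDeriv(attrs uint64) uint64 {
	objs := uint64(1)<<c.objects - 1 // the empty attribute set matches all objects
	for m, col := range c.cols {
		if attrs&(uint64(1)<<m) != 0 {
			objs &= col
		}
	}
	return objs
}

// objectDeriv returns A': the attributes shared by every object in objs.
func (c context) objectDeriv(objs uint64) uint64 {
	var attrs uint64
	for m, col := range c.cols {
		if objs&^col == 0 { // no object in objs lacks attribute m
			attrs |= uint64(1) << m
		}
	}
	return attrs
}

// closure returns B'' = (B')'.
func (c context) closure(attrs uint64) uint64 {
	return c.objectDeriv(c.attrDeriv(attrs))
}

func main() {
	// Attribute 0 is held by objects {0,1,2}, attribute 1 by {0,1}, attribute 2 by {1}.
	c := context{objects: 3, cols: []uint64{0b111, 0b011, 0b010}}
	b := uint64(0b010) // B = {attribute 1}
	fmt.Printf("%03b\n", c.attrDeriv(b)) // 011: objects 0 and 1
	fmt.Printf("%03b\n", c.closure(b))   // 011: attributes 0 and 1
}
```

Here the pair (objects {0,1}, attributes {0,1}) is a concept in the sense defined for Concept, because each side is the derivation of the other.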

type InferConfig

type InferConfig struct {
	SampleSize int               // max records to sample (default 1000)
	RootName   string            // root directory name (default "records")
	Seed       int64             // random seed for reservoir sampling (0 = deterministic)
	Method     string            // "fca" (default) or "greedy"
	MaxDepth   int               // max depth for greedy inference (default 5)
	Hints      map[string]string // user-provided type hints
	Language   string            // language hint for generated nodes (e.g., "go", "terraform")
}

InferConfig controls the schema inference pipeline.

func DefaultInferConfig

func DefaultInferConfig() InferConfig

DefaultInferConfig returns an InferConfig populated with the defaults noted in the field comments above.

type Inferrer

type Inferrer struct {
	Config InferConfig
}

Inferrer orchestrates FCA-based schema inference.

func (*Inferrer) InferFromRecords

func (inf *Inferrer) InferFromRecords(records []any) (*api.Topology, error)

InferFromRecords infers a topology from pre-loaded records.

func (*Inferrer) InferFromSQLite

func (inf *Inferrer) InferFromSQLite(dbPath string) (*api.Topology, error)

InferFromSQLite infers a topology by streaming records from a SQLite database. Uses reservoir sampling to keep memory bounded.
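Reservoir sampling keeps a fixed-size uniform sample of a stream whose length is not known in advance. The sketch below shows the classic Algorithm R over a channel of integers; reservoirSample is a hypothetical name, and the package's own sampler, its record type, and its handling of Seed may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reservoirSample returns a uniform random sample of at most k items from
// stream while holding only k items in memory (Algorithm R).
func reservoirSample(stream <-chan int, k int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	sample := make([]int, 0, k)
	seen := 0
	for item := range stream {
		seen++
		if len(sample) < k {
			sample = append(sample, item) // fill the reservoir first
			continue
		}
		// Keep the new item with probability k/seen, evicting a random slot.
		if j := rng.Intn(seen); j < k {
			sample[j] = item
		}
	}
	return sample
}

func main() {
	stream := make(chan int)
	go func() {
		defer close(stream)
		for i := 0; i < 10000; i++ {
			stream <- i
		}
	}()
	sample := reservoirSample(stream, 5, 42)
	fmt.Println(len(sample)) // 5
}
```

Memory use depends on k (SampleSize in InferConfig terms), not on the number of rows streamed.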

func (*Inferrer) InferFromSQLiteJSON

func (inf *Inferrer) InferFromSQLiteJSON(dbPath string) (*api.Topology, error)

InferFromSQLiteJSON is like InferFromSQLite but works with raw JSON strings. Used when records need custom parsing.

func (*Inferrer) InferFromTreeSitter added in v0.1.1

func (inf *Inferrer) InferFromTreeSitter(root *sitter.Node) (*api.Topology, error)

InferFromTreeSitter infers a topology from a parsed Tree-sitter AST. It always uses the FCA path (ProjectAST) because the greedy path generates JSONPath selectors, which are incompatible with tree-sitter ingestion, whereas ProjectAST generates S-expression selectors suitable for tree-sitter queries.

func (*Inferrer) InferMultiLanguage added in v0.2.0

func (inf *Inferrer) InferMultiLanguage(recordsByLang map[string][]any) (*api.Topology, error)

InferMultiLanguage infers a multi-language schema from per-language record sets. Creates a namespace node for each language and returns a unified topology.

type ProjectConfig

type ProjectConfig struct {
	RootName string            // directory name for the root node (default: "records")
	MaxDepth int               // maximum depth for recursive inference (default: 5)
	Hints    map[string]string // hints for attribute types ("id", "temporal", "reference")
	Language string            // language hint for generated nodes (e.g., "go", "terraform")
}

ProjectConfig controls how the lattice is projected into a topology.

func DefaultProjectConfig

func DefaultProjectConfig() ProjectConfig

DefaultProjectConfig returns a ProjectConfig populated with the defaults noted in the field comments above.
