Documentation
Index
- Constants
- func AnalyzeFields(records []any) map[string]*FieldStats
- func InferGreedy(records []any, config ProjectConfig) *api.Topology
- func Project(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology
- func ProjectAST(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology
- func WalkFieldPaths(v any) []string
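- type Attribute
- func DecideScaling(stats map[string]*FieldStats, totalRecords int) []Attribute
- type AttributeKind
- type Concept
- func NextClosure(ctx *FormalContext) []Concept
- type FieldStats
- type FormalContext
- func BuildContext(records []any, attrs []Attribute) *FormalContext
- func BuildContextFromRecords(records []any) *FormalContext
- func NewFormalContext(objectCount int, attrNames []string, incidence [][]bool) *FormalContext
- func (ctx *FormalContext) AttrDeriv(attrs *roaring.Bitmap) *roaring.Bitmap
- func (ctx *FormalContext) Closure(attrs *roaring.Bitmap) *roaring.Bitmap
- func (ctx *FormalContext) ObjectDeriv(objs *roaring.Bitmap) *roaring.Bitmap
- type InferConfig
- func DefaultInferConfig() InferConfig
- type Inferrer
- func (inf *Inferrer) InferFromRecords(records []any) (*api.Topology, error)
- func (inf *Inferrer) InferFromSQLite(dbPath string) (*api.Topology, error)
- func (inf *Inferrer) InferFromSQLiteJSON(dbPath string) (*api.Topology, error)
- func (inf *Inferrer) InferFromTreeSitter(root *sitter.Node) (*api.Topology, error)
- func (inf *Inferrer) InferMultiLanguage(recordsByLang map[string][]any) (*api.Topology, error)
- type ProjectConfig
- func DefaultProjectConfig() ProjectConfig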
Constants
const MaxConcepts = 10000
MaxConcepts is the safety cap on concept enumeration. If the lattice has more concepts than this, enumeration stops early.
Functions
func AnalyzeFields
func AnalyzeFields(records []any) map[string]*FieldStats
AnalyzeFields examines every sampled record and returns per-field statistics keyed by field path.
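A minimal sketch of the kind of statistics AnalyzeFields gathers, assuming records are decoded JSON values (map[string]any, []any, scalars), leaf values are compared by their fmt.Sprint form, arrays are transparent (no index segment in the path), and "ISO date pattern" means a YYYY-MM-DD prefix. The names analyzeFields, walk, isoDate and the local fieldStats type are hypothetical mirrors, not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldStats mirrors the documented FieldStats fields (hypothetical local copy).
type fieldStats struct {
	Count       int
	Cardinality int
	IsDate      bool
	Values      map[string]int
}

// isoDate is an assumed definition of "ISO date pattern": a YYYY-MM-DD prefix.
var isoDate = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}`)

// walk calls fn for every leaf under v with its dot-notation path.
// Array elements share the parent's path (an assumption).
func walk(prefix string, v any, fn func(path string, leaf any)) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			p := k
			if prefix != "" {
				p = prefix + "." + k
			}
			walk(p, child, fn)
		}
	case []any:
		for _, child := range t {
			walk(prefix, child, fn)
		}
	default:
		fn(prefix, t)
	}
}

// analyzeFields counts, per field path, how many records contain it and the
// distribution of its stringified values.
func analyzeFields(records []any) map[string]*fieldStats {
	stats := map[string]*fieldStats{}
	for _, rec := range records {
		seen := map[string]bool{} // count each path at most once per record
		walk("", rec, func(path string, leaf any) {
			s, ok := stats[path]
			if !ok {
				s = &fieldStats{IsDate: true, Values: map[string]int{}}
				stats[path] = s
			}
			if !seen[path] {
				seen[path] = true
				s.Count++
			}
			val := fmt.Sprint(leaf)
			s.Values[val]++
			if !isoDate.MatchString(val) {
				s.IsDate = false // one non-date value disqualifies the field
			}
		})
	}
	for _, s := range stats {
		s.Cardinality = len(s.Values)
	}
	return stats
}
```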
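func InferGreedy (added in v0.2.0)
func InferGreedy(records []any, config ProjectConfig) *api.Topology
InferGreedy performs schema inference directly on records using a greedy, entropy-based partitioning algorithm. It is the alternative to the FCA pipeline, selected by InferConfig.Method = "greedy".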
func Project
func Project(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology
Project walks the concept lattice and emits an api.Topology.
Projection rules:
- Universal attributes (top concept intent) identify fields present in ALL records.
- Identifier field = highest-cardinality universal string field → directory name template.
- Shard levels = date-scaled attributes with 2-100 distinct groups → directory levels.
- Leaf files = remaining universal scalar fields → file content templates.
- raw.json is always included.
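The identifier and shard-level rules above can be sketched over per-field statistics. This assumes "universal" means the field's Count equals the record count, that a string/non-string flag is available (the documented FieldStats has none, so IsString here is hypothetical), and that date values are grouped by year, following the "year=2024" example elsewhere in this package. All names are illustrative.

```go
package main

import "sort"

// fieldInfo is a hypothetical, simplified view of one field's statistics.
type fieldInfo struct {
	Count       int            // records containing the field
	Cardinality int            // distinct values
	IsString    bool           // assumption: type info is available
	IsDate      bool           // values look like ISO dates
	Values      map[string]int // distinct value -> count
}

// pickIdentifier returns the universal string field with the highest
// cardinality (ties go to the first name in sorted order).
func pickIdentifier(fields map[string]fieldInfo, total int) string {
	best, bestCard := "", -1
	for _, name := range sortedKeys(fields) {
		f := fields[name]
		if f.Count == total && f.IsString && f.Cardinality > bestCard {
			best, bestCard = name, f.Cardinality
		}
	}
	return best
}

// pickShardLevels returns date fields whose values fall into 2-100 distinct
// years. Year granularity is an assumption.
func pickShardLevels(fields map[string]fieldInfo) []string {
	var levels []string
	for _, name := range sortedKeys(fields) {
		f := fields[name]
		if !f.IsDate {
			continue
		}
		years := map[string]bool{}
		for v := range f.Values {
			if len(v) >= 4 {
				years[v[:4]] = true
			}
		}
		if n := len(years); n >= 2 && n <= 100 {
			levels = append(levels, name)
		}
	}
	return levels
}

func sortedKeys(m map[string]fieldInfo) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}
```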
func ProjectAST (added in v0.1.1)
func ProjectAST(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology
ProjectAST converts formal concepts computed over a flattened AST into a recursive schema.
Strategy:
1. Identify "container" types: node types that have a 'name' (identifier) and a 'body' (block), e.g. function_definition, class_definition.
2. Create a schema node for each container type.
3. Make the schema recursive: every container can contain every other container type. This covers classes inside functions, functions inside classes, etc.
4. By default, map "source" to the content of the node.
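Steps 1 and 3 can be sketched over a flattened AST in which each node is reduced to its type and the names of its child fields. The flatNode type and both function names are hypothetical; only the 'name' and 'body' field names come from the strategy above. In this sketch every container lists every container type (itself included) as a permitted child.

```go
package main

import "sort"

// flatNode is a hypothetical flattened AST node: its type and child field names.
type flatNode struct {
	Type   string
	Fields []string
}

// containerTypes returns, sorted, the node types that have both a "name" and
// a "body" field in at least one occurrence.
func containerTypes(nodes []flatNode) []string {
	set := map[string]bool{}
	for _, n := range nodes {
		hasName, hasBody := false, false
		for _, f := range n.Fields {
			switch f {
			case "name":
				hasName = true
			case "body":
				hasBody = true
			}
		}
		if hasName && hasBody {
			set[n.Type] = true
		}
	}
	out := make([]string, 0, len(set))
	for t := range set {
		out = append(out, t)
	}
	sort.Strings(out)
	return out
}

// recursiveChildren makes the schema recursive: each container may contain
// every container type, covering classes in functions and vice versa.
func recursiveChildren(containers []string) map[string][]string {
	children := make(map[string][]string, len(containers))
	for _, c := range containers {
		children[c] = append([]string(nil), containers...)
	}
	return children
}
```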
func WalkFieldPaths
func WalkFieldPaths(v any) []string
WalkFieldPaths extracts all leaf field paths from a JSON-like value. Returns sorted, unique paths using dot notation (e.g., "item.cve.id").
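A minimal sketch of leaf-path extraction over values decoded by encoding/json into any. How the real function treats arrays is not documented; here array elements are assumed transparent (no index segment), so repeated objects inside an array collapse to one path. The name walkFieldPaths is hypothetical.

```go
package main

import "sort"

// walkFieldPaths returns the sorted, unique dot-notation paths of all leaves.
func walkFieldPaths(v any) []string {
	set := map[string]bool{}
	var walk func(prefix string, v any)
	walk = func(prefix string, v any) {
		switch t := v.(type) {
		case map[string]any:
			for k, child := range t {
				p := k
				if prefix != "" {
					p = prefix + "." + k
				}
				walk(p, child)
			}
		case []any:
			for _, child := range t {
				walk(prefix, child) // assumption: no index in the path
			}
		default:
			if prefix != "" {
				set[prefix] = true
			}
		}
	}
	walk("", v)
	paths := make([]string, 0, len(set))
	for p := range set {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}
```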
Types
type Attribute
type Attribute struct {
Name string // e.g., "item.cve.id" or "item.published.year=2024"
Kind AttributeKind // Presence or ScaledValue
Field string // original field path (for ScaledValue, the source field)
}
Attribute is a named binary property in the formal context.
func DecideScaling
func DecideScaling(stats map[string]*FieldStats, totalRecords int) []Attribute
DecideScaling determines, from field statistics, which Presence and ScaledValue attributes to create.
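The scaling policy itself is not documented. A sketch of one plausible policy, assuming one Presence attribute per field and, for date fields, one ScaledValue attribute per distinct year named "field.year=YYYY" (following the "item.published.year=2024" example on Attribute). The lowercase types are hypothetical local mirrors, and totalRecords is accepted but unused in this sketch.

```go
package main

import "sort"

// Hypothetical local mirrors of the documented types.
type attributeKind int

const (
	presence    attributeKind = iota // the field path exists in the record
	scaledValue                      // the attribute stands for one specific value
)

type attribute struct {
	Name  string
	Kind  attributeKind
	Field string
}

type fieldStats struct {
	Count       int
	Cardinality int
	IsDate      bool
	Values      map[string]int
}

// decideScaling emits a presence attribute per field and, for date fields,
// one scaled attribute per distinct year. Output is sorted for determinism.
func decideScaling(stats map[string]*fieldStats, totalRecords int) []attribute {
	fields := make([]string, 0, len(stats))
	for f := range stats {
		fields = append(fields, f)
	}
	sort.Strings(fields)

	var attrs []attribute
	for _, f := range fields {
		s := stats[f]
		attrs = append(attrs, attribute{Name: f, Kind: presence, Field: f})
		if !s.IsDate {
			continue
		}
		yearSet := map[string]bool{}
		for v := range s.Values {
			if len(v) >= 4 {
				yearSet[v[:4]] = true
			}
		}
		years := make([]string, 0, len(yearSet))
		for y := range yearSet {
			years = append(years, y)
		}
		sort.Strings(years)
		for _, y := range years {
			attrs = append(attrs, attribute{Name: f + ".year=" + y, Kind: scaledValue, Field: f})
		}
	}
	return attrs
}
```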
type AttributeKind
type AttributeKind int
AttributeKind classifies how a JSON field is converted to binary attributes.
const (
// Presence means the field path exists in the record.
Presence AttributeKind = iota
// ScaledValue means the attribute represents a specific value (e.g., year=2024).
ScaledValue
)
type Concept
type Concept struct {
Extent *roaring.Bitmap // object indices
Intent *roaring.Bitmap // attribute indices
}
Concept is a maximal rectangle in the incidence table: a pair (Extent, Intent) where Extent' = Intent and Intent' = Extent.
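func NextClosure
func NextClosure(ctx *FormalContext) []Concept
NextClosure enumerates all formal concepts using Ganter's NextClosure algorithm. Concepts are produced in lectic order of their intents. The algorithm is output-polynomial: each concept costs O(|G| × |M|²) time, for O(|concepts| × |G| × |M|²) overall, where G is the set of objects and M the set of attributes.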
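A compact sketch of Ganter's NextClosure on a toy context, using uint64 bitsets in place of roaring bitmaps (so at most 64 objects and 64 attributes). Bit i stands for attribute number i, and lower bits come first in the lectic order. The context and concept types and all function names are hypothetical.

```go
package main

// context is a toy formal context; cols[m] is the bitset of objects having
// attribute m (column-major, like FormalContext).
type context struct {
	nObj, nAttr int
	cols        []uint64
}

type concept struct{ Extent, Intent uint64 }

func allOnes(n int) uint64 { return uint64(1)<<n - 1 } // wraps to all ones at n = 64

// extent computes B': objects having every attribute in b.
func (c context) extent(b uint64) uint64 {
	ext := allOnes(c.nObj)
	for m := 0; m < c.nAttr; m++ {
		if b&(1<<m) != 0 {
			ext &= c.cols[m]
		}
	}
	return ext
}

// intent computes A': attributes shared by every object in a.
func (c context) intent(a uint64) uint64 {
	var in uint64
	for m := 0; m < c.nAttr; m++ {
		if a&^c.cols[m] == 0 { // no object in a lacks attribute m
			in |= 1 << m
		}
	}
	return in
}

// nextClosure lists all concepts in lectic order of their intents.
func nextClosure(c context) []concept {
	full := allOnes(c.nAttr)
	closure := func(b uint64) uint64 { return c.intent(c.extent(b)) }
	a := closure(0) // the smallest intent
	out := []concept{{c.extent(a), a}}
	for a != full {
		found := false
		// Try attributes from the largest down; accept the first candidate whose
		// closure adds nothing below the attribute just added.
		for i := c.nAttr - 1; i >= 0; i-- {
			bit := uint64(1) << i
			if a&bit != 0 {
				continue
			}
			low := bit - 1
			cand := closure((a & low) | bit)
			if cand&low == a&low {
				a, found = cand, true
				break
			}
		}
		if !found {
			break
		}
		out = append(out, concept{c.extent(a), a})
	}
	return out
}
```

Each step performs up to |M| closures, each touching O(|G| × |M|) cells, which is where the per-concept cost above comes from. A cap such as MaxConcepts would amount to a length check inside the outer loop; the documentation does not say where the package applies it.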
type FieldStats
type FieldStats struct {
Count int // how many records have this field
Cardinality int // number of distinct values
IsDate bool // whether values match ISO date pattern
Values map[string]int // distinct value → count
}
FieldStats holds statistics about a single field across all sampled records.
type FormalContext
type FormalContext struct {
ObjectCount int
Attributes []Attribute
Stats map[string]*FieldStats
// contains filtered or unexported fields
}
FormalContext is a bitmap-based incidence table for Formal Concept Analysis. Column-major storage: each attribute has a bitmap of which objects possess it.
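A small sketch of why column-major storage suits the derivation operators, using one uint64 per attribute in place of a roaring bitmap (so at most 64 objects). It assumes incidence[g][m] means "object g has attribute m"; the toyContext type and its methods are hypothetical. B' becomes an intersection of columns, and A' a subset test against each column.

```go
package main

// toyContext stores one column per attribute: bit g set means object g has it.
type toyContext struct {
	objectCount int
	cols        []uint64
}

// newToyContext transposes a row-major incidence table into columns.
func newToyContext(objectCount int, incidence [][]bool) toyContext {
	attrCount := 0
	if objectCount > 0 {
		attrCount = len(incidence[0])
	}
	cols := make([]uint64, attrCount)
	for g := 0; g < objectCount; g++ {
		for m, has := range incidence[g] {
			if has {
				cols[m] |= 1 << g
			}
		}
	}
	return toyContext{objectCount, cols}
}

// attrDeriv computes B' by intersecting the columns of the attributes in B.
func (c toyContext) attrDeriv(attrs []int) uint64 {
	ext := uint64(1)<<c.objectCount - 1 // all objects; wraps to all ones at 64
	for _, m := range attrs {
		ext &= c.cols[m]
	}
	return ext
}

// objectDeriv computes A': attributes whose column contains every object in A.
func (c toyContext) objectDeriv(objs uint64) []int {
	var out []int
	for m, col := range c.cols {
		if objs&^col == 0 {
			out = append(out, m)
		}
	}
	return out
}
```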
func BuildContext
func BuildContext(records []any, attrs []Attribute) *FormalContext
BuildContext constructs a FormalContext from records using the given attributes.
func BuildContextFromRecords
func BuildContextFromRecords(records []any) *FormalContext
BuildContextFromRecords is a convenience function that analyzes the records' fields and builds a context from them.
func NewFormalContext
func NewFormalContext(objectCount int, attrNames []string, incidence [][]bool) *FormalContext
NewFormalContext creates a FormalContext from a pre-built incidence table. Used for unit tests with known cross-tables.
func (*FormalContext) AttrDeriv
func (ctx *FormalContext) AttrDeriv(attrs *roaring.Bitmap) *roaring.Bitmap
AttrDeriv computes B', the set of objects that have ALL attributes in B (attrs).
func (*FormalContext) Closure
func (ctx *FormalContext) Closure(attrs *roaring.Bitmap) *roaring.Bitmap
Closure computes B'' = (B')', the smallest intent containing B (attrs).
func (*FormalContext) ObjectDeriv
func (ctx *FormalContext) ObjectDeriv(objs *roaring.Bitmap) *roaring.Bitmap
ObjectDeriv computes A', the set of attributes common to ALL objects in A (objs).
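A worked example of the three operators on a hypothetical three-object context, showing that the closure can add attributes: {a} is not an intent, but its closure {a, b} is, and ({g0}, {a, b}) is a concept.

```latex
\begin{aligned}
&g_0 : \{a, b\}, \qquad g_1 : \{b, c\}, \qquad g_2 : \{b\} \\
&\{a\}' = \{g_0\}, \qquad \{g_0\}' = \{a, b\} \;\Rightarrow\; \{a\}'' = \{a, b\} \\
&\{b\}' = \{g_0, g_1, g_2\}, \qquad \{g_0, g_1, g_2\}' = \{b\} \;\Rightarrow\; \{b\}'' = \{b\}
\end{aligned}
```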
type InferConfig
type InferConfig struct {
SampleSize int // max records to sample (default 1000)
RootName string // root directory name (default "records")
Seed int64 // random seed for reservoir sampling (0 = deterministic)
Method string // "fca" (default) or "greedy"
MaxDepth int // max depth for greedy inference (default 5)
Hints map[string]string // user-provided type hints
Language string // language hint for generated nodes (e.g., "go", "terraform")
}
InferConfig controls the schema inference pipeline.
func DefaultInferConfig
func DefaultInferConfig() InferConfig
DefaultInferConfig returns the defaults documented on the InferConfig fields: SampleSize 1000, RootName "records", Method "fca", and MaxDepth 5.
type Inferrer
type Inferrer struct {
Config InferConfig
}
Inferrer orchestrates FCA-based schema inference.
func (*Inferrer) InferFromRecords
func (inf *Inferrer) InferFromRecords(records []any) (*api.Topology, error)
InferFromRecords infers a topology from pre-loaded records.
func (*Inferrer) InferFromSQLite
func (inf *Inferrer) InferFromSQLite(dbPath string) (*api.Topology, error)
InferFromSQLite infers a topology by streaming records from a SQLite database, using reservoir sampling to keep memory bounded.
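Reservoir sampling keeps a uniform random sample of at most k items from a stream of unknown length using O(k) memory. The sketch below is the classic Algorithm R, with rows supplied by a callback instead of a database cursor; the SQLite plumbing is omitted, and reservoirSample is a hypothetical name. The documentation does not say which reservoir variant the package uses, nor exactly how it treats Seed = 0; here the seed is simply passed to math/rand, so equal seeds give equal samples.

```go
package main

import "math/rand"

// reservoirSample returns up to k items drawn uniformly from the stream.
// next yields the next item, or false once the stream is exhausted.
func reservoirSample[T any](k int, seed int64, next func() (T, bool)) []T {
	rng := rand.New(rand.NewSource(seed))
	sample := make([]T, 0, k)
	for n := 0; ; n++ {
		item, ok := next()
		if !ok {
			break
		}
		if n < k {
			sample = append(sample, item) // fill the reservoir first
			continue
		}
		// Item n (0-based) replaces a random slot with probability k/(n+1).
		if j := rng.Int63n(int64(n + 1)); j < int64(k) {
			sample[j] = item
		}
	}
	return sample
}
```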
func (*Inferrer) InferFromSQLiteJSON
func (inf *Inferrer) InferFromSQLiteJSON(dbPath string) (*api.Topology, error)
InferFromSQLiteJSON is like InferFromSQLite but works with raw JSON strings. It is used when records need custom parsing.
func (*Inferrer) InferFromTreeSitter (added in v0.1.1)
func (inf *Inferrer) InferFromTreeSitter(root *sitter.Node) (*api.Topology, error)
InferFromTreeSitter infers a topology from a parsed Tree-sitter AST. It always uses the FCA path (ProjectAST): the greedy path generates JSONPath selectors, which are incompatible with tree-sitter ingestion, whereas ProjectAST generates proper S-expression selectors for tree-sitter queries.
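To illustrate the difference in selector style: where a JSONPath selector addresses a field by path, a tree-sitter query is an S-expression pattern over node types and field names. The sketch builds one pattern per container type, capturing its name and body fields. The capture naming scheme (type.name, type.body) and both function names are hypothetical, not the package's actual selector format.

```go
package main

import (
	"fmt"
	"strings"
)

// containerQuery builds a tree-sitter pattern matching nodes of nodeType and
// capturing their "name" and "body" fields plus the node itself.
func containerQuery(nodeType string) string {
	return fmt.Sprintf("(%s name: (_) @%s.name body: (_) @%s.body) @%s",
		nodeType, nodeType, nodeType, nodeType)
}

// combinedQuery joins one pattern per container type into one query source.
func combinedQuery(types []string) string {
	patterns := make([]string, len(types))
	for i, t := range types {
		patterns[i] = containerQuery(t)
	}
	return strings.Join(patterns, "\n")
}
```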
func (*Inferrer) InferMultiLanguage (added in v0.2.0)
func (inf *Inferrer) InferMultiLanguage(recordsByLang map[string][]any) (*api.Topology, error)
InferMultiLanguage infers a multi-language schema from per-language record sets. It creates a namespace node for each language and returns a unified topology.
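A sketch of the namespacing step, assuming each language has already produced its own schema subtree. The node type is a hypothetical stand-in for api.Topology nodes, and sorting the language names for deterministic output is an assumption rather than documented behavior.

```go
package main

import "sort"

// node is a hypothetical, simplified stand-in for a topology node.
type node struct {
	Name     string
	Children []*node
}

// mergeLanguages wraps each per-language schema in a namespace node named
// after the language and attaches all namespaces under a single root.
func mergeLanguages(rootName string, perLang map[string]*node) *node {
	langs := make([]string, 0, len(perLang))
	for lang := range perLang {
		langs = append(langs, lang)
	}
	sort.Strings(langs) // deterministic order regardless of map iteration
	root := &node{Name: rootName}
	for _, lang := range langs {
		ns := &node{Name: lang, Children: []*node{perLang[lang]}}
		root.Children = append(root.Children, ns)
	}
	return root
}
```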
type ProjectConfig
type ProjectConfig struct {
RootName string // directory name for the root node (default: "records")
MaxDepth int // maximum depth for recursive inference (default: 5)
Hints map[string]string // hints for attribute types ("id", "temporal", "reference")
Language string // language hint for generated nodes (e.g., "go", "terraform")
}
ProjectConfig controls how the lattice is projected into a topology.
func DefaultProjectConfig
func DefaultProjectConfig() ProjectConfig
DefaultProjectConfig returns the defaults documented on the ProjectConfig fields: RootName "records" and MaxDepth 5.