Documentation
Index
- Constants
- func AnalyzeFields(records []any) map[string]*FieldStats
- func Project(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology
- func WalkFieldPaths(v any) []string
- type Attribute
- type AttributeKind
- type Concept
- type FieldStats
- type FormalContext
- type InferConfig
- type Inferrer
- type ProjectConfig
Constants
const MaxConcepts = 10000
MaxConcepts is the safety cap on concept enumeration. If the lattice has more concepts than this, enumeration stops early.
Variables
This section is empty.
Functions
func AnalyzeFields
func AnalyzeFields(records []any) map[string]*FieldStats
AnalyzeFields examines all sampled records and returns a FieldStats entry for each field, keyed by field path.
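A minimal sketch of how such statistics could be gathered, assuming records are decoded JSON (map[string]any, []any, scalars). The names analyzeFields, collectLeaves, fieldStats, and isoDate are hypothetical. Values are compared as fmt.Sprint strings, array elements are reported under their parent's path, and IsDate is approximated by a YYYY-MM-DD prefix match; none of these choices are confirmed by the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldStats is a hypothetical local mirror of the documented FieldStats.
type fieldStats struct {
	Count       int
	Cardinality int
	IsDate      bool
	Values      map[string]int
}

// isoDate is an assumed date test: values starting with YYYY-MM-DD.
var isoDate = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}`)

// collectLeaves records every scalar under its dot-separated path.
// Assumption: array elements are reported under the parent's path.
func collectLeaves(prefix string, v any, out map[string][]string) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			p := k
			if prefix != "" {
				p = prefix + "." + k
			}
			collectLeaves(p, child, out)
		}
	case []any:
		for _, child := range t {
			collectLeaves(prefix, child, out)
		}
	default:
		if prefix != "" {
			out[prefix] = append(out[prefix], fmt.Sprint(t))
		}
	}
}

// analyzeFields is a hypothetical stand-in for AnalyzeFields.
func analyzeFields(records []any) map[string]*fieldStats {
	stats := map[string]*fieldStats{}
	for _, rec := range records {
		leaves := map[string][]string{}
		collectLeaves("", rec, leaves)
		for path, vals := range leaves {
			s, ok := stats[path]
			if !ok {
				s = &fieldStats{IsDate: true, Values: map[string]int{}}
				stats[path] = s
			}
			s.Count++ // once per record, even if an array repeats the path
			for _, val := range vals {
				s.Values[val]++
				if !isoDate.MatchString(val) {
					s.IsDate = false // one non-date value disqualifies the field
				}
			}
		}
	}
	for _, s := range stats {
		s.Cardinality = len(s.Values)
	}
	return stats
}
```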
func Project
func Project(concepts []Concept, ctx *FormalContext, config ProjectConfig) *api.Topology
Project walks the concept lattice and emits an api.Topology.
Projection rules:
- Universal attributes (top concept intent) identify fields present in ALL records.
- Identifier field = highest-cardinality universal string field → directory name template.
- Shard levels = date-scaled attributes with 2-100 distinct groups → directory levels.
- Leaf files = remaining universal scalar fields → file content templates.
- raw.json is always included.
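The identifier, shard, and leaf rules above can be sketched as a selection pass over per-field summaries. This is an illustration under assumptions, not the package's Project: fieldInfo and pickLayout are hypothetical; "universal" is approximated as Count equal to the record count (the presence attributes in the top concept's intent); IsString and DateGroups are invented inputs, because FieldStats records neither value types nor date groupings; and cardinality ties go to the alphabetically first path.

```go
package main

import "sort"

// fieldInfo is a hypothetical per-field summary for this sketch.
type fieldInfo struct {
	Count       int
	Cardinality int
	IsString    bool // assumption: not part of the documented FieldStats
	DateGroups  int  // distinct date buckets (e.g. years), 0 if not a date
}

// pickLayout applies the projection rules to choose the identifier field,
// the shard levels, and the remaining universal leaf fields.
func pickLayout(stats map[string]fieldInfo, total int) (id string, shards, leaves []string) {
	// Sort paths so tie-breaking and output order are deterministic.
	paths := make([]string, 0, len(stats))
	for p := range stats {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	// Identifier: highest-cardinality universal string field.
	best := -1
	for _, p := range paths {
		s := stats[p]
		if s.Count == total && s.IsString && s.Cardinality > best {
			id, best = p, s.Cardinality
		}
	}
	for _, p := range paths {
		s := stats[p]
		switch {
		case p == id:
			// becomes the directory name template
		case s.DateGroups >= 2 && s.DateGroups <= 100:
			shards = append(shards, p) // one directory level per shard field
		case s.Count == total:
			leaves = append(leaves, p) // remaining universal fields
		}
	}
	return id, shards, leaves
}
```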
func WalkFieldPaths
func WalkFieldPaths(v any) []string
WalkFieldPaths extracts all leaf field paths from a JSON-like value and returns them sorted and deduplicated, in dot notation (e.g., "item.cve.id").
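A sketch of the documented contract (leaf paths only, dot notation, sorted, unique), assuming decoded JSON input. walkFieldPaths is a hypothetical name, and the handling of arrays (elements contribute their parent's path, with no index segment) is an assumption rather than documented behavior.

```go
package main

import "sort"

// walkFieldPaths returns the sorted, de-duplicated leaf paths of v.
// Assumption: array elements are reported under their parent's path.
func walkFieldPaths(v any) []string {
	seen := map[string]bool{}
	var walk func(prefix string, v any)
	walk = func(prefix string, v any) {
		switch t := v.(type) {
		case map[string]any:
			for k, child := range t {
				p := k
				if prefix != "" {
					p = prefix + "." + k
				}
				walk(p, child)
			}
		case []any:
			for _, child := range t {
				walk(prefix, child)
			}
		default:
			if prefix != "" {
				seen[prefix] = true // scalar leaf
			}
		}
	}
	walk("", v)

	paths := make([]string, 0, len(seen))
	for p := range seen {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}
```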
Types
type Attribute
type Attribute struct {
	Name  string        // e.g., "item.cve.id" or "item.published.year=2024"
	Kind  AttributeKind // Presence or ScaledValue
	Field string        // original field path (for ScaledValue, the source field)
}
Attribute is a named binary property in the formal context.
func DecideScaling
func DecideScaling(stats map[string]*FieldStats, totalRecords int) []Attribute
DecideScaling determines, from the per-field statistics and the total number of records, which attributes (Presence or ScaledValue) to create.
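One plausible scaling scheme, consistent with the attribute name "item.published.year=2024" shown for Attribute: a Presence attribute for every field, plus one ScaledValue attribute per distinct year of each date field. The lower-case types and decideScaling are hypothetical, the year rule (first four characters of each value) is an assumption, and totalRecords is left out because its role is not documented.

```go
package main

import "sort"

type attributeKind int

const (
	presence    attributeKind = iota // field path exists in the record
	scaledValue                      // attribute stands for one specific value
)

// attribute mirrors the documented Attribute fields (hypothetical copy).
type attribute struct {
	Name  string
	Kind  attributeKind
	Field string
}

// fieldStats keeps only what this sketch needs.
type fieldStats struct {
	IsDate bool
	Values map[string]int
}

// decideScaling emits presence attributes for all fields and, for date
// fields, one scaled "<field>.year=<YYYY>" attribute per distinct year.
func decideScaling(stats map[string]*fieldStats) []attribute {
	fields := make([]string, 0, len(stats))
	for f := range stats {
		fields = append(fields, f)
	}
	sort.Strings(fields)

	var attrs []attribute
	for _, f := range fields {
		s := stats[f]
		attrs = append(attrs, attribute{Name: f, Kind: presence, Field: f})
		if !s.IsDate {
			continue
		}
		years := map[string]bool{}
		for v := range s.Values {
			if len(v) >= 4 {
				years[v[:4]] = true // assumed: ISO dates start with the year
			}
		}
		sorted := make([]string, 0, len(years))
		for y := range years {
			sorted = append(sorted, y)
		}
		sort.Strings(sorted)
		for _, y := range sorted {
			attrs = append(attrs, attribute{Name: f + ".year=" + y, Kind: scaledValue, Field: f})
		}
	}
	return attrs
}
```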
type AttributeKind
type AttributeKind int
AttributeKind classifies how a JSON field is converted to binary attributes.
const (
	// Presence means the field path exists in the record.
	Presence AttributeKind = iota
	// ScaledValue means the attribute represents a specific value (e.g., year=2024).
	ScaledValue
)
type Concept
type Concept struct {
	Extent *roaring.Bitmap // object indices
	Intent *roaring.Bitmap // attribute indices
}
Concept is a maximal rectangle in the incidence table: a pair (Extent, Intent) where Extent' = Intent and Intent' = Extent.
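func NextClosure
func NextClosure(ctx *FormalContext) []Concept
NextClosure enumerates all formal concepts using Ganter's NextClosure algorithm. Concepts are produced in lectic order of their intents. The algorithm is output-polynomial: with G the set of objects and M the set of attributes, each concept requires at most |M| closure computations of O(|G| × |M|) each, for a total of O(|concepts| × |M|² × |G|).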
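A compact sketch of Ganter's algorithm, using uint64 masks in place of roaring bitmaps, so it handles at most 64 objects and 64 attributes. bitContext, concept, and nextClosure are hypothetical names, and the limit parameter only mimics a MaxConcepts-style cap. From the current intent A, it tries attributes from the highest index down and accepts the first closure ((A ∩ {0..i-1}) ∪ {i})'' that adds no attribute below i.

```go
package main

// bitContext is a hypothetical column-major cross-table:
// cols[m] is the mask of objects that have attribute m.
type bitContext struct {
	nObj int
	cols []uint64
}

// concept pairs an extent (object mask) with an intent (attribute mask).
type concept struct {
	Extent, Intent uint64
}

// attrDeriv returns B': the objects having every attribute in b.
func (c bitContext) attrDeriv(b uint64) uint64 {
	ext := uint64(1)<<uint(c.nObj) - 1 // all objects
	for m, col := range c.cols {
		if b&(uint64(1)<<uint(m)) != 0 {
			ext &= col
		}
	}
	return ext
}

// objDeriv returns A': the attributes shared by every object in a.
func (c bitContext) objDeriv(a uint64) uint64 {
	var intent uint64
	for m, col := range c.cols {
		if a&^col == 0 { // no object of a lacks attribute m
			intent |= uint64(1) << uint(m)
		}
	}
	return intent
}

func (c bitContext) closure(b uint64) uint64 { return c.objDeriv(c.attrDeriv(b)) }

// nextClosure lists concepts in lectic order of their intents,
// stopping after limit concepts.
func nextClosure(c bitContext, limit int) []concept {
	var out []concept
	a := c.closure(0) // intent of the top concept
	for len(out) < limit {
		out = append(out, concept{Extent: c.attrDeriv(a), Intent: a})
		found := false
		for i := len(c.cols) - 1; i >= 0; i-- {
			bit := uint64(1) << uint(i)
			if a&bit != 0 {
				continue
			}
			lower := bit - 1 // attributes 0..i-1
			b := c.closure(a&lower | bit)
			if b&lower == a&lower { // no new attribute below i
				a, found = b, true
				break
			}
		}
		if !found {
			break // a is the full attribute set: last concept
		}
	}
	return out
}
```

With three objects (object 0 has attributes 0 and 1, object 1 has 1 and 2, object 2 has only 1), this yields the intents {1}, {1,2}, {0,1}, and finally {0,1,2}.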
type FieldStats
type FieldStats struct {
	Count       int            // how many records have this field
	Cardinality int            // number of distinct values
	IsDate      bool           // whether values match ISO date pattern
	Values      map[string]int // distinct value → count
}
FieldStats holds statistics about a single field across all sampled records.
type FormalContext
type FormalContext struct {
	ObjectCount int
	Attributes  []Attribute
	Stats       map[string]*FieldStats
	// contains filtered or unexported fields
}
FormalContext is a bitmap-based incidence table for Formal Concept Analysis. Column-major storage: each attribute has a bitmap of which objects possess it.
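The column-major layout can be illustrated with plain uint64 masks standing in for roaring bitmaps (so at most 64 objects). buildColumns and attrDeriv are hypothetical helpers, and the incidence[g][m] orientation (row = object, column = attribute, rows of length nAttr) is an assumption about the cross-table passed to NewFormalContext. Storing one bitmap per attribute turns B' into a straight AND of the selected columns.

```go
package main

// buildColumns transposes a row-major cross-table into one object mask per
// attribute. Assumption: incidence[g][m] is true when object g has attribute m.
func buildColumns(incidence [][]bool, nAttr int) []uint64 {
	cols := make([]uint64, nAttr)
	for g, row := range incidence {
		for m, has := range row {
			if has {
				cols[m] |= uint64(1) << uint(g)
			}
		}
	}
	return cols
}

// attrDeriv returns B': the objects that have every attribute in attrs,
// computed by intersecting the attribute columns.
func attrDeriv(cols []uint64, nObj int, attrs []int) uint64 {
	ext := uint64(1)<<uint(nObj) - 1 // empty B: every object qualifies
	for _, m := range attrs {
		ext &= cols[m]
	}
	return ext
}
```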
func BuildContext
func BuildContext(records []any, attrs []Attribute) *FormalContext
BuildContext constructs a FormalContext from records using the given attributes.
func BuildContextFromRecords
func BuildContextFromRecords(records []any) *FormalContext
BuildContextFromRecords is a convenience function that analyzes the records' fields and builds a FormalContext from them in one step.
func NewFormalContext
func NewFormalContext(objectCount int, attrNames []string, incidence [][]bool) *FormalContext
NewFormalContext creates a FormalContext from a pre-built incidence table. Used for unit tests with known cross-tables.
func (*FormalContext) AttrDeriv
func (ctx *FormalContext) AttrDeriv(attrs *roaring.Bitmap) *roaring.Bitmap
AttrDeriv computes B' — the set of objects that have ALL attributes in B.
func (*FormalContext) Closure
func (ctx *FormalContext) Closure(attrs *roaring.Bitmap) *roaring.Bitmap
Closure computes B'' = (B')', the smallest intent (closed attribute set) that contains B.
func (*FormalContext) ObjectDeriv
func (ctx *FormalContext) ObjectDeriv(objs *roaring.Bitmap) *roaring.Bitmap
ObjectDeriv computes A' — the set of attributes common to ALL objects in A.
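The three derivation operators and the concept condition (A' = B and B' = A) fit in a few lines on column masks. This sketch uses uint64 masks in place of roaring bitmaps; bitCtx and its methods are hypothetical. ObjectDeriv keeps attribute m exactly when A contains no object outside column m, which is the test A &^ col == 0.

```go
package main

// bitCtx is a hypothetical column-major context: cols[m] is the object
// mask for attribute m (at most 64 objects and 64 attributes).
type bitCtx struct {
	nObj int
	cols []uint64
}

// attrDeriv computes B': objects having ALL attributes in b.
func (c bitCtx) attrDeriv(b uint64) uint64 {
	ext := uint64(1)<<uint(c.nObj) - 1
	for m, col := range c.cols {
		if b&(uint64(1)<<uint(m)) != 0 {
			ext &= col
		}
	}
	return ext
}

// objectDeriv computes A': attributes common to ALL objects in a.
func (c bitCtx) objectDeriv(a uint64) uint64 {
	var intent uint64
	for m, col := range c.cols {
		if a&^col == 0 {
			intent |= uint64(1) << uint(m)
		}
	}
	return intent
}

// closure computes B'' = (B')'.
func (c bitCtx) closure(b uint64) uint64 { return c.objectDeriv(c.attrDeriv(b)) }

// isConcept reports whether (extent, intent) is a formal concept.
func (c bitCtx) isConcept(extent, intent uint64) bool {
	return c.objectDeriv(extent) == intent && c.attrDeriv(intent) == extent
}
```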
type InferConfig
type InferConfig struct {
	SampleSize int    // max records to sample (default 1000)
	RootName   string // root directory name (default "records")
	Seed       int64  // random seed for reservoir sampling (0 = deterministic)
}
InferConfig controls the schema inference pipeline.
func DefaultInferConfig
func DefaultInferConfig() InferConfig
DefaultInferConfig returns an InferConfig populated with the documented defaults (SampleSize 1000, RootName "records").
type Inferrer
type Inferrer struct {
	Config InferConfig
}
Inferrer orchestrates FCA-based schema inference.
func (*Inferrer) InferFromRecords
InferFromRecords infers a topology from pre-loaded records.
func (*Inferrer) InferFromSQLite
InferFromSQLite infers a topology by streaming records from a SQLite database. Uses reservoir sampling to keep memory bounded.
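Reservoir sampling (Vitter's Algorithm R) keeps memory at O(k) while streaming an unknown number of rows: the n-th item replaces a random slot with probability k/n. The sketch is generic over the item type and reads from a hypothetical next callback rather than a SQLite cursor. reservoirSample is a hypothetical name, and seeding math/rand directly with the configured Seed (so any fixed seed, including 0, reproduces the same sample) is an assumption about how Seed is used.

```go
package main

import "math/rand"

// reservoirSample returns a uniform random sample of at most k items from
// the stream produced by next, using O(k) memory (Algorithm R).
func reservoirSample[T any](next func() (T, bool), k int, seed int64) []T {
	rng := rand.New(rand.NewSource(seed)) // fixed seed => reproducible sample
	sample := make([]T, 0, k)
	n := 0 // items seen so far
	for {
		item, ok := next()
		if !ok {
			return sample
		}
		n++
		if len(sample) < k {
			sample = append(sample, item) // fill the reservoir first
		} else if j := rng.Intn(n); j < k {
			sample[j] = item // keep item with probability k/n
		}
	}
}
```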
type ProjectConfig
type ProjectConfig struct {
	RootName string // directory name for the root node (default: "records")
}
ProjectConfig controls how the lattice is projected into a topology.
func DefaultProjectConfig
func DefaultProjectConfig() ProjectConfig
DefaultProjectConfig returns a ProjectConfig with the documented default RootName, "records".