plan

package
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 2, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package plan provides the cost-based planner for the Cypher executor. This file implements selectivity estimation used by the scan-strategy selection rule (task-283).

When the underlying graph can rotate (new CSR snapshot published via generation.Publisher or a Tx commit that alters graph structure), wrap an IndexEstimator with a StatsManager. StatsManager provides generation-aware cache invalidation so that avg-out-degree estimates remain coherent with the live graph without requiring callers to manage the estimator's degree cache directly.

Index

Examples

Constants

This section is empty.

Variables

PerRowCost is the assumed relative cost per row for each scan type. These are heuristic weights; they should be tuned in production via benchmarks.

Functions

func CacheKey

func CacheKey(query string, paramTypes []string) [32]byte

CacheKey computes the SHA-256 cache key for a query string and a slice of parameter type names. The parameter names are sorted before hashing so the key is independent of the slice order.

The key layout is:

uint32 big-endian query length | query bytes | sorted param types (each
prefixed with uint32 big-endian length)

This makes collisions between a query with zero params and a query whose string happens to equal another's query+param encoding impossible.

Example

ExampleCacheKey computes a stable plan-cache key for a query and its parameter type names. The key is independent of parameter order.

package main

import (
	"fmt"

	"github.com/FlavioCFOliveira/GoGraph/cypher/plan"
)

func main() {
	a := plan.CacheKey("MATCH (n) RETURN n", []string{"Integer", "String"})
	b := plan.CacheKey("MATCH (n) RETURN n", []string{"String", "Integer"})

	// Same query and the same param types (in any order) hash identically.
	fmt.Println("order-independent:", a == b)

	c := plan.CacheKey("MATCH (n:Person) RETURN n", nil)
	fmt.Println("different query differs:", a != c)
}
Output:
order-independent: true
different query differs: true

Types

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

Cache is a thread-safe LRU plan cache.

Key: a [32]byte SHA-256 digest (see CacheKey). Value: the logical plan stored as any; the cache does not inspect values.

Metrics (hits, misses, evictions) are exposed as atomic counters readable without acquiring the main lock via Cache.Stats.

Cache is safe for concurrent use.

func NewCache

func NewCache(capacity int) *Cache

NewCache creates a Cache with the given capacity. If capacity < 1 it defaults to 1024.

func (*Cache) Clear

func (c *Cache) Clear()

Clear removes all entries and resets the hit/miss/eviction counters.

func (*Cache) Get

func (c *Cache) Get(key [32]byte) (any, bool)

Get looks up key in the cache. If found it moves the entry to the most-recently-used position and returns (value, true). Otherwise it returns (nil, false).

func (*Cache) Put

func (c *Cache) Put(key [32]byte, value any)

Put inserts or updates the entry for key with the given value. If inserting causes the cache to exceed its capacity, the least-recently-used entry is evicted.

func (*Cache) Stats

func (c *Cache) Stats() CacheStats

Stats returns a point-in-time snapshot of cache metrics.

type CacheStats

type CacheStats struct {
	// Hits is the total number of successful Get lookups since the cache was
	// created (or last [Cache.Clear]ed).
	Hits uint64
	// Misses is the total number of unsuccessful Get lookups.
	Misses uint64
	// Evictions is the total number of entries expelled by the LRU policy.
	Evictions uint64
	// Size is the current number of entries held in the cache.
	Size int
	// Cap is the maximum number of entries the cache can hold.
	Cap int
}

CacheStats holds a point-in-time snapshot of Cache metrics.

type Estimator

type Estimator interface {
	// LabelCount returns the number of nodes carrying the given label.
	LabelCount(label uint32) uint64
	// HashLookupCount returns the estimated number of nodes carrying
	// (property == value) for an exact-match hash index. Returns 0 when
	// no hash index is registered for the property.
	HashLookupCount(property uint32, value any) uint64
	// BTreeRangeCount returns the estimated number of nodes in the range
	// [lo, hi] for a B+ tree index. Returns 0 when no btree index is
	// registered for the property.
	BTreeRangeCount(property uint32, lo, hi string) uint64
	// AvgOutDegree returns the average out-degree of nodes carrying
	// srcLabel, following edges of relType to nodes carrying dstLabel.
	// Returns 1.0 when insufficient data is available.
	AvgOutDegree(srcLabel, relType, dstLabel uint32) float64
}

Estimator provides cardinality estimates for physical plan operators. All methods return uint64 row-count estimates; callers treat 0 as "unknown, assume 1".

type ExpandDecision

type ExpandDecision struct {
	Dir               ExpandDirection
	EstimatedFrontier float64 // estimated intermediate rows
}

ExpandDecision records the result of SelectExpandDirection.

func SelectExpandDirection

func SelectExpandDirection(srcLabelID, relTypeID, dstLabelID uint32, est Estimator) ExpandDecision

SelectExpandDirection chooses the cheaper traversal direction for an undirected (or undirected-by-preference) relationship. When the dst-side label is more selective than the src-side, expand IN; otherwise expand OUT.

srcLabelID and dstLabelID may be 0 (no label constraint). relTypeID is the interned relationship type ID; 0 means any type.

Example

ExampleSelectExpandDirection shows the planner choosing to expand from the more selective (smaller) side of an undirected relationship. Here the dst label has far fewer nodes, so the planner expands IN.

package main

import (
	"fmt"

	"github.com/FlavioCFOliveira/GoGraph/cypher/plan"
)

// fixedEstimator is a minimal Estimator returning canned cardinalities so the
// example output is deterministic.
type fixedEstimator struct {
	labelCounts map[uint32]uint64
}

func (e fixedEstimator) LabelCount(label uint32) uint64                { return e.labelCounts[label] }
func (e fixedEstimator) HashLookupCount(uint32, any) uint64            { return 0 }
func (e fixedEstimator) BTreeRangeCount(uint32, string, string) uint64 { return 0 }
func (e fixedEstimator) AvgOutDegree(uint32, uint32, uint32) float64   { return 4.0 }

func main() {
	const (
		srcLabel uint32 = 1
		relType  uint32 = 10
		dstLabel uint32 = 2
	)
	est := fixedEstimator{labelCounts: map[uint32]uint64{
		srcLabel: 10_000, // many source nodes
		dstLabel: 100,    // few destination nodes
	}}

	decision := plan.SelectExpandDirection(srcLabel, relType, dstLabel, est)

	dir := "OUT"
	if decision.Dir == plan.ExpandIn {
		dir = "IN"
	}
	fmt.Printf("direction=%s frontier=%.0f\n", dir, decision.EstimatedFrontier)
}
Output:
direction=IN frontier=400

type ExpandDirection

type ExpandDirection uint8

ExpandDirection is the chosen traversal direction.

const (
	ExpandOut ExpandDirection = iota // expand from src → OUT neighbours
	ExpandIn                         // expand from dst → IN neighbours (reverse scan)
)

Recognised ExpandDirection values. When the two frontier estimates are equal, ExpandOut is preferred (stable tie-break).

type IndexEntry

type IndexEntry struct {
	Name string
	Kind IndexKind
	// Subscriber is the raw subscriber; planner rules can type-assert to
	// obtain kind-specific query APIs (e.g. hash.Lookup, btree.Range).
	Subscriber index.Subscriber
}

IndexEntry describes one registered index from the planner's perspective.

type IndexEstimator

type IndexEstimator struct {
	// contains filtered or unexported fields
}

IndexEstimator implements Estimator by consulting the index.Manager, the label.Index, and an avg-out-degree cache computed from the CSR.

IndexEstimator is safe for concurrent use.

func NewIndexEstimator

func NewIndexEstimator(labelIdx *label.Index, mgr *index.Manager) *IndexEstimator

NewIndexEstimator returns an estimator backed by the given label index and secondary index manager. Both may be nil (returns a zero estimator that always returns 0/1.0).

func (*IndexEstimator) AvgOutDegree

func (e *IndexEstimator) AvgOutDegree(srcLabel, relType, dstLabel uint32) float64

AvgOutDegree returns the cached average out-degree for the triple (srcLabel, relType, dstLabel). Returns 1.0 when no cached value is available. The cache is populated externally via UpdateDegreeCache after a new CSR snapshot is built.

func (*IndexEstimator) BTreeRangeCount

func (e *IndexEstimator) BTreeRangeCount(_ uint32, _, _ string) uint64

BTreeRangeCount returns an estimate of nodes whose indexed property value falls within [lo, hi]. The estimate applies a fixed 30% selectivity over the per-subscriber distinct-value count:

ceil(distinctValues * 0.30)

When a subscriber implements btreeDistinctValuer (btree.Index[V]), that count is used. When no such subscriber exists, the fallback is LabelCount(0) / 10 (10% global selectivity).

lo and hi are accepted for interface conformance; future implementations may use them to compute a tighter histogram bucket.

func (*IndexEstimator) HashLookupCount

func (e *IndexEstimator) HashLookupCount(_ uint32, _ any) uint64

HashLookupCount returns an estimate of nodes where the indexed property equals value. The estimate is computed as:

totalNodes / max(1, distinctValues)

where totalNodes is the sum of all label-zero counts (i.e. the count stored under label 0 is treated as total-node sentinel; when not available the fallback is LabelCount(0)). When no subscriber implements the DistinctValues capability, 0 is returned.

The property parameter is accepted for interface conformance and future routing once indexes expose their property IDs; currently the estimator consults the first subscriber that provides a DistinctValues() uint64 capability.

func (*IndexEstimator) LabelCount

func (e *IndexEstimator) LabelCount(lbl uint32) uint64

LabelCount returns the number of nodes carrying the given label. Returns 0 when the label index is nil or the label is unknown.

func (*IndexEstimator) UpdateDegreeCache

func (e *IndexEstimator) UpdateDegreeCache(srcLabel, relType, dstLabel uint32, avg float64)

UpdateDegreeCache updates the avg-out-degree cache for the triple (srcLabel, relType, dstLabel). This is called by snapshot rotation hooks after a new CSR is built.

type IndexKind

type IndexKind uint8

IndexKind classifies the physical implementation of a registered index as seen by the planner.

const (
	IndexKindUnknown IndexKind = iota
	IndexKindLabel             // label bitmap — supports label existence checks
	IndexKindHash              // hash exact-match — supports (prop == value) predicates
	IndexKindBTree             // B+ tree range — supports (lo <= prop <= hi) predicates
)

Recognised IndexKind values. IndexKindUnknown is returned for any subscriber whose Kind() string is not in the set {"label","hash","btree"}.

type IndexRegistry

type IndexRegistry struct {
	// contains filtered or unexported fields
}

IndexRegistry is a per-query snapshot of the registered indexes. It is created once per plan cycle and held for the duration of planning; it must NOT be shared across concurrent plan cycles.

func NewIndexRegistry

func NewIndexRegistry(mgr *index.Manager) *IndexRegistry

NewIndexRegistry snapshots the current state of mgr into an IndexRegistry. If mgr is nil, an empty registry is returned.

func (*IndexRegistry) All

func (r *IndexRegistry) All() []IndexEntry

All returns all registered entries in unspecified order.

func (*IndexRegistry) ByKind

func (r *IndexRegistry) ByKind(k IndexKind) []IndexEntry

ByKind returns entries whose Kind matches k.

func (*IndexRegistry) HasBTree

func (r *IndexRegistry) HasBTree() bool

HasBTree reports whether at least one btree index is registered.

func (*IndexRegistry) HasHash

func (r *IndexRegistry) HasHash() bool

HasHash reports whether at least one hash index is registered.

func (*IndexRegistry) Lookup

func (r *IndexRegistry) Lookup(name string) (IndexEntry, bool)

Lookup returns the first entry whose Name matches name, and whether found.

type JoinNode

type JoinNode struct {
	NodeVar  string
	LabelID  uint32
	EqPropID uint32 // 0 = no eq predicate
}

JoinNode represents one pattern in a multi-pattern MATCH. The planner enumerates these into a left-deep Expand/Apply chain.

type LeftDeepPlan

type LeftDeepPlan struct {
	Order []JoinNode
	// Costs records the per-step estimated intermediate row count (for EXPLAIN).
	Costs []float64
}

LeftDeepPlan is the output of greedy join enumeration: an ordered list of JoinNodes from cheapest (leftmost) to most expensive, representing a left-deep join tree.

func EnumerateLeftDeep

func EnumerateLeftDeep(patterns []JoinNode, est Estimator, reg *IndexRegistry) LeftDeepPlan

EnumerateLeftDeep produces a greedy left-deep join order for patterns. The algorithm:

  1. For each JoinNode, compute its leaf scan cost via SelectScanStrategy.
  2. Choose the cheapest leaf as the first node.
  3. Iteratively extend: for each remaining node, estimate the intermediate cardinality as current_rows * selectivity(node) and pick the minimum.
  4. Repeat until all nodes are placed.

O(n²) for n patterns; sufficient for n ≤ 8 per CLAUDE.md.

type ScanDecision

type ScanDecision struct {
	Kind      ScanKind
	IndexName string  // non-empty for IndexSeek and IndexRangeScan
	Cost      float64 // estimated rows × per-row cost
}

ScanDecision records the outcome of SelectScanStrategy.

func SelectScanStrategy

func SelectScanStrategy(input ScanInput, est Estimator, reg *IndexRegistry) ScanDecision

SelectScanStrategy returns the cheapest scan strategy for the given input using est for cardinality and reg for available indexes. It never returns an error; when information is insufficient, it falls back to ScanKindAllNodes.

type ScanInput

type ScanInput struct {
	Label   string // may be empty (no label constraint)
	LabelID uint32
	// EqPropID is the property key for an equality predicate (prop == value).
	// Zero when no equality predicate is available.
	EqPropID uint32
	// RangePropID is the property key for a range predicate.
	// Zero when no range predicate is available.
	RangePropID uint32
}

ScanInput describes the predicate context for scan selection.

type ScanKind

type ScanKind uint8

ScanKind describes the chosen physical scan type.

const (
	ScanKindAllNodes       ScanKind = iota // AllNodesScan — no label or index
	ScanKindLabel                          // NodeByLabelScan — label bitmap
	ScanKindIndexSeek                      // NodeByIndexSeek — hash exact-match
	ScanKindIndexRangeScan                 // NodeByIndexRangeScan — btree range
)

Recognised ScanKind values ordered from most expensive (AllNodes) to cheapest (IndexSeek). The planner always selects the minimum-cost option given the available label constraint, equality predicate, and range predicate.

type StatsManager

type StatsManager struct {
	// contains filtered or unexported fields
}

StatsManager wraps an IndexEstimator with generation-aware cache invalidation. When the underlying graph is mutated (Tx commit) or a new CSR snapshot is published, the caller should call NotifyRotation. On the next planner call, the estimator's avg-out-degree cache is discarded and rebuilt lazily from updated statistics.

The recommended pattern is:

sm := plan.NewStatsManager(est, 100)
// … after Publisher.Publish or Tx commit:
sm.NotifyRotation()
// … after re-populating degree statistics:
sm.MarkSeen()

StatsManager is safe for concurrent use.

func NewStatsManager

func NewStatsManager(est *IndexEstimator, maxStaleCalls uint64) *StatsManager

NewStatsManager returns a StatsManager wrapping est.

maxStaleCalls: if non-zero, after this many planner calls with a stale cache the manager forces a synchronous cache reset (clearing IndexEstimator.degreeCache). Pass 0 to rely solely on NotifyRotation + MarkSeen for explicit refresh control.

func (*StatsManager) Estimator

func (sm *StatsManager) Estimator() *IndexEstimator

Estimator returns the wrapped IndexEstimator after checking whether the avg-out-degree cache is excessively stale. When the cache is stale and maxStaleCalls (set in NewStatsManager) is exceeded, the degree cache is cleared synchronously before the estimator is returned, and the generation counter is advanced to the current one.

Callers must call UpdateDegreeCache on the returned estimator and then MarkSeen to re-populate the avg-out-degree cache after a forced reset.

func (*StatsManager) IsFresh

func (sm *StatsManager) IsFresh() bool

IsFresh reports whether the stats cache is up-to-date with the latest generation.

func (*StatsManager) LastRefresh

func (sm *StatsManager) LastRefresh() time.Time

LastRefresh returns the time at which the cache was last refreshed, or the zero Time if it has never been refreshed.

func (*StatsManager) MarkSeen

func (sm *StatsManager) MarkSeen()

MarkSeen marks the current generation as observed, resetting the stale counter. Call this after the planner has re-populated the degree cache from updated graph statistics.

func (*StatsManager) NotifyRotation

func (sm *StatsManager) NotifyRotation()

NotifyRotation marks the statistics cache as stale. Call this after each Tx commit that modifies graph structure or after each CSR snapshot rotation via generation.Publisher.Publish.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL