Documentation
¶
Overview ¶
Package plan provides the cost-based planner for the Cypher executor. This file implements selectivity estimation used by the scan-strategy selection rule (task-283).
When the underlying graph can rotate (new CSR snapshot published via generation.Publisher or a Tx commit that alters graph structure), wrap an IndexEstimator with a StatsManager. StatsManager provides generation-aware cache invalidation so that avg-out-degree estimates remain coherent with the live graph without requiring callers to manage the estimator's degree cache directly.
Index ¶
- Variables
- func CacheKey(query string, paramTypes []string) [32]byte
- type Cache
- type CacheStats
- type Estimator
- type ExpandDecision
- type ExpandDirection
- type IndexEntry
- type IndexEstimator
- func (e *IndexEstimator) AvgOutDegree(srcLabel, relType, dstLabel uint32) float64
- func (e *IndexEstimator) BTreeRangeCount(_ uint32, _, _ string) uint64
- func (e *IndexEstimator) HashLookupCount(_ uint32, _ any) uint64
- func (e *IndexEstimator) LabelCount(lbl uint32) uint64
- func (e *IndexEstimator) UpdateDegreeCache(srcLabel, relType, dstLabel uint32, avg float64)
- type IndexKind
- type IndexRegistry
- type JoinNode
- type LeftDeepPlan
- type ScanDecision
- type ScanInput
- type ScanKind
- type StatsManager
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var PerRowCost = map[ScanKind]float64{ ScanKindAllNodes: 1.0, ScanKindLabel: 0.5, ScanKindIndexSeek: 0.05, ScanKindIndexRangeScan: 0.1, }
PerRowCost is the assumed relative cost per row for each scan type. These are heuristic weights; they should be tuned in production via benchmarks.
Functions ¶
func CacheKey ¶
CacheKey computes the SHA-256 cache key for a query string and a slice of parameter type names. The parameter names are sorted before hashing so the key is independent of the slice order.
The key layout is:
uint32 big-endian query length | query bytes | sorted param types (each prefixed with uint32 big-endian length)
This makes collisions between a query with zero params and a query whose string happens to equal another's query+param encoding impossible.
Example ¶
ExampleCacheKey computes a stable plan-cache key for a query and its parameter type names. The key is independent of parameter order.
package main
import (
"fmt"
"github.com/FlavioCFOliveira/GoGraph/cypher/plan"
)
func main() {
a := plan.CacheKey("MATCH (n) RETURN n", []string{"Integer", "String"})
b := plan.CacheKey("MATCH (n) RETURN n", []string{"String", "Integer"})
// Same query and the same param types (in any order) hash identically.
fmt.Println("order-independent:", a == b)
c := plan.CacheKey("MATCH (n:Person) RETURN n", nil)
fmt.Println("different query differs:", a != c)
}
Output: order-independent: true different query differs: true
Types ¶
type Cache ¶
type Cache struct {
// contains filtered or unexported fields
}
Cache is a thread-safe LRU plan cache.
Key: a [32]byte SHA-256 digest (see CacheKey). Value: the logical plan stored as any; the cache does not inspect values.
Metrics (hits, misses, evictions) are exposed as atomic counters readable without acquiring the main lock via Cache.Stats.
Cache is safe for concurrent use.
func NewCache ¶
NewCache creates a Cache with the given capacity. If capacity < 1 it defaults to 1024.
func (*Cache) Clear ¶
func (c *Cache) Clear()
Clear removes all entries and resets the hit/miss/eviction counters.
func (*Cache) Get ¶
Get looks up key in the cache. If found it moves the entry to the most-recently-used position and returns (value, true). Otherwise it returns (nil, false).
func (*Cache) Put ¶
Put inserts or updates the entry for key with the given value. If inserting causes the cache to exceed its capacity, the least-recently-used entry is evicted.
func (*Cache) Stats ¶
func (c *Cache) Stats() CacheStats
Stats returns a point-in-time snapshot of cache metrics.
type CacheStats ¶
type CacheStats struct {
// Hits is the total number of successful Get lookups since the cache was
// created (or last [Cache.Clear]ed).
Hits uint64
// Misses is the total number of unsuccessful Get lookups.
Misses uint64
// Evictions is the total number of entries expelled by the LRU policy.
Evictions uint64
// Size is the current number of entries held in the cache.
Size int
// Cap is the maximum number of entries the cache can hold.
Cap int
}
CacheStats holds a point-in-time snapshot of Cache metrics.
type Estimator ¶
type Estimator interface {
// LabelCount returns the number of nodes carrying the given label.
LabelCount(label uint32) uint64
// HashLookupCount returns the estimated number of nodes carrying
// (property == value) for an exact-match hash index. Returns 0 when
// no hash index is registered for the property.
HashLookupCount(property uint32, value any) uint64
// BTreeRangeCount returns the estimated number of nodes in the range
// [lo, hi] for a B+ tree index. Returns 0 when no btree index is
// registered for the property.
BTreeRangeCount(property uint32, lo, hi string) uint64
// AvgOutDegree returns the average out-degree of nodes carrying
// srcLabel, following edges of relType to nodes carrying dstLabel.
// Returns 1.0 when insufficient data is available.
AvgOutDegree(srcLabel, relType, dstLabel uint32) float64
}
Estimator provides cardinality estimates for physical plan operators. All methods return uint64 row-count estimates; callers treat 0 as "unknown, assume 1".
type ExpandDecision ¶
type ExpandDecision struct {
Dir ExpandDirection
EstimatedFrontier float64 // estimated intermediate rows
}
ExpandDecision records the result of SelectExpandDirection.
func SelectExpandDirection ¶
func SelectExpandDirection(srcLabelID, relTypeID, dstLabelID uint32, est Estimator) ExpandDecision
SelectExpandDirection chooses the cheaper traversal direction for an undirected (or undirected-by-preference) relationship. When the dst-side label is more selective than the src-side, expand IN; otherwise expand OUT.
srcLabelID and dstLabelID may be 0 (no label constraint). relTypeID is the interned relationship type ID; 0 means any type.
Example ¶
ExampleSelectExpandDirection shows the planner choosing to expand from the more selective (smaller) side of an undirected relationship. Here the dst label has far fewer nodes, so the planner expands IN.
package main
import (
"fmt"
"github.com/FlavioCFOliveira/GoGraph/cypher/plan"
)
// fixedEstimator is a minimal Estimator returning canned cardinalities so the
// example output is deterministic.
type fixedEstimator struct {
labelCounts map[uint32]uint64
}
func (e fixedEstimator) LabelCount(label uint32) uint64 { return e.labelCounts[label] }
func (e fixedEstimator) HashLookupCount(uint32, any) uint64 { return 0 }
func (e fixedEstimator) BTreeRangeCount(uint32, string, string) uint64 { return 0 }
func (e fixedEstimator) AvgOutDegree(uint32, uint32, uint32) float64 { return 4.0 }
func main() {
const (
srcLabel uint32 = 1
relType uint32 = 10
dstLabel uint32 = 2
)
est := fixedEstimator{labelCounts: map[uint32]uint64{
srcLabel: 10_000, // many source nodes
dstLabel: 100, // few destination nodes
}}
decision := plan.SelectExpandDirection(srcLabel, relType, dstLabel, est)
dir := "OUT"
if decision.Dir == plan.ExpandIn {
dir = "IN"
}
fmt.Printf("direction=%s frontier=%.0f\n", dir, decision.EstimatedFrontier)
}
Output: direction=IN frontier=400
type ExpandDirection ¶
type ExpandDirection uint8
ExpandDirection is the chosen traversal direction.
const ( ExpandOut ExpandDirection = iota // expand from src → OUT neighbours ExpandIn // expand from dst → IN neighbours (reverse scan) )
Recognised ExpandDirection values. When the two frontier estimates are equal, ExpandOut is preferred (stable tie-break).
type IndexEntry ¶
type IndexEntry struct {
Name string
Kind IndexKind
// Subscriber is the raw subscriber; planner rules can type-assert to
// obtain kind-specific query APIs (e.g. hash.Lookup, btree.Range).
Subscriber index.Subscriber
}
IndexEntry describes one registered index from the planner's perspective.
type IndexEstimator ¶
type IndexEstimator struct {
// contains filtered or unexported fields
}
IndexEstimator implements Estimator by consulting the index.Manager, the label.Index, and an avg-out-degree cache computed from the CSR.
IndexEstimator is safe for concurrent use.
func NewIndexEstimator ¶
func NewIndexEstimator(labelIdx *label.Index, mgr *index.Manager) *IndexEstimator
NewIndexEstimator returns an estimator backed by the given label index and secondary index manager. Both may be nil (returns a zero estimator that always returns 0/1.0).
func (*IndexEstimator) AvgOutDegree ¶
func (e *IndexEstimator) AvgOutDegree(srcLabel, relType, dstLabel uint32) float64
AvgOutDegree returns the cached average out-degree for the triple (srcLabel, relType, dstLabel). Returns 1.0 when no cached value is available. The cache is populated externally via UpdateDegreeCache after a new CSR snapshot is built.
func (*IndexEstimator) BTreeRangeCount ¶
func (e *IndexEstimator) BTreeRangeCount(_ uint32, _, _ string) uint64
BTreeRangeCount returns an estimate of nodes whose indexed property value falls within [lo, hi]. The estimate applies a fixed 30% selectivity over the per-subscriber distinct-value count:
ceil(distinctValues * 0.30)
When a subscriber implements btreeDistinctValuer (btree.Index[V]), that count is used. When no such subscriber exists, the fallback is LabelCount(0) / 10 (10% global selectivity).
lo and hi are accepted for interface conformance; future implementations may use them to compute a tighter histogram bucket.
func (*IndexEstimator) HashLookupCount ¶
func (e *IndexEstimator) HashLookupCount(_ uint32, _ any) uint64
HashLookupCount returns an estimate of nodes where the indexed property equals value. The estimate is computed as:
totalNodes / max(1, distinctValues)
where totalNodes is the sum of all label-zero counts (i.e. the count stored under label 0 is treated as total-node sentinel; when not available the fallback is LabelCount(0)). When no subscriber implements the DistinctValues capability, 0 is returned.
The property parameter is accepted for interface conformance and future routing once indexes expose their property IDs; currently the estimator consults the first subscriber that provides a DistinctValues() uint64 capability.
func (*IndexEstimator) LabelCount ¶
func (e *IndexEstimator) LabelCount(lbl uint32) uint64
LabelCount returns the number of nodes carrying the given label. Returns 0 when the label index is nil or the label is unknown.
func (*IndexEstimator) UpdateDegreeCache ¶
func (e *IndexEstimator) UpdateDegreeCache(srcLabel, relType, dstLabel uint32, avg float64)
UpdateDegreeCache updates the avg-out-degree cache for the triple (srcLabel, relType, dstLabel). This is called by snapshot rotation hooks after a new CSR is built.
type IndexKind ¶
type IndexKind uint8
IndexKind classifies the physical implementation of a registered index as seen by the planner.
const ( IndexKindUnknown IndexKind = iota IndexKindLabel // label bitmap — supports label existence checks IndexKindHash // hash exact-match — supports (prop == value) predicates IndexKindBTree // B+ tree range — supports (lo <= prop <= hi) predicates )
Recognised IndexKind values. IndexKindUnknown is returned for any subscriber whose Kind() string is not in the set {"label","hash","btree"}.
type IndexRegistry ¶
type IndexRegistry struct {
// contains filtered or unexported fields
}
IndexRegistry is a per-query snapshot of the registered indexes. It is created once per plan cycle and held for the duration of planning; it must NOT be shared across concurrent plan cycles.
func NewIndexRegistry ¶
func NewIndexRegistry(mgr *index.Manager) *IndexRegistry
NewIndexRegistry snapshots the current state of mgr into an IndexRegistry. If mgr is nil, an empty registry is returned.
func (*IndexRegistry) All ¶
func (r *IndexRegistry) All() []IndexEntry
All returns all registered entries in unspecified order.
func (*IndexRegistry) ByKind ¶
func (r *IndexRegistry) ByKind(k IndexKind) []IndexEntry
ByKind returns entries whose Kind matches k.
func (*IndexRegistry) HasBTree ¶
func (r *IndexRegistry) HasBTree() bool
HasBTree reports whether at least one btree index is registered.
func (*IndexRegistry) HasHash ¶
func (r *IndexRegistry) HasHash() bool
HasHash reports whether at least one hash index is registered.
func (*IndexRegistry) Lookup ¶
func (r *IndexRegistry) Lookup(name string) (IndexEntry, bool)
Lookup returns the first entry whose Name matches name, and whether found.
type JoinNode ¶
JoinNode represents one pattern in a multi-pattern MATCH. The planner enumerates these into a left-deep Expand/Apply chain.
type LeftDeepPlan ¶
type LeftDeepPlan struct {
Order []JoinNode
// Costs records the per-step estimated intermediate row count (for EXPLAIN).
Costs []float64
}
LeftDeepPlan is the output of greedy join enumeration: an ordered list of JoinNodes from cheapest (leftmost) to most expensive, representing a left-deep join tree.
func EnumerateLeftDeep ¶
func EnumerateLeftDeep(patterns []JoinNode, est Estimator, reg *IndexRegistry) LeftDeepPlan
EnumerateLeftDeep produces a greedy left-deep join order for patterns. The algorithm:
- For each JoinNode, compute its leaf scan cost via SelectScanStrategy.
- Choose the cheapest leaf as the first node.
- Iteratively extend: for each remaining node, estimate the intermediate cardinality as current_rows * selectivity(node) and pick the minimum.
- Repeat until all nodes are placed.
O(n²) for n patterns; sufficient for n ≤ 8 per CLAUDE.md.
type ScanDecision ¶
type ScanDecision struct {
Kind ScanKind
IndexName string // non-empty for IndexSeek and IndexRangeScan
Cost float64 // estimated rows × per-row cost
}
ScanDecision records the outcome of SelectScanStrategy.
func SelectScanStrategy ¶
func SelectScanStrategy(input ScanInput, est Estimator, reg *IndexRegistry) ScanDecision
SelectScanStrategy returns the cheapest scan strategy for the given input using est for cardinality and reg for available indexes. It never returns an error; when information is insufficient, it falls back to ScanKindAllNodes.
type ScanInput ¶
type ScanInput struct {
Label string // may be empty (no label constraint)
LabelID uint32
// EqPropID is the property key for an equality predicate (prop == value).
// Zero when no equality predicate is available.
EqPropID uint32
// RangePropID is the property key for a range predicate.
// Zero when no range predicate is available.
RangePropID uint32
}
ScanInput describes the predicate context for scan selection.
type ScanKind ¶
type ScanKind uint8
ScanKind describes the chosen physical scan type.
const ( ScanKindAllNodes ScanKind = iota // AllNodesScan — no label or index ScanKindLabel // NodeByLabelScan — label bitmap ScanKindIndexSeek // NodeByIndexSeek — hash exact-match ScanKindIndexRangeScan // NodeByIndexRangeScan — btree range )
Recognised ScanKind values ordered from most expensive (AllNodes) to cheapest (IndexSeek). The planner always selects the minimum-cost option given the available label constraint, equality predicate, and range predicate.
type StatsManager ¶
type StatsManager struct {
// contains filtered or unexported fields
}
StatsManager wraps an IndexEstimator with generation-aware cache invalidation. When the underlying graph is mutated (Tx commit) or a new CSR snapshot is published, the caller should call NotifyRotation. On the next planner call, the estimator's avg-out-degree cache is discarded and rebuilt lazily from updated statistics.
The recommended pattern is:
sm := plan.NewStatsManager(est, 100) // … after Publisher.Publish or Tx commit: sm.NotifyRotation() // … after re-populating degree statistics: sm.MarkSeen()
StatsManager is safe for concurrent use.
func NewStatsManager ¶
func NewStatsManager(est *IndexEstimator, maxStaleCalls uint64) *StatsManager
NewStatsManager returns a StatsManager wrapping est.
maxStaleCalls: if non-zero, after this many planner calls with a stale cache the manager forces a synchronous cache reset (clearing IndexEstimator.degreeCache). Pass 0 to rely solely on NotifyRotation + MarkSeen for explicit refresh control.
func (*StatsManager) Estimator ¶
func (sm *StatsManager) Estimator() *IndexEstimator
Estimator returns the wrapped IndexEstimator after checking whether the avg-out-degree cache is excessively stale. When the cache is stale and maxStaleCalls (set in NewStatsManager) is exceeded, the degree cache is cleared synchronously before the estimator is returned, and the generation counter is advanced to the current one.
Callers must call UpdateDegreeCache on the returned estimator and then MarkSeen to re-populate the avg-out-degree cache after a forced reset.
func (*StatsManager) IsFresh ¶
func (sm *StatsManager) IsFresh() bool
IsFresh reports whether the stats cache is up-to-date with the latest generation.
func (*StatsManager) LastRefresh ¶
func (sm *StatsManager) LastRefresh() time.Time
LastRefresh returns the time at which the cache was last refreshed, or the zero Time if it has never been refreshed.
func (*StatsManager) MarkSeen ¶
func (sm *StatsManager) MarkSeen()
MarkSeen marks the current generation as observed, resetting the stale counter. Call this after the planner has re-populated the degree cache from updated graph statistics.
func (*StatsManager) NotifyRotation ¶
func (sm *StatsManager) NotifyRotation()
NotifyRotation marks the statistics cache as stale. Call this after each Tx commit that modifies graph structure or after each CSR snapshot rotation via generation.Publisher.Publish.