graphcontroller

package
v0.0.0-...-3123745 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 7, 2026 License: Apache-2.0 Imports: 43 Imported by: 0

Documentation

Overview

apply.go contains the cluster-mutating primitives for the Graph controller: resource apply via SSA (Template and Patch), field release, and SSA configuration (field manager identity, third-party manager detection).

The reconcile loop in controller.go dispatches nodes; this file executes against the Kubernetes API server. All direct Patch, Delete, and ownership verification calls are here — controller.go does not touch the cluster directly for resource management.

Prune logic (applied-set diff, deletion ordering) lives in prune.go.

compile.go orchestrates compilation — turning a GraphRevision's spec into a compiled graph artifact. Always compiles from scratch every reconcile. No caching of compilation output across cycles.

Package graphcontroller implements a proof-of-concept Graph controller.

The controller watches Graph custom resources and reconciles them in two phases:

Phase 1 — Revision management:

  1. Ensure a GraphRevision exists for the current Graph spec generation
  2. If the spec changed, materialize + compile + create a new revision
  3. Manage revision activation (old stays Active until new revision converges)

Phase 2 — Node reconciliation (from the active revision):

  1. Parse the active revision's spec into a DAG
  2. Walk the DAG in topological order, evaluating pre-compiled CEL programs
  3. Apply evaluated templates via server-side apply
  4. Prune resources removed between revisions
  5. Update revision and Graph status

delete.go implements Graph deletion — ordered teardown, finalization, and patch field release. Separated from controller.go because deletion is a distinct lifecycle protocol with its own invariants (ordering, finalizers, orphan semantics) and does not share walk state with the normal reconcile path.

Per 005-reconciliation.md: "When a Graph is deleted, all nodes become prune candidates; full prune algorithm runs." The actual deletion logic lives in pruneResources (prune.go); this file collects the managed key set and delegates.

errors.go classifies Kubernetes API errors into plan states.

Client errors (4xx) → NodeError. Server errors (5xx/timeout/network) → NodeSystemError. Non-API errors (CEL, template) → NodeError. The plan state flows into the Graph's status condition, giving operators a clean signal for triage.

eval.go is the template evaluation engine. It walks value trees, evaluates ${...} CEL expressions, strips $${...} deferred expressions, checks readyWhen and includeWhen conditions, and extracts dependency references for DAG building.

All CEL evaluation goes through the evaluator struct, which holds a reference to the pre-compiled compiledGraph. No CEL compilation happens in this file — programs are looked up from the compiled graph and evaluated against the current scope.

finalization.go implements the finalization state machine described in 005-reconciliation.md § Finalization.

Finalization gates target deletion on successful creation and readiness of "finalizer" resources. It runs as a separate phase BEFORE the prune walk's deletion decisions. The prune walk receives completedTargets (safe to delete) and protectedKeys (must not prune) from finalization.

State is derived from cluster state each cycle (stateless):

  • GET the finalizer resource. If NotFound → create (Creating).
  • If exists but not ready → WaitingReady.
  • If exists and ready → Complete.

No cross-cycle state needed — idempotent by design.

foreach.go implements forEach node expansion — stamping a template once per item in a collection.

instance.go holds per-Graph mutable reconcile-time state.

In the stateless reconciler model, instanceState is ephemeral — created fresh every reconcile from compilation output. No cross-cycle state is preserved. The InstanceMap exists solely to support the onNewType callback (requeue all graphs when a new CRD is observed).

metric.go implements the metric: node type — a prometheus metric driven by CEL evaluation. The metric value is an explicit CEL expression that must evaluate to a number. Labels are direct CEL expressions evaluated in normal scope — no implicit iteration. Propagation-driven: re-evaluates when upstream dependencies change.

node.go contains the per-node reconciliation handlers, one per node type. The coordinator in controller.go dispatches nodes here; these handlers evaluate templates and call into apply.go for cluster mutations.

propagate.go implements the DAG propagation algorithm for a single reconcile cycle.

The propagation evaluates nodes sequentially in topological order. Each node is evaluated after all of its hard dependencies have been resolved. The propagation is the single writer to shared state (scope, plan, applied keys).

Per 005-reconciliation.md § Reconcile: "The coordinator walks the DAG in topological order, dispatching each node when its dependencies are resolved."

prune.go computes and executes the prune walk — removing resources no longer in the applied set. Prune candidates are the diff between previous and current applied keys. Deletion follows reverse topological order from the DAG.

pruneResources is the single code path for both:

  • Normal prune: candidates = keys in previous but not current
  • Teardown: candidates = all managed keys, currentKeys = empty

The prune walk handles: template deletion, patch field release, contributor- aware blocking (third-party field managers), finalization sequencing, and dynamic resource discovery for forEach/CEL-named resources.

resourcekey.go defines the applied-set key format for managed resources.

Resource keys are identity-only strings: group/version/Kind/namespace/name. Dispatch metadata (NodeType, HasStatus) travels in the Applied struct alongside the key, separating resource identity from cleanup semantics.

Per 003-ownership.md § Applied Set.

revision.go implements GraphRevision — an immutable, namespace-scoped snapshot of a Graph's spec.

Every structural mutation to a Graph (spec generation bump) produces a new GraphRevision. The revision contains the Graph's node declarations as authored — CEL ${...} expressions are preserved as-is. Ownership labels and template hashes are injected at apply time, not at snapshot time.

A revision can only be created if the spec compiles successfully (CEL + DAG). Its existence proves the spec was structurally valid at creation time.

The controller reconciles managed resources from the active revision, not the Graph spec directly. This decouples the operational truth (what the controller is converging toward) from the authoring surface (what the user wrote).

setup.go initializes the Graph controller — watch infrastructure, schema resolution, reconciler struct, and controller-runtime registration. Called once from cmd/main.go at startup. Separated from controller.go because initialization and reconciliation are different lifecycles.

walkstate.go defines per-reconcile node and plan state for DAG traversal.

NodeState tracks the reconcile-time outcome for each node. PlanState aggregates node states across a single reconcile cycle. These are created fresh per reconcile and do not persist across cycles.

Per 005-reconciliation.md § Node States.

Index

Constants

View Source
const (

	// DefaultMaxConcurrentReconciles is the number of reconcile workers.
	// Multiple workers keep the API server busy — each reconcile does
	// SSA applies, GETs, and informer syncs that can block. 16 is a
	// heuristic — high enough to keep a typical API server busy under
	// normal graph workloads, tune if needed.
	DefaultMaxConcurrentReconciles = 16
)

Variables

View Source
var ErrEvaluation = errors.New("evaluation error")

ErrEvaluation indicates that the error originates from a non-API operation: CEL evaluation, template rendering, JSON marshaling, or other deterministic local computation. Errors wrapped with this sentinel are classified as NodeError by classifyAPIError, even if their message text happens to contain network-like patterns (e.g., "unexpected EOF" from malformed JSON). Without this, the string-based network error pattern matcher would misclassify them as NodeSystemError, triggering 5-second retry for errors that can only resolve via propagation or spec change.

View Source
var ErrFieldConflict = errors.New("field conflict")

ErrFieldConflict indicates that an SSA apply received a 409 Conflict because another actor has taken ownership of fields the controller manages. This is a permanent error for the resource until the external actor releases the field or the Graph spec changes to no longer write that field.

View Source
var ErrPending = errors.New("data pending")

ErrPending indicates that CEL evaluation failed because required data is not yet available (e.g., a resource's status field hasn't been populated). This is a retryable condition — the controller should requeue and try again.

View Source
var ErrReadyWhenFailed = errors.New("readyWhen evaluation failed")

ErrReadyWhenFailed indicates that a readyWhen expression failed to evaluate due to a permanent expression error (not data pending, not a transient condition). Per 001-graph.md: "readyWhen is a health signal — it does not gate downstream execution." The coordinator classifies this as NodeNotReady (not NodeError) so dependents proceed. The underlying error is preserved in the chain for logging and status reporting.

View Source
var ErrWaitingForReadiness = errors.New("waiting for readiness")

ErrWaitingForReadiness indicates that a resource exists but hasn't satisfied its readyWhen conditions yet. Downstream resources should wait.

View Source
var GraphGVK = schema.GroupVersionKind{
	Group:   "experimental.kro.run",
	Version: "v1alpha1",
	Kind:    "Graph",
}
View Source
var GraphRevisionGVK = schema.GroupVersionKind{
	Group:   "experimental.kro.run",
	Version: "v1alpha1",
	Kind:    "GraphRevision",
}

GraphRevisionGVK is the GVK for the GraphRevision custom resource.

Functions

func ListRevisionsForTest

func ListRevisionsForTest(ctx context.Context, c client.Client, graphName, namespace string) ([]*unstructured.Unstructured, error)

ListRevisionsForTest exports listRevisions for the test package (package graphcontroller_test). Necessary because the tests are in a separate package for black-box testing.

Types

type Applied

type Applied struct {
	Key       string
	NodeType  graphpkg.NodeType
	HasStatus bool
}

Applied identifies a resource the controller has written to. Key is identity (group/version/Kind/namespace/name); NodeType and HasStatus are cleanup dispatch metadata — NodeType determines delete vs release, HasStatus determines whether release must also target the status subresource.

type GraphReconciler

type GraphReconciler struct {
	Client         client.Client
	APIReader      client.Reader              // direct API server reader — bypasses cache for managed resources
	SchemaResolver resolver.SchemaResolver    // nil = all resource nodes fall back to dyn
	SchemaGen      *compiler.SchemaGeneration // nil = no generation tracking; never triggers recompilation
	Watcher        *watches.WatchCoordinator  // nil = no dynamic watches (backward compat with existing tests)
	Caches         *InstanceMap               // per-revision compiled expression caches
	Scope          *scopeResolver             // nil = unknown scope; staticResourceKey falls back to namespace-substitution heuristic
	Metrics        *MetricStore               // per-controller metric store; nil = metrics disabled
}

GraphReconciler reconciles Graph objects.

func (*GraphReconciler) Reconcile

func (r *GraphReconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ctrl.Result, reconcileErr error)

type InstanceMap

type InstanceMap struct {
	// contains filtered or unexported fields
}

InstanceMap is a concurrent-safe map of per-instance state keyed by namespace/revision-name.

func SetupWithManager

func SetupWithManager(mgr ctrl.Manager, restConfig *rest.Config, maxWorkers int) (shutdown func(), instances *InstanceMap, err error)

SetupWithManager registers the Graph controller with a controller-runtime manager. This is the single setup path for both production and tests. It creates the watch infrastructure internally — callers provide the manager and a rest.Config (needed for the metadata client).

maxWorkers controls MaxConcurrentReconciles. Values ≤ 0 default to 4. Multiple workers prevent watch event starvation under load — with a single worker, dynamic watch events can't be delivered while it's busy processing another Graph's reconcile.

Returns a shutdown function that stops the watch manager. The caller must invoke this on teardown.

func (*InstanceMap) CacheSizes

func (m *InstanceMap) CacheSizes() int

CacheSizes returns the number of cached instance states.

type MetricStore

type MetricStore struct {
	// contains filtered or unexported fields
}

MetricStore manages prometheus GaugeVec registrations across Graph instances. Metrics persist across reconcile cycles (stateless evaluator, stateful metric). Thread-safe — multiple reconcile workers may update concurrently.

Each metric gets its own prometheus.Registry. This isolates label mutations: when a metric's label keys change, its registry is replaced without affecting other metrics or the controller's infrastructure metrics. The MetricStore implements prometheus.Collector — the metrics server collects from it to expose all graph metrics on /metrics.

func NewMetricStore

func NewMetricStore() *MetricStore

NewMetricStore creates a MetricStore. Register it as a prometheus.Collector with the metrics server to expose graph metrics on /metrics.

func (*MetricStore) Cleanup

func (s *MetricStore) Cleanup(graphKey string)

Cleanup removes all metrics registered for a given Graph key. Called when a Graph is deleted.

func (*MetricStore) Collect

func (s *MetricStore) Collect(ch chan<- prometheus.Metric)

Collect implements prometheus.Collector. Snapshots gauge pointers under the lock, then collects outside the lock to minimize critical section duration.

func (*MetricStore) Describe

func (s *MetricStore) Describe(chan<- *prometheus.Desc)

Describe implements prometheus.Collector. Sends nothing — the MetricStore is an "unchecked" collector because the set of metrics is dynamic (metrics are created/destroyed at runtime as Graphs are reconciled).

func (*MetricStore) Gather

func (s *MetricStore) Gather() ([]*dto.MetricFamily, error)

Gather implements prometheus.Gatherer. Iterates all per-metric registries and merges their metric families. Used by unit tests to verify metric state.

func (*MetricStore) Remove

func (s *MetricStore) Remove(graphKey, metricName string)

Remove unregisters a single metric for a given Graph key. Called during prune when a metric node is removed from the spec or its metric name changes between revisions.

func (*MetricStore) Reset

func (s *MetricStore) Reset(graphKey, metricName string)

Reset clears all label series from a metric registered for a given Graph key and metric name. Called before forEach metric iteration to ensure stale dimensions from previous cycles are removed — only actively-emitted series survive each reconcile cycle.

type NodeState

type NodeState int

NodeState tracks the reconcile-time state of a single node.

const (
	// NodeUnvisited is the zero-value sentinel — "not yet processed by the
	// propagation." The dispatch guard (tryDispatch) uses != NodeUnvisited to detect nodes
	// that have already been assigned a state by the propagation or by inheritance.
	NodeUnvisited NodeState = iota

	NodePending     // Data not yet available (retryable)
	NodeReady       // Applied and readyWhen satisfied
	NodeNotReady    // Applied but readyWhen not satisfied
	NodeExcluded    // Definitive absence: excluded by includeWhen evaluating to false
	NodeBlocked     // Uncertain absence: dependency in error state
	NodeError       // Client request failed (4xx)
	NodeConflict    // SSA 409 — field ownership taken by another actor
	NodeSystemError // Server/infrastructure failure (5xx, timeout, network)

)

func (NodeState) String

func (s NodeState) String() string

String returns the human-readable name of the NodeState.

type PlanState

type PlanState struct {
	States map[string]NodeState
}

PlanState tracks the state of all nodes during a reconcile cycle.

func NewPlanState

func NewPlanState(dag *dagpkg.DAG) *PlanState

NewPlanState creates a fresh plan state with all nodes unvisited.

func (*PlanState) SetState

func (ps *PlanState) SetState(id string, state NodeState)

SetState records a node's authoritative state.

State does NOT propagate to dependents — the walk coordinator dispatches dependents explicitly via tryDispatch, which evaluates all dependencies with full precedence (Excluded > Blocked > Pending).

Previous versions eagerly propagated state via a first-wins flood fill. This violated precedence in diamond dependencies: if an Error parent propagated before an Excluded parent, the child was marked Blocked instead of Excluded — an incorrect classification that prevented pruning resources that should have been pruned.

func (*PlanState) Summary

func (ps *PlanState) Summary() PlanSummary

Summary returns aggregate state for status reporting.

type PlanSummary

type PlanSummary struct {
	HasPending     bool
	HasNotReady    bool
	HasBlocked     bool
	HasConflict    bool
	HasError       bool
	HasSystemError bool
	ReadyCount     int
}

PlanSummary holds aggregate state from a completed DAG propagation.

func (PlanSummary) HasUncertainty

func (s PlanSummary) HasUncertainty() bool

HasUncertainty reports whether any node state creates uncertainty about which resources should exist. Per 005-reconciliation.md: "Uncertain absence blocks pruning — the resource might reappear once the blocker resolves."

func (PlanSummary) IsClean

func (s PlanSummary) IsClean() bool

IsClean reports whether all nodes have converged with no errors or pending states. Used to determine if superseded revisions can be garbage collected.

Directories

Path Synopsis
cel.go contains CEL runtime integration: expression compilation/evaluation and error classification for distinguishing retryable "data pending" errors from expression bugs.
cel.go contains CEL runtime integration: expression compilation/evaluation and error classification for distinguishing retryable "data pending" errors from expression bugs.
labels.go defines the identity label scheme for managed resources.
labels.go defines the identity label scheme for managed resources.
coordinator.go implements WatchCoordinator — event routing from informers to the correct Graph(s) using scalar (name-based) and collection (selector-based) reverse indexes.
coordinator.go implements WatchCoordinator — event routing from informers to the correct Graph(s) using scalar (name-based) and collection (selector-based) reverse indexes.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL