Directories
| Path | Synopsis |
|---|---|
| api | |
| api/v1alpha1 | Package v1alpha1 contains API Schema definitions for the inferencecache v1alpha1 API group. |
| cmd | |
| cmd/controller (command) | |
| cmd/kvevent-subscriber (command) | Command kvevent-subscriber runs as a sidecar next to a vLLM engine replica: it subscribes to the engine's KV-cache events over ZMQ and reports cache state to the inferencecache-server over gRPC. |
| cmd/server (command) | |
| hack | |
| hack/verify-samples (command) | Package main is the verify-samples CI helper. |
| internal | |
| internal/webhook/pod | Package pod is the controller-owned mutating admission webhook that auto-wires user-provided inference engine pods to a matching cache backend: either a managed backend the controller provisions (LMCache today) or an External backend whose lifecycle the operator owns. |
| internal/webhook/v1alpha1 | Package v1alpha1 hosts the admission webhooks (defaulting + validating) for the operator-facing v1alpha1 CRDs: CacheBackend, CachePolicy, and CacheTenant. |
| pkg | |
| pkg/adapters/engine | Package engine is the KV-event subscriber. |
| pkg/adapters/runtime | Package runtime is the controller-owned runtime-adapter seam: the plug-point that keeps engine-specific cache wiring out of the core CacheBackend reconciler. |
| pkg/adapters/runtime/external | Package external is the runtime adapter for CacheBackend{type: External}: the controller does NOT provision pods for the cache, the operator points the CR at a pre-existing remote cache they manage themselves, and the adapter wires engine pods to that endpoint with the same engine wire format the managed-LMCache path uses (see pkg/adapters/runtime/internal/enginewire). |
| pkg/adapters/runtime/internal/enginewire | Package enginewire holds the engine-side wire format shared by every runtime adapter that fronts an LMCache-compatible cache (the in-tree vLLM+LMCache adapter and the External passthrough adapter today; future adapters that also speak the LMCache connector protocol can import it the same way). |
| pkg/index | Package index is part of inferencecache-server: the cluster cache-state aggregator (the CacheIndex), populated from engine KV events and queried by LookupRoute. |
| pkg/render | Package render is the mutable-slot prompt rendering engine (the "wedge"): it turns templated prompts into stable cache keys so a gateway's cache-aware routing matches on real prompts. |
| pkg/server/auth | Package auth provides HTTP middleware that gates the policy server's internal controller-facing endpoints (/snapshot and /policy) on a Kubernetes ServiceAccount bearer token. |