Documentation ¶
Overview ¶
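Package server implements the AICR System Configuration Recommendation API as defined in api/aicr/aicr-v1.yaml.
The implementation follows common practices for production distributed services: a stateless design, request validation, rate limiting, request tracing, panic recovery, and graceful shutdown.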
Architecture ¶
The server implements a stateless HTTP API with the following key components:
- Request validation using regex patterns from OpenAPI spec
- Rate limiting using token bucket algorithm (golang.org/x/time/rate)
- Request ID tracking for distributed tracing
- Panic recovery for resilience
- Graceful shutdown handling
- Health and readiness probes for Kubernetes
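Two of the components listed above, request ID tracking and panic recovery, are typically written as HTTP middleware. The following is a minimal sketch under that assumption, not the package's actual code: the names withRequestID, withRecovery, and serve are hypothetical, and the UUID is generated with crypto/rand as a stand-in for a UUID library.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newRequestID returns a random version 4 UUID string. It is a stdlib
// stand-in for a UUID library so the sketch has no external dependencies.
func newRequestID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// withRequestID (hypothetical name) reuses the caller's X-Request-Id or
// generates one, and echoes it in the response header.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-Id")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-Id", id)
		next.ServeHTTP(w, r)
	})
}

// withRecovery (hypothetical name) turns a handler panic into a 500 response.
func withRecovery(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				http.Error(w, "internal error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// Example handlers: one that panics and one that succeeds.
var (
	panicking = http.HandlerFunc(func(http.ResponseWriter, *http.Request) { panic("boom") })
	healthy   = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })
)

// serve sends one request through the chain and returns the status code
// and the X-Request-Id response header.
func serve(h http.Handler, requestID string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/v1/recipe", nil)
	if requestID != "" {
		req.Header.Set("X-Request-Id", requestID)
	}
	rec := httptest.NewRecorder()
	withRequestID(withRecovery(h)).ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("X-Request-Id")
}

func main() {
	status, id := serve(panicking, "")
	fmt.Println(status, len(id)) // a 500 status and a 36-character generated ID
}
```

Placing the request ID middleware outermost means the ID header is already set when the recovery layer writes its 500 response.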
Usage ¶
Basic server startup:
package main

import (
	"github.com/NVIDIA/aicr/pkg/server"
)

func main() {
	if err := server.Run(); err != nil {
		panic(err)
	}
}
Custom configuration:
config := server.DefaultConfig()
config.Port = 9090
config.RateLimit = 200 // 200 requests/sec
config.RateLimitBurst = 400
if err := server.RunWithConfig(config); err != nil {
	panic(err)
}
API Endpoints ¶
GET /v1/recipe - Generate configuration recipe
Query parameters:
- os: ubuntu, cos, any (default: any)
- osv: OS version (e.g., 24.04, 22.04)
- kernel: kernel version (e.g., 6.8, 5.15.0)
- service: eks, gke, aks, self-managed, any (default: any)
- k8s: Kubernetes version (e.g., 1.33, 1.32)
- gpu: h100, gb200, a100, l40, any (default: any)
- intent: training, inference, any (default: any)
- context: true/false - include context metadata (default: false)
Example:
curl "http://localhost:8080/v1/recipe?os=ubuntu&osv=24.04&gpu=h100&intent=training"
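A Go client can assemble the same request with net/url instead of concatenating strings. This is a minimal sketch: recipeURL is a hypothetical helper, and the base address is the localhost one from the curl example.

```go
package main

import (
	"fmt"
	"net/url"
)

// recipeURL (hypothetical helper) builds a GET /v1/recipe URL from a base
// address and query parameters. Parameters that are left out fall back to
// the server-side defaults listed in the documentation.
func recipeURL(base string, params map[string]string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/recipe"
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	u.RawQuery = q.Encode() // Encode sorts by key, so the result is deterministic
	return u.String(), nil
}

func main() {
	u, err := recipeURL("http://localhost:8080", map[string]string{
		"os": "ubuntu", "osv": "24.04", "gpu": "h100", "intent": "training",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
	// prints http://localhost:8080/v1/recipe?gpu=h100&intent=training&os=ubuntu&osv=24.04
}
```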
GET /health - Health check (for liveness probe)
Always returns 200 OK with {"status": "healthy", "timestamp": "..."}
GET /ready - Readiness check (for readiness probe)
Returns 200 OK when ready, 503 when not ready
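The difference between the two probes can be sketched with a pair of handlers and a readiness flag. This is an illustration under assumptions, not the package's handlers: the atomic flag, the handler names, and the "ready", "not ready", and "initializing" strings are all hypothetical; only the status codes and the "healthy" status come from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// healthResponse mirrors the documented HealthResponse JSON fields.
type healthResponse struct {
	Status    string    `json:"status"`
	Timestamp time.Time `json:"timestamp"`
	Reason    string    `json:"reason,omitempty"`
}

// ready is flipped to true once startup work has finished (assumed mechanism).
var ready atomic.Bool

func writeJSON(w http.ResponseWriter, status int, body healthResponse) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(body)
}

// healthHandler always reports 200: the process is alive.
func healthHandler(w http.ResponseWriter, _ *http.Request) {
	writeJSON(w, http.StatusOK, healthResponse{Status: "healthy", Timestamp: time.Now().UTC()})
}

// readyHandler reports 503 until the ready flag is set.
func readyHandler(w http.ResponseWriter, _ *http.Request) {
	if !ready.Load() {
		writeJSON(w, http.StatusServiceUnavailable, healthResponse{
			Status:    "not ready", // illustrative string
			Timestamp: time.Now().UTC(),
			Reason:    "initializing", // illustrative string
		})
		return
	}
	writeJSON(w, http.StatusOK, healthResponse{Status: "ready", Timestamp: time.Now().UTC()})
}

// probe calls a handler the way an HTTP probe would and returns the status code.
func probe(h http.HandlerFunc, path string) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(healthHandler, "/health"), probe(readyHandler, "/ready"))
	ready.Store(true)
	fmt.Println(probe(readyHandler, "/ready"))
}
```

Keeping liveness unconditional avoids restart loops while the process is still warming up; only readiness gates traffic.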
Observability ¶
Request ID Tracking:
All requests accept an optional X-Request-Id header (UUID format). If not provided, the server generates one automatically. The request ID is returned in the X-Request-Id response header and included in all error responses for tracing.
Rate Limiting:
Response headers indicate rate limit status:
- X-RateLimit-Limit: Total requests allowed per window
- X-RateLimit-Remaining: Requests remaining in current window
- X-RateLimit-Reset: Unix timestamp when window resets
When rate limited, returns 429 with Retry-After header.
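The token bucket idea behind the rate limiter can be illustrated with a small stdlib-only sketch. This is a simplified stand-in, not golang.org/x/time/rate and not the server's code: the bucket type, its methods, and simulate are hypothetical, it is not safe for concurrent use, and the clock is passed in so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// bucket is a minimal token bucket: it refills at rate tokens per second,
// up to burst tokens. Not safe for concurrent use.
type bucket struct {
	rate   float64   // tokens added per second
	burst  float64   // maximum tokens held
	tokens float64   // tokens currently available
	last   time.Time // time of the last refill
}

func newBucket(rate float64, burst int, now time.Time) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow refills the bucket for the elapsed time, then spends one token if
// one is available. When the request is denied, retryAfter is the wait
// until one full token has accumulated.
func (b *bucket) allow(now time.Time) (ok bool, retryAfter time.Duration) {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.burst, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	wait := (1 - b.tokens) / b.rate
	return false, time.Duration(wait * float64(time.Second))
}

// simulate replays requests arriving at the given millisecond offsets
// and reports which ones were allowed.
func simulate(rate float64, burst int, offsetsMs []int) []bool {
	start := time.Unix(0, 0)
	b := newBucket(rate, burst, start)
	results := make([]bool, 0, len(offsetsMs))
	for _, ms := range offsetsMs {
		ok, _ := b.allow(start.Add(time.Duration(ms) * time.Millisecond))
		results = append(results, ok)
	}
	return results
}

func main() {
	// 2 requests/sec with a burst of 2: the third immediate request is
	// rejected, and a request 500 ms later succeeds on the refilled token.
	fmt.Println(simulate(2, 2, []int{0, 0, 0, 500})) // prints [true true false true]
}
```

A server would answer a denied request with 429 and could derive the Retry-After value from retryAfter.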
Cache Headers:
Recommendation responses include Cache-Control headers for CDN/client caching:
Cache-Control: public, max-age=300
Error Handling ¶
All errors return a consistent JSON structure:
{
  "code": "INVALID_PARAMETER",
  "message": "invalid osFamily: must be one of Ubuntu, RHEL, ALL",
  "details": {"request": {...}},
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-22T12:00:00Z",
  "retryable": false
}
Error codes:
- INVALID_PARAMETER: Invalid request parameter (400)
- INVALID_JSON: Malformed JSON payload (400)
- NO_MATCHING_RULE: No recommendation found (404)
- RATE_LIMIT_EXCEEDED: Too many requests (429)
- INTERNAL_ERROR: Server error (500)
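On the client side, the consistent error shape means one decoder covers every failure. A minimal sketch, assuming only the JSON fields shown above: apiError and decodeError are hypothetical names, and the sample body reuses the documented example with the details field omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// apiError mirrors the documented error JSON structure (client-side copy).
type apiError struct {
	Code      string         `json:"code"`
	Message   string         `json:"message"`
	Details   map[string]any `json:"details,omitempty"`
	RequestID string         `json:"requestId"`
	Timestamp time.Time      `json:"timestamp"`
	Retryable bool           `json:"retryable"`
}

// Error makes apiError usable as a Go error and keeps the request ID
// in the message for tracing.
func (e *apiError) Error() string {
	return fmt.Sprintf("%s: %s (request %s)", e.Code, e.Message, e.RequestID)
}

// decodeError (hypothetical helper) parses an error response body.
func decodeError(body []byte) (*apiError, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, fmt.Errorf("decode error response: %w", err)
	}
	return &e, nil
}

// sample is the documented example body without the details field.
const sample = `{
  "code": "INVALID_PARAMETER",
  "message": "invalid osFamily: must be one of Ubuntu, RHEL, ALL",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-22T12:00:00Z",
  "retryable": false
}`

func main() {
	e, err := decodeError([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Error())
	fmt.Println("retry:", e.Retryable)
}
```

A caller would typically retry only when Retryable is true, and log RequestID either way.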
Deployment ¶
Kubernetes deployment example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicr-recommendation-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicr-recommendation-api
  template:
    metadata:
      labels:
        app: aicr-recommendation-api
    spec:
      containers:
      - name: api
        image: aicr-recommendation-api:latest
        ports:
        - containerPort: 8080
        env:
        - name: PORT
          value: "8080"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1000m
            memory: 512Mi
Performance ¶
Benchmarks (on M1 Mac):
BenchmarkGetRecommendations-8    50000    23000 ns/op    5000 B/op    80 allocs/op
BenchmarkValidation-8           500000     2500 ns/op     500 B/op    10 allocs/op
The server is designed to handle thousands of requests per second with proper horizontal scaling. Rate limiting prevents resource exhaustion.
References ¶
- OpenAPI spec: api/aicr/aicr-v1.yaml
- Rate limiting: https://pkg.go.dev/golang.org/x/time/rate
- UUID generation: https://pkg.go.dev/github.com/google/uuid
- Error groups: https://pkg.go.dev/golang.org/x/sync/errgroup
- Problem Details for HTTP APIs (RFC 7807): https://datatracker.ietf.org/doc/html/rfc7807
- Kubernetes probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Index ¶
- Constants
- func HTTPStatusFromCode(code aicrerrors.ErrorCode) int
- func SetAPIVersionHeader(w http.ResponseWriter, version string)
- func WriteError(w http.ResponseWriter, r *http.Request, statusCode int, ...)
- func WriteErrorFromErr(w http.ResponseWriter, r *http.Request, err error, fallbackMessage string, ...)
- type Config
- type ErrorResponse
- type HealthResponse
- type Option
- type Server
Constants ¶
const (
	// DefaultAPIVersion is the default API version if none is negotiated
	DefaultAPIVersion = "v1"
)
Functions ¶
func HTTPStatusFromCode ¶
func HTTPStatusFromCode(code aicrerrors.ErrorCode) int
HTTPStatusFromCode maps a canonical error code to an HTTP status. This keeps transport-layer semantics centralized.
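A centralized mapping of this kind might look like the sketch below. It is an illustration only: errorCode and httpStatusFromCode are local stand-ins for aicrerrors.ErrorCode and HTTPStatusFromCode, and the code names are taken from the error code list in the Error Handling section, which may differ from the canonical names in the aicrerrors package.

```go
package main

import (
	"fmt"
	"net/http"
)

// errorCode is a local stand-in for the package's canonical error code type.
type errorCode string

// Code names follow the documented error code list (assumed spelling).
const (
	codeInvalidParameter  errorCode = "INVALID_PARAMETER"
	codeInvalidJSON       errorCode = "INVALID_JSON"
	codeNoMatchingRule    errorCode = "NO_MATCHING_RULE"
	codeRateLimitExceeded errorCode = "RATE_LIMIT_EXCEEDED"
	codeInternalError     errorCode = "INTERNAL_ERROR"
)

// httpStatusFromCode maps an error code to an HTTP status in one place,
// so handlers never choose status codes ad hoc. Unknown codes map to 500.
func httpStatusFromCode(code errorCode) int {
	switch code {
	case codeInvalidParameter, codeInvalidJSON:
		return http.StatusBadRequest
	case codeNoMatchingRule:
		return http.StatusNotFound
	case codeRateLimitExceeded:
		return http.StatusTooManyRequests
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	for _, c := range []errorCode{codeInvalidJSON, codeNoMatchingRule, codeRateLimitExceeded, codeInternalError} {
		fmt.Println(c, httpStatusFromCode(c))
	}
}
```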
func SetAPIVersionHeader ¶
func SetAPIVersionHeader(w http.ResponseWriter, version string)
SetAPIVersionHeader sets the API version header in the response. This helps clients understand which version of the API is being used.
func WriteError ¶
func WriteError(w http.ResponseWriter, r *http.Request, statusCode int, code aicrerrors.ErrorCode, message string, retryable bool, details map[string]any)
WriteError writes an ErrorResponse with the given HTTP status code, error code, message, retryable flag, and details.
func WriteErrorFromErr ¶
func WriteErrorFromErr(w http.ResponseWriter, r *http.Request, err error, fallbackMessage string, extraDetails map[string]any)
WriteErrorFromErr writes an ErrorResponse based on a canonical structured error. If err is not a *aicrerrors.StructuredError, it falls back to INTERNAL.
Types ¶
type Config ¶
type Config struct {
	// Server identity
	Name    string
	Version string

	// Additional Handlers to be added to the server
	Handlers map[string]http.HandlerFunc

	// Server configuration
	Address string
	Port    int

	// Rate limiting configuration
	RateLimit      rate.Limit // requests per second
	RateLimitBurst int        // burst size

	// Timeouts
	ReadTimeout     time.Duration
	WriteTimeout    time.Duration
	IdleTimeout     time.Duration
	ShutdownTimeout time.Duration
}
Config holds server configuration
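The address and timeout fields correspond naturally to fields of net/http's http.Server. The sketch below shows one plausible mapping; it is an assumption about how such a Config could be consumed, not the package's code. The local config type copies only the relevant fields, newHTTPServer is a hypothetical helper, and the values in exampleConfig are illustrative rather than the package defaults.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
	"time"
)

// config is a local copy of the address and timeout fields of Config.
type config struct {
	Address      string
	Port         int
	ReadTimeout  time.Duration
	WriteTimeout time.Duration
	IdleTimeout  time.Duration
}

// exampleConfig uses illustrative values, not the package defaults.
var exampleConfig = config{
	Address:      "0.0.0.0",
	Port:         9090,
	ReadTimeout:  10 * time.Second,
	WriteTimeout: 30 * time.Second,
	IdleTimeout:  120 * time.Second,
}

// newHTTPServer (hypothetical helper) copies the config onto an http.Server.
func newHTTPServer(cfg config, h http.Handler) *http.Server {
	return &http.Server{
		Addr:         net.JoinHostPort(cfg.Address, strconv.Itoa(cfg.Port)),
		Handler:      h,
		ReadTimeout:  cfg.ReadTimeout,
		WriteTimeout: cfg.WriteTimeout,
		IdleTimeout:  cfg.IdleTimeout,
	}
}

func main() {
	srv := newHTTPServer(exampleConfig, http.NewServeMux())
	fmt.Println(srv.Addr, srv.ReadTimeout) // prints 0.0.0.0:9090 10s
}
```

A shutdown timeout would usually be applied separately, as the deadline of the context passed to http.Server.Shutdown.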
type ErrorResponse ¶
type ErrorResponse struct {
	Code      string         `json:"code" yaml:"code"`
	Message   string         `json:"message" yaml:"message"`
	Details   map[string]any `json:"details,omitempty" yaml:"details,omitempty"`
	RequestID string         `json:"requestId" yaml:"requestId"`
	Timestamp time.Time      `json:"timestamp" yaml:"timestamp"`
	Retryable bool           `json:"retryable" yaml:"retryable"`
}
ErrorResponse represents error responses as per OpenAPI spec
type HealthResponse ¶
type HealthResponse struct {
	Status    string    `json:"status" yaml:"status"`
	Timestamp time.Time `json:"timestamp" yaml:"timestamp"`
	Reason    string    `json:"reason,omitempty" yaml:"reason,omitempty"`
}
HealthResponse represents health check response
type Option ¶
type Option func(*Server)
Option is a functional option for configuring Server instances.
func WithConfig ¶
WithConfig returns an Option that sets a custom configuration for the Server.
func WithHandler ¶
func WithHandler(handlers map[string]http.HandlerFunc) Option
WithHandler returns an Option that adds custom HTTP handlers to the server. The map keys are URL paths and values are the corresponding handler functions.
func WithVersion ¶
WithVersion returns an Option that sets the server version in the configuration.
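The functional options pattern used by Option can be sketched in isolation. This is a self-contained illustration with lowercase local names, not the package's implementation: the withVersion signature, the "dev" default version, and the internal use of http.ServeMux are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// server is a local stand-in for the package's Server type.
type server struct {
	version string
	mux     *http.ServeMux
}

// option mutates a server during construction.
type option func(*server)

// withVersion sets the version (assumed signature).
func withVersion(v string) option {
	return func(s *server) { s.version = v }
}

// withHandler registers each path/handler pair on the server's mux.
func withHandler(handlers map[string]http.HandlerFunc) option {
	return func(s *server) {
		for path, h := range handlers {
			s.mux.HandleFunc(path, h)
		}
	}
}

// newServer applies defaults first, then each option in order.
func newServer(opts ...option) *server {
	s := &server{version: "dev", mux: http.NewServeMux()} // "dev" is an illustrative default
	for _, opt := range opts {
		opt(s)
	}
	return s
}

// pingRoutes is an example handler map for withHandler.
var pingRoutes = map[string]http.HandlerFunc{
	"/ping": func(w http.ResponseWriter, _ *http.Request) { fmt.Fprint(w, "pong") },
}

// get issues a GET against the server's mux and returns status and body.
func get(s *server, path string) (int, string) {
	rec := httptest.NewRecorder()
	s.mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	s := newServer(withVersion("1.2.3"), withHandler(pingRoutes))
	status, body := get(s, "/ping")
	fmt.Println(s.version, status, body) // prints 1.2.3 200 pong
}
```

Because options are applied in order after the defaults, later options override earlier ones, and callers only mention the settings they want to change.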
type Server ¶
type Server struct {
	// contains filtered or unexported fields
}
Server represents the HTTP server for handling API requests. It includes rate limiting, health checks, metrics, and graceful shutdown capabilities.
func New ¶
New creates a new Server instance with the provided functional options. It parses environment configuration, sets up rate limiting, and configures the HTTP server with health checks, metrics, and custom handlers.
func (*Server) Run ¶
Run starts the server with graceful shutdown handling.