server

package

Version: v0.8.0 (latest) Published: Feb 27, 2026 License: Apache-2.0 Imports: 21 Imported by: 0

Repository

github.com/NVIDIA/aicr

Documentation ¶

Overview ¶

Package server implements the AICR System Configuration Recommendation API as defined in api/aicr/aicr-v1.yaml.

The implementation follows production-grade distributed systems practices, which the sections below describe.

Architecture ¶

The server implements a stateless HTTP API with the following key components:

Request validation using regex patterns from the OpenAPI spec
Rate limiting using the token bucket algorithm (golang.org/x/time/rate)
Request ID tracking for distributed tracing
Panic recovery for resilience
Graceful shutdown handling
Health and readiness probes for Kubernetes
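
As an illustration of the panic recovery component listed above, the following is a minimal sketch using only the standard library. It is not the package's own middleware: the names recoverPanics and statusAfterPanic are hypothetical, and the plain-text 500 body is an assumption made for brevity.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// recoverPanics wraps next so that a panic inside a handler is logged and
// turned into a 500 response instead of an aborted connection.
func recoverPanics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				log.Printf("panic serving %s: %v", r.URL.Path, rec)
				http.Error(w, "internal error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// statusAfterPanic runs a deliberately panicking handler through the
// middleware in-process and reports the status code a client would see.
func statusAfterPanic() int {
	h := recoverPanics(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		panic("boom")
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/v1/recipe", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusAfterPanic())
}
```

Placing recovery outermost in a middleware chain means panics raised by any inner layer are caught as well.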

Usage ¶

Basic server startup:

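package main

import (
    "context"

    "github.com/NVIDIA/aicr/pkg/server"
)

func main() {
    // New builds a Server with default options; Run blocks until shutdown.
    if err := server.New().Run(context.Background()); err != nil {
        panic(err)
    }
}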

Custom configuration:

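config := server.DefaultConfig()
config.Port = 9090
config.RateLimit = 200  // 200 requests/sec
config.RateLimitBurst = 400

// Apply the custom Config through the WithConfig option.
srv := server.New(server.WithConfig(config))
if err := srv.Run(context.Background()); err != nil {
    panic(err)
}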

API Endpoints ¶

GET /v1/recipe - Generate configuration recipe

Query parameters:
  - os: ubuntu, cos, any (default: any)
  - osv: OS version (e.g., 24.04, 22.04)
  - kernel: kernel version (e.g., 6.8, 5.15.0)
  - service: eks, gke, aks, self-managed, any (default: any)
  - k8s: Kubernetes version (e.g., 1.33, 1.32)
  - gpu: h100, gb200, a100, l40, any (default: any)
  - intent: training, inference, any (default: any)
  - context: true/false - include context metadata (default: false)

Example:
  curl "http://localhost:8080/v1/recipe?os=ubuntu&osv=24.04&gpu=h100&intent=training"
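
The same request can be built from Go. This sketch only shows how the documented query parameters map onto a URL with net/url; recipeURL is a hypothetical helper, not part of the package, and it performs no validation of parameter values.

```go
package main

import (
	"fmt"
	"net/url"
)

// recipeURL builds a /v1/recipe URL from a base address and query parameters.
func recipeURL(base string, params map[string]string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/recipe"
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	// Encode sorts keys, so the result is deterministic despite map ordering.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := recipeURL("http://localhost:8080", map[string]string{
		"os": "ubuntu", "osv": "24.04", "gpu": "h100", "intent": "training",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```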

GET /health - Health check (for liveness probe)

Always returns 200 OK with {"status": "healthy", "timestamp": "..."}

GET /ready - Readiness check (for readiness probe)

Returns 200 OK when ready, 503 when not ready
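
The difference between the two probes can be sketched as follows. This is an illustration under assumptions, not the package's handlers: the ready flag, the "not ready" and "ready" status strings, and the reason text are hypothetical; only the 200/503 behavior and the status/timestamp/reason JSON fields come from this documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// healthBody mirrors the documented HealthResponse JSON shape.
type healthBody struct {
	Status    string    `json:"status"`
	Timestamp time.Time `json:"timestamp"`
	Reason    string    `json:"reason,omitempty"`
}

// ready is set to true once startup work has finished (hypothetical flag).
var ready atomic.Bool

func writeJSON(w http.ResponseWriter, code int, body healthBody) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(body)
}

// healthHandler always reports 200: the process is alive.
func healthHandler(w http.ResponseWriter, _ *http.Request) {
	writeJSON(w, http.StatusOK, healthBody{Status: "healthy", Timestamp: time.Now().UTC()})
}

// readyHandler reports 503 until the ready flag is set.
func readyHandler(w http.ResponseWriter, _ *http.Request) {
	if !ready.Load() {
		writeJSON(w, http.StatusServiceUnavailable, healthBody{
			Status: "not ready", Timestamp: time.Now().UTC(), Reason: "starting up",
		})
		return
	}
	writeJSON(w, http.StatusOK, healthBody{Status: "ready", Timestamp: time.Now().UTC()})
}

// probe calls a handler in-process and returns the status code.
func probe(h http.HandlerFunc, path string) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(healthHandler, "/health"), probe(readyHandler, "/ready"))
	ready.Store(true)
	fmt.Println(probe(readyHandler, "/ready"))
}
```

Keeping liveness unconditional avoids restart loops while the pod is still warming up; readiness alone gates traffic.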

Observability ¶

Request ID Tracking:

All requests accept an optional X-Request-Id header (UUID format).
If not provided, the server generates one automatically.
The request ID is returned in the X-Request-Id response header
and included in all error responses for tracing.
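
A minimal sketch of this behavior as middleware, using only the standard library. It is an assumption-laden illustration: the package references github.com/google/uuid for generation, whereas this sketch builds a version 4 UUID from crypto/rand, and it does not validate the format of an incoming ID. The names withRequestID, newUUID, and echoedID are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const requestIDHeader = "X-Request-Id"

// newUUID returns a random (version 4) UUID string.
func newUUID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // no sensible fallback without a random source
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// withRequestID reuses the caller's ID when present, otherwise generates one,
// and echoes it on the response.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(requestIDHeader)
		if id == "" {
			id = newUUID()
		}
		w.Header().Set(requestIDHeader, id)
		next.ServeHTTP(w, r)
	})
}

// echoedID sends an in-process request with the given incoming ID
// ("" for none) and returns the ID found on the response.
func echoedID(incoming string) string {
	h := withRequestID(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	if incoming != "" {
		req.Header.Set(requestIDHeader, incoming)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Header().Get(requestIDHeader)
}

func main() {
	fmt.Println(echoedID("550e8400-e29b-41d4-a716-446655440000"))
	fmt.Println(echoedID(""))
}
```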

Rate Limiting:

Response headers indicate rate limit status:
  X-RateLimit-Limit: Total requests allowed per window
  X-RateLimit-Remaining: Requests remaining in current window
  X-RateLimit-Reset: Unix timestamp when window resets

When rate limited, returns 429 with Retry-After header.
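
To make the token bucket behavior concrete, here is a hand-rolled sketch in standard-library Go. The package itself relies on golang.org/x/time/rate; this stand-in is not that implementation, the names bucket, limit, and simulate are hypothetical, and the X-RateLimit-* headers are omitted for brevity.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"strconv"
	"sync"
	"time"
)

// bucket is a minimal token bucket: it refills at rate tokens/sec up to burst.
type bucket struct {
	mu     sync.Mutex
	rate   float64
	burst  float64
	tokens float64
	last   time.Time
}

func newBucket(rate float64, burst int, now time.Time) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow takes one token if available; otherwise it returns how long to wait
// until the next token accrues.
func (b *bucket) allow(now time.Time) (bool, time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.burst, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	return false, time.Duration((1 - b.tokens) / b.rate * float64(time.Second))
}

// limit rejects over-limit requests with 429 and a Retry-After header
// expressed in whole seconds, rounded up.
func limit(b *bucket, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ok, wait := b.allow(time.Now())
		if !ok {
			w.Header().Set("Retry-After", strconv.Itoa(int(math.Ceil(wait.Seconds()))))
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// simulate sends n back-to-back requests to a fresh bucket on a fake clock,
// then advances the clock by idleMillis and tries once more. It returns how
// many of the first n passed and whether the final request passed.
func simulate(rate float64, burst, n, idleMillis int) (int, bool) {
	now := time.Unix(0, 0)
	b := newBucket(rate, burst, now)
	passed := 0
	for i := 0; i < n; i++ {
		if ok, _ := b.allow(now); ok {
			passed++
		}
	}
	ok, _ := b.allow(now.Add(time.Duration(idleMillis) * time.Millisecond))
	return passed, ok
}

func main() {
	passed, later := simulate(2, 2, 5, 500) // 2 req/s, burst of 2
	fmt.Println(passed, later)
}
```

The burst size bounds how many requests pass at one instant; the rate bounds the sustained throughput once the bucket has drained.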

Cache Headers:

Recommendation responses include Cache-Control headers for CDN/client caching:
  Cache-Control: public, max-age=300

Error Handling ¶

All errors return a consistent JSON structure:

{
  "code": "INVALID_PARAMETER",
  "message": "invalid os: must be one of ubuntu, cos, any",
  "details": {"request": {...}},
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-22T12:00:00Z",
  "retryable": false
}

Error codes:

INVALID_PARAMETER: Invalid request parameter (400)
INVALID_JSON: Malformed JSON payload (400)
NO_MATCHING_RULE: No recommendation found (404)
RATE_LIMIT_EXCEEDED: Too many requests (429)
INTERNAL_ERROR: Server error (500)
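
The mapping above, together with the JSON structure, can be sketched as a small writer. This is illustrative only and is not the package's WriteError: writeErrorSketch, statusFor, and roundTrip are hypothetical names, and the rule used to set retryable (429 and 5xx only) is an assumption, since this documentation does not state how that field is derived.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// errorBody mirrors the documented JSON error structure.
type errorBody struct {
	Code      string         `json:"code"`
	Message   string         `json:"message"`
	Details   map[string]any `json:"details,omitempty"`
	RequestID string         `json:"requestId"`
	Timestamp time.Time      `json:"timestamp"`
	Retryable bool           `json:"retryable"`
}

// statusFor maps the documented error codes to their HTTP statuses.
// Unknown codes are treated as internal errors.
func statusFor(code string) int {
	switch code {
	case "INVALID_PARAMETER", "INVALID_JSON":
		return http.StatusBadRequest
	case "NO_MATCHING_RULE":
		return http.StatusNotFound
	case "RATE_LIMIT_EXCEEDED":
		return http.StatusTooManyRequests
	default:
		return http.StatusInternalServerError
	}
}

// writeErrorSketch writes the error body with the status derived from code.
func writeErrorSketch(w http.ResponseWriter, code, msg, requestID string) {
	status := statusFor(code)
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(errorBody{
		Code:      code,
		Message:   msg,
		RequestID: requestID,
		Timestamp: time.Now().UTC(),
		// Assumption: only rate-limit and server errors are worth retrying.
		Retryable: status == http.StatusTooManyRequests || status >= 500,
	})
}

// roundTrip writes an error in-process and decodes it back, returning the
// status code and the parsed body.
func roundTrip(code string) (int, errorBody) {
	rec := httptest.NewRecorder()
	writeErrorSketch(rec, code, "example", "550e8400-e29b-41d4-a716-446655440000")
	var body errorBody
	if err := json.NewDecoder(rec.Body).Decode(&body); err != nil {
		panic(err)
	}
	return rec.Code, body
}

func main() {
	status, body := roundTrip("NO_MATCHING_RULE")
	fmt.Println(status, body.Code, body.Retryable)
}
```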

Deployment ¶

Kubernetes deployment example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicr-recommendation-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicr-recommendation-api
  template:
    metadata:
      labels:
        app: aicr-recommendation-api
    spec:
      containers:
      - name: api
        image: aicr-recommendation-api:latest
        ports:
        - containerPort: 8080
        env:
        - name: PORT
          value: "8080"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1000m
            memory: 512Mi

Performance ¶

Benchmarks (on M1 Mac):

BenchmarkGetRecommendations-8    50000    23000 ns/op    5000 B/op    80 allocs/op
BenchmarkValidation-8           500000     2500 ns/op     500 B/op    10 allocs/op

The server is designed to handle thousands of requests per second with proper horizontal scaling. Rate limiting prevents resource exhaustion.

References ¶

OpenAPI spec: api/aicr/aicr-v1.yaml
Rate limiting: https://pkg.go.dev/golang.org/x/time/rate
UUID generation: https://pkg.go.dev/github.com/google/uuid
Error groups: https://pkg.go.dev/golang.org/x/sync/errgroup
HTTP error responses (RFC 7807, Problem Details for HTTP APIs): https://datatracker.ietf.org/doc/html/rfc7807
Kubernetes probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

Index ¶

Constants
func HTTPStatusFromCode(code aicrerrors.ErrorCode) int
func SetAPIVersionHeader(w http.ResponseWriter, version string)
func WriteError(w http.ResponseWriter, r *http.Request, statusCode int, ...)
func WriteErrorFromErr(w http.ResponseWriter, r *http.Request, err error, fallbackMessage string, ...)
type Config
type ErrorResponse
type HealthResponse
type Option
type Server
- func New(opts ...Option) *Server

Constants ¶

const (
	// DefaultAPIVersion is the default API version if none is negotiated
	DefaultAPIVersion = "v1"
)

Variables ¶

This section is empty.

Functions ¶

func HTTPStatusFromCode ¶

func HTTPStatusFromCode(code aicrerrors.ErrorCode) int

HTTPStatusFromCode maps a canonical error code to an HTTP status. This keeps transport-layer semantics centralized.

func SetAPIVersionHeader ¶

func SetAPIVersionHeader(w http.ResponseWriter, version string)

SetAPIVersionHeader sets the API version header in the response. This helps clients understand which version of the API is being used.

func WriteErrorFromErr ¶

func WriteErrorFromErr(w http.ResponseWriter, r *http.Request, err error, fallbackMessage string, extraDetails map[string]any)

WriteErrorFromErr writes an ErrorResponse based on a canonical structured error. If err is not a *errors.StructuredError, it falls back to INTERNAL.

Types ¶

type ErrorResponse ¶

type ErrorResponse struct {
	Code      string         `json:"code" yaml:"code"`
	Message   string         `json:"message" yaml:"message"`
	Details   map[string]any `json:"details,omitempty" yaml:"details,omitempty"`
	RequestID string         `json:"requestId" yaml:"requestId"`
	Timestamp time.Time      `json:"timestamp" yaml:"timestamp"`
	Retryable bool           `json:"retryable" yaml:"retryable"`
}

ErrorResponse represents an error response as defined in the OpenAPI spec.

type HealthResponse ¶

type HealthResponse struct {
	Status    string    `json:"status" yaml:"status"`
	Timestamp time.Time `json:"timestamp" yaml:"timestamp"`
	Reason    string    `json:"reason,omitempty" yaml:"reason,omitempty"`
}

HealthResponse represents a health check response.

type Option ¶

type Option func(*Server)

Option is a functional option for configuring Server instances.

func WithConfig ¶

func WithConfig(cfg *Config) Option

WithConfig returns an Option that sets a custom configuration for the Server.

func WithHandler ¶

func WithHandler(handlers map[string]http.HandlerFunc) Option

WithHandler returns an Option that adds custom HTTP handlers to the server. The map keys are URL paths and values are the corresponding handler functions.

func WithName ¶

func WithName(name string) Option

WithName returns an Option that sets the server name in the configuration.

func WithVersion ¶

func WithVersion(version string) Option

WithVersion returns an Option that sets the server version in the configuration.
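
The functional options pattern behind Option and the With* constructors can be sketched in isolation. The types below are stand-ins, not the package's own: settings, its fields, and its default values are hypothetical, and lower-case names are used to avoid implying they match the real API.

```go
package main

import "fmt"

// settings stands in for the server's internal configuration
// (hypothetical fields and defaults).
type settings struct {
	name    string
	version string
	port    int
}

// option mutates settings; it plays the role of server.Option.
type option func(*settings)

func withName(n string) option    { return func(s *settings) { s.name = n } }
func withVersion(v string) option { return func(s *settings) { s.version = v } }

// newSettings starts from defaults and applies options in order,
// so a later option overrides an earlier one.
func newSettings(opts ...option) *settings {
	s := &settings{name: "server", version: "dev", port: 8080}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	s := newSettings(withName("aicr-api"), withVersion("v0.8.0"))
	fmt.Println(s.name, s.version, s.port)
}
```

The pattern lets a constructor grow new settings without changing its signature, and callers that pass no options get the defaults.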

type Server ¶

type Server struct {
	// contains filtered or unexported fields
}

Server represents the HTTP server for handling API requests. It includes rate limiting, health checks, metrics, and graceful shutdown capabilities.

func New ¶

func New(opts ...Option) *Server

New creates a new Server instance with the provided functional options. It parses environment configuration, sets up rate limiting, and configures the HTTP server with health checks, metrics, and custom handlers.

func (*Server) Run ¶

func (s *Server) Run(ctx context.Context) error

Run starts the server with its configured settings and handles graceful shutdown.

func (*Server) Shutdown ¶

func (s *Server) Shutdown(ctx context.Context) error

Shutdown gracefully shuts down the server within the given context.
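
For readers unfamiliar with the underlying mechanism, the sketch below shows context-driven graceful shutdown with net/http. It is not the package's implementation: serveUntilDone and demo are hypothetical names, and the grace period and timeouts are arbitrary values chosen for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveUntilDone serves on ln until ctx is cancelled, then gives in-flight
// requests up to grace to finish before returning.
func serveUntilDone(ctx context.Context, ln net.Listener, h http.Handler, grace time.Duration) error {
	srv := &http.Server{Handler: h, ReadHeaderTimeout: 5 * time.Second}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server stopped before shutdown was requested
	case <-ctx.Done():
	}

	// Use a fresh context: ctx is already cancelled at this point.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}
	// Serve returns http.ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demo starts a server on an ephemeral loopback port, cancels it after a
// short delay, and reports whether shutdown completed cleanly.
func demo() bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return serveUntilDone(ctx, ln, http.NotFoundHandler(), time.Second) == nil
}

func main() {
	fmt.Println(demo())
}
```

In a real binary the context would typically come from signal.NotifyContext so that SIGINT or SIGTERM triggers the shutdown path.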

func (*Server) Start ¶

func (s *Server) Start(ctx context.Context) error

Start starts the HTTP server and listens for incoming requests.
