server

package
v0.8.0
Warning

This package is not in the latest version of its module.

Published: Feb 27, 2026 License: Apache-2.0 Imports: 21 Imported by: 0

Documentation

Overview

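Package server implements the AICR System Configuration Recommendation API as defined in api/aicr/aicr-v1.yaml.

The implementation is organized around practices common in production distributed systems: stateless request handling, input validation, rate limiting, request tracing, and graceful shutdown.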

Architecture

The server implements a stateless HTTP API with the following key components:

  • Request validation using regex patterns from the OpenAPI spec
  • Rate limiting using a token bucket algorithm (golang.org/x/time/rate)
  • Request ID tracking for distributed tracing
  • Panic recovery for resilience
  • Graceful shutdown handling
  • Health and readiness probes for Kubernetes
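
As an illustration of the panic recovery component listed above, here is a minimal sketch built only on the standard library. The names recoverPanics, boom, and demoStatus are hypothetical and are not part of this package; the real middleware may log, record metrics, or format the response differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// recoverPanics is a hypothetical middleware: it turns a panic in the
// wrapped handler into a 500 response instead of a dropped connection.
func recoverPanics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				http.Error(w, "internal error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// boom is a deliberately faulty handler used to exercise the middleware.
func boom(w http.ResponseWriter, r *http.Request) {
	panic("unexpected failure")
}

// demoStatus sends one in-memory request through the wrapped handler
// and returns the resulting status code.
func demoStatus() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/recipe", nil)
	recoverPanics(http.HandlerFunc(boom)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(demoStatus()) // 500
}
```

Wrapping at the outermost layer means a single faulty handler cannot take down the process, which is the resilience property the list refers to.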

Usage

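Basic server startup:

package main

import (
    "context"

    "github.com/NVIDIA/aicr/pkg/server"
)

func main() {
    // New builds a Server from functional options; Run starts it and
    // handles graceful shutdown.
    if err := server.New().Run(context.Background()); err != nil {
        panic(err)
    }
}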

Custom configuration:

// Set the remaining Config fields (Address, timeouts) as your deployment requires.
cfg := &server.Config{
    Port:           9090,
    RateLimit:      200, // 200 requests/sec
    RateLimitBurst: 400,
}

s := server.New(server.WithConfig(cfg))
if err := s.Run(context.Background()); err != nil {
    panic(err)
}

API Endpoints

GET /v1/recipe - Generate configuration recipe

Query parameters:
  - os: ubuntu, cos, any (default: any)
  - osv: OS version (e.g., 24.04, 22.04)
  - kernel: kernel version (e.g., 6.8, 5.15.0)
  - service: eks, gke, aks, self-managed, any (default: any)
  - k8s: Kubernetes version (e.g., 1.33, 1.32)
  - gpu: h100, gb200, a100, l40, any (default: any)
  - intent: training, inference, any (default: any)
  - context: true/false - include context metadata (default: false)

Example:
  curl "http://localhost:8080/v1/recipe?os=ubuntu&osv=24.04&gpu=h100&intent=training"
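
The same request can be assembled from Go. This is a small client-side sketch using net/url; the helper name recipeURL and the base URL are illustrative and not part of this package.

```go
package main

import (
	"fmt"
	"net/url"
)

// recipeURL builds a GET /v1/recipe URL from the given query parameters.
// It is a hypothetical client helper, not an API of the server package.
func recipeURL(base string, params map[string]string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/recipe"
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	// Encode sorts by key, so the result is deterministic.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := recipeURL("http://localhost:8080", map[string]string{
		"os": "ubuntu", "osv": "24.04", "gpu": "h100", "intent": "training",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// http://localhost:8080/v1/recipe?gpu=h100&intent=training&os=ubuntu&osv=24.04
}
```

Parameters that are omitted fall back to the defaults listed above (for example, service defaults to any).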

GET /health - Health check (for liveness probe)

Always returns 200 OK with {"status": "healthy", "timestamp": "..."}

GET /ready - Readiness check (for readiness probe)

Returns 200 OK when ready and 503 Service Unavailable when not ready.
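
The difference between the two probes can be sketched as follows. This is not the package's implementation: the ready flag, the handler names, and the "ready" / "not ready" status strings are assumptions; only the "healthy" status and the 200/503 codes come from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// healthBody mirrors the documented {"status", "timestamp"} payload.
type healthBody struct {
	Status    string    `json:"status"`
	Timestamp time.Time `json:"timestamp"`
}

// ready is a hypothetical flag set once startup work has finished.
var ready atomic.Bool

func writeStatus(w http.ResponseWriter, code int, status string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(healthBody{Status: status, Timestamp: time.Now().UTC()})
}

// health always answers 200: the process is alive.
func health(w http.ResponseWriter, r *http.Request) {
	writeStatus(w, http.StatusOK, "healthy")
}

// readiness answers 503 until the ready flag is set.
func readiness(w http.ResponseWriter, r *http.Request) {
	if !ready.Load() {
		writeStatus(w, http.StatusServiceUnavailable, "not ready")
		return
	}
	writeStatus(w, http.StatusOK, "ready")
}

// probe calls a handler in memory and returns its status code.
func probe(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(health), probe(readiness)) // 200 503
	ready.Store(true)
	fmt.Println(probe(readiness)) // 200
}
```

Keeping liveness unconditional and gating only readiness lets Kubernetes withhold traffic from a starting pod without restarting it.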

Observability

Request ID Tracking:

All requests accept an optional X-Request-Id header (UUID format).
If not provided, the server generates one automatically.
The request ID is returned in the X-Request-Id response header
and included in all error responses for tracing.
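
The reuse-or-generate behavior described above might look like the following sketch. The names withRequestID, newUUID, and echoedID are hypothetical, and the sketch does not validate that a client-supplied ID is a well-formed UUID, which the real server may do.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const headerRequestID = "X-Request-Id"

// newUUID returns a random version 4 UUID string built from crypto/rand.
func newUUID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// withRequestID reuses the caller's X-Request-Id or generates one, then
// echoes it in the response header.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(headerRequestID)
		if id == "" {
			id = newUUID()
		}
		w.Header().Set(headerRequestID, id)
		next.ServeHTTP(w, r)
	})
}

// echoedID sends one in-memory request carrying the given ID (which may
// be empty) and returns the ID found in the response header.
func echoedID(sent string) string {
	req := httptest.NewRequest(http.MethodGet, "/v1/recipe", nil)
	if sent != "" {
		req.Header.Set(headerRequestID, sent)
	}
	rec := httptest.NewRecorder()
	withRequestID(http.NotFoundHandler()).ServeHTTP(rec, req)
	return rec.Header().Get(headerRequestID)
}

func main() {
	fmt.Println(echoedID("550e8400-e29b-41d4-a716-446655440000")) // echoed unchanged
	fmt.Println(len(echoedID("")))                                // 36
}
```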

Rate Limiting:

Response headers indicate rate limit status:
  X-RateLimit-Limit: Total requests allowed per window
  X-RateLimit-Remaining: Requests remaining in current window
  X-RateLimit-Reset: Unix timestamp when window resets

A rate-limited request returns 429 Too Many Requests with a Retry-After header.
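
On the client side, a 429 can be handled by waiting for the advertised delay. The sketch below assumes Retry-After carries a whole number of seconds (HTTP also permits a date form, which is not handled here); retryDelay and its fallback parameter are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay reports whether a response with the given status should be
// retried after a pause, and how long that pause should be. retryAfter
// is the raw Retry-After header value; fallback is used when the header
// is missing or is not a non-negative integer.
func retryDelay(status int, retryAfter string, fallback time.Duration) (time.Duration, bool) {
	if status != http.StatusTooManyRequests {
		return 0, false
	}
	secs, err := strconv.Atoi(retryAfter)
	if err != nil || secs < 0 {
		return fallback, true
	}
	return time.Duration(secs) * time.Second, true
}

func main() {
	d, ok := retryDelay(http.StatusTooManyRequests, "2", time.Second)
	fmt.Println(d, ok) // 2s true
}
```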

Cache Headers:

Recommendation responses include Cache-Control headers for CDN/client caching:
  Cache-Control: public, max-age=300

Error Handling

All errors return a consistent JSON structure:

{
  "code": "INVALID_PARAMETER",
  "message": "invalid os: must be one of ubuntu, cos, any",
  "details": {"request": {...}},
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-22T12:00:00Z",
  "retryable": false
}

Error codes:

  • INVALID_PARAMETER: Invalid request parameter (400)
  • INVALID_JSON: Malformed JSON payload (400)
  • NO_MATCHING_RULE: No recommendation found (404)
  • RATE_LIMIT_EXCEEDED: Too many requests (429)
  • INTERNAL_ERROR: Server error (500)
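
A client can decode this structure into a typed error. The sketch below uses a local apiError type that mirrors the documented JSON fields rather than importing the package's ErrorResponse; the sample payload, including its retryable value, is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// apiError mirrors the documented error JSON. It is a local stand-in,
// not the package's ErrorResponse type.
type apiError struct {
	Code      string         `json:"code"`
	Message   string         `json:"message"`
	Details   map[string]any `json:"details,omitempty"`
	RequestID string         `json:"requestId"`
	Timestamp time.Time      `json:"timestamp"`
	Retryable bool           `json:"retryable"`
}

// Error makes apiError usable as a Go error value.
func (e *apiError) Error() string {
	return fmt.Sprintf("%s: %s (request %s)", e.Code, e.Message, e.RequestID)
}

// decodeError parses an error response body.
func decodeError(body []byte) (*apiError, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, err
	}
	return &e, nil
}

// sample is an illustrative payload, not captured server output.
const sample = `{"code":"RATE_LIMIT_EXCEEDED","message":"too many requests",
"requestId":"550e8400-e29b-41d4-a716-446655440000",
"timestamp":"2025-12-22T12:00:00Z","retryable":true}`

func main() {
	e, err := decodeError([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Code, e.Retryable) // RATE_LIMIT_EXCEEDED true
}
```

Logging the requestId from a decoded error gives operators the handle they need to find the matching server-side trace.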

Deployment

Kubernetes deployment example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicr-recommendation-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicr-recommendation-api
  template:
    metadata:
      labels:
        app: aicr-recommendation-api
    spec:
      containers:
      - name: api
        image: aicr-recommendation-api:latest
        ports:
        - containerPort: 8080
        env:
        - name: PORT
          value: "8080"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1000m
            memory: 512Mi
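
The manifest passes the listen port through the PORT environment variable. How the server consumes it is not shown here, so the following is only a sketch of one way to read such a variable with a safe default; parsePort and portFromEnv are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parsePort converts v to a TCP port, returning def when v is empty,
// not a number, or outside 1..65535.
func parsePort(v string, def int) int {
	p, err := strconv.Atoi(v)
	if err != nil || p < 1 || p > 65535 {
		return def
	}
	return p
}

// portFromEnv reads the PORT environment variable with a default.
func portFromEnv(def int) int {
	return parsePort(os.Getenv("PORT"), def)
}

func main() {
	fmt.Println(portFromEnv(8080))
}
```

Whatever value is chosen must match containerPort and the probe ports in the manifest, otherwise the liveness and readiness checks will fail.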

Performance

Benchmarks (on M1 Mac):

BenchmarkGetRecommendations-8    50000    23000 ns/op    5000 B/op    80 allocs/op
BenchmarkValidation-8           500000     2500 ns/op     500 B/op    10 allocs/op

The server is designed to handle thousands of requests per second and, being stateless, scales horizontally by adding replicas. Rate limiting guards each instance against resource exhaustion.

Constants

const (
	// DefaultAPIVersion is the default API version if none is negotiated
	DefaultAPIVersion = "v1"
)

Functions

func HTTPStatusFromCode

func HTTPStatusFromCode(code aicrerrors.ErrorCode) int

HTTPStatusFromCode maps a canonical error code to an HTTP status. This keeps transport-layer semantics centralized.
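
A centralized mapping of this kind can be sketched with a lookup table. The errorCode type below is a local stand-in for aicrerrors.ErrorCode, the code strings are taken from the error code list in the overview, and the fallback to 500 for unknown codes is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// errorCode is a hypothetical stand-in for aicrerrors.ErrorCode.
type errorCode string

// statusByCode follows the error code list in the package overview.
var statusByCode = map[errorCode]int{
	"INVALID_PARAMETER":   http.StatusBadRequest,
	"INVALID_JSON":        http.StatusBadRequest,
	"NO_MATCHING_RULE":    http.StatusNotFound,
	"RATE_LIMIT_EXCEEDED": http.StatusTooManyRequests,
	"INTERNAL_ERROR":      http.StatusInternalServerError,
}

// statusFromCode returns the HTTP status for c. Unknown codes map to
// 500 in this sketch.
func statusFromCode(c errorCode) int {
	if s, ok := statusByCode[c]; ok {
		return s
	}
	return http.StatusInternalServerError
}

func main() {
	fmt.Println(statusFromCode("NO_MATCHING_RULE")) // 404
	fmt.Println(statusFromCode("SOMETHING_ELSE"))   // 500
}
```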

func SetAPIVersionHeader

func SetAPIVersionHeader(w http.ResponseWriter, version string)

SetAPIVersionHeader sets the API version header in the response. This helps clients understand which version of the API is being used.

func WriteError

func WriteError(w http.ResponseWriter, r *http.Request, statusCode int,
	code aicrerrors.ErrorCode, message string, retryable bool, details map[string]any)

WriteError writes an ErrorResponse with the given HTTP status code, error code, message, retryable flag, and optional details.

func WriteErrorFromErr

func WriteErrorFromErr(w http.ResponseWriter, r *http.Request, err error, fallbackMessage string, extraDetails map[string]any)

WriteErrorFromErr writes an ErrorResponse based on a canonical structured error. If err is not a *errors.StructuredError, it falls back to INTERNAL.

Types

type Config

type Config struct {
	// Server identity
	Name    string
	Version string

	// Additional Handlers to be added to the server
	Handlers map[string]http.HandlerFunc

	// Server configuration
	Address string
	Port    int

	// Rate limiting configuration
	RateLimit      rate.Limit // requests per second
	RateLimitBurst int        // burst size

	// Timeouts
	ReadTimeout     time.Duration
	WriteTimeout    time.Duration
	IdleTimeout     time.Duration
	ShutdownTimeout time.Duration
}

Config holds server configuration

type ErrorResponse

type ErrorResponse struct {
	Code      string         `json:"code" yaml:"code"`
	Message   string         `json:"message" yaml:"message"`
	Details   map[string]any `json:"details,omitempty" yaml:"details,omitempty"`
	RequestID string         `json:"requestId" yaml:"requestId"`
	Timestamp time.Time      `json:"timestamp" yaml:"timestamp"`
	Retryable bool           `json:"retryable" yaml:"retryable"`
}

ErrorResponse represents error responses as per OpenAPI spec

type HealthResponse

type HealthResponse struct {
	Status    string    `json:"status" yaml:"status"`
	Timestamp time.Time `json:"timestamp" yaml:"timestamp"`
	Reason    string    `json:"reason,omitempty" yaml:"reason,omitempty"`
}

HealthResponse represents health check response

type Option

type Option func(*Server)

Option is a functional option for configuring Server instances.

func WithConfig

func WithConfig(cfg *Config) Option

WithConfig returns an Option that sets a custom configuration for the Server.

func WithHandler

func WithHandler(handlers map[string]http.HandlerFunc) Option

WithHandler returns an Option that adds custom HTTP handlers to the server. The map keys are URL paths and values are the corresponding handler functions.

func WithName

func WithName(name string) Option

WithName returns an Option that sets the server name in the configuration.

func WithVersion

func WithVersion(version string) Option

WithVersion returns an Option that sets the server version in the configuration.
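
Option, WithName, and WithVersion follow the functional options pattern. The self-contained sketch below shows the general shape of that pattern with local types (config, app, option); it is not this package's code, and the default values are illustrative.

```go
package main

import "fmt"

// config and app are local stand-ins for Config and Server.
type config struct {
	Name    string
	Version string
	Port    int
}

type app struct {
	cfg config
}

// option mutates an app during construction.
type option func(*app)

func withName(name string) option {
	return func(a *app) { a.cfg.Name = name }
}

func withVersion(version string) option {
	return func(a *app) { a.cfg.Version = version }
}

// newApp starts from illustrative defaults and applies each option in
// order, so later options win over earlier ones.
func newApp(opts ...option) *app {
	a := &app{cfg: config{Name: "server", Version: "dev", Port: 8080}}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

func main() {
	a := newApp(withName("aicr"), withVersion("v0.8.0"))
	fmt.Println(a.cfg.Name, a.cfg.Version, a.cfg.Port) // aicr v0.8.0 8080
}
```

The pattern keeps the constructor signature stable as new settings are added, and callers only mention the settings they want to change.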

type Server

type Server struct {
	// contains filtered or unexported fields
}

Server represents the HTTP server for handling API requests. It includes rate limiting, health checks, metrics, and graceful shutdown capabilities.

func New

func New(opts ...Option) *Server

New creates a new Server instance with the provided functional options. It parses environment configuration, sets up rate limiting, and configures the HTTP server with health checks, metrics, and custom handlers.

func (*Server) Run

func (s *Server) Run(ctx context.Context) error

Run starts the server and handles graceful shutdown.

func (*Server) Shutdown

func (s *Server) Shutdown(ctx context.Context) error

Shutdown gracefully shuts down the server within the given context.

func (*Server) Start

func (s *Server) Start(ctx context.Context) error

Start starts the HTTP server and listens for incoming requests.
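
The start-then-drain lifecycle that Run, Start, and Shutdown describe can be sketched with the standard library alone. This is an assumption about the general technique, not the package's implementation; serveUntilDone and demo are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveUntilDone serves on ln until ctx is canceled, then gives
// in-flight requests up to timeout to finish.
func serveUntilDone(ctx context.Context, srv *http.Server, ln net.Listener, timeout time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // serving failed before shutdown was requested
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}
	// Serve returns http.ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demo starts a server on an ephemeral loopback port and stops it
// after d has elapsed.
func demo(d time.Duration) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	srv := &http.Server{Handler: http.NotFoundHandler()}
	return serveUntilDone(ctx, srv, ln, time.Second)
}

func main() {
	fmt.Println(demo(50 * time.Millisecond)) // <nil>
}
```

In a real deployment the context would typically be canceled by a SIGTERM handler, so that Kubernetes pod termination triggers the drain.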
