rla

command
v1.1.0 Latest
Published: Mar 17, 2026 License: Apache-2.0 Imports: 1 Imported by: 0

README

Rack Level Administration (RLA)

Makefile targets

  • gen-pb: Regenerate the autogenerated gRPC code for the RLA server
  • gen-psmapi-pb: Regenerate the autogenerated gRPC code for the PSM client
  • gen-doc: Generate gRPC API documentation (docs/grpc-api.md for GitLab, docs/grpc-api.html for local)
  • container: Build a container tagged for local use (last release version + timestamp + git checksum).

How to run RLA on local machine (direct)

Prerequisites
# 1. Start PostgreSQL
./scripts/start-test-postgres.sh

# 2. Start Temporal (required for task operations)
brew install temporal  # if not installed
temporal server start-dev --namespace rla --port 7233

Build and Run
# Build
go build -o rla

# Set environment (or use direnv with .envrc)
export DB_ADDR=localhost DB_PORT=30432 DB_USER=postgres DB_PASSWORD=postgres DB_DATABASE=rla_test
export TEMPORAL_HOST=localhost TEMPORAL_PORT=7233 TEMPORAL_NAMESPACE=rla

# Run migrations
./rla db migrate

# Start server (default port 50051)
./rla serve

Note: Carbide is not available locally; power/firmware operations will fail. Use dev/staging for those tests.
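
As a rough illustration of how the DB_* variables exported above could be consumed, the sketch below collects them into a struct and renders a PostgreSQL connection URL. The DBConfig type, loadDBConfig, and DSN are hypothetical names chosen for this example; RLA's actual configuration code may look quite different.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// DBConfig is a hypothetical holder for the DB_* variables from this README.
type DBConfig struct {
	Addr, Port, User, Password, Database string
}

// loadDBConfig reads the DB_* variables through the supplied lookup function
// (pass os.Getenv in a real program) and reports the first missing variable.
func loadDBConfig(getenv func(string) string) (DBConfig, error) {
	var cfg DBConfig
	fields := []struct {
		name string
		dst  *string
	}{
		{"DB_ADDR", &cfg.Addr},
		{"DB_PORT", &cfg.Port},
		{"DB_USER", &cfg.User},
		{"DB_PASSWORD", &cfg.Password},
		{"DB_DATABASE", &cfg.Database},
	}
	for _, f := range fields {
		v := getenv(f.name)
		if v == "" {
			return DBConfig{}, fmt.Errorf("environment variable %s is not set", f.name)
		}
		*f.dst = v
	}
	return cfg, nil
}

// DSN renders the configuration as a postgres:// URL, escaping the
// credentials where necessary.
func (c DBConfig) DSN() string {
	u := url.URL{
		Scheme: "postgres",
		User:   url.UserPassword(c.User, c.Password),
		Host:   net.JoinHostPort(c.Addr, c.Port),
		Path:   "/" + c.Database,
	}
	return u.String()
}

func main() {
	// Sample values copied from the export line in this README.
	env := map[string]string{
		"DB_ADDR": "localhost", "DB_PORT": "30432", "DB_USER": "postgres",
		"DB_PASSWORD": "postgres", "DB_DATABASE": "rla_test",
	}
	cfg, err := loadDBConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("database %s at %s\n", cfg.Database, net.JoinHostPort(cfg.Addr, cfg.Port))
}
```

Passing the lookup function as a parameter keeps the loader testable without touching the process environment.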

Test with grpcui
go install github.com/fullstorydev/grpcui/cmd/grpcui@latest
grpcui -plaintext localhost:50051

Task Architecture

Design Principles
  1. 1 Request → 1 TaskSpec: Each gRPC operation request (PowerOn, UpgradeFirmware, etc.) maps to exactly one TaskSpec.

  2. 1 Rack → 1 Task: The Task Manager splits each TaskSpec by rack and creates one Task per rack. This gives fault isolation, parallel execution, and independent status tracking for each rack.

  3. Temporal Workflows: Each Task runs as a Temporal workflow for durable execution, automatic retries, and observability.
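
The "1 Rack → 1 Task" principle can be sketched as a simple grouping step. The types and the splitByRack function below are hypothetical and exist only to illustrate the idea; they are not RLA's actual Task Manager types.

```go
package main

import (
	"fmt"
	"sort"
)

// Component, TaskSpec, and Task are hypothetical, simplified types.
type Component struct {
	ID     string
	RackID string
}

type TaskSpec struct {
	Operation  string
	Components []Component
}

type Task struct {
	RackID       string
	Operation    string
	ComponentIDs []string
}

// splitByRack groups the components of one TaskSpec by rack and returns
// one Task per rack, ordered by rack ID so the result is deterministic.
func splitByRack(spec TaskSpec) []Task {
	byRack := make(map[string][]string)
	for _, c := range spec.Components {
		byRack[c.RackID] = append(byRack[c.RackID], c.ID)
	}

	rackIDs := make([]string, 0, len(byRack))
	for id := range byRack {
		rackIDs = append(rackIDs, id)
	}
	sort.Strings(rackIDs)

	tasks := make([]Task, 0, len(rackIDs))
	for _, id := range rackIDs {
		tasks = append(tasks, Task{
			RackID:       id,
			Operation:    spec.Operation,
			ComponentIDs: byRack[id],
		})
	}
	return tasks
}

func main() {
	spec := TaskSpec{
		Operation: "PowerOn",
		Components: []Component{
			{ID: "c1", RackID: "rack-a"},
			{ID: "c2", RackID: "rack-b"},
			{ID: "c3", RackID: "rack-a"},
		},
	}
	for _, t := range splitByRack(spec) {
		fmt.Printf("%s on %s: %v\n", t.Operation, t.RackID, t.ComponentIDs)
	}
}
```

Because each Task carries only the components of a single rack, a failure in one rack's Task does not need to affect the others.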

Data Flow
gRPC Request → Server (convert to TaskSpec)
             → Task Manager (resolve + split by rack → create Tasks)
             → Executor (start Temporal Workflow per Task)
             → Workflow (fan-out Activities by component)
             → Activity (call Carbide/PSM API)

gRPC APIs

List all available APIs:

grpcurl -plaintext localhost:50051 list v1.RLA

Describe a specific API:

grpcurl -plaintext localhost:50051 describe v1.RLA/CreateExpectedRack

Example: GetComponentInfoBySerial
grpcurl -plaintext -d '{
  "serial_info": {"manufacturer": "Wiwynn", "serial_number": "B8111801000851500108Y0SA"},
  "with_rack": true
}' localhost:50051 v1.RLA/GetComponentInfoBySerial

Response:

{
  "component": {
    "type": "COMPONENT_TYPE_COMPUTE",
    "info": {
      "id": {
        "id": "57e70199-550d-4934-8fe8-3d951be1afcf"
      },
      "name": "gpu17-nvl5-gp2-cin1-jhb01",
      "manufacturer": "WiWynn",
      "model": "GB200 Compute Tray",
      "serial_number": "B8111801000851500108Y0SA"
    },
    "firmware_version": "FW 1.2.00GA",
    "position": {
      "slot_id": 36,
      "tray_idx": 16,
      "host_id": 0
    },
    "bmcs": [
      {
        "type": "BMC_TYPE_HOST",
        "mac_address": "d8:19:09:00:04:af"
      },
      {
        "type": "BMC_TYPE_DPU",
        "mac_address": "e0:9d:73:80:c0:59"
      },
      {
        "type": "BMC_TYPE_DPU",
        "mac_address": "e0:9d:73:88:de:1b"
      }
    ]
  },
  "rack": {
    "info": {
      "id": {
        "id": "0fe0e9bb-29e0-4940-b923-8e6f7dc017aa"
      },
      "name": "g10",
      "manufacturer": "Wiwynn",
      "serial_number": "B8111801000951700005Y0SA"
    },
    "location": {
      "region": "MY",
      "datacenter": "jhb01",
      "room": "EW-F00-DH-02",
      "position": "EW-F00-DH-02-G10"
    },
    "components": []
  }
}
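
When scripting against grpcurl output, a response like the one above can be decoded with encoding/json. The struct names below are hypothetical and cover only part of the response; the field tags follow the snake_case keys shown in the sample. This is a sketch for ad hoc tooling, not the generated protobuf types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical structs mirroring a subset of the sample response.
type ID struct {
	ID string `json:"id"`
}

type Info struct {
	ID           ID     `json:"id"`
	Name         string `json:"name"`
	Manufacturer string `json:"manufacturer"`
	Model        string `json:"model"`
	SerialNumber string `json:"serial_number"`
}

type Position struct {
	SlotID  int `json:"slot_id"`
	TrayIdx int `json:"tray_idx"`
	HostID  int `json:"host_id"`
}

type BMC struct {
	Type       string `json:"type"`
	MACAddress string `json:"mac_address"`
}

type Component struct {
	Type            string   `json:"type"`
	Info            Info     `json:"info"`
	FirmwareVersion string   `json:"firmware_version"`
	Position        Position `json:"position"`
	BMCs            []BMC    `json:"bmcs"`
}

type Response struct {
	Component Component `json:"component"`
}

// sample is an abbreviated copy of the response shown in this README.
const sample = `{
  "component": {
    "type": "COMPONENT_TYPE_COMPUTE",
    "info": {
      "id": {"id": "57e70199-550d-4934-8fe8-3d951be1afcf"},
      "name": "gpu17-nvl5-gp2-cin1-jhb01",
      "serial_number": "B8111801000851500108Y0SA"
    },
    "firmware_version": "FW 1.2.00GA",
    "position": {"slot_id": 36, "tray_idx": 16, "host_id": 0},
    "bmcs": [
      {"type": "BMC_TYPE_HOST", "mac_address": "d8:19:09:00:04:af"},
      {"type": "BMC_TYPE_DPU", "mac_address": "e0:9d:73:80:c0:59"},
      {"type": "BMC_TYPE_DPU", "mac_address": "e0:9d:73:88:de:1b"}
    ]
  }
}`

// parseResponse decodes grpcurl JSON output into the structs above.
func parseResponse(data []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	resp, err := parseResponse([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	c := resp.Component
	fmt.Printf("%s (slot %d, tray %d)\n", c.Info.Name, c.Position.SlotID, c.Position.TrayIdx)
	for _, b := range c.BMCs {
		fmt.Printf("  %s %s\n", b.Type, b.MACAddress)
	}
}
```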

PSM (Powershelf Manager) Integration

RLA integrates with the Powershelf Manager (PSM) service to manage power shelves. PSM runs as a sidecar container in the same pod as RLA.

Architecture

The integration is implemented in internal/psmapi/, which provides a gRPC client for communicating with PSM:

  • mod.go - Client interface definition
  • grpc.go - Real gRPC client implementation
  • mock.go - Mock client for unit testing
  • model.go - Data models for PSM entities
Configuration

RLA reads the PSM address from the PSM_API_URL environment variable, which defaults to localhost:50052 (the PSM sidecar port).
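
The default-with-override behavior for PSM_API_URL can be sketched in a few lines. The psmAddr function and defaultPSMAddr constant are hypothetical names; only the variable name and the default value come from this README.

```go
package main

import (
	"fmt"
	"os"
)

// defaultPSMAddr is the PSM sidecar address stated in this README.
const defaultPSMAddr = "localhost:50052"

// psmAddr returns the value of PSM_API_URL from the supplied lookup
// function, falling back to the default when it is unset or empty.
func psmAddr(getenv func(string) string) string {
	if v := getenv("PSM_API_URL"); v != "" {
		return v
	}
	return defaultPSMAddr
}

func main() {
	fmt.Println("PSM address:", psmAddr(os.Getenv))
}
```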

Component Manager Integration

The powershelf component manager (internal/componentmanager/powershelf/) uses the PSM client to:

  • Registration: Register powershelves with PSM
  • PowerControl: Power on/off/reset powershelves
  • FirmwareControl: Upgrade/downgrade powershelf firmware
  • Status: Check powershelf health status
  • FirmwareVersion: Get current firmware version
  • PowerStatus: Get current power state

Usage
import (
    "context"
    "log"
    "time"

    "github.com/NVIDIA/ncx-infra-controller-rest/rla/internal/psmapi"
)

ctx := context.Background()

// Create a client
client, err := psmapi.NewClient(30 * time.Second)
if err != nil {
    log.Fatal(err)
}
defer client.Close()

// Get all powershelves
powershelves, err := client.GetPowershelves(ctx, nil)
if err != nil {
    log.Fatal(err)
}

// Power on specific powershelves
results, err := client.PowerOn(ctx, []string{"aa:bb:cc:dd:ee:ff"})
if err != nil {
    log.Fatal(err)
}

Regenerating PSM Protobuf

If the PSM proto file changes, regenerate the client code:

make gen-psmapi-pb

This requires buf, which is installed automatically if it is not already present.

Documentation

Overview

SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Directories

Path: Synopsis

internal
  • alert: Package alert provides an abstraction for sending alerts/notifications from RLA workflows and activities.
  • inventory/manager: Package manager provides the business logic layer for inventory management.
  • inventory/store: Package store provides the storage layer for inventory management.
  • nsmapi: SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES.
  • service: Package service implements the gRPC server for the RLA (Rack Level Asset) management system.
  • task/componentmanager/nvlswitch/nvswitchmanager: SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES.
  • task/componentmanager/providers/nvswitchmanager: SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES.
  • task/conflict: Package conflict provides data-driven task conflict detection for RLA.
  • task/store: Package store provides the storage layer for task and operation rule management.

pkg
  • client: Package client provides a gRPC client for interacting with the RLA service.
  • metadata: Package metadata contains build-time metadata for the RLA service.
  • types: Package types provides public domain types for the RLA client.
