rla

command

Version: v1.1.0 (latest) | Published: Mar 17, 2026 | License: Apache-2.0

Repository

github.com/NVIDIA/ncx-infra-controller-rest

README

Rack Level Administration (RLA)

Makefile targets

gen-pb: Regenerate the generated gRPC code for the RLA server
gen-psmapi-pb: Regenerate the generated gRPC code for the PSM client
gen-doc: Generate gRPC API documentation (docs/grpc-api.md for GitLab, docs/grpc-api.html for local)
container: Build a container tagged for local use (last release version + timestamp + git checksum).

How to run RLA on a local machine (direct)

Prerequisites

# 1. Start PostgreSQL
./scripts/start-test-postgres.sh

# 2. Start Temporal (required for task operations)
brew install temporal  # if not installed (Homebrew)
temporal server start-dev --namespace rla --port 7233

Build and Run

# Build
go build -o rla

# Set environment (or use direnv with .envrc)
export DB_ADDR=localhost DB_PORT=30432 DB_USER=postgres DB_PASSWORD=postgres DB_DATABASE=rla_test
export TEMPORAL_HOST=localhost TEMPORAL_PORT=7233 TEMPORAL_NAMESPACE=rla

# Run migrations
./rla db migrate

# Start server (default port 50051)
./rla serve

Note: Carbide is not available locally; power/firmware operations will fail. Use dev/staging for those tests.
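
The host and port variables exported above come in pairs (DB_ADDR/DB_PORT, TEMPORAL_HOST/TEMPORAL_PORT). As a rough illustration of how a service might turn such pairs into dial addresses and fail early on a missing value, here is a minimal sketch. The hostPort helper and the lookup type are hypothetical and are not taken from the RLA code base.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// lookup abstracts os.LookupEnv so the helper can be exercised
// without touching the process environment. (Hypothetical type.)
type lookup func(key string) (string, bool)

// hostPort joins the values of two environment variables into "host:port".
// It returns an error naming the first variable that is unset or empty.
func hostPort(get lookup, hostKey, portKey string) (string, error) {
	host, ok := get(hostKey)
	if !ok || host == "" {
		return "", fmt.Errorf("%s is not set", hostKey)
	}
	port, ok := get(portKey)
	if !ok || port == "" {
		return "", fmt.Errorf("%s is not set", portKey)
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	pairs := [][2]string{
		{"DB_ADDR", "DB_PORT"},
		{"TEMPORAL_HOST", "TEMPORAL_PORT"},
	}
	for _, p := range pairs {
		addr, err := hostPort(os.LookupEnv, p[0], p[1])
		if err != nil {
			fmt.Println("missing configuration:", err)
			continue
		}
		fmt.Println(p[0], "->", addr)
	}
}
```

With the exports shown above in effect, this would report localhost:30432 for the database and localhost:7233 for Temporal.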

Test with grpcui

go install github.com/fullstorydev/grpcui/cmd/grpcui@latest
grpcui -plaintext localhost:50051

Task Architecture

Design Principles

1 Request → 1 TaskSpec: Each operation request received over gRPC (PowerOn, UpgradeFirmware, etc.) maps to exactly one TaskSpec.
1 Rack → 1 Task: The Task Manager splits each TaskSpec by rack and creates one Task per rack. This provides fault isolation, parallel execution, and independent status tracking for each rack.
Temporal Workflows: Each Task runs as a Temporal workflow for durable execution, automatic retries, and observability.
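
The "1 Rack → 1 Task" principle can be pictured as a simple grouping step. The sketch below is only an illustration: the TaskSpec, Component, and Task types and the splitByRack function are hypothetical stand-ins, and the real RLA types almost certainly carry more fields.

```go
package main

import (
	"fmt"
	"sort"
)

// Component is a hypothetical target of an operation.
type Component struct {
	ID     string
	RackID string
}

// TaskSpec is a hypothetical stand-in for one operation request.
type TaskSpec struct {
	Operation  string
	Components []Component
}

// Task is a hypothetical per-rack unit of work.
type Task struct {
	RackID     string
	Operation  string
	Components []Component
}

// splitByRack groups the spec's components by rack and returns one Task
// per rack, ordered by rack ID so the result is deterministic.
func splitByRack(spec TaskSpec) []Task {
	byRack := make(map[string][]Component)
	for _, c := range spec.Components {
		byRack[c.RackID] = append(byRack[c.RackID], c)
	}
	rackIDs := make([]string, 0, len(byRack))
	for id := range byRack {
		rackIDs = append(rackIDs, id)
	}
	sort.Strings(rackIDs)

	tasks := make([]Task, 0, len(rackIDs))
	for _, id := range rackIDs {
		tasks = append(tasks, Task{RackID: id, Operation: spec.Operation, Components: byRack[id]})
	}
	return tasks
}

func main() {
	spec := TaskSpec{
		Operation: "PowerOn",
		Components: []Component{
			{ID: "tray-1", RackID: "rack-a"},
			{ID: "tray-2", RackID: "rack-b"},
			{ID: "tray-3", RackID: "rack-a"},
		},
	}
	for _, t := range splitByRack(spec) {
		fmt.Printf("%s on %s: %d component(s)\n", t.Operation, t.RackID, len(t.Components))
	}
}
```

Because each Task holds only one rack's components, a failure in one rack's Task does not need to affect the others.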

Data Flow

gRPC Request → Server (convert to TaskSpec)
             → Task Manager (resolve + split by rack → create Tasks)
             → Executor (start Temporal Workflow per Task)
             → Workflow (fan-out Activities by component)
             → Activity (call Carbide/PSM API)
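
The last two stages of this flow (a workflow fanning out one activity per component) can be approximated in plain Go. This is not Temporal SDK code and not the RLA implementation; the activity type and the fanOut function are hypothetical, and goroutines stand in for Temporal activities purely to show the shape of the fan-out and the per-component result collection.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnavailable simulates a backend failure for the demo.
var errUnavailable = errors.New("backend unavailable")

// activity is a hypothetical stand-in for one per-component call
// (for example, a request to Carbide or PSM).
type activity func(componentID string) error

// fanOut runs act once per component concurrently, waits for all calls
// to finish, and returns each component's result keyed by component ID.
func fanOut(componentIDs []string, act activity) map[string]error {
	results := make(map[string]error, len(componentIDs))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup
	for _, id := range componentIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			err := act(id)
			mu.Lock()
			results[id] = err
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return results
}

func main() {
	act := func(id string) error {
		if id == "tray-2" {
			return errUnavailable
		}
		return nil
	}
	for id, err := range fanOut([]string{"tray-1", "tray-2", "tray-3"}, act) {
		fmt.Println(id, "error:", err)
	}
}
```

Keeping a result per component, rather than stopping at the first error, matches the goal of independent status tracking, though how RLA aggregates activity results is not specified here.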

gRPC APIs

List all available APIs:

grpcurl -plaintext localhost:50051 list v1.RLA

Describe a specific API:

grpcurl -plaintext localhost:50051 describe v1.RLA/CreateExpectedRack

Example: GetComponentInfoBySerial

grpcurl -plaintext -d '{
  "serial_info": {"manufacturer": "Wiwynn", "serial_number": "B8111801000851500108Y0SA"},
  "with_rack": true
}' localhost:50051 v1.RLA/GetComponentInfoBySerial

Response:

{
  "component": {
    "type": "COMPONENT_TYPE_COMPUTE",
    "info": {
      "id": {
        "id": "57e70199-550d-4934-8fe8-3d951be1afcf"
      },
      "name": "gpu17-nvl5-gp2-cin1-jhb01",
      "manufacturer": "WiWynn",
      "model": "GB200 Compute Tray",
      "serial_number": "B8111801000851500108Y0SA"
    },
    "firmware_version": "FW 1.2.00GA",
    "position": {
      "slot_id": 36,
      "tray_idx": 16,
      "host_id": 0
    },
    "bmcs": [
      {
        "type": "BMC_TYPE_HOST",
        "mac_address": "d8:19:09:00:04:af"
      },
      {
        "type": "BMC_TYPE_DPU",
        "mac_address": "e0:9d:73:80:c0:59"
      },
      {
        "type": "BMC_TYPE_DPU",
        "mac_address": "e0:9d:73:88:de:1b"
      }
    ]
  },
  "rack": {
    "info": {
      "id": {
        "id": "0fe0e9bb-29e0-4940-b923-8e6f7dc017aa"
      },
      "name": "g10",
      "manufacturer": "Wiwynn",
      "serial_number": "B8111801000951700005Y0SA"
    },
    "location": {
      "region": "MY",
      "datacenter": "jhb01",
      "room": "EW-F00-DH-02",
      "position": "EW-F00-DH-02-G10"
    },
    "components": []
  }
}
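
When the JSON above is consumed from a script or test rather than read by eye, it can be decoded with Go's standard encoding/json package. The sketch below is a minimal example that maps only a few of the fields shown in the response; the componentResponse struct and the helper names are hypothetical and are not the generated protobuf types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// componentResponse mirrors only the response fields used below.
// Fields present in the JSON but absent here are ignored by encoding/json.
type componentResponse struct {
	Component struct {
		Info struct {
			Name         string `json:"name"`
			SerialNumber string `json:"serial_number"`
		} `json:"info"`
		BMCs []struct {
			Type       string `json:"type"`
			MACAddress string `json:"mac_address"`
		} `json:"bmcs"`
	} `json:"component"`
}

// decode parses a raw JSON response into componentResponse.
func decode(raw []byte) (componentResponse, error) {
	var resp componentResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return componentResponse{}, fmt.Errorf("decode response: %w", err)
	}
	return resp, nil
}

// bmcCountByType counts the component's BMC entries per BMC type.
func bmcCountByType(resp componentResponse) map[string]int {
	counts := make(map[string]int)
	for _, b := range resp.Component.BMCs {
		counts[b.Type]++
	}
	return counts
}

func main() {
	// A trimmed copy of the response shown above.
	raw := []byte(`{
	  "component": {
	    "info": {"name": "gpu17-nvl5-gp2-cin1-jhb01", "serial_number": "B8111801000851500108Y0SA"},
	    "bmcs": [
	      {"type": "BMC_TYPE_HOST", "mac_address": "d8:19:09:00:04:af"},
	      {"type": "BMC_TYPE_DPU", "mac_address": "e0:9d:73:80:c0:59"},
	      {"type": "BMC_TYPE_DPU", "mac_address": "e0:9d:73:88:de:1b"}
	    ]
	  }
	}`)
	resp, err := decode(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	counts := bmcCountByType(resp)
	fmt.Println(resp.Component.Info.Name, counts["BMC_TYPE_HOST"], counts["BMC_TYPE_DPU"])
	// prints: gpu17-nvl5-gp2-cin1-jhb01 1 2
}
```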

PSM (Powershelf Manager) Integration

RLA integrates with the Powershelf Manager (PSM) service to manage power shelves. PSM runs as a sidecar container in the same pod as RLA.

Architecture

The integration is implemented in internal/psmapi/, which provides a gRPC client for communicating with PSM:

mod.go - Client interface definition
grpc.go - Real gRPC client implementation
mock.go - Mock client for unit testing
model.go - Data models for PSM entities
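
The split between an interface, a real implementation, and a mock is what lets callers be unit-tested without a running PSM. The sketch below shows the general pattern only. PowerController, mockClient, powerOnAll, and demo are hypothetical names with a simplified signature; they do not reproduce the actual definitions in mod.go or mock.go.

```go
package main

import (
	"context"
	"fmt"
)

// PowerController is a hypothetical, trimmed-down client interface.
type PowerController interface {
	PowerOn(ctx context.Context, ids []string) error
}

// mockClient records calls instead of contacting a real service.
type mockClient struct {
	poweredOn []string
	err       error // returned from PowerOn when non-nil
}

func (m *mockClient) PowerOn(_ context.Context, ids []string) error {
	if m.err != nil {
		return m.err
	}
	m.poweredOn = append(m.poweredOn, ids...)
	return nil
}

// powerOnAll is caller code that depends only on the interface,
// so it works with either a real client or the mock.
func powerOnAll(ctx context.Context, c PowerController, ids []string) error {
	if len(ids) == 0 {
		return nil // nothing to do; skip the call entirely
	}
	return c.PowerOn(ctx, ids)
}

// demo wires the mock into powerOnAll and reports what the mock recorded.
func demo(ids []string) ([]string, error) {
	mock := &mockClient{}
	err := powerOnAll(context.Background(), mock, ids)
	return mock.poweredOn, err
}

func main() {
	recorded, err := demo([]string{"aa:bb:cc:dd:ee:ff"})
	fmt.Println(recorded, err)
	// prints: [aa:bb:cc:dd:ee:ff] <nil>
}
```

Setting the mock's err field makes it straightforward to test how a caller reacts to a failing backend.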

Configuration

RLA reads the PSM address from the PSM_API_URL environment variable, which defaults to localhost:50052 (the PSM sidecar port).
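
A minimal sketch of this kind of lookup with a default is shown below. The psmAddr function is hypothetical, and the host:port validation is an assumption based on the form of the default value; the actual RLA configuration code may accept other formats.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// defaultPSMAddr is the default stated in this README.
const defaultPSMAddr = "localhost:50052"

// psmAddr returns the configured address, falling back to the default when
// the value is empty. It assumes (hypothetically) a "host:port" format and
// rejects anything that does not parse as one.
func psmAddr(configured string) (string, error) {
	addr := configured
	if addr == "" {
		addr = defaultPSMAddr
	}
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return "", fmt.Errorf("invalid PSM_API_URL %q: %w", addr, err)
	}
	return addr, nil
}

func main() {
	addr, err := psmAddr(os.Getenv("PSM_API_URL"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("PSM address:", addr)
}
```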

Component Manager Integration

The powershelf component manager (internal/componentmanager/powershelf/) uses the PSM client to:

Registration: Register powershelves with PSM
PowerControl: Power on/off/reset powershelves
FirmwareControl: Upgrade/downgrade powershelf firmware
Status: Check powershelf health status
FirmwareVersion: Get current firmware version
PowerStatus: Get current power state

Usage

import (
    "context"
    "log"
    "time"

    "github.com/NVIDIA/ncx-infra-controller-rest/rla/internal/psmapi"
)

ctx := context.Background()

// Create a client
client, err := psmapi.NewClient(30 * time.Second)
if err != nil {
    log.Fatal(err)
}
defer client.Close()

// Get all powershelves
powershelves, err := client.GetPowershelves(ctx, nil)
if err != nil {
    log.Printf("get powershelves: %v", err)
}

// Power on specific powershelves
results, err := client.PowerOn(ctx, []string{"aa:bb:cc:dd:ee:ff"})
if err != nil {
    log.Printf("power on: %v", err)
}

Regenerating PSM Protobuf

If the PSM proto file changes, regenerate the client code:

make gen-psmapi-pb

This requires buf, which is installed automatically if it is not already present.

Documentation

Overview

SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Source Files

main.go

Directories

Path	Synopsis
cmd
internal
internal/alert	Package alert provides an abstraction for sending alerts/notifications from RLA workflows and activities.
internal/carbideapi
internal/carbideapi/gen
internal/certs
internal/clients/temporal
internal/common/utils
internal/config
internal/converter/dao
internal/converter/protobuf
internal/db/migrations
internal/db/model
internal/db/query
internal/inventory/manager	Package manager provides the business logic layer for inventory management.
internal/inventory/store	Package store provides the storage layer for inventory management.
internal/inventorysync
internal/nsmapi
internal/nsmapi/gen
internal/operation
internal/psmapi
internal/psmapi/gen
internal/service	Package service implements the gRPC server for the RLA (Rack Level Asset) management system.
internal/task/common
internal/task/componentmanager
internal/task/componentmanager/compute/carbide
internal/task/componentmanager/mock
internal/task/componentmanager/nvlswitch/carbide
internal/task/componentmanager/nvlswitch/nvswitchmanager
internal/task/componentmanager/powershelf/psm
internal/task/componentmanager/providers/carbide
internal/task/componentmanager/providers/nvswitchmanager
internal/task/componentmanager/providers/psm
internal/task/conflict	Package conflict provides data-driven task conflict detection for RLA.
internal/task/executor
internal/task/executor/temporalworkflow/activity
internal/task/executor/temporalworkflow/common
internal/task/executor/temporalworkflow/manager
internal/task/executor/temporalworkflow/workflow
internal/task/manager
internal/task/operationrules
internal/task/operations
internal/task/store	Package store provides the storage layer for task and operation rule management.
internal/task/task
pkg
pkg/client	Package client provides a gRPC client for interacting with the RLA service.
pkg/common/Identifier
pkg/common/deviceinfo
pkg/common/devicetypes
pkg/common/errors
pkg/common/location
pkg/common/rackopreport
pkg/common/utils
pkg/inventoryobjects/bmc
pkg/inventoryobjects/component
pkg/inventoryobjects/nvldomain
pkg/inventoryobjects/rack
pkg/metadata	Package metadata contains build-time metadata for the RLA service.
pkg/proto/v1
pkg/types	Package types provides public domain types for the RLA client.
pkg/workerpool
