rdmaperf

command
v0.6.10 Latest
Warning

This package is not in the latest version of its module.

Published: May 21, 2026 License: MIT Imports: 21 Imported by: 0

README

rdmaperf

rdmaperf compares TCP performance over ordinary network links and over Thunderbolt Bridge, and checks RDMA userspace readiness.

Run a server on the receiving Mac:

GOWORK=off go run ./examples/rdma/rdmaperf serve -listen 169.254.x.y:9000

Run clients from the other Mac:

GOWORK=off go run ./examples/rdma/rdmaperf tcp -addr 169.254.x.y:9000 -pattern stream -size 1M -duration 30s
GOWORK=off go run ./examples/rdma/rdmaperf tcp -addr 169.254.x.y:9000 -pattern pingpong -size 64 -duration 30s
GOWORK=off go run ./examples/rdma/rdmaperf sweep -addr 169.254.x.y:9000 -pattern stream -duration 10s
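A pingpong test of this kind typically times small request and echo round trips. The following Go sketch shows that idea over loopback TCP. It assumes a simple echo server; the helpers echoOnce, pingpong, and loopbackRTT are hypothetical and do not reflect rdmaperf's own wire protocol.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// echoOnce accepts a single connection and echoes everything it reads.
func echoOnce(ln net.Listener) {
	conn, err := ln.Accept()
	if err != nil {
		return
	}
	defer conn.Close()
	buf := make([]byte, 64*1024)
	for {
		n, err := conn.Read(buf)
		if n > 0 {
			if _, werr := conn.Write(buf[:n]); werr != nil {
				return
			}
		}
		if err != nil {
			return // io.EOF once the client closes
		}
	}
}

// pingpong sends iters messages of size bytes, waits for each full echo,
// and returns the mean round-trip time. Only safe for small sizes.
func pingpong(addr string, size, iters int) (time.Duration, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	msg := make([]byte, size)
	reply := make([]byte, size)
	start := time.Now()
	for i := 0; i < iters; i++ {
		if _, err := conn.Write(msg); err != nil {
			return 0, err
		}
		if _, err := io.ReadFull(conn, reply); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(iters), nil
}

// loopbackRTT runs echoOnce and pingpong against 127.0.0.1.
func loopbackRTT(size, iters int) (time.Duration, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	go echoOnce(ln)
	return pingpong(ln.Addr().String(), size, iters)
}

func main() {
	rtt, err := loopbackRTT(64, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("mean round trip for 64-byte messages: %v\n", rtt)
}
```

This write-then-read loop suits small messages only. With large sizes the client can block writing while the server blocks echoing, so a bulk transfer test needs concurrent reads and writes.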

Use the interfaces subcommand to find local addresses:

GOWORK=off go run ./examples/rdma/rdmaperf interfaces
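If a Go test harness needs the same information, a standard-library sketch using net.Interfaces might look like the following. It is not how the interfaces subcommand is implemented; filtering to 169.254.0.0/16 (the self-assigned range used in the examples here) is an assumption, and isLinkLocalV4 and linkLocalAddrs are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
)

// isLinkLocalV4 reports whether s is an IPv4 address in 169.254.0.0/16.
func isLinkLocalV4(s string) bool {
	ip := net.ParseIP(s)
	if ip == nil {
		return false
	}
	ip4 := ip.To4()
	return ip4 != nil && ip4.IsLinkLocalUnicast()
}

// linkLocalAddrs lists "interface address" pairs for interfaces that are up
// and carry a 169.254.x.y address.
func linkLocalAddrs() ([]string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	var out []string
	for _, ifi := range ifaces {
		if ifi.Flags&net.FlagUp == 0 {
			continue
		}
		addrs, err := ifi.Addrs()
		if err != nil {
			continue
		}
		for _, a := range addrs {
			ipnet, ok := a.(*net.IPNet)
			if ok && isLinkLocalV4(ipnet.IP.String()) {
				out = append(out, ifi.Name+" "+ipnet.IP.String())
			}
		}
	}
	return out, nil
}

func main() {
	addrs, err := linkLocalAddrs()
	if err != nil {
		panic(err)
	}
	for _, a := range addrs {
		fmt.Println(a)
	}
}
```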

Use the same commands against Wi-Fi, Ethernet, and Thunderbolt Bridge addresses. The TCP tests measure the chosen IP path; they do not prove RDMA datapath use.

Check RDMA userspace readiness separately:

GOWORK=off go run ./examples/rdma/rdmaperf rdma-probe -timeout 10s -json

True RDMA datapath benchmarking requires that protection-domain, completion-queue, memory-region, and queue-pair creation and teardown, QP state transitions, and work-completion polling all succeed. If a provider call blocks, the probe's watchdog exits with status 124 instead of leaving the command hung.

The rdma-pingpong command drives the queue pair (QP) through the INIT->RTR transition. On the Apple Thunderbolt RDMA provider, repeated failed RTR attempts have been observed to tear down the kernel transmit path and wedge the port until reboot, so the command requires -allow-rtr. Treat each run as one bounded experiment, not as a retry loop.
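An explicit opt-in guard of this kind can be sketched with the standard flag package. Only the -allow-rtr flag name comes from this README; parseGuard and errRTRNotAllowed are hypothetical. Note that there is deliberately no retry count: one invocation makes at most one attempt.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

var errRTRNotAllowed = errors.New("rdma-pingpong drives QP INIT->RTR; rerun with -allow-rtr to accept the risk")

// parseGuard parses args and refuses to proceed unless -allow-rtr is set.
func parseGuard(args []string) error {
	fs := flag.NewFlagSet("rdma-pingpong", flag.ContinueOnError)
	allowRTR := fs.Bool("allow-rtr", false, "permit the QP INIT->RTR transition")
	if err := fs.Parse(args); err != nil {
		return err
	}
	if !*allowRTR {
		return errRTRNotAllowed
	}
	return nil
}

func main() {
	if err := parseGuard(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("one bounded RTR attempt would run here")
}
```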

AppleThunderboltRDMA also appears to have tight resource limits. Community reports against JACCL/exo describe practical ceilings around dozens of protection domains and about one hundred memory regions per device, with resource exhaustion sometimes requiring reboot. rdma-pingpong opens one context, one protection domain, one completion queue, one memory region, and one queue pair per role; do not wrap it in a retry loop or launch many instances in parallel.

Long-lived applications should not leave successful QPs idle for long periods. JACCL/exo reports describe idle connections degrading after tens of minutes. This tool runs a TCP barrier after its QPs reach RTS and then starts traffic immediately; code that keeps QPs open should add an application-level heartbeat or tear down idle QPs.

To test the RDMA datapath, run rdma-pingpong on two Macs connected by Thunderbolt. TCP is used only to exchange LID/QPN/PSN/GID setup data; measured traffic uses RDMA UC SEND/RECV.
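The out-of-band TCP exchange can be sketched as each side sending its setup data and reading the peer's. The qpInfo struct, its JSON encoding, and the exchange helpers are assumptions for illustration, not rdmaperf's wire format. In InfiniBand a LID is 16 bits and QPNs and PSNs are 24-bit values, so uint16 and uint32 hold them; a GID is 16 bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// qpInfo is the setup data each side sends (hypothetical layout).
type qpInfo struct {
	LID      uint16   `json:"lid"`
	QPN      uint32   `json:"qpn"` // 24-bit value
	PSN      uint32   `json:"psn"` // 24-bit value
	GIDIndex int      `json:"gid_index"`
	GID      [16]byte `json:"gid"`
}

// exchange writes local and then reads the peer's info. Both sides writing
// first is fine over TCP because the messages fit in socket buffers.
func exchange(conn net.Conn, local qpInfo) (qpInfo, error) {
	var remote qpInfo
	if err := json.NewEncoder(conn).Encode(local); err != nil {
		return remote, err
	}
	err := json.NewDecoder(conn).Decode(&remote)
	return remote, err
}

// loopbackExchange runs both roles over 127.0.0.1 and returns what the
// server and the client each received.
func loopbackExchange(server, client qpInfo) (qpInfo, qpInfo, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return qpInfo{}, qpInfo{}, err
	}
	defer ln.Close()
	type result struct {
		info qpInfo
		err  error
	}
	ch := make(chan result, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			ch <- result{err: err}
			return
		}
		defer conn.Close()
		info, err := exchange(conn, server)
		ch <- result{info, err}
	}()
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return qpInfo{}, qpInfo{}, err
	}
	defer conn.Close()
	gotByClient, err := exchange(conn, client)
	if err != nil {
		return qpInfo{}, qpInfo{}, err
	}
	r := <-ch
	return r.info, gotByClient, r.err
}

func main() {
	gotBySrv, gotByCli, err := loopbackExchange(qpInfo{LID: 1, QPN: 0x12}, qpInfo{LID: 2, QPN: 0x34})
	fmt.Println(gotBySrv.QPN, gotByCli.QPN, err)
}
```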

By default rdma-pingpong chooses the source GID by preferring IPv4-mapped GIDs, then GID index 1 on Apple Thunderbolt, then the first non-zero GID on non-Thunderbolt link layers. On Apple Thunderbolt, auto-selection never uses index 0, even when index 0 is non-zero or IPv4-mapped. This matches the JACCL Thunderbolt fallback and avoids the observed index-0 route, which can fail at INIT->RTR with errno 60 (ETIMEDOUT on macOS). Use -gid-index 0 only for an explicit diagnostic run.

Server:

perl -e 'alarm shift; exec @ARGV' 30 env GOWORK=off \
  go run ./examples/rdma/rdmaperf rdma-pingpong \
  -listen 169.254.x.y:19100 -size 64 -iters 10000 \
  -setup-timeout 12s -allow-rtr -json

Client:

perl -e 'alarm shift; exec @ARGV' 30 env GOWORK=off \
  go run ./examples/rdma/rdmaperf rdma-pingpong \
  -addr 169.254.x.y:19100 -size 64 -iters 10000 \
  -setup-timeout 12s -allow-rtr -json

Save the JSON output, stderr, and exit status from both roles. If QP setup fails after resources are opened, the output includes the local and remote LID, QPN, PSN, GID index, and GID needed to diagnose the failing QP transition. If local resource setup times out first, the JSON reports the setup timeout instead.
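One way to model a report that carries endpoint details only when they exist is to use pointer fields with omitempty, as in this sketch. The JSON field names are invented for illustration; inspect a real saved file for rdmaperf's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpoint holds one side's QP setup data (hypothetical field names).
type endpoint struct {
	LID      uint16 `json:"lid"`
	QPN      uint32 `json:"qpn"`
	PSN      uint32 `json:"psn"`
	GIDIndex int    `json:"gid_index"`
	GID      string `json:"gid"`
}

// failureReport omits Local/Remote when setup never got that far and
// omits SetupTimeout when the failure was a QP transition.
type failureReport struct {
	Stage        string    `json:"stage"`
	Error        string    `json:"error"`
	SetupTimeout string    `json:"setup_timeout,omitempty"`
	Local        *endpoint `json:"local,omitempty"`
	Remote       *endpoint `json:"remote,omitempty"`
}

// hasKey reports whether the marshaled report contains a top-level key.
func hasKey(r failureReport, key string) bool {
	b, err := json.Marshal(r)
	if err != nil {
		return false
	}
	var m map[string]any
	if err := json.Unmarshal(b, &m); err != nil {
		return false
	}
	_, ok := m[key]
	return ok
}

func main() {
	r := failureReport{Stage: "local-setup", Error: "setup timeout", SetupTimeout: "12s"}
	b, _ := json.MarshalIndent(r, "", "  ")
	fmt.Println(string(b))
}
```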

-setup-timeout lets the command write a structured error when a provider call blocks, but it cannot recover a wedged provider goroutine. Keep the outer timeout wrapper while testing ports or machines that may wedge in the RDMA provider; the perl wrapper above works on stock macOS. If the remote side is launched over a non-login SSH session, make sure PATH includes the Go toolchain before the script runs its preflight.

If both roles reach QP INIT->RTR and fail with modify qp RTR: errno 60, save both JSON files, stderr, exit status, and recent AppleThunderboltRDMA logs, then stop. Do not retry the same topology in a loop; repeated failed RTR attempts can wedge the provider even when one bounded attempt cleans up.

Omit -name and -device after Thunderbolt cable or port changes so rdma-pingpong can auto-select a PORT_ACTIVE RDMA device. Use explicit selection only after confirming the RDMA device's port is active. With -setup-timeout, auto-selection probes each candidate in a child process so a wedged port query does not block the parent command. A wedged provider may still leave the isolated child process behind until the kernel releases it.

If RDMA setup starts failing with protection-domain allocation errors, or if a provider probe leaves uninterruptible child processes behind, reboot the affected node before another performance run. After reboot, confirm RDMA is still enabled and verify the selected rdma_en* port is PORT_ACTIVE. Prefer assigning explicit IP addresses to the underlying per-port Thunderbolt interfaces over destroying bridge0, especially on headless machines or setups that also use USB Ethernet adapters.

Documentation

Overview

Command rdmaperf measures TCP paths and RDMA readiness.
