accel


v1.3.0 (latest) | Published: Jun 7, 2026 | License: BSD-3-Clause | Imports: 14 | Imported by: 12

Repository

github.com/luxfi/accel

Documentation ¶

Overview ¶

Package accel provides GPU-accelerated operations for blockchain and ML workloads.

The package supports multiple GPU backends (Metal, WebGPU, CUDA) via runtime plugin discovery. When built without CGO, or when no backend is available at run time, operations return ErrNoBackends.

Architecture ¶

accel wraps the lux-accel C++ library, which provides:

ML operations: matmul, attention, convolution, normalization
Crypto operations: batch signature verification, hashing, Merkle trees
ZK operations: NTT, MSM, polynomial arithmetic
Lattice crypto: Kyber, Dilithium post-quantum operations
FHE operations: BFV/CKKS homomorphic encryption
DEX operations: AMM swaps, TWAP, order matching

Backend Selection ¶

Backends are automatically detected and selected in this priority order:

CUDA (NVIDIA GPUs)
Metal (Apple Silicon)
WebGPU (cross-platform fallback)

Override the automatic choice with the GPU_BACKEND environment variable or through the API:

session, _ := accel.NewSessionWithBackend(accel.BackendMetal)

The deprecated names LUX_BACKEND, LUX_ACCEL_BACKEND, and CRYPTO_BACKEND (when set to a backend name) are still read for one transition release, and each use logs a deprecation message.
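
The lookup described above can be sketched as follows. This is only an illustration: resolveBackendName is a hypothetical helper, and both the rule that GPU_BACKEND wins over the deprecated names and the order in which the deprecated names are consulted are assumptions, since the documentation does not state them.

```go
package main

import (
	"fmt"
	"os"
)

// deprecatedBackendVars lists the older variable names mentioned in the docs.
// The lookup order among them is an assumption made for this sketch.
var deprecatedBackendVars = []string{"LUX_BACKEND", "LUX_ACCEL_BACKEND", "CRYPTO_BACKEND"}

// resolveBackendName (hypothetical) returns the requested backend name, the
// variable it came from, and whether that variable is deprecated. getenv is
// injected so the logic can be exercised without touching the real environment.
func resolveBackendName(getenv func(string) string) (name, source string, deprecated bool) {
	if v := getenv("GPU_BACKEND"); v != "" {
		return v, "GPU_BACKEND", false
	}
	for _, key := range deprecatedBackendVars {
		if v := getenv(key); v != "" {
			return v, key, true
		}
	}
	return "", "", false
}

func main() {
	name, source, deprecated := resolveBackendName(os.Getenv)
	switch {
	case name == "":
		fmt.Println("no override set; automatic selection applies")
	case deprecated:
		fmt.Printf("using %q from deprecated %s; prefer GPU_BACKEND\n", name, source)
	default:
		fmt.Printf("using %q from %s\n", name, source)
	}
}
```

Injecting getenv keeps the precedence rule testable in isolation from the process environment.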

Runtime Backend Selection ¶

To select a backend based on the operations you need:

// Select best backend for ZK operations
backend, _ := accel.SelectBackend(accel.OpNTT, accel.OpMSM)
session, _ := accel.NewSessionWithBackend(backend)

// Query capabilities
caps, _ := accel.Capabilities(accel.BackendWebGPU)
if caps.Supports(accel.OpMSM) {
    // Use MSM on WebGPU
}

// Compare backends for an operation
comparison, _ := accel.CompareBackends(accel.OpNTT, 10)
fmt.Printf("Fastest backend for NTT: %s\n", comparison.Fastest)

// Print all capabilities
accel.PrintCapabilities()
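
One plausible reading of capability-based selection is "first available backend, in priority order, that supports every requested operation". The sketch below models that with local stand-in types; the capability table is invented for illustration and the real SelectBackend may weigh other factors.

```go
package main

import (
	"errors"
	"fmt"
)

type backend string
type op string

// caps is a hypothetical capability table; real values come from probing.
var caps = map[backend]map[op]bool{
	"cuda":   {"ntt": true, "msm": true},
	"metal":  {"ntt": true},
	"webgpu": {"ntt": true, "msm": true},
}

// priority mirrors the documented order: CUDA, Metal, WebGPU.
var priority = []backend{"cuda", "metal", "webgpu"}

var errNoMatch = errors.New("no available backend supports all requested operations")

// selectBackend returns the first available backend, in priority order,
// that supports every requested operation.
func selectBackend(available map[backend]bool, ops ...op) (backend, error) {
	for _, b := range priority {
		if !available[b] {
			continue
		}
		ok := true
		for _, o := range ops {
			if !caps[b][o] {
				ok = false
				break
			}
		}
		if ok {
			return b, nil
		}
	}
	return "", errNoMatch
}

func main() {
	// Without CUDA, Metal lacks MSM in this toy table,
	// so the selection falls through to WebGPU.
	available := map[backend]bool{"metal": true, "webgpu": true}
	b, err := selectBackend(available, "ntt", "msm")
	fmt.Println(b, err) // prints: webgpu <nil>
}
```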

Pure Go Mode ¶

When built with CGO_ENABLED=0, the package compiles in pure Go mode. All operations return ErrNoBackends, but the package remains importable, so callers can fall back gracefully to CPU implementations.

Basic Usage ¶

// Initialize library
if err := accel.Init(); err != nil {
    log.Printf("GPU accel not available: %v", err)
}
defer accel.Shutdown()

// Check availability
if !accel.Available() {
    // Use CPU fallback
    return
}

// Create session
session, err := accel.NewSession()
if err != nil {
    log.Fatal(err)
}
defer session.Close()

// Create tensors
a, _ := accel.NewTensor[float32](session, []int{1024, 1024})
b, _ := accel.NewTensor[float32](session, []int{1024, 1024})
c, _ := accel.NewTensor[float32](session, []int{1024, 1024})

// Perform GPU operation
if err := session.ML().MatMul(a.Untyped(), b.Untyped(), c.Untyped()); err != nil {
    log.Fatal(err)
}

Integration with Lux Node ¶

The accel package integrates with lux-node for:

Batch signature verification in consensus
Merkle tree computation for state sync
Post-quantum cryptography for future-proofing

See the node/consensus and precompile packages for integration examples.

Index ¶

Constants
Variables
func AllCapabilities() map[BackendType]*BackendCapabilities
func Available() bool
func BLSBatchVerify(pks, sigs, msgs [][]byte) ([]bool, error)
func CUDAAvailable() bool
func CryptoSecp256k1Ecrecover(hash, r, s []byte, v byte) ([]byte, error)
func CryptoSecp256k1EcrecoverBatch(inputs []byte) (pubkeys, statuses []byte, err error)
func DeviceCount() int
func DilithiumSign(msg, sk []byte) (sig []byte, err error)
func DilithiumVerify(msg, sig, pk []byte) (bool, error)
func GetLastError() string
func GetVersion() string
func Init() error
func Keccak256Batch(inputs [][]byte) ([][]byte, error)
func KyberDecaps(ct, sk []byte) (ss []byte, err error)
func KyberEncaps(pk []byte) (ct, ss []byte, err error)
func KyberKeyGen() (pk, sk []byte, err error)
func LatticeNTTMLDSABatch(polys [][]int32, inverse bool) error
func LoadPlugin(path string) error
func MLDSASignBatch(mode int, msgs, sks [][]byte, msgWidth int) ([][]byte, error)
func MLDSAVerifyBatch(mode int, msgs, sigs, pks [][]byte, msgWidth int) ([]bool, error)
func MSM(scalars, bases [][]byte) ([]byte, error)
func MerkleRoot(leaves [][]byte) ([]byte, error)
func MetalAvailable() bool
func NTTForward(coeffs, roots []uint64, modulus uint64) error
func NTTInverse(coeffs, invRoots []uint64, modulus uint64) error
func PrintCapabilities()
func SHA256Batch(inputs [][]byte) ([][]byte, error)
func Shutdown()
func WebGPUAvailable() bool
type BackendCapabilities
- func Capabilities(backend BackendType) (*BackendCapabilities, error)
- func GetCapabilities(backend BackendType) (*BackendCapabilities, error)
- func (c *BackendCapabilities) SupportedOperations() []OperationType
- func (c *BackendCapabilities) Supports(op OperationType) bool
- func (c *BackendCapabilities) SupportsCategory(cat string) bool
type BackendComparison
- func CompareBackends(op OperationType, iterations int) (*BackendComparison, error)
type BackendInfo
type BackendType
- func Backends() []BackendType
- func MustSelectBackend(ops ...OperationType) BackendType
- func SelectBackend(ops ...OperationType) (BackendType, error)
- func SelectBestBackend(ops []OperationType, preferPerformance bool) (BackendType, error)
- func (b BackendType) String() string
type BenchmarkResult
type Candidate
type CryptoOps
type DEXOps
type DType
- func DTypeOf[T TensorElement]() DType
- func (d DType) Size() int
- func (d DType) String() string
type DeviceCaps
- func (c DeviceCaps) Has(cap DeviceCaps) bool
type DeviceInfo
- func Devices() []DeviceInfo
- func (d *DeviceInfo) MemoryGB() float64
type Error
- func (e *Error) Error() string
- func (e *Error) Unwrap() error
type FHEOps
type LatticeOps
type MLOps
type OperationType
- func (o OperationType) Category() string
- func (o OperationType) String() string
type PathReport
- func GPUPaths() PathReport
type Priority
type Provenance
- func (Provenance) GPUPaths() PathReport
type Session
- func DefaultSession() (*Session, error)
- func NewSession(opts ...SessionOption) (*Session, error)
- func NewSessionWithBackend(backend BackendType, opts ...SessionOption) (*Session, error)
- func NewSessionWithDevice(backend BackendType, deviceIndex int, opts ...SessionOption) (*Session, error)
- func (s *Session) Backend() BackendType
- func (s *Session) Close() error
- func (s *Session) Crypto() CryptoOps
- func (s *Session) DEX() DEXOps
- func (s *Session) DeviceInfo() DeviceInfo
- func (s *Session) FHE() FHEOps
- func (s *Session) IsClosed() bool
- func (s *Session) Lattice() LatticeOps
- func (s *Session) ML() MLOps
- func (s *Session) Sync() error
- func (s *Session) SyncContext(ctx context.Context) error
- func (s *Session) ZK() ZKOps
type SessionOption
- func WithAsync(async bool) SessionOption
- func WithBackend(b BackendType) SessionOption
- func WithDevice(index int) SessionOption
type Source
type Tensor
- func NewTensor[T TensorElement](s *Session, shape []int) (*Tensor[T], error)
- func NewTensorWithData[T TensorElement](s *Session, shape []int, data []T) (*Tensor[T], error)
- func (t *Tensor[T]) Bytes() int
- func (t *Tensor[T]) Close()
- func (t *Tensor[T]) DType() DType
- func (t *Tensor[T]) FromSlice(src []T) error
- func (t *Tensor[T]) NDim() int
- func (t *Tensor[T]) NumEl() int
- func (t *Tensor[T]) Shape() []int
- func (t *Tensor[T]) ToSlice() ([]T, error)
- func (t *Tensor[T]) Untyped() *UntypedTensor
type TensorElement
type UntypedTensor
- func (t *UntypedTensor) Bytes() int
- func (t *UntypedTensor) DType() DType
- func (t *UntypedTensor) Handle() uintptr
- func (t *UntypedTensor) NDim() int
- func (t *UntypedTensor) NumEl() int
- func (t *UntypedTensor) Shape() []int
type VMSession
- func NewVMSession(vmID string, opts ...VMSessionOption) (*VMSession, error)
- func (v *VMSession) Close() error
- func (v *VMSession) ID() string
- func (v *VMSession) IsAvailable() bool
- func (v *VMSession) IsClosed() bool
- func (v *VMSession) MemoryUsed() int64
- func (v *VMSession) Priority() Priority
- func (v *VMSession) Session() *Session
- func (v *VMSession) Stats() (uint64, uint64, uint64)
- func (v *VMSession) Submit(ctx context.Context, f func(*Session) error) error
- func (v *VMSession) Sync() error
type VMSessionOption
- func WithMemoryBudget(bytes int64) VMSessionOption
- func WithPriority(p Priority) VMSessionOption
- func WithQueueDepth(n int) VMSessionOption
- func WithSharedDevice() VMSessionOption
- func WithVMBackend(b BackendType) VMSessionOption
type ZKOps

Constants ¶

const (
	BLSBatchVerifyThreshold    = 64  // Min signatures for GPU batch verify
	BLSBatchAggregateThreshold = 128 // Min items for GPU aggregation
	HashBatchThreshold         = 32  // Min items for GPU batch hash
	NTTBatchThreshold          = 4   // Min polynomials for GPU batch NTT
	MSMBatchThreshold          = 64  // Min points for GPU MSM
	KyberBatchThreshold        = 8   // Min operations for GPU batch
	DilithiumBatchThreshold    = 8   // Min operations for GPU batch
)

Batch operation thresholds: the minimum number of items at which GPU acceleration is worthwhile.
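
Callers can apply the same thresholds before dispatching. The sketch below copies two of the values and assumes each threshold is an inclusive lower bound for the GPU path; route is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Values copied from the documented constants.
const (
	hashBatchThreshold = 32
	msmBatchThreshold  = 64
)

// route reports which path a batch of n items would take, assuming the
// threshold is an inclusive lower bound for GPU dispatch.
func route(n, threshold int) string {
	if n >= threshold {
		return "gpu"
	}
	return "cpu"
}

func main() {
	for _, n := range []int{8, 32, 100} {
		fmt.Printf("hash batch of %3d -> %s, msm batch of %3d -> %s\n",
			n, route(n, hashBatchThreshold), n, route(n, msmBatchThreshold))
	}
}
```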

const (
	KyberPublicKeySize  = 1184
	KyberSecretKeySize  = 2400
	KyberCiphertextSize = 1088
	KyberSharedKeySize  = 32
)

Kyber key and ciphertext sizes (ML-KEM-768)

const (
	DilithiumPublicKeySize = 1952
	DilithiumSecretKeySize = 4016
	DilithiumSignatureSize = 3309
)

Dilithium sizes (ML-DSA-65). The DilithiumSecretKeySize=4016 constant predates the FIPS 204 final fix that pinned the ML-DSA-65 secret key at 4032 bytes; new code should prefer the MLDSA* sizes below.

const (
	// Mode IDs.
	MLDSAMode44 = 2
	MLDSAMode65 = 3
	MLDSAMode87 = 5

	// Per-mode tensor widths (FIPS 204).
	MLDSA44PublicKeySize = 1312
	MLDSA44SecretKeySize = 2560
	MLDSA44SignatureSize = 2420

	MLDSA65PublicKeySize = 1952
	MLDSA65SecretKeySize = 4032
	MLDSA65SignatureSize = 3309

	MLDSA87PublicKeySize = 2592
	MLDSA87SecretKeySize = 4896
	MLDSA87SignatureSize = 4627

	// ML-DSA NTT poly width (FIPS 204 fixed at N = 256).
	MLDSANTTPolyLen = 256

	// MLDSABatchThreshold: minimum batch size at which the GPU
	// dispatch path is engaged. Below this, callers should fall
	// through to the per-element CPU oracle to amortise launch cost.
	MLDSABatchThreshold = 8
)

ML-DSA / FIPS 204 sizes per NIST level. Mode encoding matches the luxcpp/crypto/mldsa C ABI: 2 = ML-DSA-44, 3 = ML-DSA-65, 5 = ML-DSA-87.
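
The mode IDs and widths above lend themselves to a lookup table that validates inputs before a batch call. The following is a minimal sketch with hypothetical helper names (sizesForMode, checkBatch); only the numeric values come from the constants.

```go
package main

import (
	"errors"
	"fmt"
)

// mldsaSizes groups the per-mode byte widths listed in the constants.
type mldsaSizes struct {
	PublicKey, SecretKey, Signature int
}

var errUnknownMode = errors.New("unknown ML-DSA mode")

// sizesForMode maps a mode ID (2, 3 or 5) to its FIPS 204 widths.
func sizesForMode(mode int) (mldsaSizes, error) {
	switch mode {
	case 2: // ML-DSA-44
		return mldsaSizes{1312, 2560, 2420}, nil
	case 3: // ML-DSA-65
		return mldsaSizes{1952, 4032, 3309}, nil
	case 5: // ML-DSA-87
		return mldsaSizes{2592, 4896, 4627}, nil
	}
	return mldsaSizes{}, errUnknownMode
}

// checkBatch verifies that every key and signature has the width the mode requires.
func checkBatch(mode int, sigs, pks [][]byte) error {
	sz, err := sizesForMode(mode)
	if err != nil {
		return err
	}
	if len(sigs) != len(pks) {
		return fmt.Errorf("batch size mismatch: %d sigs, %d pks", len(sigs), len(pks))
	}
	for i := range sigs {
		if len(sigs[i]) != sz.Signature || len(pks[i]) != sz.PublicKey {
			return fmt.Errorf("element %d has the wrong width for mode %d", i, mode)
		}
	}
	return nil
}

func main() {
	sz, _ := sizesForMode(3)
	fmt.Printf("%+v\n", sz)
	fmt.Println(checkBatch(3, [][]byte{make([]byte, 3309)}, [][]byte{make([]byte, 1952)}))
}
```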

const Version = "0.1.0"

Version is the library version.

Variables ¶

var (
	// ErrNoBackends indicates no GPU backends are available.
	ErrNoBackends = errors.New("accel: no GPU backends available")

	// ErrNotInitialized indicates the library was not initialized.
	ErrNotInitialized = errors.New("accel: library not initialized")

	// ErrInvalidArgument indicates an invalid argument was provided.
	ErrInvalidArgument = errors.New("accel: invalid argument")

	// ErrOutOfMemory indicates GPU memory allocation failed.
	ErrOutOfMemory = errors.New("accel: out of GPU memory")

	// ErrNotSupported indicates the operation is not supported.
	ErrNotSupported = errors.New("accel: operation not supported")

	// ErrKernelFailed indicates a GPU kernel execution failed.
	ErrKernelFailed = errors.New("accel: kernel execution failed")

	// ErrBackendNotFound indicates the requested backend is not available.
	ErrBackendNotFound = errors.New("accel: backend not found")

	// ErrSessionClosed indicates the session has been closed.
	ErrSessionClosed = errors.New("accel: session closed")

	// ErrShapeMismatch indicates tensor shapes are incompatible.
	ErrShapeMismatch = errors.New("accel: tensor shape mismatch")

	// ErrBatchSizeMismatch indicates mismatched batch input sizes.
	ErrBatchSizeMismatch = errors.New("accel: mismatched batch input sizes")

	// ErrNilInput indicates nil input in batch operation.
	ErrNilInput = errors.New("accel: nil input in batch operation")
)

var BackendPriority = []BackendType{
	BackendCUDA,
	BackendMetal,
	BackendWebGPU,
}

BackendPriority defines the order for automatic backend selection.

var ErrCryptoNativeUnavailable = errors.New("accel: lux_crypto native not built (use -tags=lux_crypto_native)")

ErrCryptoNativeUnavailable indicates the lux_crypto native libraries are not linked into this build. Rebuild with `-tags=lux_crypto_native` and ensure liblux_crypto_secp256k1.a is on the linker path.

var ErrEmptyVMID = errors.New("accel: vmID must not be empty")

ErrEmptyVMID is returned when NewVMSession is called with an empty vmID.

var ErrSessionBudgetExceeded = errors.New("accel: session memory budget exceeded")

ErrSessionBudgetExceeded is returned when an op would exceed the memory cap.

Functions ¶

func AllCapabilities ¶

func AllCapabilities() map[BackendType]*BackendCapabilities

AllCapabilities returns capabilities for all available backends.

func Available ¶

func Available() bool

Available returns true if at least one GPU backend is available.

func BLSBatchVerify ¶

func BLSBatchVerify(pks, sigs, msgs [][]byte) ([]bool, error)

BLSBatchVerify verifies multiple BLS signatures using GPU acceleration. It returns one bool per signature indicating its validity, or ErrNotSupported if no GPU is available or the batch is too small.

func CUDAAvailable ¶

func CUDAAvailable() bool

CUDAAvailable returns true if CUDA backend is available.

func CryptoSecp256k1Ecrecover ¶ added in v1.0.8

func CryptoSecp256k1Ecrecover(hash, r, s []byte, v byte) ([]byte, error)

CryptoSecp256k1Ecrecover stub.

func CryptoSecp256k1EcrecoverBatch ¶ added in v1.0.8

func CryptoSecp256k1EcrecoverBatch(inputs []byte) (pubkeys, statuses []byte, err error)

CryptoSecp256k1EcrecoverBatch stub.

func DeviceCount ¶

func DeviceCount() int

DeviceCount returns the total number of available devices.

func DilithiumSign ¶

func DilithiumSign(msg, sk []byte) (sig []byte, err error)

DilithiumSign signs a message using Dilithium (ML-DSA).

func DilithiumVerify ¶

func DilithiumVerify(msg, sig, pk []byte) (bool, error)

DilithiumVerify verifies a Dilithium signature.

func GetLastError ¶

func GetLastError() string

GetLastError returns the last error message from the C library.

func GetVersion ¶

func GetVersion() string

GetVersion returns the C library version string.

func Init ¶

func Init() error

Init initializes the accel library. Must be called before any other operations. Safe to call multiple times; subsequent calls are no-ops.

func Keccak256Batch ¶

func Keccak256Batch(inputs [][]byte) ([][]byte, error)

Keccak256Batch computes Keccak-256 hashes for multiple inputs on the GPU. It returns ErrNotSupported if no GPU is available or the batch is too small.

func KyberDecaps ¶

func KyberDecaps(ct, sk []byte) (ss []byte, err error)

KyberDecaps decapsulates a ciphertext using a secret key.

func KyberEncaps ¶

func KyberEncaps(pk []byte) (ct, ss []byte, err error)

KyberEncaps encapsulates a shared secret using a public key.

func KyberKeyGen ¶

func KyberKeyGen() (pk, sk []byte, err error)

KyberKeyGen generates a Kyber keypair using GPU acceleration.

func LatticeNTTMLDSABatch ¶ added in v1.2.0

func LatticeNTTMLDSABatch(polys [][]int32, inverse bool) error

LatticeNTTMLDSABatch performs the in-place forward (inverse=false) or inverse (inverse=true) Number-Theoretic Transform over Z_q[X]/(X^256 + 1) with q = 8380417 (the FIPS 204 ML-DSA prime), batched across n=len(polys) polynomials. Each polys[i] MUST have length MLDSANTTPolyLen (256). The transform writes back into the input slice.

Byte-equal to PQCLEAN_MLDSA65_CLEAN_ntt (forward). The dispatcher engages the GPU substrate only when len(polys) >= MLDSABatchThreshold (default 8); below threshold ErrNotSupported is returned so callers route to the per-poly CPU oracle.

func LoadPlugin ¶

func LoadPlugin(path string) error

LoadPlugin explicitly loads a backend plugin from a path.

func MLDSASignBatch ¶ added in v1.2.0

func MLDSASignBatch(mode int, msgs, sks [][]byte, msgWidth int) ([][]byte, error)

MLDSASignBatch signs n messages with ML-DSA at the given FIPS 204 mode in {MLDSAMode44, MLDSAMode65, MLDSAMode87}. msgs and sks are flattened byte slices in row-major order; the returned [][]byte holds one signature per input.

Returns ErrNotSupported when the substrate has not shipped the op.

func MLDSAVerifyBatch ¶ added in v1.2.0

func MLDSAVerifyBatch(mode int, msgs, sigs, pks [][]byte, msgWidth int) ([]bool, error)

MLDSAVerifyBatch verifies n ML-DSA signatures at the given FIPS 204 mode in {MLDSAMode44, MLDSAMode65, MLDSAMode87}. msgs, sigs, pks are flattened byte slices in row-major order with the per-mode widths. results[i] is set to true iff sigs[i] verifies under pks[i] for msgs[i]. The msgWidth argument pins the per-message tensor stride (callers MUST pad to a fixed width — the FIPS 204 verifier hashes the full padded width into mu).

Returns ErrNotSupported when the GPU substrate or libluxaccel impl has not yet shipped the operation — callers should fall through to the per-element CPU verify path. Returns ErrBatchSizeMismatch on shape mismatch, ErrInvalidArgument on unknown mode.

func MSM ¶

func MSM(scalars, bases [][]byte) ([]byte, error)

MSM computes a multi-scalar multiplication: sum(scalars[i] * bases[i]). It returns ErrNotSupported if no GPU is available or the batch is too small.

func MerkleRoot ¶

func MerkleRoot(leaves [][]byte) ([]byte, error)

MerkleRoot computes the Merkle root of leaves on the GPU. It returns ErrNotSupported if no GPU is available or the batch is too small.

func MetalAvailable ¶

func MetalAvailable() bool

MetalAvailable returns true if Metal backend is available.

func NTTForward ¶

func NTTForward(coeffs, roots []uint64, modulus uint64) error

NTTForward computes forward Number Theoretic Transform on a polynomial. Modifies coeffs in-place.

func NTTInverse ¶

func NTTInverse(coeffs, invRoots []uint64, modulus uint64) error

NTTInverse computes inverse Number Theoretic Transform on a polynomial. Modifies coeffs in-place.

func PrintCapabilities ¶

func PrintCapabilities()

PrintCapabilities prints a human-readable summary of backend capabilities.

func SHA256Batch ¶

func SHA256Batch(inputs [][]byte) ([][]byte, error)

SHA256Batch computes SHA-256 hashes for multiple inputs on the GPU. It returns ErrNotSupported if no GPU is available or the batch is too small.

func Shutdown ¶

func Shutdown()

Shutdown releases all library resources. Call when done using the library.

func WebGPUAvailable ¶

func WebGPUAvailable() bool

WebGPUAvailable returns true if WebGPU backend is available.

Types ¶

type BackendCapabilities ¶

type BackendCapabilities struct {
	Backend    BackendType
	Operations map[OperationType]bool
	Categories map[string]bool
}

BackendCapabilities describes what operations a backend supports.

func Capabilities ¶

func Capabilities(backend BackendType) (*BackendCapabilities, error)

Capabilities returns the capabilities for a specific backend.

func GetCapabilities ¶

func GetCapabilities(backend BackendType) (*BackendCapabilities, error)

GetCapabilities returns the capabilities for a backend. This probes the backend to determine which operations are supported.

func (*BackendCapabilities) SupportedOperations ¶

func (c *BackendCapabilities) SupportedOperations() []OperationType

SupportedOperations returns a list of supported operations.

func (*BackendCapabilities) Supports ¶

func (c *BackendCapabilities) Supports(op OperationType) bool

Supports returns true if the backend supports the operation.

func (*BackendCapabilities) SupportsCategory ¶

func (c *BackendCapabilities) SupportsCategory(cat string) bool

SupportsCategory returns true if the backend supports any operation in the category.

type BackendComparison ¶

type BackendComparison struct {
	Operation OperationType
	Results   map[BackendType]BenchmarkResult
	Fastest   BackendType
}

BackendComparison holds benchmark results across backends.

func CompareBackends ¶

func CompareBackends(op OperationType, iterations int) (*BackendComparison, error)

CompareBackends runs a quick benchmark of an operation across all backends and returns the results for comparison. If a backend does not support the operation, its entry in Results carries a non-nil Error.

type BackendInfo ¶

type BackendInfo struct {
	Type        BackendType
	Name        string
	APIVersion  int
	DeviceCount int
}

BackendInfo provides information about an available backend.

type BackendType ¶

type BackendType int

BackendType identifies a GPU compute backend.

const (
	// BackendAuto selects the best available backend automatically.
	// Priority: CUDA > Metal > WebGPU
	BackendAuto BackendType = iota

	// BackendMetal uses Apple Metal (macOS/iOS).
	BackendMetal

	// BackendWebGPU uses WebGPU via Dawn (cross-platform).
	BackendWebGPU

	// BackendCUDA uses NVIDIA CUDA.
	BackendCUDA
)

func Backends ¶

func Backends() []BackendType

Backends returns a list of available backend types.

func MustSelectBackend ¶

func MustSelectBackend(ops ...OperationType) BackendType

MustSelectBackend returns the best backend or panics on error.

func SelectBackend ¶

func SelectBackend(ops ...OperationType) (BackendType, error)

SelectBackend returns the best backend for the given operations. If ops is empty, returns the highest priority available backend.

func SelectBestBackend ¶

func SelectBestBackend(ops []OperationType, preferPerformance bool) (BackendType, error)

SelectBestBackend returns the best available backend for a set of operations. It considers backend availability, capability support, and optionally performance.

func (BackendType) String ¶

func (b BackendType) String() string

String returns the backend name.

type BenchmarkResult ¶

type BenchmarkResult struct {
	Backend   BackendType
	Operation OperationType
	Duration  time.Duration
	Error     error
}

BenchmarkResult holds timing data for an operation.

type Candidate ¶ added in v1.1.9

type Candidate struct {
	Source     Source
	IncludeDir string
	LibDir     string
	IncludeOK  bool // true if hqc.h is readable at IncludeDir/lux/gpu/hqc.h
	LibOK      bool // true if libluxgpu_hqc.a is readable at LibDir
}

Candidate is one entry in the discovery search list.

type CryptoOps ¶

type CryptoOps interface {
	// SHA256 computes SHA-256 hashes for a batch of inputs.
	// input: [N, input_len] bytes
	// output: [N, 32] bytes
	SHA256(input, output *UntypedTensor) error

	// Keccak256 computes Keccak-256 (Ethereum hash) for a batch.
	// input: [N, input_len] bytes
	// output: [N, 32] bytes
	Keccak256(input, output *UntypedTensor) error

	// Poseidon computes Poseidon hash (ZK-friendly).
	// input: [N, field_elements] uint64
	// output: [N, 1] uint64
	Poseidon(input, output *UntypedTensor) error

	// ECDSAVerifyBatch verifies multiple ECDSA signatures in parallel.
	// messages: [N, 32] bytes (message hashes)
	// signatures: [N, 64] bytes (r || s)
	// pubkeys: [N, 33] bytes (compressed) or [N, 65] (uncompressed)
	// results: [N] uint8 (1 = valid, 0 = invalid)
	ECDSAVerifyBatch(messages, signatures, pubkeys, results *UntypedTensor) error

	// Ed25519VerifyBatch verifies multiple Ed25519 signatures.
	// messages: [N, msg_len] bytes
	// signatures: [N, 64] bytes
	// pubkeys: [N, 32] bytes
	// results: [N] uint8 (1 = valid, 0 = invalid)
	Ed25519VerifyBatch(messages, signatures, pubkeys, results *UntypedTensor) error

	// BLSVerifyBatch verifies multiple BLS signatures.
	// messages: [N, msg_len] bytes
	// signatures: [N, 96] bytes (G2 points)
	// pubkeys: [N, 48] bytes (G1 points)
	// results: [N] uint8 (1 = valid, 0 = invalid)
	BLSVerifyBatch(messages, signatures, pubkeys, results *UntypedTensor) error

	// BLSAggregate aggregates multiple BLS signatures into one.
	// signatures: [N, 96] bytes
	// aggregated: [96] bytes
	BLSAggregate(signatures, aggregated *UntypedTensor) error

	// MerkleRoot computes Merkle root from leaves.
	// leaves: [N, 32] bytes (N must be power of 2)
	// root: [32] bytes
	MerkleRoot(leaves, root *UntypedTensor) error

	// MerkleBatch computes multiple Merkle roots in parallel.
	// leavesSet: [M, N, 32] bytes
	// roots: [M, 32] bytes
	MerkleBatch(leavesSet, roots *UntypedTensor) error

	// MerkleProof generates Merkle proof for a leaf.
	// leaves: [N, 32] bytes
	// leafIndex: index of the leaf
	// proof: [log2(N), 32] bytes
	MerkleProof(leaves *UntypedTensor, leafIndex int, proof *UntypedTensor) error
}

CryptoOps provides GPU-accelerated cryptographic operations.

type DEXOps ¶

type DEXOps interface {
	// ConstantProductSwap computes AMM swap output using x*y=k formula.
	// reserveX: [N] uint64 - X token reserves
	// reserveY: [N] uint64 - Y token reserves
	// amountIn: [N] uint64 - input amounts
	// xToY: true for X→Y swap, false for Y→X
	// amountOut: [N] uint64 - output amounts
	// fee: fee percentage (e.g., 0.003 for 0.3%)
	ConstantProductSwap(reserveX, reserveY, amountIn *UntypedTensor, xToY bool, amountOut *UntypedTensor, fee float32) error

	// ConstantProductSwapBatch processes multiple swaps.
	// reserves: [M, 2] uint64 (reserveX, reserveY per pool)
	// swaps: [N, 3] uint64 (poolIndex, amountIn, direction)
	// amounts: [N] uint64 output amounts
	ConstantProductSwapBatch(reserves, swaps, amounts *UntypedTensor, fee float32) error

	// ComputeTWAP computes time-weighted average price.
	// prices: [N] uint64 - historical prices
	// timestamps: [N] uint64 - timestamps
	// start, end: time range
	// twap: [1] uint64 output
	ComputeTWAP(prices, timestamps *UntypedTensor, start, end uint64, twap *UntypedTensor) error

	// MatchOrders matches bid/ask orders.
	// bids: [N, 3] uint64 (price, quantity, orderId)
	// asks: [M, 3] uint64 (price, quantity, orderId)
	// matches: output (bidId, askId, quantity, price)
	// prices: fill prices
	// amounts: fill amounts
	MatchOrders(bids, asks, matches, prices, amounts *UntypedTensor) error

	// MatchOrdersWithPriority matches orders with time/price priority.
	// bids: [N, 4] uint64 (price, quantity, orderId, timestamp)
	// asks: [M, 4] uint64
	MatchOrdersWithPriority(bids, asks, matches *UntypedTensor) error

	// ComputeLiquidity computes concentrated liquidity positions (Uniswap V3 style).
	// tickLower: [N] int32 - lower tick
	// tickUpper: [N] int32 - upper tick
	// amounts: [N, 2] uint64 (amount0, amount1)
	// liquidity: [N] uint128 output
	ComputeLiquidity(tickLower, tickUpper, amounts, liquidity *UntypedTensor) error

	// ComputePositionValue computes position value at current price.
	// liquidity: [N] uint128
	// tickLower: [N] int32
	// tickUpper: [N] int32
	// currentTick: current price tick
	// values: [N, 2] uint64 (token0, token1)
	ComputePositionValue(liquidity, tickLower, tickUpper *UntypedTensor, currentTick int32, values *UntypedTensor) error

	// CalculateFees computes accumulated fees for positions.
	// liquidity: [N] uint128
	// feeGrowthInside0: [N] uint256
	// feeGrowthInside1: [N] uint256
	// fees: [N, 2] uint64 output
	CalculateFees(liquidity, feeGrowthInside0, feeGrowthInside1, fees *UntypedTensor) error

	// BatchSettlement settles multiple trades atomically.
	// trades: [N, 4] uint64 (buyer, seller, token, amount)
	// balances: [M, T] uint64 (M users, T tokens)
	// newBalances: output
	BatchSettlement(trades, balances, newBalances *UntypedTensor) error
}

DEXOps provides GPU-accelerated DEX (decentralized exchange) operations.
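
As a point of reference for ConstantProductSwap, the x*y=k output for a single pool can be computed as in the sketch below. Expressing the fee in basis points, rounding down, and the name swapOut are assumptions of the sketch; the interface itself takes the fee as a float32 fraction and does not document its rounding.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// swapOut returns the output amount of an x*y=k swap:
//
//	in' = amountIn * (10000 - feeBps) / 10000
//	out = reserveOut * in' / (reserveIn + in')
//
// Numerator and denominator are both scaled by 10000 so that only one
// integer division (rounding down) is needed.
func swapOut(reserveIn, reserveOut, amountIn, feeBps uint64) (uint64, error) {
	if reserveIn == 0 || reserveOut == 0 || feeBps >= 10000 {
		return 0, errors.New("swap: empty pool or invalid fee")
	}
	inWithFee := new(big.Int).Mul(new(big.Int).SetUint64(amountIn), new(big.Int).SetUint64(10000-feeBps))
	num := new(big.Int).Mul(inWithFee, new(big.Int).SetUint64(reserveOut))
	den := new(big.Int).Mul(new(big.Int).SetUint64(reserveIn), big.NewInt(10000))
	den.Add(den, inWithFee)
	// The quotient is always below reserveOut, so it fits in a uint64.
	return num.Quo(num, den).Uint64(), nil
}

func main() {
	out, err := swapOut(1_000_000, 1_000_000, 10_000, 30) // 0.30% fee
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints: 9871
}
```

math/big avoids the uint64 overflow that the intermediate products would hit with realistic reserve sizes.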

type DType ¶

type DType int

DType represents a tensor data type.

const (
	Float32 DType = iota
	Float16
	Float64
	Int32
	Int64
	Uint8
	Uint32
	Uint64
)

func DTypeOf ¶

func DTypeOf[T TensorElement]() DType

DTypeOf returns the DType for a Go type.

func (DType) Size ¶

func (d DType) Size() int

Size returns the byte size of a single element.
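
The element widths follow from the type names, so a tensor's footprint is its element count times the element width. The sketch below mirrors the DType constants with local stand-ins (dtype, tensorBytes are hypothetical names) rather than calling the package.

```go
package main

import "fmt"

type dtype int

// Same order as the DType constants.
const (
	float32T dtype = iota
	float16T
	float64T
	int32T
	int64T
	uint8T
	uint32T
	uint64T
)

// size returns the width in bytes of one element.
func (d dtype) size() int {
	switch d {
	case uint8T:
		return 1
	case float16T:
		return 2
	case float32T, int32T, uint32T:
		return 4
	case float64T, int64T, uint64T:
		return 8
	}
	return 0
}

// tensorBytes multiplies the element count implied by shape by the element width.
func tensorBytes(shape []int, d dtype) int {
	n := 1
	for _, dim := range shape {
		n *= dim
	}
	return n * d.size()
}

func main() {
	// A 1024x1024 float32 tensor, as in Basic Usage, occupies 4 MiB.
	fmt.Println(tensorBytes([]int{1024, 1024}, float32T)) // prints: 4194304
}
```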

func (DType) String ¶

func (d DType) String() string

String returns the dtype name.

type DeviceCaps ¶

type DeviceCaps uint32

DeviceCaps represents device capability flags.

const (
	CapFP16         DeviceCaps = 1 << iota // Half-precision float support
	CapFP64                                // Double precision support
	CapSubgroups                           // Subgroup/warp operations
	CapInt64Atomics                        // 64-bit atomic operations
)

func (DeviceCaps) Has ¶

func (c DeviceCaps) Has(cap DeviceCaps) bool

Has returns true if the device has the specified capability.
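
Capability flags of this shape are usually tested with a bit mask. The sketch below reproduces the flag layout with local names; whether the package's Has treats a multi-bit argument as "all bits" or "any bit" is not documented, and "all bits" is assumed here.

```go
package main

import "fmt"

type deviceCaps uint32

// Same flag layout as the DeviceCaps constants: one bit per capability.
const (
	capFP16 deviceCaps = 1 << iota
	capFP64
	capSubgroups
	capInt64Atomics
)

// has reports whether every bit in want is set in c.
func (c deviceCaps) has(want deviceCaps) bool {
	return c&want == want
}

func main() {
	dev := capFP16 | capSubgroups
	fmt.Println(dev.has(capFP16), dev.has(capFP64), dev.has(capFP16|capSubgroups))
	// prints: true false true
}
```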

type DeviceInfo ¶

type DeviceInfo struct {
	Backend          BackendType
	Index            int
	Name             string
	Vendor           string
	IsDiscrete       bool
	IsUnifiedMemory  bool
	TotalMemory      uint64 // bytes
	MaxBufferSize    uint64 // bytes
	MaxWorkgroupSize uint32
	SIMDWidth        uint32
	Capabilities     DeviceCaps
}

DeviceInfo contains information about a compute device.

func Devices ¶

func Devices() []DeviceInfo

Devices returns information about all available devices across all backends.

func (*DeviceInfo) MemoryGB ¶

func (d *DeviceInfo) MemoryGB() float64

MemoryGB returns total memory in gigabytes.

type Error ¶

type Error struct {
	Op      string // Operation that failed
	Backend BackendType
	Err     error
	Detail  string // Additional detail from C library
}

Error wraps an error with additional context from the C library.

func (*Error) Error ¶

func (e *Error) Error() string

func (*Error) Unwrap ¶

func (e *Error) Unwrap() error

type FHEOps ¶

type FHEOps interface {
	// BFVEncrypt encrypts plaintext with BFV scheme.
	// plaintext: [N] int64 values (N ≤ poly_modulus_degree)
	// pk: public key
	// ciphertext: output ciphertext
	BFVEncrypt(plaintext, pk, ciphertext *UntypedTensor) error

	// BFVEncryptBatch encrypts multiple plaintexts.
	// plaintexts: [M, N] int64
	// pk: public key
	// ciphertexts: [M, ...] output
	BFVEncryptBatch(plaintexts, pk, ciphertexts *UntypedTensor) error

	// BFVDecrypt decrypts ciphertext.
	// ciphertext: input ciphertext
	// sk: secret key
	// plaintext: [N] int64 output
	BFVDecrypt(ciphertext, sk, plaintext *UntypedTensor) error

	// BFVAdd adds two ciphertexts.
	// ct1, ct2: input ciphertexts
	// result: output ciphertext
	BFVAdd(ct1, ct2, result *UntypedTensor) error

	// BFVMultiply multiplies ciphertexts with relinearization.
	// ct1, ct2: input ciphertexts
	// relinKey: relinearization key
	// result: output ciphertext
	BFVMultiply(ct1, ct2, relinKey, result *UntypedTensor) error

	// BFVMultiplyPlain multiplies ciphertext by plaintext.
	// ct: input ciphertext
	// plain: [N] int64 plaintext
	// result: output ciphertext
	BFVMultiplyPlain(ct, plain, result *UntypedTensor) error

	// BFVRotate rotates ciphertext slots.
	// ct: input ciphertext
	// galoisKey: Galois key for rotation
	// steps: rotation amount (positive = left)
	// result: output ciphertext
	BFVRotate(ct, galoisKey *UntypedTensor, steps int, result *UntypedTensor) error

	// CKKSEncrypt encrypts with CKKS (approximate arithmetic).
	// plaintext: [N] float64 values
	// pk: public key
	// scale: encoding scale
	// ciphertext: output
	CKKSEncrypt(plaintext, pk *UntypedTensor, scale float64, ciphertext *UntypedTensor) error

	// CKKSDecrypt decrypts CKKS ciphertext.
	// ciphertext: input
	// sk: secret key
	// plaintext: [N] float64 output
	CKKSDecrypt(ciphertext, sk, plaintext *UntypedTensor) error

	// CKKSAdd adds two CKKS ciphertexts.
	CKKSAdd(ct1, ct2, result *UntypedTensor) error

	// CKKSMultiply multiplies CKKS ciphertexts.
	CKKSMultiply(ct1, ct2, relinKey, result *UntypedTensor) error

	// CKKSRescale rescales ciphertext after multiplication.
	CKKSRescale(ct, result *UntypedTensor) error

	// CKKSRotate rotates CKKS slots.
	CKKSRotate(ct, galoisKey *UntypedTensor, steps int, result *UntypedTensor) error

	// Bootstrap refreshes ciphertext noise level (limited support).
	Bootstrap(ct, bootstrapKey, result *UntypedTensor) error
}

FHEOps provides GPU-accelerated fully homomorphic encryption operations. Supports BFV (exact arithmetic) and CKKS (approximate arithmetic) schemes.

type LatticeOps ¶

type LatticeOps interface {
	// KyberKeyGen generates Kyber (ML-KEM) key pair.
	// pk: [1184] bytes (Kyber768 public key)
	// sk: [2400] bytes (Kyber768 secret key)
	KyberKeyGen(pk, sk *UntypedTensor) error

	// KyberKeyGenBatch generates multiple key pairs in parallel.
	// pk: [N, 1184] bytes
	// sk: [N, 2400] bytes
	KyberKeyGenBatch(pk, sk *UntypedTensor) error

	// KyberEncaps encapsulates shared secret.
	// pk: [1184] bytes public key
	// ct: [1088] bytes ciphertext output
	// ss: [32] bytes shared secret output
	KyberEncaps(pk, ct, ss *UntypedTensor) error

	// KyberEncapsBatch performs batch encapsulation.
	// pk: [N, 1184] bytes
	// ct: [N, 1088] bytes
	// ss: [N, 32] bytes
	KyberEncapsBatch(pk, ct, ss *UntypedTensor) error

	// KyberDecaps decapsulates shared secret.
	// ct: [1088] bytes ciphertext
	// sk: [2400] bytes secret key
	// ss: [32] bytes shared secret output
	KyberDecaps(ct, sk, ss *UntypedTensor) error

	// KyberDecapsBatch performs batch decapsulation.
	// ct: [N, 1088] bytes
	// sk: [N, 2400] bytes
	// ss: [N, 32] bytes
	KyberDecapsBatch(ct, sk, ss *UntypedTensor) error

	// DilithiumKeyGen generates Dilithium (ML-DSA) key pair.
	// pk: [1952] bytes (Dilithium3 public key)
	// sk: [4016] bytes (Dilithium3 secret key)
	DilithiumKeyGen(pk, sk *UntypedTensor) error

	// DilithiumSign signs a message.
	// msg: [msg_len] bytes message
	// sk: [4016] bytes secret key
	// sig: [3293] bytes signature output
	DilithiumSign(msg, sk, sig *UntypedTensor) error

	// DilithiumSignBatch signs multiple messages in parallel.
	// msgs: [N, msg_len] bytes
	// sk: [4016] bytes (same key for all)
	// sigs: [N, 3293] bytes
	DilithiumSignBatch(msgs, sk, sigs *UntypedTensor) error

	// DilithiumVerify verifies a signature.
	// msg: [msg_len] bytes
	// sig: [3293] bytes
	// pk: [1952] bytes
	// Returns true if valid.
	DilithiumVerify(msg, sig, pk *UntypedTensor) (bool, error)

	// DilithiumVerifyBatch verifies multiple signatures.
	// msgs: [N, msg_len] bytes
	// sigs: [N, 3293] bytes
	// pks: [N, 1952] bytes
	// results: [N] uint8 (1 = valid, 0 = invalid)
	DilithiumVerifyBatch(msgs, sigs, pks, results *UntypedTensor) error

	// MLDSAVerifyBatch verifies a batch of ML-DSA / Dilithium signatures at the
	// given FIPS 204 NIST level. Unlike DilithiumVerifyBatch (which is pinned to
	// ML-DSA-65 / Dilithium3 for backwards compatibility), this entry point
	// accepts mode in {2, 3, 5} for ML-DSA-44, ML-DSA-65, ML-DSA-87 respectively.
	//
	// Tensor shapes (n = batch size, per FIPS 204):
	//   ML-DSA-44 : pk=1312  sig=2420
	//   ML-DSA-65 : pk=1952  sig=3309
	//   ML-DSA-87 : pk=2592  sig=4627
	//
	// msgs    : LUX_DTYPE_U8, shape [n, msg_width] (zero-padded right)
	// sigs    : LUX_DTYPE_U8, shape [n, sig_bytes]
	// pks     : LUX_DTYPE_U8, shape [n, pk_bytes]
	// results : LUX_DTYPE_U8, shape [n] (1 = valid, 0 = invalid)
	//
	// FIPS 204 verify is deterministic, so GPU and CPU paths produce
	// byte-identical accept/reject decisions per element. The results vector is
	// dense (no early abort) so callers can audit per-signer outcomes.
	MLDSAVerifyBatch(mode int, msgs, sigs, pks, results *UntypedTensor) error

	// MLDSASignBatch signs a batch of messages with ML-DSA / Dilithium at the
	// given FIPS 204 NIST level. mode in {2, 3, 5}.
	//
	// Sizes (FIPS 204):
	//   ML-DSA-44 : sk=2560  sig=2420
	//   ML-DSA-65 : sk=4032  sig=3309
	//   ML-DSA-87 : sk=4896  sig=4627
	//
	// msgs : [n, msg_width] bytes (zero-padded right)
	// sks  : [n, sk_bytes]  bytes
	// sigs : [n, sig_bytes] bytes
	//
	// ML-DSA signing is reproducible only in the deterministic variant; the
	// hedged variant mixes in fresh randomness (FIPS 204 §3.4). The GPU path
	// must select the same variant as the caller-side CPU reference to remain
	// byte-equal for KAT.
	MLDSASignBatch(mode int, msgs, sks, sigs *UntypedTensor) error

	// SLHDSASignBatch signs a batch of messages with SLH-DSA / Magnetar (FIPS 205).
	// mode encodes the parameter set:
	//   2  = SHA2-128f, 3  = SHA2-192f, 5  = SHA2-256f
	//   12 = SHAKE-128f, 13 = SHAKE-192f, 15 = SHAKE-256f
	// msgs: [N, msg_width] bytes (zero-padded right)
	// sks:  [N, sk_bytes]  bytes (per-mode: 64 / 96 / 128)
	// sigs: [N, sig_bytes] bytes (per-mode: 17088 / 35664 / 49856 for 'f')
	SLHDSASignBatch(mode int, msgs, sks, sigs *UntypedTensor) error

	// SLHDSAVerifyBatch verifies a batch of SLH-DSA / Magnetar (FIPS 205)
	// signatures. mode encoding as for SLHDSASignBatch. Results vector is
	// dense (no early abort) so callers can audit per-signer outcomes.
	// msgs:    [N, msg_width] bytes
	// sigs:    [N, sig_bytes] bytes
	// pks:     [N, pk_bytes]  bytes (per-mode: 32 / 48 / 64)
	// results: [N] uint8 (1 = valid, 0 = invalid)
	SLHDSAVerifyBatch(mode int, msgs, sigs, pks, results *UntypedTensor) error

	// PolynomialNTT performs NTT in lattice polynomial ring.
	// Operates on polynomials in Z_q[X]/(X^256 + 1).
	PolynomialNTT(input, output *UntypedTensor, q uint32) error

	// PolynomialINTT performs inverse NTT.
	PolynomialINTT(input, output *UntypedTensor, q uint32) error

	// PolynomialMul multiplies polynomials in NTT domain.
	PolynomialMul(a, b, c *UntypedTensor, q uint32) error

	// PolynomialAdd adds polynomials.
	PolynomialAdd(a, b, c *UntypedTensor, q uint32) error

	// LatticeNTTMLDSABatch performs the in-place forward (inverse=false) or
	// inverse (inverse=true) NTT over Z_q[X]/(X^256 + 1) with q = 8380417 (the
	// ML-DSA / FIPS 204 prime), batched across N polynomials.
	//
	// Tensor shape:
	//   polys : LUX_DTYPE_I32, shape [N, 256] — in-place transform
	//
	// Byte-equal to PQCLEAN_MLDSA65_CLEAN_ntt (forward) and
	// PQCLEAN_MLDSA65_CLEAN_invntt_tomont (inverse). Pulsar Round-2 fits the
	// batch=22 dispatch shape; below the GPU-dispatch threshold the caller
	// MUST route to the per-poly CPU oracle (the threshold gating lives in
	// the consumer, not in this primitive).
	LatticeNTTMLDSABatch(polys *UntypedTensor, inverse bool) error
}

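LatticeOps provides GPU-accelerated lattice-based cryptography operations. It implements the NIST post-quantum standards ML-KEM (Kyber) and ML-DSA (Dilithium), and also exposes batch signing and verification for the hash-based SLH-DSA scheme (FIPS 205).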
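The batch ML-DSA entry points take a mode and expect row-major byte tensors whose widths depend on that mode. A minimal sketch of the host-side bookkeeping follows, using only the sizes quoted in the interface comments. The helper names sizesForMode and packMessages are hypothetical and not part of accel, and choosing msg_width as the longest message is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// mldsaSizes holds the FIPS 204 byte lengths quoted in the LatticeOps comments.
type mldsaSizes struct {
	PK, SK, Sig int
}

// sizesForMode maps the mode argument (2, 3, 5) to the documented sizes.
// The helper is hypothetical; only the numbers come from the documentation.
func sizesForMode(mode int) (mldsaSizes, error) {
	switch mode {
	case 2: // ML-DSA-44
		return mldsaSizes{PK: 1312, SK: 2560, Sig: 2420}, nil
	case 3: // ML-DSA-65
		return mldsaSizes{PK: 1952, SK: 4032, Sig: 3309}, nil
	case 5: // ML-DSA-87
		return mldsaSizes{PK: 2592, SK: 4896, Sig: 4627}, nil
	}
	return mldsaSizes{}, errors.New("mode must be 2, 3, or 5")
}

// packMessages lays messages out as a row-major [n, width] byte buffer,
// zero-padded on the right. width is the length of the longest message.
func packMessages(msgs [][]byte) (buf []byte, width int) {
	for _, m := range msgs {
		if len(m) > width {
			width = len(m)
		}
	}
	buf = make([]byte, len(msgs)*width)
	for i, m := range msgs {
		copy(buf[i*width:(i+1)*width], m)
	}
	return buf, width
}

func main() {
	s, err := sizesForMode(3)
	if err != nil {
		panic(err)
	}
	buf, width := packMessages([][]byte{[]byte("hi"), []byte("hello")})
	fmt.Println(s.PK, s.Sig, width, len(buf))
}
```

The packed buffer and the per-mode widths would then determine the shapes passed to NewTensorWithData. Zero padding does not record each message's original length, and how the library accounts for that is not stated here.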

type MLOps ¶

type MLOps interface {
	// MatMul performs matrix multiplication: C = A @ B
	MatMul(a, b, c *UntypedTensor) error

	// MatMulTranspose performs C = op(A) @ op(B), where op transposes its
	// operand when the matching flag (transposeA, transposeB) is set.
	MatMulTranspose(a, b, c *UntypedTensor, transposeA, transposeB bool) error

	// ReLU applies rectified linear unit: y = max(0, x)
	ReLU(input, output *UntypedTensor) error

	// GELU applies Gaussian error linear unit activation
	GELU(input, output *UntypedTensor) error

	// Softmax applies softmax along an axis
	Softmax(input, output *UntypedTensor, axis int) error

	// LayerNorm applies layer normalization
	LayerNorm(input, gamma, beta, output *UntypedTensor, eps float32) error

	// Attention computes scaled dot-product attention
	// output = softmax(Q @ K^T / scale) @ V
	Attention(q, k, v, output *UntypedTensor, scale float32) error

	// Conv2D performs 2D convolution
	Conv2D(input, kernel, output *UntypedTensor, stride, padding [2]int) error

	// MaxPool2D performs 2D max pooling
	MaxPool2D(input, output *UntypedTensor, kernelSize, stride [2]int) error

	// BatchNorm applies batch normalization
	BatchNorm(input, gamma, beta, mean, variance, output *UntypedTensor, eps float32) error

	// Dropout applies dropout with given probability (inference mode)
	Dropout(input, output *UntypedTensor, p float32) error

	// Add performs element-wise addition
	Add(a, b, c *UntypedTensor) error

	// Multiply performs element-wise multiplication
	Multiply(a, b, c *UntypedTensor) error

	// Sum reduces tensor along specified axes
	Sum(input, output *UntypedTensor, axes []int) error

	// Mean reduces tensor along specified axes
	Mean(input, output *UntypedTensor, axes []int) error
}

MLOps provides GPU-accelerated machine learning operations.
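When validating a GPU result it can help to have a small CPU reference for the element-wise and reduction ops. The sketch below implements ReLU and a one-dimensional softmax in plain Go. It is not the library's implementation, and the max-subtraction used for numerical stability is a common choice rather than something the documentation specifies.

```go
package main

import (
	"fmt"
	"math"
)

// relu returns max(0, x) element-wise.
func relu(x []float32) []float32 {
	out := make([]float32, len(x))
	for i, v := range x {
		if v > 0 {
			out[i] = v
		}
	}
	return out
}

// softmax normalizes a 1-D slice so its entries are positive and sum to 1.
// The maximum is subtracted first so that exp never overflows.
func softmax(x []float32) []float32 {
	out := make([]float32, len(x))
	if len(x) == 0 {
		return out
	}
	maxV := x[0]
	for _, v := range x[1:] {
		if v > maxV {
			maxV = v
		}
	}
	exps := make([]float64, len(x))
	var sum float64
	for i, v := range x {
		exps[i] = math.Exp(float64(v - maxV))
		sum += exps[i]
	}
	for i := range out {
		out[i] = float32(exps[i] / sum)
	}
	return out
}

func main() {
	fmt.Println(relu([]float32{-1, 0, 2}))
	fmt.Println(softmax([]float32{1, 2, 3}))
}
```

A comparison against the GPU path would copy the output tensor back with ToSlice and check each element within a tolerance suited to float32.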

type OperationType ¶

type OperationType int

OperationType identifies a type of compute operation.

const (
	// ML Operations
	OpMatMul OperationType = iota
	OpReLU
	OpGELU
	OpSoftmax
	OpLayerNorm
	OpAttention

	// Crypto Operations
	OpSHA256
	OpKeccak256
	OpPoseidon
	OpECDSAVerify
	OpEd25519Verify
	OpBLSVerify
	OpMerkleRoot

	// ZK Operations
	OpNTT
	OpINTT
	OpMSM
	OpPolyMul

	// FHE Operations
	OpBFVEncrypt
	OpBFVDecrypt
	OpBFVAdd
	OpBFVMul

	// Lattice Operations
	OpKyberKeyGen
	OpKyberEncaps
	OpKyberDecaps
	OpDilithiumSign
	OpDilithiumVerify

	// DEX Operations
	OpConstantProductSwap
	OpTWAP
	OpOrderMatch
)

func (OperationType) Category ¶

func (o OperationType) Category() string

Category returns the operation category.

func (OperationType) String ¶

func (o OperationType) String() string

String returns the operation name.
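An iota-based enum grouped by category, like the constant block above, lets Category be derived from range boundaries rather than a per-value table. The following is a reduced, hypothetical mirror with two operations per group. The names, the "Unknown" fallback, and the category strings are assumptions, not the values accel returns.

```go
package main

import "fmt"

// opType is a reduced stand-in for an operation enum grouped by category.
type opType int

const (
	opMatMul opType = iota // ML group starts here
	opReLU
	opSHA256 // crypto group starts here
	opMerkleRoot
	opNTT // ZK group starts here
	opMSM
)

var opNames = [...]string{"MatMul", "ReLU", "SHA256", "MerkleRoot", "NTT", "MSM"}

// String returns the operation name, or "Unknown" for out-of-range values.
func (o opType) String() string {
	if o < 0 || int(o) >= len(opNames) {
		return "Unknown"
	}
	return opNames[o]
}

// Category uses the first constant of each group as a boundary.
func (o opType) Category() string {
	switch {
	case o < 0 || o > opMSM:
		return "unknown"
	case o < opSHA256:
		return "ml"
	case o < opNTT:
		return "crypto"
	default:
		return "zk"
	}
}

func main() {
	for _, o := range []opType{opReLU, opMerkleRoot, opMSM, opType(42)} {
		fmt.Println(o, o.Category())
	}
}
```

The boundary approach only stays correct while new constants are appended inside their own group, which is one reason to keep the groups contiguous.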

type PathReport ¶ added in v1.1.9

type PathReport struct {
	// IncludeDir is the path that contains `lux/gpu/hqc.h`.
	IncludeDir string

	// LibDir is the path that contains `libluxgpu_hqc.a` (or `.so`
	// / `.dylib` for shared installs).
	LibDir string

	// Library is the static library file the linker would use.
	// Empty when SourceMissing.
	Library string

	// Source tags which prefix in the fallback chain resolved.
	Source Source

	// Candidates lists every prefix that was probed, in order. The
	// first entry whose include + lib are both present is the
	// resolved one; callers can show this slice in diagnostics to
	// explain why a particular install was chosen.
	Candidates []Candidate
}

PathReport names the resolved GPU substrate location.

func GPUPaths ¶ added in v1.1.9

func GPUPaths() PathReport

GPUPaths is a package-level convenience that calls Provenance{}.GPUPaths().

type Priority ¶ added in v1.0.9

type Priority int

Priority assigns a relative GPU scheduling weight to a VMSession. When multiple sessions contend for the queue, the session with the higher value runs first.

const (
	PriorityLow      Priority = 1
	PriorityNormal   Priority = 5
	PriorityHigh     Priority = 10
	PriorityCritical Priority = 100
)
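To illustrate "higher value runs first", the sketch below orders a set of pending sessions by priority with a stable sort. The pending type and dispatchOrder function are hypothetical, and keeping arrival order among equal priorities is an assumption; the documentation does not say how ties are broken.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type priority int

// Values copied from the Priority constants above.
const (
	priorityLow      priority = 1
	priorityNormal   priority = 5
	priorityHigh     priority = 10
	priorityCritical priority = 100
)

// pending is a hypothetical queue entry: one VM waiting for the GPU.
type pending struct {
	vmID string
	prio priority
}

// dispatchOrder returns VM IDs in run order: higher priority first, with
// arrival order preserved among equal priorities (stable sort).
func dispatchOrder(queue []pending) string {
	ordered := append([]pending(nil), queue...) // leave the input untouched
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].prio > ordered[j].prio
	})
	ids := make([]string, len(ordered))
	for i, p := range ordered {
		ids[i] = p.vmID
	}
	return strings.Join(ids, ",")
}

func main() {
	queue := []pending{
		{"evm", priorityNormal},
		{"consensus", priorityCritical},
		{"dex", priorityHigh},
		{"indexer", priorityLow},
	}
	fmt.Println(dispatchOrder(queue))
}
```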

type Provenance ¶ added in v1.1.9

type Provenance struct{}

Provenance reports how the GPU substrate (libluxgpu*.a and the lux/gpu/*.h headers) was located at build time and at run time. It exists for debugging "which install am I using" questions: the accel CGO directives in ops/code/code_cpu.go enumerate every standard prefix, so the actual compiler search resolves silently. This type lets callers see what the same search would find if they re-ran it.

func (Provenance) GPUPaths ¶ added in v1.1.9

func (Provenance) GPUPaths() PathReport

GPUPaths returns the resolved GPU substrate location for this host. It probes every fallback prefix the cgo build also probes, in the same priority order, and returns the first one that has BOTH the header and the static library.

Returns a PathReport with Source = SourceMissing (and Candidates populated for diagnostics) when no install can be found. Callers that need a hard error can check Library == "" or Source == SourceMissing.
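The resolution rule described above (probe every prefix in order, pick the first with both the header and the library) can be sketched in a few lines. The candidate struct, the resolvePrefix function, and the include/ and lib/ sub-directory layout are assumptions for illustration. Only the file names lux/gpu/hqc.h and libluxgpu_hqc.a come from the PathReport comments.

```go
package main

import (
	"fmt"
	"os"
)

// candidate records what one probe found (hypothetical shape).
type candidate struct {
	Prefix     string
	HasInclude bool
	HasLib     bool
}

// resolvePrefix probes every prefix in order and returns the first one that
// has both the header and the static library, plus the full probe log.
// An empty result plays the role of SourceMissing.
func resolvePrefix(prefixes []string, exists func(path string) bool) (string, []candidate) {
	var probed []candidate
	resolved := ""
	for _, p := range prefixes {
		c := candidate{
			Prefix:     p,
			HasInclude: exists(p + "/include/lux/gpu/hqc.h"),
			HasLib:     exists(p + "/lib/libluxgpu_hqc.a"),
		}
		probed = append(probed, c)
		if resolved == "" && c.HasInclude && c.HasLib {
			resolved = p
		}
	}
	return resolved, probed
}

// fileExists is the real-filesystem probe used by main.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	prefixes := []string{"/opt/homebrew", "/usr/local", "/opt/lux"}
	resolved, probed := resolvePrefix(prefixes, fileExists)
	fmt.Printf("resolved=%q\n", resolved)
	for _, c := range probed {
		fmt.Printf("%-14s include=%t lib=%t\n", c.Prefix, c.HasInclude, c.HasLib)
	}
}
```

Passing the existence check as a function keeps the ordering logic testable without touching the filesystem. Printing the whole probe log is the same kind of diagnostic that the Candidates slice is meant to support.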

type Session ¶

type Session struct {
	// contains filtered or unexported fields
}

Session manages a GPU acceleration context. All tensor operations must use tensors created from the same session. Session is safe for concurrent use.

func DefaultSession ¶

func DefaultSession() (*Session, error)

DefaultSession returns a lazily initialized default session. It uses the best available backend (Metal on macOS, CUDA on Linux).

func NewSession ¶

func NewSession(opts ...SessionOption) (*Session, error)

NewSession creates a new acceleration session with auto-detected best backend.

func NewSessionWithBackend ¶

func NewSessionWithBackend(backend BackendType, opts ...SessionOption) (*Session, error)

NewSessionWithBackend creates a session using a specific backend.

func NewSessionWithDevice ¶

func NewSessionWithDevice(backend BackendType, deviceIndex int, opts ...SessionOption) (*Session, error)

NewSessionWithDevice creates a session using a specific device.

func (*Session) Backend ¶

func (s *Session) Backend() BackendType

Backend returns the backend type for this session.

func (*Session) Close ¶

func (s *Session) Close() error

Close releases all session resources.

func (*Session) Crypto ¶

func (s *Session) Crypto() CryptoOps

Crypto returns the cryptographic operations interface.

func (*Session) DEX ¶

func (s *Session) DEX() DEXOps

DEX returns the decentralized exchange operations interface.

func (*Session) DeviceInfo ¶

func (s *Session) DeviceInfo() DeviceInfo

DeviceInfo returns information about the session's device.

func (*Session) FHE ¶

func (s *Session) FHE() FHEOps

FHE returns the fully homomorphic encryption operations interface.

func (*Session) IsClosed ¶

func (s *Session) IsClosed() bool

IsClosed returns true if the session has been closed.

func (*Session) Lattice ¶

func (s *Session) Lattice() LatticeOps

Lattice returns the lattice cryptography operations interface.

func (*Session) ML ¶

func (s *Session) ML() MLOps

ML returns the ML operations interface.

func (*Session) Sync ¶

func (s *Session) Sync() error

Sync waits for all pending operations to complete.

func (*Session) SyncContext ¶

func (s *Session) SyncContext(ctx context.Context) error

SyncContext waits for pending operations with context cancellation.

func (*Session) ZK ¶

func (s *Session) ZK() ZKOps

ZK returns the zero-knowledge proof operations interface.

type SessionOption ¶

type SessionOption func(*sessionConfig)

SessionOption configures session creation.

func WithAsync ¶

func WithAsync(async bool) SessionOption

WithAsync enables asynchronous operation mode.

func WithBackend ¶

func WithBackend(b BackendType) SessionOption

WithBackend specifies the backend to use.

func WithDevice ¶

func WithDevice(index int) SessionOption

WithDevice specifies the device index within the backend.
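SessionOption follows Go's functional options pattern: each With* function returns a closure that mutates a private config struct. A minimal sketch of how such options compose is shown below. The fields of sessionConfig, the backendType values, and the buildConfig helper are assumptions, since the real struct is unexported.

```go
package main

import "fmt"

// backendType is a stand-in for accel.BackendType (values are assumptions).
type backendType int

const (
	backendAuto backendType = iota
	backendCUDA
	backendMetal
	backendWebGPU
)

// sessionConfig mirrors the idea of the unexported config; fields are guesses.
type sessionConfig struct {
	backend backendType
	device  int
	async   bool
}

type sessionOption func(*sessionConfig)

func withAsync(async bool) sessionOption {
	return func(c *sessionConfig) { c.async = async }
}

func withBackend(b backendType) sessionOption {
	return func(c *sessionConfig) { c.backend = b }
}

func withDevice(index int) sessionOption {
	return func(c *sessionConfig) { c.device = index }
}

// buildConfig starts from defaults and applies options in order, so a later
// option overrides an earlier one that touches the same field.
func buildConfig(opts ...sessionOption) sessionConfig {
	cfg := sessionConfig{backend: backendAuto}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := buildConfig(withBackend(backendMetal), withDevice(1), withAsync(true))
	fmt.Printf("%+v\n", cfg)
}
```

The pattern lets NewSession keep a stable signature while new options are added, and callers that pass no options get the defaults.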

type Source ¶ added in v1.1.9

type Source string

Source identifies how a path was resolved.

const (
	// SourceEnv: explicit LUX_GPU_PREFIX (or back-compat LUX_MLX_PREFIX)
	// environment variable.
	SourceEnv Source = "env-prefix"

	// SourceCgoEnv: CGO_CFLAGS / CGO_LDFLAGS supplied at build time.
	// Only detectable via the live environment; reported when the
	// variables are non-empty at the call time of GPUPaths().
	SourceCgoEnv Source = "cgo-env"

	// SourcePkgConfig: pkg-config reports a `lux-gpu` package.
	// Detected by running `pkg-config --variable=prefix lux-gpu`.
	SourcePkgConfig Source = "pkg-config"

	// SourceHomebrewARM: /opt/homebrew (Apple Silicon).
	SourceHomebrewARM Source = "homebrew-arm"

	// SourceHomebrewKeg: /opt/homebrew/opt/lux-gpu (keg-only formula).
	SourceHomebrewKeg Source = "homebrew-keg"

	// SourceHomebrewIntel: /usr/local/opt/lux-gpu (Intel Mac Homebrew).
	SourceHomebrewIntel Source = "homebrew-intel"

	// SourceSystem: /usr/local install (canonical POSIX).
	SourceSystem Source = "system"

	// SourceLuxPrefix: /opt/lux install (Lux canonical prefix).
	SourceLuxPrefix Source = "lux-prefix"

	// SourceModuleRelative: ${SRCDIR}/../../../mlx/{include,build} —
	// in-tree dev fallback, only valid when accel is in a Go workspace
	// next to luxfi/mlx (NOT when accel is in the Go module cache).
	SourceModuleRelative Source = "module-relative"

	// SourceMissing: no candidate prefix on this host has the headers
	// or the library. cgo builds will fail to link unless the caller
	// provides CGO_CFLAGS/CGO_LDFLAGS at build time.
	SourceMissing Source = "missing"
)

type Tensor ¶

type Tensor[T TensorElement] struct {
	// contains filtered or unexported fields
}

Tensor represents a multi-dimensional array in GPU memory. A Tensor is safe for concurrent reads but not for concurrent modification.

func NewTensor ¶

func NewTensor[T TensorElement](s *Session, shape []int) (*Tensor[T], error)

NewTensor creates a new tensor with the given shape.

func NewTensorWithData ¶

func NewTensorWithData[T TensorElement](s *Session, shape []int, data []T) (*Tensor[T], error)

NewTensorWithData creates a tensor initialized with data from a slice.

func (*Tensor[T]) Bytes ¶

func (t *Tensor[T]) Bytes() int

Bytes returns the total byte size.

func (*Tensor[T]) Close ¶

func (t *Tensor[T]) Close()

Close releases tensor resources.

func (*Tensor[T]) DType ¶

func (t *Tensor[T]) DType() DType

DType returns the element data type.

func (*Tensor[T]) FromSlice ¶

func (t *Tensor[T]) FromSlice(src []T) error

FromSlice copies data from a Go slice to the tensor.

func (*Tensor[T]) NDim ¶

func (t *Tensor[T]) NDim() int

NDim returns the number of dimensions.

func (*Tensor[T]) NumEl ¶

func (t *Tensor[T]) NumEl() int

NumEl returns the total number of elements.

func (*Tensor[T]) Shape ¶

func (t *Tensor[T]) Shape() []int

Shape returns a copy of the tensor shape.

func (*Tensor[T]) ToSlice ¶

func (t *Tensor[T]) ToSlice() ([]T, error)

ToSlice copies tensor data to a Go slice.

func (*Tensor[T]) Untyped ¶

func (t *Tensor[T]) Untyped() *UntypedTensor

Untyped returns an untyped view of the tensor for passing to ops.

type TensorElement ¶

type TensorElement interface {
	float32 | float64 | int32 | int64 | uint8 | uint32 | uint64
}

TensorElement is a type constraint for tensor element types.
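The constraint makes it possible to write helpers that are generic over the element type. As a sketch, the element count and byte size of a shape can be computed on the host as below. The lowercase names are local stand-ins, and treating an empty shape as a single scalar element is an assumption about the convention, not something the documentation states.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tensorElement repeats the union from the TensorElement constraint.
type tensorElement interface {
	float32 | float64 | int32 | int64 | uint8 | uint32 | uint64
}

// numEl multiplies the dimensions together. An empty shape yields 1.
func numEl(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

// byteSize returns the storage needed for a dense tensor of element type T.
func byteSize[T tensorElement](shape []int) int {
	var zero T
	return numEl(shape) * int(unsafe.Sizeof(zero))
}

func main() {
	fmt.Println(byteSize[float32]([]int{2, 3}))
	fmt.Println(byteSize[uint8]([]int{4, 1184}))
}
```

For a tensor created with the same shape and element type, these values would be expected to match what NumEl and Bytes report.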

type UntypedTensor ¶

type UntypedTensor struct {
	// contains filtered or unexported fields
}

UntypedTensor provides type-erased tensor operations. Used internally and for dynamic typing scenarios.

func (*UntypedTensor) Bytes ¶

func (t *UntypedTensor) Bytes() int

Bytes returns the total byte size.

func (*UntypedTensor) DType ¶

func (t *UntypedTensor) DType() DType

DType returns the element data type.

func (*UntypedTensor) Handle ¶

func (t *UntypedTensor) Handle() uintptr

Handle returns the raw tensor handle for CGO operations.

func (*UntypedTensor) NDim ¶

func (t *UntypedTensor) NDim() int

NDim returns the number of dimensions.

func (*UntypedTensor) NumEl ¶

func (t *UntypedTensor) NumEl() int

NumEl returns the total number of elements.

func (*UntypedTensor) Shape ¶

func (t *UntypedTensor) Shape() []int

Shape returns a copy of the tensor shape.

type VMSession ¶ added in v1.0.9

type VMSession struct {
	// contains filtered or unexported fields
}

VMSession is a per-VM GPU session providing:

Isolation: closing one VM session does not affect others.
Ordering: ops submitted to a session complete in submission order.
Priority: sessions with higher Priority preempt the global queue.
Memory budget: optional cap on cumulative allocations.

VMSession is safe for concurrent use; submissions from many goroutines on the same session are serialized in FIFO order.

func NewVMSession ¶ added in v1.0.9

func NewVMSession(vmID string, opts ...VMSessionOption) (*VMSession, error)

NewVMSession creates an isolated VM session. vmID must be non-empty and is used in error messages and metrics.

func (*VMSession) Close ¶ added in v1.0.9

func (v *VMSession) Close() error

Close releases this session. Safe to call multiple times. Closing a session does NOT affect other VM sessions, even when WithSharedDevice was used (the shared default Session is reference-counted and only torn down by accel.Shutdown).

func (*VMSession) ID ¶ added in v1.0.9

func (v *VMSession) ID() string

ID returns the VM identifier.

func (*VMSession) IsAvailable ¶ added in v1.0.9

func (v *VMSession) IsAvailable() bool

IsAvailable reports whether the session is backed by a real GPU session.

func (*VMSession) IsClosed ¶ added in v1.0.9

func (v *VMSession) IsClosed() bool

IsClosed reports whether Close() has been called.

func (*VMSession) MemoryUsed ¶ added in v1.0.9

func (v *VMSession) MemoryUsed() int64

MemoryUsed returns the session's current cumulative allocation total, in bytes.

func (*VMSession) Priority ¶ added in v1.0.9

func (v *VMSession) Priority() Priority

Priority returns the dispatch priority.

func (*VMSession) Session ¶ added in v1.0.9

func (v *VMSession) Session() *Session

Session returns the underlying GPU session, or nil if no backend was available. Callers should check IsAvailable() before dereferencing.

func (*VMSession) Stats ¶ added in v1.0.9

func (v *VMSession) Stats() (uint64, uint64, uint64)

Stats returns dispatch counters: (dispatched, completed, failed).

func (*VMSession) Submit ¶ added in v1.0.9

func (v *VMSession) Submit(ctx context.Context, f func(*Session) error) error

Submit serializes f under the session's FIFO queue. It returns ErrSessionClosed if the session has been closed. Within a VMSession, concurrent Submit calls run one at a time, in approximately arrival order: sync.Mutex queues waiters FIFO, but it hands the lock strictly to the longest-waiting goroutine only in starvation mode, which it enters once a waiter has been blocked for more than 1ms.
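The serialization and counter behaviour described for Submit and Stats can be sketched with a mutex-guarded struct. This is an illustration under simplifying assumptions, not the accel implementation: vmQueue, errSessionClosed, and demo are hypothetical names, and the callback takes no *Session argument.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

var errSessionClosed = errors.New("session closed")

// vmQueue serializes submitted callbacks and keeps dispatch counters.
type vmQueue struct {
	mu                            sync.Mutex
	closed                        bool
	dispatched, completed, failed uint64
}

// submit runs f while holding the mutex, so callbacks never overlap.
func (q *vmQueue) submit(ctx context.Context, f func() error) error {
	if err := ctx.Err(); err != nil {
		return err // caller already cancelled; do not queue
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return errSessionClosed
	}
	q.dispatched++
	if err := f(); err != nil {
		q.failed++
		return err
	}
	q.completed++
	return nil
}

// close marks the queue closed; later submits are rejected.
func (q *vmQueue) close() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
}

// demo submits one succeeding and one failing op, closes, then submits again.
func demo() (dispatched, completed, failed uint64, afterClose error) {
	q := &vmQueue{}
	ctx := context.Background()
	_ = q.submit(ctx, func() error { return nil })
	_ = q.submit(ctx, func() error { return errors.New("kernel failed") })
	q.close()
	afterClose = q.submit(ctx, func() error { return nil })
	return q.dispatched, q.completed, q.failed, afterClose
}

func main() {
	d, c, f, err := demo()
	fmt.Println(d, c, f, err)
}
```

In this sketch a rejected submit is not counted as dispatched. Whether the library counts it is not stated in the documentation.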

func (*VMSession) Sync ¶ added in v1.0.9

func (v *VMSession) Sync() error

Sync blocks until all pending ops on this session complete.

type VMSessionOption ¶ added in v1.0.9

type VMSessionOption func(*vmSessionConfig)

VMSessionOption configures a per-VM session.

func WithMemoryBudget ¶ added in v1.0.9

func WithMemoryBudget(bytes int64) VMSessionOption

WithMemoryBudget caps the GPU memory this session may allocate (bytes). 0 means unlimited (default).
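A budget with "0 means unlimited" semantics can be sketched as a guarded counter. The budget type, its reserve and release methods, and the error value are hypothetical. The documentation only states the cap and its default.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errOverBudget = errors.New("memory budget exceeded")

// budget tracks bytes in use against an optional limit (0 = unlimited).
type budget struct {
	mu    sync.Mutex
	limit int64
	used  int64
}

// reserve accounts for n bytes, or fails without changing state.
func (b *budget) reserve(n int64) error {
	if n < 0 {
		return errors.New("negative allocation size")
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	// Compare against the remaining headroom to avoid int64 overflow.
	if b.limit > 0 && n > b.limit-b.used {
		return errOverBudget
	}
	b.used += n
	return nil
}

// release returns n bytes to the budget, clamping at zero.
func (b *budget) release(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.used -= n
	if b.used < 0 {
		b.used = 0
	}
}

func main() {
	b := &budget{limit: 1 << 20} // 1 MiB cap
	fmt.Println(b.reserve(512 << 10))
	fmt.Println(b.reserve(768 << 10))
	b.release(512 << 10)
	fmt.Println(b.reserve(768 << 10))
}
```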

func WithPriority ¶ added in v1.0.9

func WithPriority(p Priority) VMSessionOption

WithPriority sets the dispatch priority for the VM session.

func WithQueueDepth ¶ added in v1.0.9

func WithQueueDepth(n int) VMSessionOption

WithQueueDepth sets the in-flight op queue depth (default 1024).

func WithSharedDevice ¶ added in v1.0.9

func WithSharedDevice() VMSessionOption

WithSharedDevice routes the VM session through the process default Session rather than allocating a new device-side session. Use when the underlying driver doesn't support multiple sessions (or to avoid CUDA context churn).

func WithVMBackend ¶ added in v1.0.9

func WithVMBackend(b BackendType) VMSessionOption

WithVMBackend pins the VM session to a specific backend.

type ZKOps ¶

type ZKOps interface {
	// NTT performs Number Theoretic Transform.
	// input: [N] uint64 coefficients
	// output: [N] uint64 NTT values
	// roots: [N] uint64 roots of unity
	// modulus: prime modulus
	NTT(input, output, roots *UntypedTensor, modulus uint64) error

	// INTT performs inverse NTT.
	// input: [N] uint64 NTT values
	// output: [N] uint64 coefficients
	// invRoots: [N] uint64 inverse roots of unity
	// modulus: prime modulus
	INTT(input, output, invRoots *UntypedTensor, modulus uint64) error

	// MSM performs multi-scalar multiplication on elliptic curves.
	// scalars: [N, scalar_size] bytes
	// bases: [N, point_size] bytes (affine points)
	// result: [point_size] bytes
	MSM(scalars, bases, result *UntypedTensor) error

	// MSMBatch performs multiple MSMs in parallel.
	// scalars: [M, N, scalar_size] bytes
	// bases: [M, N, point_size] bytes
	// results: [M, point_size] bytes
	MSMBatch(scalars, bases, results *UntypedTensor) error

	// PolyMul multiplies polynomials in coefficient form.
	// a: [N] uint64 coefficients
	// b: [N] uint64 coefficients
	// c: [2N-1] uint64 result coefficients
	// modulus: prime modulus
	PolyMul(a, b, c *UntypedTensor, modulus uint64) error

	// PolyEval evaluates polynomial at given points.
	// coeffs: [degree+1] uint64
	// points: [N] uint64
	// results: [N] uint64
	// modulus: prime modulus
	PolyEval(coeffs, points, results *UntypedTensor, modulus uint64) error

	// CommitPoly computes polynomial commitment (KZG).
	// coeffs: [degree+1, field_size] bytes
	// srs: structured reference string
	// commitment: [point_size] bytes
	CommitPoly(coeffs, srs, commitment *UntypedTensor) error

	// FFT performs Fast Fourier Transform (complex).
	// input: [N, 2] float32 (real, imag)
	// output: [N, 2] float32
	FFT(input, output *UntypedTensor) error

	// IFFT performs inverse FFT.
	IFFT(input, output *UntypedTensor) error

	// FieldAdd adds field elements.
	// a: [N] uint64
	// b: [N] uint64
	// c: [N] uint64
	// modulus: prime modulus
	FieldAdd(a, b, c *UntypedTensor, modulus uint64) error

	// FieldMul multiplies field elements.
	FieldMul(a, b, c *UntypedTensor, modulus uint64) error

	// FieldInv computes modular inverse.
	FieldInv(a, b *UntypedTensor, modulus uint64) error
}

ZKOps provides GPU-accelerated zero-knowledge proof operations.
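For checking GPU results on small inputs, the field and polynomial operations have straightforward CPU references. The sketch below implements them for a uint64 prime modulus with math/bits, using Fermat's little theorem for the inverse and schoolbook multiplication for PolyMul (output length 2N-1 when both inputs have length N). It is a reference under those stated choices, not the algorithm the library uses.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMod returns (a + b) mod m without overflowing uint64.
func addMod(a, b, m uint64) uint64 {
	sum, carry := bits.Add64(a%m, b%m, 0)
	if carry == 1 || sum >= m {
		sum -= m // wraps correctly when the addition carried
	}
	return sum
}

// mulMod returns (a * b) mod m using a 128-bit intermediate product.
func mulMod(a, b, m uint64) uint64 {
	hi, lo := bits.Mul64(a%m, b%m)
	_, rem := bits.Div64(hi, lo, m) // hi < m holds because both factors are < m
	return rem
}

// powMod computes base^exp mod m by square-and-multiply.
func powMod(base, exp, m uint64) uint64 {
	result := uint64(1) % m
	base %= m
	for exp > 0 {
		if exp&1 == 1 {
			result = mulMod(result, base, m)
		}
		base = mulMod(base, base, m)
		exp >>= 1
	}
	return result
}

// invMod returns a^(m-2) mod m, the inverse of a when m is prime and a != 0.
func invMod(a, m uint64) uint64 {
	return powMod(a, m-2, m)
}

// polyMul multiplies coefficient-form polynomials; len(c) = len(a)+len(b)-1.
func polyMul(a, b []uint64, m uint64) []uint64 {
	if len(a) == 0 || len(b) == 0 {
		return nil
	}
	c := make([]uint64, len(a)+len(b)-1)
	for i, ai := range a {
		for j, bj := range b {
			c[i+j] = addMod(c[i+j], mulMod(ai, bj, m), m)
		}
	}
	return c
}

func main() {
	const q = 8380417 // the ML-DSA prime quoted in LatticeNTTMLDSABatch
	inv := invMod(3, q)
	fmt.Println(mulMod(3, inv, q))
	fmt.Println(polyMul([]uint64{1, 2}, []uint64{3, 4}, 17))
}
```

Schoolbook multiplication costs O(N²) and is only meant for cross-checking small cases.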

Directories ¶

Path	Synopsis
hqc	Package hqc is the public accel surface for HQC (Hamming Quasi-Cyclic) post-quantum KEM operations.
internal/capi	Package capi provides CGO bindings to the lux-accel C library.
ops/code	Package code provides GPU-accelerated code-based cryptography operations.
ops/consensus	Package consensus provides GPU-accelerated consensus primitives.
ops/crypto	Package crypto provides GPU-accelerated cryptographic operations.
ops/dex	Package dex provides GPU-accelerated DEX operations.
ops/fhe	Package fhe provides GPU-accelerated Fully Homomorphic Encryption operations.
ops/lattice	Package lattice provides GPU-accelerated lattice cryptography operations.
ops/zk	Package zk provides GPU-accelerated zero-knowledge proof operations.
