Documentation ¶
Overview ¶
Package accel provides GPU-accelerated operations for blockchain and ML workloads.
The package supports multiple GPU backends (Metal, WebGPU, CUDA) via runtime plugin discovery. When built without CGO or when no backends are available, operations return ErrNoBackends.
Architecture ¶
accel wraps the lux-accel C++ library which provides:
- ML operations: matmul, attention, convolution, normalization
- Crypto operations: batch signature verification, hashing, Merkle trees
- ZK operations: NTT, MSM, polynomial arithmetic
- Lattice crypto: Kyber, Dilithium post-quantum operations
- FHE operations: BFV/CKKS homomorphic encryption
- DEX operations: AMM swaps, TWAP, order matching
Backend Selection ¶
Backends are automatically detected and selected in this priority order:
- CUDA (NVIDIA GPUs)
- Metal (Apple Silicon)
- WebGPU (cross-platform fallback)
Override the automatic choice with the LUX_BACKEND environment variable or through the API:
session, _ := accel.NewSessionWithBackend(accel.BackendMetal)
Runtime Backend Selection ¶
To select a backend based on the operations you need:
// Select best backend for ZK operations
backend, _ := accel.SelectBackend(accel.OpNTT, accel.OpMSM)
session, _ := accel.NewSessionWithBackend(backend)
// Query capabilities
caps, _ := accel.Capabilities(accel.BackendWebGPU)
if caps.Supports(accel.OpMSM) {
	// Use MSM on WebGPU
}
// Compare backends for an operation
comparison, _ := accel.CompareBackends(accel.OpNTT, 10)
fmt.Printf("Fastest backend for NTT: %s\n", comparison.Fastest)
// Print all capabilities
accel.PrintCapabilities()
Pure Go Mode ¶
When built with CGO_ENABLED=0, the package compiles in pure Go mode. All operations return ErrNoBackends but the package remains importable, allowing graceful fallback to CPU implementations.
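The fallback described above can be sketched without importing the package. In this minimal example, errNoBackends stands in for accel.ErrNoBackends, and gpuSHA256Batch is a hypothetical stub that behaves the way a pure Go build is described to behave. Neither name comes from the library.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// errNoBackends stands in for accel.ErrNoBackends (assumption: local copy).
var errNoBackends = errors.New("accel: no GPU backends available")

// gpuSHA256Batch is a hypothetical stub that mimics a pure Go build:
// it always reports that no backend is available.
func gpuSHA256Batch(inputs [][]byte) ([][]byte, error) {
	return nil, errNoBackends
}

// cpuSHA256Batch hashes each input on the CPU with the standard library.
func cpuSHA256Batch(inputs [][]byte) [][]byte {
	out := make([][]byte, len(inputs))
	for i, in := range inputs {
		sum := sha256.Sum256(in)
		out[i] = sum[:]
	}
	return out
}

// hashBatch tries the GPU path first and falls back to the CPU
// only when the error says that no backend exists.
func hashBatch(inputs [][]byte) ([][]byte, error) {
	out, err := gpuSHA256Batch(inputs)
	if errors.Is(err, errNoBackends) {
		return cpuSHA256Batch(inputs), nil
	}
	return out, err
}

func main() {
	digests, err := hashBatch([][]byte{[]byte("a"), []byte("b")})
	fmt.Println(len(digests), err)
}
```

Matching on the sentinel with errors.Is keeps other failures, such as a kernel error, visible to the caller instead of silently masking them with the CPU path.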
Basic Usage ¶
// Initialize library
if err := accel.Init(); err != nil {
	log.Printf("GPU accel not available: %v", err)
}
defer accel.Shutdown()
// Check availability
if !accel.Available() {
	// Use CPU fallback
	return
}
// Create session
session, err := accel.NewSession()
if err != nil {
	log.Fatal(err)
}
defer session.Close()
// Create tensors
a, _ := accel.NewTensor[float32](session, []int{1024, 1024})
b, _ := accel.NewTensor[float32](session, []int{1024, 1024})
c, _ := accel.NewTensor[float32](session, []int{1024, 1024})
// Perform GPU operation
if err := session.ML().MatMul(a.Untyped(), b.Untyped(), c.Untyped()); err != nil {
	log.Fatal(err)
}
Integration with Lux Node ¶
The accel package integrates with lux-node for:
- Batch signature verification in consensus
- Merkle tree computation for state sync
- Post-quantum cryptography for future-proofing
See the node/consensus and precompile packages for integration examples.
Index ¶
- Constants
- Variables
- func AllCapabilities() map[BackendType]*BackendCapabilities
- func Available() bool
- func BLSBatchVerify(pks, sigs, msgs [][]byte) ([]bool, error)
- func CUDAAvailable() bool
- func DeviceCount() int
- func DilithiumSign(msg, sk []byte) (sig []byte, err error)
- func DilithiumVerify(msg, sig, pk []byte) (bool, error)
- func GetLastError() string
- func GetVersion() string
- func Init() error
- func Keccak256Batch(inputs [][]byte) ([][]byte, error)
- func KyberDecaps(ct, sk []byte) (ss []byte, err error)
- func KyberEncaps(pk []byte) (ct, ss []byte, err error)
- func KyberKeyGen() (pk, sk []byte, err error)
- func LoadPlugin(path string) error
- func MSM(scalars, bases [][]byte) ([]byte, error)
- func MerkleRoot(leaves [][]byte) ([]byte, error)
- func MetalAvailable() bool
- func NTTForward(coeffs, roots []uint64, modulus uint64) error
- func NTTInverse(coeffs, invRoots []uint64, modulus uint64) error
- func PrintCapabilities()
- func SHA256Batch(inputs [][]byte) ([][]byte, error)
- func Shutdown()
- func WebGPUAvailable() bool
- type BackendCapabilities
- type BackendComparison
- type BackendInfo
- type BackendType
- type BenchmarkResult
- type CryptoOps
- type DEXOps
- type DType
- type DeviceCaps
- type DeviceInfo
- type Error
- type FHEOps
- type LatticeOps
- type MLOps
- type OperationType
- type Session
- func (s *Session) Backend() BackendType
- func (s *Session) Close() error
- func (s *Session) Crypto() CryptoOps
- func (s *Session) DEX() DEXOps
- func (s *Session) DeviceInfo() DeviceInfo
- func (s *Session) FHE() FHEOps
- func (s *Session) IsClosed() bool
- func (s *Session) Lattice() LatticeOps
- func (s *Session) ML() MLOps
- func (s *Session) Sync() error
- func (s *Session) SyncContext(ctx context.Context) error
- func (s *Session) ZK() ZKOps
- type SessionOption
- type Tensor
- func (t *Tensor[T]) Bytes() int
- func (t *Tensor[T]) Close()
- func (t *Tensor[T]) DType() DType
- func (t *Tensor[T]) FromSlice(src []T) error
- func (t *Tensor[T]) NDim() int
- func (t *Tensor[T]) NumEl() int
- func (t *Tensor[T]) Shape() []int
- func (t *Tensor[T]) ToSlice() ([]T, error)
- func (t *Tensor[T]) Untyped() *UntypedTensor
- type TensorElement
- type UntypedTensor
- type ZKOps
Constants ¶
const (
	BLSBatchVerifyThreshold    = 64  // Min signatures for GPU batch verify
	BLSBatchAggregateThreshold = 128 // Min items for GPU aggregation
	HashBatchThreshold         = 32  // Min items for GPU batch hash
	NTTBatchThreshold          = 4   // Min polynomials for GPU batch NTT
	MSMBatchThreshold          = 64  // Min points for GPU MSM
	KyberBatchThreshold        = 8   // Min operations for GPU batch
	DilithiumBatchThreshold    = 8   // Min operations for GPU batch
)
Batch operation thresholds - minimum items for GPU acceleration to be worthwhile.
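A caller can apply the same idea when choosing between a CPU and a GPU path. The sketch below copies the documented HashBatchThreshold value into a local constant and treats the threshold as inclusive. The helper name shouldUseGPU and the inclusive comparison are assumptions for illustration, not the package's confirmed dispatch logic.

```go
package main

import "fmt"

// hashBatchThreshold copies the documented HashBatchThreshold value.
const hashBatchThreshold = 32

// shouldUseGPU reports whether a batch is large enough to justify the GPU path.
// Assumption: a "minimum" threshold is inclusive, so batchSize == threshold qualifies.
func shouldUseGPU(batchSize, threshold int) bool {
	return batchSize >= threshold
}

func main() {
	for _, n := range []int{1, 31, 32, 1000} {
		fmt.Printf("batch=%d gpu=%v\n", n, shouldUseGPU(n, hashBatchThreshold))
	}
}
```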
const (
	KyberPublicKeySize  = 1184
	KyberSecretKeySize  = 2400
	KyberCiphertextSize = 1088
)
Kyber key and ciphertext sizes (ML-KEM-768)
const (
	DilithiumPublicKeySize = 1952
	DilithiumSecretKeySize = 4016
	DilithiumSignatureSize = 3309
)
Dilithium sizes (ML-DSA-65)
const Version = "0.1.0"
Version is the library version.
Variables ¶
var (
	// ErrNoBackends indicates no GPU backends are available.
	ErrNoBackends = errors.New("accel: no GPU backends available")
	// ErrNotInitialized indicates the library was not initialized.
	ErrNotInitialized = errors.New("accel: library not initialized")
	// ErrInvalidArgument indicates an invalid argument was provided.
	ErrInvalidArgument = errors.New("accel: invalid argument")
	// ErrOutOfMemory indicates GPU memory allocation failed.
	ErrOutOfMemory = errors.New("accel: out of GPU memory")
	// ErrNotSupported indicates the operation is not supported.
	ErrNotSupported = errors.New("accel: operation not supported")
	// ErrKernelFailed indicates a GPU kernel execution failed.
	ErrKernelFailed = errors.New("accel: kernel execution failed")
	// ErrBackendNotFound indicates the requested backend is not available.
	ErrBackendNotFound = errors.New("accel: backend not found")
	// ErrSessionClosed indicates the session has been closed.
	ErrSessionClosed = errors.New("accel: session closed")
	// ErrShapeMismatch indicates tensor shapes are incompatible.
	ErrShapeMismatch = errors.New("accel: tensor shape mismatch")
	// ErrBatchSizeMismatch indicates mismatched batch input sizes.
	ErrBatchSizeMismatch = errors.New("accel: mismatched batch input sizes")
	// ErrNilInput indicates nil input in batch operation.
	ErrNilInput = errors.New("accel: nil input in batch operation")
)
var BackendPriority = []BackendType{
	BackendCUDA,
	BackendMetal,
	BackendWebGPU,
}
BackendPriority defines the order for automatic backend selection.
Functions ¶
func AllCapabilities ¶
func AllCapabilities() map[BackendType]*BackendCapabilities
AllCapabilities returns capabilities for all available backends.
func Available ¶
func Available() bool
Available returns true if at least one GPU backend is available.
func BLSBatchVerify ¶
BLSBatchVerify verifies multiple BLS signatures using GPU acceleration. Returns slice of bools indicating validity of each signature. Returns ErrNotSupported if GPU unavailable or batch too small.
func CUDAAvailable ¶
func CUDAAvailable() bool
CUDAAvailable returns true if CUDA backend is available.
func DeviceCount ¶
func DeviceCount() int
DeviceCount returns the total number of available devices.
func DilithiumSign ¶
DilithiumSign signs a message using Dilithium (ML-DSA).
func DilithiumVerify ¶
DilithiumVerify verifies a Dilithium signature.
func GetLastError ¶
func GetLastError() string
GetLastError returns the last error message from the C library.
func Init ¶
func Init() error
Init initializes the accel library. Must be called before any other operations. Safe to call multiple times; subsequent calls are no-ops.
func Keccak256Batch ¶
Keccak256Batch computes Keccak256 hashes for multiple inputs using GPU. Returns ErrNotSupported if GPU unavailable or batch too small.
func KyberDecaps ¶
KyberDecaps decapsulates a ciphertext using a secret key.
func KyberEncaps ¶
KyberEncaps encapsulates a shared secret using a public key.
func KyberKeyGen ¶
KyberKeyGen generates a Kyber keypair using GPU acceleration.
func LoadPlugin ¶
LoadPlugin explicitly loads a backend plugin from a path.
func MSM ¶
MSM computes Multi-Scalar Multiplication: sum(scalars[i] * bases[i]). Returns ErrNotSupported if GPU unavailable or batch too small.
func MerkleRoot ¶
MerkleRoot computes the Merkle root of leaves using GPU. Returns ErrNotSupported if GPU unavailable or batch too small.
func MetalAvailable ¶
func MetalAvailable() bool
MetalAvailable returns true if Metal backend is available.
func NTTForward ¶
NTTForward computes forward Number Theoretic Transform on a polynomial. Modifies coeffs in-place.
func NTTInverse ¶
NTTInverse computes inverse Number Theoretic Transform on a polynomial. Modifies coeffs in-place.
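For checking results on small inputs, a naive O(n²) transform is a handy reference. The sketch below is a textbook definition, not the library's kernel: it takes a single primitive n-th root of unity rather than the roots slice that NTTForward expects, whose layout is not documented here. The names powMod, naiveNTT, and naiveINTT are illustrative, and the modulus must be a prime small enough that products fit in uint64.

```go
package main

import "fmt"

// powMod computes base^exp mod m by square-and-multiply.
// Inputs are assumed small enough that products fit in uint64.
func powMod(base, exp, m uint64) uint64 {
	result := uint64(1)
	base %= m
	for exp > 0 {
		if exp&1 == 1 {
			result = result * base % m
		}
		base = base * base % m
		exp >>= 1
	}
	return result
}

// naiveNTT evaluates X[k] = sum_j x[j] * omega^(j*k) mod m in O(n^2).
func naiveNTT(x []uint64, omega, m uint64) []uint64 {
	n := uint64(len(x))
	out := make([]uint64, n)
	for k := uint64(0); k < n; k++ {
		var acc uint64
		for j := uint64(0); j < n; j++ {
			acc = (acc + x[j]%m*powMod(omega, j*k, m)) % m
		}
		out[k] = acc
	}
	return out
}

// naiveINTT inverts naiveNTT using omega^-1 and n^-1.
// Both inverses come from Fermat's little theorem, so m must be prime.
func naiveINTT(X []uint64, omega, m uint64) []uint64 {
	n := uint64(len(X))
	omegaInv := powMod(omega, m-2, m)
	nInv := powMod(n, m-2, m)
	out := naiveNTT(X, omegaInv, m)
	for i := range out {
		out[i] = out[i] * nInv % m
	}
	return out
}

func main() {
	// 4 is a primitive 4th root of unity modulo 17.
	coeffs := []uint64{1, 2, 3, 4}
	fwd := naiveNTT(coeffs, 4, 17)
	fmt.Println(fwd, naiveINTT(fwd, 4, 17))
}
```

A round trip through both functions should return the original coefficients, which makes this a convenient oracle when validating an accelerated path.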
func PrintCapabilities ¶
func PrintCapabilities()
PrintCapabilities prints a human-readable summary of backend capabilities.
func SHA256Batch ¶
SHA256Batch computes SHA256 hashes for multiple inputs using GPU. Returns ErrNotSupported if GPU unavailable or batch too small.
func Shutdown ¶
func Shutdown()
Shutdown releases all library resources. Call when done using the library.
func WebGPUAvailable ¶
func WebGPUAvailable() bool
WebGPUAvailable returns true if WebGPU backend is available.
Types ¶
type BackendCapabilities ¶
type BackendCapabilities struct {
Backend BackendType
Operations map[OperationType]bool
Categories map[string]bool
}
BackendCapabilities describes what operations a backend supports.
func Capabilities ¶
func Capabilities(backend BackendType) (*BackendCapabilities, error)
Capabilities returns the capabilities for a specific backend.
func GetCapabilities ¶
func GetCapabilities(backend BackendType) (*BackendCapabilities, error)
GetCapabilities returns the capabilities for a backend. This probes the backend to determine which operations are supported.
func (*BackendCapabilities) SupportedOperations ¶
func (c *BackendCapabilities) SupportedOperations() []OperationType
SupportedOperations returns a list of supported operations.
func (*BackendCapabilities) Supports ¶
func (c *BackendCapabilities) Supports(op OperationType) bool
Supports returns true if the backend supports the operation.
func (*BackendCapabilities) SupportsCategory ¶
func (c *BackendCapabilities) SupportsCategory(cat string) bool
SupportsCategory returns true if the backend supports any operation in the category.
type BackendComparison ¶
type BackendComparison struct {
Operation OperationType
Results map[BackendType]BenchmarkResult
Fastest BackendType
}
BackendComparison holds benchmark results across backends.
func CompareBackends ¶
func CompareBackends(op OperationType, iterations int) (*BackendComparison, error)
CompareBackends runs a quick benchmark of an operation across all backends and returns the results for comparison. If a backend does not support the operation, its entry in Results has a non-nil Error.
type BackendInfo ¶
type BackendInfo struct {
Type BackendType
Name string
APIVersion int
DeviceCount int
}
BackendInfo provides information about an available backend.
type BackendType ¶
type BackendType int
BackendType identifies a GPU compute backend.
const (
	// BackendAuto selects the best available backend automatically.
	// Priority: CUDA > Metal > WebGPU
	BackendAuto BackendType = iota
	// BackendMetal uses Apple Metal (macOS/iOS).
	BackendMetal
	// BackendWebGPU uses WebGPU via Dawn (cross-platform).
	BackendWebGPU
	// BackendCUDA uses NVIDIA CUDA.
	BackendCUDA
)
func MustSelectBackend ¶
func MustSelectBackend(ops ...OperationType) BackendType
MustSelectBackend returns the best backend or panics on error.
func SelectBackend ¶
func SelectBackend(ops ...OperationType) (BackendType, error)
SelectBackend returns the best backend for the given operations. If ops is empty, returns the highest priority available backend.
func SelectBestBackend ¶
func SelectBestBackend(ops []OperationType, preferPerformance bool) (BackendType, error)
SelectBestBackend returns the best available backend for a set of operations. It considers backend availability, capability support, and optionally performance.
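One way to picture capability-aware selection is a walk over the priority list that skips any backend missing a required operation. The sketch below uses local stand-in types and a made-up capability table; the name selectFor and the sample data are hypothetical and say nothing about what the real backends support.

```go
package main

import "fmt"

// backend and op are local stand-ins for accel.BackendType and accel.OperationType.
type backend string
type op string

// priority mirrors the documented order: CUDA, then Metal, then WebGPU.
var priority = []backend{"cuda", "metal", "webgpu"}

// selectFor returns the highest-priority backend whose capability set
// covers every requested op. With no ops it returns the first available backend.
func selectFor(caps map[backend]map[op]bool, ops ...op) (backend, bool) {
	for _, b := range priority {
		supported, ok := caps[b]
		if !ok {
			continue // backend not available on this machine
		}
		all := true
		for _, o := range ops {
			if !supported[o] {
				all = false
				break
			}
		}
		if all {
			return b, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical capability table, for illustration only.
	caps := map[backend]map[op]bool{
		"metal":  {"ntt": true},
		"webgpu": {"ntt": true, "msm": true},
	}
	b, ok := selectFor(caps, "ntt", "msm")
	fmt.Println(b, ok)
}
```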
type BenchmarkResult ¶
type BenchmarkResult struct {
Backend BackendType
Operation OperationType
Duration time.Duration
Error error
}
BenchmarkResult holds timing data for an operation.
type CryptoOps ¶
type CryptoOps interface {
// SHA256 computes SHA-256 hashes for a batch of inputs.
// input: [N, input_len] bytes
// output: [N, 32] bytes
SHA256(input, output *UntypedTensor) error
// Keccak256 computes Keccak-256 (Ethereum hash) for a batch.
// input: [N, input_len] bytes
// output: [N, 32] bytes
Keccak256(input, output *UntypedTensor) error
// Poseidon computes Poseidon hash (ZK-friendly).
// input: [N, field_elements] uint64
// output: [N, 1] uint64
Poseidon(input, output *UntypedTensor) error
// ECDSAVerifyBatch verifies multiple ECDSA signatures in parallel.
// messages: [N, 32] bytes (message hashes)
// signatures: [N, 64] bytes (r || s)
// pubkeys: [N, 33] bytes (compressed) or [N, 65] (uncompressed)
// results: [N] uint8 (1 = valid, 0 = invalid)
ECDSAVerifyBatch(messages, signatures, pubkeys, results *UntypedTensor) error
// Ed25519VerifyBatch verifies multiple Ed25519 signatures.
// messages: [N, msg_len] bytes
// signatures: [N, 64] bytes
// pubkeys: [N, 32] bytes
// results: [N] uint8 (1 = valid, 0 = invalid)
Ed25519VerifyBatch(messages, signatures, pubkeys, results *UntypedTensor) error
// BLSVerifyBatch verifies multiple BLS signatures.
// messages: [N, msg_len] bytes
// signatures: [N, 96] bytes (G2 points)
// pubkeys: [N, 48] bytes (G1 points)
// results: [N] uint8 (1 = valid, 0 = invalid)
BLSVerifyBatch(messages, signatures, pubkeys, results *UntypedTensor) error
// BLSAggregate aggregates multiple BLS signatures into one.
// signatures: [N, 96] bytes
// aggregated: [96] bytes
BLSAggregate(signatures, aggregated *UntypedTensor) error
// MerkleRoot computes Merkle root from leaves.
// leaves: [N, 32] bytes (N must be power of 2)
// root: [32] bytes
MerkleRoot(leaves, root *UntypedTensor) error
// MerkleBatch computes multiple Merkle roots in parallel.
// leavesSet: [M, N, 32] bytes
// roots: [M, 32] bytes
MerkleBatch(leavesSet, roots *UntypedTensor) error
// MerkleProof generates Merkle proof for a leaf.
// leaves: [N, 32] bytes
// leafIndex: index of the leaf
// proof: [log2(N), 32] bytes
MerkleProof(leaves *UntypedTensor, leafIndex int, proof *UntypedTensor) error
}
CryptoOps provides GPU-accelerated cryptographic operations.
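A CPU reference for the MerkleRoot shape contract can look like the sketch below. It assumes SHA-256 over the concatenation of the left and right child as the parent rule, which this page does not specify, so treat the hash choice and the function name merkleRoot as illustrative. The power-of-2 check matches the documented constraint on N.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// merkleRoot hashes adjacent pairs level by level until one node remains.
// Assumption: parent = SHA-256(left || right).
func merkleRoot(leaves [][32]byte) ([32]byte, error) {
	n := len(leaves)
	if n == 0 || n&(n-1) != 0 {
		return [32]byte{}, errors.New("leaf count must be a power of 2")
	}
	// Copy so the caller's slice is left untouched.
	level := append([][32]byte(nil), leaves...)
	for len(level) > 1 {
		next := make([][32]byte, len(level)/2)
		for i := range next {
			var buf [64]byte
			copy(buf[:32], level[2*i][:])
			copy(buf[32:], level[2*i+1][:])
			next[i] = sha256.Sum256(buf[:])
		}
		level = next
	}
	return level[0], nil
}

func main() {
	leaves := make([][32]byte, 4)
	for i := range leaves {
		leaves[i] = sha256.Sum256([]byte{byte(i)})
	}
	root, err := merkleRoot(leaves)
	fmt.Printf("%x %v\n", root, err)
}
```

With N leaves the loop runs log2(N) times, which is also the number of sibling hashes in a proof as described for MerkleProof.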
type DEXOps ¶
type DEXOps interface {
// ConstantProductSwap computes AMM swap output using x*y=k formula.
// reserveX: [N] uint64 - X token reserves
// reserveY: [N] uint64 - Y token reserves
// amountIn: [N] uint64 - input amounts
// xToY: true for X→Y swap, false for Y→X
// amountOut: [N] uint64 - output amounts
// fee: fee percentage (e.g., 0.003 for 0.3%)
ConstantProductSwap(reserveX, reserveY, amountIn *UntypedTensor, xToY bool, amountOut *UntypedTensor, fee float32) error
// ConstantProductSwapBatch processes multiple swaps.
// reserves: [M, 2] uint64 (reserveX, reserveY per pool)
// swaps: [N, 3] uint64 (poolIndex, amountIn, direction)
// amounts: [N] uint64 output amounts
ConstantProductSwapBatch(reserves, swaps, amounts *UntypedTensor, fee float32) error
// ComputeTWAP computes time-weighted average price.
// prices: [N] uint64 - historical prices
// timestamps: [N] uint64 - timestamps
// start, end: time range
// twap: [1] uint64 output
ComputeTWAP(prices, timestamps *UntypedTensor, start, end uint64, twap *UntypedTensor) error
// MatchOrders matches bid/ask orders.
// bids: [N, 3] uint64 (price, quantity, orderId)
// asks: [M, 3] uint64 (price, quantity, orderId)
// matches: output (bidId, askId, quantity, price)
// prices: fill prices
// amounts: fill amounts
MatchOrders(bids, asks, matches, prices, amounts *UntypedTensor) error
// MatchOrdersWithPriority matches orders with time/price priority.
// bids: [N, 4] uint64 (price, quantity, orderId, timestamp)
// asks: [M, 4] uint64
MatchOrdersWithPriority(bids, asks, matches *UntypedTensor) error
// ComputeLiquidity computes concentrated liquidity positions (Uniswap V3 style).
// tickLower: [N] int32 - lower tick
// tickUpper: [N] int32 - upper tick
// amounts: [N, 2] uint64 (amount0, amount1)
// liquidity: [N] uint128 output
ComputeLiquidity(tickLower, tickUpper, amounts, liquidity *UntypedTensor) error
// ComputePositionValue computes position value at current price.
// liquidity: [N] uint128
// tickLower: [N] int32
// tickUpper: [N] int32
// currentTick: current price tick
// values: [N, 2] uint64 (token0, token1)
ComputePositionValue(liquidity, tickLower, tickUpper *UntypedTensor, currentTick int32, values *UntypedTensor) error
// CalculateFees computes accumulated fees for positions.
// liquidity: [N] uint128
// feeGrowthInside0: [N] uint256
// feeGrowthInside1: [N] uint256
// fees: [N, 2] uint64 output
CalculateFees(liquidity, feeGrowthInside0, feeGrowthInside1, fees *UntypedTensor) error
// BatchSettlement settles multiple trades atomically.
// trades: [N, 4] uint64 (buyer, seller, token, amount)
// balances: [M, T] uint64 (M users, T tokens)
// newBalances: output
BatchSettlement(trades, balances, newBalances *UntypedTensor) error
}
DEXOps provides GPU-accelerated DEX (decentralized exchange) operations.
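The x*y=k rule behind ConstantProductSwap can be worked through on the CPU. The sketch below is one common integer formulation, with the fee expressed in basis points and the result rounded down. The library takes a float32 fee and its rounding is not documented here, so the function name swapOut, the basis-point parameter, and the rounding direction are assumptions.

```go
package main

import (
	"fmt"
	"math/big"
)

// swapOut returns the output amount of a constant-product swap:
//   out = in' * reserveOut / (reserveIn + in'), where in' = amountIn * (1 - fee).
// feeBps is the fee in basis points (30 = 0.3%). The result rounds down.
func swapOut(reserveIn, reserveOut, amountIn, feeBps uint64) uint64 {
	if reserveIn == 0 || reserveOut == 0 || amountIn == 0 || feeBps >= 10000 {
		return 0
	}
	// Scale by 10000 so the fee stays in integer arithmetic.
	inWithFee := new(big.Int).Mul(
		new(big.Int).SetUint64(amountIn),
		new(big.Int).SetUint64(10000-feeBps),
	)
	num := new(big.Int).Mul(inWithFee, new(big.Int).SetUint64(reserveOut))
	den := new(big.Int).Mul(new(big.Int).SetUint64(reserveIn), big.NewInt(10000))
	den.Add(den, inWithFee)
	// The quotient is always below reserveOut, so it fits in uint64.
	return num.Div(num, den).Uint64()
}

func main() {
	fmt.Println(swapOut(1_000_000, 1_000_000, 1_000, 30))
	fmt.Println(swapOut(1_000_000, 1_000_000, 1_000, 0))
}
```

Using math/big for the intermediate products avoids overflow when reserves and amounts are both near the top of the uint64 range.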
type DeviceCaps ¶
type DeviceCaps uint32
DeviceCaps represents device capability flags.
const (
	CapFP16         DeviceCaps = 1 << iota // Half-precision float support
	CapFP64                                // Double precision support
	CapSubgroups                           // Subgroup/warp operations
	CapInt64Atomics                        // 64-bit atomic operations
)
func (DeviceCaps) Has ¶
func (c DeviceCaps) Has(cap DeviceCaps) bool
Has returns true if the device has the specified capability.
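Capability flags of this kind are usually tested with a bit mask. The sketch below reproduces the flag layout with a local type and shows one plausible implementation of Has that requires every requested bit to be set. The type name Caps and the exact comparison are assumptions; the package's own method may differ.

```go
package main

import "fmt"

// Caps is a local stand-in for accel.DeviceCaps.
type Caps uint32

const (
	CapFP16         Caps = 1 << iota // Half-precision float support
	CapFP64                          // Double precision support
	CapSubgroups                     // Subgroup/warp operations
	CapInt64Atomics                  // 64-bit atomic operations
)

// Has reports whether every bit in want is set in c.
func (c Caps) Has(want Caps) bool { return c&want == want }

func main() {
	dev := CapFP16 | CapSubgroups
	fmt.Println(dev.Has(CapFP16), dev.Has(CapFP64), dev.Has(CapFP16|CapSubgroups))
	// prints "true false true"
}
```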
type DeviceInfo ¶
type DeviceInfo struct {
Backend BackendType
Index int
Name string
Vendor string
IsDiscrete bool
IsUnifiedMemory bool
TotalMemory uint64 // bytes
MaxBufferSize uint64 // bytes
MaxWorkgroupSize uint32
SIMDWidth uint32
Capabilities DeviceCaps
}
DeviceInfo contains information about a compute device.
func Devices ¶
func Devices() []DeviceInfo
Devices returns information about all available devices across all backends.
func (*DeviceInfo) MemoryGB ¶
func (d *DeviceInfo) MemoryGB() float64
MemoryGB returns total memory in gigabytes.
type Error ¶
type Error struct {
Op string // Operation that failed
Backend BackendType
Err error
Detail string // Additional detail from C library
}
Error wraps an error with additional context from the C library.
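A wrapper of this shape is typically inspected with errors.Is and errors.As. The sketch below uses a local opError type modeled on the fields above and assumes the wrapper exposes its cause through an Unwrap method. Whether accel.Error implements Unwrap is not stated on this page, so verify that before relying on the pattern.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfMemory stands in for accel.ErrOutOfMemory.
var errOutOfMemory = errors.New("accel: out of GPU memory")

// opError is a local stand-in shaped like accel.Error (Backend omitted for brevity).
type opError struct {
	Op     string // Operation that failed
	Err    error  // Underlying sentinel error
	Detail string // Additional detail
}

func (e *opError) Error() string {
	return fmt.Sprintf("%s: %v (%s)", e.Op, e.Err, e.Detail)
}

// Unwrap exposes the cause so errors.Is and errors.As can see through the wrapper.
func (e *opError) Unwrap() error { return e.Err }

// alloc is a hypothetical operation that fails with a wrapped sentinel.
func alloc() error {
	return &opError{Op: "NewTensor", Err: errOutOfMemory, Detail: "requested 8 GiB"}
}

// isOutOfMemory reports whether err wraps the out-of-memory sentinel.
func isOutOfMemory(err error) bool {
	return errors.Is(err, errOutOfMemory)
}

func main() {
	err := alloc()
	var oe *opError
	if errors.As(err, &oe) {
		fmt.Println("failed op:", oe.Op)
	}
	fmt.Println(isOutOfMemory(err))
}
```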
type FHEOps ¶
type FHEOps interface {
// BFVEncrypt encrypts plaintext with BFV scheme.
// plaintext: [N] int64 values (N ≤ poly_modulus_degree)
// pk: public key
// ciphertext: output ciphertext
BFVEncrypt(plaintext, pk, ciphertext *UntypedTensor) error
// BFVEncryptBatch encrypts multiple plaintexts.
// plaintexts: [M, N] int64
// pk: public key
// ciphertexts: [M, ...] output
BFVEncryptBatch(plaintexts, pk, ciphertexts *UntypedTensor) error
// BFVDecrypt decrypts ciphertext.
// ciphertext: input ciphertext
// sk: secret key
// plaintext: [N] int64 output
BFVDecrypt(ciphertext, sk, plaintext *UntypedTensor) error
// BFVAdd adds two ciphertexts.
// ct1, ct2: input ciphertexts
// result: output ciphertext
BFVAdd(ct1, ct2, result *UntypedTensor) error
// BFVMultiply multiplies ciphertexts with relinearization.
// ct1, ct2: input ciphertexts
// relinKey: relinearization key
// result: output ciphertext
BFVMultiply(ct1, ct2, relinKey, result *UntypedTensor) error
// BFVMultiplyPlain multiplies ciphertext by plaintext.
// ct: input ciphertext
// plain: [N] int64 plaintext
// result: output ciphertext
BFVMultiplyPlain(ct, plain, result *UntypedTensor) error
// BFVRotate rotates ciphertext slots.
// ct: input ciphertext
// galoisKey: Galois key for rotation
// steps: rotation amount (positive = left)
// result: output ciphertext
BFVRotate(ct, galoisKey *UntypedTensor, steps int, result *UntypedTensor) error
// CKKSEncrypt encrypts with CKKS (approximate arithmetic).
// plaintext: [N] float64 values
// pk: public key
// scale: encoding scale
// ciphertext: output
CKKSEncrypt(plaintext, pk *UntypedTensor, scale float64, ciphertext *UntypedTensor) error
// CKKSDecrypt decrypts CKKS ciphertext.
// ciphertext: input
// sk: secret key
// plaintext: [N] float64 output
CKKSDecrypt(ciphertext, sk, plaintext *UntypedTensor) error
// CKKSAdd adds two CKKS ciphertexts.
CKKSAdd(ct1, ct2, result *UntypedTensor) error
// CKKSMultiply multiplies CKKS ciphertexts.
CKKSMultiply(ct1, ct2, relinKey, result *UntypedTensor) error
// CKKSRescale rescales ciphertext after multiplication.
CKKSRescale(ct, result *UntypedTensor) error
// CKKSRotate rotates CKKS slots.
CKKSRotate(ct, galoisKey *UntypedTensor, steps int, result *UntypedTensor) error
// Bootstrap refreshes ciphertext noise level (limited support).
Bootstrap(ct, bootstrapKey, result *UntypedTensor) error
}
FHEOps provides GPU-accelerated fully homomorphic encryption operations. Supports BFV (exact arithmetic) and CKKS (approximate arithmetic) schemes.
type LatticeOps ¶
type LatticeOps interface {
// KyberKeyGen generates Kyber (ML-KEM) key pair.
// pk: [1184] bytes (Kyber768 public key)
// sk: [2400] bytes (Kyber768 secret key)
KyberKeyGen(pk, sk *UntypedTensor) error
// KyberKeyGenBatch generates multiple key pairs in parallel.
// pk: [N, 1184] bytes
// sk: [N, 2400] bytes
KyberKeyGenBatch(pk, sk *UntypedTensor) error
// KyberEncaps encapsulates shared secret.
// pk: [1184] bytes public key
// ct: [1088] bytes ciphertext output
// ss: [32] bytes shared secret output
KyberEncaps(pk, ct, ss *UntypedTensor) error
// KyberEncapsBatch performs batch encapsulation.
// pk: [N, 1184] bytes
// ct: [N, 1088] bytes
// ss: [N, 32] bytes
KyberEncapsBatch(pk, ct, ss *UntypedTensor) error
// KyberDecaps decapsulates shared secret.
// ct: [1088] bytes ciphertext
// sk: [2400] bytes secret key
// ss: [32] bytes shared secret output
KyberDecaps(ct, sk, ss *UntypedTensor) error
// KyberDecapsBatch performs batch decapsulation.
// ct: [N, 1088] bytes
// sk: [N, 2400] bytes
// ss: [N, 32] bytes
KyberDecapsBatch(ct, sk, ss *UntypedTensor) error
// DilithiumKeyGen generates Dilithium (ML-DSA) key pair.
// pk: [1952] bytes (Dilithium3 public key)
// sk: [4016] bytes (Dilithium3 secret key)
DilithiumKeyGen(pk, sk *UntypedTensor) error
// DilithiumSign signs a message.
// msg: [msg_len] bytes message
// sk: [4016] bytes secret key
// sig: [3309] bytes signature output (DilithiumSignatureSize)
DilithiumSign(msg, sk, sig *UntypedTensor) error
// DilithiumSignBatch signs multiple messages in parallel.
// msgs: [N, msg_len] bytes
// sk: [4016] bytes (same key for all)
// sigs: [N, 3309] bytes
DilithiumSignBatch(msgs, sk, sigs *UntypedTensor) error
// DilithiumVerify verifies a signature.
// msg: [msg_len] bytes
// sig: [3309] bytes
// pk: [1952] bytes
// Returns true if valid.
DilithiumVerify(msg, sig, pk *UntypedTensor) (bool, error)
// DilithiumVerifyBatch verifies multiple signatures.
// msgs: [N, msg_len] bytes
// sigs: [N, 3309] bytes
// pks: [N, 1952] bytes
// results: [N] uint8 (1 = valid, 0 = invalid)
DilithiumVerifyBatch(msgs, sigs, pks, results *UntypedTensor) error
// PolynomialNTT performs NTT in lattice polynomial ring.
// Operates on polynomials in Z_q[X]/(X^256 + 1).
PolynomialNTT(input, output *UntypedTensor, q uint32) error
// PolynomialINTT performs inverse NTT.
PolynomialINTT(input, output *UntypedTensor, q uint32) error
// PolynomialMul multiplies polynomials in NTT domain.
PolynomialMul(a, b, c *UntypedTensor, q uint32) error
// PolynomialAdd adds polynomials.
PolynomialAdd(a, b, c *UntypedTensor, q uint32) error
}
LatticeOps provides GPU-accelerated lattice-based cryptography operations. Implements NIST post-quantum standards: ML-KEM (Kyber) and ML-DSA (Dilithium).
type MLOps ¶
type MLOps interface {
// MatMul performs matrix multiplication: C = A @ B
MatMul(a, b, c *UntypedTensor) error
// MatMulTranspose performs C = op(A) @ op(B), where transposeA and
// transposeB select whether A and B are transposed (e.g. C = A @ B^T).
MatMulTranspose(a, b, c *UntypedTensor, transposeA, transposeB bool) error
// ReLU applies rectified linear unit: y = max(0, x)
ReLU(input, output *UntypedTensor) error
// GELU applies Gaussian error linear unit activation
GELU(input, output *UntypedTensor) error
// Softmax applies softmax along an axis
Softmax(input, output *UntypedTensor, axis int) error
// LayerNorm applies layer normalization
LayerNorm(input, gamma, beta, output *UntypedTensor, eps float32) error
// Attention computes scaled dot-product attention
// output = softmax(Q @ K^T / scale) @ V
Attention(q, k, v, output *UntypedTensor, scale float32) error
// Conv2D performs 2D convolution
Conv2D(input, kernel, output *UntypedTensor, stride, padding [2]int) error
// MaxPool2D performs 2D max pooling
MaxPool2D(input, output *UntypedTensor, kernelSize, stride [2]int) error
// BatchNorm applies batch normalization
BatchNorm(input, gamma, beta, mean, variance, output *UntypedTensor, eps float32) error
// Dropout applies dropout with given probability (inference mode)
Dropout(input, output *UntypedTensor, p float32) error
// Add performs element-wise addition
Add(a, b, c *UntypedTensor) error
// Multiply performs element-wise multiplication
Multiply(a, b, c *UntypedTensor) error
// Sum reduces tensor along specified axes
Sum(input, output *UntypedTensor, axes []int) error
// Mean reduces tensor along specified axes
Mean(input, output *UntypedTensor, axes []int) error
}
MLOps provides GPU-accelerated machine learning operations.
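When validating MatMul output, a plain CPU reference is useful. The sketch below multiplies two row-major matrices stored as flat slices. The row-major layout and the function name matMul are assumptions; the tensor memory layout is not documented on this page.

```go
package main

import "fmt"

// matMul computes C = A @ B for row-major A (m×k) and B (k×n).
func matMul(a, b []float32, m, k, n int) []float32 {
	c := make([]float32, m*n)
	for i := 0; i < m; i++ {
		for p := 0; p < k; p++ {
			aip := a[i*k+p]
			// Walking j innermost keeps accesses to b and c sequential.
			for j := 0; j < n; j++ {
				c[i*n+j] += aip * b[p*n+j]
			}
		}
	}
	return c
}

func main() {
	a := []float32{1, 2, 3, 4} // 2×2
	b := []float32{5, 6, 7, 8} // 2×2
	fmt.Println(matMul(a, b, 2, 2, 2))
	// prints "[19 22 43 50]"
}
```

For float32 inputs of realistic size, compare against GPU results with a tolerance rather than exact equality, since accumulation order can differ.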
type OperationType ¶
type OperationType int
OperationType identifies a type of compute operation.
const (
	// ML Operations
	OpMatMul OperationType = iota
	OpReLU
	OpGELU
	OpSoftmax
	OpLayerNorm
	OpAttention
	// Crypto Operations
	OpSHA256
	OpKeccak256
	OpPoseidon
	OpECDSAVerify
	OpEd25519Verify
	OpBLSVerify
	OpMerkleRoot
	// ZK Operations
	OpNTT
	OpINTT
	OpMSM
	OpPolyMul
	// FHE Operations
	OpBFVEncrypt
	OpBFVDecrypt
	OpBFVAdd
	OpBFVMul
	// Lattice Operations
	OpKyberKeyGen
	OpKyberEncaps
	OpKyberDecaps
	OpDilithiumSign
	OpDilithiumVerify
	// DEX Operations
	OpConstantProductSwap
	OpTWAP
	OpOrderMatch
)
func (OperationType) Category ¶
func (o OperationType) Category() string
Category returns the operation category.
func (OperationType) String ¶
func (o OperationType) String() string
String returns the operation name.
type Session ¶
type Session struct {
// contains filtered or unexported fields
}
Session manages a GPU acceleration context. All tensor operations must use tensors created from the same session. Session is safe for concurrent use.
func DefaultSession ¶
DefaultSession returns a lazily initialized default session. It uses the best available backend (Metal on macOS, CUDA on Linux).
func NewSession ¶
func NewSession(opts ...SessionOption) (*Session, error)
NewSession creates a new acceleration session with auto-detected best backend.
func NewSessionWithBackend ¶
func NewSessionWithBackend(backend BackendType, opts ...SessionOption) (*Session, error)
NewSessionWithBackend creates a session using a specific backend.
func NewSessionWithDevice ¶
func NewSessionWithDevice(backend BackendType, deviceIndex int, opts ...SessionOption) (*Session, error)
NewSessionWithDevice creates a session using a specific device.
func (*Session) Backend ¶
func (s *Session) Backend() BackendType
Backend returns the backend type for this session.
func (*Session) DeviceInfo ¶
func (s *Session) DeviceInfo() DeviceInfo
DeviceInfo returns information about the session's device.
func (*Session) Lattice ¶
func (s *Session) Lattice() LatticeOps
Lattice returns the lattice cryptography operations interface.
func (*Session) SyncContext ¶
SyncContext waits for pending operations with context cancellation.
type SessionOption ¶
type SessionOption func(*sessionConfig)
SessionOption configures session creation.
func WithAsync ¶
func WithAsync(async bool) SessionOption
WithAsync enables asynchronous operation mode.
func WithBackend ¶
func WithBackend(b BackendType) SessionOption
WithBackend specifies the backend to use.
func WithDevice ¶
func WithDevice(index int) SessionOption
WithDevice specifies the device index within the backend.
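SessionOption follows the functional options pattern. The sketch below shows how such options are typically applied, using a local config type because sessionConfig is unexported. The field names, the "auto" default, and the lowercase helper names are hypothetical.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the unexported sessionConfig.
type config struct {
	backend string
	device  int
	async   bool
}

// Option mutates a config, matching the shape of accel.SessionOption.
type Option func(*config)

func withBackend(b string) Option { return func(c *config) { c.backend = b } }
func withDevice(i int) Option     { return func(c *config) { c.device = i } }
func withAsync(a bool) Option     { return func(c *config) { c.async = a } }

// newConfig starts from defaults and applies options in order,
// so a later option overrides an earlier one for the same field.
func newConfig(opts ...Option) config {
	c := config{backend: "auto"}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newConfig(withBackend("metal"), withAsync(true)))
}
```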
type Tensor ¶
type Tensor[T TensorElement] struct {
	// contains filtered or unexported fields
}
Tensor represents a multi-dimensional array in GPU memory. Tensor is not safe for concurrent modification but is safe for concurrent reads.
func NewTensor ¶
func NewTensor[T TensorElement](s *Session, shape []int) (*Tensor[T], error)
NewTensor creates a new tensor with the given shape.
func NewTensorWithData ¶
func NewTensorWithData[T TensorElement](s *Session, shape []int, data []T) (*Tensor[T], error)
NewTensorWithData creates a tensor initialized with data from a slice.
func (*Tensor[T]) Untyped ¶
func (t *Tensor[T]) Untyped() *UntypedTensor
Untyped returns an untyped view of the tensor for passing to ops.
type TensorElement ¶
TensorElement is a type constraint for tensor element types.
type UntypedTensor ¶
type UntypedTensor struct {
// contains filtered or unexported fields
}
UntypedTensor provides type-erased tensor operations. Used internally and for dynamic typing scenarios.
func (*UntypedTensor) Bytes ¶
func (t *UntypedTensor) Bytes() int
Bytes returns the total byte size.
func (*UntypedTensor) DType ¶
func (t *UntypedTensor) DType() DType
DType returns the element data type.
func (*UntypedTensor) Handle ¶
func (t *UntypedTensor) Handle() uintptr
Handle returns the raw tensor handle for CGO operations.
func (*UntypedTensor) NDim ¶
func (t *UntypedTensor) NDim() int
NDim returns the number of dimensions.
func (*UntypedTensor) NumEl ¶
func (t *UntypedTensor) NumEl() int
NumEl returns the total number of elements.
func (*UntypedTensor) Shape ¶
func (t *UntypedTensor) Shape() []int
Shape returns a copy of the tensor shape.
type ZKOps ¶
type ZKOps interface {
// NTT performs Number Theoretic Transform.
// input: [N] uint64 coefficients
// output: [N] uint64 NTT values
// roots: [N] uint64 roots of unity
// modulus: prime modulus
NTT(input, output, roots *UntypedTensor, modulus uint64) error
// INTT performs inverse NTT.
// input: [N] uint64 NTT values
// output: [N] uint64 coefficients
// invRoots: [N] uint64 inverse roots of unity
// modulus: prime modulus
INTT(input, output, invRoots *UntypedTensor, modulus uint64) error
// MSM performs multi-scalar multiplication on elliptic curves.
// scalars: [N, scalar_size] bytes
// bases: [N, point_size] bytes (affine points)
// result: [point_size] bytes
MSM(scalars, bases, result *UntypedTensor) error
// MSMBatch performs multiple MSMs in parallel.
// scalars: [M, N, scalar_size] bytes
// bases: [M, N, point_size] bytes
// results: [M, point_size] bytes
MSMBatch(scalars, bases, results *UntypedTensor) error
// PolyMul multiplies polynomials in coefficient form.
// a: [N] uint64 coefficients
// b: [N] uint64 coefficients
// c: [2N-1] uint64 result coefficients
// modulus: prime modulus
PolyMul(a, b, c *UntypedTensor, modulus uint64) error
// PolyEval evaluates polynomial at given points.
// coeffs: [degree+1] uint64
// points: [N] uint64
// results: [N] uint64
// modulus: prime modulus
PolyEval(coeffs, points, results *UntypedTensor, modulus uint64) error
// CommitPoly computes polynomial commitment (KZG).
// coeffs: [degree+1, field_size] bytes
// srs: structured reference string
// commitment: [point_size] bytes
CommitPoly(coeffs, srs, commitment *UntypedTensor) error
// FFT performs Fast Fourier Transform (complex).
// input: [N, 2] float32 (real, imag)
// output: [N, 2] float32
FFT(input, output *UntypedTensor) error
// IFFT performs inverse FFT.
IFFT(input, output *UntypedTensor) error
// FieldAdd adds field elements.
// a: [N] uint64
// b: [N] uint64
// c: [N] uint64
// modulus: prime modulus
FieldAdd(a, b, c *UntypedTensor, modulus uint64) error
// FieldMul multiplies field elements.
FieldMul(a, b, c *UntypedTensor, modulus uint64) error
// FieldInv computes modular inverse.
FieldInv(a, b *UntypedTensor, modulus uint64) error
}
ZKOps provides GPU-accelerated zero-knowledge proof operations.
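The [2N-1] output length documented for PolyMul follows from schoolbook multiplication in coefficient form. The sketch below is a CPU reference that can serve as a check on small inputs. The function name polyMul is illustrative, and the modulus is assumed to be below 2^32 so that each product fits in uint64.

```go
package main

import "fmt"

// polyMul multiplies two polynomials given by their coefficients modulo m.
// For inputs of length N it returns 2N-1 coefficients.
// Assumption: m < 2^32, so (ai % m) * (bj % m) cannot overflow uint64.
func polyMul(a, b []uint64, m uint64) []uint64 {
	if len(a) == 0 || len(b) == 0 {
		return nil
	}
	c := make([]uint64, len(a)+len(b)-1)
	for i, ai := range a {
		for j, bj := range b {
			// The x^i term times the x^j term contributes to x^(i+j).
			c[i+j] = (c[i+j] + (ai%m)*(bj%m)) % m
		}
	}
	return c
}

func main() {
	// (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
	fmt.Println(polyMul([]uint64{1, 2}, []uint64{3, 4}, 17))
	// prints "[3 10 8]"
}
```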
Directories ¶
| Path | Synopsis |
|---|---|
| internal | |
| internal/capi | Package capi provides CGO bindings to the lux-accel C library. |
| ops | |
| ops/consensus | Package consensus provides GPU-accelerated consensus primitives. |
| ops/crypto | Package crypto provides GPU-accelerated cryptographic operations. |
| ops/dex | Package dex provides GPU-accelerated DEX operations. |
| ops/fhe | Package fhe provides GPU-accelerated Fully Homomorphic Encryption operations. |
| ops/lattice | Package lattice provides GPU-accelerated lattice cryptography operations. |
| ops/zk | Package zk provides GPU-accelerated zero-knowledge proof operations. |