webgpu

package
v0.7.7 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 6, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Rendered for windows/amd64

Overview

Package webgpu implements the WebGPU backend for GPU-accelerated tensor operations. It uses go-webgpu (github.com/go-webgpu/webgpu) for zero-CGO WebGPU bindings.

Package webgpu provides embedded WGSL compute shaders for tensor operations.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsAvailable

func IsAvailable() (available bool)

IsAvailable checks if WebGPU is available on this system.

func ListAdapters

func ListAdapters() (adapters []*wgpu.AdapterInfoGo, err error)

ListAdapters returns information about all available GPU adapters.

Types

type Backend

type Backend struct {

	// Lazy mode: when true, operations return lazy tensors that keep data on GPU
	// until Data() is explicitly called. This is the key optimization for
	// Phase 3 Integration - eliminates readBuffer() bottleneck.
	// Default: true for optimal performance.
	LazyMode bool
	// contains filtered or unexported fields
}

Backend implements tensor operations on GPU using WebGPU.

func New

func New() (backend *Backend, err error)

New creates a new WebGPU backend. Returns an error if WebGPU is not available or initialization fails.

func (*Backend) AdapterInfo

func (b *Backend) AdapterInfo() *wgpu.AdapterInfoGo

AdapterInfo returns information about the GPU adapter.

func (*Backend) Add

func (b *Backend) Add(a, other *tensor.RawTensor) *tensor.RawTensor

Add performs element-wise addition on GPU. Supports float32 and int32 dtypes. In LazyMode (default), returns a lazy tensor that keeps data on GPU.

func (*Backend) AddBackwardGPU added in v0.6.0

func (b *Backend) AddBackwardGPU(_, _, grad *GPUTensor) (*GPUTensor, *GPUTensor)

AddBackwardGPU computes gradients for element-wise addition. d(a+b)/da = 1, d(a+b)/db = 1.

func (*Backend) AddGPU added in v0.6.0

func (b *Backend) AddGPU(a, c *GPUTensor) *GPUTensor

AddGPU performs element-wise addition on GPU tensors. Data stays on GPU - no CPU transfer occurs.

func (*Backend) AddScalar added in v0.3.0

func (b *Backend) AddScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

AddScalar adds a scalar to tensor elements on GPU.

func (*Backend) And added in v0.3.0

func (b *Backend) And(a, other *tensor.RawTensor) *tensor.RawTensor

And performs element-wise logical AND on GPU. Supports mixed dtypes by casting to float32 (for boolean tensors from different sources).

func (*Backend) Argmax added in v0.3.0

func (b *Backend) Argmax(x *tensor.RawTensor, dim int) *tensor.RawTensor

Argmax returns indices of maximum values along dimension on GPU.

func (*Backend) BatchMatMul added in v0.4.0

func (b *Backend) BatchMatMul(a, other *tensor.RawTensor) *tensor.RawTensor

BatchMatMul performs batched matrix multiplication on GPU. Supports 3D tensors [batch, M, K] @ [batch, K, N] -> [batch, M, N] and 4D tensors [batch, heads, M, K] @ [batch, heads, K, N].
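
To make the 3D shape contract concrete, here is a minimal CPU reference for [batch, M, K] @ [batch, K, N] -> [batch, M, N] over row-major flat slices. The function batchMatMul is a hypothetical illustration and not part of this package; it says nothing about how the WGSL shader is organized.

```go
package main

import "fmt"

// batchMatMul is a hypothetical CPU reference, not part of package webgpu.
// a holds [batch, m, k] and b holds [batch, k, n], both row-major.
// The result holds [batch, m, n].
func batchMatMul(a, b []float32, batch, m, k, n int) []float32 {
	out := make([]float32, batch*m*n)
	for bi := 0; bi < batch; bi++ {
		for i := 0; i < m; i++ {
			for j := 0; j < n; j++ {
				var sum float32
				for p := 0; p < k; p++ {
					sum += a[bi*m*k+i*k+p] * b[bi*k*n+p*n+j]
				}
				out[bi*m*n+i*n+j] = sum
			}
		}
	}
	return out
}

func main() {
	// Two batches of a [1, 2] @ [2, 1] product.
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(batchMatMul(a, b, 2, 1, 2, 1)) // [17 53]
}
```

The 4D case [batch, heads, M, K] can be treated the same way by folding batch and heads into a single leading dimension.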

func (*Backend) Cast added in v0.3.0

func (b *Backend) Cast(x *tensor.RawTensor, dtype tensor.DataType) *tensor.RawTensor

Cast converts tensor to different data type. Supports float32 and int32 as target types.

func (*Backend) Cat added in v0.3.0

func (b *Backend) Cat(tensors []*tensor.RawTensor, dim int) *tensor.RawTensor

Cat concatenates tensors along the specified dimension.

func (*Backend) Chunk added in v0.3.0

func (b *Backend) Chunk(x *tensor.RawTensor, n, dim int) []*tensor.RawTensor

Chunk splits tensor into n equal parts along the specified dimension.

func (*Backend) Conv2D

func (b *Backend) Conv2D(input, kernel *tensor.RawTensor, stride, padding int) *tensor.RawTensor

Conv2D performs 2D convolution on GPU. Input shape: [batch, in_channels, height, width]. Kernel shape: [out_channels, in_channels, kH, kW].
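
The output shape is not stated above. The sketch below assumes the conventional formula out = (in + 2*padding - kernel)/stride + 1 with integer division; both helper names are hypothetical and the package may handle edge cases differently.

```go
package main

import "fmt"

// convOutputSize returns the output size along one spatial axis using the
// conventional convolution formula. Hypothetical helper, not part of
// package webgpu.
func convOutputSize(in, kernel, stride, padding int) int {
	return (in+2*padding-kernel)/stride + 1
}

// conv2DOutputShape maps an input shape [batch, inC, h, w] and a kernel shape
// [outC, inC, kH, kW] to the assumed output shape [batch, outC, outH, outW].
func conv2DOutputShape(input, kernel [4]int, stride, padding int) [4]int {
	return [4]int{
		input[0],
		kernel[0],
		convOutputSize(input[2], kernel[2], stride, padding),
		convOutputSize(input[3], kernel[3], stride, padding),
	}
}

func main() {
	out := conv2DOutputShape([4]int{8, 3, 32, 32}, [4]int{16, 3, 3, 3}, 2, 1)
	fmt.Println(out) // [8 16 16 16]
}
```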

func (*Backend) Conv2DInputBackward added in v0.7.1

func (b *Backend) Conv2DInputBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor

Conv2DInputBackward computes gradient with respect to input for Conv2D. Not yet implemented for WebGPU backend.

func (*Backend) Conv2DKernelBackward added in v0.7.1

func (b *Backend) Conv2DKernelBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor

Conv2DKernelBackward computes gradient with respect to kernel for Conv2D. Not yet implemented for WebGPU backend.

func (*Backend) Cos added in v0.3.0

func (b *Backend) Cos(x *tensor.RawTensor) *tensor.RawTensor

Cos computes element-wise cosine on GPU.

func (*Backend) Device

func (b *Backend) Device() tensor.Device

Device returns the compute device.

func (*Backend) Div

func (b *Backend) Div(a, other *tensor.RawTensor) *tensor.RawTensor

Div performs element-wise division on GPU. Supports float32 and int32 dtypes. In LazyMode (default), returns a lazy tensor that keeps data on GPU.

func (*Backend) DivBackwardGPU added in v0.6.0

func (b *Backend) DivBackwardGPU(a, c, grad *GPUTensor) (*GPUTensor, *GPUTensor)

DivBackwardGPU computes gradients for element-wise division. d(a/b)/da = 1/b, d(a/b)/db = -a/b^2.
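
As a CPU reference for the two formulas above, the following sketch applies the chain rule to plain float32 slices. The name divBackward is hypothetical; the sketch assumes equal-length inputs and no broadcasting.

```go
package main

import "fmt"

// divBackward is a hypothetical CPU reference, not part of package webgpu.
// For out = a / b (element-wise) and upstream gradient grad it returns
// gradA = grad * 1/b and gradB = grad * (-a / b^2).
func divBackward(a, b, grad []float32) (gradA, gradB []float32) {
	gradA = make([]float32, len(a))
	gradB = make([]float32, len(a))
	for i := range a {
		gradA[i] = grad[i] / b[i]
		gradB[i] = -grad[i] * a[i] / (b[i] * b[i])
	}
	return gradA, gradB
}

func main() {
	ga, gb := divBackward([]float32{6, 1}, []float32{2, 4}, []float32{1, 1})
	fmt.Println(ga, gb) // [0.5 0.25] [-1.5 -0.0625]
}
```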

func (*Backend) DivGPU added in v0.6.0

func (b *Backend) DivGPU(a, c *GPUTensor) *GPUTensor

DivGPU performs element-wise division on GPU tensors. Data stays on GPU - no CPU transfer occurs.

func (*Backend) DivScalar added in v0.3.0

func (b *Backend) DivScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

DivScalar divides tensor elements by a scalar on GPU.

func (*Backend) Embedding added in v0.5.1

func (b *Backend) Embedding(weight, indices *tensor.RawTensor) *tensor.RawTensor

Embedding performs embedding lookup on GPU. weight: [num_embeddings, embedding_dim], indices: int32 tensor. Returns: [...indices_shape, embedding_dim].
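
The lookup semantics can be sketched on the CPU as a row gather from the weight matrix. The function embedding below is a hypothetical reference, not the package's implementation; it assumes all indices are within [0, num_embeddings).

```go
package main

import "fmt"

// embedding is a hypothetical CPU reference, not part of package webgpu.
// weight is a row-major [numEmbeddings, embeddingDim] matrix; the result is
// row-major [len(indices), embeddingDim].
func embedding(weight []float32, embeddingDim int, indices []int32) []float32 {
	out := make([]float32, 0, len(indices)*embeddingDim)
	for _, idx := range indices {
		start := int(idx) * embeddingDim
		out = append(out, weight[start:start+embeddingDim]...)
	}
	return out
}

func main() {
	weight := []float32{0, 1, 10, 11, 20, 21} // 3 embeddings of dimension 2
	fmt.Println(embedding(weight, 2, []int32{2, 0})) // [20 21 0 1]
}
```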

func (*Backend) Equal added in v0.3.0

func (b *Backend) Equal(a, other *tensor.RawTensor) *tensor.RawTensor

Equal performs element-wise equality comparison on GPU. Always returns float32 tensor (0.0 for false, 1.0 for true).

func (*Backend) Exp added in v0.3.0

func (b *Backend) Exp(x *tensor.RawTensor) *tensor.RawTensor

Exp computes element-wise exponential on GPU.

func (*Backend) Expand added in v0.3.0

func (b *Backend) Expand(x *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor

Expand broadcasts tensor to new shape. GPU-accelerated for up to 6D tensors.

func (*Backend) FlashAttentionGPU added in v0.7.0

func (b *Backend) FlashAttentionGPU(
	q, k, v *tensor.RawTensor,
	scale float32,
	causal bool,
	blockSize int,
) (*tensor.RawTensor, error)

FlashAttentionGPU executes Flash Attention 2 on GPU using WebGPU.

This implementation uses tiled computation with online softmax to achieve O(N) memory complexity instead of O(N²) for standard attention.

Parameters:

  • q: Query tensor [batch, seqLen, numHeads, headDim]
  • k: Key tensor [batch, kvLen, numHeads, headDim]
  • v: Value tensor [batch, kvLen, numHeads, headDim]
  • scale: Attention scale factor (typically 1/sqrt(headDim))
  • causal: Whether to apply causal masking
  • blockSize: Tile size for blocked computation (64 or 128)

Returns:

  • *tensor.RawTensor: Output tensor [batch, seqLen, numHeads, headDim]
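
The online softmax idea behind the tiled computation can be sketched for a single query row on the CPU. This is an illustration of the general technique under simplifying assumptions (scalar values, one row, finite scores, blockSize > 0); tiledAttention and naiveAttention are hypothetical names and do not describe the package's WGSL kernel.

```go
package main

import (
	"fmt"
	"math"
)

// naiveAttention is a two-pass reference: it scans all scores for the
// maximum first, then computes sum_j softmax(scores)[j] * values[j].
func naiveAttention(scores, values []float64) float64 {
	maxScore := math.Inf(-1)
	for _, s := range scores {
		maxScore = math.Max(maxScore, s)
	}
	var denom, acc float64
	for j, s := range scores {
		w := math.Exp(s - maxScore)
		denom += w
		acc += w * values[j]
	}
	return acc / denom
}

// tiledAttention computes the same quantity one tile at a time. It keeps only
// a running maximum, a running denominator and a running weighted sum, so the
// extra state does not grow with len(scores).
func tiledAttention(scores, values []float64, blockSize int) float64 {
	runMax := math.Inf(-1)
	var denom, acc float64
	for start := 0; start < len(scores); start += blockSize {
		end := start + blockSize
		if end > len(scores) {
			end = len(scores)
		}
		newMax := runMax
		for _, s := range scores[start:end] {
			newMax = math.Max(newMax, s)
		}
		// Rescale the running sums so they are relative to newMax.
		rescale := math.Exp(runMax - newMax)
		denom *= rescale
		acc *= rescale
		for j := start; j < end; j++ {
			w := math.Exp(scores[j] - newMax)
			denom += w
			acc += w * values[j]
		}
		runMax = newMax
	}
	return acc / denom
}

func main() {
	scores := []float64{1, 3, 2, 5, 4}
	values := []float64{10, 20, 30, 40, 50}
	fmt.Println(naiveAttention(scores, values), tiledAttention(scores, values, 2))
}
```

Both functions should agree up to floating-point rounding; the tiled version never needs the full score row at once.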

func (*Backend) FlushCommands added in v0.6.0

func (b *Backend) FlushCommands()

FlushCommands submits all pending command buffers to the GPU queue. Call this when you need to ensure all queued operations are executed. Note: This is called automatically before reading data from GPU buffers.

func (*Backend) FromRawTensor added in v0.6.0

func (b *Backend) FromRawTensor(t *tensor.RawTensor) *GPUTensor

FromRawTensor uploads a CPU tensor to GPU memory. This creates a new GPUTensor with data copied from the RawTensor.

func (*Backend) Gather added in v0.3.0

func (b *Backend) Gather(input *tensor.RawTensor, dim int, indices *tensor.RawTensor) *tensor.RawTensor

Gather selects elements along dim using index tensor on GPU.

func (*Backend) Greater added in v0.3.0

func (b *Backend) Greater(a, other *tensor.RawTensor) *tensor.RawTensor

Greater performs element-wise greater-than comparison on GPU. Always returns float32 tensor (0.0 for false, 1.0 for true).

func (*Backend) GreaterEqual added in v0.3.0

func (b *Backend) GreaterEqual(a, other *tensor.RawTensor) *tensor.RawTensor

GreaterEqual performs element-wise greater-or-equal comparison on GPU. Always returns float32 tensor (0.0 for false, 1.0 for true).

func (*Backend) Log added in v0.3.0

func (b *Backend) Log(x *tensor.RawTensor) *tensor.RawTensor

Log computes natural logarithm element-wise on GPU.

func (*Backend) Lower added in v0.3.0

func (b *Backend) Lower(a, other *tensor.RawTensor) *tensor.RawTensor

Lower performs element-wise less-than comparison on GPU. Always returns float32 tensor (0.0 for false, 1.0 for true).

func (*Backend) LowerEqual added in v0.3.0

func (b *Backend) LowerEqual(a, other *tensor.RawTensor) *tensor.RawTensor

LowerEqual performs element-wise less-or-equal comparison on GPU. Always returns float32 tensor (0.0 for false, 1.0 for true).

func (*Backend) MatMul

func (b *Backend) MatMul(a, other *tensor.RawTensor) *tensor.RawTensor

MatMul performs matrix multiplication on GPU.

func (*Backend) MatMulBackwardGPU added in v0.6.0

func (b *Backend) MatMulBackwardGPU(a, c, grad *GPUTensor) (*GPUTensor, *GPUTensor)

MatMulBackwardGPU computes gradients for matrix multiplication. d(A@B)/dA = grad@B^T, d(A@B)/dB = A^T@grad.
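
The two gradient products can be checked against a small CPU reference. The sketch below assumes row-major A of shape [m, k], B of shape [k, n] and grad of shape [m, n]; matMulBackward is a hypothetical name, not part of this package.

```go
package main

import "fmt"

// matMulBackward is a hypothetical CPU reference, not part of package webgpu.
// It returns gradA = grad @ B^T with shape [m, k] and gradB = A^T @ grad with
// shape [k, n].
func matMulBackward(a, b, grad []float32, m, k, n int) (gradA, gradB []float32) {
	gradA = make([]float32, m*k)
	gradB = make([]float32, k*n)
	for i := 0; i < m; i++ {
		for p := 0; p < k; p++ {
			for j := 0; j < n; j++ {
				gradA[i*k+p] += grad[i*n+j] * b[p*n+j] // (grad @ B^T)[i, p]
				gradB[p*n+j] += a[i*k+p] * grad[i*n+j] // (A^T @ grad)[p, j]
			}
		}
	}
	return gradA, gradB
}

func main() {
	// A is [1, 2], B is [2, 1], so A@B is a single scalar.
	gradA, gradB := matMulBackward([]float32{1, 2}, []float32{3, 4}, []float32{1}, 1, 2, 1)
	fmt.Println(gradA, gradB) // [3 4] [1 2]
}
```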

func (*Backend) MatMulGPU added in v0.6.0

func (b *Backend) MatMulGPU(a, c *GPUTensor) *GPUTensor

MatMulGPU performs matrix multiplication on GPU tensors. Data stays on GPU - no CPU transfer occurs.

func (*Backend) MaxPool2D

func (b *Backend) MaxPool2D(input *tensor.RawTensor, kernelSize, stride int) *tensor.RawTensor

MaxPool2D performs 2D max pooling on GPU. Input shape: [batch, channels, height, width].

func (*Backend) MaxPool2DBackward added in v0.7.1

func (b *Backend) MaxPool2DBackward(input, grad *tensor.RawTensor, maxIndices []int, kernelSize, stride int) *tensor.RawTensor

MaxPool2DBackward computes gradient with respect to input for MaxPool2D. Not yet implemented for WebGPU backend.

func (*Backend) MeanDim added in v0.3.0

func (b *Backend) MeanDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor

MeanDim computes mean along a dimension.

func (*Backend) MemoryStats

func (b *Backend) MemoryStats() MemoryStats

MemoryStats returns current GPU memory usage statistics.

func (*Backend) Mul

func (b *Backend) Mul(a, other *tensor.RawTensor) *tensor.RawTensor

Mul performs element-wise multiplication on GPU. Supports float32 and int32 dtypes. In LazyMode (default), returns a lazy tensor that keeps data on GPU.

func (*Backend) MulBackwardGPU added in v0.6.0

func (b *Backend) MulBackwardGPU(a, c, grad *GPUTensor) (*GPUTensor, *GPUTensor)

MulBackwardGPU computes gradients for element-wise multiplication. d(a*b)/da = b, d(a*b)/db = a.

func (*Backend) MulGPU added in v0.6.0

func (b *Backend) MulGPU(a, c *GPUTensor) *GPUTensor

MulGPU performs element-wise multiplication on GPU tensors. Data stays on GPU - no CPU transfer occurs.

func (*Backend) MulScalar added in v0.3.0

func (b *Backend) MulScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

MulScalar multiplies tensor elements by a scalar on GPU.

func (*Backend) Name

func (b *Backend) Name() string

Name returns the backend name.

func (*Backend) NewBatch added in v0.6.0

func (b *Backend) NewBatch() *CommandBatch

NewBatch creates a new command batch for accumulating operations. The batch will use a single CommandEncoder for all operations.

func (*Backend) Not added in v0.3.0

func (b *Backend) Not(x *tensor.RawTensor) *tensor.RawTensor

Not performs element-wise logical NOT on GPU.

func (*Backend) NotEqual added in v0.3.0

func (b *Backend) NotEqual(a, other *tensor.RawTensor) *tensor.RawTensor

NotEqual performs element-wise inequality comparison on GPU. Always returns float32 tensor (0.0 for false, 1.0 for true).

func (*Backend) OnesGPU added in v0.6.0

func (b *Backend) OnesGPU(shape tensor.Shape, dtype tensor.DataType) *GPUTensor

OnesGPU creates a GPU tensor filled with ones. Data is initialized to ones on CPU then uploaded to GPU.

func (*Backend) Or added in v0.3.0

func (b *Backend) Or(a, other *tensor.RawTensor) *tensor.RawTensor

Or performs element-wise logical OR on GPU. Supports mixed dtypes by casting to float32 (for boolean tensors from different sources).

func (*Backend) RandGPU added in v0.6.0

func (b *Backend) RandGPU(shape tensor.Shape, dtype tensor.DataType) *GPUTensor

RandGPU creates a random GPU tensor with uniform distribution [0, 1). Data is generated on CPU using math/rand then uploaded to GPU.

func (*Backend) ReLU

func (b *Backend) ReLU(x *tensor.RawTensor) *tensor.RawTensor

ReLU applies ReLU activation: max(0, x).

func (*Backend) ReLUBackwardGPU added in v0.6.0

func (b *Backend) ReLUBackwardGPU(input, grad *GPUTensor) *GPUTensor

ReLUBackwardGPU computes gradients for ReLU activation. d(ReLU(x))/dx = 1 if x > 0, else 0. grad_input = grad * (input > 0).

func (*Backend) ReLUGPU added in v0.6.0

func (b *Backend) ReLUGPU(t *GPUTensor) *GPUTensor

ReLUGPU applies ReLU activation on GPU: max(0, x). Data stays on GPU - no CPU transfer occurs.

func (*Backend) ReadGPUBuffer added in v0.6.0

func (b *Backend) ReadGPUBuffer(bufferPtr unsafe.Pointer, size uint64) ([]byte, error)

ReadGPUBuffer implements tensor.LazyBackend interface. Reads data from a GPU buffer to CPU memory. bufferPtr must be *wgpu.Buffer.

func (*Backend) Release

func (b *Backend) Release()

Release releases all WebGPU resources. Must be called when the backend is no longer needed.

func (*Backend) ReleaseGPUBuffer added in v0.6.0

func (b *Backend) ReleaseGPUBuffer(bufferPtr unsafe.Pointer)

ReleaseGPUBuffer implements tensor.LazyBackend interface. Releases a GPU buffer when no longer needed. bufferPtr must be *wgpu.Buffer.

func (*Backend) Reshape

func (b *Backend) Reshape(t *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor

Reshape returns a tensor with new shape. This is typically a metadata-only operation (zero-copy).

func (*Backend) Rsqrt added in v0.3.0

func (b *Backend) Rsqrt(x *tensor.RawTensor) *tensor.RawTensor

Rsqrt computes element-wise reciprocal square root on GPU.

func (*Backend) SetLazyMode added in v0.6.0

func (b *Backend) SetLazyMode(enabled bool)

SetLazyMode enables or disables lazy evaluation mode. When enabled (default), operations return lazy tensors that keep data on GPU until Data() is explicitly called. This dramatically improves performance by eliminating unnecessary GPU→CPU transfers. When disabled, operations immediately transfer results to CPU (slower).
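
The benefit of lazy mode can be modeled without a GPU. In the toy sketch below, a slice stands in for GPU memory and a counter stands in for GPU→CPU readbacks; the types device and lazyTensor are hypothetical and only illustrate the idea that chained operations cost no readback until Data() is called.

```go
package main

import "fmt"

// device is a hypothetical stand-in for a GPU backend; it only counts how
// often data is copied back to the CPU.
type device struct{ readbacks int }

// lazyTensor pretends that gpu lives in GPU memory.
type lazyTensor struct {
	dev *device
	gpu []float32
}

// upload copies CPU data into the pretend GPU memory.
func (d *device) upload(data []float32) *lazyTensor {
	return &lazyTensor{dev: d, gpu: append([]float32(nil), data...)}
}

// add produces a new lazy tensor without touching the CPU side.
func (d *device) add(a, b *lazyTensor) *lazyTensor {
	out := make([]float32, len(a.gpu))
	for i := range out {
		out[i] = a.gpu[i] + b.gpu[i]
	}
	return &lazyTensor{dev: d, gpu: out}
}

// Data is the only place where a readback is counted.
func (t *lazyTensor) Data() []float32 {
	t.dev.readbacks++
	return append([]float32(nil), t.gpu...)
}

func main() {
	d := &device{}
	x := d.upload([]float32{1, 2})
	y := d.upload([]float32{10, 20})
	z := d.add(d.add(x, y), y) // two operations, no readback yet
	fmt.Println(d.readbacks)   // 0
	fmt.Println(z.Data())      // [21 42]
	fmt.Println(d.readbacks)   // 1
}
```

With lazy mode disabled, the equivalent model would count one readback per operation instead of one per chain.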

func (*Backend) SetMaxBatchSize added in v0.6.0

func (b *Backend) SetMaxBatchSize(size int)

SetMaxBatchSize sets the maximum number of commands to accumulate before auto-flush. Set to 0 (default) to disable auto-flush limit. Typical values: 32-128 for balanced latency/throughput.

func (*Backend) Sigmoid

func (b *Backend) Sigmoid(x *tensor.RawTensor) *tensor.RawTensor

Sigmoid applies sigmoid activation: 1 / (1 + exp(-x)).

func (*Backend) SigmoidBackwardGPU added in v0.6.0

func (b *Backend) SigmoidBackwardGPU(output, grad *GPUTensor) *GPUTensor

SigmoidBackwardGPU computes gradients for sigmoid activation. d(sigmoid(x))/dx = sigmoid(x) * (1 - sigmoid(x)).

func (*Backend) SigmoidGPU added in v0.6.0

func (b *Backend) SigmoidGPU(t *GPUTensor) *GPUTensor

SigmoidGPU applies sigmoid activation on GPU: 1 / (1 + exp(-x)). Data stays on GPU - no CPU transfer occurs.

func (*Backend) Sin added in v0.3.0

func (b *Backend) Sin(x *tensor.RawTensor) *tensor.RawTensor

Sin computes element-wise sine on GPU.

func (*Backend) Softmax

func (b *Backend) Softmax(x *tensor.RawTensor, dim int) *tensor.RawTensor

Softmax applies softmax along the specified dimension. Supports N-dimensional tensors with dim=-1 (last dimension).

func (*Backend) SoftmaxBackwardGPU added in v0.6.0

func (b *Backend) SoftmaxBackwardGPU(output, grad *GPUTensor, dim int) *GPUTensor

SoftmaxBackwardGPU computes gradients for softmax activation. d_input[i] = s[i] * (grad[i] - sum(s * grad)) where s = softmax output.
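
The formula above translates directly into a CPU reference for a single softmax row. The function softmaxBackward is a hypothetical illustration; it takes the softmax output s and the upstream gradient for that row.

```go
package main

import "fmt"

// softmaxBackward is a hypothetical CPU reference, not part of package webgpu.
// It returns dInput[i] = s[i] * (grad[i] - sum_j s[j]*grad[j]).
func softmaxBackward(s, grad []float64) []float64 {
	var dot float64
	for i := range s {
		dot += s[i] * grad[i]
	}
	dInput := make([]float64, len(s))
	for i := range s {
		dInput[i] = s[i] * (grad[i] - dot)
	}
	return dInput
}

func main() {
	fmt.Println(softmaxBackward([]float64{0.5, 0.5}, []float64{1, 0})) // [0.25 -0.25]
}
```

The input gradients sum to zero whenever s sums to one, which is a quick sanity check for any implementation.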

func (*Backend) SoftmaxGPU added in v0.6.0

func (b *Backend) SoftmaxGPU(t *GPUTensor, dim int) *GPUTensor

SoftmaxGPU applies softmax activation along the specified dimension. For now, only last dimension (dim=-1) is supported efficiently on GPU. Data stays on GPU - no CPU transfer occurs.

func (*Backend) Sqrt added in v0.3.0

func (b *Backend) Sqrt(x *tensor.RawTensor) *tensor.RawTensor

Sqrt computes element-wise square root on GPU.

func (*Backend) Squeeze added in v0.3.0

func (b *Backend) Squeeze(x *tensor.RawTensor, dim int) *tensor.RawTensor

Squeeze removes a dimension of size 1 at the specified position.

func (*Backend) Sub

func (b *Backend) Sub(a, other *tensor.RawTensor) *tensor.RawTensor

Sub performs element-wise subtraction on GPU. Supports float32 and int32 dtypes. In LazyMode (default), returns a lazy tensor that keeps data on GPU.

func (*Backend) SubBackwardGPU added in v0.6.0

func (b *Backend) SubBackwardGPU(_, _, grad *GPUTensor) (*GPUTensor, *GPUTensor)

SubBackwardGPU computes gradients for element-wise subtraction. d(a-b)/da = 1, d(a-b)/db = -1.

func (*Backend) SubGPU added in v0.6.0

func (b *Backend) SubGPU(a, c *GPUTensor) *GPUTensor

SubGPU performs element-wise subtraction on GPU tensors. Data stays on GPU - no CPU transfer occurs.

func (*Backend) SubScalar added in v0.3.0

func (b *Backend) SubScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

SubScalar subtracts a scalar from tensor elements on GPU.

func (*Backend) Sum added in v0.3.0

func (b *Backend) Sum(x *tensor.RawTensor) *tensor.RawTensor

Sum computes the sum of all elements on GPU.

func (*Backend) SumDim added in v0.3.0

func (b *Backend) SumDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor

SumDim sums along a dimension. Implemented on CPU as reduction operations are complex on GPU.

func (*Backend) SumDimGPU added in v0.6.0

func (b *Backend) SumDimGPU(t *GPUTensor, dim int, keepDim bool) *GPUTensor

SumDimGPU computes sum along the last dimension. Input: [batch, dim], Output: [batch].

func (*Backend) Tanh

func (b *Backend) Tanh(x *tensor.RawTensor) *tensor.RawTensor

Tanh applies tanh activation.

func (*Backend) TanhBackwardGPU added in v0.6.0

func (b *Backend) TanhBackwardGPU(output, grad *GPUTensor) *GPUTensor

TanhBackwardGPU computes gradients for tanh activation. d(tanh(x))/dx = 1 - tanh(x)^2.

func (*Backend) TanhGPU added in v0.6.0

func (b *Backend) TanhGPU(t *GPUTensor) *GPUTensor

TanhGPU applies tanh activation on GPU. Data stays on GPU - no CPU transfer occurs.

func (*Backend) Transpose

func (b *Backend) Transpose(t *tensor.RawTensor, axes ...int) *tensor.RawTensor

Transpose transposes the tensor by permuting its dimensions. GPU-accelerated for 2D (optimized) and ND tensors (up to 6D).

func (*Backend) TransposeGPU added in v0.6.0

func (b *Backend) TransposeGPU(t *GPUTensor, axes ...int) *GPUTensor

TransposeGPU transposes a 2D tensor on GPU. Data stays on GPU - no CPU transfer occurs.

func (*Backend) Unsqueeze added in v0.3.0

func (b *Backend) Unsqueeze(x *tensor.RawTensor, dim int) *tensor.RawTensor

Unsqueeze adds a dimension of size 1 at the specified position.

func (*Backend) UploadTensor added in v0.6.0

func (b *Backend) UploadTensor(raw *tensor.RawTensor) *GPUTensor

UploadTensor uploads a CPU tensor to GPU memory. Returns a GPUTensor that can be used for lazy GPU operations.

func (*Backend) Where added in v0.3.0

func (b *Backend) Where(condition, x, y *tensor.RawTensor) *tensor.RawTensor

Where performs conditional element selection on GPU. result[i] = condition[i] != 0 ? x[i] : y[i].
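
Because the comparison operations return float32 masks (0.0 or 1.0), their output can feed Where directly. The CPU sketch below illustrates that pairing; greater and where are hypothetical reference functions that assume equal-length inputs.

```go
package main

import "fmt"

// greater is a hypothetical CPU reference for an element-wise a > b
// comparison that yields a float32 mask (1.0 for true, 0.0 for false).
func greater(a, b []float32) []float32 {
	mask := make([]float32, len(a))
	for i := range a {
		if a[i] > b[i] {
			mask[i] = 1
		}
	}
	return mask
}

// where is a hypothetical CPU reference for
// result[i] = condition[i] != 0 ? x[i] : y[i].
func where(condition, x, y []float32) []float32 {
	out := make([]float32, len(condition))
	for i := range condition {
		if condition[i] != 0 {
			out[i] = x[i]
		} else {
			out[i] = y[i]
		}
	}
	return out
}

func main() {
	mask := greater([]float32{1, 5}, []float32{3, 2})
	fmt.Println(mask)                                              // [0 1]
	fmt.Println(where(mask, []float32{10, 20}, []float32{-1, -2})) // [-1 20]
}
```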

func (*Backend) ZerosGPU added in v0.6.0

func (b *Backend) ZerosGPU(shape tensor.Shape, dtype tensor.DataType) *GPUTensor

ZerosGPU creates a zero-filled GPU tensor. Data is initialized to zeros on GPU.

type BufferPool

type BufferPool struct {
	// contains filtered or unexported fields
}

BufferPool manages GPU buffer reuse to reduce allocation overhead. Buffers are categorized by size and usage flags.

func NewBufferPool

func NewBufferPool(device *wgpu.Device) *BufferPool

NewBufferPool creates a new buffer pool for the given device.

func (*BufferPool) Acquire

func (p *BufferPool) Acquire(size uint64, usage wgpu.BufferUsage) *wgpu.Buffer

Acquire gets a buffer from the pool or creates a new one. Returns a buffer that matches or exceeds the requested size and usage.

func (*BufferPool) Clear

func (p *BufferPool) Clear()

Clear releases all pooled buffers. Should be called when the backend is released.

func (*BufferPool) Release

func (p *BufferPool) Release(buffer *wgpu.Buffer, size uint64, usage wgpu.BufferUsage)

Release returns a buffer to the pool for reuse. If the pool is full, the buffer is immediately released.

func (*BufferPool) Stats

func (p *BufferPool) Stats() (allocated, released, hits, misses uint64, pooledCount int)

Stats returns statistics about buffer pool usage.
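
The acquire/release cycle and the hit and miss counters can be illustrated with a toy pool that needs no GPU. Everything in this sketch is hypothetical: the buffer type, the per-key capacity, and the choice to round sizes up to a power of two (one simple way to return a buffer that "matches or exceeds" the request). The real pool may bucket buffers differently.

```go
package main

import "fmt"

// buffer is a stand-in for a GPU buffer.
type buffer struct {
	size  uint64
	usage uint32
}

type poolKey struct {
	size  uint64
	usage uint32
}

// pool is a hypothetical model of a buffer pool keyed by size and usage.
type pool struct {
	free         map[poolKey][]*buffer
	maxPerKey    int
	hits, misses uint64
}

func newPool(maxPerKey int) *pool {
	return &pool{free: make(map[poolKey][]*buffer), maxPerKey: maxPerKey}
}

// roundUp returns the smallest power of two >= size (assumes size <= 1<<63).
func roundUp(size uint64) uint64 {
	r := uint64(1)
	for r < size {
		r <<= 1
	}
	return r
}

// acquire reuses a pooled buffer when one fits, otherwise it allocates.
func (p *pool) acquire(size uint64, usage uint32) *buffer {
	key := poolKey{roundUp(size), usage}
	if list := p.free[key]; len(list) > 0 {
		buf := list[len(list)-1]
		p.free[key] = list[:len(list)-1]
		p.hits++
		return buf
	}
	p.misses++
	return &buffer{size: key.size, usage: usage}
}

// release returns a buffer to the pool, or drops it when the bucket is full.
func (p *pool) release(buf *buffer) {
	key := poolKey{buf.size, buf.usage}
	if len(p.free[key]) >= p.maxPerKey {
		return // a real pool would destroy the GPU buffer here
	}
	p.free[key] = append(p.free[key], buf)
}

func main() {
	p := newPool(4)
	a := p.acquire(100, 1) // miss: allocates a 128-byte buffer
	p.release(a)
	b := p.acquire(120, 1)             // hit: same bucket, same buffer
	fmt.Println(a == b, p.hits, p.misses) // true 1 1
}
```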

type BufferSize

type BufferSize int

BufferSize represents different buffer size categories for pooling.

const (
	// SmallBuffer for tensors < 4KB.
	SmallBuffer BufferSize = iota
	// MediumBuffer for tensors 4KB-1MB.
	MediumBuffer
	// LargeBuffer for tensors > 1MB.
	LargeBuffer
)
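
How a byte size maps to a category can be sketched as follows. The type and function names are hypothetical, KB and MB are assumed to mean 1024 and 1024*1024 bytes, and the treatment of sizes exactly at 4KB and 1MB is an assumption, since the comments above leave those boundaries open.

```go
package main

import "fmt"

// sizeClass mirrors the three categories above; it is a hypothetical type,
// not the package's BufferSize.
type sizeClass int

const (
	small sizeClass = iota
	medium
	large
)

// classify maps a byte count to a size class. Boundary handling at exactly
// 4KB and 1MB is assumed.
func classify(bytes uint64) sizeClass {
	switch {
	case bytes < 4*1024:
		return small
	case bytes <= 1024*1024:
		return medium
	default:
		return large
	}
}

func main() {
	fmt.Println(classify(100), classify(4096), classify(2<<20)) // 0 1 2
}
```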

type CommandBatch added in v0.6.0

type CommandBatch struct {
	// contains filtered or unexported fields
}

CommandBatch accumulates GPU operations for single submission. Instead of submitting each operation separately (causing GPU overhead), we collect all operations in a batch and submit them together.

func (*CommandBatch) Add added in v0.6.0

func (batch *CommandBatch) Add(name string, output *GPUTensor, execFunc func()) *CommandBatch

Add adds an operation to the batch. The operation function should encode the compute pass but NOT submit it. Returns the batch for method chaining.

func (*CommandBatch) Count added in v0.6.0

func (batch *CommandBatch) Count() int

Count returns the number of operations in the batch.

func (*CommandBatch) Submit added in v0.6.0

func (batch *CommandBatch) Submit()

Submit executes all batched operations in a single GPU submission. This dramatically reduces GPU overhead compared to submitting each operation separately.

Example performance difference:

3 separate submissions: encode → submit → wait (×3) = ~1.5ms overhead
1 batched submission:   encode → encode → encode → submit → wait = ~0.5ms overhead

The batch is consumed after Submit() and cannot be reused.
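
The encode-many, submit-once pattern can be modeled with plain closures. In this sketch the type commandBatch and the queueSubmit callback are hypothetical stand-ins for the real command encoder and queue, and treating a second submit as a no-op is an assumption; the text above only says the batch cannot be reused.

```go
package main

import "fmt"

// commandBatch is a hypothetical model of a batch that collects encode steps
// and submits them together.
type commandBatch struct {
	names    []string
	encoders []func()
	consumed bool
}

// add queues an encode step and returns the batch for chaining.
func (b *commandBatch) add(name string, encode func()) *commandBatch {
	b.names = append(b.names, name)
	b.encoders = append(b.encoders, encode)
	return b
}

// count returns the number of queued operations.
func (b *commandBatch) count() int { return len(b.encoders) }

// submit runs every encode step, then performs a single queue submission.
func (b *commandBatch) submit(queueSubmit func()) {
	if b.consumed {
		return // assumption: a consumed batch ignores further submits
	}
	for _, encode := range b.encoders {
		encode()
	}
	queueSubmit()
	b.consumed = true
}

func main() {
	encoded, submitted := 0, 0
	batch := &commandBatch{}
	batch.
		add("matmul", func() { encoded++ }).
		add("add", func() { encoded++ }).
		add("relu", func() { encoded++ })
	batch.submit(func() { submitted++ })
	fmt.Println(batch.count(), encoded, submitted) // 3 3 1
}
```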

type GPUTape added in v0.6.0

type GPUTape struct {
	// contains filtered or unexported fields
}

GPUTape records GPU operations for backward pass. All operations and gradients stay on GPU for maximum performance.

func NewGPUTape added in v0.6.0

func NewGPUTape(b *Backend) *GPUTape

NewGPUTape creates a new gradient tape for GPU operations.

func (*GPUTape) Backward added in v0.6.0

func (tape *GPUTape) Backward(loss *GPUTensor) map[*GPUTensor]*GPUTensor

Backward computes gradients for all inputs by walking the tape in reverse. All operations stay on GPU - no CPU transfers occur.

Algorithm:

  1. Start with loss gradient (typically ones for scalar loss)
  2. Walk operations in reverse order
  3. For each operation, compute input gradients using chain rule
  4. Accumulate gradients when the same tensor is used multiple times

Returns a map from GPUTensor to its accumulated gradient (also GPUTensor).
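
The four steps above can be sketched with scalar values on the CPU. The types node, op and tape are hypothetical and much simpler than GPUTape (no GPU buffers, no enable/disable state); the sketch only shows the reverse walk and the accumulation when a tensor is used more than once.

```go
package main

import "fmt"

// node is a scalar stand-in for a tensor.
type node struct{ value float64 }

// op stores what is needed to propagate a gradient from output to inputs.
type op struct {
	inputs   []*node
	output   *node
	backward func(gradOut float64) []float64
}

// tape is a hypothetical scalar model of a gradient tape.
type tape struct{ ops []op }

func (t *tape) record(inputs []*node, output *node, backward func(float64) []float64) {
	t.ops = append(t.ops, op{inputs, output, backward})
}

func (t *tape) mul(a, b *node) *node {
	out := &node{a.value * b.value}
	t.record([]*node{a, b}, out, func(g float64) []float64 {
		return []float64{g * b.value, g * a.value}
	})
	return out
}

func (t *tape) add(a, b *node) *node {
	out := &node{a.value + b.value}
	t.record([]*node{a, b}, out, func(g float64) []float64 {
		return []float64{g, g}
	})
	return out
}

// backwardPass seeds the loss gradient with 1, walks the ops in reverse and
// accumulates gradients per input.
func (t *tape) backwardPass(loss *node) map[*node]float64 {
	grads := map[*node]float64{loss: 1}
	for i := len(t.ops) - 1; i >= 0; i-- {
		o := t.ops[i]
		g, ok := grads[o.output]
		if !ok {
			continue // this op does not contribute to the loss
		}
		for j, gi := range o.backward(g) {
			grads[o.inputs[j]] += gi
		}
	}
	return grads
}

func main() {
	t := &tape{}
	x := &node{3}
	y := t.add(t.mul(x, x), x) // y = x*x + x
	grads := t.backwardPass(y)
	fmt.Println(y.value, grads[x]) // 12 7
}
```

Here x is used three times, so its gradient is the sum 3 + 3 + 1 = 7, which matches dy/dx = 2x + 1 at x = 3.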

func (*GPUTape) Clear added in v0.6.0

func (tape *GPUTape) Clear()

Clear resets the tape, removing all recorded operations. Recording state is preserved.

func (*GPUTape) Disable added in v0.6.0

func (tape *GPUTape) Disable()

Disable disables operation recording.

func (*GPUTape) Enable added in v0.6.0

func (tape *GPUTape) Enable()

Enable enables operation recording.

func (*GPUTape) IsEnabled added in v0.6.0

func (tape *GPUTape) IsEnabled() bool

IsEnabled returns true if the tape is currently recording operations.

func (*GPUTape) NumOps added in v0.6.0

func (tape *GPUTape) NumOps() int

NumOps returns the number of recorded operations.

func (*GPUTape) Record added in v0.6.0

func (tape *GPUTape) Record(name string, inputs []*GPUTensor, output *GPUTensor, backward func(*GPUTensor) []*GPUTensor)

Record records an operation for backward pass. The backward function should compute gradients for all inputs given the output gradient.

type GPUTensor added in v0.6.0

type GPUTensor struct {
	// contains filtered or unexported fields
}

GPUTensor holds tensor data in GPU memory without transferring to CPU. This enables efficient GPU-to-GPU operations without the overhead of readBuffer() calls.

func (*GPUTensor) Backward added in v0.6.0

func (t *GPUTensor) Backward()

Backward computes gradients for this tensor. This is a convenience method that creates a gradient of ones and calls tape.Backward().

func (*GPUTensor) Buffer added in v0.6.0

func (t *GPUTensor) Buffer() *wgpu.Buffer

Buffer returns the underlying GPU buffer. This is exposed for internal backend operations.

func (*GPUTensor) ByteSize added in v0.6.0

func (t *GPUTensor) ByteSize() uint64

ByteSize returns the total memory size in bytes.

func (*GPUTensor) DType added in v0.6.0

func (t *GPUTensor) DType() tensor.DataType

DType returns the tensor's data type.

func (*GPUTensor) Eval added in v0.6.0

func (t *GPUTensor) Eval() *GPUTensor

Eval forces computation of lazy tensor using batched submission. Collects all dependencies and submits them in a single GPU command buffer. This reduces GPU overhead compared to submitting each operation separately.

func (*GPUTensor) Grad added in v0.6.0

func (t *GPUTensor) Grad() *GPUTensor

Grad returns the accumulated gradient for this tensor. Returns nil if no gradient has been computed.

func (*GPUTensor) Item added in v0.6.0

func (t *GPUTensor) Item() float32

Item returns the single scalar value from a tensor. This is useful for extracting loss values during training. Panics if tensor has more than one element.

func (*GPUTensor) NumElements added in v0.6.0

func (t *GPUTensor) NumElements() int

NumElements returns the total number of elements in the tensor.

func (*GPUTensor) Release added in v0.6.0

func (t *GPUTensor) Release()

Release releases the GPU buffer and frees memory. This should be called when the tensor is no longer needed.

func (*GPUTensor) RequiresGrad added in v0.6.0

func (t *GPUTensor) RequiresGrad() bool

RequiresGrad returns whether this tensor requires gradient computation.

func (*GPUTensor) SetRequiresGrad added in v0.6.0

func (t *GPUTensor) SetRequiresGrad(requires bool) *GPUTensor

SetRequiresGrad sets whether this tensor requires gradient computation. Returns the tensor for method chaining. Note: PyTorch uses requires_grad_ (underscore suffix for in-place). In Go, we use SetRequiresGrad for clarity.

func (*GPUTensor) Shape added in v0.6.0

func (t *GPUTensor) Shape() tensor.Shape

Shape returns the tensor's shape.

func (*GPUTensor) ToCPU added in v0.6.0

func (t *GPUTensor) ToCPU() *tensor.RawTensor

ToCPU transfers tensor data from GPU to CPU memory. This is an expensive operation and should be used sparingly. Returns a new RawTensor with data copied from GPU.

func (*GPUTensor) ZeroGrad added in v0.6.0

func (t *GPUTensor) ZeroGrad()

ZeroGrad clears the accumulated gradient.

type MemoryStats

type MemoryStats struct {
	// Total bytes allocated since backend creation
	TotalAllocatedBytes uint64
	// Peak memory usage in bytes
	PeakMemoryBytes uint64
	// Number of currently active buffers
	ActiveBuffers int64
	// Buffer pool statistics
	PoolAllocated uint64
	PoolReleased  uint64
	PoolHits      uint64
	PoolMisses    uint64
	PooledBuffers int
}

MemoryStats represents GPU memory usage statistics.
