gpu

package
v0.0.8
Published: Jan 19, 2026 License: Apache-2.0 Imports: 6 Imported by: 0

README

Loom GPU Package

This package provides WebGPU acceleration for Loom neural network layers. It is designed to be a drop-in replacement for CPU layers where applicable, utilizing an optimized "Virtual Machine" architecture similar to Paragon v3.

Architecture

The GPU implementation uses a one-pipeline-per-layer approach with explicit resource binding. Each layer gets its own pipeline and its own dispatch, which avoids the complexity of generating a single monolithic shader for the entire network while still maintaining high performance.

Performance Notes

You may observe that for small configurations (e.g., shallow networks, small widths, Batch Size 1), the GPU implementation is slower than the CPU. This is expected behavior due to:

  1. Latency vs. Throughput: GPUs are throughput devices. For small payloads, the overhead of submitting commands and transferring data (PCIe latency) dominates. The CPU accesses memory directly (on the order of nanoseconds), whereas a GPU round trip takes roughly 2-5 ms.
  2. Setup Cost: Creating buffers and pipelines ("Mounting") takes significant time (hundreds of milliseconds, or seconds when shaders are compiled). The design assumes the network is long-lived, so this cost is paid once and amortized.

For larger batch sizes or deeper and wider networks, the GPU's parallel compute capability outpaces the CPU.
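The trade-off can be made concrete with a toy cost model. The sketch below is not a measurement of this package: the fixed overhead and both throughput figures are hypothetical numbers chosen only to show where a crossover appears once a fixed per-call cost is added to the GPU side.

```go
package main

import "fmt"

// costModel is a hypothetical, simplified timing model. None of these
// numbers come from measurements of this package.
type costModel struct {
	gpuOverheadMs float64 // fixed submit + transfer round trip per call
	gpuOpsPerMs   float64 // assumed GPU throughput
	cpuOpsPerMs   float64 // assumed CPU throughput
}

func (m costModel) cpuMs(ops float64) float64 { return ops / m.cpuOpsPerMs }

func (m costModel) gpuMs(ops float64) float64 { return m.gpuOverheadMs + ops/m.gpuOpsPerMs }

// breakEvenOps returns the work size at which both devices take equal time:
// ops/cpu = overhead + ops/gpu  =>  ops = overhead / (1/cpu - 1/gpu).
func (m costModel) breakEvenOps() float64 {
	return m.gpuOverheadMs / (1/m.cpuOpsPerMs - 1/m.gpuOpsPerMs)
}

func main() {
	m := costModel{gpuOverheadMs: 3, gpuOpsPerMs: 1e7, cpuOpsPerMs: 1e6}
	for _, ops := range []float64{1e4, 1e6, 1e8} {
		fmt.Printf("ops=%.0e cpu=%.3fms gpu=%.3fms\n", ops, m.cpuMs(ops), m.gpuMs(ops))
	}
	fmt.Printf("break-even at about %.0f ops\n", m.breakEvenOps())
}
```

Under these assumed numbers the CPU wins below roughly 3.3 million operations per call and the GPU wins above it. Real break-even points depend on the hardware and should be measured.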

Usage

import "github.com/openfluke/loom/gpu"

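// Create specs
specs := []gpu.DenseLayerSpec{...}

// Create sequence
seq := gpu.NewDenseSequence(specs)

// Build must be called before Forward
if err := seq.Build(); err != nil {
	return err
}

// Execute
out, err := seq.Forward(input)
if err != nil {
	return err
}

// Cleanup (releases GPU resources once the sequence is no longer needed)
seq.Cleanup()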

Pipelining

For streaming applications, use ForwardPipelined. It submits command buffers incrementally, which can let the CPU prepare the next batch while the GPU is still processing the current layers (overlap).
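The overlap idea can be illustrated at the application level with a producer goroutine. This is a sketch of the general pattern only, not of what ForwardPipelined does internally: `prepare` and `process` are hypothetical stand-ins for CPU-side batch preparation and a GPU forward pass.

```go
package main

import "fmt"

// prepare stands in for CPU-side batch preparation (hypothetical).
func prepare(i int) []float32 { return []float32{float32(i), float32(i) + 1} }

// process stands in for a GPU forward pass (hypothetical): it sums the batch.
func process(batch []float32) float32 {
	var s float32
	for _, v := range batch {
		s += v
	}
	return s
}

// runOverlapped prepares batch i+1 in a producer goroutine while the
// consumer is still processing batch i. The buffered channel holds at most
// one prepared batch ahead of the consumer.
func runOverlapped(n int) []float32 {
	batches := make(chan []float32, 1)
	go func() {
		defer close(batches)
		for i := 0; i < n; i++ {
			batches <- prepare(i)
		}
	}()
	results := make([]float32, 0, n)
	for b := range batches {
		results = append(results, process(b))
	}
	return results
}

func main() {
	fmt.Println(runOverlapped(3)) // [1 3 5]
}
```

Results arrive in submission order because a single consumer drains the channel.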

Documentation

Constants

const (
	ActNone      = 0
	ActReLU      = 1
	ActLeakyReLU = 2
	ActSigmoid   = 3
	ActTanh      = 4
)

Activation constants, matching ActCodeOf in a generic sense.
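As a CPU-side reference for what each code means, the following sketch maps the constants to their usual mathematical definitions. The constants are copied locally so the example is self-contained. The LeakyReLU negative slope of 0.01 is an assumption, since the slope this package uses is not documented in this listing.

```go
package main

import (
	"fmt"
	"math"
)

// Local copies of the documented activation codes.
const (
	ActNone      = 0
	ActReLU      = 1
	ActLeakyReLU = 2
	ActSigmoid   = 3
	ActTanh      = 4
)

// leakySlope is an assumption; the package's slope is not documented here.
const leakySlope = 0.01

// activate applies the activation selected by code to a single value.
func activate(code int, x float32) float32 {
	switch code {
	case ActReLU:
		if x < 0 {
			return 0
		}
		return x
	case ActLeakyReLU:
		if x < 0 {
			return leakySlope * x
		}
		return x
	case ActSigmoid:
		return float32(1 / (1 + math.Exp(-float64(x))))
	case ActTanh:
		return float32(math.Tanh(float64(x)))
	default: // ActNone and unknown codes pass the value through
		return x
	}
}

func main() {
	fmt.Println(activate(ActReLU, -2), activate(ActSigmoid, 0)) // 0 0.5
}
```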

Variables

var Debug bool = false

Debug enables verbose logging for GPU operations

Functions

func EnsureGPU

func EnsureGPU() error

EnsureGPU ensures the GPU context is initialized

func Log

func Log(format string, args ...interface{})

Log prints a debug message if Debug is true

func NewFloatBuffer

func NewFloatBuffer(data []float32, usage wgpu.BufferUsage) (*wgpu.Buffer, error)

NewFloatBuffer creates a buffer with the given float32 data

func ReadBuffer

func ReadBuffer(buffer *wgpu.Buffer, size int) ([]float32, error)

ReadBuffer safely reads an entire buffer
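GPU buffers hold raw bytes, so a []float32 has to be packed on upload and unpacked on readback. The sketch below shows that conversion on its own, assuming little-endian 4-byte floats. The helper names are hypothetical, and how NewFloatBuffer and ReadBuffer perform the conversion internally is not shown in this listing.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// floatsToBytes packs float32 values as little-endian IEEE 754 words.
func floatsToBytes(data []float32) []byte {
	out := make([]byte, 4*len(data))
	for i, v := range data {
		binary.LittleEndian.PutUint32(out[4*i:], math.Float32bits(v))
	}
	return out
}

// bytesToFloats is the inverse; trailing bytes that do not fill a whole
// 4-byte word are ignored.
func bytesToFloats(raw []byte) []float32 {
	out := make([]float32, len(raw)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(raw[4*i:]))
	}
	return out
}

func main() {
	raw := floatsToBytes([]float32{1, -2.5})
	fmt.Println(len(raw), bytesToFloats(raw)) // 8 [1 -2.5]
}
```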

func SetAdapterPreference

func SetAdapterPreference(name string)

SetAdapterPreference sets a substring to look for in adapter names

func SetDebug

func SetDebug(enabled bool)

SetDebug enables or disables verbose logging

Types

type Context

type Context struct {
	Instance *wgpu.Instance
	Adapter  *wgpu.Adapter
	Device   *wgpu.Device
	Queue    *wgpu.Queue
	// contains filtered or unexported fields
}

Context holds the single WebGPU context for the application

func GetContext

func GetContext() (*Context, error)

GetContext returns the singleton GPU context, initializing it if necessary
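A lazily initialized singleton of this kind is commonly written in Go with sync.Once. The sketch below shows that general pattern with a hypothetical `gpuContext` type. Whether GetContext is implemented this way is an assumption, since the package's initialization code is not part of this listing.

```go
package main

import (
	"fmt"
	"sync"
)

// gpuContext is a hypothetical stand-in for the package's Context type.
type gpuContext struct{ adapterName string }

var (
	once    sync.Once
	shared  *gpuContext
	initErr error // a real initializer would record adapter/device failures here
	inits   int   // counts how often initialization actually ran
)

// getContext initializes the shared context on first use and returns the
// same pointer on every later call.
func getContext() (*gpuContext, error) {
	once.Do(func() {
		inits++
		shared = &gpuContext{adapterName: "hypothetical-adapter"}
	})
	return shared, initErr
}

func main() {
	a, _ := getContext()
	b, _ := getContext()
	fmt.Println(a == b, inits) // true 1
}
```

One consequence of this pattern is that a failed initialization is not retried, because sync.Once runs its function only once.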

type Conv1DLayer

type Conv1DLayer struct {
	Spec Conv1DSpec

	BatchSize int // Number of samples per batch

	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer
	WeightBuffer  *wgpu.Buffer
	BiasBuffer    *wgpu.Buffer

	InputGradientBuffer  *wgpu.Buffer
	WeightGradientBuffer *wgpu.Buffer
	BiasGradientBuffer   *wgpu.Buffer

	// Gradient application bind groups (cached for training)
	GradientWeightBindGroup *wgpu.BindGroup
	GradientBiasBindGroup   *wgpu.BindGroup
	// contains filtered or unexported fields
}

Conv1DLayer holds GPU resources for 1D Convolution

func (*Conv1DLayer) AllocateBackwardBuffers

func (l *Conv1DLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*Conv1DLayer) AllocateBuffers

func (l *Conv1DLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*Conv1DLayer) Cleanup

func (l *Conv1DLayer) Cleanup()

func (*Conv1DLayer) Compile

func (l *Conv1DLayer) Compile(ctx *Context, labelPrefix string) error

func (*Conv1DLayer) CompileBackward

func (l *Conv1DLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*Conv1DLayer) CreateBackwardBindGroup

func (l *Conv1DLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*Conv1DLayer) CreateBindGroup

func (l *Conv1DLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*Conv1DLayer) Dispatch

func (l *Conv1DLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*Conv1DLayer) DispatchBackward

func (l *Conv1DLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*Conv1DLayer) DownloadGradients

func (l *Conv1DLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*Conv1DLayer) DownloadWeights

func (l *Conv1DLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*Conv1DLayer) GenerateBackwardGradsShader

func (l *Conv1DLayer) GenerateBackwardGradsShader() string

func (*Conv1DLayer) GenerateBackwardShader

func (l *Conv1DLayer) GenerateBackwardShader() string

func (*Conv1DLayer) GenerateShader

func (l *Conv1DLayer) GenerateShader() string

func (*Conv1DLayer) GetInputBuffer

func (l *Conv1DLayer) GetInputBuffer() *wgpu.Buffer

func (*Conv1DLayer) GetInputGradientBuffer

func (l *Conv1DLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*Conv1DLayer) GetOutputBuffer

func (l *Conv1DLayer) GetOutputBuffer() *wgpu.Buffer

func (*Conv1DLayer) GetStagingBuffer

func (l *Conv1DLayer) GetStagingBuffer() *wgpu.Buffer

func (*Conv1DLayer) UploadWeights

func (l *Conv1DLayer) UploadWeights(ctx *Context)

func (*Conv1DLayer) ZeroGradients

func (l *Conv1DLayer) ZeroGradients(ctx *Context)

ZeroGradients for Conv1DLayer

type Conv1DSpec

type Conv1DSpec struct {
	InChannels  int       // Input channels
	OutChannels int       // Output channels (filters)
	KernelSize  int       // Kernel/filter size
	Stride      int       // Stride (default 1)
	Padding     int       // Padding (default 0)
	SeqLen      int       // Input sequence length
	Weights     []float32 // [OutChannels * InChannels * KernelSize]
	Bias        []float32 // [OutChannels]
	Activation  string    // "relu", "sigmoid", etc.
}

Conv1DSpec defines configuration for 1D Convolution layer
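For checking GPU results, a CPU reference of a 1D convolution is handy. The sketch below uses the standard output-length formula and assumes an input layout of [InChannels][SeqLen], zero padding, and a Stride of at least 1. The weight indexing follows the [OutChannels * InChannels * KernelSize] comment above, read as row-major, which is also an assumption. Activation is omitted.

```go
package main

import "fmt"

// conv1DOutLen is the standard convolution output-length formula.
func conv1DOutLen(seqLen, kernel, stride, padding int) int {
	return (seqLen+2*padding-kernel)/stride + 1
}

// conv1D is a CPU reference. Assumed layouts: in is [inCh][seqLen],
// w is [outCh][inCh][k], and the result is [outCh][outLen].
func conv1D(in, w, bias []float32, inCh, outCh, k, stride, pad, seqLen int) []float32 {
	outLen := conv1DOutLen(seqLen, k, stride, pad)
	out := make([]float32, outCh*outLen)
	for oc := 0; oc < outCh; oc++ {
		for t := 0; t < outLen; t++ {
			sum := bias[oc]
			for ic := 0; ic < inCh; ic++ {
				for j := 0; j < k; j++ {
					pos := t*stride + j - pad
					if pos < 0 || pos >= seqLen {
						continue // zero padding contributes nothing
					}
					sum += in[ic*seqLen+pos] * w[(oc*inCh+ic)*k+j]
				}
			}
			out[oc*outLen+t] = sum
		}
	}
	return out
}

func main() {
	in := []float32{1, 2, 3, 4}
	w := []float32{1, 1}
	fmt.Println(conv1D(in, w, []float32{0}, 1, 1, 2, 1, 0, 4)) // [3 5 7]
}
```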

type Conv2DLayer

type Conv2DLayer struct {
	Spec Conv2DSpec

	BatchSize int // Number of samples per batch

	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer
	WeightBuffer  *wgpu.Buffer
	BiasBuffer    *wgpu.Buffer

	InputGradientBuffer  *wgpu.Buffer
	WeightGradientBuffer *wgpu.Buffer
	BiasGradientBuffer   *wgpu.Buffer
	// contains filtered or unexported fields
}

Conv2DLayer holds GPU resources for 2D Convolution

func (*Conv2DLayer) AllocateBackwardBuffers

func (l *Conv2DLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*Conv2DLayer) AllocateBuffers

func (l *Conv2DLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*Conv2DLayer) Cleanup

func (l *Conv2DLayer) Cleanup()

func (*Conv2DLayer) Compile

func (l *Conv2DLayer) Compile(ctx *Context, labelPrefix string) error

func (*Conv2DLayer) CompileBackward

func (l *Conv2DLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*Conv2DLayer) CreateBackwardBindGroup

func (l *Conv2DLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*Conv2DLayer) CreateBindGroup

func (l *Conv2DLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*Conv2DLayer) Dispatch

func (l *Conv2DLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*Conv2DLayer) DispatchBackward

func (l *Conv2DLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*Conv2DLayer) DownloadGradients

func (l *Conv2DLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*Conv2DLayer) DownloadWeights

func (l *Conv2DLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*Conv2DLayer) GenerateBackwardGradsShader

func (l *Conv2DLayer) GenerateBackwardGradsShader() string

func (*Conv2DLayer) GenerateBackwardShader

func (l *Conv2DLayer) GenerateBackwardShader() string

func (*Conv2DLayer) GenerateShader

func (l *Conv2DLayer) GenerateShader() string

func (*Conv2DLayer) GetInputBuffer

func (l *Conv2DLayer) GetInputBuffer() *wgpu.Buffer

func (*Conv2DLayer) GetInputGradientBuffer

func (l *Conv2DLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*Conv2DLayer) GetOutputBuffer

func (l *Conv2DLayer) GetOutputBuffer() *wgpu.Buffer

func (*Conv2DLayer) GetStagingBuffer

func (l *Conv2DLayer) GetStagingBuffer() *wgpu.Buffer

func (*Conv2DLayer) UploadWeights

func (l *Conv2DLayer) UploadWeights(ctx *Context)

func (*Conv2DLayer) ZeroGradients

func (l *Conv2DLayer) ZeroGradients(ctx *Context)

ZeroGradients for Conv2DLayer (Empty as it overwrites d_input, no weight grads yet)

type Conv2DSpec

type Conv2DSpec struct {
	InChannels  int       // Input channels
	OutChannels int       // Output channels (filters)
	KernelSize  int       // Kernel size (squared)
	Stride      int       // Stride (default 1)
	Padding     int       // Padding (default 0)
	InputHeight int       // Input height
	InputWidth  int       // Input width
	Weights     []float32 // [OutChannels * InChannels * KernelSize * KernelSize]
	Bias        []float32 // [OutChannels]
	Activation  string    // "relu", "sigmoid", etc.
}

Conv2DSpec defines configuration for 2D Convolution layer
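Before filling in a Conv2DSpec it helps to know the output shape and how long Weights must be. The sketch below applies the standard convolution size formula to each spatial dimension and the weight length from the field comment above. The function names are hypothetical and a Stride of at least 1 is assumed.

```go
package main

import "fmt"

// convOut is the standard output-size formula for one spatial dimension.
func convOut(size, kernel, stride, padding int) int {
	return (size+2*padding-kernel)/stride + 1
}

// conv2DOutShape returns the output height and width for a square kernel.
func conv2DOutShape(h, w, kernel, stride, padding int) (outH, outW int) {
	return convOut(h, kernel, stride, padding), convOut(w, kernel, stride, padding)
}

// conv2DWeightCount is OutChannels * InChannels * KernelSize * KernelSize.
func conv2DWeightCount(inCh, outCh, kernel int) int {
	return outCh * inCh * kernel * kernel
}

func main() {
	h, w := conv2DOutShape(28, 28, 3, 1, 1)
	fmt.Println(h, w, conv2DWeightCount(3, 16, 3)) // 28 28 432
}
```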

type DenseLayer

type DenseLayer struct {
	Spec      DenseLayerSpec
	BatchSize int

	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer // Only needed for last layer usually? Or debug.
	WeightBuffer  *wgpu.Buffer
	BiasBuffer    *wgpu.Buffer

	// Backward Buffers
	WeightGradientBuffer *wgpu.Buffer
	BiasGradientBuffer   *wgpu.Buffer
	InputGradientBuffer  *wgpu.Buffer // Computed gradient w.r.t input (dL/dInput)

	// Gradient application bind groups (cached for training)
	GradientWeightBindGroup *wgpu.BindGroup
	GradientBiasBindGroup   *wgpu.BindGroup

	WorkgroupsX uint32
	// contains filtered or unexported fields
}

DenseLayer holds resources for a single layer execution

func (*DenseLayer) AllocateBackwardBuffers

func (l *DenseLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*DenseLayer) AllocateBuffers

func (l *DenseLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*DenseLayer) Cleanup

func (l *DenseLayer) Cleanup()

func (*DenseLayer) Compile

func (l *DenseLayer) Compile(ctx *Context, labelPrefix string) error

func (*DenseLayer) CompileBackward

func (l *DenseLayer) CompileBackward(ctx *Context, labelPrefix string) error

CompileBackward creates pipelines for the backward pass with explicit layouts.

func (*DenseLayer) CreateBackwardBindGroup

func (l *DenseLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

CreateBackwardBindGroup creates bind groups for backward pass

func (*DenseLayer) CreateBindGroup

func (l *DenseLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*DenseLayer) Dispatch

func (l *DenseLayer) Dispatch(pass *wgpu.ComputePassEncoder)

Dispatch records the compute pass for this layer

func (*DenseLayer) DispatchBackward

func (l *DenseLayer) DispatchBackward(enc *wgpu.CommandEncoder)

DispatchBackward adds backward pass commands to encoder

func (*DenseLayer) DownloadGradients

func (l *DenseLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*DenseLayer) DownloadWeights

func (l *DenseLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*DenseLayer) GenerateShader

func (l *DenseLayer) GenerateShader() string

GenerateShader creates WGSL for a specific layer configuration.

func (*DenseLayer) GetDZBuffer

func (l *DenseLayer) GetDZBuffer() *wgpu.Buffer

func (*DenseLayer) GetInputBuffer

func (l *DenseLayer) GetInputBuffer() *wgpu.Buffer

GetInputBuffer returns the layer's input buffer. It is part of the GPULayer interface implementation.

func (*DenseLayer) GetInputGradientBuffer

func (l *DenseLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*DenseLayer) GetOutputBuffer

func (l *DenseLayer) GetOutputBuffer() *wgpu.Buffer

func (*DenseLayer) GetStagingBuffer

func (l *DenseLayer) GetStagingBuffer() *wgpu.Buffer

func (*DenseLayer) UploadWeights

func (l *DenseLayer) UploadWeights(ctx *Context)

func (*DenseLayer) ZeroGradients

func (l *DenseLayer) ZeroGradients(ctx *Context)

ZeroGradients for DenseLayer (Empty as backward overwrites d_weights and d_bias)

type DenseLayerSpec

type DenseLayerSpec struct {
	InputSize  int
	OutputSize int
	Activation int       // ActXXX constant
	Weights    []float32 // Flattened [OutputSize * InputSize]
	Biases     []float32 // [OutputSize]
}

DenseLayerSpec defines the configuration for a single dense layer
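A CPU reference for one dense layer is useful when comparing against GPU output. The sketch below reads the flattened [OutputSize * InputSize] weights as row-major, with one row per output neuron. That reading is an assumption based on the field comment. Only ReLU is shown, behind a boolean flag, to keep the example short.

```go
package main

import "fmt"

// denseForward computes out[o] = b[o] + sum_i w[o*inSize+i]*in[i],
// optionally followed by ReLU. The row-major layout is an assumption.
func denseForward(in, w, b []float32, inSize, outSize int, relu bool) []float32 {
	out := make([]float32, outSize)
	for o := 0; o < outSize; o++ {
		sum := b[o]
		for i := 0; i < inSize; i++ {
			sum += w[o*inSize+i] * in[i]
		}
		if relu && sum < 0 {
			sum = 0
		}
		out[o] = sum
	}
	return out
}

func main() {
	in := []float32{1, 2}
	w := []float32{1, 0, 0, 1, 1, 1} // 3 outputs x 2 inputs
	b := []float32{0, 0, -5}
	fmt.Println(denseForward(in, w, b, 2, 3, false)) // [1 2 -2]
	fmt.Println(denseForward(in, w, b, 2, 3, true))  // [1 2 0]
}
```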

type DenseSequence

type DenseSequence struct {
	Layers []*DenseLayer
	Debug  bool
}

DenseSequence manages a sequence of dense layers executed on GPU

func NewDenseSequence

func NewDenseSequence(specs []DenseLayerSpec) *DenseSequence

NewDenseSequence creates a new sequence handler

func (*DenseSequence) Build

func (s *DenseSequence) Build() error

Build initializes all GPU resources for all layers

func (*DenseSequence) Cleanup

func (s *DenseSequence) Cleanup()

Cleanup releases resources

func (*DenseSequence) Forward

func (s *DenseSequence) Forward(input []float32) ([]float32, error)

Forward executes the sequence on GPU

func (*DenseSequence) ForwardPipelined

func (s *DenseSequence) ForwardPipelined(input []float32) ([]float32, error)

ForwardPipelined executes the sequence using multiple command encoders

type EmbeddingLayer

type EmbeddingLayer struct {
	Spec EmbeddingSpec

	TokenBuffer   *wgpu.Buffer // Input token IDs (u32)
	OutputBuffer  *wgpu.Buffer // Output embeddings
	WeightBuffer  *wgpu.Buffer // Embedding weights
	StagingBuffer *wgpu.Buffer

	// Backward
	WeightGradientBuffer *wgpu.Buffer
	// contains filtered or unexported fields
}

EmbeddingLayer holds GPU resources for Embedding lookup

func (*EmbeddingLayer) AllocateBackwardBuffers

func (l *EmbeddingLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*EmbeddingLayer) AllocateBuffers

func (l *EmbeddingLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*EmbeddingLayer) Cleanup

func (l *EmbeddingLayer) Cleanup()

func (*EmbeddingLayer) Compile

func (l *EmbeddingLayer) Compile(ctx *Context, labelPrefix string) error

func (*EmbeddingLayer) CompileBackward

func (l *EmbeddingLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*EmbeddingLayer) CreateBackwardBindGroup

func (l *EmbeddingLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*EmbeddingLayer) CreateBindGroup

func (l *EmbeddingLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*EmbeddingLayer) Dispatch

func (l *EmbeddingLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*EmbeddingLayer) DispatchBackward

func (l *EmbeddingLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*EmbeddingLayer) DownloadGradients

func (l *EmbeddingLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*EmbeddingLayer) DownloadWeights

func (l *EmbeddingLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*EmbeddingLayer) GenerateBackwardShader

func (l *EmbeddingLayer) GenerateBackwardShader() string

func (*EmbeddingLayer) GenerateShader

func (l *EmbeddingLayer) GenerateShader() string

func (*EmbeddingLayer) GetInputBuffer

func (l *EmbeddingLayer) GetInputBuffer() *wgpu.Buffer

GetInputBuffer returns the layer's input buffer. It is part of the GPULayer interface implementation.

func (*EmbeddingLayer) GetInputGradientBuffer

func (l *EmbeddingLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*EmbeddingLayer) GetOutputBuffer

func (l *EmbeddingLayer) GetOutputBuffer() *wgpu.Buffer

func (*EmbeddingLayer) GetStagingBuffer

func (l *EmbeddingLayer) GetStagingBuffer() *wgpu.Buffer

func (*EmbeddingLayer) UploadWeights

func (l *EmbeddingLayer) UploadWeights(ctx *Context)

func (*EmbeddingLayer) ZeroGradients

func (l *EmbeddingLayer) ZeroGradients(ctx *Context)

type EmbeddingSpec

type EmbeddingSpec struct {
	VocabSize    int       // Number of tokens in vocabulary
	EmbeddingDim int       // Dimension of each embedding vector
	SeqLength    int       // Sequence length (number of tokens to lookup)
	Weights      []float32 // [VocabSize * EmbeddingDim] - flattened embedding table
}

EmbeddingSpec defines configuration for Embedding layer
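An embedding lookup copies one row of the flattened table per token. The CPU sketch below reads the [VocabSize * EmbeddingDim] table as row-major (one row per token ID), which is an assumption based on the field comment. The bounds check is part of the sketch, not a documented behavior of the GPU layer.

```go
package main

import "fmt"

// embed returns the concatenated embedding rows for tokens, or an error if
// a token ID falls outside the vocabulary.
func embed(tokens []uint32, table []float32, vocab, dim int) ([]float32, error) {
	out := make([]float32, 0, len(tokens)*dim)
	for pos, tok := range tokens {
		if int(tok) >= vocab {
			return nil, fmt.Errorf("token %d at position %d is outside a vocabulary of size %d", tok, pos, vocab)
		}
		row := table[int(tok)*dim : (int(tok)+1)*dim]
		out = append(out, row...)
	}
	return out, nil
}

func main() {
	table := []float32{0, 1, 10, 11, 20, 21} // 3 tokens x 2 dims
	out, err := embed([]uint32{2, 0}, table, 3, 2)
	fmt.Println(out, err) // [20 21 0 1] <nil>
}
```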

type GPULayer

type GPULayer interface {
	// Initialization
	AllocateBuffers(ctx *Context, labelPrefix string) error
	AllocateBackwardBuffers(ctx *Context, labelPrefix string) error // Ensure this exists or make optional?
	Compile(ctx *Context, labelPrefix string) error
	CompileBackward(ctx *Context, labelPrefix string) error
	CreateBindGroup(ctx *Context, labelPrefix string) error
	CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

	// Execution
	Dispatch(pass *wgpu.ComputePassEncoder)
	DispatchBackward(enc *wgpu.CommandEncoder)

	// Data Transfer
	UploadWeights(ctx *Context)
	DownloadWeights(ctx *Context) ([]float32, []float32, error) // Might need to be generic or interface{}? Or just return raw buffers?
	// For verification, we usually need specific things.
	// Dense: W, B
	// LayerNorm: Gamma, Beta
	// Let's keep it specific or use specialized accessors.
	// Actually for DownloadGradients, we return (grad1, grad2, gradInput).
	// Let's use DownloadGradients() ([]float32, []float32, []float32, error)
	// assuming most layers have at most 2 learnable params + input grad.
	DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

	// Resource Access (for chaining)
	GetInputBuffer() *wgpu.Buffer
	GetOutputBuffer() *wgpu.Buffer
	GetStagingBuffer() *wgpu.Buffer
	GetInputGradientBuffer() *wgpu.Buffer // For Backward Chaining
	ZeroGradients(ctx *Context)

	Cleanup()
}

GPULayer is the common interface for all GPU-accelerated layers

type InPlaceResidual

type InPlaceResidual struct {
	Size int
	// contains filtered or unexported fields
}

InPlaceResidual performs out += skip
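On the CPU the same operation is a single loop. The sketch below is a reference for the arithmetic only. The length check is an addition of the sketch and says nothing about how the GPU kernel validates sizes.

```go
package main

import "fmt"

// addInPlace performs out[i] += skip[i] for every element.
func addInPlace(out, skip []float32) error {
	if len(out) != len(skip) {
		return fmt.Errorf("length mismatch: out=%d skip=%d", len(out), len(skip))
	}
	for i := range out {
		out[i] += skip[i]
	}
	return nil
}

func main() {
	out := []float32{1, 2, 3}
	if err := addInPlace(out, []float32{10, 20, 30}); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // [11 22 33]
}
```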

func NewInPlaceResidual

func NewInPlaceResidual(ctx *Context, size int) (*InPlaceResidual, error)

func (*InPlaceResidual) Cleanup

func (r *InPlaceResidual) Cleanup()

func (*InPlaceResidual) Dispatch

func (r *InPlaceResidual) Dispatch(ctx *Context, pass *wgpu.ComputePassEncoder, outBuf, skipBuf *wgpu.Buffer) (*wgpu.BindGroup, error)

func (*InPlaceResidual) GetBindGroup

func (r *InPlaceResidual) GetBindGroup(ctx *Context, outBuf, skipBuf *wgpu.Buffer) (*wgpu.BindGroup, error)

type LSTMLayer

type LSTMLayer struct {
	Spec LSTMSpec

	BatchSize int // Number of samples per batch

	InputBuffer   *wgpu.Buffer // [SeqLen * InputSize]
	OutputBuffer  *wgpu.Buffer // [SeqLen * HiddenSize]
	StagingBuffer *wgpu.Buffer
	HiddenBuffer  *wgpu.Buffer   // Hidden state [BatchSize * HiddenSize]
	CellBuffer    *wgpu.Buffer   // Cell state [BatchSize * SeqLen * HiddenSize]
	StepBuffers   []*wgpu.Buffer // Uniform buffers for each step

	// Unified Weight Buffer (concatenated [IH, HH, Bias])
	UnifiedWeightsBuffer         *wgpu.Buffer
	UnifiedWeightsGradientBuffer *wgpu.Buffer

	InputGradientBuffer *wgpu.Buffer

	// Gate Gradients Storage [SeqLen * 4 * HiddenSize * BatchSize]
	// Stores calculated delta (dL/dZ) for all 4 gates for all steps
	GateGradientsBuffer *wgpu.Buffer

	// Gradient application bind groups (cached for training)
	GradCombinedWeightsIHBindGroup *wgpu.BindGroup
	GradCombinedWeightsHHBindGroup *wgpu.BindGroup
	GradCombinedBiasesBindGroup    *wgpu.BindGroup
	// contains filtered or unexported fields
}

LSTMLayer holds GPU resources for LSTM. Note: LSTM is inherently sequential across time steps but parallel within each step. We process one time step per dispatch to avoid cross-workgroup synchronization issues.

func (*LSTMLayer) AllocateBackwardBuffers

func (l *LSTMLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*LSTMLayer) AllocateBuffers

func (l *LSTMLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*LSTMLayer) Cleanup

func (l *LSTMLayer) Cleanup()

func (*LSTMLayer) Compile

func (l *LSTMLayer) Compile(ctx *Context, labelPrefix string) error

func (*LSTMLayer) CompileBackward

func (l *LSTMLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*LSTMLayer) CreateBackwardBindGroup

func (l *LSTMLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*LSTMLayer) CreateBindGroup

func (l *LSTMLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*LSTMLayer) Dispatch

func (l *LSTMLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*LSTMLayer) DispatchBackward

func (l *LSTMLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*LSTMLayer) DownloadGradients

func (l *LSTMLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*LSTMLayer) DownloadWeights

func (l *LSTMLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*LSTMLayer) GenerateBackwardGateShader

func (l *LSTMLayer) GenerateBackwardGateShader() string

func (*LSTMLayer) GenerateBackwardGradsShader

func (l *LSTMLayer) GenerateBackwardGradsShader() string

func (*LSTMLayer) GenerateBackwardPrevShader

func (l *LSTMLayer) GenerateBackwardPrevShader() string

func (*LSTMLayer) GenerateShader

func (l *LSTMLayer) GenerateShader() string

func (*LSTMLayer) GetInputBuffer

func (l *LSTMLayer) GetInputBuffer() *wgpu.Buffer

func (*LSTMLayer) GetInputGradientBuffer

func (l *LSTMLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*LSTMLayer) GetOutputBuffer

func (l *LSTMLayer) GetOutputBuffer() *wgpu.Buffer

func (*LSTMLayer) GetStagingBuffer

func (l *LSTMLayer) GetStagingBuffer() *wgpu.Buffer

func (*LSTMLayer) UploadWeights

func (l *LSTMLayer) UploadWeights(ctx *Context)

func (*LSTMLayer) ZeroGradients

func (l *LSTMLayer) ZeroGradients(ctx *Context)

type LSTMSpec

type LSTMSpec struct {
	InputSize  int // Input feature size
	HiddenSize int // Hidden state size
	SeqLen     int // Sequence length

	// 4 gates: input (i), forget (f), cell (g), output (o)
	WeightIH_i []float32 // [HiddenSize * InputSize]
	WeightHH_i []float32 // [HiddenSize * HiddenSize]
	BiasH_i    []float32 // [HiddenSize]

	WeightIH_f []float32
	WeightHH_f []float32
	BiasH_f    []float32

	WeightIH_g []float32
	WeightHH_g []float32
	BiasH_g    []float32

	WeightIH_o []float32
	WeightHH_o []float32
	BiasH_o    []float32
}

LSTMSpec defines configuration for LSTM layer
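The per-gate fields map onto the standard LSTM cell equations. The sketch below is a single-sample CPU reference of one time step, assuming row-major weights ([HiddenSize * InputSize] and [HiddenSize * HiddenSize]) and the usual formulation: c' = sigmoid(f)*c + sigmoid(i)*tanh(g) and h' = sigmoid(o)*tanh(c'). The `gate` type and function names are hypothetical, and whether the GPU shaders use exactly this formulation is not confirmed by this listing.

```go
package main

import (
	"fmt"
	"math"
)

// gate groups one gate's parameters (hypothetical helper type).
type gate struct {
	wIH, wHH, b []float32 // [hidden*input], [hidden*hidden], [hidden]
}

func sigmoid(x float32) float32 { return float32(1 / (1 + math.Exp(-float64(x)))) }

func tanh32(x float32) float32 { return float32(math.Tanh(float64(x))) }

// preact computes W_ih*x + W_hh*h + b for one gate.
func (g gate) preact(x, h []float32) []float32 {
	hidden, input := len(h), len(x)
	z := make([]float32, hidden)
	for j := 0; j < hidden; j++ {
		s := g.b[j]
		for k := 0; k < input; k++ {
			s += g.wIH[j*input+k] * x[k]
		}
		for k := 0; k < hidden; k++ {
			s += g.wHH[j*hidden+k] * h[k]
		}
		z[j] = s
	}
	return z
}

// step advances one time step and returns the new hidden and cell state.
func step(gi, gf, gg, gOut gate, x, h, c []float32) (hNew, cNew []float32) {
	zi, zf, zg, zo := gi.preact(x, h), gf.preact(x, h), gg.preact(x, h), gOut.preact(x, h)
	hNew = make([]float32, len(h))
	cNew = make([]float32, len(c))
	for j := range h {
		cNew[j] = sigmoid(zf[j])*c[j] + sigmoid(zi[j])*tanh32(zg[j])
		hNew[j] = sigmoid(zo[j]) * tanh32(cNew[j])
	}
	return hNew, cNew
}

func main() {
	zero := gate{wIH: []float32{0}, wHH: []float32{0}, b: []float32{0}}
	h, c := []float32{0}, []float32{1}
	// The time loop is sequential: step t needs h and c from step t-1.
	for t, x := range [][]float32{{1}, {2}, {3}} {
		h, c = step(zero, zero, zero, zero, x, h, c)
		fmt.Printf("t=%d h=%.4f c=%.4f\n", t, h[0], c[0])
	}
}
```

Within a step, every hidden unit j is independent of the others, which is the part that parallelizes. The dependency on the previous step's h and c is what forces steps to run in order.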

type LayerNormLayer

type LayerNormLayer struct {
	Spec LayerNormSpec

	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer
	GammaBuffer   *wgpu.Buffer
	BetaBuffer    *wgpu.Buffer

	// Backward
	GammaGradientBuffer *wgpu.Buffer
	BetaGradientBuffer  *wgpu.Buffer
	InputGradientBuffer *wgpu.Buffer

	// Intermediate buffers for atomic-free reduction
	GammaBatchGradientBuffer *wgpu.Buffer
	BetaBatchGradientBuffer  *wgpu.Buffer

	WorkgroupsX uint32
	// contains filtered or unexported fields
}

func (*LayerNormLayer) AllocateBackwardBuffers

func (l *LayerNormLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*LayerNormLayer) AllocateBuffers

func (l *LayerNormLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*LayerNormLayer) Cleanup

func (l *LayerNormLayer) Cleanup()

func (*LayerNormLayer) Compile

func (l *LayerNormLayer) Compile(ctx *Context, labelPrefix string) error

func (*LayerNormLayer) CompileBackward

func (l *LayerNormLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*LayerNormLayer) CreateBackwardBindGroup

func (l *LayerNormLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*LayerNormLayer) CreateBindGroup

func (l *LayerNormLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*LayerNormLayer) Dispatch

func (l *LayerNormLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*LayerNormLayer) DispatchBackward

func (l *LayerNormLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*LayerNormLayer) DownloadGradients

func (l *LayerNormLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*LayerNormLayer) DownloadWeights

func (l *LayerNormLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*LayerNormLayer) GenerateBackwardShader

func (l *LayerNormLayer) GenerateBackwardShader() string

func (*LayerNormLayer) GenerateReduceShader

func (l *LayerNormLayer) GenerateReduceShader() string

func (*LayerNormLayer) GenerateShader

func (l *LayerNormLayer) GenerateShader() string

func (*LayerNormLayer) GetInputBuffer

func (l *LayerNormLayer) GetInputBuffer() *wgpu.Buffer

GetInputBuffer returns the layer's input buffer. It is part of the GPULayer interface implementation.

func (*LayerNormLayer) GetInputGradientBuffer

func (l *LayerNormLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*LayerNormLayer) GetOutputBuffer

func (l *LayerNormLayer) GetOutputBuffer() *wgpu.Buffer

func (*LayerNormLayer) GetStagingBuffer

func (l *LayerNormLayer) GetStagingBuffer() *wgpu.Buffer

func (*LayerNormLayer) UploadWeights

func (l *LayerNormLayer) UploadWeights(ctx *Context)

func (*LayerNormLayer) ZeroGradients

func (l *LayerNormLayer) ZeroGradients(ctx *Context)

ZeroGradients zeroes gradient buffers before backward pass (required for atomic accumulation)

type LayerNormSpec

type LayerNormSpec struct {
	NormSize  int
	BatchSize int // Number of vectors to normalize (default 1)
	Epsilon   float32
	Gamma     []float32 // [NormSize]
	Beta      []float32 // [NormSize]
}
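The fields correspond to the usual layer normalization formula, y = (x - mean) / sqrt(variance + Epsilon) * Gamma + Beta, applied independently to each vector of length NormSize. The CPU sketch below assumes the population variance and a contiguous layout of BatchSize vectors. Both are assumptions, since the shader is not part of this listing.

```go
package main

import (
	"fmt"
	"math"
)

// layerNorm normalizes each consecutive run of normSize values in x.
func layerNorm(x, gamma, beta []float32, normSize int, eps float32) []float32 {
	out := make([]float32, len(x))
	for start := 0; start+normSize <= len(x); start += normSize {
		row := x[start : start+normSize]
		var mean float64
		for _, v := range row {
			mean += float64(v)
		}
		mean /= float64(normSize)
		var variance float64
		for _, v := range row {
			d := float64(v) - mean
			variance += d * d
		}
		variance /= float64(normSize) // population variance (assumption)
		inv := 1 / math.Sqrt(variance+float64(eps))
		for i, v := range row {
			out[start+i] = float32((float64(v)-mean)*inv)*gamma[i] + beta[i]
		}
	}
	return out
}

func main() {
	x := []float32{1, 3, 10, 30} // two vectors of NormSize 2
	fmt.Println(layerNorm(x, []float32{1, 1}, []float32{0, 0}, 2, 0)) // [-1 1 -1 1]
}
```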

type MHALayer

type MHALayer struct {
	Spec MHASpec

	BatchSize int // Number of samples per batch

	// Buffers
	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer

	CombinedWeightsQKV *wgpu.Buffer
	CombinedBiasesQKV  *wgpu.Buffer
	OWeightBuffer      *wgpu.Buffer
	OBiasBuffer        *wgpu.Buffer

	QBuffer    *wgpu.Buffer // Projected queries
	KBuffer    *wgpu.Buffer // Projected keys
	VBuffer    *wgpu.Buffer // Projected values
	AttnBuffer *wgpu.Buffer // Attention output

	ParamsBuffer *wgpu.Buffer // Uniforms: [ActualSeqLen, ...]

	InputGradientBuffer *wgpu.Buffer
	// contains filtered or unexported fields
}

MHALayer holds GPU resources for Multi-Head Attention

func (*MHALayer) AllocateBackwardBuffers

func (l *MHALayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*MHALayer) AllocateBuffers

func (l *MHALayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*MHALayer) Cleanup

func (l *MHALayer) Cleanup()

func (*MHALayer) Compile

func (l *MHALayer) Compile(ctx *Context, labelPrefix string) error

func (*MHALayer) CompileBackward

func (l *MHALayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*MHALayer) CreateBackwardBindGroup

func (l *MHALayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*MHALayer) CreateBindGroup

func (l *MHALayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*MHALayer) Dispatch

func (l *MHALayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*MHALayer) DispatchBackward

func (l *MHALayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*MHALayer) DispatchFull

func (l *MHALayer) DispatchFull(enc *wgpu.CommandEncoder)

func (*MHALayer) DownloadGradients

func (l *MHALayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*MHALayer) DownloadWeights

func (l *MHALayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*MHALayer) GenerateAttnShader

func (l *MHALayer) GenerateAttnShader() string

func (*MHALayer) GenerateBackwardShader

func (l *MHALayer) GenerateBackwardShader() string

func (*MHALayer) GenerateOutShader

func (l *MHALayer) GenerateOutShader() string

func (*MHALayer) GenerateQKVShader

func (l *MHALayer) GenerateQKVShader() string

func (*MHALayer) GetInputBuffer

func (l *MHALayer) GetInputBuffer() *wgpu.Buffer

func (*MHALayer) GetInputGradientBuffer

func (l *MHALayer) GetInputGradientBuffer() *wgpu.Buffer

func (*MHALayer) GetOutputBuffer

func (l *MHALayer) GetOutputBuffer() *wgpu.Buffer

func (*MHALayer) GetStagingBuffer

func (l *MHALayer) GetStagingBuffer() *wgpu.Buffer

func (*MHALayer) SetActualSeqLen

func (l *MHALayer) SetActualSeqLen(ctx *Context, length int)

func (*MHALayer) UploadWeights

func (l *MHALayer) UploadWeights(ctx *Context)

func (*MHALayer) ZeroGradients

func (l *MHALayer) ZeroGradients(ctx *Context)

ZeroGradients for MHALayer (Empty as it overwrites)

type MHASpec

type MHASpec struct {
	DModel       int       // Model dimension (embedding size)
	NumHeads     int       // Number of attention heads
	NumKVHeads   int       // Number of key/value heads (for GQA)
	SeqLen       int       // Sequence length
	HeadDim      int       // Dimension per head (DModel / NumHeads)
	QWeights     []float32 // Query projection [DModel * DModel]
	KWeights     []float32 // Key projection [DModel * D_KV]
	VWeights     []float32 // Value projection [DModel * D_KV]
	OWeights     []float32 // Output projection [DModel * DModel]
	QBias        []float32 // [DModel]
	KBias        []float32 // [D_KV]
	VBias        []float32 // [D_KV]
	OBias        []float32 // [DModel]
	RoPEFreqBase float32   // Base frequency for RoPE (default 10000.0)
}

MHASpec defines configuration for Multi-Head Attention layer
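The expected slice lengths follow from a few derived sizes. The sketch below computes them, taking HeadDim = DModel / NumHeads from the field comment. D_KV is not defined in this listing, so D_KV = NumKVHeads * HeadDim is an assumption based on the common grouped-query attention convention, as is the requirement that NumHeads be a multiple of NumKVHeads.

```go
package main

import (
	"errors"
	"fmt"
)

// mhaDims collects derived sizes (hypothetical helper type).
type mhaDims struct {
	headDim int
	dKV     int // assumption: NumKVHeads * HeadDim
	qLen    int // DModel * DModel
	kLen    int // DModel * D_KV
	vLen    int // DModel * D_KV
	oLen    int // DModel * DModel
}

// deriveDims validates the head configuration and returns the weight lengths.
func deriveDims(dModel, numHeads, numKVHeads int) (mhaDims, error) {
	if dModel <= 0 || numHeads <= 0 || numKVHeads <= 0 {
		return mhaDims{}, errors.New("sizes and head counts must be positive")
	}
	if dModel%numHeads != 0 {
		return mhaDims{}, errors.New("DModel must be divisible by NumHeads")
	}
	if numHeads%numKVHeads != 0 {
		return mhaDims{}, errors.New("NumHeads must be a multiple of NumKVHeads")
	}
	headDim := dModel / numHeads
	dKV := numKVHeads * headDim
	return mhaDims{
		headDim: headDim,
		dKV:     dKV,
		qLen:    dModel * dModel,
		kLen:    dModel * dKV,
		vLen:    dModel * dKV,
		oLen:    dModel * dModel,
	}, nil
}

func main() {
	d, err := deriveDims(512, 8, 2)
	fmt.Println(d.headDim, d.dKV, d.kLen, err) // 64 128 65536 <nil>
}
```

With NumKVHeads equal to NumHeads, D_KV equals DModel and all four projections have the same length, which is the plain multi-head case.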

type ParallelLayer

type ParallelLayer struct {
	CombineMode string // "concat", "add", "avg"
	Branches    []GPULayer

	// Resources
	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer

	// Backward Resources
	InputGradientBuffer *wgpu.Buffer

	BatchSize int
	// contains filtered or unexported fields
}

ParallelLayer orchestrates execution of multiple branches

func NewParallelLayer

func NewParallelLayer(branches []GPULayer, mode string, batchSize int) *ParallelLayer

NewParallelLayer creates a new parallel container
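The three CombineMode strings suggest how branch outputs are merged. The CPU sketch below shows one plausible reading: "concat" appends branch outputs in order, "add" sums them element-wise, and "avg" divides that sum by the number of branches. These semantics, and the requirement that "add" and "avg" need equal branch sizes, are assumptions drawn from the mode names rather than from the package source.

```go
package main

import "fmt"

// combine merges branch outputs according to mode.
func combine(mode string, branches [][]float32) ([]float32, error) {
	if len(branches) == 0 {
		return nil, fmt.Errorf("no branches to combine")
	}
	switch mode {
	case "concat":
		var out []float32
		for _, b := range branches {
			out = append(out, b...)
		}
		return out, nil
	case "add", "avg":
		out := make([]float32, len(branches[0]))
		for _, b := range branches {
			if len(b) != len(out) {
				return nil, fmt.Errorf("%s needs equal branch sizes, got %d and %d", mode, len(out), len(b))
			}
			for i, v := range b {
				out[i] += v
			}
		}
		if mode == "avg" {
			for i := range out {
				out[i] /= float32(len(branches))
			}
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unknown combine mode %q", mode)
	}
}

func main() {
	branches := [][]float32{{1, 2}, {3, 4}}
	for _, mode := range []string{"concat", "add", "avg"} {
		out, err := combine(mode, branches)
		fmt.Println(mode, out, err)
	}
}
```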

func (*ParallelLayer) AllocateBackwardBuffers

func (l *ParallelLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*ParallelLayer) AllocateBuffers

func (l *ParallelLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*ParallelLayer) Cleanup

func (l *ParallelLayer) Cleanup()

func (*ParallelLayer) Compile

func (l *ParallelLayer) Compile(ctx *Context, labelPrefix string) error

func (*ParallelLayer) CompileBackward

func (l *ParallelLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*ParallelLayer) CreateBackwardBindGroup

func (l *ParallelLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*ParallelLayer) CreateBindGroup

func (l *ParallelLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*ParallelLayer) Dispatch

func (l *ParallelLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*ParallelLayer) DispatchBackward

func (l *ParallelLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*ParallelLayer) DownloadGradients

func (l *ParallelLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*ParallelLayer) DownloadWeights

func (l *ParallelLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*ParallelLayer) GetInputBuffer

func (l *ParallelLayer) GetInputBuffer() *wgpu.Buffer

func (*ParallelLayer) GetInputGradientBuffer

func (l *ParallelLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*ParallelLayer) GetOutputBuffer

func (l *ParallelLayer) GetOutputBuffer() *wgpu.Buffer

func (*ParallelLayer) GetStagingBuffer

func (l *ParallelLayer) GetStagingBuffer() *wgpu.Buffer

func (*ParallelLayer) UploadWeights

func (l *ParallelLayer) UploadWeights(ctx *Context)

func (*ParallelLayer) ZeroGradients

func (l *ParallelLayer) ZeroGradients(ctx *Context)

type ParallelLayerExtras

type ParallelLayerExtras struct {
	// contains filtered or unexported fields
}

ParallelLayerExtras is a helper type that holds extra fields for ParallelLayer.

type RMSNormLayer

type RMSNormLayer struct {
	Spec RMSNormSpec

	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer
	GammaBuffer   *wgpu.Buffer

	// Backward
	GammaGradientBuffer *wgpu.Buffer
	InputGradientBuffer *wgpu.Buffer

	// NEW: Batch Gradients for atomic-free reduction
	GammaBatchGradientBuffer *wgpu.Buffer
	// contains filtered or unexported fields
}

RMSNormLayer holds GPU resources for RMS Normalization

func (*RMSNormLayer) AllocateBackwardBuffers

func (l *RMSNormLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*RMSNormLayer) AllocateBuffers

func (l *RMSNormLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*RMSNormLayer) Cleanup

func (l *RMSNormLayer) Cleanup()

func (*RMSNormLayer) Compile

func (l *RMSNormLayer) Compile(ctx *Context, labelPrefix string) error

func (*RMSNormLayer) CompileBackward

func (l *RMSNormLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*RMSNormLayer) CreateBackwardBindGroup

func (l *RMSNormLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*RMSNormLayer) CreateBindGroup

func (l *RMSNormLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*RMSNormLayer) Dispatch

func (l *RMSNormLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*RMSNormLayer) DispatchBackward

func (l *RMSNormLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*RMSNormLayer) DownloadGradients

func (l *RMSNormLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*RMSNormLayer) DownloadWeights

func (l *RMSNormLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*RMSNormLayer) GenerateBackwardShader

func (l *RMSNormLayer) GenerateBackwardShader() string

func (*RMSNormLayer) GenerateReduceShader

func (l *RMSNormLayer) GenerateReduceShader() string

func (*RMSNormLayer) GenerateShader

func (l *RMSNormLayer) GenerateShader() string

func (*RMSNormLayer) GetInputBuffer

func (l *RMSNormLayer) GetInputBuffer() *wgpu.Buffer

Interface implementations

func (*RMSNormLayer) GetInputGradientBuffer

func (l *RMSNormLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*RMSNormLayer) GetOutputBuffer

func (l *RMSNormLayer) GetOutputBuffer() *wgpu.Buffer

func (*RMSNormLayer) GetStagingBuffer

func (l *RMSNormLayer) GetStagingBuffer() *wgpu.Buffer

func (*RMSNormLayer) UploadWeights

func (l *RMSNormLayer) UploadWeights(ctx *Context)

func (*RMSNormLayer) ZeroGradients

func (l *RMSNormLayer) ZeroGradients(ctx *Context)

type RMSNormSpec

type RMSNormSpec struct {
	NormSize  int
	BatchSize int // Number of vectors to normalize (default 1)
	Epsilon   float32
	Gamma     []float32 // [NormSize] - scale parameters
}

RMSNormSpec defines the configuration for an RMS Normalization layer. RMSNorm is simpler than LayerNorm: it has only gamma, with no beta and no mean subtraction.
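As a rough illustration of what the spec fields mean, here is a minimal CPU sketch of RMS normalization. The function rmsNormCPU is hypothetical and not part of this package. Placing Epsilon inside the square root and storing the BatchSize vectors back to back are assumptions; the actual shader may differ.

```go
package main

import (
	"fmt"
	"math"
)

// rmsNormCPU is a hypothetical CPU reference, NOT part of the gpu package.
// It normalizes batchSize vectors of length normSize stored back to back:
//   y[i] = x[i] / sqrt(mean(x^2) + eps) * gamma[i]
// Assumption: eps is added inside the square root.
func rmsNormCPU(x, gamma []float32, normSize, batchSize int, eps float32) []float32 {
	out := make([]float32, len(x))
	for b := 0; b < batchSize; b++ {
		row := x[b*normSize : (b+1)*normSize]
		var sumSq float64
		for _, v := range row {
			sumSq += float64(v) * float64(v)
		}
		inv := float32(1 / math.Sqrt(sumSq/float64(normSize)+float64(eps)))
		for i, v := range row {
			out[b*normSize+i] = v * inv * gamma[i]
		}
	}
	return out
}

func main() {
	x := []float32{3, 4, 1, 1}
	gamma := []float32{1, 1}
	fmt.Println(rmsNormCPU(x, gamma, 2, 2, 1e-6))
}
```

With gamma set to all ones, each normalized vector has a mean square close to 1, which gives a quick sanity check.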

type RNNLayer

type RNNLayer struct {
	Spec RNNSpec

	BatchSize int // Number of samples per batch

	InputBuffer    *wgpu.Buffer // [SeqLen * InputSize]
	OutputBuffer   *wgpu.Buffer // [SeqLen * HiddenSize]
	StagingBuffer  *wgpu.Buffer
	HiddenBuffer   *wgpu.Buffer   // Current hidden state [HiddenSize]
	StepBuffers    []*wgpu.Buffer // Uniform buffers for each step
	WeightIHBuffer *wgpu.Buffer
	WeightHHBuffer *wgpu.Buffer
	BiasBuffer     *wgpu.Buffer

	InputGradientBuffer    *wgpu.Buffer
	WeightIHGradientBuffer *wgpu.Buffer
	WeightHHGradientBuffer *wgpu.Buffer
	BiasGradientBuffer     *wgpu.Buffer
	// contains filtered or unexported fields
}

RNNLayer holds GPU resources for an RNN. Note: an RNN is inherently sequential across time steps but parallel within each step. One time step is processed per dispatch to avoid cross-workgroup synchronization issues.

func (*RNNLayer) AllocateBackwardBuffers

func (l *RNNLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*RNNLayer) AllocateBuffers

func (l *RNNLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*RNNLayer) Cleanup

func (l *RNNLayer) Cleanup()

func (*RNNLayer) Compile

func (l *RNNLayer) Compile(ctx *Context, labelPrefix string) error

func (*RNNLayer) CompileBackward

func (l *RNNLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*RNNLayer) CreateBackwardBindGroup

func (l *RNNLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*RNNLayer) CreateBindGroup

func (l *RNNLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*RNNLayer) Dispatch

func (l *RNNLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*RNNLayer) DispatchBackward

func (l *RNNLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*RNNLayer) DownloadGradients

func (l *RNNLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*RNNLayer) DownloadWeights

func (l *RNNLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*RNNLayer) GenerateBackwardGradsShader

func (l *RNNLayer) GenerateBackwardGradsShader() string

func (*RNNLayer) GenerateBackwardShader

func (l *RNNLayer) GenerateBackwardShader() string

func (*RNNLayer) GenerateShader

func (l *RNNLayer) GenerateShader() string

func (*RNNLayer) GetInputBuffer

func (l *RNNLayer) GetInputBuffer() *wgpu.Buffer

func (*RNNLayer) GetInputGradientBuffer

func (l *RNNLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*RNNLayer) GetOutputBuffer

func (l *RNNLayer) GetOutputBuffer() *wgpu.Buffer

func (*RNNLayer) GetStagingBuffer

func (l *RNNLayer) GetStagingBuffer() *wgpu.Buffer

func (*RNNLayer) UploadWeights

func (l *RNNLayer) UploadWeights(ctx *Context)

func (*RNNLayer) ZeroGradients

func (l *RNNLayer) ZeroGradients(ctx *Context)

ZeroGradients is a no-op for RNNLayer because the backward pass overwrites its gradient buffers.

type RNNSpec

type RNNSpec struct {
	InputSize  int       // Input feature size
	HiddenSize int       // Hidden state size
	SeqLen     int       // Sequence length
	WeightIH   []float32 // Input-to-hidden weights [HiddenSize * InputSize]
	WeightHH   []float32 // Hidden-to-hidden weights [HiddenSize * HiddenSize]
	BiasH      []float32 // Hidden bias [HiddenSize]
}

RNNSpec defines configuration for RNN layer
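The buffer shapes in RNNSpec and the one-step-per-dispatch note on RNNLayer can be illustrated with a CPU sketch of the recurrence. The function rnnForwardCPU is hypothetical and not part of this package. The tanh activation, the zero initial hidden state, and the row-major weight layout (one row per hidden unit) are assumptions not confirmed by this page.

```go
package main

import (
	"fmt"
	"math"
)

// rnnForwardCPU is a hypothetical CPU reference, NOT part of the gpu package.
// Assumed recurrence: h_t = tanh(W_ih * x_t + W_hh * h_{t-1} + b).
// Assumed layout: wIH[j*inputSize+i], wHH[j*hiddenSize+k], zero initial state.
func rnnForwardCPU(x, wIH, wHH, bias []float32, inputSize, hiddenSize, seqLen int) []float32 {
	out := make([]float32, seqLen*hiddenSize)
	h := make([]float32, hiddenSize)
	// The outer loop is sequential: step t needs the hidden state of step t-1.
	for t := 0; t < seqLen; t++ {
		xt := x[t*inputSize : (t+1)*inputSize]
		next := make([]float32, hiddenSize)
		// The inner loop over hidden units is independent, so it can run in parallel.
		for j := 0; j < hiddenSize; j++ {
			sum := float64(bias[j])
			for i := 0; i < inputSize; i++ {
				sum += float64(wIH[j*inputSize+i]) * float64(xt[i])
			}
			for k := 0; k < hiddenSize; k++ {
				sum += float64(wHH[j*hiddenSize+k]) * float64(h[k])
			}
			next[j] = float32(math.Tanh(sum))
		}
		h = next
		copy(out[t*hiddenSize:], h)
	}
	return out
}

func main() {
	out := rnnForwardCPU([]float32{1, 0}, []float32{1}, []float32{1}, []float32{0}, 1, 1, 2)
	fmt.Println(out)
}
```

The outer loop over t corresponds to the per-step dispatches, while the loop over hidden units is the part a single dispatch can parallelize.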

type ResidualLayer

type ResidualLayer struct {
	Spec ResidualSpec

	InputBuffer   *wgpu.Buffer // Primary input
	SkipBuffer    *wgpu.Buffer // Residual/skip input
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer

	// Backward
	InputGradientBuffer *wgpu.Buffer
	SkipGradientBuffer  *wgpu.Buffer
	// contains filtered or unexported fields
}

ResidualLayer holds GPU resources for Residual addition

func (*ResidualLayer) AllocateBackwardBuffers

func (l *ResidualLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*ResidualLayer) AllocateBuffers

func (l *ResidualLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*ResidualLayer) Cleanup

func (l *ResidualLayer) Cleanup()

func (*ResidualLayer) Compile

func (l *ResidualLayer) Compile(ctx *Context, labelPrefix string) error

func (*ResidualLayer) CompileBackward

func (l *ResidualLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*ResidualLayer) CreateBackwardBindGroup

func (l *ResidualLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*ResidualLayer) CreateBindGroup

func (l *ResidualLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*ResidualLayer) Dispatch

func (l *ResidualLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*ResidualLayer) DispatchBackward

func (l *ResidualLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*ResidualLayer) DownloadGradients

func (l *ResidualLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*ResidualLayer) DownloadWeights

func (l *ResidualLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*ResidualLayer) GenerateBackwardShader

func (l *ResidualLayer) GenerateBackwardShader() string

func (*ResidualLayer) GenerateShader

func (l *ResidualLayer) GenerateShader() string

func (*ResidualLayer) GetInputBuffer

func (l *ResidualLayer) GetInputBuffer() *wgpu.Buffer

Interface implementations

func (*ResidualLayer) GetInputGradientBuffer

func (l *ResidualLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*ResidualLayer) GetOutputBuffer

func (l *ResidualLayer) GetOutputBuffer() *wgpu.Buffer

func (*ResidualLayer) GetSkipGradientBuffer

func (l *ResidualLayer) GetSkipGradientBuffer() *wgpu.Buffer

func (*ResidualLayer) GetStagingBuffer

func (l *ResidualLayer) GetStagingBuffer() *wgpu.Buffer

func (*ResidualLayer) UploadWeights

func (l *ResidualLayer) UploadWeights(ctx *Context)

func (*ResidualLayer) ZeroGradients

func (l *ResidualLayer) ZeroGradients(ctx *Context)

ZeroGradients is a no-op for ResidualLayer because the backward pass overwrites its gradient buffers.

type ResidualSpec

type ResidualSpec struct {
	Size int // Number of elements
}

ResidualSpec defines configuration for Residual (skip connection) layer
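A residual addition and its gradients are simple enough to sketch on the CPU. The functions below are hypothetical and not part of this package. They assume the layer computes an element-wise sum of the primary input and the skip input, in which case the output gradient flows unchanged to both.

```go
package main

import "fmt"

// residualForward is a hypothetical CPU reference, NOT part of the gpu package.
// Assumption: output[i] = input[i] + skip[i].
func residualForward(input, skip []float32) ([]float32, error) {
	if len(input) != len(skip) {
		return nil, fmt.Errorf("size mismatch: %d vs %d", len(input), len(skip))
	}
	out := make([]float32, len(input))
	for i := range input {
		out[i] = input[i] + skip[i]
	}
	return out, nil
}

// residualBackward returns the gradients for both inputs. Because the
// derivative of a sum with respect to each operand is 1, both gradients
// are copies of the output gradient.
func residualBackward(dOutput []float32) (dInput, dSkip []float32) {
	dInput = append([]float32(nil), dOutput...)
	dSkip = append([]float32(nil), dOutput...)
	return dInput, dSkip
}

func main() {
	out, err := residualForward([]float32{1, 2}, []float32{10, 20})
	fmt.Println(out, err)
	dIn, dSkip := residualBackward([]float32{0.5, -0.5})
	fmt.Println(dIn, dSkip)
}
```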

type SoftmaxLayer

type SoftmaxLayer struct {
	Spec SoftmaxSpec

	InputBuffer   *wgpu.Buffer
	OutputBuffer  *wgpu.Buffer
	StagingBuffer *wgpu.Buffer

	// Backward
	InputGradientBuffer *wgpu.Buffer
	// contains filtered or unexported fields
}

SoftmaxLayer holds GPU resources for Softmax

func (*SoftmaxLayer) AllocateBackwardBuffers

func (l *SoftmaxLayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*SoftmaxLayer) AllocateBuffers

func (l *SoftmaxLayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*SoftmaxLayer) Cleanup

func (l *SoftmaxLayer) Cleanup()

func (*SoftmaxLayer) Compile

func (l *SoftmaxLayer) Compile(ctx *Context, labelPrefix string) error

func (*SoftmaxLayer) CompileBackward

func (l *SoftmaxLayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*SoftmaxLayer) CreateBackwardBindGroup

func (l *SoftmaxLayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*SoftmaxLayer) CreateBindGroup

func (l *SoftmaxLayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*SoftmaxLayer) Dispatch

func (l *SoftmaxLayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*SoftmaxLayer) DispatchBackward

func (l *SoftmaxLayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*SoftmaxLayer) DownloadGradients

func (l *SoftmaxLayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*SoftmaxLayer) DownloadWeights

func (l *SoftmaxLayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*SoftmaxLayer) GenerateBackwardShader

func (l *SoftmaxLayer) GenerateBackwardShader() string

func (*SoftmaxLayer) GenerateShader

func (l *SoftmaxLayer) GenerateShader() string

func (*SoftmaxLayer) GetInputBuffer

func (l *SoftmaxLayer) GetInputBuffer() *wgpu.Buffer

Interface implementations

func (*SoftmaxLayer) GetInputGradientBuffer

func (l *SoftmaxLayer) GetInputGradientBuffer() *wgpu.Buffer

func (*SoftmaxLayer) GetOutputBuffer

func (l *SoftmaxLayer) GetOutputBuffer() *wgpu.Buffer

func (*SoftmaxLayer) GetStagingBuffer

func (l *SoftmaxLayer) GetStagingBuffer() *wgpu.Buffer

func (*SoftmaxLayer) UploadWeights

func (l *SoftmaxLayer) UploadWeights(ctx *Context)

func (*SoftmaxLayer) ZeroGradients

func (l *SoftmaxLayer) ZeroGradients(ctx *Context)

ZeroGradients is a no-op for SoftmaxLayer because the backward pass overwrites its gradient buffer.

type SoftmaxSpec

type SoftmaxSpec struct {
	Size        int     // Number of elements per softmax operation
	BatchSize   int     // Number of independent softmax operations
	Temperature float32 // Temperature scaling (default 1.0)
}

SoftmaxSpec defines configuration for Softmax layer
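To show how Size, BatchSize, and Temperature interact, here is a minimal CPU sketch of a batched, temperature-scaled softmax. The function softmaxCPU is hypothetical and not part of this package. Treating a zero Temperature as the default 1.0 and subtracting the row maximum for numerical stability are assumptions about reasonable behavior, not confirmed details of the shader.

```go
package main

import (
	"fmt"
	"math"
)

// softmaxCPU is a hypothetical CPU reference, NOT part of the gpu package.
// It runs batchSize independent softmax operations over rows of length size.
func softmaxCPU(x []float32, size, batchSize int, temperature float32) []float32 {
	if temperature == 0 {
		temperature = 1 // assumption: an unset temperature means the default 1.0
	}
	out := make([]float32, len(x))
	for b := 0; b < batchSize; b++ {
		row := x[b*size : (b+1)*size]
		// Subtract the row maximum so exp never sees a large positive argument.
		maxV := row[0]
		for _, v := range row[1:] {
			if v > maxV {
				maxV = v
			}
		}
		var sum float64
		for i, v := range row {
			e := math.Exp(float64((v - maxV) / temperature))
			out[b*size+i] = float32(e)
			sum += e
		}
		for i := range row {
			out[b*size+i] = float32(float64(out[b*size+i]) / sum)
		}
	}
	return out
}

func main() {
	// Two rows of two logits each.
	fmt.Println(softmaxCPU([]float32{0, 0, 1, 3}, 2, 2, 1))
}
```

A higher temperature flattens each row's distribution, and a lower one sharpens it.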

type SwiGLULayer

type SwiGLULayer struct {
	Spec SwiGLUSpec

	// Buffers
	InputBuffer        *wgpu.Buffer
	OutputBuffer       *wgpu.Buffer
	StagingBuffer      *wgpu.Buffer
	GateWeightBuffer   *wgpu.Buffer
	UpWeightBuffer     *wgpu.Buffer
	DownWeightBuffer   *wgpu.Buffer
	GateBiasBuffer     *wgpu.Buffer
	UpBiasBuffer       *wgpu.Buffer
	DownBiasBuffer     *wgpu.Buffer
	GateOutBuffer      *wgpu.Buffer // Intermediate: gate projection output
	UpOutBuffer        *wgpu.Buffer // Intermediate: up projection output
	IntermediateBuffer *wgpu.Buffer // After activation

	// Backward (simplified - just input gradient for now)
	InputGradientBuffer *wgpu.Buffer
	// contains filtered or unexported fields
}

SwiGLULayer holds GPU resources for SwiGLU

func (*SwiGLULayer) AllocateBackwardBuffers

func (l *SwiGLULayer) AllocateBackwardBuffers(ctx *Context, labelPrefix string) error

func (*SwiGLULayer) AllocateBuffers

func (l *SwiGLULayer) AllocateBuffers(ctx *Context, labelPrefix string) error

func (*SwiGLULayer) Cleanup

func (l *SwiGLULayer) Cleanup()

func (*SwiGLULayer) Compile

func (l *SwiGLULayer) Compile(ctx *Context, labelPrefix string) error

func (*SwiGLULayer) CompileBackward

func (l *SwiGLULayer) CompileBackward(ctx *Context, labelPrefix string) error

func (*SwiGLULayer) CreateBackwardBindGroup

func (l *SwiGLULayer) CreateBackwardBindGroup(ctx *Context, labelPrefix string, dOutputBuffer *wgpu.Buffer) error

func (*SwiGLULayer) CreateBindGroup

func (l *SwiGLULayer) CreateBindGroup(ctx *Context, labelPrefix string) error

func (*SwiGLULayer) Dispatch

func (l *SwiGLULayer) Dispatch(pass *wgpu.ComputePassEncoder)

func (*SwiGLULayer) DispatchBackward

func (l *SwiGLULayer) DispatchBackward(enc *wgpu.CommandEncoder)

func (*SwiGLULayer) DispatchFull

func (l *SwiGLULayer) DispatchFull(enc *wgpu.CommandEncoder)

func (*SwiGLULayer) DownloadGradients

func (l *SwiGLULayer) DownloadGradients(ctx *Context) ([]float32, []float32, []float32, error)

func (*SwiGLULayer) DownloadWeights

func (l *SwiGLULayer) DownloadWeights(ctx *Context) ([]float32, []float32, error)

func (*SwiGLULayer) GenerateActivateShader

func (l *SwiGLULayer) GenerateActivateShader() string

func (*SwiGLULayer) GenerateBackwardShader

func (l *SwiGLULayer) GenerateBackwardShader() string

func (*SwiGLULayer) GenerateDownShader

func (l *SwiGLULayer) GenerateDownShader() string

func (*SwiGLULayer) GenerateGateUpShader

func (l *SwiGLULayer) GenerateGateUpShader() string

func (*SwiGLULayer) GetInputBuffer

func (l *SwiGLULayer) GetInputBuffer() *wgpu.Buffer

func (*SwiGLULayer) GetInputGradientBuffer

func (l *SwiGLULayer) GetInputGradientBuffer() *wgpu.Buffer

func (*SwiGLULayer) GetOutputBuffer

func (l *SwiGLULayer) GetOutputBuffer() *wgpu.Buffer

func (*SwiGLULayer) GetStagingBuffer

func (l *SwiGLULayer) GetStagingBuffer() *wgpu.Buffer

func (*SwiGLULayer) UploadWeights

func (l *SwiGLULayer) UploadWeights(ctx *Context)

func (*SwiGLULayer) ZeroGradients

func (l *SwiGLULayer) ZeroGradients(ctx *Context)

ZeroGradients is a no-op for SwiGLULayer because the backward pass overwrites its gradients and is not yet fully implemented.

type SwiGLUSpec

type SwiGLUSpec struct {
	InputSize        int       // Hidden size (e.g., 768)
	IntermediateSize int       // Intermediate size (e.g., 3072)
	SeqLen           int       // Sequence length / batch size
	GateWeights      []float32 // [InputSize * IntermediateSize]
	UpWeights        []float32 // [InputSize * IntermediateSize]
	DownWeights      []float32 // [IntermediateSize * InputSize]
	GateBias         []float32 // [IntermediateSize]
	UpBias           []float32 // [IntermediateSize]
	DownBias         []float32 // [InputSize]
}

SwiGLUSpec defines the configuration for a SwiGLU gated activation layer. SwiGLU computes: down_proj(silu(gate_proj(x)) * up_proj(x))
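The formula down_proj(silu(gate_proj(x)) * up_proj(x)) can be traced step by step with a CPU sketch. The function swigluCPU is hypothetical and not part of this package. The weight layouts (gate and up indexed as [in*IntermediateSize + j], down indexed as [j*InputSize + in]) are assumptions drawn only from the shape comments in the spec and may not match the real buffers.

```go
package main

import (
	"fmt"
	"math"
)

// silu is the sigmoid-weighted linear unit: v * sigmoid(v).
func silu(v float64) float64 { return v / (1 + math.Exp(-v)) }

// swigluCPU is a hypothetical CPU reference, NOT part of the gpu package.
// Assumed layouts: gateW/upW[i*interSize+j], downW[j*inSize+i].
func swigluCPU(x, gateW, upW, downW, gateB, upB, downB []float32, inSize, interSize, seqLen int) []float32 {
	out := make([]float32, seqLen*inSize)
	for s := 0; s < seqLen; s++ {
		xs := x[s*inSize : (s+1)*inSize]
		// Steps 1 and 2: gate and up projections, then silu(gate) * up.
		act := make([]float64, interSize)
		for j := 0; j < interSize; j++ {
			gate, up := float64(gateB[j]), float64(upB[j])
			for i, v := range xs {
				gate += float64(v) * float64(gateW[i*interSize+j])
				up += float64(v) * float64(upW[i*interSize+j])
			}
			act[j] = silu(gate) * up
		}
		// Step 3: down projection back to the input size.
		for i := 0; i < inSize; i++ {
			sum := float64(downB[i])
			for j, a := range act {
				sum += a * float64(downW[j*inSize+i])
			}
			out[s*inSize+i] = float32(sum)
		}
	}
	return out
}

func main() {
	one := []float32{1}
	zero := []float32{0}
	fmt.Println(swigluCPU(one, one, []float32{2}, one, zero, zero, zero, 1, 1, 1))
}
```

The three stages in the sketch line up loosely with the GateOutBuffer, UpOutBuffer, and IntermediateBuffer fields on SwiGLULayer, though how the GPU implementation splits the work is not specified here.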
