Package ops (v0.1.1)

Published: Nov 17, 2025 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package ops defines operation interfaces and implementations for automatic differentiation.

Each operation implements the Operation interface, which provides:

  • Forward pass: computed by the backend
  • Backward pass: computes gradients for inputs given output gradient

Supported operations:

  • AddOp: element-wise addition (d(a+b)/da = 1, d(a+b)/db = 1)
  • SubOp: element-wise subtraction
  • MulOp: element-wise multiplication (d(a*b)/da = b, d(a*b)/db = a)
  • DivOp: element-wise division
  • MatMulOp: matrix multiplication (d(A@B)/dA = grad@B^T, d(A@B)/dB = A^T@grad)
  • ReLUOp: rectified linear unit activation (d(ReLU(x))/dx = 1 if x > 0, else 0)
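
The Operation interface described above is all a reverse-mode sweep needs. As a rough sketch (not code from this package): operations recorded in forward order can be replayed backwards, turning each output gradient into input gradients. The backprop name, the map-based gradient store, and the caller-supplied accumulate function are illustrative assumptions; import paths are omitted because they are not shown on this page.

// backprop is a hypothetical sketch of a reverse-mode sweep over operations
// recorded in forward (topological) order. accumulate is supplied by the
// caller (e.g. an element-wise add on the backend) and is an assumption here.
// Assumes recorded is non-empty and that its last entry produced the loss.
func backprop(
	recorded []ops.Operation,
	lossGrad *tensor.RawTensor,
	backend tensor.Backend,
	accumulate func(a, b *tensor.RawTensor) *tensor.RawTensor,
) map[*tensor.RawTensor]*tensor.RawTensor {
	grads := map[*tensor.RawTensor]*tensor.RawTensor{}
	// Seed the gradient of the final output (typically dLoss/dLoss = 1).
	grads[recorded[len(recorded)-1].Output()] = lossGrad

	// Walk the tape backwards: each op turns its output gradient into
	// gradients for its inputs.
	for i := len(recorded) - 1; i >= 0; i-- {
		op := recorded[i]
		outGrad, ok := grads[op.Output()]
		if !ok {
			continue // this output never influenced the loss
		}
		inputGrads := op.Backward(outGrad, backend)
		for j, in := range op.Inputs() {
			if prev, ok := grads[in]; ok {
				// Tensor feeds several operations: gradients must be summed.
				grads[in] = accumulate(prev, inputGrads[j])
			} else {
				grads[in] = inputGrads[j]
			}
		}
	}
	return grads
}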

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CrossEntropyForward

func CrossEntropyForward(logits, targets *tensor.RawTensor, device tensor.Device) *tensor.RawTensor

CrossEntropyForward computes cross-entropy loss (helper function).

This is a helper for use outside autodiff context. For autodiff support, use AutodiffBackend with CrossEntropyOp.

Parameters:

  • logits: [batch_size, num_classes]
  • targets: [batch_size] (class indices)

Returns:

  • Scalar loss tensor (mean over batch)

func Exp

func Exp(input *tensor.RawTensor, device tensor.Device) *tensor.RawTensor

Exp computes element-wise exponential (helper for softmax).

Forward: output = exp(input)

Backward: ∂L/∂input = ∂L/∂output * exp(input) = ∂L/∂output * output

Note: This is a helper function, not a full Operation. For autodiff support, use ExpOp (to be implemented if needed).

func Log

func Log(input *tensor.RawTensor, device tensor.Device) *tensor.RawTensor

Log computes element-wise natural logarithm (helper function).

Forward: output = log(input)

Note: This is a helper function for use outside autodiff. For autodiff support, use backend.Log() which records LogOp.

func Softmax

func Softmax(input *tensor.RawTensor, device tensor.Device) *tensor.RawTensor

Softmax computes softmax along last dimension (helper function).

This is a helper for use outside autodiff. For autodiff support, use backend.Softmax() which records SoftmaxOp.
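
As a hedged illustration of how these helpers compose outside autodiff (logSoftmaxHelper is a hypothetical name, not part of this package):

// logSoftmaxHelper composes the two helpers above for use outside autodiff.
// Caveat: log is only defined for positive values (see LogOp below), so
// softmax outputs that underflow to 0 are problematic; the fused
// LogSoftmaxOp/CrossEntropyOp use the log-sum-exp trick to avoid this.
func logSoftmaxHelper(logits *tensor.RawTensor, device tensor.Device) *tensor.RawTensor {
	probs := ops.Softmax(logits, device) // softmax along the last dimension
	return ops.Log(probs, device)        // element-wise natural log
}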

Types

type AddOp

type AddOp struct {
	// contains filtered or unexported fields
}

AddOp represents an element-wise addition operation: output = a + b.

Backward pass:

  • d(a+b)/da = 1, so grad_a = outputGrad
  • d(a+b)/db = 1, so grad_b = outputGrad

Note: If broadcasting was used in forward pass, gradients must be reduced (summed) along the broadcast dimensions to match input shapes.
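
For the common case of a bias of shape [n] broadcast against activations of shape [batch, n], that reduction looks like the following on plain slices (an illustration only, not this package's code; names and layout are assumptions):

// reduceBiasGrad sums a gradient of shape [batch, n] down to shape [n],
// matching a bias that was broadcast across the batch dimension in a + b.
// Plain []float32 in row-major order is used purely for illustration.
func reduceBiasGrad(outputGrad []float32, batch, n int) []float32 {
	biasGrad := make([]float32, n)
	for b := 0; b < batch; b++ {
		for j := 0; j < n; j++ {
			biasGrad[j] += outputGrad[b*n+j] // sum over the broadcast (batch) axis
		}
	}
	return biasGrad
}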

func NewAddOp

func NewAddOp(a, b, output *tensor.RawTensor) *AddOp

NewAddOp creates a new AddOp.

func (*AddOp) Backward

func (op *AddOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes input gradients for addition. Since d(a+b)/da = d(a+b)/db = 1, the gradient flows equally to both inputs.

func (*AddOp) Inputs

func (op *AddOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors [a, b].

func (*AddOp) Output

func (op *AddOp) Output() *tensor.RawTensor

Output returns the output tensor a + b.

type Conv2DOp

type Conv2DOp struct {
	// contains filtered or unexported fields
}

Conv2DOp records a 2D convolution operation for autodiff.

Forward: output = Conv2D(input, kernel, stride, padding)

Backward (gradients):

  • d_input: "transposed convolution" or "deconvolution" of d_output with kernel
  • d_kernel: convolution of input with d_output

References:

  • "A guide to convolution arithmetic for deep learning" (Dumoulin & Visin, 2016)
  • CS231n: Convolutional Neural Networks for Visual Recognition

func NewConv2DOp

func NewConv2DOp(input, kernel, output *tensor.RawTensor, stride, padding int) *Conv2DOp

NewConv2DOp creates a new Conv2D operation.

func (*Conv2DOp) Backward

func (op *Conv2DOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes gradients for Conv2D.

Given:

  • outputGrad: ∂L/∂output [N, C_out, H_out, W_out]

Compute:

  • inputGrad: ∂L/∂input [N, C_in, H, W]
  • kernelGrad: ∂L/∂kernel [C_out, C_in, K_h, K_w]

Gradient formulas:

  1. Input gradient (transposed convolution): ∂L/∂input = TransposedConv2D(∂L/∂output, kernel, stride, padding)

    This is essentially a "backward" convolution where we propagate the output gradients back to input positions using the same kernel.

  2. Kernel gradient (convolution): ∂L/∂kernel[c_out, c_in, kh, kw] = Σ_{n,h,w} input[n,c_in,h+kh,w+kw] * ∂L/∂output[n,c_out,h,w]

    This computes how much each kernel weight contributed to the loss by correlating input patches with output gradients.
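
A pure-Go sketch of formula 2 on flat NCHW slices, assuming stride 1 and no padding to keep the indexing readable (both are simplifying assumptions; function and parameter names are illustrative):

// conv2DKernelGrad implements formula 2 above on flat NCHW slices, assuming
// stride 1 and no padding:
//   dK[co,ci,kh,kw] = Σ_{n,h,w} input[n,ci,h+kh,w+kw] * dOut[n,co,h,w]
// input is [n, cIn, h, w]; outGrad is [n, cOut, hOut, wOut].
func conv2DKernelGrad(input, outGrad []float32,
	n, cIn, h, w, cOut, hOut, wOut, kH, kW int) []float32 {

	kernelGrad := make([]float32, cOut*cIn*kH*kW)
	for ni := 0; ni < n; ni++ {
		for co := 0; co < cOut; co++ {
			for ci := 0; ci < cIn; ci++ {
				for kh := 0; kh < kH; kh++ {
					for kw := 0; kw < kW; kw++ {
						var sum float32
						for oh := 0; oh < hOut; oh++ {
							for ow := 0; ow < wOut; ow++ {
								in := input[((ni*cIn+ci)*h+oh+kh)*w+ow+kw]
								g := outGrad[((ni*cOut+co)*hOut+oh)*wOut+ow]
								sum += in * g // correlate input patch with output gradient
							}
						}
						kernelGrad[((co*cIn+ci)*kH+kh)*kW+kw] += sum
					}
				}
			}
		}
	}
	return kernelGrad
}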

func (*Conv2DOp) Inputs

func (op *Conv2DOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*Conv2DOp) Output

func (op *Conv2DOp) Output() *tensor.RawTensor

Output returns the output tensor.

type CrossEntropyOp

type CrossEntropyOp struct {
	// contains filtered or unexported fields
}

CrossEntropyOp represents the cross-entropy loss operation.

Forward:

Loss = mean(-log_softmax(logits)[targets])

Where log_softmax uses the log-sum-exp trick for numerical stability:

log_softmax(z) = z - (max(z) + log(Σ exp(z - max(z))))

Backward:

∂L/∂logits = (softmax(logits) - y_one_hot) / batch_size

This elegant gradient formula is the key reason why softmax + cross-entropy are often fused together in modern frameworks (PyTorch, TensorFlow, Burn).

Assumptions:

  • Logits shape: [batch_size, num_classes] (2D)
  • Targets shape: [batch_size] (1D, class indices)
  • Output: scalar loss (mean over batch)
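
For reference, the log-sum-exp formulation above looks like this for a single row of logits, on a plain []float32 (illustrative only; uses the standard library math package):

// logSoftmaxRow computes z_i - (max(z) + log(Σ_j exp(z_j - max(z)))) for one
// row of logits, mirroring the log-sum-exp trick described above.
// Assumes len(z) > 0.
func logSoftmaxRow(z []float32) []float32 {
	maxZ := z[0]
	for _, v := range z {
		if v > maxZ {
			maxZ = v
		}
	}
	var sumExp float64
	for _, v := range z {
		sumExp += math.Exp(float64(v - maxZ)) // shifted exponentials cannot overflow
	}
	lse := maxZ + float32(math.Log(sumExp))
	out := make([]float32, len(z))
	for i, v := range z {
		out[i] = v - lse
	}
	return out
}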

func NewCrossEntropyOp

func NewCrossEntropyOp(logits, targets, output *tensor.RawTensor) *CrossEntropyOp

NewCrossEntropyOp creates a new cross-entropy operation.

func (*CrossEntropyOp) Backward

func (op *CrossEntropyOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor

Backward computes the gradient with respect to logits.

Gradient formula:

∂L/∂logits[b,i] = (softmax(logits[b])[i] - y_one_hot[b,i]) / batch_size

Where y_one_hot[b,i] = 1 if i == targets[b], else 0.

Note: The gradient is averaged over the batch size because the forward pass computes mean loss.
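
On flat slices, the formula corresponds to the following sketch, given precomputed softmax probabilities (names and layout are assumptions for illustration):

// crossEntropyGrad computes (softmax - one_hot) / batchSize on a flat
// [batchSize, numClasses] softmax slice, following the formula above.
func crossEntropyGrad(softmax []float32, targets []int, batchSize, numClasses int) []float32 {
	grad := make([]float32, batchSize*numClasses)
	for b := 0; b < batchSize; b++ {
		for i := 0; i < numClasses; i++ {
			g := softmax[b*numClasses+i]
			if i == targets[b] {
				g -= 1 // subtract the one-hot target
			}
			grad[b*numClasses+i] = g / float32(batchSize) // mean over the batch
		}
	}
	return grad
}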

func (*CrossEntropyOp) Inputs

func (op *CrossEntropyOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*CrossEntropyOp) Output

func (op *CrossEntropyOp) Output() *tensor.RawTensor

Output returns the output tensor.

type DivOp

type DivOp struct {
	// contains filtered or unexported fields
}

DivOp represents an element-wise division operation: output = a / b.

Backward pass:

  • d(a/b)/da = 1/b, so grad_a = outputGrad / b
  • d(a/b)/db = -a/b², so grad_b = -outputGrad * a / b²

func NewDivOp

func NewDivOp(a, b, output *tensor.RawTensor) *DivOp

NewDivOp creates a new DivOp.

func (*DivOp) Backward

func (op *DivOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes input gradients for division.

func (*DivOp) Inputs

func (op *DivOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors [a, b].

func (*DivOp) Output

func (op *DivOp) Output() *tensor.RawTensor

Output returns the output tensor a / b.

type LogOp

type LogOp struct {
	// contains filtered or unexported fields
}

LogOp represents element-wise natural logarithm operation.

Forward:

output = log(input)

Backward:

∂L/∂input = ∂L/∂output * (1 / input)

The gradient is the reciprocal of the input, scaled by the output gradient.

func NewLogOp

func NewLogOp(input, output *tensor.RawTensor) *LogOp

NewLogOp creates a new log operation.

func (*LogOp) Backward

func (op *LogOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor

Backward computes the gradient with respect to input.

Gradient formula:

∂L/∂input[i] = ∂L/∂output[i] * (1 / input[i])

Note: This assumes input > 0 (log is only defined for positive values). In practice, a small epsilon (e.g., 1e-8) is often added for numerical stability.

func (*LogOp) Inputs

func (op *LogOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*LogOp) Output

func (op *LogOp) Output() *tensor.RawTensor

Output returns the output tensor.

type LogSoftmaxOp

type LogSoftmaxOp struct {
	// contains filtered or unexported fields
}

LogSoftmaxOp represents the log-softmax operation.

Forward:

log_softmax(x)_i = x_i - max(x) - log(Σ_j exp(x_j - max(x)))

This is more numerically stable than computing softmax then log.

Backward:

∂L/∂x_j = ∂L/∂log_softmax_j - softmax_j * Σ_i ∂L/∂log_softmax_i

Note: We need to cache both log_softmax (output) and softmax for backward.

func NewLogSoftmaxOp

func NewLogSoftmaxOp(input, output *tensor.RawTensor, softmaxData []float32) *LogSoftmaxOp

NewLogSoftmaxOp creates a new log-softmax operation.

Parameters:

  • input: Input logits
  • output: Log-softmax output
  • softmaxData: Pre-computed softmax (needed for backward)

func (*LogSoftmaxOp) Backward

func (op *LogSoftmaxOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor

Backward computes gradient for log-softmax.

Formula:

∂L/∂x[b,j] = ∂L/∂log_softmax[b,j] - softmax[b,j] * Σ_i ∂L/∂log_softmax[b,i]
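
A single-row, plain-slice sketch of this formula, given the cached softmax (illustrative only):

// logSoftmaxGradRow applies the formula above to one row:
//   dx[j] = dy[j] - softmax[j] * Σ_i dy[i]
// dy is ∂L/∂log_softmax for this row; softmax is the cached forward softmax.
func logSoftmaxGradRow(dy, softmax []float32) []float32 {
	var sum float32
	for _, g := range dy {
		sum += g
	}
	dx := make([]float32, len(dy))
	for j := range dy {
		dx[j] = dy[j] - softmax[j]*sum
	}
	return dx
}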

func (*LogSoftmaxOp) Inputs

func (op *LogSoftmaxOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*LogSoftmaxOp) Output

func (op *LogSoftmaxOp) Output() *tensor.RawTensor

Output returns the output tensor.

type LogWithEpsilonOp

type LogWithEpsilonOp struct {
	// contains filtered or unexported fields
}

LogWithEpsilonOp represents log with numerical stability epsilon.

Forward:

output = log(input + epsilon)

This is more numerically stable when the input may be very close to zero.

func NewLogWithEpsilonOp

func NewLogWithEpsilonOp(input, output *tensor.RawTensor, epsilon float64) *LogWithEpsilonOp

NewLogWithEpsilonOp creates a log operation with epsilon for stability.

func (*LogWithEpsilonOp) Backward

func (op *LogWithEpsilonOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor

Backward computes gradient: ∂L/∂input = ∂L/∂output / (input + epsilon).

func (*LogWithEpsilonOp) Inputs

func (op *LogWithEpsilonOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*LogWithEpsilonOp) Output

func (op *LogWithEpsilonOp) Output() *tensor.RawTensor

Output returns the output tensor.

type MatMulOp

type MatMulOp struct {
	// contains filtered or unexported fields
}

MatMulOp represents a matrix multiplication operation: output = a @ b.

Backward pass:

  • d(A@B)/dA = outputGrad @ B^T
  • d(A@B)/dB = A^T @ outputGrad

Where @ denotes matrix multiplication and ^T denotes transpose.
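
For row-major 2D matrices, the first formula corresponds to the plain-Go sketch below (the op itself presumably delegates to the backend's matrix routines; this is only an illustration, and the second formula is analogous):

// matMulGradA computes dA = dOut @ Bᵀ for row-major matrices:
// A is [m, k], B is [k, n], dOut is [m, n], so dA is [m, k].
func matMulGradA(dOut, b []float32, m, k, n int) []float32 {
	dA := make([]float32, m*k)
	for i := 0; i < m; i++ {
		for j := 0; j < k; j++ {
			var sum float32
			for p := 0; p < n; p++ {
				// (dOut @ Bᵀ)[i,j] = Σ_p dOut[i,p] * B[j,p]
				sum += dOut[i*n+p] * b[j*n+p]
			}
			dA[i*k+j] = sum
		}
	}
	return dA
}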

func NewMatMulOp

func NewMatMulOp(a, b, output *tensor.RawTensor) *MatMulOp

NewMatMulOp creates a new MatMulOp.

func (*MatMulOp) Backward

func (op *MatMulOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes input gradients for matrix multiplication.

func (*MatMulOp) Inputs

func (op *MatMulOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors [a, b].

func (*MatMulOp) Output

func (op *MatMulOp) Output() *tensor.RawTensor

Output returns the output tensor a @ b.

type MaxPool2DOp

type MaxPool2DOp struct {
	// contains filtered or unexported fields
}

MaxPool2DOp records a max pooling operation for autodiff.

Forward:

output[n,c,h,w] = max(input[n,c,h*stride+kh,w*stride+kw] for kh,kw in kernel)

Backward:

  • Input gradient: Gradients flow only to positions that had the max value
  • For each output position, only one input position receives gradient
  • All other positions in pooling window receive zero gradient

Example (2x2 pool, stride=2):

Input:       [[1, 2], [3, 4]]
Output:      [[4]]
Input Grad:  [[0, 0], [0, grad]]

Unlike Conv2D which has learnable parameters, MaxPool2D only has input gradients.

func NewMaxPool2DOp

func NewMaxPool2DOp(input, output *tensor.RawTensor, kernelSize, stride int) *MaxPool2DOp

NewMaxPool2DOp creates a new MaxPool2D operation.

CRITICAL: Must compute and store max indices during forward pass! Without max indices, backward pass cannot route gradients correctly.

func (*MaxPool2DOp) Backward

func (op *MaxPool2DOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes gradients for MaxPool2D.

Gradient routing:

  1. Initialize the input gradient to zeros
  2. For each output gradient value, route it to the input position that had the max value (stored in maxIndices)
  3. All other positions in the pooling window remain zero

This implements the subgradient of the max function:

∂(max_i x_i)/∂x_j = 1 if j = argmax_i x_i, else 0
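
A plain-slice sketch of this routing, assuming the forward pass stored one flat input index per output element (the flat-index layout and names are assumptions):

// maxPoolGrad routes each output gradient to the input position that produced
// the max, per the subgradient above. maxIndices[o] holds the flat index into
// the input for output element o (stored during the forward pass).
func maxPoolGrad(outputGrad []float32, maxIndices []int, inputSize int) []float32 {
	inputGrad := make([]float32, inputSize) // step 1: zeros
	for o, g := range outputGrad {          // step 2: route each gradient to the max position
		inputGrad[maxIndices[o]] += g
	}
	return inputGrad // step 3: all other positions stay zero
}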

func (*MaxPool2DOp) Inputs

func (op *MaxPool2DOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*MaxPool2DOp) Output

func (op *MaxPool2DOp) Output() *tensor.RawTensor

Output returns the output tensor.

type MulOp

type MulOp struct {
	// contains filtered or unexported fields
}

MulOp represents an element-wise multiplication operation: output = a * b.

Backward pass:

  • d(a*b)/da = b, so grad_a = outputGrad * b
  • d(a*b)/db = a, so grad_b = outputGrad * a

func NewMulOp

func NewMulOp(a, b, output *tensor.RawTensor) *MulOp

NewMulOp creates a new MulOp.

func (*MulOp) Backward

func (op *MulOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes input gradients for multiplication.

func (*MulOp) Inputs

func (op *MulOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors [a, b].

func (*MulOp) Output

func (op *MulOp) Output() *tensor.RawTensor

Output returns the output tensor a * b.

type Operation

type Operation interface {
	// Backward computes gradients for inputs given the output gradient.
	// Returns a slice of gradients corresponding to each input tensor.
	//
	// Example for AddOp:
	//   inputs: [a, b]
	//   outputGrad: dL/d(a+b)
	//   returns: [dL/d(a+b), dL/d(a+b)] (gradient flows equally to both inputs)
	Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

	// Inputs returns the input tensors for this operation.
	Inputs() []*tensor.RawTensor

	// Output returns the output tensor produced by this operation.
	Output() *tensor.RawTensor
}

Operation represents a differentiable operation in the computation graph. Each operation records its inputs and output during the forward pass, and computes input gradients during the backward pass.

type ReLUOp

type ReLUOp struct {
	// contains filtered or unexported fields
}

ReLUOp represents a ReLU (Rectified Linear Unit) activation: output = max(0, x).

Backward pass:

  • d(ReLU(x))/dx = 1 if x > 0, else 0

The gradient is computed by creating a mask where input > 0, then multiplying the output gradient by this mask.
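
Element-wise, the mask-and-multiply amounts to the following (plain slices, illustrative only):

// reluGrad builds the (input > 0) mask and applies it to the output gradient,
// i.e. gradIn[i] = gradOut[i] if input[i] > 0, else 0.
func reluGrad(input, gradOut []float32) []float32 {
	gradIn := make([]float32, len(input))
	for i, x := range input {
		if x > 0 {
			gradIn[i] = gradOut[i]
		}
	}
	return gradIn
}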

func NewReLUOp

func NewReLUOp(input, output *tensor.RawTensor) *ReLUOp

NewReLUOp creates a new ReLUOp.

func (*ReLUOp) Backward

func (op *ReLUOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes input gradient for ReLU.

func (*ReLUOp) Inputs

func (op *ReLUOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensor [x].

func (*ReLUOp) Output

func (op *ReLUOp) Output() *tensor.RawTensor

Output returns the output tensor max(0, x).

type ReshapeOp

type ReshapeOp struct {
	// contains filtered or unexported fields
}

ReshapeOp records a reshape operation for autodiff.

Forward: output = Reshape(input, newShape)

Backward:

  • d_input: Reshape(d_output, input.shape())

Reshape backward is simple: reshape the output gradient back to the original input shape.

func NewReshapeOp

func NewReshapeOp(input, output *tensor.RawTensor) *ReshapeOp

NewReshapeOp creates a new Reshape operation.

func (*ReshapeOp) Backward

func (op *ReshapeOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes gradients for Reshape.

The gradient of reshape is simple: reshape the output gradient back to the input shape. No actual computation needed.

func (*ReshapeOp) Inputs

func (op *ReshapeOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*ReshapeOp) Output

func (op *ReshapeOp) Output() *tensor.RawTensor

Output returns the output tensor.

type SigmoidOp

type SigmoidOp struct {
	// contains filtered or unexported fields
}

SigmoidOp represents the sigmoid activation operation: σ(x) = 1 / (1 + exp(-x)).

func NewSigmoidOp

func NewSigmoidOp(input, output *tensor.RawTensor) *SigmoidOp

NewSigmoidOp creates a new sigmoid operation.

func (*SigmoidOp) Backward

func (op *SigmoidOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes the gradient for sigmoid.

For σ(x) = 1 / (1 + exp(-x)): dσ/dx = σ(x) * (1 - σ(x))

Since we have the output σ(x) already computed, we can use it: grad_input = grad_output * output * (1 - output).
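
Element-wise, this is simply (plain slices, illustrative only):

// sigmoidGrad reuses the cached forward output: gradIn = gradOut * out * (1 - out).
func sigmoidGrad(out, gradOut []float32) []float32 {
	gradIn := make([]float32, len(out))
	for i := range out {
		gradIn[i] = gradOut[i] * out[i] * (1 - out[i])
	}
	return gradIn
}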

func (*SigmoidOp) Inputs

func (op *SigmoidOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*SigmoidOp) Output

func (op *SigmoidOp) Output() *tensor.RawTensor

Output returns the output tensor.

type SoftmaxOp

type SoftmaxOp struct {
	// contains filtered or unexported fields
}

SoftmaxOp represents the softmax operation along the last dimension.

Forward (for each row):

softmax(x)_i = exp(x_i - max(x)) / Σ_j exp(x_j - max(x))

The max-shifting ensures numerical stability (prevents overflow).

Backward:

The Jacobian of softmax is:
∂softmax_i/∂x_j = softmax_i * (δ_ij - softmax_j)

Chain rule gives:
∂L/∂x_j = Σ_i (∂L/∂softmax_i) * softmax_i * (δ_ij - softmax_j)
        = softmax_j * (∂L/∂softmax_j - Σ_i (∂L/∂softmax_i * softmax_i))

Assumptions:

  • Input shape: [batch_size, num_classes] (2D)
  • Softmax applied along last dimension (classes)

func NewSoftmaxOp

func NewSoftmaxOp(input, output *tensor.RawTensor) *SoftmaxOp

NewSoftmaxOp creates a new softmax operation.

func (*SoftmaxOp) Backward

func (op *SoftmaxOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor

Backward computes the gradient with respect to input.

Uses the simplified formula for batched softmax:

∂L/∂x[b,j] = softmax[b,j] * (∂L/∂softmax[b,j] - dot(∂L/∂softmax[b,:], softmax[b,:]))

Where:

  • b is the batch index
  • j is the class index
  • dot is the dot product over the class dimension
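
On flat [batch_size, num_classes] slices, this corresponds to the following sketch (names and layout are illustrative assumptions):

// softmaxGrad applies the batched formula above:
//   dx[b,j] = s[b,j] * (dy[b,j] - dot(dy[b,:], s[b,:]))
// s is the cached softmax output, dy is ∂L/∂softmax; both are flat [batch, classes].
func softmaxGrad(dy, s []float32, batch, classes int) []float32 {
	dx := make([]float32, batch*classes)
	for b := 0; b < batch; b++ {
		row := b * classes
		var dot float32
		for i := 0; i < classes; i++ {
			dot += dy[row+i] * s[row+i] // dot(∂L/∂softmax[b,:], softmax[b,:])
		}
		for j := 0; j < classes; j++ {
			dx[row+j] = s[row+j] * (dy[row+j] - dot)
		}
	}
	return dx
}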

func (*SoftmaxOp) Inputs

func (op *SoftmaxOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*SoftmaxOp) Output

func (op *SoftmaxOp) Output() *tensor.RawTensor

Output returns the output tensor.

type SubOp

type SubOp struct {
	// contains filtered or unexported fields
}

SubOp represents an element-wise subtraction operation: output = a - b.

Backward pass:

  • d(a-b)/da = 1, so grad_a = outputGrad
  • d(a-b)/db = -1, so grad_b = -outputGrad

func NewSubOp

func NewSubOp(a, b, output *tensor.RawTensor) *SubOp

NewSubOp creates a new SubOp.

func (*SubOp) Backward

func (op *SubOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes input gradients for subtraction.

func (*SubOp) Inputs

func (op *SubOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors [a, b].

func (*SubOp) Output

func (op *SubOp) Output() *tensor.RawTensor

Output returns the output tensor a - b.

type TanhOp

type TanhOp struct {
	// contains filtered or unexported fields
}

TanhOp represents the hyperbolic tangent activation: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).

func NewTanhOp

func NewTanhOp(input, output *tensor.RawTensor) *TanhOp

NewTanhOp creates a new tanh operation.

func (*TanhOp) Backward

func (op *TanhOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes the gradient for tanh.

For tanh(x): d(tanh(x))/dx = 1 - tanh²(x)

Since we have the output tanh(x) already computed: grad_input = grad_output * (1 - output²).

func (*TanhOp) Inputs

func (op *TanhOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*TanhOp) Output

func (op *TanhOp) Output() *tensor.RawTensor

Output returns the output tensor.

type TransposeOp

type TransposeOp struct {
	// contains filtered or unexported fields
}

TransposeOp represents a transpose operation.

Forward:

output = transpose(input, axes)

Backward:

∂L/∂input = transpose(∂L/∂output, inverse_axes)

The gradient of transpose is transpose with inverse axes.

func NewTransposeOp

func NewTransposeOp(input, output *tensor.RawTensor, axes []int) *TransposeOp

NewTransposeOp creates a new TransposeOp.

func (*TransposeOp) Backward

func (op *TransposeOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor

Backward computes input gradient for transpose.

The gradient of transpose is transpose with inverted axes. For example, if forward uses axes [1, 0] (swap), then backward also uses [1, 0].
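
A small sketch of computing the inverse permutation for a given axes slice (the inverseAxes helper is hypothetical, not part of this package; it assumes the usual convention that output dimension i is input dimension axes[i]):

// inverseAxes returns the permutation that undoes axes: if forward moved
// dimension axes[i] to position i, backward must move dimension i back to
// position axes[i]. For a simple swap like [1, 0] the inverse is itself.
func inverseAxes(axes []int) []int {
	inv := make([]int, len(axes))
	for i, a := range axes {
		inv[a] = i
	}
	return inv
}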

func (*TransposeOp) Inputs

func (op *TransposeOp) Inputs() []*tensor.RawTensor

Inputs returns the input tensors.

func (*TransposeOp) Output

func (op *TransposeOp) Output() *tensor.RawTensor

Output returns the output tensor.
