Documentation
¶
Overview ¶
Package ops defines operation interfaces and implementations for automatic differentiation.
Each operation implements the Operation interface, which provides:
- Forward pass: computed by the backend
- Backward pass: computes gradients for inputs given output gradient
Supported operations:
- AddOp: element-wise addition (d(a+b)/da = 1, d(a+b)/db = 1)
- SubOp: element-wise subtraction
- MulOp: element-wise multiplication (d(a*b)/da = b, d(a*b)/db = a)
- DivOp: element-wise division
- MatMulOp: matrix multiplication (d(A@B)/dA = grad@B^T, d(A@B)/dB = A^T@grad)
- ReLUOp: rectified linear unit activation (d(ReLU(x))/dx = 1 if x > 0, else 0)
Index ¶
- func CrossEntropyForward(logits, targets *tensor.RawTensor, device tensor.Device) *tensor.RawTensor
- func Exp(input *tensor.RawTensor, device tensor.Device) *tensor.RawTensor
- func Log(input *tensor.RawTensor, device tensor.Device) *tensor.RawTensor
- func Softmax(input *tensor.RawTensor, device tensor.Device) *tensor.RawTensor
- type AddOp
- type BatchMatMulOp
- type Conv2DOp
- type CosOp
- type CrossEntropyOp
- type DivOp
- type EmbeddingOp
- type ExpOp
- type LogOp
- type LogSoftmaxOp
- type LogWithEpsilonOp
- type MatMulOp
- type MaxPool2DOp
- type MeanDimOp
- type MulOp
- type Operation
- type ReLUOp
- type ReshapeOp
- type RsqrtOp
- type SiLUOp
- type SigmoidOp
- type SinOp
- type SoftmaxOp
- type SqrtOp
- type SubOp
- type SumDimOp
- type TanhOp
- type TransposeOp
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CrossEntropyForward ¶
CrossEntropyForward computes cross-entropy loss (helper function).
This is a helper for use outside autodiff context. For autodiff support, use AutodiffBackend with CrossEntropyOp.
Parameters:
- logits: [batch_size, num_classes]
- targets: [batch_size] (class indices)
Returns:
- Scalar loss tensor (mean over batch)
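The underlying arithmetic can be shown without the package API. A minimal sketch on plain slices (hypothetical helper, not this package's implementation), using the same log-sum-exp trick described under CrossEntropyOp:

// crossEntropyMean illustrates the math above on plain slices:
// mean over the batch of -log_softmax(logits[b])[targets[b]].
// Requires: import "math".
func crossEntropyMean(logits [][]float32, targets []int) float32 {
    var total float64
    for b, row := range logits {
        // Log-sum-exp trick: subtract the row max for numerical stability.
        maxV := row[0]
        for _, v := range row {
            if v > maxV {
                maxV = v
            }
        }
        var sumExp float64
        for _, v := range row {
            sumExp += math.Exp(float64(v - maxV))
        }
        logSumExp := float64(maxV) + math.Log(sumExp)
        // Per-example loss: -(logits[b][targets[b]] - logSumExp).
        total += logSumExp - float64(row[targets[b]])
    }
    return float32(total / float64(len(logits)))
}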
func Exp ¶
Exp computes element-wise exponential (helper for softmax).
Forward:
output = exp(input)
Backward:
∂L/∂input = ∂L/∂output * exp(input) = ∂L/∂output * output
Note: This is a helper function, not a full Operation. For autodiff support, use ExpOp.
Types ¶
type AddOp ¶
type AddOp struct {
// contains filtered or unexported fields
}
AddOp represents an element-wise addition operation: output = a + b.
Backward pass:
- d(a+b)/da = 1, so grad_a = outputGrad
- d(a+b)/db = 1, so grad_b = outputGrad
Note: If broadcasting was used in the forward pass, gradients must be reduced (summed) along the broadcast dimensions to match the input shapes.
func (*AddOp) Backward ¶
Backward computes input gradients for addition. Since d(a+b)/da = d(a+b)/db = 1, the gradient flows equally to both inputs.
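As a rough illustration of the broadcast note above (not this package's code), the common bias case a[rows][cols] + b[cols] reduces the output gradient for b by summing over the broadcast dimension:

// reduceBroadcastGrad sketches the reduction for a tensor b that was
// broadcast from shape [cols] to [rows][cols]: grad_b is the
// column-wise sum of outputGrad.
func reduceBroadcastGrad(outputGrad [][]float32) []float32 {
    if len(outputGrad) == 0 {
        return nil
    }
    gradB := make([]float32, len(outputGrad[0]))
    for _, row := range outputGrad {
        for j, g := range row {
            gradB[j] += g // sum over the broadcast (row) dimension
        }
    }
    return gradB
}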
type BatchMatMulOp ¶ added in v0.4.0
type BatchMatMulOp struct {
// contains filtered or unexported fields
}
BatchMatMulOp represents a batched matrix multiplication operation: output = a @ b.
Backward pass:
- d(A@B)/dA = outputGrad @ B^T
- d(A@B)/dB = A^T @ outputGrad
Where @ denotes batched matrix multiplication and ^T denotes transpose.
func NewBatchMatMulOp ¶ added in v0.4.0
func NewBatchMatMulOp(a, b, output *tensor.RawTensor) *BatchMatMulOp
NewBatchMatMulOp creates a new BatchMatMulOp.
func (*BatchMatMulOp) Backward ¶ added in v0.4.0
func (op *BatchMatMulOp) Backward(grad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes gradients for batch matmul. Given C = A @ B:
dL/dA = dL/dC @ B^T
dL/dB = A^T @ dL/dC
func (*BatchMatMulOp) Inputs ¶ added in v0.4.0
func (op *BatchMatMulOp) Inputs() []*tensor.RawTensor
Inputs returns the input tensors [a, b].
func (*BatchMatMulOp) Output ¶ added in v0.4.0
func (op *BatchMatMulOp) Output() *tensor.RawTensor
Output returns the output tensor a @ b.
type Conv2DOp ¶
type Conv2DOp struct {
// contains filtered or unexported fields
}
Conv2DOp records a 2D convolution operation for autodiff.
Forward: output = Conv2D(input, kernel, stride, padding)
Backward (gradients):
- d_input: "transposed convolution" or "deconvolution" of d_output with kernel
- d_kernel: convolution of input with d_output
References:
- "A guide to convolution arithmetic for deep learning" (Dumoulin & Visin, 2016)
- CS231n: Convolutional Neural Networks for Visual Recognition
func NewConv2DOp ¶
NewConv2DOp creates a new Conv2D operation.
func (*Conv2DOp) Backward ¶
func (op *Conv2DOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes gradients for Conv2D.
Given:
- outputGrad: ∂L/∂output [N, C_out, H_out, W_out]
Compute:
- inputGrad: ∂L/∂input [N, C_in, H, W]
- kernelGrad: ∂L/∂kernel [C_out, C_in, K_h, K_w]
Gradient formulas:
Input gradient (transposed convolution): ∂L/∂input = TransposedConv2D(∂L/∂output, kernel, stride, padding)
This is essentially a "backward" convolution where we propagate the output gradients back to input positions using the same kernel.
Kernel gradient (convolution): ∂L/∂kernel[c_out, c_in, kh, kw] = Σ_{n,h,w} input[n,c_in,h+kh,w+kw] * ∂L/∂output[n,c_out,h,w]
This computes how much each kernel weight contributed to the loss by correlating input patches with output gradients.
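A rough sketch of the kernel-gradient formula above, for the simplest case of stride 1, no padding, one image, and one input/output channel (hypothetical helper, not the package implementation):

// conv2DKernelGrad correlates input patches with output gradients:
// grad[i][j] = Σ_{h,w} input[h+i][w+j] * outGrad[h][w].
// input is [H][W], outGrad is [H_out][W_out], result is [kH][kW].
func conv2DKernelGrad(input, outGrad [][]float32, kH, kW int) [][]float32 {
    grad := make([][]float32, kH)
    for i := range grad {
        grad[i] = make([]float32, kW)
    }
    for h := range outGrad {
        for w := range outGrad[h] {
            g := outGrad[h][w]
            for i := 0; i < kH; i++ {
                for j := 0; j < kW; j++ {
                    // Correlate the input patch under this output position
                    // with the output gradient.
                    grad[i][j] += input[h+i][w+j] * g
                }
            }
        }
    }
    return grad
}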
type CosOp ¶ added in v0.3.0
type CosOp struct {
// contains filtered or unexported fields
}
CosOp represents the cosine operation: y = cos(x).
Backward pass:
- d(cos(x))/dx = -sin(x)
- grad_input = grad_output * (-sin(input))
func (*CosOp) Backward ¶ added in v0.3.0
Backward computes input gradient for cos.
Since d(cos(x))/dx = -sin(x): grad_input = grad_output * (-sin(input)).
type CrossEntropyOp ¶
type CrossEntropyOp struct {
// contains filtered or unexported fields
}
CrossEntropyOp represents the cross-entropy loss operation.
Forward:
Loss = mean(-log_softmax(logits)[targets])
Where log_softmax uses the log-sum-exp trick for numerical stability:
log_softmax(z) = z - (max(z) + log(Σ exp(z - max(z))))
Backward:
∂L/∂logits = (softmax(logits) - y_one_hot) / batch_size
This elegant gradient formula is the key reason why softmax + cross-entropy are often fused together in modern frameworks (PyTorch, TensorFlow, Burn).
Assumptions:
- Logits shape: [batch_size, num_classes] (2D)
- Targets shape: [batch_size] (1D, class indices)
- Output: scalar loss (mean over batch)
func NewCrossEntropyOp ¶
func NewCrossEntropyOp(logits, targets, output *tensor.RawTensor) *CrossEntropyOp
NewCrossEntropyOp creates a new cross-entropy operation.
func (*CrossEntropyOp) Backward ¶
func (op *CrossEntropyOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor
Backward computes the gradient with respect to logits.
Gradient formula:
∂L/∂logits[b,i] = (softmax(logits[b])[i] - y_one_hot[b,i]) / batch_size
Where y_one_hot[b,i] = 1 if i == targets[b], else 0.
Note: The gradient is divided by the batch size because the forward pass computes the mean loss.
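A minimal sketch of this gradient on plain slices (hypothetical helper, not the package's code): compute softmax per row, subtract the one-hot target, and divide by the batch size.

// crossEntropyGrad returns (softmax(logits[b]) - one_hot(targets[b])) / batchSize.
// Requires: import "math".
func crossEntropyGrad(logits [][]float32, targets []int) [][]float32 {
    batch := len(logits)
    grad := make([][]float32, batch)
    for b, row := range logits {
        // Numerically stable softmax of the row.
        maxV := row[0]
        for _, v := range row {
            if v > maxV {
                maxV = v
            }
        }
        exps := make([]float64, len(row))
        var sum float64
        for i, v := range row {
            exps[i] = math.Exp(float64(v - maxV))
            sum += exps[i]
        }
        grad[b] = make([]float32, len(row))
        for i := range row {
            grad[b][i] = float32(exps[i]/sum) / float32(batch)
        }
        grad[b][targets[b]] -= 1.0 / float32(batch) // subtract the one-hot target
    }
    return grad
}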
func (*CrossEntropyOp) Inputs ¶
func (op *CrossEntropyOp) Inputs() []*tensor.RawTensor
Inputs returns the input tensors.
func (*CrossEntropyOp) Output ¶
func (op *CrossEntropyOp) Output() *tensor.RawTensor
Output returns the output tensor.
type DivOp ¶
type DivOp struct {
// contains filtered or unexported fields
}
DivOp represents an element-wise division operation: output = a / b.
Backward pass:
- d(a/b)/da = 1/b, so grad_a = outputGrad / b
- d(a/b)/db = -a/b², so grad_b = -outputGrad * a / b²
type EmbeddingOp ¶ added in v0.3.0
type EmbeddingOp struct {
// contains filtered or unexported fields
}
EmbeddingOp represents an embedding lookup operation.
Forward: output[i] = weight[indices[i]]
Backward:
For each index i, accumulate grad_output[i] into grad_weight[indices[i]]. This is a scatter-add operation where gradients for the same index are summed.
Example:
indices = [0, 1, 0]                      // index 0 appears twice
grad_output = [[1,2], [3,4], [5,6]]
grad_weight[0] = [1,2] + [5,6] = [6,8]   // accumulated!
grad_weight[1] = [3,4]
func NewEmbeddingOp ¶ added in v0.3.0
func NewEmbeddingOp(weight, indices, output *tensor.RawTensor) *EmbeddingOp
NewEmbeddingOp creates a new embedding operation.
func (*EmbeddingOp) Backward ¶ added in v0.3.0
func (op *EmbeddingOp) Backward(gradOutput *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes gradients for the embedding weights.
Gradient computation:
- For each position i in output, grad_output[i] flows back to weight[indices[i]]
- Multiple indices pointing to the same embedding accumulate gradients
Algorithm:
- Create grad_weight tensor (same shape as weight) initialized to zeros
- For each index i:
  - Read index value: idx = indices[i]
  - Add grad_output[i] to grad_weight[idx]
- Return grad_weight
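A minimal sketch of this scatter-add on plain slices (hypothetical helper, not the package's code):

// embeddingBackward scatters gradOutput rows into gradWeight rows.
// gradOutput is [len(indices)][dim], the result is [vocab][dim].
func embeddingBackward(indices []int, gradOutput [][]float32, vocab, dim int) [][]float32 {
    gradWeight := make([][]float32, vocab)
    for i := range gradWeight {
        gradWeight[i] = make([]float32, dim) // initialized to zeros
    }
    for i, idx := range indices {
        for d := 0; d < dim; d++ {
            // Repeated indices accumulate their gradients.
            gradWeight[idx][d] += gradOutput[i][d]
        }
    }
    return gradWeight
}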
func (*EmbeddingOp) Inputs ¶ added in v0.3.0
func (op *EmbeddingOp) Inputs() []*tensor.RawTensor
Inputs returns the input tensors (weight and indices). Note: Only weight needs gradient; indices are integer indices.
func (*EmbeddingOp) Output ¶ added in v0.3.0
func (op *EmbeddingOp) Output() *tensor.RawTensor
Output returns the output tensor.
type ExpOp ¶ added in v0.3.0
type ExpOp struct {
// contains filtered or unexported fields
}
ExpOp represents the exponential operation: y = exp(x).
Backward pass:
- d(exp(x))/dx = exp(x) = y
- grad_input = grad_output * output
func (*ExpOp) Backward ¶ added in v0.3.0
Backward computes input gradient for exp.
Since d(exp(x))/dx = exp(x), and we already have exp(x) as output: grad_input = grad_output * output.
type LogOp ¶
type LogOp struct {
// contains filtered or unexported fields
}
LogOp represents element-wise natural logarithm operation.
Forward:
output = log(input)
Backward:
∂L/∂input = ∂L/∂output * (1 / input)
The gradient is the reciprocal of the input, scaled by the output gradient.
func (*LogOp) Backward ¶
Backward computes the gradient with respect to input.
Gradient formula:
∂L/∂input[i] = ∂L/∂output[i] * (1 / input[i])
Note: This assumes input > 0 (log is only defined for positive values). In practice, a small epsilon (e.g., 1e-8) is often added for numerical stability.
type LogSoftmaxOp ¶
type LogSoftmaxOp struct {
// contains filtered or unexported fields
}
LogSoftmaxOp represents the log-softmax operation.
Forward:
log_softmax(x)_i = x_i - max(x) - log(Σ_j exp(x_j - max(x)))
This is more numerically stable than computing softmax then log.
Backward:
∂L/∂x_j = ∂L/∂log_softmax_j - softmax_j * Σ_i ∂L/∂log_softmax_i
Note: We need to cache both log_softmax (output) and softmax for backward.
func NewLogSoftmaxOp ¶
func NewLogSoftmaxOp(input, output *tensor.RawTensor, softmaxData []float32) *LogSoftmaxOp
NewLogSoftmaxOp creates a new log-softmax operation.
Parameters:
- input: Input logits
- output: Log-softmax output
- softmaxData: Pre-computed softmax (needed for backward)
func (*LogSoftmaxOp) Backward ¶
func (op *LogSoftmaxOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor
Backward computes gradient for log-softmax.
Formula:
∂L/∂x[b,j] = ∂L/∂log_softmax[b,j] - softmax[b,j] * Σ_i ∂L/∂log_softmax[b,i]
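A sketch of this formula for a single row, on plain slices (hypothetical helper, not the package implementation):

// logSoftmaxBackward computes grad_x[j] = grad[j] - softmax[j] * sum(grad)
// for one row of the batch.
func logSoftmaxBackward(grad, softmax []float32) []float32 {
    var sum float32
    for _, g := range grad {
        sum += g
    }
    out := make([]float32, len(grad))
    for j := range grad {
        out[j] = grad[j] - softmax[j]*sum
    }
    return out
}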
func (*LogSoftmaxOp) Inputs ¶
func (op *LogSoftmaxOp) Inputs() []*tensor.RawTensor
Inputs returns the input tensors.
func (*LogSoftmaxOp) Output ¶
func (op *LogSoftmaxOp) Output() *tensor.RawTensor
Output returns the output tensor.
type LogWithEpsilonOp ¶
type LogWithEpsilonOp struct {
// contains filtered or unexported fields
}
LogWithEpsilonOp represents log with numerical stability epsilon.
Forward:
output = log(input + epsilon)
This is numerically more stable when input might be very close to zero.
func NewLogWithEpsilonOp ¶
func NewLogWithEpsilonOp(input, output *tensor.RawTensor, epsilon float64) *LogWithEpsilonOp
NewLogWithEpsilonOp creates a log operation with epsilon for stability.
func (*LogWithEpsilonOp) Backward ¶
func (op *LogWithEpsilonOp) Backward(outputGrad *tensor.RawTensor, _ tensor.Backend) []*tensor.RawTensor
Backward computes gradient: ∂L/∂input = ∂L/∂output / (input + epsilon).
func (*LogWithEpsilonOp) Inputs ¶
func (op *LogWithEpsilonOp) Inputs() []*tensor.RawTensor
Inputs returns the input tensors.
func (*LogWithEpsilonOp) Output ¶
func (op *LogWithEpsilonOp) Output() *tensor.RawTensor
Output returns the output tensor.
type MatMulOp ¶
type MatMulOp struct {
// contains filtered or unexported fields
}
MatMulOp represents a matrix multiplication operation: output = a @ b.
Backward pass:
- d(A@B)/dA = outputGrad @ B^T
- d(A@B)/dB = A^T @ outputGrad
Where @ denotes matrix multiplication and ^T denotes transpose.
func NewMatMulOp ¶
NewMatMulOp creates a new MatMulOp.
func (*MatMulOp) Backward ¶
func (op *MatMulOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes input gradients for matrix multiplication.
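A plain-matrix sketch of these formulas (hypothetical helpers, not the package implementation):

// matMulBackward computes dA = grad @ Bᵀ and dB = Aᵀ @ grad
// for C = A @ B, where A is m×k, B is k×n, and grad is m×n.
func matMulBackward(a, b, grad [][]float32) (dA, dB [][]float32) {
    return matMul(grad, transpose(b)), matMul(transpose(a), grad)
}

func matMul(x, y [][]float32) [][]float32 {
    out := make([][]float32, len(x))
    for i := range x {
        out[i] = make([]float32, len(y[0]))
        for k := range y {
            for j := range y[k] {
                out[i][j] += x[i][k] * y[k][j]
            }
        }
    }
    return out
}

func transpose(m [][]float32) [][]float32 {
    out := make([][]float32, len(m[0]))
    for j := range out {
        out[j] = make([]float32, len(m))
        for i := range m {
            out[j][i] = m[i][j]
        }
    }
    return out
}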
type MaxPool2DOp ¶
type MaxPool2DOp struct {
// contains filtered or unexported fields
}
MaxPool2DOp records a max pooling operation for autodiff.
Forward:
output[n,c,h,w] = max(input[n,c,h*stride+kh,w*stride+kw] for kh,kw in kernel)
Backward:
- Input gradient: Gradients flow only to positions that had the max value
- For each output position, only one input position receives gradient
- All other positions in pooling window receive zero gradient
Example (2x2 pool, stride=2):
Input:      [[1, 2],
             [3, 4]]
Output:     [4]
Input Grad: [[0, 0],
             [0, grad]]
Unlike Conv2D which has learnable parameters, MaxPool2D only has input gradients.
func NewMaxPool2DOp ¶
func NewMaxPool2DOp(input, output *tensor.RawTensor, kernelSize, stride int) *MaxPool2DOp
NewMaxPool2DOp creates a new MaxPool2D operation.
CRITICAL: Must compute and store max indices during forward pass! Without max indices, backward pass cannot route gradients correctly.
func (*MaxPool2DOp) Backward ¶
func (op *MaxPool2DOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes gradients for MaxPool2D.
Gradient routing:
- Initialize input gradient to zeros
- For each output gradient value
- Route it to the input position that had the max value (stored in maxIndices)
- All other positions in pooling window remain zero
This implements the subgradient of the max function:
∂max(x)/∂x_j = 1 if j = argmax(x), else 0
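A minimal sketch of this routing (hypothetical helper; assumes maxIndices holds, for each output element, the flat offset of the winning input position):

// maxPoolBackward routes each output gradient to the input position that
// held the maximum during the forward pass; all other positions stay zero.
func maxPoolBackward(outputGrad []float32, maxIndices []int, inputSize int) []float32 {
    inputGrad := make([]float32, inputSize)
    for i, g := range outputGrad {
        inputGrad[maxIndices[i]] += g // accumulate in case windows overlap
    }
    return inputGrad
}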
func (*MaxPool2DOp) Inputs ¶
func (op *MaxPool2DOp) Inputs() []*tensor.RawTensor
Inputs returns the input tensors.
func (*MaxPool2DOp) Output ¶
func (op *MaxPool2DOp) Output() *tensor.RawTensor
Output returns the output tensor.
type MeanDimOp ¶ added in v0.3.0
type MeanDimOp struct {
// contains filtered or unexported fields
}
MeanDimOp represents a reduction mean operation along a dimension: output = mean(x, dim).
Forward:
y = mean(x, dim, keepDim) = sum(x, dim, keepDim) / size[dim]
Backward:
grad_x = broadcast(grad_y, x.shape) / size[dim]
If keepDim=false, we need to unsqueeze grad_y first to match broadcasting requirements.
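A sketch of the mean-reduction gradient for the 2D case with dim=1 and keepDim=false (hypothetical helper, not the package implementation):

// meanDimBackward broadcasts gradOutput (length rows) back to [rows][cols]
// and divides by the size of the reduced dimension.
func meanDimBackward(gradOutput []float32, rows, cols int) [][]float32 {
    gradX := make([][]float32, rows)
    for b := 0; b < rows; b++ {
        gradX[b] = make([]float32, cols)
        for j := 0; j < cols; j++ {
            gradX[b][j] = gradOutput[b] / float32(cols) // broadcast, then divide
        }
    }
    return gradX
}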
func NewMeanDimOp ¶ added in v0.3.0
NewMeanDimOp creates a new MeanDimOp.
func (*MeanDimOp) Backward ¶ added in v0.3.0
func (op *MeanDimOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes input gradients for mean reduction.
The gradient flows by broadcasting grad_output to match input shape, then dividing by the size of the reduced dimension.
type MulOp ¶
type MulOp struct {
// contains filtered or unexported fields
}
MulOp represents an element-wise multiplication operation: output = a * b.
Backward pass:
- d(a*b)/da = b, so grad_a = outputGrad * b
- d(a*b)/db = a, so grad_b = outputGrad * a
type Operation ¶
type Operation interface {
// Backward computes gradients for inputs given the output gradient.
// Returns a slice of gradients corresponding to each input tensor.
//
// Example for AddOp:
// inputs: [a, b]
// outputGrad: dL/d(a+b)
// returns: [dL/d(a+b), dL/d(a+b)] (gradient flows equally to both inputs)
Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
// Inputs returns the input tensors for this operation.
Inputs() []*tensor.RawTensor
// Output returns the output tensor produced by this operation.
Output() *tensor.RawTensor
}
Operation represents a differentiable operation in the computation graph. Each operation records its inputs and output during the forward pass, and computes input gradients during the backward pass.
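To illustrate how a backend consumes this interface, here is a self-contained toy that mirrors the same contract on plain []float32 tensors (none of the names below are part of this package). It walks a recorded tape of operations in reverse and accumulates gradients per tensor:

// toyOp mirrors the Operation contract above with plain slices.
type toyOp interface {
    Backward(outputGrad []float32) [][]float32 // one gradient per input
    Inputs() []*[]float32
    Output() *[]float32
}

// backprop walks the tape from the last op to the first, accumulating
// gradients for every tensor, which is how an autodiff backend uses
// Backward, Inputs, and Output.
func backprop(tape []toyOp, lossGrad []float32) map[*[]float32][]float32 {
    grads := map[*[]float32][]float32{tape[len(tape)-1].Output(): lossGrad}
    for i := len(tape) - 1; i >= 0; i-- {
        op := tape[i]
        outGrad := grads[op.Output()]
        if outGrad == nil {
            continue // output not used downstream
        }
        inputGrads := op.Backward(outGrad)
        for j, in := range op.Inputs() {
            if grads[in] == nil {
                grads[in] = make([]float32, len(*in))
            }
            for k, g := range inputGrads[j] {
                grads[in][k] += g // fan-out gradients accumulate
            }
        }
    }
    return grads
}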
type ReLUOp ¶
type ReLUOp struct {
// contains filtered or unexported fields
}
ReLUOp represents a ReLU (Rectified Linear Unit) activation: output = max(0, x).
Backward pass:
- d(ReLU(x))/dx = 1 if x > 0, else 0
The gradient is computed by creating a mask where input > 0, then multiplying the output gradient by this mask.
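A minimal sketch of this mask on plain slices (hypothetical helper):

// reluBackward passes the gradient through where the forward input was
// positive and zeroes it elsewhere.
func reluBackward(input, outputGrad []float32) []float32 {
    grad := make([]float32, len(input))
    for i, x := range input {
        if x > 0 {
            grad[i] = outputGrad[i]
        } // else grad[i] stays 0
    }
    return grad
}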
func (*ReLUOp) Backward ¶
func (op *ReLUOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes input gradient for ReLU.
type ReshapeOp ¶
type ReshapeOp struct {
// contains filtered or unexported fields
}
ReshapeOp records a reshape operation for autodiff.
Forward: output = Reshape(input, newShape)
Backward:
- d_input: Reshape(d_output, input.shape())
Reshape backward is simple: reshape the output gradient back to the original input shape.
func NewReshapeOp ¶
NewReshapeOp creates a new Reshape operation.
func (*ReshapeOp) Backward ¶
func (op *ReshapeOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes gradients for Reshape.
The gradient of reshape is simple: reshape the output gradient back to the input shape. No actual computation needed.
type RsqrtOp ¶ added in v0.3.0
type RsqrtOp struct {
// contains filtered or unexported fields
}
RsqrtOp represents the reciprocal square root operation: y = 1/sqrt(x).
Backward pass:
- d(1/sqrt(x))/dx = -0.5 * x^(-3/2) = -0.5 * (1/sqrt(x))^3 = -0.5 * y^3
- grad_input = grad_output * (-0.5) * output^3
func NewRsqrtOp ¶ added in v0.3.0
NewRsqrtOp creates a new RsqrtOp.
func (*RsqrtOp) Backward ¶ added in v0.3.0
func (op *RsqrtOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes input gradient for rsqrt.
Since d(1/sqrt(x))/dx = -0.5 * y^3, where y = 1/sqrt(x): grad_input = grad_output * (-0.5) * output^3.
type SiLUOp ¶ added in v0.3.0
type SiLUOp struct {
// contains filtered or unexported fields
}
SiLUOp represents the SiLU (Swish) activation operation: y = x * sigmoid(x).
Also known as Swish activation, widely used in modern transformers (LLaMA, Mistral, GPT-Neo).
func (*SiLUOp) Backward ¶ added in v0.3.0
func (op *SiLUOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes the gradient for SiLU.
For y = x * sigmoid(x):
dy/dx = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
= sigmoid(x) * (1 + x * (1 - sigmoid(x)))
We compute the gradient directly for numerical accuracy.
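A sketch of this derivative on plain slices (hypothetical helper; requires import "math"):

// siluBackward computes grad_input = grad_output * (s + x*s*(1-s)),
// where s = sigmoid(x).
func siluBackward(input, outputGrad []float32) []float32 {
    grad := make([]float32, len(input))
    for i, x := range input {
        s := float32(1.0 / (1.0 + math.Exp(float64(-x)))) // sigmoid(x)
        grad[i] = outputGrad[i] * (s + x*s*(1-s))
    }
    return grad
}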
type SigmoidOp ¶
type SigmoidOp struct {
// contains filtered or unexported fields
}
SigmoidOp represents the sigmoid activation operation: σ(x) = 1 / (1 + exp(-x)).
func NewSigmoidOp ¶
NewSigmoidOp creates a new sigmoid operation.
func (*SigmoidOp) Backward ¶
func (op *SigmoidOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes the gradient for sigmoid.
For σ(x) = 1 / (1 + exp(-x)): dσ/dx = σ(x) * (1 - σ(x))
Since we have the output σ(x) already computed, we can use it: grad_input = grad_output * output * (1 - output).
type SinOp ¶ added in v0.3.0
type SinOp struct {
// contains filtered or unexported fields
}
SinOp represents the sine operation: y = sin(x).
Backward pass:
- d(sin(x))/dx = cos(x)
- grad_input = grad_output * cos(input)
func (*SinOp) Backward ¶ added in v0.3.0
Backward computes input gradient for sin.
Since d(sin(x))/dx = cos(x): grad_input = grad_output * cos(input).
type SoftmaxOp ¶
type SoftmaxOp struct {
// contains filtered or unexported fields
}
SoftmaxOp represents the softmax operation along a specified dimension.
Forward (for each slice along dim):
softmax(x)_i = exp(x_i - max(x)) / Σ_j exp(x_j - max(x))
The max-shifting ensures numerical stability (prevents overflow).
Backward:
The Jacobian of softmax is:
∂softmax_i/∂x_j = softmax_i * (δ_ij - softmax_j)
Chain rule gives:
∂L/∂x_j = Σ_i (∂L/∂softmax_i) * softmax_i * (δ_ij - softmax_j)
= softmax_j * (∂L/∂softmax_j - Σ_i (∂L/∂softmax_i * softmax_i))
Simplified formula:
∂L/∂x = y * (upstream_grad - sum(y * upstream_grad, dim=axis, keepdim=True))
Supports:
- N-dimensional tensors (2D, 3D, 4D, etc.)
- Softmax applied along any dimension (positive or negative indexing)
func NewSoftmaxOp ¶
NewSoftmaxOp creates a new softmax operation.
func (*SoftmaxOp) Backward ¶
func (op *SoftmaxOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes the gradient with respect to input.
Uses the simplified formula:
∂L/∂x = y * (upstream_grad - sum(y * upstream_grad, dim=axis, keepdim=True))
Where:
- y is the softmax output (op.output)
- upstream_grad is the gradient from the next layer
- sum is performed along the same dimension as softmax (op.dim)
This formula works for N-dimensional tensors (2D, 3D, 4D, etc.).
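A sketch of the simplified formula for a single slice along the softmax dimension (hypothetical helper, not the package implementation):

// softmaxBackward computes y * (grad - sum(y * grad)) for one slice,
// where y is the cached softmax output.
func softmaxBackward(y, grad []float32) []float32 {
    var dot float32
    for i := range y {
        dot += y[i] * grad[i]
    }
    out := make([]float32, len(y))
    for i := range y {
        out[i] = y[i] * (grad[i] - dot)
    }
    return out
}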
type SqrtOp ¶ added in v0.3.0
type SqrtOp struct {
// contains filtered or unexported fields
}
SqrtOp represents the square root operation: y = sqrt(x).
Backward pass:
- d(sqrt(x))/dx = 1 / (2 * sqrt(x)) = 0.5 / y
- grad_input = grad_output * 0.5 / output
func (*SqrtOp) Backward ¶ added in v0.3.0
func (op *SqrtOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes input gradient for sqrt.
Since d(sqrt(x))/dx = 0.5 / sqrt(x), and we have sqrt(x) as output: grad_input = grad_output * 0.5 / output.
type SubOp ¶
type SubOp struct {
// contains filtered or unexported fields
}
SubOp represents an element-wise subtraction operation: output = a - b.
Backward pass:
- d(a-b)/da = 1, so grad_a = outputGrad
- d(a-b)/db = -1, so grad_b = -outputGrad
type SumDimOp ¶ added in v0.3.0
type SumDimOp struct {
// contains filtered or unexported fields
}
SumDimOp represents a reduction sum operation along a dimension: output = sum(x, dim).
Forward:
y = sum(x, dim, keepDim)
Backward:
grad_x = broadcast(grad_y, x.shape)
If keepDim=false, we need to unsqueeze grad_y first to match broadcasting requirements.
func NewSumDimOp ¶ added in v0.3.0
NewSumDimOp creates a new SumDimOp.
func (*SumDimOp) Backward ¶ added in v0.3.0
func (op *SumDimOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes input gradients for sum reduction.
The gradient flows by broadcasting grad_output to match input shape. Since sum just accumulates values, each input element contributes 1.0 to the output, so the gradient is simply broadcast back.
type TanhOp ¶
type TanhOp struct {
// contains filtered or unexported fields
}
TanhOp represents the hyperbolic tangent activation: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).
func (*TanhOp) Backward ¶
func (op *TanhOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes the gradient for tanh.
For tanh(x): d(tanh(x))/dx = 1 - tanh²(x)
Since we have the output tanh(x) already computed: grad_input = grad_output * (1 - output²).
type TransposeOp ¶
type TransposeOp struct {
// contains filtered or unexported fields
}
TransposeOp represents a transpose operation.
Forward:
output = transpose(input, axes)
Backward:
∂L/∂input = transpose(∂L/∂output, inverse_axes)
The gradient of transpose is transpose with inverse axes.
func NewTransposeOp ¶
func NewTransposeOp(input, output *tensor.RawTensor, axes []int) *TransposeOp
NewTransposeOp creates a new TransposeOp.
func (*TransposeOp) Backward ¶
func (op *TransposeOp) Backward(outputGrad *tensor.RawTensor, backend tensor.Backend) []*tensor.RawTensor
Backward computes input gradient for transpose.
The gradient of transpose is transpose with inverted axes. For example, if forward uses axes [1, 0] (swap), then backward also uses [1, 0].
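A sketch of how the inverse permutation can be derived (hypothetical helper; assumes the convention that output axis i takes its data from input axis axes[i]):

// inverseAxes builds the permutation used for the backward transpose:
// since the forward pass moved input axis axes[i] to output position i,
// the inverse satisfies inv[axes[i]] = i.
func inverseAxes(axes []int) []int {
    inv := make([]int, len(axes))
    for i, a := range axes {
        inv[a] = i
    }
    return inv
}

For the swap example above, inverseAxes([]int{1, 0}) is again [1, 0].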
func (*TransposeOp) Inputs ¶
func (op *TransposeOp) Inputs() []*tensor.RawTensor
Inputs returns the input tensors.
func (*TransposeOp) Output ¶
func (op *TransposeOp) Output() *tensor.RawTensor
Output returns the output tensor.