Documentation ¶
Overview ¶
Package cpu implements the CPU backend with SIMD optimizations and BLAS integration.
Index ¶
- type CPUBackend
- func (cpu *CPUBackend) Add(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) AddScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) And(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Argmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) BatchMatMul(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Cast(x *tensor.RawTensor, dtype tensor.DataType) *tensor.RawTensor
- func (cpu *CPUBackend) Cat(tensors []*tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Chunk(x *tensor.RawTensor, n, dim int) []*tensor.RawTensor
- func (cpu *CPUBackend) Conv2D(input, kernel *tensor.RawTensor, stride, padding int) *tensor.RawTensor
- func (cpu *CPUBackend) Conv2DInputBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
- func (cpu *CPUBackend) Conv2DKernelBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
- func (cpu *CPUBackend) Cos(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Device() tensor.Device
- func (cpu *CPUBackend) Div(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) DivScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) Embedding(weight, indices *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Equal(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Exp(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Expand(x *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor
- func (cpu *CPUBackend) Gather(x *tensor.RawTensor, dim int, index *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Greater(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) GreaterEqual(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Log(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Lower(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) LowerEqual(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) MatMul(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) MaxPool2D(input *tensor.RawTensor, kernelSize, stride int) *tensor.RawTensor
- func (cpu *CPUBackend) MaxPool2DBackward(input, grad *tensor.RawTensor, maxIndices []int, kernelSize, stride int) *tensor.RawTensor
- func (cpu *CPUBackend) MeanDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
- func (cpu *CPUBackend) Mul(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) MulScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) Name() string
- func (cpu *CPUBackend) Not(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) NotEqual(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Or(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) ReLU(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Reshape(t *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor
- func (cpu *CPUBackend) Rsqrt(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Sin(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Softmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Sqrt(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Squeeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Sub(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) SubScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) Sum(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) SumDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
- func (cpu *CPUBackend) Transpose(t *tensor.RawTensor, axes ...int) *tensor.RawTensor
- func (cpu *CPUBackend) Unsqueeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Where(condition, x, y *tensor.RawTensor) *tensor.RawTensor
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CPUBackend ¶
type CPUBackend struct {
// contains filtered or unexported fields
}
CPUBackend implements tensor operations on CPU with optional SIMD and BLAS optimizations.
func (*CPUBackend) Add ¶
func (cpu *CPUBackend) Add(a, b *tensor.RawTensor) *tensor.RawTensor
Add performs element-wise addition with NumPy-style broadcasting.
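For illustration, a minimal broadcasting sketch in the style of the examples below (using the tensor.Randn constructor shown elsewhere in this documentation):
a := tensor.Randn[float32]([]int{2, 3}, backend)
b := tensor.Randn[float32]([]int{1, 3}, backend) // broadcast along dim 0
c := backend.Add(a, b)                           // shape: [2, 3]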
func (*CPUBackend) AddScalar ¶ added in v0.3.0
func (cpu *CPUBackend) AddScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
AddScalar adds a scalar value to each element of the tensor.
func (*CPUBackend) And ¶ added in v0.3.0
func (cpu *CPUBackend) And(a, b *tensor.RawTensor) *tensor.RawTensor
And computes element-wise logical AND.
func (*CPUBackend) Argmax ¶ added in v0.3.0
func (cpu *CPUBackend) Argmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
Argmax returns the index of the maximum value along the specified dimension.
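A minimal usage sketch (the result dtype and exact result shape follow the implementation and are not restated here):
x := tensor.Randn[float32]([]int{2, 5}, backend)
idx := backend.Argmax(x, 1) // index of the maximum along dim 1 for each row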
func (*CPUBackend) BatchMatMul ¶ added in v0.4.0
func (cpu *CPUBackend) BatchMatMul(a, b *tensor.RawTensor) *tensor.RawTensor
BatchMatMul performs batched matrix multiplication. Supports 3D and 4D tensors with batch dimensions.
For 3D: [B, M, K] @ [B, K, N] -> [B, M, N]
For 4D: [B, H, M, K] @ [B, H, K, N] -> [B, H, M, N]
The last two dimensions are treated as matrix dimensions. All leading (batch) dimensions must match.
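A minimal 3D sketch, using the tensor.Randn constructor shown in the other examples in this documentation:
a := tensor.Randn[float32]([]int{8, 4, 16}, backend)  // [B, M, K]
b := tensor.Randn[float32]([]int{8, 16, 32}, backend) // [B, K, N]
c := backend.BatchMatMul(a, b)                        // [B, M, N] = [8, 4, 32]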
func (*CPUBackend) Cat ¶ added in v0.3.0
func (cpu *CPUBackend) Cat(tensors []*tensor.RawTensor, dim int) *tensor.RawTensor
Cat concatenates tensors along the specified dimension.
All tensors must have the same shape except along the concatenation dimension. Supports negative dim indexing (-1 = last dimension).
Example:
a := tensor.Randn[float32]([]int{2, 3}, backend)
b := tensor.Randn[float32]([]int{2, 5}, backend)
c := backend.Cat([]*tensor.RawTensor{a, b}, 1) // Shape: [2, 8]
func (*CPUBackend) Chunk ¶ added in v0.3.0
func (cpu *CPUBackend) Chunk(x *tensor.RawTensor, n, dim int) []*tensor.RawTensor
Chunk splits a tensor into n equal parts along the specified dimension.
The dimension size must be divisible by n. Supports negative dim indexing (-1 = last dimension).
Example:
x := tensor.Randn[float32]([]int{2, 3, 6}, backend)
parts := backend.Chunk(x, 3, -1) // 3 tensors of shape [2, 3, 2]
func (*CPUBackend) Conv2D ¶
func (cpu *CPUBackend) Conv2D(input, kernel *tensor.RawTensor, stride, padding int) *tensor.RawTensor
Conv2D performs 2D convolution using im2col algorithm.
Input shape:  [batch, in_channels, height, width]
Kernel shape: [out_channels, in_channels, kernel_h, kernel_w]
Output shape: [batch, out_channels, out_h, out_w]
Parameters:
- input: Input tensor [N, C_in, H, W]
- kernel: Convolution kernel [C_out, C_in, K_h, K_w]
- stride: Stride for convolution (default: 1)
- padding: Padding to apply (default: 0)
Algorithm: Im2col
- Transform input patches into columns (im2col)
- Reshape kernel into matrix
- Perform matrix multiplication
- Reshape output to [N, C_out, H_out, W_out]
Im2col is efficient because:
- Converts convolution to matmul (highly optimized)
- Cache-friendly memory access
- Reuses existing matmul code
Reference: "High Performance Convolutional Neural Networks for Document Processing" (Chellapilla et al., 2006).
func (*CPUBackend) Conv2DInputBackward ¶ added in v0.7.1
func (cpu *CPUBackend) Conv2DInputBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
Conv2DInputBackward computes gradient w.r.t. input using transposed convolution.
Algorithm: Transposed convolution (full convolution).
- For each input position (n, c_in, h, w):
- Sum contributions from all output positions that used this input
- Each contribution is: grad[n, c_out, h_out, w_out] * kernel[c_out, c_in, kh, kw]
References:
- Burn framework: crates/burn-autodiff/src/ops/module.rs (conv2d_x_backward)
- "A guide to convolution arithmetic for deep learning" (Dumoulin & Visin, 2016)
func (*CPUBackend) Conv2DKernelBackward ¶ added in v0.7.1
func (cpu *CPUBackend) Conv2DKernelBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
Conv2DKernelBackward computes gradient w.r.t. kernel.
Algorithm: Convolution of input with grad.
- For each kernel position (c_out, c_in, kh, kw):
- Sum over all batch samples and output positions
- Each contribution is: input[n, c_in, h, w] * grad[n, c_out, h_out, w_out]
- Where h = h_out * stride - padding + kh, w = w_out * stride - padding + kw
References:
- Burn framework: crates/burn-autodiff/src/ops/module.rs (conv2d_weight_backward)
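A minimal sketch covering both Conv2DInputBackward and Conv2DKernelBackward, assuming gradOut has the shape Conv2D would produce for these arguments (stride=1, padding=1, 3x3 kernel):
input := tensor.Randn[float32]([]int{1, 3, 8, 8}, backend)
kernel := tensor.Randn[float32]([]int{4, 3, 3, 3}, backend)
gradOut := tensor.Randn[float32]([]int{1, 4, 8, 8}, backend)             // same shape as the forward output
gradInput := backend.Conv2DInputBackward(input, kernel, gradOut, 1, 1)   // shape: [1, 3, 8, 8]
gradKernel := backend.Conv2DKernelBackward(input, kernel, gradOut, 1, 1) // shape: [4, 3, 3, 3]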
func (*CPUBackend) Cos ¶ added in v0.3.0
func (cpu *CPUBackend) Cos(x *tensor.RawTensor) *tensor.RawTensor
Cos computes element-wise cosine: cos(x).
func (*CPUBackend) Device ¶
func (cpu *CPUBackend) Device() tensor.Device
Device returns the compute device.
func (*CPUBackend) Div ¶
func (cpu *CPUBackend) Div(a, b *tensor.RawTensor) *tensor.RawTensor
Div performs element-wise division with broadcasting.
func (*CPUBackend) DivScalar ¶ added in v0.3.0
func (cpu *CPUBackend) DivScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
DivScalar divides each element of the tensor by a scalar value.
func (*CPUBackend) Embedding ¶ added in v0.5.1
func (cpu *CPUBackend) Embedding(weight, indices *tensor.RawTensor) *tensor.RawTensor
Embedding performs embedding lookup.
weight:  [numEmbeddings, embeddingDim]
indices: any shape of int32 indices
output:  [...indices.shape, embeddingDim]
Similar to PyTorch's F.embedding or nn.Embedding.
func (*CPUBackend) Equal ¶ added in v0.3.0
func (cpu *CPUBackend) Equal(a, b *tensor.RawTensor) *tensor.RawTensor
Equal returns a == b element-wise.
func (*CPUBackend) Exp ¶ added in v0.3.0
func (cpu *CPUBackend) Exp(x *tensor.RawTensor) *tensor.RawTensor
Exp computes element-wise exponential: exp(x).
func (*CPUBackend) Gather ¶ added in v0.3.0
func (cpu *CPUBackend) Gather(x *tensor.RawTensor, dim int, index *tensor.RawTensor) *tensor.RawTensor
Gather selects elements along dim using index tensor. Similar to torch.gather(input, dim, index).
The index tensor must have dtype int32 and its shape must match input shape except at the gather dimension, where it can differ.
Example:
input:  [3, 4, 5]
index:  [3, 4, 2] (int32 indices)
dim:    2
output: [3, 4, 2] where output[i,j,k] = input[i, j, index[i,j,k]]
func (*CPUBackend) Greater ¶ added in v0.3.0
func (cpu *CPUBackend) Greater(a, b *tensor.RawTensor) *tensor.RawTensor
Greater returns a > b element-wise.
func (*CPUBackend) GreaterEqual ¶ added in v0.3.0
func (cpu *CPUBackend) GreaterEqual(a, b *tensor.RawTensor) *tensor.RawTensor
GreaterEqual returns a >= b element-wise.
func (*CPUBackend) Log ¶ added in v0.3.0
func (cpu *CPUBackend) Log(x *tensor.RawTensor) *tensor.RawTensor
Log computes element-wise natural logarithm: ln(x).
func (*CPUBackend) Lower ¶ added in v0.3.0
func (cpu *CPUBackend) Lower(a, b *tensor.RawTensor) *tensor.RawTensor
Lower returns a < b element-wise.
func (*CPUBackend) LowerEqual ¶ added in v0.3.0
func (cpu *CPUBackend) LowerEqual(a, b *tensor.RawTensor) *tensor.RawTensor
LowerEqual returns a <= b element-wise.
func (*CPUBackend) MatMul ¶
func (cpu *CPUBackend) MatMul(a, b *tensor.RawTensor) *tensor.RawTensor
MatMul performs matrix multiplication. For 2D tensors: (M, K) @ (K, N) -> (M, N).
Uses a naive O(n³) implementation for Phase 1. TODO: Integrate with gonum/blas for better performance in Phase 2.
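A minimal 2D sketch:
a := tensor.Randn[float32]([]int{4, 8}, backend) // (M, K)
b := tensor.Randn[float32]([]int{8, 5}, backend) // (K, N)
c := backend.MatMul(a, b)                        // (M, N) = [4, 5]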
func (*CPUBackend) MaxPool2D ¶
func (cpu *CPUBackend) MaxPool2D(input *tensor.RawTensor, kernelSize, stride int) *tensor.RawTensor
MaxPool2D performs 2D max pooling.
Max pooling reduces spatial dimensions by taking the maximum value in each pooling window. Unlike Conv2D, MaxPool2D has no learnable parameters.
Input shape:  [batch, channels, height, width]
Output shape: [batch, channels, out_height, out_width]
Where:
out_height = (height - kernelSize) / stride + 1
out_width  = (width - kernelSize) / stride + 1
Algorithm:
- For each batch and channel
- Slide kernelSize x kernelSize window with given stride
- Take maximum value in each window
- Output max value
Example (2x2 pool, stride=2):
Input: [[ 1,  2,  3,  4],    Output: [[ 6,  8],
        [ 5,  6,  7,  8],             [14, 16]]
        [ 9, 10, 11, 12],
        [13, 14, 15, 16]]
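A minimal usage sketch (shapes chosen for illustration):
input := tensor.Randn[float32]([]int{1, 3, 8, 8}, backend)
out := backend.MaxPool2D(input, 2, 2) // kernelSize=2, stride=2 -> [1, 3, 4, 4]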
func (*CPUBackend) MaxPool2DBackward ¶ added in v0.7.1
func (cpu *CPUBackend) MaxPool2DBackward(input, grad *tensor.RawTensor, maxIndices []int, kernelSize, stride int) *tensor.RawTensor
MaxPool2DBackward computes gradient w.r.t. input for MaxPool2D.
Algorithm: Route gradients to max positions.
- Gradients flow only to positions that had the max value in forward pass
- For each output position, only ONE input position receives gradient
- All other positions in pooling window receive zero gradient
Example (2x2 pool, stride=2):
Input: [[1, 2],    Output: [4]    Input Grad: [[0,    0],
        [3, 4]]                                [0, grad]]
References:
- Burn framework: crates/burn-autodiff/src/ops/module.rs (max_pool2d_backward)
- CS231n: Backprop for pooling layers
func (*CPUBackend) MeanDim ¶ added in v0.3.0
func (cpu *CPUBackend) MeanDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
MeanDim computes the mean of tensor elements along the specified dimension.
Parameters:
- dim: dimension to reduce (supports negative indexing: -1 = last dim)
- keepDim: if true, keep the reduced dimension with size 1; if false, remove it
Example:
x := tensor.Randn[float32]([]int{2, 3, 4}, backend)
y := backend.MeanDim(x, -1, true) // shape: [2, 3, 1]
z := backend.MeanDim(x, -1, false) // shape: [2, 3]
func (*CPUBackend) Mul ¶
func (cpu *CPUBackend) Mul(a, b *tensor.RawTensor) *tensor.RawTensor
Mul performs element-wise multiplication with broadcasting.
func (*CPUBackend) MulScalar ¶ added in v0.3.0
func (cpu *CPUBackend) MulScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
MulScalar multiplies each element of the tensor by a scalar value.
func (*CPUBackend) Not ¶ added in v0.3.0
func (cpu *CPUBackend) Not(x *tensor.RawTensor) *tensor.RawTensor
Not computes element-wise logical NOT.
func (*CPUBackend) NotEqual ¶ added in v0.3.0
func (cpu *CPUBackend) NotEqual(a, b *tensor.RawTensor) *tensor.RawTensor
NotEqual returns a != b element-wise.
func (*CPUBackend) Or ¶ added in v0.3.0
func (cpu *CPUBackend) Or(a, b *tensor.RawTensor) *tensor.RawTensor
Or computes element-wise logical OR.
func (*CPUBackend) ReLU ¶ added in v0.2.0
func (cpu *CPUBackend) ReLU(x *tensor.RawTensor) *tensor.RawTensor
ReLU applies ReLU activation: max(0, x).
func (*CPUBackend) Rsqrt ¶ added in v0.3.0
func (cpu *CPUBackend) Rsqrt(x *tensor.RawTensor) *tensor.RawTensor
Rsqrt computes element-wise reciprocal square root: 1/sqrt(x). This is optimized for use in normalization layers (RMSNorm, LayerNorm).
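For illustration, a minimal RMSNorm-style sketch composed from other methods in this package (the epsilon value is arbitrary and the learnable scale is omitted):
x := tensor.Randn[float32]([]int{2, 8}, backend)
ms := backend.MeanDim(backend.Mul(x, x), -1, true)         // mean of squares, shape: [2, 1]
inv := backend.Rsqrt(backend.AddScalar(ms, float32(1e-6))) // 1/sqrt(ms + eps)
y := backend.Mul(x, inv)                                   // broadcast over the last dimension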
func (*CPUBackend) Sin ¶ added in v0.3.0
func (cpu *CPUBackend) Sin(x *tensor.RawTensor) *tensor.RawTensor
Sin computes element-wise sine: sin(x).
func (*CPUBackend) Softmax ¶ added in v0.2.0
func (cpu *CPUBackend) Softmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
Softmax computes softmax along the specified dimension: Softmax(x_i) = exp(x_i) / sum(exp(x_j)) for all j in that dimension.
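A minimal usage sketch:
logits := tensor.Randn[float32]([]int{2, 10}, backend)
probs := backend.Softmax(logits, 1) // each row of probs sums to 1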
func (*CPUBackend) Sqrt ¶ added in v0.3.0
func (cpu *CPUBackend) Sqrt(x *tensor.RawTensor) *tensor.RawTensor
Sqrt computes element-wise square root: sqrt(x).
func (*CPUBackend) Squeeze ¶ added in v0.3.0
func (cpu *CPUBackend) Squeeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
Squeeze removes a dimension of size 1 at the specified position.
Panics if the dimension size is not 1. Supports negative dim indexing. This is a view operation (reshape).
Example:
x := tensor.Randn[float32]([]int{2, 1, 3}, backend)
y := backend.Squeeze(x, 1) // Shape: [2, 3]
func (*CPUBackend) Sub ¶
func (cpu *CPUBackend) Sub(a, b *tensor.RawTensor) *tensor.RawTensor
Sub performs element-wise subtraction with broadcasting.
func (*CPUBackend) SubScalar ¶ added in v0.3.0
func (cpu *CPUBackend) SubScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
SubScalar subtracts a scalar value from each element of the tensor.
func (*CPUBackend) Sum ¶ added in v0.3.0
func (cpu *CPUBackend) Sum(x *tensor.RawTensor) *tensor.RawTensor
Sum computes the total sum of all elements in the tensor (scalar result).
func (*CPUBackend) SumDim ¶ added in v0.3.0
func (cpu *CPUBackend) SumDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
SumDim sums tensor elements along the specified dimension.
Parameters:
- dim: dimension to reduce (supports negative indexing: -1 = last dim)
- keepDim: if true, keep the reduced dimension with size 1; if false, remove it
Example:
x := tensor.Randn[float32]([]int{2, 3, 4}, backend)
y := backend.SumDim(x, -1, true) // shape: [2, 3, 1]
z := backend.SumDim(x, -1, false) // shape: [2, 3]
func (*CPUBackend) Unsqueeze ¶ added in v0.3.0
func (cpu *CPUBackend) Unsqueeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
Unsqueeze adds a dimension of size 1 at the specified position.
Supports negative dim indexing. This is a view operation (reshape).
Example:
x := tensor.Randn[float32]([]int{2, 3}, backend)
y := backend.Unsqueeze(x, 1) // Shape: [2, 1, 3]
func (*CPUBackend) Where ¶ added in v0.3.0
func (cpu *CPUBackend) Where(condition, x, y *tensor.RawTensor) *tensor.RawTensor
Where performs conditional element selection. Similar to torch.where(condition, x, y).
Returns a tensor where each element is selected from x if condition is true, otherwise from y. All tensors must have compatible shapes (broadcasting supported).
Example:
condition: [3, 4] (bool tensor)
x:         [3, 4] (float32)
y:         [3, 4] (float32)
output:    [3, 4] where output[i,j] = condition[i,j] ? x[i,j] : y[i,j]
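A minimal sketch that builds the condition mask with Greater from this package (computing an element-wise maximum of a and b):
a := tensor.Randn[float32]([]int{3, 4}, backend)
b := tensor.Randn[float32]([]int{3, 4}, backend)
mask := backend.Greater(a, b)    // condition: a > b
out := backend.Where(mask, a, b) // element-wise max(a, b)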