cpu

package
v0.7.7 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 6, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package cpu implements the CPU backend with SIMD optimizations and BLAS integration.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CPUBackend

type CPUBackend struct {
	// contains filtered or unexported fields
}

CPUBackend implements tensor operations on CPU with optional SIMD and BLAS optimizations.

func New

func New() *CPUBackend

New creates a new CPU backend.

func (*CPUBackend) Add

func (cpu *CPUBackend) Add(a, b *tensor.RawTensor) *tensor.RawTensor

Add performs element-wise addition with NumPy-style broadcasting.

func (*CPUBackend) AddScalar added in v0.3.0

func (cpu *CPUBackend) AddScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

AddScalar adds a scalar value to each element of the tensor.

func (*CPUBackend) And added in v0.3.0

func (cpu *CPUBackend) And(a, b *tensor.RawTensor) *tensor.RawTensor

And computes element-wise logical AND.

func (*CPUBackend) Argmax added in v0.3.0

func (cpu *CPUBackend) Argmax(x *tensor.RawTensor, dim int) *tensor.RawTensor

Argmax returns the index of the maximum value along the specified dimension.

func (*CPUBackend) BatchMatMul added in v0.4.0

func (cpu *CPUBackend) BatchMatMul(a, b *tensor.RawTensor) *tensor.RawTensor

BatchMatMul performs batched matrix multiplication. Supports 3D and 4D tensors with batch dimensions.

For 3D: [B, M, K] @ [B, K, N] -> [B, M, N]
For 4D: [B, H, M, K] @ [B, H, K, N] -> [B, H, M, N]

The last two dimensions are the matrix dimensions; all leading (batch) dimensions must match.
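The Go sketch below is a naive reference for the 3D case on contiguous row-major float32 buffers. It runs one independent [M, K] @ [K, N] product per batch entry. It assumes contiguous storage, so a 4D [B, H, M, K] input could be handled the same way by treating B×H as a single batch dimension. `batchMatMul` is a hypothetical helper and does not describe how this package implements the operation.

```go
package main

import "fmt"

// batchMatMul multiplies batch pairs of matrices stored contiguously in
// row-major order: a is [batch, m, k], b is [batch, k, n], result is [batch, m, n].
// Naive reference loop for illustration only.
func batchMatMul(a, b []float32, batch, m, k, n int) []float32 {
	out := make([]float32, batch*m*n)
	for bi := 0; bi < batch; bi++ {
		// Offsets of this batch entry's matrices inside the flat buffers.
		aOff, bOff, oOff := bi*m*k, bi*k*n, bi*m*n
		for i := 0; i < m; i++ {
			for j := 0; j < n; j++ {
				var sum float32
				for p := 0; p < k; p++ {
					sum += a[aOff+i*k+p] * b[bOff+p*n+j]
				}
				out[oOff+i*n+j] = sum
			}
		}
	}
	return out
}

func main() {
	// Two batch entries, each [1, 2] @ [2, 1].
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(batchMatMul(a, b, 2, 1, 2, 1)) // [17 53]
}
```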

func (*CPUBackend) Cast added in v0.3.0

func (cpu *CPUBackend) Cast(x *tensor.RawTensor, dtype tensor.DataType) *tensor.RawTensor

Cast converts the tensor to a different data type.

func (*CPUBackend) Cat added in v0.3.0

func (cpu *CPUBackend) Cat(tensors []*tensor.RawTensor, dim int) *tensor.RawTensor

Cat concatenates tensors along the specified dimension.

All tensors must have the same shape except along the concatenation dimension. Supports negative dim indexing (-1 = last dimension).

Example:

a := tensor.Randn[float32]([]int{2, 3}, backend)
b := tensor.Randn[float32]([]int{2, 5}, backend)
c := backend.Cat([]*tensor.RawTensor{a, b}, 1) // Shape: [2, 8]

func (*CPUBackend) Chunk added in v0.3.0

func (cpu *CPUBackend) Chunk(x *tensor.RawTensor, n, dim int) []*tensor.RawTensor

Chunk splits the tensor into n equal parts along the specified dimension.

The dimension size must be divisible by n. Supports negative dim indexing (-1 = last dimension).

Example:

x := tensor.Randn[float32]([]int{2, 3, 6}, backend)
parts := backend.Chunk(x, 3, -1) // 3 tensors of shape [2, 3, 2]

func (*CPUBackend) Conv2D

func (cpu *CPUBackend) Conv2D(input, kernel *tensor.RawTensor, stride, padding int) *tensor.RawTensor

Conv2D performs 2D convolution using the im2col algorithm.

Input shape: [batch, in_channels, height, width]
Kernel shape: [out_channels, in_channels, kernel_h, kernel_w]
Output shape: [batch, out_channels, out_h, out_w]

Parameters:

  • input: Input tensor [N, C_in, H, W]
  • kernel: Convolution kernel [C_out, C_in, K_h, K_w]
  • stride: Stride of the convolution window (typically 1)
  • padding: Padding applied to each spatial border (typically 0)

Algorithm: Im2col

  1. Transform input patches into columns (im2col)
  2. Reshape kernel into matrix
  3. Perform matrix multiplication
  4. Reshape output to [N, C_out, H_out, W_out]

Im2col is efficient because:

  • Converts convolution to matmul (highly optimized)
  • Cache-friendly memory access
  • Reuses existing matmul code

Reference: "High Performance Convolutional Neural Networks for Document Processing" (Chellapilla et al., 2006).

func (*CPUBackend) Conv2DInputBackward added in v0.7.1

func (cpu *CPUBackend) Conv2DInputBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor

Conv2DInputBackward computes the gradient with respect to the input using a transposed convolution.

Algorithm: Transposed convolution (full convolution).

  • For each input position (n, c_in, h, w), sum the contributions from all output positions that used this input.
  • Each contribution is grad[n, c_out, h_out, w_out] * kernel[c_out, c_in, kh, kw], where h = h_out * stride - padding + kh and w = w_out * stride - padding + kw.

References:

  • Burn framework: crates/burn-autodiff/src/ops/module.rs (conv2d_x_backward)
  • "A guide to convolution arithmetic for deep learning" (Dumoulin & Visin, 2016)

func (*CPUBackend) Conv2DKernelBackward added in v0.7.1

func (cpu *CPUBackend) Conv2DKernelBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor

Conv2DKernelBackward computes the gradient with respect to the kernel.

Algorithm: Convolution of input with grad.

  • For each kernel position (c_out, c_in, kh, kw), sum over all batch samples n and output positions (h_out, w_out).
  • Each contribution is input[n, c_in, h, w] * grad[n, c_out, h_out, w_out], where h = h_out * stride - padding + kh and w = w_out * stride - padding + kw; positions that fall in the padding contribute nothing.

References:

  • Burn framework: crates/burn-autodiff/src/ops/module.rs (conv2d_weight_backward)

func (*CPUBackend) Cos added in v0.3.0

func (cpu *CPUBackend) Cos(x *tensor.RawTensor) *tensor.RawTensor

Cos computes element-wise cosine: cos(x).

func (*CPUBackend) Device

func (cpu *CPUBackend) Device() tensor.Device

Device returns the compute device.

func (*CPUBackend) Div

func (cpu *CPUBackend) Div(a, b *tensor.RawTensor) *tensor.RawTensor

Div performs element-wise division with broadcasting.

func (*CPUBackend) DivScalar added in v0.3.0

func (cpu *CPUBackend) DivScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

DivScalar divides each element of the tensor by a scalar value.

func (*CPUBackend) Embedding added in v0.5.1

func (cpu *CPUBackend) Embedding(weight, indices *tensor.RawTensor) *tensor.RawTensor

Embedding performs an embedding lookup.

  • weight: [numEmbeddings, embeddingDim]
  • indices: int32 indices of any shape
  • output: [...indices.shape, embeddingDim]

Similar to PyTorch's F.embedding or nn.Embedding.

func (*CPUBackend) Equal added in v0.3.0

func (cpu *CPUBackend) Equal(a, b *tensor.RawTensor) *tensor.RawTensor

Equal returns a == b element-wise.

func (*CPUBackend) Exp added in v0.3.0

func (cpu *CPUBackend) Exp(x *tensor.RawTensor) *tensor.RawTensor

Exp computes element-wise exponential: exp(x).

func (*CPUBackend) Expand added in v0.3.0

func (cpu *CPUBackend) Expand(x *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor

Expand broadcasts the tensor to a new shape.

func (*CPUBackend) Gather added in v0.3.0

func (cpu *CPUBackend) Gather(x *tensor.RawTensor, dim int, index *tensor.RawTensor) *tensor.RawTensor

Gather selects elements along dim using an index tensor. Similar to torch.gather(input, dim, index).

The index tensor must have dtype int32, and its shape must match the input shape except at the gather dimension, where it can differ.

Example:

input: [3, 4, 5] with values
index: [3, 4, 2] (int32 indices)
dim: 2
output: [3, 4, 2] where output[i,j,k] = input[i,j,index[i,j,k]]

func (*CPUBackend) Greater added in v0.3.0

func (cpu *CPUBackend) Greater(a, b *tensor.RawTensor) *tensor.RawTensor

Greater returns a > b element-wise.

func (*CPUBackend) GreaterEqual added in v0.3.0

func (cpu *CPUBackend) GreaterEqual(a, b *tensor.RawTensor) *tensor.RawTensor

GreaterEqual returns a >= b element-wise.

func (*CPUBackend) Log added in v0.3.0

func (cpu *CPUBackend) Log(x *tensor.RawTensor) *tensor.RawTensor

Log computes element-wise natural logarithm: ln(x).

func (*CPUBackend) Lower added in v0.3.0

func (cpu *CPUBackend) Lower(a, b *tensor.RawTensor) *tensor.RawTensor

Lower returns a < b element-wise.

func (*CPUBackend) LowerEqual added in v0.3.0

func (cpu *CPUBackend) LowerEqual(a, b *tensor.RawTensor) *tensor.RawTensor

LowerEqual returns a <= b element-wise.

func (*CPUBackend) MatMul

func (cpu *CPUBackend) MatMul(a, b *tensor.RawTensor) *tensor.RawTensor

MatMul performs matrix multiplication. For 2D tensors: (M, K) @ (K, N) -> (M, N).

Uses a naive O(n³) implementation for Phase 1. TODO: Integrate with gonum/blas for better performance in Phase 2.

func (*CPUBackend) MaxPool2D

func (cpu *CPUBackend) MaxPool2D(input *tensor.RawTensor, kernelSize, stride int) *tensor.RawTensor

MaxPool2D performs 2D max pooling.

Max pooling reduces spatial dimensions by taking the maximum value in each pooling window. Unlike Conv2D, MaxPool2D has no learnable parameters.

Input shape: [batch, channels, height, width]
Output shape: [batch, channels, out_height, out_width]

Where:

out_height = (height - kernelSize) / stride + 1
out_width = (width - kernelSize) / stride + 1

Algorithm:

  1. For each batch and channel:
  2. Slide a kernelSize x kernelSize window with the given stride
  3. Take the maximum value in each window
  4. Write that maximum to the output

Example (2x2 pool, stride=2):

Input: [[1,2,3,4],    Output: [[6,8],
        [5,6,7,8],             [14,16]]
        [9,10,11,12],
        [13,14,15,16]]

func (*CPUBackend) MaxPool2DBackward added in v0.7.1

func (cpu *CPUBackend) MaxPool2DBackward(input, grad *tensor.RawTensor, maxIndices []int, kernelSize, stride int) *tensor.RawTensor

MaxPool2DBackward computes the gradient with respect to the input of MaxPool2D.

Algorithm: Route gradients to max positions.

  • Gradients flow only to the positions that held the maximum value in the forward pass
  • For each output position, exactly one input position receives its gradient
  • All other positions in that pooling window receive zero gradient from it

Example (2x2 pool, stride=2):

Input:  [[1, 2],  Output: [4]  Input Grad: [[0, 0],
         [3, 4]]                             [0, grad]]

References:

  • Burn framework: crates/burn-autodiff/src/ops/module.rs (max_pool2d_backward)
  • CS231n: Backprop for pooling layers

func (*CPUBackend) MeanDim added in v0.3.0

func (cpu *CPUBackend) MeanDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor

MeanDim computes the mean of tensor elements along the specified dimension.

Parameters:

  • dim: dimension to reduce (supports negative indexing: -1 = last dim)
  • keepDim: if true, keep the reduced dimension with size 1; if false, remove it

Example:

x := tensor.Randn[float32]([]int{2, 3, 4}, backend)
y := backend.MeanDim(x, -1, true)   // shape: [2, 3, 1]
z := backend.MeanDim(x, -1, false)  // shape: [2, 3]

func (*CPUBackend) Mul

func (cpu *CPUBackend) Mul(a, b *tensor.RawTensor) *tensor.RawTensor

Mul performs element-wise multiplication with broadcasting.

func (*CPUBackend) MulScalar added in v0.3.0

func (cpu *CPUBackend) MulScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

MulScalar multiplies each element of the tensor by a scalar value.

func (*CPUBackend) Name

func (cpu *CPUBackend) Name() string

Name returns the backend name.

func (*CPUBackend) Not added in v0.3.0

func (cpu *CPUBackend) Not(x *tensor.RawTensor) *tensor.RawTensor

Not computes element-wise logical NOT.

func (*CPUBackend) NotEqual added in v0.3.0

func (cpu *CPUBackend) NotEqual(a, b *tensor.RawTensor) *tensor.RawTensor

NotEqual returns a != b element-wise.

func (*CPUBackend) Or added in v0.3.0

func (cpu *CPUBackend) Or(a, b *tensor.RawTensor) *tensor.RawTensor

Or computes element-wise logical OR.

func (*CPUBackend) ReLU added in v0.2.0

func (cpu *CPUBackend) ReLU(x *tensor.RawTensor) *tensor.RawTensor

ReLU applies ReLU activation: max(0, x).

func (*CPUBackend) Reshape

func (cpu *CPUBackend) Reshape(t *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor

Reshape returns a tensor with the same data but different shape.

func (*CPUBackend) Rsqrt added in v0.3.0

func (cpu *CPUBackend) Rsqrt(x *tensor.RawTensor) *tensor.RawTensor

Rsqrt computes element-wise reciprocal square root: 1/sqrt(x). This is optimized for use in normalization layers (RMSNorm, LayerNorm).

func (*CPUBackend) Sin added in v0.3.0

func (cpu *CPUBackend) Sin(x *tensor.RawTensor) *tensor.RawTensor

Sin computes element-wise sine: sin(x).

func (*CPUBackend) Softmax added in v0.2.0

func (cpu *CPUBackend) Softmax(x *tensor.RawTensor, dim int) *tensor.RawTensor

Softmax computes softmax along the specified dimension: Softmax(x_i) = exp(x_i) / sum_j exp(x_j), where j ranges over all positions along that dimension.

func (*CPUBackend) Sqrt added in v0.3.0

func (cpu *CPUBackend) Sqrt(x *tensor.RawTensor) *tensor.RawTensor

Sqrt computes element-wise square root: sqrt(x).

func (*CPUBackend) Squeeze added in v0.3.0

func (cpu *CPUBackend) Squeeze(x *tensor.RawTensor, dim int) *tensor.RawTensor

Squeeze removes a dimension of size 1 at the specified position.

Panics if the dimension size is not 1. Supports negative dim indexing. This is a view operation (reshape).

Example:

x := tensor.Randn[float32]([]int{2, 1, 3}, backend)
y := backend.Squeeze(x, 1)  // Shape: [2, 3]

func (*CPUBackend) Sub

func (cpu *CPUBackend) Sub(a, b *tensor.RawTensor) *tensor.RawTensor

Sub performs element-wise subtraction with broadcasting.

func (*CPUBackend) SubScalar added in v0.3.0

func (cpu *CPUBackend) SubScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor

SubScalar subtracts a scalar value from each element of the tensor.

func (*CPUBackend) Sum added in v0.3.0

func (cpu *CPUBackend) Sum(x *tensor.RawTensor) *tensor.RawTensor

Sum computes the total sum of all elements in the tensor (scalar result).

func (*CPUBackend) SumDim added in v0.3.0

func (cpu *CPUBackend) SumDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor

SumDim sums tensor elements along the specified dimension.

Parameters:

  • dim: dimension to reduce (supports negative indexing: -1 = last dim)
  • keepDim: if true, keep the reduced dimension with size 1; if false, remove it

Example:

x := tensor.Randn[float32]([]int{2, 3, 4}, backend)
y := backend.SumDim(x, -1, true)   // shape: [2, 3, 1]
z := backend.SumDim(x, -1, false)  // shape: [2, 3]

func (*CPUBackend) Transpose

func (cpu *CPUBackend) Transpose(t *tensor.RawTensor, axes ...int) *tensor.RawTensor

Transpose transposes the tensor by permuting its dimensions.

func (*CPUBackend) Unsqueeze added in v0.3.0

func (cpu *CPUBackend) Unsqueeze(x *tensor.RawTensor, dim int) *tensor.RawTensor

Unsqueeze adds a dimension of size 1 at the specified position.

Supports negative dim indexing. This is a view operation (reshape).

Example:

x := tensor.Randn[float32]([]int{2, 3}, backend)
y := backend.Unsqueeze(x, 1)  // Shape: [2, 1, 3]

func (*CPUBackend) Where added in v0.3.0

func (cpu *CPUBackend) Where(condition, x, y *tensor.RawTensor) *tensor.RawTensor

Where performs conditional element selection. Similar to torch.where(condition, x, y).

Returns a tensor where each element is selected from x if condition is true, otherwise from y. All tensors must have compatible shapes (broadcasting supported).

Example:

condition: [3, 4] (bool tensor)
x: [3, 4] (float32)
y: [3, 4] (float32)
output: [3, 4] where output[i,j] = condition[i,j] ? x[i,j] : y[i,j]
