Documentation ¶
Overview ¶
Package cpu implements the CPU backend with SIMD optimizations and BLAS integration.
Index ¶
- type CPUBackend
- func (cpu *CPUBackend) Add(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) AddScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) And(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Argmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) BatchMatMul(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Cast(x *tensor.RawTensor, dtype tensor.DataType) *tensor.RawTensor
- func (cpu *CPUBackend) Cat(tensors []*tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Chunk(x *tensor.RawTensor, n, dim int) []*tensor.RawTensor
- func (cpu *CPUBackend) Conv2D(input, kernel *tensor.RawTensor, stride, padding int) *tensor.RawTensor
- func (cpu *CPUBackend) Conv2DInputBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
- func (cpu *CPUBackend) Conv2DKernelBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
- func (cpu *CPUBackend) Cos(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Device() tensor.Device
- func (cpu *CPUBackend) Div(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) DivScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) Embedding(weight, indices *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Equal(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Exp(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Expand(x *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor
- func (cpu *CPUBackend) Gather(x *tensor.RawTensor, dim int, index *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Greater(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) GreaterEqual(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Log(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Lower(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) LowerEqual(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) MatMul(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) MaxPool2D(input *tensor.RawTensor, kernelSize, stride int) *tensor.RawTensor
- func (cpu *CPUBackend) MaxPool2DBackward(input, grad *tensor.RawTensor, maxIndices []int, kernelSize, stride int) *tensor.RawTensor
- func (cpu *CPUBackend) MeanDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
- func (cpu *CPUBackend) Mul(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) MulScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) Name() string
- func (cpu *CPUBackend) Not(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) NotEqual(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Or(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) ReLU(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Reshape(t *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor
- func (cpu *CPUBackend) Rsqrt(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Sin(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Softmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Sqrt(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) Squeeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Sub(a, b *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) SubScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
- func (cpu *CPUBackend) Sum(x *tensor.RawTensor) *tensor.RawTensor
- func (cpu *CPUBackend) SumDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
- func (cpu *CPUBackend) Transpose(t *tensor.RawTensor, axes ...int) *tensor.RawTensor
- func (cpu *CPUBackend) Unsqueeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
- func (cpu *CPUBackend) Where(condition, x, y *tensor.RawTensor) *tensor.RawTensor
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CPUBackend ¶
type CPUBackend struct {
// contains filtered or unexported fields
}
CPUBackend implements tensor operations on CPU with optional SIMD and BLAS optimizations.
func (*CPUBackend) Add ¶
func (cpu *CPUBackend) Add(a, b *tensor.RawTensor) *tensor.RawTensor
Add performs element-wise addition with NumPy-style broadcasting.
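For illustration, a minimal broadcasting sketch in the style of the examples below (using the tensor.Randn constructor shown elsewhere in this documentation):
a := tensor.Randn[float32]([]int{2, 3}, backend)
b := tensor.Randn[float32]([]int{1, 3}, backend) // broadcast along dim 0
c := backend.Add(a, b)                           // shape: [2, 3]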
func (*CPUBackend) AddScalar ¶ added in v0.3.0
func (cpu *CPUBackend) AddScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
AddScalar adds a scalar value to each element of the tensor.
func (*CPUBackend) And ¶ added in v0.3.0
func (cpu *CPUBackend) And(a, b *tensor.RawTensor) *tensor.RawTensor
And computes element-wise logical AND.
func (*CPUBackend) Argmax ¶ added in v0.3.0
func (cpu *CPUBackend) Argmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
Argmax returns the index of the maximum value along the specified dimension.
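A minimal usage sketch (the result dtype and exact result shape follow the implementation and are not restated here):
x := tensor.Randn[float32]([]int{2, 5}, backend)
idx := backend.Argmax(x, 1) // index of the maximum along dim 1 for each row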
func (*CPUBackend) BatchMatMul ¶ added in v0.4.0
func (cpu *CPUBackend) BatchMatMul(a, b *tensor.RawTensor) *tensor.RawTensor
BatchMatMul performs batched matrix multiplication. Supports 3D and 4D tensors with batch dimensions.
For 3D: [B, M, K] @ [B, K, N] -> [B, M, N]
For 4D: [B, H, M, K] @ [B, H, K, N] -> [B, H, M, N]
The last two dimensions are treated as matrix dimensions. All leading (batch) dimensions must match.
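A minimal 3D sketch, using the tensor.Randn constructor shown in the other examples in this documentation:
a := tensor.Randn[float32]([]int{8, 4, 16}, backend)  // [B, M, K]
b := tensor.Randn[float32]([]int{8, 16, 32}, backend) // [B, K, N]
c := backend.BatchMatMul(a, b)                        // [B, M, N] = [8, 4, 32]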
func (*CPUBackend) Cat ¶ added in v0.3.0
func (cpu *CPUBackend) Cat(tensors []*tensor.RawTensor, dim int) *tensor.RawTensor
Cat concatenates tensors along the specified dimension.
All tensors must have the same shape except along the concatenation dimension. Supports negative dim indexing (-1 = last dimension).
Example:
a := tensor.Randn[float32]([]int{2, 3}, backend)
b := tensor.Randn[float32]([]int{2, 5}, backend)
c := backend.Cat([]*tensor.RawTensor{a, b}, 1) // Shape: [2, 8]
func (*CPUBackend) Chunk ¶ added in v0.3.0
func (cpu *CPUBackend) Chunk(x *tensor.RawTensor, n, dim int) []*tensor.RawTensor
Chunk splits a tensor into n equal parts along the specified dimension.
The dimension size must be divisible by n. Supports negative dim indexing (-1 = last dimension).
Example:
x := tensor.Randn[float32]([]int{2, 3, 6}, backend)
parts := backend.Chunk(x, 3, -1) // 3 tensors of shape [2, 3, 2]
func (*CPUBackend) Conv2D ¶
func (cpu *CPUBackend) Conv2D(input, kernel *tensor.RawTensor, stride, padding int) *tensor.RawTensor
Conv2D performs 2D convolution using im2col algorithm.
Input shape:  [batch, in_channels, height, width]
Kernel shape: [out_channels, in_channels, kernel_h, kernel_w]
Output shape: [batch, out_channels, out_h, out_w]
Parameters:
- input: Input tensor [N, C_in, H, W]
- kernel: Convolution kernel [C_out, C_in, K_h, K_w]
- stride: Stride for convolution (default: 1)
- padding: Padding to apply (default: 0)
Algorithm: Im2col
- Transform input patches into columns (im2col)
- Reshape kernel into matrix
- Perform matrix multiplication
- Reshape output to [N, C_out, H_out, W_out]
Im2col is efficient because:
- Converts convolution to matmul (highly optimized)
- Cache-friendly memory access
- Reuses existing matmul code
Reference: "High Performance Convolutional Neural Networks for Document Processing" (Chellapilla et al., 2006).
func (*CPUBackend) Conv2DInputBackward ¶ added in v0.7.1
func (cpu *CPUBackend) Conv2DInputBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
Conv2DInputBackward computes gradient w.r.t. input using transposed convolution.
Algorithm: Transposed convolution (full convolution).
- For each input position (n, c_in, h, w):
- Sum contributions from all output positions that used this input
- Each contribution is: grad[n, c_out, h_out, w_out] * kernel[c_out, c_in, kh, kw]
References:
- Burn framework: crates/burn-autodiff/src/ops/module.rs (conv2d_x_backward)
- "A guide to convolution arithmetic for deep learning" (Dumoulin & Visin, 2016)
func (*CPUBackend) Conv2DKernelBackward ¶ added in v0.7.1
func (cpu *CPUBackend) Conv2DKernelBackward(input, kernel, grad *tensor.RawTensor, stride, padding int) *tensor.RawTensor
Conv2DKernelBackward computes gradient w.r.t. kernel.
Algorithm: Convolution of input with grad.
- For each kernel position (c_out, c_in, kh, kw):
- Sum over all batch samples and output positions
- Each contribution is: input[n, c_in, h, w] * grad[n, c_out, h_out, w_out]
- Where h = h_out * stride - padding + kh, w = w_out * stride - padding + kw
References:
- Burn framework: crates/burn-autodiff/src/ops/module.rs (conv2d_weight_backward)
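A minimal sketch covering both Conv2DInputBackward and Conv2DKernelBackward, assuming gradOut has the shape Conv2D would produce for these arguments (stride=1, padding=1, 3x3 kernel):
input := tensor.Randn[float32]([]int{1, 3, 8, 8}, backend)
kernel := tensor.Randn[float32]([]int{4, 3, 3, 3}, backend)
gradOut := tensor.Randn[float32]([]int{1, 4, 8, 8}, backend)             // same shape as the forward output
gradInput := backend.Conv2DInputBackward(input, kernel, gradOut, 1, 1)   // shape: [1, 3, 8, 8]
gradKernel := backend.Conv2DKernelBackward(input, kernel, gradOut, 1, 1) // shape: [4, 3, 3, 3]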
func (*CPUBackend) Cos ¶ added in v0.3.0
func (cpu *CPUBackend) Cos(x *tensor.RawTensor) *tensor.RawTensor
Cos computes element-wise cosine: cos(x).
func (*CPUBackend) Device ¶
func (cpu *CPUBackend) Device() tensor.Device
Device returns the compute device.
func (*CPUBackend) Div ¶
func (cpu *CPUBackend) Div(a, b *tensor.RawTensor) *tensor.RawTensor
Div performs element-wise division with broadcasting.
func (*CPUBackend) DivScalar ¶ added in v0.3.0
func (cpu *CPUBackend) DivScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
DivScalar divides each element of the tensor by a scalar value.
func (*CPUBackend) Embedding ¶ added in v0.5.1
func (cpu *CPUBackend) Embedding(weight, indices *tensor.RawTensor) *tensor.RawTensor
Embedding performs embedding lookup.
weight:  [numEmbeddings, embeddingDim]
indices: any shape of int32 indices
output:  [...indices.shape, embeddingDim]
Similar to PyTorch's F.embedding or nn.Embedding.
func (*CPUBackend) Equal ¶ added in v0.3.0
func (cpu *CPUBackend) Equal(a, b *tensor.RawTensor) *tensor.RawTensor
Equal returns a == b element-wise.
func (*CPUBackend) Exp ¶ added in v0.3.0
func (cpu *CPUBackend) Exp(x *tensor.RawTensor) *tensor.RawTensor
Exp computes element-wise exponential: exp(x).
func (*CPUBackend) Gather ¶ added in v0.3.0
func (cpu *CPUBackend) Gather(x *tensor.RawTensor, dim int, index *tensor.RawTensor) *tensor.RawTensor
Gather selects elements along dim using index tensor. Similar to torch.gather(input, dim, index).
The index tensor must have dtype int32 and its shape must match input shape except at the gather dimension, where it can differ.
Example:
input:  [3, 4, 5]
index:  [3, 4, 2] (int32 indices)
dim:    2
output: [3, 4, 2] where output[i,j,k] = input[i, j, index[i,j,k]]
func (*CPUBackend) Greater ¶ added in v0.3.0
func (cpu *CPUBackend) Greater(a, b *tensor.RawTensor) *tensor.RawTensor
Greater returns a > b element-wise.
func (*CPUBackend) GreaterEqual ¶ added in v0.3.0
func (cpu *CPUBackend) GreaterEqual(a, b *tensor.RawTensor) *tensor.RawTensor
GreaterEqual returns a >= b element-wise.
func (*CPUBackend) Log ¶ added in v0.3.0
func (cpu *CPUBackend) Log(x *tensor.RawTensor) *tensor.RawTensor
Log computes element-wise natural logarithm: ln(x).
func (*CPUBackend) Lower ¶ added in v0.3.0
func (cpu *CPUBackend) Lower(a, b *tensor.RawTensor) *tensor.RawTensor
Lower returns a < b element-wise.
func (*CPUBackend) LowerEqual ¶ added in v0.3.0
func (cpu *CPUBackend) LowerEqual(a, b *tensor.RawTensor) *tensor.RawTensor
LowerEqual returns a <= b element-wise.
func (*CPUBackend) MatMul ¶
func (cpu *CPUBackend) MatMul(a, b *tensor.RawTensor) *tensor.RawTensor
MatMul performs matrix multiplication. For 2D tensors: (M, K) @ (K, N) -> (M, N).
Uses a naive O(n³) implementation for Phase 1. TODO: Integrate with gonum/blas for better performance in Phase 2.
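A minimal 2D sketch:
a := tensor.Randn[float32]([]int{4, 8}, backend) // (M, K)
b := tensor.Randn[float32]([]int{8, 5}, backend) // (K, N)
c := backend.MatMul(a, b)                        // (M, N) = [4, 5]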
func (*CPUBackend) MaxPool2D ¶
func (cpu *CPUBackend) MaxPool2D(input *tensor.RawTensor, kernelSize, stride int) *tensor.RawTensor
MaxPool2D performs 2D max pooling.
Max pooling reduces spatial dimensions by taking the maximum value in each pooling window. Unlike Conv2D, MaxPool2D has no learnable parameters.
Input shape:  [batch, channels, height, width]
Output shape: [batch, channels, out_height, out_width]
Where:
out_height = (height - kernelSize) / stride + 1
out_width  = (width - kernelSize) / stride + 1
Algorithm:
- For each batch and channel
- Slide kernelSize x kernelSize window with given stride
- Take maximum value in each window
- Output max value
Example (2x2 pool, stride=2):
Input: [[ 1,  2,  3,  4],    Output: [[ 6,  8],
        [ 5,  6,  7,  8],             [14, 16]]
        [ 9, 10, 11, 12],
        [13, 14, 15, 16]]
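A minimal usage sketch (shapes chosen for illustration):
input := tensor.Randn[float32]([]int{1, 3, 8, 8}, backend)
out := backend.MaxPool2D(input, 2, 2) // kernelSize=2, stride=2 -> [1, 3, 4, 4]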
func (*CPUBackend) MaxPool2DBackward ¶ added in v0.7.1
func (cpu *CPUBackend) MaxPool2DBackward(input, grad *tensor.RawTensor, maxIndices []int, kernelSize, stride int) *tensor.RawTensor
MaxPool2DBackward computes gradient w.r.t. input for MaxPool2D.
Algorithm: Route gradients to max positions.
- Gradients flow only to positions that had the max value in forward pass
- For each output position, only ONE input position receives gradient
- All other positions in pooling window receive zero gradient
Example (2x2 pool, stride=2):
Input: [[1, 2],    Output: [4]    Input Grad: [[0,    0],
        [3, 4]]                                [0, grad]]
References:
- Burn framework: crates/burn-autodiff/src/ops/module.rs (max_pool2d_backward)
- CS231n: Backprop for pooling layers
func (*CPUBackend) MeanDim ¶ added in v0.3.0
func (cpu *CPUBackend) MeanDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
MeanDim computes the mean of tensor elements along the specified dimension.
Parameters:
- dim: dimension to reduce (supports negative indexing: -1 = last dim)
- keepDim: if true, keep the reduced dimension with size 1; if false, remove it
Example:
x := tensor.Randn[float32]([]int{2, 3, 4}, backend)
y := backend.MeanDim(x, -1, true) // shape: [2, 3, 1]
z := backend.MeanDim(x, -1, false) // shape: [2, 3]
func (*CPUBackend) Mul ¶
func (cpu *CPUBackend) Mul(a, b *tensor.RawTensor) *tensor.RawTensor
Mul performs element-wise multiplication with broadcasting.
func (*CPUBackend) MulScalar ¶ added in v0.3.0
func (cpu *CPUBackend) MulScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
MulScalar multiplies each element of the tensor by a scalar value.
func (*CPUBackend) Not ¶ added in v0.3.0
func (cpu *CPUBackend) Not(x *tensor.RawTensor) *tensor.RawTensor
Not computes element-wise logical NOT.
func (*CPUBackend) NotEqual ¶ added in v0.3.0
func (cpu *CPUBackend) NotEqual(a, b *tensor.RawTensor) *tensor.RawTensor
NotEqual returns a != b element-wise.
func (*CPUBackend) Or ¶ added in v0.3.0
func (cpu *CPUBackend) Or(a, b *tensor.RawTensor) *tensor.RawTensor
Or computes element-wise logical OR.
func (*CPUBackend) ReLU ¶ added in v0.2.0
func (cpu *CPUBackend) ReLU(x *tensor.RawTensor) *tensor.RawTensor
ReLU applies ReLU activation: max(0, x).
func (*CPUBackend) Rsqrt ¶ added in v0.3.0
func (cpu *CPUBackend) Rsqrt(x *tensor.RawTensor) *tensor.RawTensor
Rsqrt computes element-wise reciprocal square root: 1/sqrt(x). This is optimized for use in normalization layers (RMSNorm, LayerNorm).
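For illustration, a minimal RMSNorm-style sketch composed from other methods in this package (the epsilon value is arbitrary and the learnable scale is omitted):
x := tensor.Randn[float32]([]int{2, 8}, backend)
ms := backend.MeanDim(backend.Mul(x, x), -1, true)         // mean of squares, shape: [2, 1]
inv := backend.Rsqrt(backend.AddScalar(ms, float32(1e-6))) // 1/sqrt(ms + eps)
y := backend.Mul(x, inv)                                   // broadcast over the last dimension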
func (*CPUBackend) Sin ¶ added in v0.3.0
func (cpu *CPUBackend) Sin(x *tensor.RawTensor) *tensor.RawTensor
Sin computes element-wise sine: sin(x).
func (*CPUBackend) Softmax ¶ added in v0.2.0
func (cpu *CPUBackend) Softmax(x *tensor.RawTensor, dim int) *tensor.RawTensor
Softmax computes softmax along the specified dimension: Softmax(x_i) = exp(x_i) / sum(exp(x_j)) for all j in that dimension.
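A minimal usage sketch:
logits := tensor.Randn[float32]([]int{2, 10}, backend)
probs := backend.Softmax(logits, 1) // each row of probs sums to 1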
func (*CPUBackend) Sqrt ¶ added in v0.3.0
func (cpu *CPUBackend) Sqrt(x *tensor.RawTensor) *tensor.RawTensor
Sqrt computes element-wise square root: sqrt(x).
func (*CPUBackend) Squeeze ¶ added in v0.3.0
func (cpu *CPUBackend) Squeeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
Squeeze removes a dimension of size 1 at the specified position.
Panics if the dimension size is not 1. Supports negative dim indexing. This is a view operation (reshape).
Example:
x := tensor.Randn[float32]([]int{2, 1, 3}, backend)
y := backend.Squeeze(x, 1) // Shape: [2, 3]
func (*CPUBackend) Sub ¶
func (cpu *CPUBackend) Sub(a, b *tensor.RawTensor) *tensor.RawTensor
Sub performs element-wise subtraction with broadcasting.
func (*CPUBackend) SubScalar ¶ added in v0.3.0
func (cpu *CPUBackend) SubScalar(x *tensor.RawTensor, scalar any) *tensor.RawTensor
SubScalar subtracts a scalar value from each element of the tensor.
func (*CPUBackend) Sum ¶ added in v0.3.0
func (cpu *CPUBackend) Sum(x *tensor.RawTensor) *tensor.RawTensor
Sum computes the total sum of all elements in the tensor (scalar result).
func (*CPUBackend) SumDim ¶ added in v0.3.0
func (cpu *CPUBackend) SumDim(x *tensor.RawTensor, dim int, keepDim bool) *tensor.RawTensor
SumDim sums tensor elements along the specified dimension.
Parameters:
- dim: dimension to reduce (supports negative indexing: -1 = last dim)
- keepDim: if true, keep the reduced dimension with size 1; if false, remove it
Example:
x := tensor.Randn[float32]([]int{2, 3, 4}, backend)
y := backend.SumDim(x, -1, true) // shape: [2, 3, 1]
z := backend.SumDim(x, -1, false) // shape: [2, 3]
func (*CPUBackend) Unsqueeze ¶ added in v0.3.0
func (cpu *CPUBackend) Unsqueeze(x *tensor.RawTensor, dim int) *tensor.RawTensor
Unsqueeze adds a dimension of size 1 at the specified position.
Supports negative dim indexing. This is a view operation (reshape).
Example:
x := tensor.Randn[float32]([]int{2, 3}, backend)
y := backend.Unsqueeze(x, 1) // Shape: [2, 1, 3]
func (*CPUBackend) Where ¶ added in v0.3.0
func (cpu *CPUBackend) Where(condition, x, y *tensor.RawTensor) *tensor.RawTensor
Where performs conditional element selection. Similar to torch.where(condition, x, y).
Returns a tensor where each element is selected from x if condition is true, otherwise from y. All tensors must have compatible shapes (broadcasting supported).
Example:
condition: [3, 4] (bool tensor)
x:         [3, 4] (float32)
y:         [3, 4] (float32)
output:    [3, 4] where output[i,j] = condition[i,j] ? x[i,j] : y[i,j]
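A minimal sketch that builds the condition mask with Greater from this package (computing an element-wise maximum of a and b):
a := tensor.Randn[float32]([]int{3, 4}, backend)
b := tensor.Randn[float32]([]int{3, 4}, backend)
mask := backend.Greater(a, b)    // condition: a > b
out := backend.Where(mask, a, b) // element-wise max(a, b)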