cpu

package
v0.1.1
Warning

This package is not in the latest version of its module.
Published: Nov 17, 2025 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package cpu implements the CPU backend with SIMD optimizations and BLAS integration.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CPUBackend

type CPUBackend struct {
	// contains filtered or unexported fields
}

CPUBackend implements tensor operations on CPU with optional SIMD and BLAS optimizations.

func New

func New() *CPUBackend

New creates a new CPU backend.

func (*CPUBackend) Add

func (cpu *CPUBackend) Add(a, b *tensor.RawTensor) *tensor.RawTensor

Add performs element-wise addition with NumPy-style broadcasting.
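NumPy-style broadcasting "stretches" a smaller operand along the axes where its size is 1 (or where it has fewer dimensions) so the shapes match. The package's broadcasting implementation is unexported, so the following is only an illustrative sketch of the simplest case, shape (m, n) + (n,), using plain slices rather than tensor.RawTensor:

```go
package main

import "fmt"

// addBroadcast adds a length-n vector b to each row of an m×n matrix a,
// the simplest NumPy-style broadcast: shape (m, n) + (n,). The vector is
// conceptually repeated along the first axis without being copied.
func addBroadcast(a [][]float32, b []float32) [][]float32 {
	out := make([][]float32, len(a))
	for i, row := range a {
		out[i] = make([]float32, len(row))
		for j, v := range row {
			out[i][j] = v + b[j] // same b reused for every row
		}
	}
	return out
}

func main() {
	a := [][]float32{{1, 2, 3}, {4, 5, 6}}
	b := []float32{10, 20, 30}
	fmt.Println(addBroadcast(a, b)) // [[11 22 33] [14 25 36]]
}
```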

func (*CPUBackend) Conv2D

func (cpu *CPUBackend) Conv2D(input, kernel *tensor.RawTensor, stride, padding int) *tensor.RawTensor

Conv2D performs 2D convolution using im2col algorithm.

Input shape: [batch, in_channels, height, width]
Kernel shape: [out_channels, in_channels, kernel_h, kernel_w]
Output shape: [batch, out_channels, out_h, out_w]

Parameters:

  • input: Input tensor [N, C_in, H, W]
  • kernel: Convolution kernel [C_out, C_in, K_h, K_w]
  • stride: Stride for convolution (default: 1)
  • padding: Padding to apply (default: 0)

Algorithm: Im2col

  1. Transform input patches into columns (im2col)
  2. Reshape kernel into matrix
  3. Perform matrix multiplication
  4. Reshape output to [N, C_out, H_out, W_out]

Im2col is efficient because:

  • Converts convolution to matmul (highly optimized)
  • Cache-friendly memory access
  • Reuses existing matmul code

Reference: "High Performance Convolutional Neural Networks for Document Processing" (Chellapilla et al., 2006).
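The im2col step can be sketched for a single-channel image with zero padding. This is an illustrative standalone function, not the package's (unexported) implementation; it flattens every kernelH×kernelW patch into one column, so the convolution reduces to multiplying the flattened kernel by this matrix:

```go
package main

import "fmt"

// im2col lays out every kernelH×kernelW patch of a single-channel h×w
// image (row-major) as one column of the result: row r holds kernel
// element r taken from every patch, column c corresponds to output
// pixel c. Multiplying the flattened kernel (a row vector) by this
// matrix yields the convolution output.
func im2col(img []float32, h, w, kH, kW, stride int) ([][]float32, int, int) {
	outH := (h-kH)/stride + 1
	outW := (w-kW)/stride + 1
	cols := make([][]float32, kH*kW) // one row per kernel element
	for r := range cols {
		cols[r] = make([]float32, outH*outW) // one column per output pixel
	}
	for oy := 0; oy < outH; oy++ {
		for ox := 0; ox < outW; ox++ {
			c := oy*outW + ox
			for ky := 0; ky < kH; ky++ {
				for kx := 0; kx < kW; kx++ {
					cols[ky*kW+kx][c] = img[(oy*stride+ky)*w+(ox*stride+kx)]
				}
			}
		}
	}
	return cols, outH, outW
}

func main() {
	// 3×3 image, 2×2 kernel, stride 1 → four 2×2 patches become columns.
	img := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9}
	cols, outH, outW := im2col(img, 3, 3, 2, 2, 1)
	fmt.Println(outH, outW) // 2 2
	fmt.Println(cols)       // [[1 2 4 5] [2 3 5 6] [4 5 7 8] [5 6 8 9]]
}
```

The real method additionally loops over batch and input channels (stacking channel patches into taller columns) and handles padding by treating out-of-bounds reads as zero.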

func (*CPUBackend) Device

func (cpu *CPUBackend) Device() tensor.Device

Device returns the compute device.

func (*CPUBackend) Div

func (cpu *CPUBackend) Div(a, b *tensor.RawTensor) *tensor.RawTensor

Div performs element-wise division with broadcasting.

func (*CPUBackend) MatMul

func (cpu *CPUBackend) MatMul(a, b *tensor.RawTensor) *tensor.RawTensor

MatMul performs matrix multiplication. For 2D tensors: (M, K) @ (K, N) -> (M, N). Uses a naive O(n³) implementation for Phase 1. TODO: Integrate with gonum/blas for better performance in Phase 2.
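The naive algorithm referred to here is the textbook triple loop computing C[i][j] = Σₖ A[i][k]·B[k][j]. A minimal sketch over plain slices (not the package's actual code, which operates on tensor.RawTensor):

```go
package main

import "fmt"

// matMul multiplies an M×K matrix by a K×N matrix with the naive
// O(M·K·N) triple loop. The i, k, j loop order keeps the inner loop
// walking b and c rows sequentially, which is friendlier to the cache
// than the textbook i, j, k order.
func matMul(a, b [][]float32) [][]float32 {
	m, k, n := len(a), len(b), len(b[0])
	c := make([][]float32, m)
	for i := 0; i < m; i++ {
		c[i] = make([]float32, n)
		for p := 0; p < k; p++ {
			aip := a[i][p]
			for j := 0; j < n; j++ {
				c[i][j] += aip * b[p][j]
			}
		}
	}
	return c
}

func main() {
	a := [][]float32{{1, 2}, {3, 4}} // (2, 2)
	b := [][]float32{{5, 6}, {7, 8}} // (2, 2)
	fmt.Println(matMul(a, b))        // [[19 22] [43 50]]
}
```

A BLAS integration (the Phase 2 TODO) would replace this triple loop with a call to an sgemm routine.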

func (*CPUBackend) MaxPool2D

func (cpu *CPUBackend) MaxPool2D(input *tensor.RawTensor, kernelSize, stride int) *tensor.RawTensor

MaxPool2D performs 2D max pooling.

Max pooling reduces spatial dimensions by taking the maximum value in each pooling window. Unlike Conv2D, MaxPool2D has no learnable parameters.

Input shape: [batch, channels, height, width]
Output shape: [batch, channels, out_height, out_width]

Where:

out_height = (height - kernelSize) / stride + 1
out_width = (width - kernelSize) / stride + 1

Algorithm:

  1. For each batch and channel
  2. Slide kernelSize x kernelSize window with given stride
  3. Take maximum value in each window
  4. Output max value

Example (2x2 pool, stride=2):

Input: [[1,2,3,4],    Output: [[6,8],
        [5,6,7,8],             [14,16]]
        [9,10,11,12],
        [13,14,15,16]]
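The pooling loop can be sketched for a single channel as follows. This is an illustrative standalone function over a 2D slice, not the package's implementation, which additionally iterates over the batch and channel axes:

```go
package main

import "fmt"

// maxPool2D slides a kernelSize×kernelSize window over a single-channel
// matrix with the given stride and keeps the maximum of each window.
func maxPool2D(in [][]float32, kernelSize, stride int) [][]float32 {
	outH := (len(in)-kernelSize)/stride + 1
	outW := (len(in[0])-kernelSize)/stride + 1
	out := make([][]float32, outH)
	for oy := 0; oy < outH; oy++ {
		out[oy] = make([]float32, outW)
		for ox := 0; ox < outW; ox++ {
			max := in[oy*stride][ox*stride]
			for ky := 0; ky < kernelSize; ky++ {
				for kx := 0; kx < kernelSize; kx++ {
					if v := in[oy*stride+ky][ox*stride+kx]; v > max {
						max = v
					}
				}
			}
			out[oy][ox] = max
		}
	}
	return out
}

func main() {
	in := [][]float32{
		{1, 2, 3, 4},
		{5, 6, 7, 8},
		{9, 10, 11, 12},
		{13, 14, 15, 16},
	}
	fmt.Println(maxPool2D(in, 2, 2)) // [[6 8] [14 16]]
}
```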

func (*CPUBackend) Mul

func (cpu *CPUBackend) Mul(a, b *tensor.RawTensor) *tensor.RawTensor

Mul performs element-wise multiplication with broadcasting.

func (*CPUBackend) Name

func (cpu *CPUBackend) Name() string

Name returns the backend name.

func (*CPUBackend) Reshape

func (cpu *CPUBackend) Reshape(t *tensor.RawTensor, newShape tensor.Shape) *tensor.RawTensor

Reshape returns a tensor with the same data but different shape.

func (*CPUBackend) Sub

func (cpu *CPUBackend) Sub(a, b *tensor.RawTensor) *tensor.RawTensor

Sub performs element-wise subtraction with broadcasting.

func (*CPUBackend) Transpose

func (cpu *CPUBackend) Transpose(t *tensor.RawTensor, axes ...int) *tensor.RawTensor

Transpose transposes the tensor by permuting its dimensions.
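The 2D case of this permutation (swapping the two axes) can be sketched as follows; the package's method generalizes this to an arbitrary axis order via the variadic axes argument:

```go
package main

import "fmt"

// transpose2D swaps the two axes of a matrix: out[j][i] = t[i][j].
// A (rows, cols) matrix becomes a (cols, rows) matrix.
func transpose2D(t [][]float32) [][]float32 {
	rows, cols := len(t), len(t[0])
	out := make([][]float32, cols)
	for j := 0; j < cols; j++ {
		out[j] = make([]float32, rows)
		for i := 0; i < rows; i++ {
			out[j][i] = t[i][j]
		}
	}
	return out
}

func main() {
	t := [][]float32{{1, 2, 3}, {4, 5, 6}} // shape (2, 3)
	fmt.Println(transpose2D(t))            // [[1 4] [2 5] [3 6]]
}
```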
