nn

package
v0.1.1
Published: Nov 17, 2025 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package nn implements neural network modules for the Born ML Framework.

This package provides building blocks for constructing neural networks:

  • Module interface: Base interface for all NN components
  • Parameter: Trainable parameters with gradient tracking
  • Linear: Fully connected layer
  • Activations: ReLU, Sigmoid, Tanh
  • Loss functions: MSE, CrossEntropy
  • Sequential: Container for stacking layers

Design inspired by PyTorch's nn.Module but adapted for Go generics.
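
A minimal sketch of how these pieces compose (using the cpu backend from the examples below; input and targets are assumed to be prepared elsewhere):

backend := cpu.New()

model := nn.NewSequential(
	nn.NewLinear(784, 128, backend),
	nn.NewReLU(),
	nn.NewLinear(128, 10, backend),
)

criterion := nn.NewCrossEntropyLoss(backend)
logits := model.Forward(input)             // input: [batch_size, 784]
loss := criterion.Forward(logits, targets) // targets: [batch_size] class indices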

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Accuracy

func Accuracy[B tensor.Backend](
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
) float32

Accuracy computes classification accuracy for a batch.

Parameters:

  • logits: Model predictions [batch_size, num_classes]
  • targets: Ground truth class indices [batch_size]

Returns:

  • Accuracy as a float32 in the range [0, 1]
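
For example, after a forward pass (logits and targets shaped as above):

logits := model.Forward(input)      // [batch_size, num_classes]
acc := nn.Accuracy(logits, targets) // e.g. 0.92 means 92% of the batch is correct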

func CrossEntropyBackward

func CrossEntropyBackward[B tensor.Backend](
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
	backend B,
) *tensor.Tensor[float32, B]

CrossEntropyBackward computes gradient of CrossEntropyLoss w.r.t. logits.

This function provides the manual backward pass for CrossEntropyLoss. It will be integrated with autodiff in Phase 2.

Gradient Formula:

∂L/∂logits[i] = softmax(logits)[i] - y_one_hot[i]
              = probs[i] - (1 if i==target else 0)

For single class target:

∂L/∂logits[i] = probs[i]         if i ≠ target
∂L/∂logits[i] = probs[i] - 1     if i = target

Parameters:

  • logits: [batch_size, num_classes]
  • targets: [batch_size] (class indices)

Returns:

  • grads: [batch_size, num_classes] gradient tensor

Note: Gradients are automatically averaged over the batch size.
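
A sketch of the Phase 1 manual-backward pattern this supports (model, criterion, batch, and backend assumed to exist):

logits := model.Forward(input)                             // [batch_size, num_classes]
loss := criterion.Forward(logits, targets)                 // scalar, for monitoring
grads := nn.CrossEntropyBackward(logits, targets, backend) // ∂L/∂logits, batch-averaged
// grads seeds any hand-written backward passes through earlier layers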

func Ones

func Ones[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]

Ones creates a tensor filled with ones.

Parameters:

  • shape: Shape of the tensor
  • backend: Backend to use for tensor creation

Returns a tensor filled with ones.

func Randn

func Randn[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]

Randn creates a tensor with random values drawn from the standard normal distribution, N(0, 1).

Parameters:

  • shape: Shape of the tensor
  • backend: Backend to use for tensor creation

Returns a tensor with random normal values.

func Xavier

func Xavier[B tensor.Backend](fanIn, fanOut int, shape tensor.Shape, backend B) *tensor.Tensor[float32, B]

Xavier creates a weight tensor using Xavier (Glorot) initialization.

Weights are drawn from the uniform distribution:

U(-sqrt(6/(fan_in + fan_out)), sqrt(6/(fan_in + fan_out)))

This initialization helps maintain variance of activations across layers.

Parameters:

  • fanIn: Number of input units
  • fanOut: Number of output units
  • shape: Shape of the weight tensor
  • backend: Backend to use for tensor creation

Returns a tensor initialized with Xavier distribution.

func Zeros

func Zeros[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]

Zeros creates a tensor filled with zeros.

This is commonly used for bias initialization.

Parameters:

  • shape: Shape of the tensor
  • backend: Backend to use for tensor creation

Returns a zero-filled tensor.
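
Taken together, these initializers cover the usual layer setup. A sketch of creating Linear-style parameters by hand (illustrative names; this is not the package's internal code):

inF, outF := 784, 128
w := nn.Xavier(inF, outF, tensor.Shape{outF, inF}, backend) // Xavier/Glorot uniform weights
b := nn.Zeros(tensor.Shape{outF}, backend)                  // zero bias, as NewLinear does

weight := nn.NewParameter("linear1.weight", w)
bias := nn.NewParameter("linear1.bias", b)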

Types

type Conv2D

type Conv2D[B tensor.Backend] struct {
	// contains filtered or unexported fields
}

Conv2D is a 2D convolutional layer.

Performs convolution: output = Conv2D(input, weight) + bias

Input shape:  [batch, in_channels, height, width]
Weight shape: [out_channels, in_channels, kernel_h, kernel_w]
Bias shape:   [out_channels]
Output shape: [batch, out_channels, out_h, out_w]

Where:

out_h = (height + 2*padding - kernel_h) / stride + 1
out_w = (width + 2*padding - kernel_w) / stride + 1

Example:

// Create 2D conv: 1 channel -> 6 channels, 5x5 kernel
conv := nn.NewConv2D(1, 6, 5, 5, 1, 0, true, backend)

// Forward pass
input := tensor.Zeros[float32](tensor.Shape{32, 1, 28, 28}, backend) // MNIST-like
output := conv.Forward(input) // [32, 6, 24, 24]

func NewConv2D

func NewConv2D[B tensor.Backend](
	inChannels, outChannels int,
	kernelH, kernelW int,
	stride, padding int,
	useBias bool,
	backend B,
) *Conv2D[B]

NewConv2D creates a new 2D convolutional layer with Xavier initialization.

Parameters:

  • inChannels: Number of input channels
  • outChannels: Number of output channels (number of filters)
  • kernelH, kernelW: Kernel dimensions
  • stride: Stride for convolution (commonly 1 or 2)
  • padding: Zero padding to apply to input (commonly 0, 1, 2)
  • useBias: Whether to include bias term
  • backend: Backend for computation

Initialization:

  • Weights: Xavier/Glorot uniform initialization
  • Bias: Zeros

func (*Conv2D[B]) ComputeOutputSize

func (c *Conv2D[B]) ComputeOutputSize(inputH, inputW int) [2]int

ComputeOutputSize computes output spatial dimensions for given input size.

Returns: [out_height, out_width].
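
For example, with the 5x5, stride-1, no-padding layer from the Conv2D example:

conv := nn.NewConv2D(1, 6, 5, 5, 1, 0, true, backend)
hw := conv.ComputeOutputSize(28, 28) // [2]int{24, 24}, per the formula above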

func (*Conv2D[B]) Forward

func (c *Conv2D[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward performs the forward pass.

Input:  [batch, in_channels, height, width]
Output: [batch, out_channels, out_h, out_w]

func (*Conv2D[B]) InChannels

func (c *Conv2D[B]) InChannels() int

InChannels returns the number of input channels.

func (*Conv2D[B]) KernelSize

func (c *Conv2D[B]) KernelSize() [2]int

KernelSize returns the kernel size [height, width].

func (*Conv2D[B]) OutChannels

func (c *Conv2D[B]) OutChannels() int

OutChannels returns the number of output channels.

func (*Conv2D[B]) Padding

func (c *Conv2D[B]) Padding() int

Padding returns the padding.

func (*Conv2D[B]) Parameters

func (c *Conv2D[B]) Parameters() []*Parameter[B]

Parameters returns all trainable parameters.

func (*Conv2D[B]) Stride

func (c *Conv2D[B]) Stride() int

Stride returns the stride.

func (*Conv2D[B]) String

func (c *Conv2D[B]) String() string

String returns a string representation of the layer.

type CrossEntropyLoss

type CrossEntropyLoss[B tensor.Backend] struct {
	// contains filtered or unexported fields
}

CrossEntropyLoss computes cross-entropy loss for multi-class classification.

This implementation uses the LogSoftmax + NLLLoss decomposition for numerical stability, following the approach of modern frameworks such as PyTorch and Burn.

Mathematical Formulation:

Loss = -log_probs[target]
where log_probs = LogSoftmax(logits)

Gradient (Backward):

∂L/∂logits = Softmax(logits) - y_one_hot

Usage:

criterion := nn.NewCrossEntropyLoss[Backend](backend)
logits := model.Forward(input)  // [batch_size, num_classes]
loss := criterion.Forward(logits, targets)  // targets: [batch_size] (class indices)

Key Properties:

  • Expects raw logits (unnormalized scores) as input
  • Uses log-sum-exp trick for numerical stability
  • Prevents overflow for large logits (exp(x) exceeds the float32 maximum once x is around 88.7)
  • Prevents underflow when all logits are very negative
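
To make the trick concrete, here is a standalone numerically stable log-softmax over one row, in plain Go (an illustration of the technique, not this package's internal code):

import "math"

// logSoftmax computes a numerically stable log-softmax for one row of logits.
func logSoftmax(logits []float32) []float32 {
	maxv := logits[0]
	for _, v := range logits[1:] {
		if v > maxv {
			maxv = v
		}
	}
	var sum float64
	for _, v := range logits {
		sum += math.Exp(float64(v - maxv)) // shifted by the max, so exp cannot overflow
	}
	shift := maxv + float32(math.Log(sum)) // log-sum-exp = max + log Σ exp(x - max)
	out := make([]float32, len(logits))
	for i, v := range logits {
		out[i] = v - shift
	}
	return out
}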

References:

  • "Adam: A Method for Stochastic Optimization" (Kingma & Ba, 2014)
  • PyTorch CrossEntropyLoss documentation
  • Burn framework loss implementations

func NewCrossEntropyLoss

func NewCrossEntropyLoss[B tensor.Backend](backend B) *CrossEntropyLoss[B]

NewCrossEntropyLoss creates a new cross-entropy loss function.

func (*CrossEntropyLoss[B]) Forward

func (c *CrossEntropyLoss[B]) Forward(
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
) *tensor.Tensor[float32, B]

Forward computes cross-entropy loss.

Parameters:

  • logits: Model predictions (unnormalized scores) with shape [batch_size, num_classes]
  • targets: Ground truth class indices with shape [batch_size] (values in range [0, num_classes-1])

Returns:

  • Scalar loss value (mean over batch)

Note: This is a simplified implementation for Phase 1 (MNIST proof-of-concept). Full autodiff support for Softmax/Log operations will be added in Phase 2.

func (*CrossEntropyLoss[B]) Parameters

func (c *CrossEntropyLoss[B]) Parameters() []*Parameter[B]

Parameters returns an empty slice (loss functions have no trainable parameters).

type Linear

type Linear[B tensor.Backend] struct {
	// contains filtered or unexported fields
}

Linear implements a fully connected (dense) layer.

Performs the transformation: y = x @ W.T + b where:

  • x is the input tensor with shape [batch_size, in_features]
  • W is the weight matrix with shape [out_features, in_features]
  • b is the bias vector with shape [out_features]
  • y is the output tensor with shape [batch_size, out_features]

Weights are initialized using Xavier/Glorot initialization. Biases are initialized to zeros.

Example:

backend := cpu.New()
layer := nn.NewLinear(784, 128, backend)

input := tensor.Randn[float32](tensor.Shape{32, 784}, backend)  // batch_size=32
output := layer.Forward(input)  // shape: [32, 128]

func NewLinear

func NewLinear[B tensor.Backend](inFeatures, outFeatures int, backend B) *Linear[B]

NewLinear creates a new Linear layer.

Weights are initialized using Xavier/Glorot uniform distribution. Biases are initialized to zeros.

Parameters:

  • inFeatures: Number of input features
  • outFeatures: Number of output features
  • backend: Backend to use for tensor operations

Returns a new Linear layer.

func (*Linear[B]) Bias

func (l *Linear[B]) Bias() *Parameter[B]

Bias returns the bias parameter.

func (*Linear[B]) Forward

func (l *Linear[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward computes the output of the linear layer.

Performs: y = x @ W.T + b

Input shape:  [batch_size, in_features]
Output shape: [batch_size, out_features]

Parameters:

  • input: Input tensor with shape [batch_size, in_features]

Returns output tensor with shape [batch_size, out_features].

func (*Linear[B]) InFeatures

func (l *Linear[B]) InFeatures() int

InFeatures returns the number of input features.

func (*Linear[B]) OutFeatures

func (l *Linear[B]) OutFeatures() int

OutFeatures returns the number of output features.

func (*Linear[B]) Parameters

func (l *Linear[B]) Parameters() []*Parameter[B]

Parameters returns the trainable parameters of this layer.

Returns [weight, bias].

func (*Linear[B]) Weight

func (l *Linear[B]) Weight() *Parameter[B]

Weight returns the weight parameter.

type MSELoss

type MSELoss[B tensor.Backend] struct {
	// contains filtered or unexported fields
}

MSELoss computes Mean Squared Error loss.

Loss = mean((predictions - targets)²)

MSE is commonly used for regression tasks where the goal is to predict continuous values.

Example:

mse := nn.NewMSELoss[Backend](backend)
predictions := model.Forward(input)
loss := mse.Forward(predictions, targets)

func NewMSELoss

func NewMSELoss[B tensor.Backend](backend B) *MSELoss[B]

NewMSELoss creates a new MSE loss function.

func (*MSELoss[B]) Forward

func (m *MSELoss[B]) Forward(predictions, targets *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward computes the MSE loss.

Loss = mean((predictions - targets)²)

Parameters:

  • predictions: Model predictions with shape [batch_size, ...]
  • targets: Ground truth targets with same shape as predictions

Returns a scalar loss value (shape [1] or []).

func (*MSELoss[B]) Parameters

func (m *MSELoss[B]) Parameters() []*Parameter[B]

Parameters returns an empty slice (loss functions have no trainable parameters).

type MaxPool2D

type MaxPool2D[B tensor.Backend] struct {
	// contains filtered or unexported fields
}

MaxPool2D is a 2D max pooling layer.

Max pooling reduces spatial dimensions by taking the maximum value in each non-overlapping window. Unlike Conv2D, MaxPool2D has no learnable parameters.

Input shape:  [batch, channels, height, width]
Output shape: [batch, channels, out_height, out_width]

Where:

out_height = (height - kernelSize) / stride + 1
out_width = (width - kernelSize) / stride + 1

Common configurations:

  • 2x2 pool, stride=2: Reduces spatial dimensions by half (most common)
  • 3x3 pool, stride=2: Aggressive downsampling
  • 2x2 pool, stride=1: Overlapping pooling (less common)

Example:

// Create 2x2 max pooling with stride 2
pool := nn.NewMaxPool2D(2, 2, backend)

// Forward pass
input := tensor.Randn[float32](tensor.Shape{32, 64, 28, 28}, backend)
output := pool.Forward(input) // [32, 64, 14, 14]

func NewMaxPool2D

func NewMaxPool2D[B tensor.Backend](kernelSize, stride int, backend B) *MaxPool2D[B]

NewMaxPool2D creates a new 2D max pooling layer.

Parameters:

  • kernelSize: Size of pooling window (square)
  • stride: Stride for pooling (typically same as kernelSize for non-overlapping)
  • backend: Backend for computation

Common patterns:

  • NewMaxPool2D(2, 2, backend): Standard 2x2 non-overlapping pooling
  • NewMaxPool2D(3, 2, backend): Overlapping 3x3 pooling with stride 2

func (*MaxPool2D[B]) ComputeOutputSize

func (m *MaxPool2D[B]) ComputeOutputSize(inputH, inputW int) [2]int

ComputeOutputSize computes output spatial dimensions for given input size.

Returns: [out_height, out_width].

func (*MaxPool2D[B]) Forward

func (m *MaxPool2D[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward performs the forward pass.

Input:  [batch, channels, height, width]
Output: [batch, channels, out_height, out_width]

func (*MaxPool2D[B]) KernelSize

func (m *MaxPool2D[B]) KernelSize() int

KernelSize returns the pooling kernel size.

func (*MaxPool2D[B]) Parameters

func (m *MaxPool2D[B]) Parameters() []*Parameter[B]

Parameters returns all trainable parameters (empty for MaxPool2D).

MaxPool2D has no learnable parameters, so this always returns an empty slice.

func (*MaxPool2D[B]) Stride

func (m *MaxPool2D[B]) Stride() int

Stride returns the stride.

func (*MaxPool2D[B]) String

func (m *MaxPool2D[B]) String() string

String returns a string representation of the layer.

type Module

type Module[B tensor.Backend] interface {
	// Forward computes the output of the module given an input tensor.
	//
	// The input tensor should have the appropriate shape for this module.
	// For example, Linear expects [batch_size, in_features].
	//
	// Returns the output tensor with shape determined by the module type.
	Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

	// Parameters returns all trainable parameters of this module.
	//
	// This includes weights, biases, and any nested module parameters.
	// Returns an empty slice for modules without trainable parameters
	// (e.g., activation functions).
	Parameters() []*Parameter[B]
}

Module is the base interface for all neural network components.

Every NN module must implement:

  • Forward: Compute output from input
  • Parameters: Return all trainable parameters

Modules can be composed to build complex architectures:

model := nn.NewSequential[Backend](
    nn.NewLinear(784, 128, backend),
    nn.NewReLU(),
    nn.NewLinear(128, 10, backend),
)

Type parameter B must satisfy the tensor.Backend interface.
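
A custom module only needs these two methods. For illustration, a hypothetical pass-through layer (Identity is not part of this package):

// Identity returns its input unchanged and owns no parameters.
type Identity[B tensor.Backend] struct{}

func (Identity[B]) Forward(x *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B] {
	return x // output equals input
}

func (Identity[B]) Parameters() []*nn.Parameter[B] {
	return nil // no trainable parameters
}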

type Parameter

type Parameter[B tensor.Backend] struct {
	// contains filtered or unexported fields
}

Parameter represents a trainable parameter in a neural network.

Parameters are tensors that require gradient computation during training. They typically represent weights and biases of layers.

Example:

// Create a weight parameter
weight := nn.NewParameter("weight", weightTensor)

// Access the tensor
w := weight.Tensor()

// Get gradient after backward pass
grad := weight.Grad()

func NewParameter

func NewParameter[B tensor.Backend](name string, t *tensor.Tensor[float32, B]) *Parameter[B]

NewParameter creates a new trainable parameter.

The parameter tensor should be initialized before creating the Parameter. Gradient will be allocated during the first backward pass.

Parameters:

  • name: Descriptive name for this parameter (e.g., "linear1.weight")
  • tensor: The initialized parameter tensor

Returns a new Parameter.

func (*Parameter[B]) Grad

func (p *Parameter[B]) Grad() *tensor.Tensor[float32, B]

Grad returns the gradient tensor.

Returns nil if no gradient has been computed yet (before backward pass).

func (*Parameter[B]) Name

func (p *Parameter[B]) Name() string

Name returns the parameter name.

func (*Parameter[B]) SetGrad

func (p *Parameter[B]) SetGrad(grad *tensor.Tensor[float32, B])

SetGrad sets the gradient tensor.

This is typically called by the optimizer or during the backward pass.

func (*Parameter[B]) Tensor

func (p *Parameter[B]) Tensor() *tensor.Tensor[float32, B]

Tensor returns the parameter tensor.

func (*Parameter[B]) ZeroGrad

func (p *Parameter[B]) ZeroGrad()

ZeroGrad clears the gradient tensor.

This should be called before each training iteration to avoid accumulating gradients from previous iterations.
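
In a training loop this typically appears at the top of each iteration (forward, backward, and optimizer update elided):

for _, p := range model.Parameters() {
	p.ZeroGrad() // drop gradients from the previous iteration
}
// ... forward, loss, and backward repopulate gradients via SetGrad ...
// an optimizer then reads p.Grad() to update p.Tensor()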

type ReLU

type ReLU[B tensor.Backend] struct{}

ReLU is a Rectified Linear Unit activation module.

Applies the element-wise function: f(x) = max(0, x)

ReLU is the most commonly used activation function in deep learning. It helps with the vanishing gradient problem and is computationally efficient.

Example:

relu := nn.NewReLU[Backend]()
output := relu.Forward(input)  // All negative values become 0

func NewReLU

func NewReLU[B tensor.Backend]() *ReLU[B]

NewReLU creates a new ReLU activation module.

func (*ReLU[B]) Forward

func (r *ReLU[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward applies ReLU activation: f(x) = max(0, x).

func (*ReLU[B]) Parameters

func (r *ReLU[B]) Parameters() []*Parameter[B]

Parameters returns an empty slice (ReLU has no trainable parameters).

type ReLUBackend

type ReLUBackend interface {
	ReLU(*tensor.RawTensor) *tensor.RawTensor
}

ReLUBackend is an interface for backends that support ReLU activation.

type Sequential

type Sequential[B tensor.Backend] struct {
	// contains filtered or unexported fields
}

Sequential is a container module that chains multiple modules together.

Each module's output becomes the next module's input, creating a sequential pipeline of transformations.

Example:

model := nn.NewSequential(
    nn.NewLinear(784, 128, backend),
    nn.NewReLU(),
    nn.NewLinear(128, 10, backend),
)

output := model.Forward(input)

This is equivalent to:

h1 := linear1.Forward(input)
h2 := relu.Forward(h1)
output := linear2.Forward(h2)

func NewSequential

func NewSequential[B tensor.Backend](modules ...Module[B]) *Sequential[B]

NewSequential creates a new Sequential container.

Parameters:

  • modules: List of modules to chain together

Returns a new Sequential container.

func (*Sequential[B]) Add

func (s *Sequential[B]) Add(module Module[B])

Add appends a module to the sequence.

This allows building models incrementally:

model := nn.NewSequential[Backend]()
model.Add(nn.NewLinear(784, 128, backend))
model.Add(nn.NewReLU())
model.Add(nn.NewLinear(128, 10, backend))

func (*Sequential[B]) Forward

func (s *Sequential[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward applies all modules in sequence.

The output of each module becomes the input to the next module.

Parameters:

  • input: Input tensor to the first module

Returns the output of the last module.

func (*Sequential[B]) Len

func (s *Sequential[B]) Len() int

Len returns the number of modules in the sequence.

func (*Sequential[B]) Module

func (s *Sequential[B]) Module(index int) Module[B]

Module returns the module at the given index.

Panics if index is out of bounds.
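
Len and Module together allow inspecting a model layer by layer:

for i := 0; i < model.Len(); i++ {
	fmt.Printf("layer %d: %v\n", i, model.Module(i))
}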

func (*Sequential[B]) Parameters

func (s *Sequential[B]) Parameters() []*Parameter[B]

Parameters returns all trainable parameters from all modules.

Parameters are collected from all modules in the sequence.

type Sigmoid

type Sigmoid[B tensor.Backend] struct{}

Sigmoid is a sigmoid activation module.

Applies the element-wise function: σ(x) = 1 / (1 + exp(-x))

Sigmoid squashes values to the range (0, 1), making it useful for binary classification and gate mechanisms in LSTMs/GRUs.

Example:

sigmoid := nn.NewSigmoid[Backend]()
output := sigmoid.Forward(input)  // Values in range (0, 1)

func NewSigmoid

func NewSigmoid[B tensor.Backend]() *Sigmoid[B]

NewSigmoid creates a new Sigmoid activation module.

func (*Sigmoid[B]) Forward

func (s *Sigmoid[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward applies Sigmoid activation: σ(x) = 1 / (1 + exp(-x)).

func (*Sigmoid[B]) Parameters

func (s *Sigmoid[B]) Parameters() []*Parameter[B]

Parameters returns an empty slice (Sigmoid has no trainable parameters).

type SigmoidBackend

type SigmoidBackend interface {
	Sigmoid(*tensor.RawTensor) *tensor.RawTensor
}

SigmoidBackend is an interface for backends that support Sigmoid activation.

type Tanh

type Tanh[B tensor.Backend] struct{}

Tanh is a hyperbolic tangent activation module.

Applies the element-wise function: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Tanh squashes values to the range (-1, 1); its zero-centered output can help training. It is often used in RNNs.

Example:

tanh := nn.NewTanh[Backend]()
output := tanh.Forward(input)  // Values in range (-1, 1)

func NewTanh

func NewTanh[B tensor.Backend]() *Tanh[B]

NewTanh creates a new Tanh activation module.

func (*Tanh[B]) Forward

func (t *Tanh[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

Forward applies Tanh activation.

func (*Tanh[B]) Parameters

func (t *Tanh[B]) Parameters() []*Parameter[B]

Parameters returns an empty slice (Tanh has no trainable parameters).

type TanhBackend

type TanhBackend interface {
	Tanh(*tensor.RawTensor) *tensor.RawTensor
}

TanhBackend is an interface for backends that support Tanh activation.
