Documentation ¶
Overview ¶
Package nn implements neural network modules for the Born ML Framework.
This package provides building blocks for constructing neural networks:
- Module interface: Base interface for all NN components
- Parameter: Trainable parameters with gradient tracking
- Linear: Fully connected layer
- Activations: ReLU, Sigmoid, Tanh
- Loss functions: MSE, CrossEntropy
- Sequential: Container for stacking layers
Design inspired by PyTorch's nn.Module but adapted for Go generics.
Index ¶
- func Accuracy[B tensor.Backend](logits *tensor.Tensor[float32, B], targets *tensor.Tensor[int32, B]) float32
- func CrossEntropyBackward[B tensor.Backend](logits *tensor.Tensor[float32, B], targets *tensor.Tensor[int32, B], backend B) *tensor.Tensor[float32, B]
- func Ones[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- func Randn[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- func Xavier[B tensor.Backend](fanIn, fanOut int, shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- func Zeros[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- type Conv2D
- func (c *Conv2D[B]) ComputeOutputSize(inputH, inputW int) [2]int
- func (c *Conv2D[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]
- func (c *Conv2D[B]) InChannels() int
- func (c *Conv2D[B]) KernelSize() [2]int
- func (c *Conv2D[B]) OutChannels() int
- func (c *Conv2D[B]) Padding() int
- func (c *Conv2D[B]) Parameters() []*Parameter[B]
- func (c *Conv2D[B]) Stride() int
- func (c *Conv2D[B]) String() string
- type CrossEntropyLoss
- type Linear
- type MSELoss
- type MaxPool2D
- func (m *MaxPool2D[B]) ComputeOutputSize(inputH, inputW int) [2]int
- func (m *MaxPool2D[B]) Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]
- func (m *MaxPool2D[B]) KernelSize() int
- func (m *MaxPool2D[B]) Parameters() []*Parameter[B]
- func (m *MaxPool2D[B]) Stride() int
- func (m *MaxPool2D[B]) String() string
- type Module
- type Parameter
- type ReLU
- type ReLUBackend
- type Sequential
- type Sigmoid
- type SigmoidBackend
- type Tanh
- type TanhBackend
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Accuracy ¶
func Accuracy[B tensor.Backend](
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
) float32
Accuracy computes classification accuracy for a batch.
Parameters:
- logits: Model predictions [batch_size, num_classes]
- targets: Ground truth class indices [batch_size]
Returns:
- Accuracy as a float32 in the range [0, 1].
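For example, evaluating one batch (model, images, and labels are placeholder names, not part of this package; fmt is the standard library):

logits := model.Forward(images)    // [32, 10]
acc := nn.Accuracy(logits, labels) // labels: [32] int32 class indices
fmt.Printf("batch accuracy: %.1f%%\n", acc*100)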
func CrossEntropyBackward ¶
func CrossEntropyBackward[B tensor.Backend](
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
	backend B,
) *tensor.Tensor[float32, B]
CrossEntropyBackward computes gradient of CrossEntropyLoss w.r.t. logits.
This function provides a manual backward pass for CrossEntropyLoss; it will be integrated with autodiff in Phase 2.
Gradient Formula:
∂L/∂logits[i] = softmax(logits)[i] - y_one_hot[i]
= probs[i] - (1 if i==target else 0)
For single class target:
∂L/∂logits[i] = probs[i]      if i ≠ target
∂L/∂logits[i] = probs[i] - 1  if i = target
Parameters:
- logits: [batch_size, num_classes]
- targets: [batch_size] (class indices)
Returns:
- grads: [batch_size, num_classes] gradient tensor
Note: Gradients are automatically averaged over batch size.
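A minimal manual-training sketch, assuming criterion, logits, targets, and backend are set up as in the CrossEntropyLoss usage shown later:

loss := criterion.Forward(logits, targets)                 // scalar: mean loss over the batch
grads := nn.CrossEntropyBackward(logits, targets, backend) // [batch_size, num_classes]
// grads is then propagated through earlier layers by hand until autodiff lands in Phase 2.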
func Ones ¶
func Ones[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
Ones creates a tensor filled with ones.
Parameters:
- shape: Shape of the tensor
- backend: Backend to use for tensor creation
Returns a tensor filled with ones.
func Randn ¶
func Randn[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
Randn creates a tensor with random values drawn from the standard normal distribution N(0, 1).
Parameters:
- shape: Shape of the tensor
- backend: Backend to use for tensor creation
Returns a tensor with random normal values.
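For example (the cpu backend constructor is assumed from the Linear example later in this page):

backend := cpu.New()
w := nn.Randn(tensor.Shape{128, 784}, backend) // 128x784 matrix, values ~ N(0, 1)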
func Xavier ¶
func Xavier[B tensor.Backend](fanIn, fanOut int, shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
Xavier (Glorot) initialization for weights.
Initializes weights with values drawn from a uniform distribution: U(-sqrt(6/(fan_in + fan_out)), sqrt(6/(fan_in + fan_out)))
This initialization helps maintain variance of activations across layers.
Parameters:
- fanIn: Number of input units
- fanOut: Number of output units
- shape: Shape of the weight tensor
- backend: Backend to use for tensor creation
Returns a tensor initialized with Xavier distribution.
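As a worked example: for fanIn=784 and fanOut=128, the bound is sqrt(6/912) ≈ 0.081, so every weight is drawn from roughly (-0.081, 0.081):

// Weight for a 784 -> 128 layer, using the [out_features, in_features]
// layout documented for Linear.
w := nn.Xavier(784, 128, tensor.Shape{128, 784}, backend)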
func Zeros ¶
func Zeros[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
Zeros creates a tensor filled with zeros.
Types ¶
type Conv2D ¶
Conv2D is a 2D convolutional layer.
Performs convolution: output = Conv2D(input, weight) + bias
Input shape: [batch, in_channels, height, width]
Weight shape: [out_channels, in_channels, kernel_h, kernel_w]
Bias shape: [out_channels]
Output shape: [batch, out_channels, out_h, out_w]
Where:
out_h = (height + 2*padding - kernel_h) / stride + 1
out_w = (width + 2*padding - kernel_w) / stride + 1
Example:
// Create 2D conv: 1 channel -> 6 channels, 5x5 kernel
conv := nn.NewConv2D(1, 6, 5, 5, 1, 0, true, backend)
// Forward pass
input := tensor.Zeros[float32](tensor.Shape{32, 1, 28, 28}, backend) // MNIST-like
output := conv.Forward(input) // [32, 6, 24, 24]
func NewConv2D ¶
func NewConv2D[B tensor.Backend](
	inChannels, outChannels int,
	kernelH, kernelW int,
	stride, padding int,
	useBias bool,
	backend B,
) *Conv2D[B]
NewConv2D creates a new 2D convolutional layer with Xavier initialization.
Parameters:
- inChannels: Number of input channels
- outChannels: Number of output channels (number of filters)
- kernelH, kernelW: Kernel dimensions
- stride: Stride for convolution (commonly 1 or 2)
- padding: Zero padding to apply to input (commonly 0, 1, 2)
- useBias: Whether to include bias term
- backend: Backend for computation
Initialization:
- Weights: Xavier/Glorot uniform initialization
- Bias: Zeros
func (*Conv2D[B]) ComputeOutputSize ¶
ComputeOutputSize computes output spatial dimensions for given input size.
Returns: [out_height, out_width].
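For example, with the MNIST-like configuration from the Conv2D example above:

conv := nn.NewConv2D(1, 6, 5, 5, 1, 0, true, backend)
hw := conv.ComputeOutputSize(28, 28) // [2]int{24, 24}, since (28 + 2*0 - 5)/1 + 1 = 24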
func (*Conv2D[B]) Forward ¶
Forward performs the forward pass.
Input: [batch, in_channels, height, width]
Output: [batch, out_channels, out_h, out_w].
func (*Conv2D[B]) InChannels ¶
InChannels returns the number of input channels.
func (*Conv2D[B]) KernelSize ¶
KernelSize returns the kernel size [height, width].
func (*Conv2D[B]) OutChannels ¶
OutChannels returns the number of output channels.
func (*Conv2D[B]) Parameters ¶
Parameters returns all trainable parameters.
type CrossEntropyLoss ¶
CrossEntropyLoss computes cross-entropy loss for multi-class classification.
This implementation uses the LogSoftmax + NLLLoss decomposition for numerical stability, following modern best practices (PyTorch, Burn 2025).
Mathematical Formulation:
Loss = -log_probs[target] where log_probs = LogSoftmax(logits)
Gradient (Backward):
∂L/∂logits = Softmax(logits) - y_one_hot
Usage:
criterion := nn.NewCrossEntropyLoss[Backend](backend)
logits := model.Forward(input)             // [batch_size, num_classes]
loss := criterion.Forward(logits, targets) // targets: [batch_size] (class indices)
Key Properties:
- Expects raw logits (unnormalized scores) as input
- Uses log-sum-exp trick for numerical stability
- Prevents overflow when logits exceed ~88 (the point where exp(x) overflows float32)
- Prevents underflow when all logits are very negative
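The log-sum-exp trick can be sketched in plain Go (a scalar illustration using the standard math package, not this package's tensor code path): subtracting the row maximum keeps every exponent non-positive, so exp never overflows.

// logSumExp computes log(Σ exp(x_i)) stably by factoring out the maximum.
func logSumExp(xs []float32) float32 {
	m := xs[0]
	for _, x := range xs[1:] {
		if x > m {
			m = x
		}
	}
	var s float64
	for _, x := range xs {
		s += math.Exp(float64(x - m)) // x - m <= 0, so Exp stays in range
	}
	return m + float32(math.Log(s))
}

LogSoftmax then follows as logits[i] - logSumExp(logits).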
References:
- "Adam: A Method for Stochastic Optimization" (Kingma & Ba, 2014)
- PyTorch CrossEntropyLoss documentation
- Burn framework loss implementations
func NewCrossEntropyLoss ¶
func NewCrossEntropyLoss[B tensor.Backend](backend B) *CrossEntropyLoss[B]
NewCrossEntropyLoss creates a new cross-entropy loss function.
func (*CrossEntropyLoss[B]) Forward ¶
func (c *CrossEntropyLoss[B]) Forward(
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
) *tensor.Tensor[float32, B]
Forward computes cross-entropy loss.
Parameters:
- logits: Model predictions (unnormalized scores) with shape [batch_size, num_classes]
- targets: Ground truth class indices with shape [batch_size] (values in range [0, num_classes-1])
Returns:
- Scalar loss value (mean over batch)
Note: This is a simplified implementation for Phase 1 (MNIST proof-of-concept). Full autodiff support for Softmax/Log operations will be added in Phase 2.
func (*CrossEntropyLoss[B]) Parameters ¶
func (c *CrossEntropyLoss[B]) Parameters() []*Parameter[B]
Parameters returns an empty slice (loss functions have no trainable parameters).
type Linear ¶
Linear implements a fully connected (dense) layer.
Performs the transformation: y = x @ W.T + b where:
- x is the input tensor with shape [batch_size, in_features]
- W is the weight matrix with shape [out_features, in_features]
- b is the bias vector with shape [out_features]
- y is the output tensor with shape [batch_size, out_features]
Weights are initialized using Xavier/Glorot initialization. Biases are initialized to zeros.
Example:
backend := cpu.New()
layer := nn.NewLinear(784, 128, backend)
input := tensor.Randn[float32](tensor.Shape{32, 784}, backend) // batch_size=32
output := layer.Forward(input) // shape: [32, 128]
func NewLinear ¶
NewLinear creates a new Linear layer.
Weights are initialized using Xavier/Glorot uniform distribution. Biases are initialized to zeros.
Parameters:
- inFeatures: Number of input features
- outFeatures: Number of output features
- backend: Backend to use for tensor operations
Returns a new Linear layer.
func (*Linear[B]) Forward ¶
Forward computes the output of the linear layer.
Performs: y = x @ W.T + b
Input shape: [batch_size, in_features]
Output shape: [batch_size, out_features]
Parameters:
- input: Input tensor with shape [batch_size, in_features]
Returns output tensor with shape [batch_size, out_features].
func (*Linear[B]) InFeatures ¶
InFeatures returns the number of input features.
func (*Linear[B]) OutFeatures ¶
OutFeatures returns the number of output features.
func (*Linear[B]) Parameters ¶
Parameters returns the trainable parameters of this layer.
Returns [weight, bias].
type MSELoss ¶
MSELoss computes Mean Squared Error loss.
Loss = mean((predictions - targets)²)
MSE is commonly used for regression tasks where the goal is to predict continuous values.
Example:
mse := nn.NewMSELoss[Backend]()
predictions := model.Forward(input)
loss := mse.Forward(predictions, targets)
func NewMSELoss ¶
NewMSELoss creates a new MSE loss function.
func (*MSELoss[B]) Forward ¶
func (m *MSELoss[B]) Forward(predictions, targets *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]
Forward computes the MSE loss.
Loss = mean((predictions - targets)²)
Parameters:
- predictions: Model predictions with shape [batch_size, ...]
- targets: Ground truth targets with same shape as predictions
Returns a scalar loss value (shape [1] or []).
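The reduction can be illustrated on plain slices (a scalar sketch of the formula, not the tensor implementation):

// mse mirrors Loss = mean((predictions - targets)²) element-wise.
// mse([]float32{2, 4}, []float32{1, 2}) == (1 + 4) / 2 == 2.5
func mse(pred, target []float32) float32 {
	var sum float64
	for i := range pred {
		d := float64(pred[i] - target[i])
		sum += d * d
	}
	return float32(sum / float64(len(pred)))
}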
func (*MSELoss[B]) Parameters ¶
Parameters returns an empty slice (loss functions have no trainable parameters).
type MaxPool2D ¶
MaxPool2D is a 2D max pooling layer.
Max pooling reduces spatial dimensions by taking the maximum value within each pooling window; windows are non-overlapping when the stride equals the kernel size. Unlike Conv2D, MaxPool2D has no learnable parameters.
Input shape: [batch, channels, height, width]
Output shape: [batch, channels, out_height, out_width]
Where:
out_height = (height - kernelSize) / stride + 1
out_width = (width - kernelSize) / stride + 1
Common configurations:
- 2x2 pool, stride=2: Reduces spatial dimensions by half (most common)
- 3x3 pool, stride=2: Aggressive downsampling
- 2x2 pool, stride=1: Overlapping pooling (less common)
Example:
// Create 2x2 max pooling with stride 2
pool := nn.NewMaxPool2D(2, 2, backend)
// Forward pass
input := tensor.Randn[float32](tensor.Shape{32, 64, 28, 28}, backend)
output := pool.Forward(input) // [32, 64, 14, 14]
func NewMaxPool2D ¶
NewMaxPool2D creates a new 2D max pooling layer.
Parameters:
- kernelSize: Size of pooling window (square)
- stride: Stride for pooling (typically same as kernelSize for non-overlapping)
- backend: Backend for computation
Common patterns:
- NewMaxPool2D(2, 2, backend): Standard 2x2 non-overlapping pooling
- NewMaxPool2D(3, 2, backend): Overlapping 3x3 pooling with stride 2
func (*MaxPool2D[B]) ComputeOutputSize ¶
ComputeOutputSize computes output spatial dimensions for given input size.
Returns: [out_height, out_width].
func (*MaxPool2D[B]) Forward ¶
Forward performs the forward pass.
Input: [batch, channels, height, width]
Output: [batch, channels, out_height, out_width].
func (*MaxPool2D[B]) KernelSize ¶
KernelSize returns the pooling kernel size.
func (*MaxPool2D[B]) Parameters ¶
Parameters returns all trainable parameters (empty for MaxPool2D).
MaxPool2D has no learnable parameters, so this always returns an empty slice.
type Module ¶
type Module[B tensor.Backend] interface {
	// Forward computes the output of the module given an input tensor.
	//
	// The input tensor should have the appropriate shape for this module.
	// For example, Linear expects [batch_size, in_features].
	//
	// Returns the output tensor with shape determined by the module type.
	Forward(input *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B]

	// Parameters returns all trainable parameters of this module.
	//
	// This includes weights, biases, and any nested module parameters.
	// Returns an empty slice for modules without trainable parameters
	// (e.g., activation functions).
	Parameters() []*Parameter[B]
}
Module is the base interface for all neural network components.
Every NN module must implement:
- Forward: Compute output from input
- Parameters: Return all trainable parameters
Modules can be composed to build complex architectures:
model := nn.NewSequential[Backend](
nn.NewLinear(784, 128, backend),
nn.NewReLU(),
nn.NewLinear(128, 10, backend),
)
Type parameter B must satisfy the tensor.Backend interface.
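Implementing a custom module only requires those two methods. A hypothetical pass-through module, sketched from outside the package:

// Identity returns its input unchanged and owns no parameters.
type Identity[B tensor.Backend] struct{}

func (Identity[B]) Forward(in *tensor.Tensor[float32, B]) *tensor.Tensor[float32, B] {
	return in
}

func (Identity[B]) Parameters() []*nn.Parameter[B] {
	return nil // no trainable parameters
}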
type Parameter ¶
Parameter represents a trainable parameter in a neural network.
Parameters are tensors that require gradient computation during training. They typically represent weights and biases of layers.
Example:
// Create a weight parameter
weight := nn.NewParameter("weight", weightTensor)
// Access the tensor
w := weight.Tensor()
// Get gradient after backward pass
grad := weight.Grad()
func NewParameter ¶
NewParameter creates a new trainable parameter.
The parameter tensor should be initialized before creating the Parameter. Gradient will be allocated during the first backward pass.
Parameters:
- name: Descriptive name for this parameter (e.g., "linear1.weight")
- tensor: The initialized parameter tensor
Returns a new Parameter.
func (*Parameter[B]) Grad ¶
Grad returns the gradient tensor.
Returns nil if no gradient has been computed yet (before backward pass).
func (*Parameter[B]) SetGrad ¶
SetGrad sets the gradient tensor.
This is typically called by the optimizer or during backward pass.
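For example, inspecting gradients after a backward pass, using only the documented accessors (model is any Module; fmt is the standard library):

for _, p := range model.Parameters() {
	g := p.Grad()
	if g == nil {
		continue // Grad is nil before the first backward pass
	}
	fmt.Printf("grad: %v\n", g) // hand g to an optimizer, log its norm, etc.
}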
type ReLU ¶
ReLU is a Rectified Linear Unit activation module.
Applies the element-wise function: f(x) = max(0, x)
ReLU is the most commonly used activation function in deep learning. It helps with the vanishing gradient problem and is computationally efficient.
Example:
relu := nn.NewReLU[Backend]()
output := relu.Forward(input) // All negative values become 0
func (*ReLU[B]) Parameters ¶
Parameters returns an empty slice (ReLU has no trainable parameters).
type ReLUBackend ¶
ReLUBackend is an interface for backends that support ReLU activation.
type Sequential ¶
Sequential is a container module that chains multiple modules together.
Each module's output becomes the next module's input, creating a sequential pipeline of transformations.
Example:
model := nn.NewSequential(
nn.NewLinear(784, 128, backend),
nn.NewReLU(),
nn.NewLinear(128, 10, backend),
)
output := model.Forward(input)
This is equivalent to:
h1 := linear1.Forward(input)
h2 := relu.Forward(h1)
output := linear2.Forward(h2)
func NewSequential ¶
func NewSequential[B tensor.Backend](modules ...Module[B]) *Sequential[B]
NewSequential creates a new Sequential container.
Parameters:
- modules: List of modules to chain together
Returns a new Sequential container.
func (*Sequential[B]) Add ¶
func (s *Sequential[B]) Add(module Module[B])
Add appends a module to the sequence.
This allows building models incrementally:
model := nn.NewSequential[Backend]()
model.Add(nn.NewLinear(784, 128, backend))
model.Add(nn.NewReLU())
model.Add(nn.NewLinear(128, 10, backend))
func (*Sequential[B]) Forward ¶
Forward applies all modules in sequence.
The output of each module becomes the input to the next module.
Parameters:
- input: Input tensor to the first module
Returns the output of the last module.
func (*Sequential[B]) Len ¶
func (s *Sequential[B]) Len() int
Len returns the number of modules in the sequence.
func (*Sequential[B]) Module ¶
func (s *Sequential[B]) Module(index int) Module[B]
Module returns the module at the given index.
Panics if index is out of bounds.
func (*Sequential[B]) Parameters ¶
func (s *Sequential[B]) Parameters() []*Parameter[B]
Parameters returns all trainable parameters from all modules.
Parameters are collected from all modules in the sequence.
type Sigmoid ¶
Sigmoid is a sigmoid activation module.
Applies the element-wise function: σ(x) = 1 / (1 + exp(-x))
Sigmoid squashes values to the range (0, 1), making it useful for binary classification and gate mechanisms in LSTMs/GRUs.
Example:
sigmoid := nn.NewSigmoid[Backend]()
output := sigmoid.Forward(input) // Values in range (0, 1)
func NewSigmoid ¶
NewSigmoid creates a new Sigmoid activation module.
func (*Sigmoid[B]) Parameters ¶
Parameters returns an empty slice (Sigmoid has no trainable parameters).
type SigmoidBackend ¶
SigmoidBackend is an interface for backends that support Sigmoid activation.
type Tanh ¶
Tanh is a hyperbolic tangent activation module.
Applies the element-wise function: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Tanh squashes values to the range (-1, 1), making it zero-centered which can help with training. Often used in RNNs.
Example:
tanh := nn.NewTanh[Backend]()
output := tanh.Forward(input) // Values in range (-1, 1)
func (*Tanh[B]) Parameters ¶
Parameters returns an empty slice (Tanh has no trainable parameters).