Documentation ¶
Overview ¶
Package nn provides neural network layers and building blocks.
This package contains:
- Layers: Linear, Conv2D, MaxPool2D
- Activations: ReLU, Sigmoid, Tanh
- Loss functions: CrossEntropyLoss, MSELoss
- Utilities: Sequential, Module interface, Parameter
- Initialization: Xavier, Zeros, Ones, Randn
Basic Usage ¶
import (
"github.com/born-ml/born/nn"
"github.com/born-ml/born/backend/cpu"
)
func main() {
backend := cpu.New()
// Build a simple MLP
model := nn.NewSequential(
nn.NewLinear(784, 128, backend),
nn.NewReLU(),
nn.NewLinear(128, 10, backend),
)
// Forward pass
output := model.Forward(input)
}
Layers ¶
Linear: Fully connected layer with Xavier initialization
layer := nn.NewLinear(inFeatures, outFeatures, backend)
Conv2D: 2D convolutional layer with im2col algorithm
conv := nn.NewConv2D(inChannels, outChannels, kernelSize, stride, padding, backend)
MaxPool2D: 2D max pooling layer
pool := nn.NewMaxPool2D(kernelSize, stride, backend)
Activations ¶
Common activation functions:
relu := nn.NewReLU()
sigmoid := nn.NewSigmoid()
tanh := nn.NewTanh()
Loss Functions ¶
CrossEntropyLoss: For classification tasks (numerically stable)
criterion := nn.NewCrossEntropyLoss(backend)
loss := criterion.Forward(logits, labels)
MSELoss: For regression tasks
criterion := nn.NewMSELoss(backend)
loss := criterion.Forward(predictions, targets)
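The quantity MSELoss computes reduces to a few lines on plain slices. The `mse` helper below is a hypothetical stdlib-only sketch, not part of the package:

```go
package main

import "fmt"

// mse returns the mean squared error between predictions and targets:
// the average of the squared element-wise differences.
func mse(pred, target []float32) float32 {
	var sum float32
	for i := range pred {
		d := pred[i] - target[i]
		sum += d * d
	}
	return sum / float32(len(pred))
}

func main() {
	loss := mse([]float32{1, 2, 3}, []float32{1, 2, 5})
	fmt.Println(loss) // (0 + 0 + 4) / 3
}
```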
Sequential Models ¶
Build models by composing layers:
model := nn.NewSequential(
nn.NewLinear(784, 256, backend),
nn.NewReLU(),
nn.NewLinear(256, 128, backend),
nn.NewReLU(),
nn.NewLinear(128, 10, backend),
)
Parameter Management ¶
Access model parameters for optimization:
params := model.Parameters()
for _, param := range params {
fmt.Println(param.Name(), param.Tensor().Shape())
}
Package nn also provides public wrappers for positional encodings (sinusoidal, learned, rotary, and ALiBi).
Index ¶
- func Accuracy[B tensor.Backend](logits *tensor.Tensor[float32, B], targets *tensor.Tensor[int32, B]) float32
- func CausalMask[B tensor.Backend](seqLen int, backend B) *tensor.Tensor[float32, B]
- func CrossEntropyBackward[B tensor.Backend](logits *tensor.Tensor[float32, B], targets *tensor.Tensor[int32, B], backend B) *tensor.Tensor[float32, B]
- func Ones[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- func Randn[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- func ScaledDotProductAttention[B tensor.Backend](query, key, value *tensor.Tensor[float32, B], mask *tensor.Tensor[float32, B], ...) (*tensor.Tensor[float32, B], *tensor.Tensor[float32, B])
- func Xavier[B tensor.Backend](fanIn, fanOut int, shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- func Zeros[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
- type ALiBi
- type Conv2D
- type CrossEntropyLoss
- type Embedding
- type FFN
- type KVCache
- type LayerNorm
- type LearnedPositionalEmbedding
- type Linear
- type MSELoss
- type MaxPool2D
- type Module
- type MultiHeadAttention
- type Parameter
- type RMSNorm
- type ReLU
- type RotaryEncoding
- type RotaryEncodingConfig
- type Sequential
- type SiLU
- type Sigmoid
- type SinusoidalPositionalEncoding
- type Tanh
- type TransformerBlock
- type TransformerConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Accuracy ¶
func Accuracy[B tensor.Backend](
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
) float32
Accuracy computes the classification accuracy.
Example:
acc := nn.Accuracy(predictions, labels)
fmt.Printf("Accuracy: %.2f%%\n", acc*100)
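The metric is straightforward to sketch on plain slices: take the argmax of each logit row and count matches against the target labels. The `accuracy` helper below is hypothetical, stdlib-only, and mirrors what nn.Accuracy reports:

```go
package main

import "fmt"

// accuracy computes the fraction of rows whose argmax over the logits
// matches the corresponding target label.
func accuracy(logits [][]float32, targets []int32) float32 {
	correct := 0
	for i, row := range logits {
		best := 0
		for j, v := range row {
			if v > row[best] {
				best = j
			}
		}
		if int32(best) == targets[i] {
			correct++
		}
	}
	return float32(correct) / float32(len(targets))
}

func main() {
	logits := [][]float32{{0.1, 0.9}, {2.0, -1.0}, {0.3, 0.4}}
	targets := []int32{1, 0, 0} // third row predicts class 1, so 2/3 correct
	fmt.Printf("Accuracy: %.2f%%\n", accuracy(logits, targets)*100)
}
```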
func CausalMask ¶ added in v0.4.0
func CausalMask[B tensor.Backend](seqLen int, backend B) *tensor.Tensor[float32, B]
CausalMask creates a causal (autoregressive) attention mask.
In causal attention, each position can only attend to earlier positions. This is used in autoregressive models like GPT.
Returns a mask tensor where future positions are masked with -inf. Shape: [1, 1, seq_len, seq_len] (broadcastable to [batch, heads, seq, seq])
Example:
mask := nn.CausalMask(10, backend) // [1, 1, 10, 10]
output, weights := nn.ScaledDotProductAttention(Q, K, V, mask, 0)
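The mask pattern itself is simple: zero on and below the diagonal (position j <= i may be attended), -Inf above it. This hypothetical `causalMask` helper builds the same pattern on a 2D slice, without the [1, 1, seq, seq] tensor wrapping:

```go
package main

import (
	"fmt"
	"math"
)

// causalMask builds a [seqLen][seqLen] additive attention mask:
// 0 where position i may attend to position j (j <= i), -Inf otherwise.
func causalMask(seqLen int) [][]float32 {
	negInf := float32(math.Inf(-1))
	m := make([][]float32, seqLen)
	for i := range m {
		m[i] = make([]float32, seqLen)
		for j := i + 1; j < seqLen; j++ {
			m[i][j] = negInf
		}
	}
	return m
}

func main() {
	for _, row := range causalMask(4) {
		fmt.Println(row)
	}
}
```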
func CrossEntropyBackward ¶
func CrossEntropyBackward[B tensor.Backend](
	logits *tensor.Tensor[float32, B],
	targets *tensor.Tensor[int32, B],
	backend B,
) *tensor.Tensor[float32, B]
CrossEntropyBackward computes the backward pass for cross-entropy loss.
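The standard closed form for this gradient is softmax(logits) minus the one-hot target, divided by the batch size. The `ceGrad` helper below is a hypothetical stdlib-only sketch of that formula (the package presumably computes something equivalent on tensors):

```go
package main

import (
	"fmt"
	"math"
)

// ceGrad computes d(mean cross-entropy)/d(logits):
// softmax(logits) - one_hot(targets), scaled by 1/batchSize.
func ceGrad(logits [][]float64, targets []int32) [][]float64 {
	n := float64(len(logits))
	grad := make([][]float64, len(logits))
	for i, row := range logits {
		// numerically stable softmax: subtract the row max first
		max := row[0]
		for _, v := range row {
			if v > max {
				max = v
			}
		}
		var sum float64
		exps := make([]float64, len(row))
		for j, v := range row {
			exps[j] = math.Exp(v - max)
			sum += exps[j]
		}
		grad[i] = make([]float64, len(row))
		for j := range row {
			grad[i][j] = exps[j] / sum
		}
		grad[i][targets[i]] -= 1
		for j := range grad[i] {
			grad[i][j] /= n
		}
	}
	return grad
}

func main() {
	g := ceGrad([][]float64{{2, 0, 0}}, []int32{0})
	fmt.Printf("%.4f\n", g[0]) // each row sums to 0
}
```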
func Ones ¶
func Ones[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
Ones initializes a tensor with ones.
Example:
backend := cpu.New()
weights := nn.Ones(tensor.Shape{128, 784}, backend)
func Randn ¶
func Randn[B tensor.Backend](shape tensor.Shape, backend B) *tensor.Tensor[float32, B]
Randn initializes a tensor with random values from N(0, 1).
Example:
backend := cpu.New()
weights := nn.Randn(tensor.Shape{128, 784}, backend)
func ScaledDotProductAttention ¶ added in v0.4.0
func ScaledDotProductAttention[B tensor.Backend](
	query, key, value *tensor.Tensor[float32, B],
	mask *tensor.Tensor[float32, B],
	scale float32,
) (*tensor.Tensor[float32, B], *tensor.Tensor[float32, B])
ScaledDotProductAttention computes attention scores using the scaled dot-product mechanism.
This is the core attention mechanism used in transformers.
Parameters:
- query: Query tensor [batch, heads, seq_q, head_dim]
- key: Key tensor [batch, heads, seq_k, head_dim]
- value: Value tensor [batch, heads, seq_k, head_dim]
- mask: Optional attention mask [batch, 1, seq_q, seq_k] or nil (additive mask, -inf for masked)
- scale: Scaling factor (0 for auto-compute as 1/sqrt(head_dim))
Returns:
- output: Attended values [batch, heads, seq_q, head_dim]
- weights: Attention weights [batch, heads, seq_q, seq_k]
Example:
Q := tensor.Randn[float32](tensor.Shape{2, 8, 10, 64}, backend)
K := tensor.Randn[float32](tensor.Shape{2, 8, 10, 64}, backend)
V := tensor.Randn[float32](tensor.Shape{2, 8, 10, 64}, backend)
output, weights := nn.ScaledDotProductAttention(Q, K, V, nil, 0)
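The core computation, softmax(Q·Kᵀ/√d)·V, can be sketched for a single head on 2D slices. The `attention` helper below is hypothetical and omits the batch/head axes, mask, and scale override that the library version handles:

```go
package main

import (
	"fmt"
	"math"
)

// attention computes softmax(Q·Kᵀ/√d)·V for a single head.
// Returns the attended output [seqQ][dimV] and the weights [seqQ][seqK].
func attention(Q, K, V [][]float64) ([][]float64, [][]float64) {
	d := float64(len(Q[0]))
	scale := 1 / math.Sqrt(d)
	seqQ, seqK := len(Q), len(K)

	weights := make([][]float64, seqQ)
	out := make([][]float64, seqQ)
	for i := 0; i < seqQ; i++ {
		// scores[j] = (Q[i] · K[j]) * scale
		scores := make([]float64, seqK)
		max := math.Inf(-1)
		for j := 0; j < seqK; j++ {
			for k := range Q[i] {
				scores[j] += Q[i][k] * K[j][k]
			}
			scores[j] *= scale
			if scores[j] > max {
				max = scores[j]
			}
		}
		// numerically stable softmax over the key axis
		var sum float64
		for j := range scores {
			scores[j] = math.Exp(scores[j] - max)
			sum += scores[j]
		}
		weights[i] = make([]float64, seqK)
		out[i] = make([]float64, len(V[0]))
		for j := range scores {
			weights[i][j] = scores[j] / sum
			for k := range V[j] {
				out[i][k] += weights[i][j] * V[j][k]
			}
		}
	}
	return out, weights
}

func main() {
	Q := [][]float64{{1, 0}, {0, 1}}
	K := [][]float64{{1, 0}, {0, 1}}
	V := [][]float64{{1, 2}, {3, 4}}
	out, w := attention(Q, K, V)
	fmt.Println(out, w)
}
```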
Types ¶
type ALiBi ¶ added in v0.4.0
ALiBi implements Attention with Linear Biases.
ALiBi adds a linear bias to attention scores based on the distance between positions. Used in BLOOM, MPT, and other models. Allows extrapolation to longer sequences.
Example:
backend := cpu.New()
alibi := nn.NewALiBi(8, backend) // 8 attention heads
bias := alibi.GetBias(128)       // [1, 8, 128, 128]

// In attention:
scores := Q.BatchMatMul(K.T())
scores = scores.Add(bias)
weights := scores.Softmax(-1)
func NewALiBi ¶ added in v0.4.0
NewALiBi creates a new ALiBi bias generator.
Computes slopes for each attention head using a geometric sequence.
Parameters:
- numHeads: Number of attention heads
- backend: Computation backend
Example:
alibi := nn.NewALiBi(8, backend)
bias := alibi.GetBias(64) // Get bias for sequence length 64
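The geometric slope sequence mentioned above can be sketched in plain Go. The `alibiSlopes` helper is hypothetical and implements the scheme from the ALiBi paper for power-of-two head counts (first term and ratio both 2^(-8/numHeads)); NewALiBi presumably computes something equivalent, and non-power-of-two head counts need an extra interpolation step not shown here:

```go
package main

import (
	"fmt"
	"math"
)

// alibiSlopes returns the per-head ALiBi slopes for a power-of-two head
// count: a geometric sequence starting at 2^(-8/numHeads) with that same
// value as the ratio.
func alibiSlopes(numHeads int) []float64 {
	start := math.Pow(2, -8.0/float64(numHeads))
	slopes := make([]float64, numHeads)
	s := start
	for i := range slopes {
		slopes[i] = s
		s *= start
	}
	return slopes
}

func main() {
	fmt.Println(alibiSlopes(8)) // 1/2, 1/4, ..., 1/256
}
```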
type Conv2D ¶
Conv2D represents a 2D convolutional layer.
func NewConv2D ¶
func NewConv2D[B tensor.Backend](
	inChannels, outChannels int,
	kernelH, kernelW int,
	stride, padding int,
	useBias bool,
	backend B,
) *Conv2D[B]
NewConv2D creates a new 2D convolutional layer.
Example:
backend := cpu.New()
conv := nn.NewConv2D(1, 32, 3, 3, 1, 1, true, backend)
// in_channels=1, out_channels=32, kernel=3x3, stride=1, padding=1, useBias=true
type CrossEntropyLoss ¶
type CrossEntropyLoss[B tensor.Backend] = nn.CrossEntropyLoss[B]
CrossEntropyLoss represents the cross-entropy loss for classification.
func NewCrossEntropyLoss ¶
func NewCrossEntropyLoss[B tensor.Backend](backend B) *CrossEntropyLoss[B]
NewCrossEntropyLoss creates a new cross-entropy loss function.
Example:
backend := cpu.New()
criterion := nn.NewCrossEntropyLoss(backend)
loss := criterion.Forward(logits, labels)
type Embedding ¶ added in v0.3.0
Embedding represents a lookup table for embeddings.
func NewEmbedding ¶ added in v0.3.0
NewEmbedding creates a new embedding layer.
Example:
backend := cpu.New()
embed := nn.NewEmbedding[B](50000, 768, backend) // vocab=50000, dim=768
tokenIds := tensor.FromSlice([]int32{1, 5, 10}, tensor.Shape{1, 3}, backend)
embeddings := embed.Forward(tokenIds) // [1, 3, 768]
type FFN ¶ added in v0.4.0
FFN (Feed-Forward Network) is a 2-layer MLP with SiLU activation.
Architecture:
FFN(x) = Linear2(SiLU(Linear1(x)))
Used inside TransformerBlock.
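The FFN(x) = Linear2(SiLU(Linear1(x))) architecture above can be sketched on plain slices. The `silu` and `ffn` helpers below are hypothetical; biases are omitted for brevity, and the real layer's parameterization may differ:

```go
package main

import (
	"fmt"
	"math"
)

// silu is the SiLU/Swish activation: x * sigmoid(x) = x / (1 + e^-x).
func silu(x float64) float64 {
	return x / (1 + math.Exp(-x))
}

// ffn applies the 2-layer MLP FFN(x) = W2·SiLU(W1·x) on plain slices.
func ffn(x []float64, w1, w2 [][]float64) []float64 {
	// expand to the hidden dimension, then apply SiLU element-wise
	hidden := make([]float64, len(w1))
	for i, row := range w1 {
		for j, w := range row {
			hidden[i] += w * x[j]
		}
		hidden[i] = silu(hidden[i])
	}
	// project back down
	out := make([]float64, len(w2))
	for i, row := range w2 {
		for j, w := range row {
			out[i] += w * hidden[j]
		}
	}
	return out
}

func main() {
	x := []float64{1, -1}
	w1 := [][]float64{{1, 0}, {0, 1}, {1, 1}} // 2 -> 3 expansion
	w2 := [][]float64{{1, 1, 1}}              // 3 -> 1 projection
	fmt.Println(ffn(x, w1, w2))
}
```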
type KVCache ¶ added in v0.4.0
KVCache is a public alias for internal KV cache implementation.
KVCache stores key-value pairs for efficient autoregressive generation. See internal/nn/kvcache.go for detailed documentation.
type LayerNorm ¶ added in v0.4.0
LayerNorm represents Layer Normalization.
func NewLayerNorm ¶ added in v0.4.0
NewLayerNorm creates a new LayerNorm layer.
Example:
backend := cpu.New()
norm := nn.NewLayerNorm[B](768, 1e-5, backend)
output := norm.Forward(input) // [..., 768] -> [..., 768]
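The normalization itself follows the usual LayerNorm definition: per feature vector, subtract the mean, divide by the standard deviation (with epsilon for stability), then apply the learned scale and shift. The `layerNorm` helper below is a hypothetical stdlib-only sketch on a single vector:

```go
package main

import (
	"fmt"
	"math"
)

// layerNorm normalizes x to zero mean and unit variance over its
// features, then applies the learned scale (gamma) and shift (beta).
func layerNorm(x, gamma, beta []float64, eps float64) []float64 {
	var mean float64
	for _, v := range x {
		mean += v
	}
	mean /= float64(len(x))

	var variance float64
	for _, v := range x {
		variance += (v - mean) * (v - mean)
	}
	variance /= float64(len(x))

	inv := 1 / math.Sqrt(variance+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = (v-mean)*inv*gamma[i] + beta[i]
	}
	return out
}

func main() {
	x := []float64{1, 2, 3, 4}
	ones := []float64{1, 1, 1, 1}
	zeros := []float64{0, 0, 0, 0}
	fmt.Println(layerNorm(x, ones, zeros, 1e-5))
}
```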
type LearnedPositionalEmbedding ¶ added in v0.4.0
type LearnedPositionalEmbedding[B tensor.Backend] = nn.LearnedPositionalEmbedding[B]
LearnedPositionalEmbedding implements learned positional embeddings.
These embeddings are trainable parameters that are updated during training. Used in GPT-2 and other models.
Example:
backend := cpu.New()
pe := nn.NewLearnedPositionalEmbedding(512, 256, backend)
encodings := pe.Forward(100) // [1, 100, 256]

// Get parameters for optimizer
params := pe.Parameters()
func NewLearnedPositionalEmbedding ¶ added in v0.4.0
func NewLearnedPositionalEmbedding[B tensor.Backend](maxLen, dim int, backend B) *LearnedPositionalEmbedding[B]
NewLearnedPositionalEmbedding creates a new learned positional embedding layer.
The embeddings are initialized from a normal distribution N(0, 1).
Parameters:
- maxLen: Maximum sequence length
- dim: Embedding dimension
- backend: Computation backend
Example:
pe := nn.NewLearnedPositionalEmbedding(512, 256, backend)
type MSELoss ¶
MSELoss represents the mean squared error loss for regression.
func NewMSELoss ¶
NewMSELoss creates a new MSE loss function.
Example:
backend := cpu.New()
criterion := nn.NewMSELoss(backend)
loss := criterion.Forward(predictions, targets)
type MultiHeadAttention ¶ added in v0.4.0
type MultiHeadAttention[B tensor.Backend] = nn.MultiHeadAttention[B]
MultiHeadAttention represents the multi-head attention mechanism.
func NewMultiHeadAttention ¶ added in v0.4.0
func NewMultiHeadAttention[B tensor.Backend](embedDim, numHeads int, backend B) *MultiHeadAttention[B]
NewMultiHeadAttention creates a new multi-head attention module.
Parameters:
- embedDim: Total embedding dimension (must be divisible by numHeads)
- numHeads: Number of attention heads
- backend: Computation backend
Example:
backend := cpu.New()
mha := nn.NewMultiHeadAttention[B](768, 12, backend) // BERT-base config
output := mha.Forward(x, x, x, nil)                  // Self-attention
type RotaryEncoding ¶ added in v0.4.0
type RotaryEncoding[B tensor.Backend] = nn.RotaryEncoding[B]
RotaryEncoding implements Rotary Position Embedding (RoPE).
RoPE is used in modern LLMs like LLaMA, Mistral, DeepSeek, and Qwen. It applies a rotation to query and key embeddings based on their position.
Example:
backend := cpu.New()
config := nn.RotaryEncodingConfig{
DModel: 64,
MaxSeqLen: 2048,
Theta: 10000.0,
}
rope := nn.NewRotaryEncoding(config, backend)
// Apply to attention queries/keys
q := tensor.Randn[float32](tensor.Shape{batch, heads, seq, 64}, backend)
q_rotated := rope.Forward(q)
func NewRotaryEncoding ¶ added in v0.4.0
func NewRotaryEncoding[B tensor.Backend](cfg RotaryEncodingConfig, backend B) *RotaryEncoding[B]
NewRotaryEncoding creates a new RoPE (Rotary Position Embedding) layer.
Pre-computes cosine and sine values for all positions and dimension pairs.
Parameters:
- cfg: Configuration (DModel, MaxSeqLen, Theta)
- backend: Computation backend
Example:
config := nn.RotaryEncodingConfig{
DModel: 64, // Head dimension
MaxSeqLen: 2048, // Max sequence length
Theta: 10000.0,
}
rope := nn.NewRotaryEncoding(config, backend)
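The rotation itself can be sketched for one head vector at one position. The `ropeRotate` helper below is hypothetical: it rotates each pair (x[2i], x[2i+1]) by the angle pos / theta^(2i/d). Note that pairing conventions differ between implementations (adjacent pairs vs. split halves), so this illustrates the idea rather than this package's exact layout:

```go
package main

import (
	"fmt"
	"math"
)

// ropeRotate applies rotary position embedding to vector x at position
// pos: each adjacent pair (x[2i], x[2i+1]) is rotated by the angle
// pos / theta^(2i/d), where d = len(x).
func ropeRotate(x []float64, pos int, theta float64) []float64 {
	d := float64(len(x))
	out := make([]float64, len(x))
	for i := 0; i < len(x); i += 2 {
		freq := 1 / math.Pow(theta, float64(i)/d)
		angle := float64(pos) * freq
		cos, sin := math.Cos(angle), math.Sin(angle)
		out[i] = x[i]*cos - x[i+1]*sin
		out[i+1] = x[i]*sin + x[i+1]*cos
	}
	return out
}

func main() {
	x := []float64{1, 0, 1, 0}
	fmt.Println(ropeRotate(x, 3, 10000.0))
}
```

Because each step is a pure rotation, vector norms are preserved, and the dot product of two rotated vectors depends only on their relative position.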
type RotaryEncodingConfig ¶ added in v0.4.0
type RotaryEncodingConfig = nn.RotaryEncodingConfig
RotaryEncodingConfig configures a RotaryEncoding layer.
type Sequential ¶
type Sequential[B tensor.Backend] = nn.Sequential[B]
Sequential represents a sequential container of modules.
func NewSequential ¶
func NewSequential[B tensor.Backend](modules ...Module[B]) *Sequential[B]
NewSequential creates a new sequential model.
Example:
backend := cpu.New()
model := nn.NewSequential(
nn.NewLinear(784, 128, backend),
nn.NewReLU(),
nn.NewLinear(128, 10, backend),
)
type SiLU ¶ added in v0.3.0
SiLU represents the Sigmoid Linear Unit (SiLU/Swish) activation function. SiLU(x) = x * sigmoid(x).
type Sigmoid ¶
Sigmoid represents the Sigmoid activation function.
func NewSigmoid ¶
NewSigmoid creates a new Sigmoid activation layer.
Example:
sigmoid := nn.NewSigmoid()
type SinusoidalPositionalEncoding ¶ added in v0.4.0
type SinusoidalPositionalEncoding[B tensor.Backend] = nn.SinusoidalPositionalEncoding[B]
SinusoidalPositionalEncoding implements fixed sinusoidal positional encodings.
This is the original positional encoding from "Attention is All You Need" (Vaswani et al., 2017).
Example:
backend := cpu.New()
pe := nn.NewSinusoidalPositionalEncoding(512, 256, backend)
encodings := pe.Forward(100) // [1, 100, 256]

// Add to embeddings
embeddings := embeddings.Add(encodings)
func NewSinusoidalPositionalEncoding ¶ added in v0.4.0
func NewSinusoidalPositionalEncoding[B tensor.Backend](maxLen, dim int, backend B) *SinusoidalPositionalEncoding[B]
NewSinusoidalPositionalEncoding creates a new sinusoidal positional encoding layer.
Pre-computes all positional encodings up to maxLen using sine and cosine functions.
Parameters:
- maxLen: Maximum sequence length
- dim: Embedding dimension
- backend: Computation backend
Example:
pe := nn.NewSinusoidalPositionalEncoding(512, 256, backend)
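The precomputed table follows the formulas from the original paper: PE(pos, 2i) = sin(pos / 10000^(2i/dim)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/dim)). The `sinusoidalPE` helper below is a hypothetical stdlib-only sketch of that table:

```go
package main

import (
	"fmt"
	"math"
)

// sinusoidalPE builds the fixed positional encoding table:
// even columns get sin(pos / 10000^(i/dim)), odd columns get cos of
// the same angle.
func sinusoidalPE(maxLen, dim int) [][]float64 {
	pe := make([][]float64, maxLen)
	for pos := range pe {
		pe[pos] = make([]float64, dim)
		for i := 0; i < dim; i += 2 {
			angle := float64(pos) / math.Pow(10000, float64(i)/float64(dim))
			pe[pos][i] = math.Sin(angle)
			if i+1 < dim {
				pe[pos][i+1] = math.Cos(angle)
			}
		}
	}
	return pe
}

func main() {
	pe := sinusoidalPE(4, 6)
	fmt.Printf("%.3f\n", pe[1])
}
```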
type TransformerBlock ¶ added in v0.4.0
type TransformerBlock[B tensor.Backend] = nn.TransformerBlock[B]
TransformerBlock is a complete Transformer Block with attention and FFN.
Architecture (Pre-Norm):
h   = x + MHA(Norm(x))   // attention sub-layer with residual connection
out = h + FFN(Norm(h))   // feed-forward sub-layer with residual connection
Used in all transformer models (GPT, BERT, LLaMA, etc.).
func NewTransformerBlock ¶ added in v0.4.0
func NewTransformerBlock[B tensor.Backend](config TransformerConfig, backend B) *TransformerBlock[B]
NewTransformerBlock creates a new Transformer Block.
Parameters:
- config: Configuration (embedDim, numHeads, ffnDim, etc.)
- backend: Computation backend
Example:
backend := autodiff.New(cpu.New())
config := nn.TransformerConfig{
EmbedDim: 768,
NumHeads: 12,
FFNDim: 3072,
NormFirst: true,
UseRMSNorm: true,
NormEps: 1e-5,
}
block := nn.NewTransformerBlock(config, backend)
output := block.Forward(x, mask)
type TransformerConfig ¶ added in v0.4.0
type TransformerConfig = nn.TransformerConfig
TransformerConfig defines the configuration for a Transformer Block.
Fields:
- EmbedDim: Embedding dimension (d_model, e.g., 768 for GPT-2)
- NumHeads: Number of attention heads (e.g., 12 for GPT-2)
- FFNDim: FFN hidden dimension (typically 4 * EmbedDim)
- Dropout: Dropout rate (0 = no dropout, not yet implemented)
- NormFirst: true = Pre-Norm (LLaMA), false = Post-Norm (original)
- UseRMSNorm: true = RMSNorm (LLaMA), false = LayerNorm (BERT/GPT)
- NormEps: Normalization epsilon (1e-5 typical)
Example:
config := nn.TransformerConfig{
EmbedDim: 768,
NumHeads: 12,
FFNDim: 3072,
NormFirst: true,
UseRMSNorm: true,
NormEps: 1e-5,
}