Overview
Package optim implements optimization algorithms for training neural networks.
This package provides:
- Optimizer interface: Base interface for all optimizers
- SGD: Stochastic Gradient Descent with momentum
- Adam: Adaptive Moment Estimation
The design is inspired by PyTorch's torch.optim, adapted for Go with type safety.
Example usage:
// Create optimizer
optimizer := optim.NewAdam(model.Parameters(), optim.AdamConfig{
	LR: 0.001,
})

// Training loop
for range epochs {
	// Record the forward pass so gradients can be computed
	backend.Tape().StartRecording()
	output := model.Forward(input)
	loss := lossFunc.Forward(output, targets)

	// Compute gradients
	grads := autodiff.Backward(loss, backend)

	// Update parameters, then clear gradients for the next iteration
	optimizer.Step(grads)
	optimizer.ZeroGrad()
}
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Adam
Adam implements the Adam (Adaptive Moment Estimation) optimizer.
Adam combines ideas from RMSprop and momentum:
- Maintains exponential moving averages of gradients (first moment)
- Maintains exponential moving averages of squared gradients (second moment)
- Applies bias correction to compensate for initialization at zero
Update rule:
m_t = beta1 * m_{t-1} + (1-beta1) * gradient // First moment
v_t = beta2 * v_{t-1} + (1-beta2) * gradient² // Second moment
m_hat = m_t / (1 - beta1^t) // Bias correction
v_hat = v_t / (1 - beta2^t) // Bias correction
param = param - lr * m_hat / (sqrt(v_hat) + eps) // Parameter update
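As a minimal sketch, the update rule above can be written for a plain []float32 parameter slice. The adamState type and adamStep function are hypothetical illustrations, not this package's internals; the real optimizer operates on backend tensors. On the first step the bias correction makes m_hat equal the gradient and v_hat its square, so each parameter moves by roughly lr in the direction opposite the gradient's sign.

```go
package main

import (
	"fmt"
	"math"
)

// adamState is a hypothetical container for per-parameter moment
// estimates; it is not part of package optim.
type adamState struct {
	m, v []float32 // first and second moment estimates, start at zero
	t    int       // timestep, incremented before each update
}

// adamStep applies one Adam update to params in place.
func adamStep(params, grads []float32, s *adamState, lr, beta1, beta2, eps float32) {
	s.t++
	// Bias-correction denominators: 1 - beta^t.
	bc1 := 1 - float32(math.Pow(float64(beta1), float64(s.t)))
	bc2 := 1 - float32(math.Pow(float64(beta2), float64(s.t)))
	for i, g := range grads {
		s.m[i] = beta1*s.m[i] + (1-beta1)*g   // first moment
		s.v[i] = beta2*s.v[i] + (1-beta2)*g*g // second moment
		mHat := s.m[i] / bc1
		vHat := s.v[i] / bc2
		params[i] -= lr * mHat / (float32(math.Sqrt(float64(vHat))) + eps)
	}
}

func main() {
	params := []float32{1.0, -2.0}
	grads := []float32{0.5, -0.1}
	s := &adamState{m: make([]float32, 2), v: make([]float32, 2)}
	adamStep(params, grads, s, 0.001, 0.9, 0.999, 1e-8)
	fmt.Println(params)
}
```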
Adam is particularly well-suited for:
- Problems that are large in terms of data and/or parameters
- Non-stationary objectives
- Problems with very noisy and/or sparse gradients
Reference: "Adam: A Method for Stochastic Optimization" (Kingma & Ba, 2014)
Example:
optimizer := optim.NewAdam(model.Parameters(), optim.AdamConfig{
	LR:    0.001,
	Betas: [2]float32{0.9, 0.999},
	Eps:   1e-8,
})
for range epochs {
	loss := trainStep(model, batch) // user-defined forward pass returning the loss
	grads := autodiff.Backward(loss, backend)
	optimizer.Step(grads)
	optimizer.ZeroGrad()
}
func NewAdam
NewAdam creates a new Adam optimizer.
Parameters:
- params: Model parameters to optimize
- config: Adam configuration (LR, Betas, Eps)
Returns a new Adam optimizer. Any hyperparameter not specified in config falls back to its default:
- LR: 0.001
- Betas[0] (beta1): 0.9
- Betas[1] (beta2): 0.999
- Eps: 1e-8
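One plausible way to implement the default substitution described above is to treat zero-valued fields as unset. This is a sketch under that assumption; adamConfig and withDefaults are hypothetical stand-ins, and the package may detect unset fields differently.

```go
package main

import "fmt"

// adamConfig mirrors the fields of optim.AdamConfig for illustration only.
type adamConfig struct {
	LR    float32
	Betas [2]float32
	Eps   float32
}

// withDefaults returns a copy of cfg in which every zero-valued
// hyperparameter is replaced by its documented default.
func withDefaults(cfg adamConfig) adamConfig {
	if cfg.LR == 0 {
		cfg.LR = 0.001
	}
	if cfg.Betas[0] == 0 {
		cfg.Betas[0] = 0.9
	}
	if cfg.Betas[1] == 0 {
		cfg.Betas[1] = 0.999
	}
	if cfg.Eps == 0 {
		cfg.Eps = 1e-8
	}
	return cfg
}

func main() {
	cfg := withDefaults(adamConfig{LR: 0.01})
	fmt.Println(cfg.LR, cfg.Betas, cfg.Eps) // 0.01 [0.9 0.999] 1e-08
}
```

A consequence of checking for zero is that a deliberate zero (for example Betas[0] = 0) cannot be distinguished from an unset field.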
func (*Adam[B]) GetTimestep
GetTimestep returns the current timestep.
Useful for monitoring optimizer state.
func (*Adam[B]) SetLR
SetLR updates the learning rate.
Useful for learning rate scheduling during training.
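For example, a step-decay schedule can call SetLR between epochs. The sketch below relies only on the GetLR and SetLR methods documented here; the lrSetter interface, stepDecay function, and fakeOptimizer type are hypothetical and exist only to keep the example self-contained.

```go
package main

import "fmt"

// lrSetter captures the two learning-rate methods documented for the optimizers.
type lrSetter interface {
	GetLR() float32
	SetLR(lr float32)
}

// stepDecay multiplies the learning rate by gamma every stepSize epochs.
func stepDecay(opt lrSetter, epoch, stepSize int, gamma float32) {
	if epoch > 0 && epoch%stepSize == 0 {
		opt.SetLR(opt.GetLR() * gamma)
	}
}

// fakeOptimizer is a hypothetical stand-in used only for this example.
type fakeOptimizer struct{ lr float32 }

func (f *fakeOptimizer) GetLR() float32   { return f.lr }
func (f *fakeOptimizer) SetLR(lr float32) { f.lr = lr }

func main() {
	opt := &fakeOptimizer{lr: 0.1}
	for epoch := 0; epoch < 30; epoch++ {
		stepDecay(opt, epoch, 10, 0.5)
		// ... one epoch of training would run here ...
	}
	fmt.Println(opt.GetLR()) // 0.025
}
```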
type AdamConfig
type AdamConfig struct {
	LR    float32    // Learning rate (default: 0.001)
	Betas [2]float32 // Coefficients for the running averages of the gradient and its square (default: [0.9, 0.999])
	Eps   float32    // Term added to the denominator for numerical stability (default: 1e-8)
}
AdamConfig holds the configuration for the Adam optimizer.
type Config
type Config struct {
	LR float32 // Learning rate
}
Config is the base configuration for all optimizers.
type Optimizer
type Optimizer interface {
	// Step applies gradient updates to all parameters.
	//
	// Takes a gradient map from Backward() and updates parameters in place.
	// The map should associate each parameter's RawTensor with its gradient.
	//
	// Example:
	//   grads := autodiff.Backward(loss, backend)
	//   optimizer.Step(grads)
	Step(grads map[*tensor.RawTensor]*tensor.RawTensor)

	// ZeroGrad clears all parameter gradients.
	//
	// This should be called before each backward pass so that gradients
	// from previous iterations do not accumulate.
	//
	// Example:
	//   optimizer.ZeroGrad()
	//   loss := model.Forward(...)
	//   grads := autodiff.Backward(loss, backend)
	ZeroGrad()

	// GetLR returns the current learning rate.
	//
	// Useful for monitoring and learning rate scheduling.
	GetLR() float32
}
Optimizer is the base interface for all optimization algorithms.
Optimizers update model parameters based on computed gradients to minimize the loss function during training.
All optimizers must implement:
- Step: Apply gradient updates to parameters
- ZeroGrad: Clear gradients before next iteration
- GetLR: Get current learning rate (for monitoring/scheduling)
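To illustrate the shape of the interface, here is a minimal plain gradient descent optimizer that satisfies a local copy of Optimizer. Because tensor.RawTensor cannot be imported in a standalone example, the sketch substitutes a hypothetical RawTensor holding a []float32; the real type, and how gradients are stored and cleared, will differ.

```go
package main

import "fmt"

// RawTensor is a hypothetical stand-in for tensor.RawTensor.
type RawTensor struct {
	Data []float32
}

// Optimizer mirrors the optim.Optimizer interface using the stand-in type.
type Optimizer interface {
	Step(grads map[*RawTensor]*RawTensor)
	ZeroGrad()
	GetLR() float32
}

// plainGD is plain gradient descent: param = param - lr * gradient.
type plainGD struct {
	params []*RawTensor
	lr     float32
}

// Compile-time check that plainGD implements Optimizer.
var _ Optimizer = (*plainGD)(nil)

func (o *plainGD) Step(grads map[*RawTensor]*RawTensor) {
	for _, p := range o.params {
		g, ok := grads[p]
		if !ok {
			continue // this parameter received no gradient
		}
		for i := range p.Data {
			p.Data[i] -= o.lr * g.Data[i]
		}
	}
}

// ZeroGrad is a no-op here because this sketch stores no gradients;
// they arrive fresh in each Step call.
func (o *plainGD) ZeroGrad() {}

func (o *plainGD) GetLR() float32 { return o.lr }

func main() {
	w := &RawTensor{Data: []float32{1, 2}}
	opt := &plainGD{params: []*RawTensor{w}, lr: 0.5}
	opt.Step(map[*RawTensor]*RawTensor{w: {Data: []float32{2, 4}}})
	fmt.Println(w.Data) // [0 0]
}
```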
type SGD
SGD implements the Stochastic Gradient Descent optimizer with optional momentum.
Update rule without momentum:
param = param - lr * gradient
Update rule with momentum:
velocity = momentum * velocity + gradient
param = param - lr * velocity
Momentum accelerates SGD along directions where gradients are consistent and dampens oscillations.
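The momentum rule can be sketched on plain []float32 slices. sgdMomentumStep and its velocity buffer are hypothetical illustrations of the update rule, not the package's implementation; passing momentum = 0 reduces it to the update without momentum. With a constant gradient the velocity grows step by step, which is the acceleration effect described above.

```go
package main

import "fmt"

// sgdMomentumStep applies one SGD update with momentum in place:
//
//	velocity = momentum * velocity + gradient
//	param    = param - lr * velocity
func sgdMomentumStep(params, grads, velocity []float32, lr, momentum float32) {
	for i, g := range grads {
		velocity[i] = momentum*velocity[i] + g
		params[i] -= lr * velocity[i]
	}
}

func main() {
	params := []float32{1.0}
	velocity := make([]float32, 1)
	grad := []float32{1.0}
	// A constant gradient makes the velocity grow: 1, 1.5, 1.75
	for step := 0; step < 3; step++ {
		sgdMomentumStep(params, grad, velocity, 0.1, 0.5)
	}
	fmt.Println(velocity[0]) // 1.75
}
```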
Example:
optimizer := optim.NewSGD(model.Parameters(), optim.SGDConfig{
	LR:       0.01,
	Momentum: 0.9,
})
for range epochs {
	loss := trainStep(model, batch) // user-defined forward pass returning the loss
	grads := autodiff.Backward(loss, backend)
	optimizer.Step(grads)
	optimizer.ZeroGrad()
}
func NewSGD
NewSGD creates a new SGD optimizer.
Parameters:
- params: Model parameters to optimize
- config: SGD configuration (LR, Momentum)
Returns a new SGD optimizer.
Example:
sgd := optim.NewSGD(model.Parameters(), optim.SGDConfig{
	LR:       0.01,
	Momentum: 0.9,
})
func (*SGD[B]) SetLR
SetLR updates the learning rate.
Useful for learning rate scheduling during training.