Born - Production-Ready ML for Go

"Models are born production-ready"
Born is a modern deep learning framework for Go, inspired by Burn (Rust). Build ML models in pure Go and deploy as single binaries - no Python runtime, no complex dependencies.
Project Status: v0.3.0 Released! (Transformer Primitives - LLaMA/GPT support!)
Latest: Phase 2.5 complete - modern LLM architectures now supported
Pure Go ML with GPU acceleration - no CGO required!
Why Born?
The Problem
Deploying ML models is hard:
- Python runtime required
- Complex dependency management
- Large Docker images
- Slow startup times
- Integration friction with Go backends
The Born Solution
import "github.com/born-ml/born"
// Models "born" ready for production
model := born.Load("resnet50.born")
prediction := model.Predict(image)
// That's it. No Python. No containers. Just Go.
Benefits:
- Single binary deployment
- Fast startup (< 100ms)
- Small memory footprint
- Native Go integration
- Cross-platform out of the box
Features
- Pure Go - No CGO dependencies, trivial cross-compilation
- Type Safe - Generics-powered API for compile-time guarantees
- GPU Acceleration - WebGPU backend (zero-CGO, 123x speedup)
- Transformer Ready - Full support for LLaMA, GPT, Mistral architectures
- Autodiff - Automatic differentiation via decorators
- Modern Layers - RMSNorm, SiLU, Embedding, Multi-head Attention (v0.4)
- Production Ready - Single binary deployment, fast startup
- WebAssembly - Run inference in browsers natively
Quick Start
Installation
# Clone repository
git clone https://github.com/born-ml/born.git
cd born
# Build
make build
# Or install CLI
make install
Development Setup
Requirements:
- Go 1.25+
- Make (optional, but recommended)
- golangci-lint (for linting)
Build:
make build # Build all binaries
make test # Run tests
make lint # Run linter
make bench # Run benchmarks
Example: MNIST Classification
Working example included! See examples/mnist/ for complete implementation.
package main

import (
	"fmt"

	"github.com/born-ml/born/autodiff"
	"github.com/born-ml/born/backend/cpu"
	"github.com/born-ml/born/nn"
	"github.com/born-ml/born/optim"
)

func main() {
	// Create backend with autodiff
	backend := autodiff.New(cpu.New())

	// Define model (784 → 128 → 10)
	model := NewMNISTNet(backend)

	// Create loss and optimizer
	criterion := nn.NewCrossEntropyLoss(backend)
	optimizer := optim.NewAdam(model.Parameters(), optim.AdamConfig{
		LR:    0.001,
		Betas: [2]float32{0.9, 0.999},
	}, backend)

	// Training loop (batch comes from the data loader in examples/mnist)
	for epoch := range 10 {
		// Forward pass
		logits := model.Forward(batch.ImagesTensor)
		loss := criterion.Forward(logits, batch.LabelsTensor)

		// Backward pass
		optimizer.ZeroGrad()
		grads := backend.Backward(loss.Raw())
		optimizer.Step(grads)

		// Log progress
		acc := nn.Accuracy(logits, batch.LabelsTensor)
		fmt.Printf("Epoch %d: Loss=%.4f, Accuracy=%.2f%%\n",
			epoch, loss.Raw().AsFloat32()[0], acc*100)
	}
}
Run it: cd examples/mnist && go run .
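The NewMNISTNet constructor lives in examples/mnist. As a rough, framework-independent sketch of what a 784 → 128 → 10 MLP forward pass computes (plain Go with tiny stand-in dimensions, not Born's actual API):

```go
package main

import (
	"fmt"
	"math"
)

// linear computes y = xW + b for a single input vector.
func linear(x []float32, w [][]float32, b []float32) []float32 {
	out := make([]float32, len(b))
	for j := range out {
		sum := b[j]
		for i, xi := range x {
			sum += xi * w[i][j]
		}
		out[j] = sum
	}
	return out
}

// relu applies max(0, v) elementwise.
func relu(x []float32) []float32 {
	out := make([]float32, len(x))
	for i, v := range x {
		out[i] = float32(math.Max(0, float64(v)))
	}
	return out
}

func main() {
	// Tiny stand-in for 784 → 128 → 10: here 4 → 3 → 2.
	x := []float32{1, 0, -1, 0.5}
	w1 := [][]float32{{0.1, 0.2, 0.3}, {0.4, 0.5, 0.6}, {0.7, 0.8, 0.9}, {1, 1, 1}}
	b1 := []float32{0, 0, 0}
	w2 := [][]float32{{1, -1}, {0.5, 0.5}, {-0.5, 1}}
	b2 := []float32{0.1, -0.1}

	h := relu(linear(x, w1, b1)) // hidden layer
	logits := linear(h, w2, b2)  // output logits, one per class
	fmt.Println(len(logits))     // 2
}
```

In the real example, the framework's Linear and ReLU modules replace these hand-rolled functions and operate on batched tensors.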
Core Features:
- ✅ Tensor operations (Add, MatMul, Reshape, Exp, Sqrt, Cat, etc.)
- ✅ 31 type-safe public API operations (MulScalar, Greater, Softmax, Int32, etc.)
- ✅ Automatic differentiation with gradient tape
- ✅ Neural network modules (Linear, Conv2D, ReLU, SiLU, RMSNorm, Embedding)
- ✅ Optimizers (SGD with momentum, Adam with bias correction)
- ✅ Losses (CrossEntropyLoss with numerical stability)
- ✅ GPU acceleration (WebGPU - 123x speedup)
- ✅ Transformer primitives (for LLaMA, GPT, Mistral architectures)
Architecture
Backend Abstraction
Born uses a backend interface for device independence:
type Backend interface {
	Add(a, b *RawTensor) *RawTensor
	MatMul(a, b *RawTensor) *RawTensor
	// ... other operations
}
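To make the pattern concrete, here is a minimal, self-contained sketch of the idea, simplifying RawTensor to a flat slice plus a shape (these are illustrative stand-ins, not Born's real types):

```go
package main

import "fmt"

// RawTensor is simplified here to a flat slice plus a shape.
type RawTensor struct {
	data  []float32
	shape []int
}

// Backend abstracts the device that executes tensor ops.
type Backend interface {
	Add(a, b *RawTensor) *RawTensor
}

// CPU is a trivial pure-Go backend.
type CPU struct{}

func (CPU) Add(a, b *RawTensor) *RawTensor {
	out := &RawTensor{data: make([]float32, len(a.data)), shape: a.shape}
	for i := range a.data {
		out.data[i] = a.data[i] + b.data[i]
	}
	return out
}

func main() {
	var be Backend = CPU{} // swap in a GPU backend without touching call sites
	a := &RawTensor{data: []float32{1, 2}, shape: []int{2}}
	b := &RawTensor{data: []float32{3, 4}, shape: []int{2}}
	fmt.Println(be.Add(a, b).data) // [4 6]
}
```

Because callers only see the Backend interface, the same model code can run on CPU, WebGPU, or any future backend.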
Available Backends:
| Backend | Status | Description |
|---------|--------|-------------|
| CPU | ✅ Available | Pure Go implementation (v0.1.1) |
| WebGPU | ✅ Available | Zero-CGO GPU via go-webgpu (v0.2.0) |
| Vulkan | Planned Q3 2025 | Cross-platform GPU compute |
| CUDA | Planned Q3 2025 | NVIDIA GPU via zero-CGO |
| Metal | Planned Q4 2025 | Apple GPU (macOS/iOS) |
Decorator Pattern
Functionality composed via decorators (inspired by Burn):
// Basic backend
base := cpu.New()
// Add autodiff
withAutodiff := autodiff.New(base)
// Add kernel fusion
optimized := fusion.New(withAutodiff)
// Your code works with any backend!
model := createModel(optimized)
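The same idea in a self-contained, runnable sketch (hypothetical types, not Born's actual decorators): each decorator wraps another Backend and adds behavior around the delegated call, the way an autodiff tape records operations.

```go
package main

import "fmt"

type Backend interface {
	Add(a, b float32) float32
}

type cpuBackend struct{}

func (cpuBackend) Add(a, b float32) float32 { return a + b }

// tapeBackend decorates another Backend, recording each op
// before delegating - the shape of an autodiff gradient tape.
type tapeBackend struct {
	inner Backend
	ops   []string
}

func (t *tapeBackend) Add(a, b float32) float32 {
	t.ops = append(t.ops, "Add")
	return t.inner.Add(a, b)
}

func main() {
	traced := &tapeBackend{inner: cpuBackend{}}

	// Code written against Backend works with either implementation.
	var be Backend = traced
	fmt.Println(be.Add(1, 2)) // 3
	fmt.Println(traced.ops)   // [Add]
}
```

Stacking decorators (autodiff over fusion over CPU) is just repeated wrapping; the outermost value still satisfies Backend.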
Type Safety with Generics
type Tensor[T DType, B Backend] struct {
	raw     *RawTensor
	backend B
}

// Compile-time type checking: operand dtypes and backends must match
func (t *Tensor[T, B]) MatMul(other *Tensor[T, B]) *Tensor[T, B]
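A runnable miniature of the same idea (generic over the element type only; Born's real API is also generic over the backend - the names below are illustrative):

```go
package main

import "fmt"

// Number constrains element types, standing in for Born's DType.
type Number interface {
	~float32 | ~float64 | ~int32
}

// Tensor is a 1-D tensor generic over its element type.
type Tensor[T Number] struct {
	data []T
}

// Dot only accepts a tensor of the same element type:
// mixing Tensor[float32] and Tensor[float64] fails at compile time.
func (t Tensor[T]) Dot(other Tensor[T]) T {
	var sum T
	for i, v := range t.data {
		sum += v * other.data[i]
	}
	return sum
}

func main() {
	a := Tensor[float32]{data: []float32{1, 2, 3}}
	b := Tensor[float32]{data: []float32{4, 5, 6}}
	fmt.Println(a.Dot(b)) // 32
}
```

The payoff is that dtype mismatches become compiler errors instead of runtime shape/dtype panics.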
Roadmap
Phase 1: Core (v0.1) - ✅ COMPLETE (Nov 2025)
- Tensor API with generics
- CPU backend (pure Go)
- Autodiff decorator with gradient tape
- NN modules (Linear, ReLU, Sigmoid, Tanh, Sequential)
- SGD/Adam optimizers with momentum/bias correction
- CrossEntropyLoss with numerical stability
- MNIST classification example
Status: All 7 core tasks complete. 132 unit tests, 83.8% average coverage, 0 linter issues.
Phase 2: GPU Backends (v0.2) - ✅ COMPLETE (Nov 2025)
- WebGPU backend (zero-CGO via go-webgpu)
- WGSL compute shaders (12 operations)
- GPU buffer pooling & memory management
- MNIST GPU inference (10.9x speedup)
Status: All 5 GPU tasks complete. 123x MatMul speedup, ~16000 samples/sec throughput.
Phase 2.5: Transformer Primitives (v0.3) - ✅ COMPLETE
- Math operations (Exp, Sqrt, Rsqrt, Cos, Sin, Log)
- Reductions (SumDim, MeanDim with keepDim, Sum, Argmax)
- Tensor manipulation (Cat, Chunk, Unsqueeze, Squeeze, Expand)
- Indexing (Gather, Where)
- Modern layers (SiLU, RMSNorm, Embedding, Softmax)
- Gradient control (NoGrad, Detach)
- 31 public API operations (MulScalar, Greater/Gt, Int32, etc.)
Status: All 7 tasks complete. 112 new tests, 0 linter issues. LLaMA/GPT/Mistral architectures now supported!
Phase 3: Attention Mechanisms (v0.4) - Q1 2026
- Multi-head attention (MHA)
- Scaled dot-product attention
- KV-cache for inference
- Layer normalization variants
- Linux/macOS WebGPU support
Phase 4 (v0.5) - Q2 2026
- ONNX import/export
- Model quantization (INT8, FP16)
- Pre-trained model loading
Long-Term: v1.0 LTS - 2027-2028
- Training utilities (BatchNorm, Dropout)
- Distributed training
- Advanced optimizations
- Model zoo
Full roadmap: See ROADMAP.md
Documentation
For Users
For Contributors
Philosophy
"Born Ready"
Models trained anywhere (PyTorch, TensorFlow) are imported and born production-ready:
Training → Birth → Production
 (Burn)    (Born)    (Run)
PyTorch trains → Born imports → Born deploys
TensorFlow trains → Born imports → Born deploys
Born trains → Born ready → Born serves
Production First
- Single Binary: Entire model in one executable
- No Runtime: No Python, no dependencies
- Fast Startup: < 100ms cold start
- Small Memory: Minimal footprint
- Cloud Native: Natural fit for Go services
Developer Experience
- Type Safe: Catch errors at compile time
- Clean API: Intuitive and ergonomic
- Great Docs: Comprehensive documentation
- Easy Deploy: go build and you're done
Actual Benchmarks (AMD Ryzen 9 5950X, NVIDIA RTX 3080):
Matrix Operations (WebGPU vs CPU)
| Operation | CPU | GPU | Speedup |
|-----------|-----|-----|---------|
| MatMul 1024x1024 | 7143ms | 58ms | 123x |
| MatMul 512x512 | 499ms | 12ms | 41x |
| MatMul 256x256 | 56ms | 3.7ms | 15x |
Neural Network Inference
| Batch Size | CPU | GPU | Speedup | Throughput |
|------------|-----|-----|---------|------------|
| 64 | 48ms | 19ms | 2.5x | 3,357/s |
| 256 | 182ms | 21ms | 8.5x | 11,883/s |
| 512 | 348ms | 32ms | 10.9x | 15,973/s |
Note: CPU backend uses naive O(n³) MatMul. SIMD optimizations planned for future releases.
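For reference, the naive O(n³) algorithm that the CPU numbers above reflect looks like the triple loop below (a generic sketch of the textbook baseline, not Born's internal code):

```go
package main

import "fmt"

// matMul multiplies an m×k matrix by a k×n matrix (both stored
// row-major in flat slices) with three nested loops - the O(n³)
// baseline that SIMD and blocking optimizations would improve on.
func matMul(a, b []float32, m, k, n int) []float32 {
	out := make([]float32, m*n)
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			var sum float32
			for p := 0; p < k; p++ {
				sum += a[i*k+p] * b[p*n+j]
			}
			out[i*n+j] = sum
		}
	}
	return out
}

func main() {
	// [1 2; 3 4] × [5 6; 7 8] = [19 22; 43 50]
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(matMul(a, b, 2, 2, 2)) // [19 22 43 50]
}
```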
Inspiration
Born is inspired by and learns from:
- Burn - Architecture patterns, decorator design
- PyTorch - API ergonomics
- TinyGrad - Simplicity principles
- Gonum - Go numerical computing
- HDF5 for Go - Model serialization, dataset storage (planned)
Acknowledgments
Special thanks to the projects that made Born possible:
Born's GPU acceleration is powered by go-webgpu - a remarkable pure Go binding for WebGPU via wgpu-native.
Why go-webgpu is special:
- Zero CGO - Pure Go bindings using goffi for FFI
- Cross-platform - Works on Windows, Linux, macOS
- Modern API - Clean, idiomatic Go interface to WebGPU
- Active development - Maintained and improving
Without go-webgpu, Born would need CGO for GPU support, making cross-compilation complex and defeating our "pure Go" goal. This library enables us to offer production-ready GPU acceleration while maintaining the simplicity of go build.
Thank you to Alfred Dobra and all contributors!
Project is under active development. Star the repo to follow progress!
License
Licensed under the Apache License, Version 2.0.
Why Apache 2.0?
- ✅ Patent protection - Critical for ML algorithms and production use
- ✅ Enterprise-friendly - Clear legal framework for commercial adoption
- ✅ Industry standard - Same as TensorFlow, battle-tested in ML ecosystem
- ✅ Contributor protection - Explicit patent grant and termination clauses
See LICENSE file for full terms.
FAQ
Q: Why not use Gorgonia?
A: Gorgonia is great but uses a different approach. Born focuses on modern Go (generics), pure Go (no CGO), and production-first design inspired by Burn.
Q: When will it be ready?
A: Core features (v0.1-v0.3) are RELEASED! Includes CPU/GPU backends and transformer primitives. ONNX import targeted for v0.5.0 (Q2 2026).
Q: Can I use PyTorch models?
A: Yes! Via ONNX import (v0.5.0, Q2 2026). Train in PyTorch, deploy with Born.
Q: WebAssembly support?
A: Yes! Pure Go compiles to WASM natively. Inference in browsers out of the box.
Q: How can I help?
A: Watch this space! Contributing guide coming soon.