Born - Production-Ready ML for Go

"Models are born production-ready"
Born is a modern deep learning framework for Go, inspired by Burn (Rust). Build ML models in pure Go and deploy as single binaries - no Python runtime, no complex dependencies.
Project Status: v0.3.0 Released! (Transformer Primitives - LLaMA/GPT support!)
Latest: Phase 2.5 complete - modern LLM architectures now supported
Pure Go ML with GPU acceleration - no CGO required!
Why Born?
The Problem
Deploying ML models is hard:
- Python runtime required
- Complex dependency management
- Large Docker images
- Slow startup times
- Integration friction with Go backends
The Born Solution
import "github.com/born-ml/born"
// Models "born" ready for production
model := born.Load("resnet50.born")
prediction := model.Predict(image)
// That's it. No Python. No containers. Just Go.
Benefits:
- Single binary deployment
- Fast startup (< 100ms)
- Small memory footprint
- Native Go integration
- Cross-platform out of the box
Features
- Pure Go - No CGO dependencies, trivial cross-compilation
- Type Safe - Generics-powered API for compile-time guarantees
- GPU Acceleration - WebGPU backend (zero-CGO, 123x speedup)
- Transformer Ready - Full support for LLaMA, GPT, Mistral architectures
- Autodiff - Automatic differentiation via decorators
- Modern Layers - RMSNorm, SiLU, Embedding, Multi-head Attention (v0.4)
- Production Ready - Single binary deployment, fast startup
- WebAssembly - Run inference in browsers natively
Quick Start
Installation
# Clone repository
git clone https://github.com/born-ml/born.git
cd born
# Build
make build
# Or install CLI
make install
Development Setup
Requirements:
- Go 1.25+
- Make (optional, but recommended)
- golangci-lint (for linting)
Build:
make build # Build all binaries
make test # Run tests
make lint # Run linter
make bench # Run benchmarks
Example: MNIST Classification
Working example included! See examples/mnist/ for complete implementation.
package main

import (
	"fmt"

	"github.com/born-ml/born/autodiff"
	"github.com/born-ml/born/backend/cpu"
	"github.com/born-ml/born/nn"
	"github.com/born-ml/born/optim"
)

func main() {
	// Create backend with autodiff
	backend := autodiff.New(cpu.New())

	// Define model (784 → 128 → 10)
	model := NewMNISTNet(backend)

	// Create loss and optimizer
	criterion := nn.NewCrossEntropyLoss(backend)
	optimizer := optim.NewAdam(model.Parameters(), optim.AdamConfig{
		LR:    0.001,
		Betas: [2]float32{0.9, 0.999},
	}, backend)

	// Training loop (batch comes from the data loader in examples/mnist)
	for epoch := range 10 {
		// Forward pass
		logits := model.Forward(batch.ImagesTensor)
		loss := criterion.Forward(logits, batch.LabelsTensor)

		// Backward pass
		optimizer.ZeroGrad()
		grads := backend.Backward(loss.Raw())
		optimizer.Step(grads)

		// Log progress
		acc := nn.Accuracy(logits, batch.LabelsTensor)
		fmt.Printf("Epoch %d: Loss=%.4f, Accuracy=%.2f%%\n",
			epoch, loss.Raw().AsFloat32()[0], acc*100)
	}
}
Run it: cd examples/mnist && go run .
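The NewMNISTNet constructor lives in examples/mnist. As a rough, framework-independent sketch of what a 784 → 128 → 10 MLP forward pass computes (plain Go with tiny stand-in dimensions, not Born's actual API):

```go
package main

import (
	"fmt"
	"math"
)

// linear computes y = xW + b for a single input vector.
func linear(x []float32, w [][]float32, b []float32) []float32 {
	out := make([]float32, len(b))
	for j := range out {
		sum := b[j]
		for i, xi := range x {
			sum += xi * w[i][j]
		}
		out[j] = sum
	}
	return out
}

// relu applies max(0, v) elementwise.
func relu(x []float32) []float32 {
	out := make([]float32, len(x))
	for i, v := range x {
		out[i] = float32(math.Max(0, float64(v)))
	}
	return out
}

func main() {
	// Tiny stand-in for 784 → 128 → 10: here 4 → 3 → 2.
	x := []float32{1, 0, -1, 0.5}
	w1 := [][]float32{{0.1, 0.2, 0.3}, {0.4, 0.5, 0.6}, {0.7, 0.8, 0.9}, {1, 1, 1}}
	b1 := []float32{0, 0, 0}
	w2 := [][]float32{{1, -1}, {0.5, 0.5}, {-0.5, 1}}
	b2 := []float32{0.1, -0.1}

	h := relu(linear(x, w1, b1)) // hidden layer
	logits := linear(h, w2, b2)  // output logits, one per class
	fmt.Println(len(logits))     // 2
}
```

In the real example, the framework's Linear and ReLU modules replace these hand-rolled functions and operate on batched tensors.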
Core Features:
- ✅ Tensor operations (Add, MatMul, Reshape, Exp, Sqrt, Cat, etc.)
- ✅ 31 type-safe public API operations (MulScalar, Greater, Softmax, Int32, etc.)
- ✅ Automatic differentiation with gradient tape
- ✅ Neural network modules (Linear, Conv2D, ReLU, SiLU, RMSNorm, Embedding)
- ✅ Optimizers (SGD with momentum, Adam with bias correction)
- ✅ Losses (CrossEntropyLoss with numerical stability)
- ✅ GPU acceleration (WebGPU - 123x speedup)
- ✅ Transformer primitives (for LLaMA, GPT, Mistral architectures)
Architecture
Backend Abstraction
Born uses a backend interface for device independence:
type Backend interface {
	Add(a, b *RawTensor) *RawTensor
	MatMul(a, b *RawTensor) *RawTensor
	// ... other operations
}
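To make the pattern concrete, here is a minimal, self-contained sketch of the idea, simplifying RawTensor to a flat slice plus a shape (these are illustrative stand-ins, not Born's real types):

```go
package main

import "fmt"

// RawTensor is simplified here to a flat slice plus a shape.
type RawTensor struct {
	data  []float32
	shape []int
}

// Backend abstracts the device that executes tensor ops.
type Backend interface {
	Add(a, b *RawTensor) *RawTensor
}

// CPU is a trivial pure-Go backend.
type CPU struct{}

func (CPU) Add(a, b *RawTensor) *RawTensor {
	out := &RawTensor{data: make([]float32, len(a.data)), shape: a.shape}
	for i := range a.data {
		out.data[i] = a.data[i] + b.data[i]
	}
	return out
}

func main() {
	var be Backend = CPU{} // swap in a GPU backend without touching call sites
	a := &RawTensor{data: []float32{1, 2}, shape: []int{2}}
	b := &RawTensor{data: []float32{3, 4}, shape: []int{2}}
	fmt.Println(be.Add(a, b).data) // [4 6]
}
```

Because callers only see the Backend interface, the same model code can run on CPU, WebGPU, or any future backend.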
Available Backends:
| Backend | Status | Description |
|---------|--------|-------------|
| CPU | ✅ Available | Pure Go implementation (v0.1.1) |
| WebGPU | ✅ Available | Zero-CGO GPU via go-webgpu (v0.2.0) |
| Vulkan | Planned Q3 2025 | Cross-platform GPU compute |
| CUDA | Planned Q3 2025 | NVIDIA GPU via zero-CGO |
| Metal | Planned Q4 2025 | Apple GPU (macOS/iOS) |
Decorator Pattern
Functionality composed via decorators (inspired by Burn):
// Basic backend
base := cpu.New()
// Add autodiff
withAutodiff := autodiff.New(base)
// Add kernel fusion
optimized := fusion.New(withAutodiff)
// Your code works with any backend!
model := createModel(optimized)
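The same idea in a self-contained, runnable sketch (hypothetical types, not Born's actual decorators): each decorator wraps another Backend and adds behavior around the delegated call, the way an autodiff tape records operations.

```go
package main

import "fmt"

type Backend interface {
	Add(a, b float32) float32
}

type cpuBackend struct{}

func (cpuBackend) Add(a, b float32) float32 { return a + b }

// tapeBackend decorates another Backend, recording each op
// before delegating - the shape of an autodiff gradient tape.
type tapeBackend struct {
	inner Backend
	ops   []string
}

func (t *tapeBackend) Add(a, b float32) float32 {
	t.ops = append(t.ops, "Add")
	return t.inner.Add(a, b)
}

func main() {
	traced := &tapeBackend{inner: cpuBackend{}}

	// Code written against Backend works with either implementation.
	var be Backend = traced
	fmt.Println(be.Add(1, 2)) // 3
	fmt.Println(traced.ops)   // [Add]
}
```

Stacking decorators (autodiff over fusion over CPU) is just repeated wrapping; the outermost value still satisfies Backend.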
Type Safety with Generics
type Tensor[T DType, B Backend] struct {
	raw     *RawTensor
	backend B
}

// Compile-time type checking: operand dtypes and backends must match
func (t *Tensor[T, B]) MatMul(other *Tensor[T, B]) *Tensor[T, B]
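A runnable miniature of the same idea (generic over the element type only; Born's real API is also generic over the backend - the names below are illustrative):

```go
package main

import "fmt"

// Number constrains element types, standing in for Born's DType.
type Number interface {
	~float32 | ~float64 | ~int32
}

// Tensor is a 1-D tensor generic over its element type.
type Tensor[T Number] struct {
	data []T
}

// Dot only accepts a tensor of the same element type:
// mixing Tensor[float32] and Tensor[float64] fails at compile time.
func (t Tensor[T]) Dot(other Tensor[T]) T {
	var sum T
	for i, v := range t.data {
		sum += v * other.data[i]
	}
	return sum
}

func main() {
	a := Tensor[float32]{data: []float32{1, 2, 3}}
	b := Tensor[float32]{data: []float32{4, 5, 6}}
	fmt.Println(a.Dot(b)) // 32
}
```

The payoff is that dtype mismatches become compiler errors instead of runtime shape/dtype panics.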
Roadmap
Phase 1: Core (v0.1) - ✅ COMPLETE (Nov 2025)
- Tensor API with generics
- CPU backend (pure Go)
- Autodiff decorator with gradient tape
- NN modules (Linear, ReLU, Sigmoid, Tanh, Sequential)
- SGD/Adam optimizers with momentum/bias correction
- CrossEntropyLoss with numerical stability
- MNIST classification example
Status: All 7 core tasks complete. 132 unit tests, 83.8% average coverage, 0 linter issues.
Phase 2: GPU Backends (v0.2) - ✅ COMPLETE (Nov 2025)
- WebGPU backend (zero-CGO via go-webgpu)
- WGSL compute shaders (12 operations)
- GPU buffer pooling & memory management
- MNIST GPU inference (10.9x speedup)
Status: All 5 GPU tasks complete. 123x MatMul speedup, ~16000 samples/sec throughput.
Phase 2.5: Transformer Primitives (v0.3) - ✅ COMPLETE
- Math operations (Exp, Sqrt, Rsqrt, Cos, Sin, Log)
- Reductions (SumDim, MeanDim with keepDim, Sum, Argmax)
- Tensor manipulation (Cat, Chunk, Unsqueeze, Squeeze, Expand)
- Indexing (Gather, Where)
- Modern layers (SiLU, RMSNorm, Embedding, Softmax)
- Gradient control (NoGrad, Detach)
- 31 public API operations (MulScalar, Greater/Gt, Int32, etc.)
Status: All 7 tasks complete. 112 new tests, 0 linter issues. LLaMA/GPT/Mistral architectures now supported!
Phase 3: Attention Mechanisms (v0.4) - Q1 2026
- Multi-head attention (MHA)
- Scaled dot-product attention
- KV-cache for inference
- Layer normalization variants
- Linux/macOS WebGPU support
Phase 4 (v0.5) - Q2 2026
- ONNX import/export
- Model quantization (INT8, FP16)
- Pre-trained model loading
Long-Term: v1.0 LTS - 2027-2028
- Training utilities (BatchNorm, Dropout)
- Distributed training
- Advanced optimizations
- Model zoo
Full roadmap: See ROADMAP.md
Documentation
For Users
For Contributors
Philosophy
"Born Ready"
Models trained anywhere (PyTorch, TensorFlow) are imported and born production-ready:
Training → Birth → Production
 (Burn)    (Born)    (Run)
PyTorch trains → Born imports → Born deploys
TensorFlow trains → Born imports → Born deploys
Born trains → Born ready → Born serves
Production First
- Single Binary: Entire model in one executable
- No Runtime: No Python, no dependencies
- Fast Startup: < 100ms cold start
- Small Memory: Minimal footprint
- Cloud Native: Natural fit for Go services
Developer Experience
- Type Safe: Catch errors at compile time
- Clean API: Intuitive and ergonomic
- Great Docs: Comprehensive documentation
- Easy Deploy: go build and you're done
Actual Benchmarks (AMD Ryzen 9 5950X, NVIDIA RTX 3080):
Matrix Operations (WebGPU vs CPU)
| Operation | CPU | GPU | Speedup |
|-----------|-----|-----|---------|
| MatMul 1024x1024 | 7143ms | 58ms | 123x |
| MatMul 512x512 | 499ms | 12ms | 41x |
| MatMul 256x256 | 56ms | 3.7ms | 15x |
Neural Network Inference
| Batch Size | CPU | GPU | Speedup | Throughput |
|------------|-----|-----|---------|------------|
| 64 | 48ms | 19ms | 2.5x | 3,357/s |
| 256 | 182ms | 21ms | 8.5x | 11,883/s |
| 512 | 348ms | 32ms | 10.9x | 15,973/s |
Note: CPU backend uses naive O(n³) MatMul. SIMD optimizations planned for future releases.
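For reference, the naive O(n³) algorithm that the CPU numbers above reflect looks like the triple loop below (a generic sketch of the textbook baseline, not Born's internal code):

```go
package main

import "fmt"

// matMul multiplies an m×k matrix by a k×n matrix (both stored
// row-major in flat slices) with three nested loops - the O(n³)
// baseline that SIMD and blocking optimizations would improve on.
func matMul(a, b []float32, m, k, n int) []float32 {
	out := make([]float32, m*n)
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			var sum float32
			for p := 0; p < k; p++ {
				sum += a[i*k+p] * b[p*n+j]
			}
			out[i*n+j] = sum
		}
	}
	return out
}

func main() {
	// [1 2; 3 4] × [5 6; 7 8] = [19 22; 43 50]
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(matMul(a, b, 2, 2, 2)) // [19 22 43 50]
}
```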
Inspiration
Born is inspired by and learns from:
- Burn - Architecture patterns, decorator design
- PyTorch - API ergonomics
- TinyGrad - Simplicity principles
- Gonum - Go numerical computing
- HDF5 for Go - Model serialization, dataset storage (planned)
Acknowledgments
Special thanks to the projects that made Born possible:
Born's GPU acceleration is powered by go-webgpu - a remarkable pure Go binding for WebGPU via wgpu-native.
Why go-webgpu is special:
- Zero CGO - Pure Go bindings using goffi for FFI
- Cross-platform - Works on Windows, Linux, macOS
- Modern API - Clean, idiomatic Go interface to WebGPU
- Active development - Maintained and improving
Without go-webgpu, Born would need CGO for GPU support, making cross-compilation complex and defeating our "pure Go" goal. This library enables us to offer production-ready GPU acceleration while maintaining the simplicity of go build.
Thank you to Alfred Dobra and all contributors!
Project is under active development. Star the repo to follow progress!
License
Licensed under the Apache License, Version 2.0.
Why Apache 2.0?
- ✅ Patent protection - Critical for ML algorithms and production use
- ✅ Enterprise-friendly - Clear legal framework for commercial adoption
- ✅ Industry standard - Same as TensorFlow, battle-tested in ML ecosystem
- ✅ Contributor protection - Explicit patent grant and termination clauses
See LICENSE file for full terms.
FAQ
Q: Why not use Gorgonia?
A: Gorgonia is great but uses a different approach. Born focuses on modern Go (generics), pure Go (no CGO), and production-first design inspired by Burn.
Q: When will it be ready?
A: Core features (v0.1-v0.3) are RELEASED! Includes CPU/GPU backends and transformer primitives. ONNX import targeted for v0.5.0 (Q2 2026).
Q: Can I use PyTorch models?
A: Yes! Via ONNX import (v0.5.0, Q2 2026). Train in PyTorch, deploy with Born.
Q: WebAssembly support?
A: Yes! Pure Go compiles to WASM natively. Inference in browsers out of the box.
Q: How can I help?
A: Watch this space! Contributing guide coming soon.