loom

v0.0.8
Published: Jan 19, 2026 License: Apache-2.0

README

LOOM - Deterministic Neural Virtual Machine

"The SQLite of AI": A Polyglot Neural VM with Bit-Exact Reproducibility

Loom is a Deterministic Neural Virtual Machine (DNVM), a portable execution environment for neural networks that guarantees bitwise-identical results across all platforms, backends, and language bindings. It combines a JIT compiler (generating WGSL shaders for WebGPU at runtime) with a pure Go CPU backend to deliver the same numerical results everywhere:

  • Portable IR: JSON network configs are your "bytecode"; define once, execute anywhere.
  • JIT to GPU: Runtime WGSL shader generation → WebGPU compute pipelines.
  • Polyglot FFI: Single Go core exports to Python, C#, TypeScript, WASM via C-ABI.
  • Bit-Exact: 0.0000000000 difference between CPU and GPU, x86 and ARM, native and browser.

Unlike frameworks that disclaim cross-platform reproducibility, Loom enforces determinism by design. It compiles to a single binary with zero dependencies, transparently routing operations to CPU or WebGPU without changing user code.
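
To make the "JSON as bytecode" idea concrete, here is a minimal sketch of decoding a network config with the Go standard library. Only the `batch_size` and `layers` keys appear in this README (in the Python binding example); the per-layer fields `type`, `input_size`, `output_size`, and `activation` are assumptions for illustration, and Loom's real schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LayerSpec is a hypothetical per-layer record; Loom's real schema may differ.
type LayerSpec struct {
	Type       string `json:"type"`
	InputSize  int    `json:"input_size"`
	OutputSize int    `json:"output_size"`
	Activation string `json:"activation"`
}

// NetworkConfig mirrors the two top-level keys shown in the binding examples.
type NetworkConfig struct {
	BatchSize int         `json:"batch_size"`
	Layers    []LayerSpec `json:"layers"`
}

// ParseConfig decodes a JSON network description and rejects empty networks.
func ParseConfig(data []byte) (NetworkConfig, error) {
	var cfg NetworkConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("decode config: %w", err)
	}
	if len(cfg.Layers) == 0 {
		return cfg, fmt.Errorf("config has no layers")
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"batch_size": 1, "layers": [
		{"type": "dense", "input_size": 4, "output_size": 8, "activation": "relu"},
		{"type": "dense", "input_size": 8, "output_size": 2, "activation": "sigmoid"}
	]}`)
	cfg, err := ParseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.BatchSize, len(cfg.Layers), cfg.Layers[0].Type) // prints "1 2 dense"
}
```

Because the config is plain data, the same bytes can be handed to any binding without recompilation, which is the property the "bytecode" analogy relies on.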


🌍 Cross-Ecosystem Compatibility

Models trained on any platform load unchanged on all the others, with bit-for-bit identical results across Go, Python, C#, TypeScript, and browser WASM.

| Platform | Package | Install |
|---|---|---|
| Go | GitHub | go get github.com/openfluke/loom |
| Python | PyPI | pip install welvet |
| C#/.NET | NuGet | dotnet add package Welvet |
| TypeScript/Node | NPM | npm install @openfluke/welvet |
| Browser | WASM | import { init } from "@openfluke/welvet" |
Supported Platforms

Pre-compiled binaries for:

  • Linux: x86_64, ARM64, ARMv7
  • Windows: x86_64, x86, ARM64
  • macOS: Apple Silicon (M1/M2/M3), Intel, Universal
  • Android: ARM64, ARMv7
  • iOS: ARM64 (XCFramework)

Technical Architecture

What is Loom?

Loom is a Deterministic Neural Virtual Machine (DNVM), a portable execution environment for neural networks that guarantees bitwise-identical results across all platforms, backends, and language bindings.

┌─────────────────────────────────────────────────────────────────────────┐
│                        LOOM ARCHITECTURE                                │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │   Python    │   │  TypeScript │   │     C#      │   │    WASM     │  │
│  │   Binding   │   │   Binding   │   │   Binding   │   │   Browser   │  │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘  │
│         │                 │                 │                 │         │
│         └────────────────┬┴─────────────────┴─────────────────┘         │
│                          ▼                                              │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                        C-ABI (FFI Layer)                          │  │
│  │         Handle-based state management, JSON marshalling           │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                          │                                              │
│                          ▼                                              │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                    EXECUTION ENGINE (nn/)                         │  │
│  │   Forward/Backward passes, Optimizers, Schedulers, Tweening       │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│         │                                         │                     │
│         ▼                                         ▼                     │
│  ┌─────────────────┐                    ┌─────────────────────────┐     │
│  │   CPU Backend   │                    │    GPU JIT Compiler     │     │
│  │   (Pure Go)     │                    │   (WGSL Generation)     │     │
│  │                 │                    │         ▼               │     │
│  │  Deterministic  │                    │  ┌─────────────────┐    │     │
│  │  IEEE-754 Math  │◄────────────────►  │  │  WebGPU Runtime │    │     │
│  └─────────────────┘   Bit-identical    │  └─────────────────┘    │     │
│                           results       └─────────────────────────┘     │
└─────────────────────────────────────────────────────────────────────────┘
Classification
| Term | Description |
|---|---|
| Virtual Machine | Executes a portable IR (JSON network configs) on heterogeneous backends |
| JIT Compiler | Generates WGSL shaders at runtime, compiles to GPU compute pipelines |
| Deterministic | Guarantees bitwise-identical results across CPU, GPU, WASM, x86, ARM |
| Polyglot | Single Go core exports to Python, C#, TypeScript, WASM via C-ABI |
Architectural Layers
| Layer | Component | Role |
|---|---|---|
| IR (Bytecode) | JSON network configs, nn/serialization.go | Portable, declarative network specification |
| Type System | nn/types.go with Tensor[T Numeric] | Multi-precision tensors (F64→I8), generic operations |
| Execution | nn/forward.go, nn/backward.go | Deterministic layer-by-layer forward/backward |
| JIT Backend | gpu/*.go | Runtime WGSL generation → WebGPU pipelines |
| FFI Runtime | cabi/main.go | Handle-based API, state management, memory safety |
| Bindings | python/, csharp/, typescript/, wasm/ | Thin wrappers exposing the C-ABI |
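
The `Tensor[T Numeric]` signature in the Type System row suggests a Go generics design. The sketch below shows how one generic implementation can serve several element types; the `Numeric` constraint, `NewTensor`, and `Dot` are hypothetical names written for this example, not Loom's actual definitions.

```go
package main

import "fmt"

// Numeric is a hypothetical constraint modeled on the Tensor[T Numeric]
// signature mentioned above; Loom's actual definition may differ.
type Numeric interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// Tensor is a flat buffer plus a shape, generic over the element type.
type Tensor[T Numeric] struct {
	Data  []T
	Shape []int
}

// NewTensor allocates a zero-filled tensor with the given shape.
func NewTensor[T Numeric](shape ...int) *Tensor[T] {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return &Tensor[T]{Data: make([]T, n), Shape: shape}
}

// Dot computes the inner product of two equal-length slices.
// One implementation serves every Numeric element type.
func Dot[T Numeric](a, b []T) T {
	var sum T
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	f := Dot([]float32{1, 2, 3}, []float32{4, 5, 6})
	i := Dot([]int8{1, 2, 3}, []int8{4, 5, 6})
	t := NewTensor[uint16](2, 3)
	fmt.Println(f, i, len(t.Data)) // prints "32 32 6"
}
```

Note that integer instantiations wrap on overflow, so a real integer path would need explicit saturation or wider accumulators.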
Determinism Guarantee

Unlike typical ML runtimes that disclaim cross-platform reproducibility, Loom enforces bit-exact determinism:

┌──────────────────────────────────────────────────────────────────────┐
│ Testing: Dense                                                       │
├──────────────────────────────────────────────────────────────────────┤
│  • Max Diff:  0.0000000000 (Idx: -1)                                 │
│  • Mean Diff: 0.0000000000                                           │
│  ✅ [GOLD STANDARD] Exact Bit-Determinism                            │
│     Perfect match. CPU and GPU logic are identical down to the bit.  │
│                                                                      │
│  Output Sample:                                                      │
│    [0] CPU: 0.5010004044 | GPU: 0.5010004044 | Diff: 0.0000000000    │
└──────────────────────────────────────────────────────────────────────┘

Verified across:

  • CPU (Go) ↔ GPU (WebGPU/WGSL)
  • x86_64 ↔ ARM64 ↔ ARMv7
  • Linux ↔ Windows ↔ macOS ↔ Android ↔ iOS
  • Native ↔ WASM (Browser)
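
A check like the one in the report above can be sketched with the standard library. Comparing raw bit patterns is stricter than `==` (it distinguishes +0 from -0 and treats identical NaN payloads as equal). The function names are illustrative, and returning index -1 when nothing differs is a guess at how the report's `Idx: -1` is produced, not confirmed Loom behavior.

```go
package main

import (
	"fmt"
	"math"
)

// BitExact reports whether two float32 slices have identical bit patterns.
func BitExact(a, b []float32) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Float32bits(a[i]) != math.Float32bits(b[i]) {
			return false
		}
	}
	return true
}

// MaxDiff returns the largest absolute difference and its index over the
// common prefix of a and b, or (0, -1) when no element differs.
func MaxDiff(a, b []float32) (float64, int) {
	maxDiff, idx := 0.0, -1
	for i := 0; i < len(a) && i < len(b); i++ {
		if d := math.Abs(float64(a[i]) - float64(b[i])); d > maxDiff {
			maxDiff, idx = d, i
		}
	}
	return maxDiff, idx
}

func main() {
	cpu := []float32{0.5010004, 0.25}
	gpu := []float32{0.5010004, 0.25}
	d, idx := MaxDiff(cpu, gpu)
	fmt.Println(BitExact(cpu, gpu), d, idx) // prints "true 0 -1"
}
```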
Comparison to Similar Projects
| Project | What It Is | How Loom Differs |
|---|---|---|
| ONNX Runtime | Multi-backend inference engine | Loom adds training, bidirectional FFI, and determinism guarantees |
| GGML | Quantized inference library | Loom adds GPU JIT compilation and cross-platform bitwise reproducibility |
| TVM | Compiler infrastructure for ML | Loom is simpler (pure Go), directly embeddable, with determinism by design |
| WebAssembly | Portable bytecode standard | Loom's JSON network configs are conceptually "WASM for neural compute" |
Why This Matters
  1. Reproducible Research: Same model, same inputs → same outputs, regardless of where it runs
  2. Cross-Platform Deployment: Train on Linux GPU, deploy to iOS/Android/Browser with identical behavior
  3. Debugging: No "works on my machine" issues from floating-point non-determinism
  4. Verification: Prove correctness once, trust it everywhere
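
One well-known source of the floating-point non-determinism mentioned in item 3 is reduction order: float32 addition is not associative, so summing the same values in a different order can change the result. The sketch below demonstrates this general IEEE-754 behavior; it is not a description of Loom's internals, and the function names are illustrative.

```go
package main

import "fmt"

// SumForward adds values left to right in a fixed order.
func SumForward(xs []float32) float32 {
	var s float32
	for _, x := range xs {
		s += x
	}
	return s
}

// SumReverse adds the same values right to left.
func SumReverse(xs []float32) float32 {
	var s float32
	for i := len(xs) - 1; i >= 0; i-- {
		s += xs[i]
	}
	return s
}

func main() {
	// Near 1e8 adjacent float32 values are 8 apart, so adding 1 first is lost.
	xs := []float32{1e8, -1e8, 1}
	fmt.Println(SumForward(xs), SumReverse(xs)) // prints "1 0"
}
```

A runtime that fixes the accumulation order on every backend avoids this class of mismatch, at the cost of ruling out some parallel reduction strategies.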

Key Strengths

  • True Embeddability: Single binary. Zero external dependencies. No Python runtime needed.
  • Hybrid Gradient/Geometric Engine: Neural Tweening combines geometric gap-closing with backpropagation-guided momentum for real-time adaptation.
  • Geometric/Recursive Clustering: Differentiable KMeansLayer allows networks to learn interpretable symbolic prototypes within a neural hierarchy.
  • Structural Parallelism: Native support for Inception, ResNeXt, Siamese, and MoE architectures via LayerParallel with 6 combine modes.
  • Native Mixed-Precision: Generic tensor backend supports int8, uint16, float32, float64 natively.
  • Complete Training Infrastructure: 7 LR schedulers, 3 optimizers (SGD/AdamW/RMSprop), 10 softmax variants.
  • Pure Go Tokenizer: HuggingFace-compatible BPE tokenizer for LLM inference.
  • Step-Based Execution: Real-time inference with layer-by-layer control via StepForward API.
  • Network Telemetry: Runtime introspection via GetMethodsJSON() and ExtractNetworkBlueprint().
Key Limitations
  • Ecosystem Maturity: No central "Model Zoo" or pip-installable convenience; relies on loading external checkpoints.
  • GPU Support: WebGPU acceleration is implemented (Dense, Conv2D, MHA) but is beta/experimental and less stable than cuDNN/CUDA.
  • Operator Coverage: While "Deep" support is good (MHA, LSTM), "Broad" support (e.g., 3D Conv, Deformable Attn, FFTs) is missing compared to SciPy/JAX.
  • Math Backend: Relies on custom explicit forward/backward passes rather than a general-purpose symbolic autograd graph.

Based on exhaustive benchmarks (300+ combinations tested), here are the optimal configurations:

Training Mode Selection
| Scenario | Recommended Mode | Why |
|---|---|---|
| Real-time / Robotics | StepBP or StepTweenChain | 100% availability, 0ms blocking |
| Noisy / Adversarial Data | StepTweenChain | 94% robustness vs 86% for NormalBP |
| Offline Batch Training | NormalBP | Highest accuracy when blocking is acceptable |
| Multi-Agent Systems | StepBP | 12x better coordination vs blocked training |
| Continuous Adaptation | StepTweenChain | Maintains competence during distribution shift |
Layer × Training Mode (float32)
| Layer | Best Mode | Score | Accuracy | Availability |
|---|---|---|---|---|
| Conv2D | StepTweenChain | 1187 | 98.7% | 100% |
| Conv2D | StepTween | 1012 | 98.7% | 100% |
| Attention | StepTween | 830 | 90.1% | 100% |
| RNN | StepTween | 663 | 76.5% | 100% |
| Dense | StepTween | 379 | 42.5% | 100% |
| LSTM | NormalBP | 49 | 53.6% | 28.7% |

Tip: Conv2D + StepTweenChain + float32 is the optimal configuration for most real-time scenarios, achieving 98.7% accuracy with 100% availability.

Numeric Type Selection
| Type | Best For | Notes |
|---|---|---|
| float32 | Most use cases | 18/30 benchmark wins, best accuracy |
| float64 | Scientific computing | Higher precision, slower, wins with NormalBP |
| int16 | LSTM layers | Only type that works for step-based LSTM |
| uint16 | Edge/embedded | Good balance of range and speed |

Note: Integer types (int8, uint8, etc.) work but achieve only ~13-23% accuracy on adaptive tasks. Use floats for training, integers for quantized inference.


Benchmarks and Repro

Benchmark methodology and results live in docs/step_tween_assessment.md. Results are hardware- and build-dependent; use CPU runs as the reference baseline when comparing.


What's New

🧠 Recursive Neuro-Symbolic Architecture: The differentiable KMeansLayer enables models to learn hierarchical concept taxonomies. Well suited to out-of-distribution (OOD) detection and robust classification. See docs/research_paper_7_recursive_neuro_symbolic.md.

🎉 Transformer Inference: SmolLM2-135M-Instruct runs entirely in browser WASM with a pure Go implementation.

🤯 Grid Softmax = Native MoE: Mathematically proven equivalent to PyTorch MoE with 97.1% loss reduction. See examples/moe_proof_demo.go.

⚡ Grid Scatter Mode: Place parallel branch outputs at specific 2D/3D grid positions for multi-agent systems, hierarchical RL, and ensemble methods with explicit topology.

🧠 Neural Tweening: Train and run simultaneously, with 100% accuracy on shallow networks and no collapse to 0% accuracy during task changes.

📦 Recursive Safetensors: Full support for deeply nested architectures (MoE, Sequential, Parallel) with 100% bitwise save/load consistency. Verified with tva/testing/safetensors_recursive.go.

🔢 Numerical Type Benchmarking: Compare network behavior across 13 numerical types (F64, F32, F16, BF16, F4, I64, I32, I16, I8, U64, U32, U16, U8) with in-memory quantization. WASM-compatible for browser deployment testing.

🧪 MNIST Verification: End-to-end demo tva/demo/conv2d-mnist/main.go proving exact CPU/GPU consistency, training convergence, and multi-precision save/load integrity.


Framework Comparison

Global AI Landscape
| Feature Category | Feature | Loom (Go) | PyTorch (Py) | TF / TFLite | GoMLX (Go) | Spago (Go) | Core ML | TF.js | Candle (Rust) |
|---|---|---|---|---|---|---|---|---|---|
| Core | Primary Language | Go | Python | Python / C++ | Go | Go | Swift / ObjC | JS / TS | Rust |
| | Runtime Dependency | None (Binary) | Heavy (Pip) | Binary (Edge) | CGo / XLA | None | OS-Native | Browser | None |
| | Auto-Differentiation | ⚠️ Hybrid/Manual | ✅ Full | ✅ Full | ✅ Full (XLA) | ✅ Manual | ❌ (Inference) | ✅ Full | ✅ Full |
| | Safetensors | ✅ Native | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| | ONNX Support | ❌ | ✅ (Export) | ✅ | ⚠️ | ❌ | ✅ (Import) | ✅ | ⚠️ |
| | Structure Inference | ✅ Auto-Detect | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Training | Gradient Descent | ✅ Manual Chain | ✅ Standard | ✅ Standard | ✅ Standard | ✅ Standard | ✅ (On-device) | ✅ Standard | ✅ Standard |
| | Neural Tweening | ✅ Hybrid Engine | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | LR Schedulers | ✅ 7 Types | ✅ | ✅ | ✅ | ⚠️ Basic | ✅ | ✅ | ✅ |
| | Optimizers | ✅ 3 (SGD/AdamW/RMSprop) | ✅ Many | ✅ Many | ✅ | ✅ | ⚠️ | ✅ | ✅ |
| Layer Support | Dense (MLP) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | Conv2D | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| | Conv1D | ✅ Native | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| | RNN / LSTM | ✅ Full Gate | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | Transformer (MHA) | ✅ (Explicit) | ✅ | ✅ | ✅ | ✅ (BERT) | ✅ | ✅ | ✅ |
| | SwiGLU | ✅ Native | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| | Parallel / MoE | ✅ Structure | ❌ (Manual) | ❌ (Manual) | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Sequential Layers | ✅ Native | ✅ | ✅ | ⚠️ | ⚠️ | ⚠️ | ✅ | ⚠️ |
| | Embeddings | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | Tokenizer | ✅ Pure Go | ❌ (Rust/C++) | ❌ (C++) | ❌ | ❌ | ✅ | ❌ | ✅ |
| Normalization | LayerNorm | ✅ Native | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | RMSNorm | ✅ Native | ⚠️ (Manual) | ⚠️ (Manual) | ✅ | ❌ | ❌ | ❌ | ✅ |
| | Residual/Skip | ✅ Native | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Advanced | Stitch Layers | ✅ Native | ❌ (Manual) | ❌ (Manual) | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Dynamic Arch Gen | ✅ Built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | K-Means Clustering | ✅ Differentiable | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Correlation Analysis | ✅ Pearson/Spearman | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Model Evaluation | ✅ Deviation/Metrics | ✅ | ✅ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ |
| | Network Telemetry | ✅ Blueprint API | ❌ | ⚠️ | ❌ | ❌ | ❌ | ⚠️ | ❌ |
| | Runtime Introspection | ✅ Reflection | ⚠️ (Python) | ⚠️ | ❌ | ❌ | ❌ | ⚠️ | ❌ |
| Platform | WASM Training | ✅ Full | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ (Slow) | ✅ |
| | Cross-Lang ABI | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Ecosystem | HuggingFace Hub | ⚠️ (Read/Inspect) | ✅ Native | ✅ Native | ❌ | ✅ | ❌ | ✅ | ✅ |
| | Pre-trained Zoo | ❌ | ✅ Massive | ✅ Massive | ❌ | ✅ (Small) | ✅ (Apple) | ✅ Large | ⚠️ Growing |
| | Mobile/Web | ✅ WASM / C-ABI | ✅ (Mobile) | ✅ King | ❌ | ❌ | ✅ King (iOS) | ✅ King (Web) | ✅ (WASM) |
Go Ecosystem Comparison
| Category | Feature | Loom | GoMLX | Gorgonia | Spago | Go-Deep | Gonum |
|---|---|---|---|---|---|---|---|
| Foundation | Primary implementation | Pure Go | CGo (XLA) | Pure Go + CGo | Pure Go | Pure Go | Pure Go |
| | Tensor Backend | Custom (Generic) | XLA (C++) | Custom | Custom (Dense) | Custom | Dense Matrix |
| | Autograd | ⚠️ Hybrid | ✅ Full | ✅ Symbolic | ✅ Dynamic | ✅ Backprop | ❌ |
| Model | Load Safetensors | ✅ Native | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Model Export | binary/json | XLA format | Onnx (Import) | Gob | Json | ❌ |
| Architecture | Dense (MLP) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (Matrix Mul) |
| | Conv2D | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| | Conv1D | ✅ Native | ✅ | ⚠️ (via 2D) | ⚠️ (via 2D) | ❌ | ❌ |
| | RNN / LSTM | ✅ Full Gate | ✅ | ⚠️ Basic | ✅ BiLSTM | ❌ | ❌ |
| | Transformer (MHA) | ✅ Explicit | ✅ | ⚠️ Hard | ✅ (BERT) | ❌ | ❌ |
| | SwiGLU | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Embeddings | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| | Parallel / MoE | ✅ MoE + Gating | ❌ (Manual) | ❌ | ❌ | ❌ | ❌ |
| | Sequential Layers | ✅ Native + Nested | ⚠️ (Manual) | ⚠️ (Manual) | ⚠️ (Manual) | ❌ | ❌ |
| | Tokenizer | ✅ Pure Go | ❌ (Deps) | ❌ | ✅ (WordPiece) | ❌ | ❌ |
| Training | Gradient Descent | ✅ Manual | ✅ Standard | ✅ Standard | ✅ Standard | ✅ Standard | ❌ |
| | Hybrid Tweening | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | LR Schedulers | ✅ 7 Types | ✅ | ✅ | ⚠️ Basic | ❌ | ❌ |
| | Optimizers | ✅ SGD/AdamW/RMSprop | ✅ | ✅ | ✅ | ⚠️ SGD | ❌ |
| | Softmax Variants | ✅ 10 Types | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | ❌ |
| Normalization | LayerNorm | ✅ Native | ✅ | ⚠️ Manual | ✅ | ❌ | ❌ |
| | RMSNorm | ✅ Native | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Residual/Skip | ✅ Native | ✅ | ✅ | ❌ | ❌ | ❌ |
| Advanced | RoPE Embeddings | ✅ GQA Support | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Network Grafting | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Dynamic Arch Gen | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | K-Means Clustering | ✅ Differentiable | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Correlation Analysis | ✅ Pearson/Spearman | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Model Evaluation | ✅ Full Suite | ⚠️ | ⚠️ | ⚠️ | ❌ | ❌ |
| | Network Telemetry | ✅ Blueprint | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| | Runtime Introspection | ✅ Reflection | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| Platform | C-ABI (Polyglot) | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ |
| | WASM Training | ✅ Full | ❌ (XLA) | ❌ | ❌ | ❌ | ❌ |
| Ecosystem | HuggingFace | ⚠️ (Load) | ❌ | ❌ | ✅ (Load) | ❌ | ❌ |
| | Documentation | ⚠️ Growing | ✅ Good | ✅ Good | ✅ Good | ⚠️ Minimal | ✅ Excellent |
| | Maintenance | 🔥 Active | 🔥 Active | ⚠️ Slow | ⏸️ Paused | ⚠️ Slow | 🔥 Active |
Native Numerical Type & Precision Support
| Layer Type | Numerical Type | Loom | GoMLX | Gorgonia | Spago | PyTorch |
|---|---|---|---|---|---|---|
| All Layers (Dense, Conv, RNN, Attn) | Float32 | ✅ | ✅ | ✅ | ✅ (Float64) | ✅ |
| | Float64 (High Prec) | ✅ Native | ✅ | ✅ | ✅ | ✅ |
| | Float16 / BF16 | ⚠️ (Storage) | ✅ (XLA) | ❌ | ❌ | ✅ |
| | Int8 (Training) | ✅ Native | ❌ | ❌ | ❌ | ⚠️ (QAT Wrapper) |
| | Int8 (Inference) | ✅ | ❌ | ❌ | ❌ | ✅ (Quant) |
| | Int16, Int32, Int64 | ✅ Native | ✅ (XLA) | ⚠️ (Tensor) | ❌ | ❌ (Tensor Only) |
| | Uint8, Uint16, Uint32 | ✅ Native | ✅ (XLA) | ⚠️ (Tensor) | ❌ | ✅ (Uint8 Only) |

Note (Complete Type System): Unlike frameworks that treat integers primarily as storage formats for quantization, Loom's generics allow native training and inference on exotic types like uint16 (common in medical imaging), int32, or float64 (scientific simulation) across every layer type without changes to the model code.
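
The table above lists Float16 / BF16 as storage-only in Loom. As a sketch of what a storage format means in practice, the standard bfloat16 encoding keeps the top 16 bits of a float32 and is widened back before computing. The helpers below use round-to-nearest-even; they are illustrative and not taken from Loom's source.

```go
package main

import (
	"fmt"
	"math"
)

// ToBF16 converts a float32 to bfloat16 bits with round-to-nearest-even.
// NaN inputs keep their top bits with the quiet bit set, so rounding
// cannot turn them into an infinity.
func ToBF16(f float32) uint16 {
	bits := math.Float32bits(f)
	if f != f { // NaN is the only value not equal to itself
		return uint16(bits>>16) | 0x0040
	}
	// Add 0x7FFF plus the lowest kept bit so exact ties round to even.
	rounding := uint32(0x7FFF) + ((bits >> 16) & 1)
	return uint16((bits + rounding) >> 16)
}

// FromBF16 widens bfloat16 bits back to float32 by zero-filling the low half.
func FromBF16(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	x := float32(3.14159)
	fmt.Println(FromBF16(ToBF16(x))) // prints "3.140625"
}
```

The round trip keeps about three significant decimal digits, which is why such formats are typically used for weights at rest rather than for accumulation.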

Summary Verdict
  • Choose PyTorch if you are doing Research, need the latest SOTA models, or rely on complex dynamic architectures.
  • Choose TensorFlow / TFLite if you need robust Mobile/Edge Deployment.
  • Choose GoMLX if you need High-Performance Training in Go and can tolerate CGo/C++ dependencies.
  • Choose Core ML if you are targeting iOS/macOS exclusively.
  • Choose Loom if you need Pure Go-Native Embedding (Cloud/CLI/Server), want a single binary with zero dependencies, need to experiment with the Neural Tweening training paradigm, or need unique features like Step-Based Forward Pass for real-time inference and Dynamic Architecture Generation for automated model exploration.

Layer Types & Features

Supported Layer Types
| Layer | Type String | Description |
|---|---|---|
| Dense | dense | Fully connected layer |
| LSTM | lstm | Long Short-Term Memory |
| RNN | rnn | Recurrent Neural Network |
| GRU | gru | Gated Recurrent Unit |
| Conv2D | conv2d | 2D Convolution |
| Conv1D | conv1d | 1D Convolution |
| Multi-Head Attention | multi_head_attention | Transformer attention |
| LayerNorm | layer_norm | Layer normalization |
| RMSNorm | rms_norm | RMS normalization |
| SwiGLU | swiglu | SwiGLU activation layer |
| KMeans | kmeans | Differentiable recursive clustering layer |
| Softmax | softmax | 10 variants (Standard, Grid, Hierarchical, Temperature, Gumbel, Masked, Sparsemax, Entmax, Adaptive, Mixture) |
| Embedding | embedding | Token embedding |
| Parallel | parallel | Branching with 6 combine modes (add, concat, multiply, average, grid_scatter, filter) |
| Sequential | sequential | Grouped sub-layers |
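
To illustrate what the Parallel layer's combine modes do to branch outputs, here is a minimal sketch of four of the six modes. The `Combine` function is hypothetical and only reflects the usual meaning of the mode names; `grid_scatter` and `filter` are left out because their exact semantics are not specified here.

```go
package main

import "fmt"

// Combine merges the outputs of parallel branches.
// "add", "multiply", and "average" are element-wise and need equal lengths;
// "concat" joins the branches end to end.
func Combine(mode string, branches [][]float32) ([]float32, error) {
	if len(branches) == 0 {
		return nil, fmt.Errorf("no branches")
	}
	if mode == "concat" {
		var out []float32
		for _, b := range branches {
			out = append(out, b...)
		}
		return out, nil
	}
	if mode != "add" && mode != "multiply" && mode != "average" {
		return nil, fmt.Errorf("unsupported mode %q", mode)
	}
	n := len(branches[0])
	out := append([]float32(nil), branches[0]...) // copy, do not alias the input
	for _, b := range branches[1:] {
		if len(b) != n {
			return nil, fmt.Errorf("branch length %d != %d", len(b), n)
		}
		for i, v := range b {
			if mode == "multiply" {
				out[i] *= v
			} else {
				out[i] += v
			}
		}
	}
	if mode == "average" {
		for i := range out {
			out[i] /= float32(len(branches))
		}
	}
	return out, nil
}

func main() {
	branches := [][]float32{{1, 2}, {3, 4}}
	for _, mode := range []string{"add", "concat", "multiply", "average"} {
		out, err := Combine(mode, branches)
		if err != nil {
			panic(err)
		}
		fmt.Println(mode, out)
	}
}
```

For the two branches above, this prints `[4 6]`, `[1 2 3 4]`, `[3 8]`, and `[2 3]` for the four modes in order.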
Activation Functions

relu, sigmoid, tanh, softmax, gelu, swish, mish, leaky_relu, elu, selu, linear
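
For reference, several of the listed activations can be written directly from their textbook definitions. This is a sketch, not Loom's implementation: the `Activate` helper is hypothetical, and the 0.01 leaky_relu slope and 1.0 elu alpha are common defaults assumed here.

```go
package main

import (
	"fmt"
	"math"
)

// Activate applies a named activation to x using textbook definitions.
// Only a subset of the listed names is covered.
func Activate(name string, x float64) (float64, error) {
	switch name {
	case "linear":
		return x, nil
	case "relu":
		return math.Max(0, x), nil
	case "leaky_relu":
		if x < 0 {
			return 0.01 * x, nil // assumed slope
		}
		return x, nil
	case "sigmoid":
		return 1 / (1 + math.Exp(-x)), nil
	case "tanh":
		return math.Tanh(x), nil
	case "swish":
		return x / (1 + math.Exp(-x)), nil // x * sigmoid(x)
	case "elu":
		if x < 0 {
			return math.Exp(x) - 1, nil // assumed alpha = 1.0
		}
		return x, nil
	}
	return 0, fmt.Errorf("unknown activation %q", name)
}

func main() {
	r, _ := Activate("relu", -2)
	s, _ := Activate("sigmoid", 0)
	fmt.Println(r, s) // prints "0 0.5"
}
```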


SafeTensors & Model Interoperability

Loom features a universal SafeTensors engine capable of standardizing models from any framework (PyTorch, TensorFlow, HuggingFace) into a highly optimized, single-file format. It proactively handles complex nested architectures (like Mixture-of-Experts within Parallel layers) via recursive serialization.

1. Universal "Any-to-Any" Quantization

Load a model in high precision (float32/float64) and instantly quantize it to any supported type for deployment. The file format handles the type conversion automatically.

  • Input: Model weights in F32 (e.g., from HuggingFace)
  • Output: Quantized weights in F4, I8, BF16, U16 etc.
  • Verification: 100% round-trip integrity verified for all 143 layer/type combinations.
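// Load standard model
tensors, _ := nn.LoadSafetensors("llama.safetensors")
 
// Save as 4-bit optimized web model (automatically quantizes).
// The loop variable name was unused, which does not compile in Go.
// This assumes the map values are pointers; if they are structs,
// write each modified value back into the map instead.
for _, t := range tensors { t.DType = "F4" }
nn.SaveSafetensors("llama-web-4bit.safetensors", tensors)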
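
The README does not describe the quantization scheme applied during the type conversion. As a point of reference, the sketch below shows one common approach, symmetric per-tensor int8 quantization with a single scale factor. The function names are illustrative and this is not presented as Loom's algorithm.

```go
package main

import (
	"fmt"
	"math"
)

// QuantizeI8 maps float32 weights to int8 using one symmetric scale per tensor.
func QuantizeI8(w []float32) ([]int8, float32) {
	var maxAbs float32
	for _, v := range w {
		if a := float32(math.Abs(float64(v))); a > maxAbs {
			maxAbs = a
		}
	}
	if maxAbs == 0 {
		return make([]int8, len(w)), 1 // all zeros: any scale works
	}
	scale := maxAbs / 127
	q := make([]int8, len(w))
	for i, v := range w {
		r := math.Round(float64(v / scale))
		q[i] = int8(math.Max(-127, math.Min(127, r))) // clamp to the int8 range
	}
	return q, scale
}

// DequantizeI8 reconstructs approximate float32 weights.
func DequantizeI8(q []int8, scale float32) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		out[i] = float32(v) * scale
	}
	return out
}

func main() {
	w := []float32{-1.0, -0.5, 0, 0.25, 1.0}
	q, scale := QuantizeI8(w)
	fmt.Println(q, scale, DequantizeI8(q, scale))
}
```

The reconstruction error per weight is bounded by roughly half the scale, so tensors with a few large outliers lose the most precision under this scheme.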
2. WASM / In-Memory Operation

Loom's SafeTensors implementation can operate purely in memory (using []byte buffers) without any filesystem access, making it perfect for WebAssembly (WASM) and constrained environments.

// Serialize directly to memory (for sending to browser/client)
bytes, _ := nn.SerializeSafetensors(myModelWeights)
 
// Load directly from memory (no disk I/O required)
tensors, _ := nn.LoadSafetensorsWithShapes(bytes)
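
For readers unfamiliar with the container, the safetensors layout is simple enough to build in memory with the standard library: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw data. The sketch below handles only 1-D F32 tensors and skips the optional `__metadata__` entry; `Serialize` and `Deserialize` are hypothetical stand-ins, not Loom's functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// entry is one tensor record in the safetensors JSON header.
type entry struct {
	DType   string `json:"dtype"`
	Shape   []int  `json:"shape"`
	Offsets [2]int `json:"data_offsets"`
}

// Serialize packs 1-D float32 tensors into a safetensors byte buffer.
func Serialize(tensors map[string][]float32) ([]byte, error) {
	names := make([]string, 0, len(tensors))
	for name := range tensors {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order keeps the output byte-for-byte stable

	header := map[string]entry{}
	var data bytes.Buffer
	for _, name := range names {
		begin := data.Len()
		for _, v := range tensors[name] {
			var b [4]byte
			binary.LittleEndian.PutUint32(b[:], math.Float32bits(v))
			data.Write(b[:])
		}
		header[name] = entry{DType: "F32", Shape: []int{len(tensors[name])}, Offsets: [2]int{begin, data.Len()}}
	}
	hdr, err := json.Marshal(header)
	if err != nil {
		return nil, err
	}
	out := make([]byte, 8, 8+len(hdr)+data.Len())
	binary.LittleEndian.PutUint64(out, uint64(len(hdr)))
	out = append(out, hdr...)
	return append(out, data.Bytes()...), nil
}

// Deserialize reverses Serialize without touching the filesystem.
func Deserialize(buf []byte) (map[string][]float32, error) {
	if len(buf) < 8 {
		return nil, fmt.Errorf("buffer too short")
	}
	n := binary.LittleEndian.Uint64(buf)
	if n > uint64(len(buf)-8) {
		return nil, fmt.Errorf("header length out of range")
	}
	var header map[string]entry
	if err := json.Unmarshal(buf[8:8+n], &header); err != nil {
		return nil, err
	}
	data := buf[8+n:]
	out := map[string][]float32{}
	for name, e := range header {
		if name == "__metadata__" {
			continue // optional free-form metadata, ignored in this sketch
		}
		if e.DType != "F32" || e.Offsets[0] < 0 || e.Offsets[0] > e.Offsets[1] || e.Offsets[1] > len(data) {
			return nil, fmt.Errorf("unsupported or corrupt tensor %q", name)
		}
		raw := data[e.Offsets[0]:e.Offsets[1]]
		vals := make([]float32, len(raw)/4)
		for i := range vals {
			vals[i] = math.Float32frombits(binary.LittleEndian.Uint32(raw[4*i:]))
		}
		out[name] = vals
	}
	return out, nil
}

func main() {
	buf, err := Serialize(map[string][]float32{"w": {1, 2.5}, "b": {-3}})
	if err != nil {
		panic(err)
	}
	back, err := Deserialize(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(back["w"], back["b"]) // prints "[1 2.5] [-3]"
}
```

Because everything lives in a `[]byte`, the same buffer can be sent over a network or handed to a WASM module without any disk I/O.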
3. Full Layer Support

The interoperability layer supports every component in the Loom ecosystem:

| Category | Supported Layers |
|---|---|
| Core | Dense, Embedding, Parallel, Sequential |
| Convolution | Conv1D, Conv2D |
| Sequence | RNN, LSTM, GRU |
| Attention | MultiHeadAttention, SwiGLU |
| Norm/Act | LayerNorm, RMSNorm, Softmax (10 variants) |

GPU Acceleration (WebGPU)

Experimental GPU acceleration via WebGPU compute shaders. Treat all GPU paths (forward and backward) as experimental for now. Use with:

network.GPU = true
network.WeightsToGPU()              // Mount weights to GPU
output, _ := network.Forward(input) // Auto-routes to GPU!
network.Backward(dOutput)           // GPU backward pass
network.ReleaseGPUWeights()         // Cleanup
GPU Support Matrix
| Layer Type | Forward | Backward (Training) | Notes |
|---|---|---|---|
| Dense | ✅ Stable | ⚠️ Experimental | Production speedup (20x) on large layers. |
| Conv2D | ✅ Stable | ⚠️ Experimental | Works well, optimized for 32+ filters. |
| Conv1D | ✅ Stable | ⚠️ Experimental | Gradients implemented, accuracy tuning needed. |
| RNN | ✅ Stable | ⚠️ Experimental | Weights update, but BPTT limited to batch=1. |
| LSTM | ✅ Stable | ⚠️ Experimental | Same limitations as RNN. |
| LayerNorm | ✅ Stable | ⚠️ Experimental | Forward is stable, backward can be numerically unstable. |
| RMSNorm | ✅ Stable | ⚠️ Experimental | Same as LayerNorm. |
| SwiGLU | ✅ Stable | ⚠️ Experimental | High performance. |
| MHA | ✅ Stable | ⚠️ Experimental | Functional parity verified. |
| Softmax | ✅ Stable | ⚠️ Experimental | Functional. |
| KMeans | ❌ WIP | ❌ WIP | Currently runs on CPU only. |

Quick Start


Installation
# Clone the repository
git clone https://github.com/openfluke/loom.git
cd loom

# Install dependencies
go mod download
Simple Example
package main

import (
    "fmt"
    "github.com/openfluke/loom/nn"
)

func main() {
    network := nn.NewNetwork(4096, 4, 4, 5)  // 80 total layers

    if err := network.InitGPU(); err != nil {
        panic(err)
    }
    defer network.ReleaseGPU()

    input := make([]float32, 4096)
    output, gpuTime, _ := network.ForwardGPU(input)

    fmt.Printf("GPU Forward time: %v, Output size: %d\n", gpuTime, len(output))
}
Model Serialization
// Save a trained model
err := network.SaveModel("model.json", "my_model")

// Load it back - ONE LINE!
loadedNet, err := nn.LoadModel("model.json", "my_model")

// Or use strings (great for APIs/databases/WASM)
jsonString, err := network.SaveModelToString("my_model")
loadedNet, err = nn.LoadModelFromString(jsonString, "my_model") // plain assignment: both variables already exist
Cross-Platform API
| Function | Go | Python | TypeScript | C# | C |
|---|---|---|---|---|---|
| Create | BuildNetworkFromJSON() | create_network_from_json() | createNetworkFromJSON() | CreateLoomNetwork() | CreateLoomNetwork() |
| Forward | Forward() | forward_simple() | forward() | LoomForward() | LoomForward() |
| Train | Train() | train_simple() | train() | LoomTrain() | LoomTrain() |
| Save | SaveModelToString() | save_model_simple() | saveModel() | LoomSaveModel() | LoomSaveModel() |
| Load | LoadModelFromString() | load_model_simple() | loadLoomNetwork() | LoomLoadModel() | LoomLoadModel() |
| Evaluate | EvaluateNetwork() | evaluate_network_simple() | evaluate() | LoomEvaluateNetwork() | LoomEvaluateNetwork() |

Language Bindings

Python
pip install welvet
import welvet

config = {"batch_size": 1, "layers": [...]}
welvet.create_network_from_json(config)
output = welvet.forward_simple([0.1, 0.2, 0.3, 0.4])

See python/README.md for complete documentation.

TypeScript / Node.js
npm install @openfluke/welvet
import { init, createNetworkFromJSON } from "@openfluke/welvet";

await init();
const network = createNetworkFromJSON(JSON.stringify(config));
const output = network.Forward(JSON.stringify([[0.1, 0.2, 0.3, 0.4]]));

See typescript/README.md for complete documentation.

C# / .NET
dotnet add package Welvet
using Welvet;

Network.CreateFromJson(config);
var output = NativeMethods.LoomForward(input, input.Length);

See csharp/README.md for complete documentation.


Project Structure

loom/
├── nn/                  # Neural network package (core)
├── tokenizer/           # Pure Go BPE tokenizer
├── wasm/                # WebAssembly module
├── cabi/                # C ABI for FFI
├── python/              # Python package (welvet)
├── typescript/          # TypeScript/WASM package
├── csharp/              # C#/.NET package (Welvet)
├── fabric/              # Demo application
├── pods/                # GPU compute pods
├── model_conversion/    # HuggingFace model import
├── docs/                # Documentation
└── detector/            # GPU device detection

Documentation

More Examples: See github.com/openfluke/tva for additional examples and experiments.

Comprehensive Test Suite

Loom includes a rigorous verification suite in tva/muniversal_testing.go and cabi/universal_test.c that validates functional correctness across all layers, numeric types, and backend engines (CPU/GPU).

Coverage Summary (2297 tests)
| Test Section | Tests | Description |
|---|---|---|
| Part 1: Core | 6 | Forward/backward pass correctness for basic layers |
| Part 2: Serialization | 2100 | Save/Load for all layers × 15 dtypes + parallel permutations |
| Part 3: Advanced | 11 | Complex layers (MHA, Grid Softmax, K-Means) and math ops |
| Part 5: GPU Determinism | 15 | Validates GPU forward pass matches CPU results |
| Part 6: GPU Training | 21 | Verifies GPU learning convergence vs CPU baseline |
| Part 7: In-Memory/WASM | 144 | SafeTensors round-trip without filesystem (11 layers × 13 dtypes) |

Note (GPU Acceleration Limits): As of v0.0.8, WebGPU acceleration is enabled for standard Forward/Backward passes. The structural API nn/step_forward.go (Step-based execution), nn/tween.go (Neural Tweening), and nn/kmeans_layer.go (K-Means) currently run on CPU only.

Browser Testing (v0.3.0): The universal test suite can now be run directly in the browser with full parity. See typescript/README.md for details on running serve.py.

C ABI Parity

The C test suite (cabi/universal_test.c) mirrors the Go suite with 2298 tests, validating that all functionality is accessible through the FFI layer for Python, C#, TypeScript, and WASM bindings.
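
The "handle-based" design mentioned for the FFI layer is a common pattern when exposing Go objects through a C ABI: cgo's pointer-passing rules prevent C code from retaining Go pointers, so callers receive opaque integer handles instead. The sketch below shows the general pattern; `Registry` and `Network` are hypothetical names, not the types in cabi/main.go.

```go
package main

import (
	"fmt"
	"sync"
)

// Network stands in for the engine's real network type (hypothetical).
type Network struct{ Name string }

// Registry maps opaque integer handles to live Go objects so that
// foreign callers never hold Go pointers directly.
type Registry struct {
	mu   sync.Mutex
	next int64
	nets map[int64]*Network
}

// NewRegistry creates an empty registry; handle 0 is never issued.
func NewRegistry() *Registry {
	return &Registry{next: 1, nets: make(map[int64]*Network)}
}

// Put stores a network and returns its handle; handles are never reused.
func (r *Registry) Put(n *Network) int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	h := r.next
	r.next++
	r.nets[h] = n
	return h
}

// Get resolves a handle, reporting false for unknown or released handles.
func (r *Registry) Get(h int64) (*Network, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	n, ok := r.nets[h]
	return n, ok
}

// Release drops the reference so the garbage collector can reclaim it.
func (r *Registry) Release(h int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.nets, h)
}

func main() {
	reg := NewRegistry()
	h := reg.Put(&Network{Name: "demo"})
	n, _ := reg.Get(h)
	fmt.Println(h, n.Name) // prints "1 demo"
	reg.Release(h)
	_, ok := reg.Get(h)
	fmt.Println(ok) // prints "false"
}
```

Each exported C function would then take or return such a handle, which is what lets the Python, C#, and TypeScript wrappers stay thin.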

Verified Advanced Architectures

The test suite also verifies complex, production-ready architectural patterns:

  • Recursive Symbol Learning (RN1-RN6): Differentiable K-Means layers nested to form taxonomies, achieving 100% accuracy on hierarchical tasks with full interpretability.
  • Heterogeneous MoE: Using LayerParallel with CombineMode: "filter" to route inputs to experts of different depths/types (e.g., CNN expert vs Dense expert).
  • Stitched Experts: Using LayerStitch to harmonize outputs from parallel branches with different dimensions (e.g., 5-dim output and 7-dim output stitched to common 10-dim).
  • Neural Grafting: Training only the gating mechanism of an MoE while keeping experts frozen, using TweenStep for precise surgical updates.
  • Bit-Exact Determinism: Verifying that GPU forward passes match CPU results to within machine epsilon (often exactly bit-matching for integer ops).
Runnable Demos
  • MNIST Consistency (tva/demo/conv2d-mnist/main.go):

    • Trains a digit classifier on MNIST.
    • Saves model to JSON and Safetensors.
    • Reloads model and verifies 0.000000000 max difference in predictions.
    • Benchmarks all 13 numerical types (F64→U8) for quantization quality.
    • Proves robustness of SaveWeightsToSafetensors / LoadWeightsFromSafetensors.
  • Recursive Safetensors (tva/testing/safetensors_recursive.go):

    • Constructs a complex nested Network: MoE (Gate) -> [Parallel -> [Dense, Sequential -> [Conv1D, RNN]]].
    • Saves and reloads to prove structural integrity of serialization for arbitrary depths.

Requirements

  • Go: 1.24 or higher
  • GPU: WebGPU-compatible GPU (Vulkan, Metal, or D3D12) - optional
  • OS: Linux, macOS, or Windows

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Apache License 2.0 - see LICENSE file for details.


Made with ❤️ by Openfluke

Directories

Path Synopsis
Package nn provides a grid neural network implementation with both CPU and GPU execution.
