LOOM C ABI

C Foreign Function Interface (FFI) for LOOM. Use LOOM transformers and neural networks from any language that supports C FFI: Python, C#, Rust, C++, Node.js, etc.

Features:

  • ✅ Simple API: New streamlined functions - CreateLoomNetwork, LoomForward, LoomTrain, LoomSaveModel, LoomLoadModel, LoomEvaluateNetwork
  • ✅ 8 Layer Types (All CPU): Dense, Conv2D, Multi-Head Attention, LayerNorm, RNN, LSTM, Softmax (10 variants), Parallel (4 combine modes)
  • ✅ Full CPU Implementation: Every layer works with complete forward/backward passes - tested and reliable!
  • ✅ Transformer Inference: Run LLMs with streaming generation
  • ✅ Cross-Platform Consistency: Same API as Python, TypeScript, C#, WASM
  • 🌐 Universal FFI: Works from any language (Python, C#, Rust, C++, Node.js, etc.)
  • 📦 Cross-Platform: Linux, macOS, Windows, Android, iOS
  • ⚠️ GPU Note: GPU/WebGPU code exists but is untested; all demos use reliable CPU execution

🎉 NEW: Simple API

The simple API uses a global network instance - no handle management needed!

Quick Example (C)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libloom.h"

int main() {
    // Create network from JSON
    const char* config = "{"
        "\"batch_size\": 1,"
        "\"grid_rows\": 1,"
        "\"grid_cols\": 1,"
        "\"layers_per_cell\": 3,"
        "\"layers\": ["
            "{\"type\": \"dense\", \"input_size\": 8, \"output_size\": 16, \"activation\": \"relu\"},"
            "{\"type\": \"dense\", \"input_size\": 16, \"output_size\": 8, \"activation\": \"relu\"},"
            "{\"type\": \"dense\", \"input_size\": 8, \"output_size\": 2, \"activation\": \"sigmoid\"}"
        "]"
    "}";

    char* result = CreateLoomNetwork(config);
    printf("Network created: %s\n", result);
    FreeLoomString(result);

    // Forward pass
    float inputs[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8};
    char* output = LoomForward(inputs, 8);
    printf("Output: %s\n", output);
    FreeLoomString(output);

    // Training
    const char* batches = "["
        "{\"Input\": [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8], \"Target\": [1.0, 0.0]},"
        "{\"Input\": [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1], \"Target\": [0.0, 1.0]}"
    "]";

    const char* train_config = "{"
        "\"Epochs\": 100,"
        "\"LearningRate\": 0.1,"
        "\"UseGPU\": false,"
        "\"PrintEveryBatch\": 0"
    "}";

    char* train_result = LoomTrain(batches, train_config);
    printf("Training: %s\n", train_result);
    FreeLoomString(train_result);

    // Save model
    char* model_json = LoomSaveModel("my_model");
    printf("Model saved (%zu bytes)\n", strlen(model_json));

    // Load model
    char* load_result = LoomLoadModel(model_json, "my_model");
    printf("Model loaded: %s\n", load_result);
    FreeLoomString(load_result);
    FreeLoomString(model_json);

    // Evaluate
    const char* eval_inputs = "[[0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]]";
    const char* expected = "[0]";
    char* metrics = LoomEvaluateNetwork(eval_inputs, expected);
    printf("Evaluation: %s\n", metrics);
    FreeLoomString(metrics);

    return 0;
}
Simple API Reference
// Create network from JSON configuration
char* CreateLoomNetwork(const char* jsonConfig);
// Returns: {"status": "success", "message": "network created"}

// Forward pass
char* LoomForward(float* inputs, int length);
// Returns: JSON array of outputs, e.g., [0.95, 0.05]

// Backward pass
char* LoomBackward(float* gradients, int length);
// Returns: {"status": "success"}

// Update weights
void LoomUpdateWeights(float learningRate);

// Train network
char* LoomTrain(const char* batchesJSON, const char* configJSON);
// Returns: Training result with losses and throughput

// Save model to JSON string
char* LoomSaveModel(const char* modelID);
// Returns: Complete model as JSON string

// Load model from JSON string
char* LoomLoadModel(const char* jsonString, const char* modelID);
// Returns: {"success": true}

// Get network information
char* LoomGetNetworkInfo();
// Returns: {"grid_rows": 1, "grid_cols": 3, ...}

// Evaluate network with deviation metrics
char* LoomEvaluateNetwork(const char* inputsJSON, const char* expectedOutputsJSON);
// Returns: {"total_samples": 4, "score": 100.0, "avg_deviation": 0.0, ...}

// Free C strings returned by LOOM
void FreeLoomString(char* str);
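
LoomTrain wraps the usual loop; the same step can also be composed by hand from LoomForward, LoomBackward, and LoomUpdateWeights. A minimal sketch (the gradient values are illustrative, and LoomBackward is assumed to take dLoss/dOutput from the most recent forward pass):

// One manual optimization step (sketch)
float inputs[8] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f};
char* out = LoomForward(inputs, 8);        // JSON array of outputs
// ...parse the outputs, compare against targets, compute output gradients...
float grads[2] = {0.05f, -0.05f};          // illustrative dLoss/dOutput values
char* back = LoomBackward(grads, 2);
LoomUpdateWeights(0.1f);                   // apply the update with lr = 0.1
FreeLoomString(out);
FreeLoomString(back);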

Cross-Platform Demos:

  • C: simple_bench.c - Grid scatter with save/load verification
  • Python: ../python/examples/grid_scatter_demo.py
  • TypeScript: ../typescript/example/grid-scatter.ts
  • JavaScript/WASM: ../wasm/grid_scatter_demo.js
  • C#: ../csharp/examples/GridScatterDemo.cs

All demos produce identical results (99.5% improvement, 100/100 quality score, 0.00 save/load difference)!

🚀 Transformer Inference

Run LLMs (LLaMA, SmolLM, GPT-2, etc.) from any language via C ABI with streaming support.

Quick Start

1. Build the library:

./build.sh
# Creates libloom.so (Linux) / libloom.dylib (macOS) / libloom.dll (Windows)

2. Run the demo:

# Download a model first (e.g., SmolLM2-135M-Instruct from HuggingFace)
# Then start the web interface:
python3 web_interface.py ../models/SmolLM2-135M-Instruct 8080

# Open http://localhost:8080/inference.html in your browser
Python Example (Streaming)
import ctypes
import json

# Load library
loom = ctypes.CDLL('./libloom.so')

# Configure function signatures
loom.LoadTokenizerFromBytes.argtypes = [ctypes.c_char_p, ctypes.c_int]
loom.LoadTokenizerFromBytes.restype = ctypes.c_void_p

loom.LoadTransformerFromBytes.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
loom.LoadTransformerFromBytes.restype = ctypes.c_void_p

loom.GenerateNextToken.argtypes = [ctypes.c_char_p, ctypes.c_float]
loom.GenerateNextToken.restype = ctypes.c_void_p

loom.EncodeText.argtypes = [ctypes.c_char_p, ctypes.c_bool]
loom.EncodeText.restype = ctypes.c_void_p

loom.DecodeTokens.argtypes = [ctypes.c_char_p, ctypes.c_bool]
loom.DecodeTokens.restype = ctypes.c_void_p

loom.Loom_FreeCString.argtypes = [ctypes.c_void_p]
loom.Loom_FreeCString.restype = None

# Load tokenizer
with open('models/SmolLM2-135M-Instruct/tokenizer.json', 'rb') as f:
    tok_data = f.read()
result_ptr = loom.LoadTokenizerFromBytes(tok_data, len(tok_data))
result_json = ctypes.string_at(result_ptr).decode('utf-8')
loom.Loom_FreeCString(result_ptr)
result = json.loads(result_json)
print(f"βœ“ Tokenizer loaded (vocab: {result['vocab_size']})")

# Load transformer
with open('models/SmolLM2-135M-Instruct/config.json', 'rb') as f:
    config = f.read()
with open('models/SmolLM2-135M-Instruct/model.safetensors', 'rb') as f:
    weights = f.read()
result_ptr = loom.LoadTransformerFromBytes(config, len(config), weights, len(weights))
result_json = ctypes.string_at(result_ptr).decode('utf-8')
loom.Loom_FreeCString(result_ptr)
result = json.loads(result_json)
print(f"βœ“ Model loaded ({result['num_layers']} layers, hidden={result['hidden_size']})")

# Encode prompt
prompt = "Once upon a time"
encode_ptr = loom.EncodeText(prompt.encode('utf-8'), True)
encode_json = ctypes.string_at(encode_ptr).decode('utf-8')
loom.Loom_FreeCString(encode_ptr)
tokens = json.loads(encode_json)['ids']

# Generate tokens one at a time (streaming)
for i in range(50):
    gen_ptr = loom.GenerateNextToken(json.dumps(tokens).encode('utf-8'), 0.7)
    gen_json = ctypes.string_at(gen_ptr).decode('utf-8')
    loom.Loom_FreeCString(gen_ptr)
    gen_result = json.loads(gen_json)

    next_token = gen_result['token']
    tokens.append(next_token)

    # Decode and print token
    decode_ptr = loom.DecodeTokens(json.dumps([next_token]).encode('utf-8'), True)
    decode_json = ctypes.string_at(decode_ptr).decode('utf-8')
    loom.Loom_FreeCString(decode_ptr)
    token_text = json.loads(decode_json)['text']
    print(token_text, end='', flush=True)

    if gen_result.get('is_eos'):
        break

print()  # Newline at end

Transformer API Reference

Loading
// Load tokenizer from bytes
char* LoadTokenizerFromBytes(char* dataPtr, int dataLen);
// Returns: {"success": true, "vocab_size": 49152, ...}

// Load transformer model
char* LoadTransformerFromBytes(char* configPtr, int configLen,
                               char* weightsPtr, int weightsLen);
// Returns: {"success": true, "num_layers": 30, "hidden_size": 576, ...}
Text Processing
// Encode text to token IDs
char* EncodeText(char* textPtr, bool addSpecialTokens);
// Returns: {"success": true, "ids": [123, 456, ...]}

// Decode token IDs to text
char* DecodeTokens(char* idsJSON, bool skipSpecialTokens);
// Returns: {"success": true, "text": "decoded text"}
Generation
// Generate single next token (for streaming)
char* GenerateNextToken(char* idsJSON, float temperature);
// Returns: {"success": true, "token": 789, "is_eos": false}

// Generate full text at once
char* GenerateText(char* promptPtr, int maxTokens, float temperature);
// Returns: {"success": true, "generated_text": "...", "num_tokens": 50}
Memory Management
// Free C strings returned by LOOM functions
void Loom_FreeCString(char* ptr);

⚠️ Important: All functions return JSON strings allocated with malloc(). You must call Loom_FreeCString() on every returned pointer to avoid memory leaks.

⚠️ Use c_void_p in Python: When using ctypes, declare return types as ctypes.c_void_p (not c_char_p) to avoid Python's automatic string conversion which corrupts the pointer.
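
One way to keep the ownership rule from spreading through a C codebase is to copy each result into caller-owned memory at the call site and free the LOOM allocation immediately. A small sketch (loom_take is our name, not part of the ABI; strdup is POSIX):

#include <string.h>   // strdup (POSIX)

extern void Loom_FreeCString(char* ptr);

// Copy a LOOM result into caller-owned memory and release the
// library's allocation right away; caller frees the copy with free().
static char* loom_take(char* loom_str) {
    char* copy = loom_str ? strdup(loom_str) : NULL;
    if (loom_str) Loom_FreeCString(loom_str);
    return copy;
}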

Language Examples

C#
[DllImport("libloom.so")]
private static extern IntPtr LoadTokenizerFromBytes(byte[] data, int len);

[DllImport("libloom.so")]
private static extern void Loom_FreeCString(IntPtr ptr);

// Usage
byte[] tokData = File.ReadAllBytes("tokenizer.json");
IntPtr resultPtr = LoadTokenizerFromBytes(tokData, tokData.Length);
string resultJson = Marshal.PtrToStringAnsi(resultPtr);
Loom_FreeCString(resultPtr);
Rust
use std::ffi::CStr;
use std::os::raw::c_char;

#[link(name = "loom")]
extern "C" {
    fn LoadTokenizerFromBytes(data: *const u8, len: i32) -> *mut c_char;
    fn Loom_FreeCString(ptr: *mut c_char);
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tok_data = std::fs::read("tokenizer.json")?;
    unsafe {
        let result_ptr = LoadTokenizerFromBytes(tok_data.as_ptr(), tok_data.len() as i32);
        // Copy before freeing so the string does not dangle
        let result_json = CStr::from_ptr(result_ptr).to_str()?.to_owned();
        Loom_FreeCString(result_ptr);
        println!("{result_json}");
    }
    Ok(())
}
Node.js (ffi-napi)
const ffi = require("ffi-napi");
const ref = require("ref-napi");
const fs = require("fs");

const loom = ffi.Library("./libloom.so", {
  // Return "pointer" (not "string") so the allocation can be freed
  LoadTokenizerFromBytes: ["pointer", ["pointer", "int"]],
  Loom_FreeCString: ["void", ["pointer"]],
});

const tokData = fs.readFileSync("tokenizer.json");
const resultPtr = loom.LoadTokenizerFromBytes(tokData, tokData.length);
const resultJson = JSON.parse(ref.readCString(resultPtr, 0));
loom.Loom_FreeCString(resultPtr);

Building

✅ All build scripts now include transformer.go, so all platform builds have transformer inference support.

Each platform script now builds all variants automatically:

# macOS - builds arm64 + x86_64 + universal (3 outputs)
./build_macos.sh

# iOS - builds device + simulators + XCFramework (4 outputs)  
./build_ios.sh

# Linux - builds for current architecture
./build_linux.sh
Manual Platform Selection
./build_all.sh linux arm64          # Linux ARM64
./build_all.sh macos universal      # macOS Universal Binary
./build_all.sh windows x86_64       # Windows 64-bit
./build_all.sh android arm64        # Android ARM64

# Build all available platforms at once
./build_all.sh --clean all
Package & Serve Builds
# Zip all compiled builds and start HTTP server for download
./serve_builds.sh
Verify Transformer Functions

After building, verify transformer functions are included:

# macOS
nm -gU compiled/macos_universal/libloom.dylib | grep LoadTokenizer

# Linux
nm -D compiled/linux_x86_64/libloom.so | grep LoadTokenizer

# Windows (on Windows)
dumpbin /exports compiled/windows_x86_64/libloom.dll | findstr LoadTokenizer

You should see: LoadTokenizerFromBytes, LoadTransformerFromBytes, EncodeText, DecodeTokens, GenerateText, GenerateNextToken

Supported Platforms
| Platform | Architectures | Output | Status | Notes |
|----------|---------------|--------|--------|-------|
| macOS | arm64, x86_64, universal | libloom.dylib | ✅ Tested | Universal = fat binary for both archs |
| iOS | arm64 (device), x86_64_sim, arm64_sim | libloom.a | 🔨 Builds | Static library, not yet tested on device |
| Linux | x86_64 | libloom.so | ✅ Tested | Native build |
| Linux | arm64 | libloom.so | ❌ Broken | WebGPU compile error (on todo list) |
| Windows | x86_64 | libloom.dll | ✅ Tested | Works via NuGet package |
| Windows | arm64 | libloom.dll | ❌ Broken | WebGPU compile error (on todo list) |
| Android | arm64, armv7, x86_64, x86 | libloom.so | ✅ Tested | Requires Android NDK |
| WebAssembly | wasm32 | main.wasm | ✅ Tested | Browser/Node.js via wasm/ |

Legend: ✅ Tested = verified working on target platform | 🔨 Builds = compiles successfully, not yet device-tested | ⚠️ Untested = build script exists but not verified

Build Output Directories:

  • compiled/macos_arm64/ - Apple Silicon Macs
  • compiled/macos_x86_64/ - Intel Macs
  • compiled/macos_universal/ - Universal binary (24MB, both archs)
  • compiled/ios_arm64/ - iPhone/iPad devices
  • compiled/ios_arm64_sim/ - Simulator on Apple Silicon Macs
  • compiled/ios_x86_64_sim/ - Simulator on Intel Macs
  • compiled/ios_xcframework/ - XCFramework for Xcode integration
  • compiled/linux_x86_64/ - Linux x86_64
  • compiled/android_arm64/ - Android ARM64

Neural Network API (Legacy)

char* Loom_NewNetwork(int inputSize, int gridRows, int gridCols, int layersPerCell, bool useGPU);

Creates a new neural network and returns JSON with handle:

{
  "handle": 1,
  "type": "Network",
  "input_size": 784,
  "grid_rows": 2,
  "grid_cols": 1,
  "layers_cell": 1,
  "total_layers": 2,
  "gpu": true,
  "gpu_init_ms": 45
}
Layer Initialization
char* Loom_InitDenseLayer(int inputSize, int outputSize, int activation);

Creates a dense layer configuration (returns JSON string):

Activation types:

  • 0 = Linear
  • 1 = ReLU
  • 2 = Sigmoid
  • 3 = Tanh
char* Loom_SetLayer(int64_t handle, int row, int col, int layer, char* configJSON);

Sets layer configuration from JSON.

Example:

char* config = Loom_InitDenseLayer(784, 392, 1); // 784→392, ReLU
Loom_SetLayer(handle, 0, 0, 0, config);
Loom_FreeCString(config);
Method Calling
char* Loom_Call(int64_t handle, char* method, char* argsJSON);

Dynamically calls any Network method with JSON arguments.

Examples:

// Forward pass
char* output = Loom_Call(handle, "ForwardCPU", "[[0.1, 0.2, ...]]");

// Training
char* result = Loom_Call(handle, "Train", "[{\"epochs\": 10, \"lr\": 0.01}]");

// Get batch size
char* size = Loom_Call(handle, "GetBatchSize", "[]");

Files

  • transformer.go - Transformer inference C exports
  • main.go - Neural network C exports (legacy)
  • web_interface.py - Python web server with streaming inference
  • inference.html - Browser UI for text generation
  • build.sh - Simple build script
  • build_all.sh - Multi-platform build system
  • test_transformer.sh - Setup verification

Architecture

┌─────────────────────────┐
│  Your Application       │
│  (Python/C#/Rust/etc)   │
└───────────┬─────────────┘
            │ C FFI
            ▼
┌─────────────────────────┐
│   libloom.so/.dylib     │
│ ┌─────────────────────┐ │
│ │ Transformer Engine  │ │
│ │  • SmolLM2-135M     │ │
│ │  • Token-by-token   │ │
│ │  • BPE tokenizer    │ │
│ └─────────────────────┘ │
│ ┌─────────────────────┐ │
│ │ Neural Network API  │ │
│ │  • Training         │ │
│ │  • Inference        │ │
│ │  • GPU support      │ │
│ └─────────────────────┘ │
└─────────────────────────┘

Performance Notes

  • Memory: SmolLM2-135M uses ~500MB RAM
  • Speed: ~10-50 tokens/sec on CPU (depends on model size)
  • Streaming: Token-by-token generation for real-time UX
  • GPU: Not yet implemented for transformers (CPU only)

Troubleshooting

Python: "munmap_chunk(): invalid pointer"

  • Use ctypes.c_void_p for return types, not c_char_p
  • Python's automatic conversion corrupts the pointer

Library not found:

export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH  # Linux
export DYLD_LIBRARY_PATH=.:$DYLD_LIBRARY_PATH  # macOS

Cross-compilation fails:

  • Install required toolchains (see Building section)
  • Check $ANDROID_NDK_HOME for Android builds

License

MIT (same as parent LOOM project)


Ready to use transformers from your favorite language? Start with web_interface.py as a reference implementation! 🚀


Introspection
char* Loom_ListMethods(int64_t handle);

Returns JSON array of all available methods:

{
  "methods": [
    {
      "name": "ForwardCPU",
      "parameters": ["[][]float32"],
      "returns": ["[][]float32"]
    },
    {
      "name": "Train",
      "parameters": ["nn.TrainingConfig"],
      "returns": ["error"]
    }
  ],
  "count": 24
}
char* Loom_GetInfo(int64_t handle);

Returns object metadata:

{
  "type": "*nn.Network",
  "kind": "ptr",
  "methods": 24,
  "handle": 1,
  "gpu_enabled": true,
  "grid_rows": 2,
  "grid_cols": 1,
  "layers_per_cell": 1,
  "input_size": 784,
  "batch_size": 32,
  "total_layers": 2
}
Model Persistence
char* Loom_SaveModel(int64_t handle, char* modelID);

Serializes network to JSON string.

char* Loom_LoadModel(char* jsonString, char* modelID);

Deserializes network from JSON (returns handle info).
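
A save/load round trip with these two calls might look like the following sketch (extractHandle parses the handle out of the returned JSON, as in the usage example below):

char* model_json = Loom_SaveModel(handle, "mnist");
char* load_info = Loom_LoadModel(model_json, "mnist");  // JSON with a fresh handle
int64_t handle2 = extractHandle(load_info);
Loom_FreeCString(load_info);
Loom_FreeCString(model_json);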

Memory Management
void Loom_Free(int64_t handle);

Releases GPU resources and deletes object from handle map.

void Loom_FreeCString(char* p);

Frees C strings allocated by LOOM (all return values).

Version Info
char* Loom_GetVersion();

Returns version string.

Quick Start - Transformer Inference

1. Serve model files:

# Method 1: Go HTTP server
cd cmd/serve_model_bytes
./serve_model_bytes -model ../../models/SmolLM2-135M-Instruct -port 8080

# Method 2: Python web interface (uses C ABI directly)
./web_interface.py ../models/SmolLM2-135M-Instruct 8080

2. Open web interface:

# Open inference.html in your browser
open http://localhost:8080/inference.html

Example Python usage:

#!/usr/bin/env python3
import ctypes
import json

# Load shared library
loom = ctypes.CDLL('./libloom.so')

# Configure function signatures (return pointers as c_void_p so they can be freed)
loom.LoadTokenizerFromBytes.argtypes = [ctypes.c_char_p, ctypes.c_int]
loom.LoadTokenizerFromBytes.restype = ctypes.c_void_p
loom.LoadTransformerFromBytes.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
loom.LoadTransformerFromBytes.restype = ctypes.c_void_p
loom.GenerateText.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.c_float]
loom.GenerateText.restype = ctypes.c_void_p
loom.Loom_FreeCString.argtypes = [ctypes.c_void_p]
loom.Loom_FreeCString.restype = None

# Load tokenizer
with open('models/SmolLM2-135M-Instruct/tokenizer.json', 'rb') as f:
    tok_data = f.read()
result_ptr = loom.LoadTokenizerFromBytes(tok_data, len(tok_data))
result = json.loads(ctypes.string_at(result_ptr).decode('utf-8'))
loom.Loom_FreeCString(result_ptr)
print(f"Tokenizer: vocab_size={result['vocab_size']}")

# Load transformer
with open('models/SmolLM2-135M-Instruct/config.json', 'rb') as f:
    config = f.read()
with open('models/SmolLM2-135M-Instruct/model.safetensors', 'rb') as f:
    weights = f.read()
result_ptr = loom.LoadTransformerFromBytes(config, len(config), weights, len(weights))
result = json.loads(ctypes.string_at(result_ptr).decode('utf-8'))
loom.Loom_FreeCString(result_ptr)
print(f"Model: {result['num_layers']} layers, hidden_size={result['hidden_size']}")

# Generate text
result_ptr = loom.GenerateText(b"Once upon a time", 50, 0.7)
result = json.loads(ctypes.string_at(result_ptr).decode('utf-8'))
loom.Loom_FreeCString(result_ptr)
print(f"Generated: {result['generated_text']}")

Neural Network API

Usage Example

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

// Function declarations
extern char* Loom_NewNetwork(int, int, int, int, bool);
extern char* Loom_InitDenseLayer(int, int, int);
extern char* Loom_SetLayer(int64_t, int, int, int, char*);
extern char* Loom_Call(int64_t, char*, char*);
extern void Loom_Free(int64_t);
extern void Loom_FreeCString(char*);

int main() {
    // Create network
    char* result = Loom_NewNetwork(784, 2, 1, 1, false);
    int64_t handle = extractHandle(result); // Parse JSON
    Loom_FreeCString(result);

    // Initialize layers
    char* layer0 = Loom_InitDenseLayer(784, 392, 1);
    Loom_SetLayer(handle, 0, 0, 0, layer0);
    Loom_FreeCString(layer0);

    char* layer1 = Loom_InitDenseLayer(392, 10, 0);
    Loom_SetLayer(handle, 1, 0, 0, layer1);
    Loom_FreeCString(layer1);

    // Forward pass
    char* input = "[[0.1, 0.2, ...]]"; // 784 values
    char* output = Loom_Call(handle, "ForwardCPU", input);
    printf("Output: %s\n", output);
    Loom_FreeCString(output);

    // Cleanup
    Loom_Free(handle);
    return 0;
}
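
The extractHandle helper is left to the caller; a naive sketch that scans the returned JSON for the handle field (a real JSON parser is more robust):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Naive scan for "handle": N in LOOM's JSON result (sketch)
static int64_t extractHandle(const char* json) {
    const char* p = strstr(json, "\"handle\":");
    return p ? (int64_t)strtoll(p + 9, NULL, 10) : -1;
}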

Compile:

gcc -o my_program my_program.c -L. -lloom -Wl,-rpath,.

Benchmark Results

Run ./simple_bench to compare CPU vs GPU performance:

=== LOOM C ABI Simple Benchmark ===
Version: LOOM C ABI v1.0

Network: 2x1x1 grid, input_size=784
Iterations: 100

--- CPU Test ---
CPU Network created in 2.34 ms (handle: 1)
Layers initialized
CPU Forward: 100 iterations in 45.67 ms (avg: 0.4567 ms/iter)

--- GPU Test ---
GPU Network created in 52.10 ms (handle: 2)
Layers initialized
GPU Forward: 100 iterations in 12.34 ms (avg: 0.1234 ms/iter)

=== Results ===
CPU Avg: 0.4567 ms/iter
GPU Avg: 0.1234 ms/iter
Speedup: 3.70x (GPU faster)

Type Conversion

LOOM automatically converts between JSON and Go types:

| Go Type | JSON Type | Example |
|---------|-----------|---------|
| int, int32, int64 | Number | 42 |
| float32, float64 | Number | 3.14 |
| bool | Boolean | true |
| string | String | "hello" |
| []T | Array | [1, 2, 3] |
| map[string]T | Object | {"key": "value"} |
| struct | Object | {"field": 123} |
| Custom types (LayerType) | Number | 1 |
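
Concretely, the JSON args string passed to Loom_Call carries one element per Go parameter, each converted per the table above. A sketch with a hypothetical method signature (SomeMethod taking an int, a bool, and a []float32 is not a real LOOM method):

// Hypothetical (int, bool, []float32) method: one JSON value per parameter
char* r = Loom_Call(handle, "SomeMethod", "[32, true, [0.1, 0.2]]");
Loom_FreeCString(r);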

Language Bindings

Python (ctypes)
import ctypes
import json

loom = ctypes.CDLL('./libloom.so')
loom.Loom_NewNetwork.restype = ctypes.c_void_p
loom.Loom_Call.restype = ctypes.c_void_p
loom.Loom_Call.argtypes = [ctypes.c_int64, ctypes.c_char_p, ctypes.c_char_p]
loom.Loom_Free.argtypes = [ctypes.c_int64]
loom.Loom_FreeCString.argtypes = [ctypes.c_void_p]

# Create network
result_ptr = loom.Loom_NewNetwork(784, 2, 1, 1, False)
data = json.loads(ctypes.string_at(result_ptr).decode('utf-8'))
handle = data['handle']
loom.Loom_FreeCString(result_ptr)

# Forward pass
input_json = json.dumps([[0.1] * 784])
output_ptr = loom.Loom_Call(handle, b"ForwardCPU", input_json.encode())
print(ctypes.string_at(output_ptr).decode('utf-8'))
loom.Loom_FreeCString(output_ptr)

# Cleanup
loom.Loom_Free(handle)
Rust (FFI)
use std::ffi::CStr;
use std::os::raw::c_char;

#[link(name = "loom")]
extern "C" {
    fn Loom_NewNetwork(input: i32, rows: i32, cols: i32, layers: i32, gpu: bool) -> *mut c_char;
    fn Loom_Call(handle: i64, method: *const c_char, args: *const c_char) -> *mut c_char;
    fn Loom_Free(handle: i64);
    fn Loom_FreeCString(p: *mut c_char);
}

fn main() {
    unsafe {
        let result = Loom_NewNetwork(784, 2, 1, 1, false);
        let result_str = CStr::from_ptr(result).to_str().unwrap();
        println!("{}", result_str);
        Loom_FreeCString(result);
    }
}

Architecture

┌─────────────┐
│ C/C++/Rust  │
│   Program   │
└──────┬──────┘
       │ C ABI calls
       ▼
┌─────────────────────────┐
│   libloom.so/dylib      │
│  ┌───────────────────┐  │
│  │ Handle Manager    │  │
│  │  (sync.Mutex)     │  │
│  └─────────┬─────────┘  │
│            │            │
│  ┌─────────▼─────────┐  │
│  │ JSON Converter    │  │
│  │  (reflect)        │  │
│  └─────────┬─────────┘  │
│            │            │
│  ┌─────────▼─────────┐  │
│  │  nn.Network       │  │
│  │  Methods (24+)    │  │
│  └───────────────────┘  │
└─────────────────────────┘

Error Handling

All functions return JSON. Errors are indicated by {"error": "message"}:

char* result = Loom_Call(handle, "InvalidMethod", "[]");
// Returns: {"error": "Method not found: InvalidMethod"}

Always check for errors before parsing results.
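
A minimal check in C (the substring scan is a sketch - a success payload could also contain the text "error", so a JSON parser is more robust):

char* result = Loom_Call(handle, "ForwardCPU", input);
if (strstr(result, "\"error\"") != NULL) {
    fprintf(stderr, "LOOM call failed: %s\n", result);
} else {
    // safe to parse the payload
}
Loom_FreeCString(result);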

Thread Safety

  • Handle storage is protected by sync.Mutex
  • Multiple goroutines can safely access different Network objects
  • Same Network object should not be used concurrently from multiple threads
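
If several threads must share one network handle, serialize access on the caller's side; a sketch with a pthread mutex (loom_call_locked is our wrapper, not part of the ABI):

#include <pthread.h>
#include <stdint.h>

extern char* Loom_Call(int64_t handle, char* method, char* argsJSON);

static pthread_mutex_t loom_lock = PTHREAD_MUTEX_INITIALIZER;

// Serialize every call that touches a shared Network handle
static char* loom_call_locked(int64_t handle, char* method, char* args) {
    pthread_mutex_lock(&loom_lock);
    char* result = Loom_Call(handle, method, args);
    pthread_mutex_unlock(&loom_lock);
    return result;
}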

Performance Notes

  • GPU initialization overhead: ~50ms first call
  • JSON parsing: Minimal overhead for small payloads
  • Reflection overhead: ~1-5µs per method call
  • Best for: Batch operations, not per-sample inference

License

Same as parent LOOM project.
