mil

package
v0.2.1 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 11, 2026 License: MIT Imports: 3 Imported by: 1

Documentation

Rendered for darwin/amd64

Overview

Package mil generates MIL (Model Intermediate Language) programs and weight blobs for Apple Neural Engine compilation.

text := mil.GenConv(16, 16, 1)
blob, _ := mil.BuildWeightBlob(weights, 16, 16)

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildIdentityWeightBlob

func BuildIdentityWeightBlob(channels int) ([]byte, error)

BuildIdentityWeightBlob builds weights for an identity convolution (I matrix).
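
As a rough illustration of what "identity weights" means here, the sketch below fills a flat channels×channels matrix with ones on the diagonal. The helper name identityWeights is hypothetical and not part of this package; whether BuildIdentityWeightBlob builds its matrix this way internally is an assumption.

```go
package main

import "fmt"

// identityWeights is a hypothetical helper (not part of package mil).
// It returns a channels x channels identity matrix flattened in
// (output channel, input channel) order, i.e. OIHW with H=W=1.
func identityWeights(channels int) []float32 {
	w := make([]float32, channels*channels)
	for c := 0; c < channels; c++ {
		w[c*channels+c] = 1 // output channel c copies input channel c
	}
	return w
}

func main() {
	fmt.Println(identityWeights(3)) // [1 0 0 0 1 0 0 0 1]
}
```

A slice like this could presumably be passed to BuildWeightBlob with outCh and inCh both equal to channels.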

func BuildWeightBlob

func BuildWeightBlob(weights []float32, outCh, inCh int) ([]byte, error)

BuildWeightBlob constructs the binary weight blob for MIL compilation.

The blob layout matches the ANE's expected format:

  • Bytes 0-63: File header (0x01 at offset 0, 0x02 at offset 4)
  • Bytes 64-127: Chunk header (0xDEADBEEF magic, data size, data offset)
  • Bytes 128+: FP16 weight data

weights must have exactly outCh*inCh elements (OIHW layout, H=W=1).
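
The layout above can be sketched in a few lines of standalone Go. This is an illustration under stated assumptions, not the package's implementation: the names sketchWeightBlob and float32ToHalf are hypothetical, little-endian byte order is assumed, and the positions of the data size and data offset fields inside the chunk header (offsets 8 and 16 here) are guesses, since the documentation gives only their order.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// float32ToHalf converts to IEEE 754 binary16 with round-to-nearest-even.
func float32ToHalf(f float32) uint16 {
	b := math.Float32bits(f)
	sign := uint16((b >> 16) & 0x8000)
	exp := int((b>>23)&0xff) - 127 + 15 // rebias exponent for fp16
	mant := b & 0x7fffff
	switch {
	case (b>>23)&0xff == 0xff: // Inf or NaN
		if mant != 0 {
			return sign | 0x7e00
		}
		return sign | 0x7c00
	case exp >= 0x1f: // too large for fp16: becomes Inf
		return sign | 0x7c00
	case exp <= 0: // subnormal in fp16, or zero
		if exp < -10 {
			return sign
		}
		mant |= 0x800000 // restore the implicit leading 1
		shift := uint(14 - exp)
		half := mant >> shift
		rem := mant & ((1 << shift) - 1)
		halfway := uint32(1) << (shift - 1)
		if rem > halfway || (rem == halfway && half&1 == 1) {
			half++
		}
		return sign | uint16(half)
	}
	half := uint32(exp)<<10 | mant>>13
	rem := mant & 0x1fff
	if rem > 0x1000 || (rem == 0x1000 && half&1 == 1) {
		half++ // a carry into the exponent is the correct rounding
	}
	return sign | uint16(half)
}

// sketchWeightBlob is a hypothetical stand-in that mirrors the documented
// layout: 64-byte file header, 64-byte chunk header, fp16 data at byte 128.
func sketchWeightBlob(weights []float32, outCh, inCh int) ([]byte, error) {
	if len(weights) != outCh*inCh {
		return nil, fmt.Errorf("got %d weights, want %d", len(weights), outCh*inCh)
	}
	const dataOffset = 128
	dataSize := 2 * len(weights) // two bytes per fp16 element
	blob := make([]byte, dataOffset+dataSize)
	blob[0] = 0x01
	blob[4] = 0x02
	chunk := blob[64:dataOffset]
	binary.LittleEndian.PutUint32(chunk[0:], 0xDEADBEEF)
	// ASSUMPTION: field positions within the chunk header are not documented.
	binary.LittleEndian.PutUint32(chunk[8:], uint32(dataSize))
	binary.LittleEndian.PutUint32(chunk[16:], uint32(dataOffset))
	for i, w := range weights {
		binary.LittleEndian.PutUint16(blob[dataOffset+2*i:], float32ToHalf(w))
	}
	return blob, nil
}

func main() {
	blob, err := sketchWeightBlob([]float32{1, 0, 0, 1}, 2, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, first weight % x\n", len(blob), blob[128:130])
}
```

The length check mirrors the documented requirement that weights has exactly outCh*inCh elements.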

func BuildWeightBlobV1

func BuildWeightBlobV1(data []float32) ([]byte, error)

BuildWeightBlobV1 constructs a binary weight blob for a flat 1D weight vector. Unlike BuildWeightBlob, which reshapes to OIHW, it stores raw 1D data, which suits RMSNorm weight vectors and other non-convolution weights.

func GenConv

func GenConv(inCh, outCh, spatial int) string

GenConv generates a MIL text for a 1×1 convolution kernel with fp16 internal computation. inCh and outCh are channel counts; spatial is the spatial dimension (1 for vectors).

func GenConvFP16IO

func GenConvFP16IO(inCh, outCh, spatial int) string

GenConvFP16IO generates a MIL text for a 1×1 convolution with fp16 I/O (no casts).

func GenConvFP32

func GenConvFP32(inCh, outCh, spatial int) string

GenConvFP32 generates a MIL text for a 1×1 convolution with fp32 weights (no casting).

func GenGQAExpand

func GenGQAExpand(kvHeads, qHeads, headDim, seqLen int) string

GenGQAExpand generates a MIL text for expanding KV heads for grouped-query attention. It tiles the KV tensor along the head dimension by a factor of qHeads/kvHeads.
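
To make the tiling concrete, here is a CPU-side sketch that takes "tile" literally: the whole KV tensor, laid out as [1, kvHeads, seqLen, headDim], is repeated qHeads/kvHeads times along the head axis. The helper tileHeads is hypothetical. With this ordering, query head q reads the data of KV head q % kvHeads; whether the generated MIL uses this ordering or repeats each head consecutively is not stated in the documentation, so treat the ordering as an assumption.

```go
package main

import "fmt"

// tileHeads is a hypothetical reference helper (not part of package mil).
// kv holds kvHeads blocks of perHead = seqLen*headDim values each.
func tileHeads(kv []float32, kvHeads, qHeads, perHead int) ([]float32, error) {
	if kvHeads <= 0 || qHeads%kvHeads != 0 {
		return nil, fmt.Errorf("qHeads (%d) must be a multiple of kvHeads (%d)", qHeads, kvHeads)
	}
	if len(kv) != kvHeads*perHead {
		return nil, fmt.Errorf("got %d values, want %d", len(kv), kvHeads*perHead)
	}
	reps := qHeads / kvHeads
	out := make([]float32, 0, qHeads*perHead)
	for r := 0; r < reps; r++ {
		out = append(out, kv...) // repeat the full block of KV heads
	}
	return out, nil
}

func main() {
	// Two KV heads with one value each, expanded to four query heads.
	out, err := tileHeads([]float32{1, 2}, 2, 4, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [1 2 1 2]
}
```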

func GenIdentity

func GenIdentity(channels, spatial int) string

GenIdentity generates a MIL text for an identity operation (1×1 conv with identity weights).

func GenIdentityFP16IO

func GenIdentityFP16IO(channels, spatial int) string

GenIdentityFP16IO generates a MIL text for an identity operation with fp16 I/O (no fp32 casts, the ANE reads and writes fp16 directly).

func GenMatmul

func GenMatmul(inCh, outCh, spatial int) string

GenMatmul generates a MIL text for a matrix multiplication as a 1×1 convolution. This is equivalent to y = x @ W^T for [batch, inCh] -> [batch, outCh].
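
The equivalence can be checked on the CPU with a small reference sketch. Both helpers below are hypothetical and only illustrate the math: matmulWT computes y = x @ W^T on row-major data, and conv1x1 applies the same weights as a 1×1 convolution over a channel-major [inCh, spatial] tensor. Mapping the batch dimension onto the spatial dimension is an assumption about how the generated program is meant to be used.

```go
package main

import "fmt"

// matmulWT computes y = x @ W^T.
// x is [batch, inCh] and w is [outCh, inCh], both row-major.
func matmulWT(x, w []float32, batch, inCh, outCh int) []float32 {
	y := make([]float32, batch*outCh)
	for b := 0; b < batch; b++ {
		for o := 0; o < outCh; o++ {
			var sum float32
			for i := 0; i < inCh; i++ {
				sum += x[b*inCh+i] * w[o*inCh+i]
			}
			y[b*outCh+o] = sum
		}
	}
	return y
}

// conv1x1 applies a 1x1 convolution.
// in is [inCh, spatial] (channel-major), w is [outCh, inCh].
func conv1x1(in, w []float32, inCh, outCh, spatial int) []float32 {
	out := make([]float32, outCh*spatial)
	for o := 0; o < outCh; o++ {
		for s := 0; s < spatial; s++ {
			var sum float32
			for c := 0; c < inCh; c++ {
				sum += w[o*inCh+c] * in[c*spatial+s]
			}
			out[o*spatial+s] = sum
		}
	}
	return out
}

func main() {
	x := []float32{1, 2, 3, 4}       // [batch=2, inCh=2]
	w := []float32{1, 0, 1, 1, 0, 2} // [outCh=3, inCh=2]
	fmt.Println(matmulWT(x, w, 2, 2, 3)) // [1 3 4 3 7 8]

	xT := []float32{1, 3, 2, 4} // same data as [inCh=2, spatial=2]
	fmt.Println(conv1x1(xT, w, 2, 3, 2)) // [1 3 3 7 4 8]
}
```

The two results hold the same numbers; the convolution output is simply the transpose of the matmul output because channels come first in its layout.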

func GenRMSNorm

func GenRMSNorm(channels, spatial int, eps float64) string

GenRMSNorm generates a MIL text for the overflow-safe 11-op RMSNorm decomposition.

The 11-op sequence prevents fp16 overflow (inputs with magnitude above 256 cause CPU fallback, since their squares exceed the fp16 maximum of 65504):

abs → reduce_max → maximum(1e-6) → real_div → square → reduce_mean →
add(eps) → sqrt → mul(safe_max) → real_div → mul(weight)

The program takes a single fp16 tensor input [1, channels, 1, spatial] and produces the same shape output. The weight vector is loaded from a BLOBFILE.
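
One way to read the 11-op sequence is sketched below in float64 for a single spatial position, reducing over channels. The function names are hypothetical, and the grouping of ops (in particular that mul(safe_max) rescales the square root by the clamped maximum before the final division) is an interpretation of the op list, not something the documentation spells out. The sketch shows the algebra only; it does not model fp16 rounding.

```go
package main

import (
	"fmt"
	"math"
)

// rmsNormPlain is the textbook form: x / sqrt(mean(x^2) + eps) * weight.
func rmsNormPlain(x, weight []float64, eps float64) []float64 {
	var mean float64
	for _, v := range x {
		mean += v * v
	}
	mean /= float64(len(x))
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v / math.Sqrt(mean+eps) * weight[i]
	}
	return out
}

// rmsNormSafe follows the 11-op order, as interpreted here.
func rmsNormSafe(x, weight []float64, eps float64) []float64 {
	// abs, reduce_max, maximum(1e-6): a scale that is never zero.
	safeMax := 0.0
	for _, v := range x {
		safeMax = math.Max(safeMax, math.Abs(v))
	}
	safeMax = math.Max(safeMax, 1e-6)
	// real_div, square, reduce_mean: every squared term is at most 1.
	var mean float64
	for _, v := range x {
		s := v / safeMax
		mean += s * s
	}
	mean /= float64(len(x))
	// add(eps), sqrt, mul(safe_max): undo the scaling on the denominator.
	denom := math.Sqrt(mean+eps) * safeMax
	// real_div, mul(weight).
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v / denom * weight[i]
	}
	return out
}

func main() {
	x := []float64{300, -400, 0, 0} // squares would overflow fp16
	w := []float64{1, 1, 1, 1}
	fmt.Println(rmsNormPlain(x, w, 0))
	fmt.Println(rmsNormSafe(x, w, 0))
}
```

Under this reading the two forms agree exactly when eps is 0. With a nonzero eps the safe form is equivalent to adding eps*safeMax² inside the plain square root, so results differ slightly unless the maximum magnitude is 1.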

func GenReadState

func GenReadState(name string, shape [4]int) string

GenReadState generates a MIL text for reading a named state buffer. This is used for iOS 18+ stateful inference (e.g., KV cache on ANE).

func GenSDPA

func GenSDPA(headDim, nHeads, seqLen int) string

GenSDPA generates a MIL text for scaled dot-product attention. Inputs: Q, K, V of shape [1, nHeads, seqLen, headDim]. Scale is 1/sqrt(headDim).
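
For reference, the computation for one head can be written as softmax(Q·Kᵀ · scale)·V. The sketch below is a plain CPU version with a hypothetical helper name; it assumes no attention mask, since the documentation does not mention one, and it uses float64 rather than the precision of the generated program.

```go
package main

import (
	"fmt"
	"math"
)

// sdpaHead is a hypothetical reference (not part of package mil).
// q, k, v are [seqLen, headDim] row-major slices for a single head.
func sdpaHead(q, k, v []float64, seqLen, headDim int) []float64 {
	scale := 1 / math.Sqrt(float64(headDim))
	out := make([]float64, seqLen*headDim)
	scores := make([]float64, seqLen)
	for i := 0; i < seqLen; i++ {
		// Scaled dot products of query i with every key.
		maxScore := math.Inf(-1)
		for j := 0; j < seqLen; j++ {
			var dot float64
			for d := 0; d < headDim; d++ {
				dot += q[i*headDim+d] * k[j*headDim+d]
			}
			scores[j] = dot * scale
			maxScore = math.Max(maxScore, scores[j])
		}
		// Softmax, subtracting the row maximum for numerical stability.
		var sum float64
		for j := range scores {
			scores[j] = math.Exp(scores[j] - maxScore)
			sum += scores[j]
		}
		// Weighted sum of value rows.
		for j := 0; j < seqLen; j++ {
			p := scores[j] / sum
			for d := 0; d < headDim; d++ {
				out[i*headDim+d] += p * v[j*headDim+d]
			}
		}
	}
	return out
}

func main() {
	// Zero queries give uniform attention, so each output row is the mean of V.
	q := []float64{0, 0, 0, 0}
	k := []float64{1, 0, 0, 1}
	v := []float64{1, 2, 3, 4}
	fmt.Println(sdpaHead(q, k, v, 2, 2)) // [2 3 2 3]
}
```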

func GenScaleFP16IO

func GenScaleFP16IO(spatial int) string

GenScaleFP16IO generates a MIL text for a simple multiplication over a tensor with 1 channel and spatial elements. Each spatial element is multiplied by the scalar weight.

func GenUpdateState

func GenUpdateState(name string, shape [4]int) string

GenUpdateState generates a MIL text for updating a named state buffer. This emits the coreml_update_state op for iOS 18+ stateful inference.

Types

type BlobDataType

type BlobDataType uint32

BlobDataType identifies the element type in a weight blob entry.

const (
	BlobFloat16 BlobDataType = 1
	BlobFloat32 BlobDataType = 2
	BlobUInt8   BlobDataType = 3
	BlobInt8    BlobDataType = 8
)

type BlobWriter

type BlobWriter struct {
	// contains filtered or unexported fields
}

BlobWriter accumulates weight blobs and builds a multi-weight MIL Blob Storage v2 binary.

The format consists of a 64-byte file header, followed by 64-byte per-blob metadata entries, then 64-byte-aligned data segments.

func NewBlobWriter

func NewBlobWriter() *BlobWriter

NewBlobWriter creates a new BlobWriter.

func (*BlobWriter) AddFloat16

func (w *BlobWriter) AddFloat16(data []float32) int

AddFloat16 converts float32 data to fp16 and appends it as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.

func (*BlobWriter) AddFloat32

func (w *BlobWriter) AddFloat32(data []float32) int

AddFloat32 appends float32 data as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.

func (*BlobWriter) AddRaw

func (w *BlobWriter) AddRaw(dtype BlobDataType, data []byte) int

AddRaw appends raw byte data as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.

func (*BlobWriter) Build

func (w *BlobWriter) Build() ([]byte, error)

Build produces the complete binary blob.

func (*BlobWriter) Count

func (w *BlobWriter) Count() int

Count returns the number of blobs added.

func (*BlobWriter) Offset

func (w *BlobWriter) Offset(i int) uint64

Offset returns the byte offset where blob i's data starts in the built output. This must be called after all blobs have been added, as the offset depends on the total number of blobs (which determines the metadata section size).
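
The dependence on the blob count can be seen in a small offset calculation based on the format described for BlobWriter: a 64-byte file header, one 64-byte metadata entry per blob, then data segments aligned to 64 bytes. The helper names are hypothetical, and the sketch assumes segments are stored in insertion order with padding only at the end of each segment.

```go
package main

import "fmt"

// align64 rounds n up to the next multiple of 64.
func align64(n uint64) uint64 { return (n + 63) &^ 63 }

// sketchOffsets is a hypothetical helper (not part of package mil).
// Given the byte size of each blob, it returns where each data
// segment would start under the assumptions stated above.
func sketchOffsets(sizes []uint64) []uint64 {
	offsets := make([]uint64, len(sizes))
	pos := uint64(64 + 64*len(sizes)) // file header plus metadata entries
	for i, sz := range sizes {
		offsets[i] = pos
		pos = align64(pos + sz)
	}
	return offsets
}

func main() {
	fmt.Println(sketchOffsets([]uint64{100, 32}))    // [192 320]
	fmt.Println(sketchOffsets([]uint64{100, 32, 8})) // [256 384 448]
}
```

Adding a third blob moves the first two segments by 64 bytes, which is why an offset read before the last blob is added would be stale.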
