Documentation
Overview ¶
Package mil generates MIL (Model Intermediate Language) programs and weight blobs for Apple Neural Engine compilation.
text := mil.GenConv(16, 16, 1)
blob, _ := mil.BuildWeightBlob(weights, 16, 16)
Index ¶
- func BuildCausalMaskBlob(seq int) ([]byte, error)
- func BuildFP16Blob(data []float32) ([]byte, error)
- func BuildIdentityWeightBlob(channels int) ([]byte, error)
- func BuildRoPECosSinBlobs(seq, headDim int) ([]byte, []byte, error)
- func BuildRoPECosSinBlobsWithTheta(seq, headDim int, theta float64) ([]byte, []byte, error)
- func BuildTransposedWeightBlob(weights []float32, rows, cols int) ([]byte, error)
- func BuildVectorWeightBlob(weights []float32) ([]byte, error)
- func BuildWeightBlob(weights []float32, outCh, inCh int) ([]byte, error)
- func BuildWeightBlobV1(data []float32) ([]byte, error)
- func GenClassifierBackward(dim, vocab, seq int) string
- func GenClassifierForward(dim, vocab, seq int) string
- func GenConv(inCh, outCh, spatial int) string
- func GenConvDynamicFP16(inCh, outCh, spatial int) string
- func GenConvFP16(inCh, outCh, spatial int) string
- func GenConvFP16IO(inCh, outCh, spatial int) string
- func GenConvFP32(inCh, outCh, spatial int) string
- func GenDynamicMatmul(inCh, outCh, batch int) string
- func GenFFNBackward(dim, hidden, seq int) string
- func GenFFNBackwardReLU2(dim, hidden, seq int) string
- func GenFFNForward(dim, hidden, seq int) string
- func GenFFNForwardRMS(dim, hidden, seq int) string
- func GenFFNForwardRMSReLU2(dim, hidden, seq int) string
- func GenFFNForwardReLU2(dim, hidden, seq int) string
- func GenFFNForwardTaps(dim, hidden, seq int) string
- func GenFFNForwardTapsReLU2(dim, hidden, seq int) string
- func GenFinalRMSNorm(dim, seq int) string
- func GenFinalRMSNormDynamic(dim, seq int) string
- func GenGQAExpand(kvHeads, qHeads, headDim, seqLen int) string
- func GenIdentity(channels, spatial int) string
- func GenIdentityFP16IO(channels, spatial int) string
- func GenMatmul(inCh, outCh, spatial int) string
- func GenQKVBackward(dim, heads, seq int) string
- func GenQKVForwardRMS(dim, seq int) string
- func GenRMSNorm(channels, spatial int, eps float64) string
- func GenRMSNormBackward(dim, seq int) string
- func GenRMSNormBackwardDynamic(dim, seq int) string
- func GenReadState(name string, shape [4]int) string
- func GenSDPA(headDim, nHeads, seqLen int) string
- func GenSDPAApplyForward(dim, heads, seq int) string
- func GenSDPABackward1(dim, heads, seq int) string
- func GenSDPABackward2(dim, heads, seq int) string
- func GenSDPAForward(dim, heads, seq int) string
- func GenSDPAForwardTaps(dim, heads, seq int) string
- func GenScaleFP16IO(spatial int) string
- func GenSoftmaxVocab(vocab, seq int) string
- func GenUpdateState(name string, shape [4]int) string
- type BlobDataType
- type BlobWriter
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BuildCausalMaskBlob ¶ added in v0.3.0
BuildCausalMaskBlob builds the upper-triangular fp16 causal mask used by SDPA.
func BuildFP16Blob ¶ added in v0.3.0
BuildFP16Blob builds a generic fp16 MIL BLOBFILE payload from row-major data.
func BuildIdentityWeightBlob ¶
BuildIdentityWeightBlob builds weights for an identity convolution (I matrix).
func BuildRoPECosSinBlobs ¶ added in v0.3.0
BuildRoPECosSinBlobs builds fp16 cosine/sine tables for RoPE with the standard base frequency theta=10000. Each output has shape [1,1,seq,headDim], flattened row-major.
func BuildRoPECosSinBlobsWithTheta ¶ added in v0.3.2
BuildRoPECosSinBlobsWithTheta builds fp16 cosine/sine tables for RoPE with a configurable base frequency theta. Common values: 10000 (original), 100000 (nanochat), 500000 (Llama3). Each output has shape [1,1,seq,headDim], flattened row-major.
func BuildTransposedWeightBlob ¶ added in v0.3.0
BuildTransposedWeightBlob builds a baked-weight blob for the transpose of a row-major matrix. The input matrix is [rows, cols] row-major; the baked tensor is [cols, rows].
func BuildVectorWeightBlob ¶ added in v0.3.0
BuildVectorWeightBlob builds a single-baked-weight blob for a 1D fp16 tensor.
func BuildWeightBlob ¶
BuildWeightBlob constructs the binary weight blob for MIL compilation.
The blob layout matches the ANE's expected format:
- Bytes 0-63: File header (0x01 at offset 0, 0x02 at offset 4)
- Bytes 64-127: Chunk header (0xDEADBEEF magic, data size, data offset)
- Bytes 128+: FP16 weight data
weights must have exactly outCh*inCh elements (OIHW layout, H=W=1).
func BuildWeightBlobV1 ¶
BuildWeightBlobV1 constructs a binary weight blob for a flat 1D weight vector. Unlike BuildWeightBlob which reshapes to OIHW, this stores raw 1D data suitable for RMSNorm weight vectors and other non-convolution weights.
func GenClassifierBackward ¶ added in v0.3.0
GenClassifierBackward generates the classifier backward kernel. It multiplies baked transpose(embed) by dlogits to produce dx.
func GenClassifierForward ¶ added in v0.3.0
GenClassifierForward generates a classifier projection kernel. The embedding matrix is baked as a conv weight tensor with shape [vocab, dim, 1, 1].
func GenConv ¶
GenConv generates a MIL text for a 1×1 convolution kernel with fp16 internal computation. inCh and outCh are channel counts; spatial is the spatial dimension (1 for vectors).
func GenConvDynamicFP16 ¶ added in v0.3.0
GenConvDynamicFP16 generates an fp16 conv graph with runtime-provided weights.
func GenConvFP16 ¶ added in v0.3.0
GenConvFP16 generates a minimal fp16 conv graph matching the working training path.
func GenConvFP16IO ¶
GenConvFP16IO generates a MIL text for a 1×1 convolution with fp16 I/O (no casts).
func GenConvFP32 ¶
GenConvFP32 generates a MIL text for a 1×1 convolution with fp32 weights (no casting).
func GenDynamicMatmul ¶ added in v0.3.0
GenDynamicMatmul generates a weightless MIL graph for y = x*w.
The single input tensor packs activations and weights together as [1, inCh, 1, batch+outCh] fp32:
- spatial [0:batch] contains activations laid out as [inCh, batch]
- spatial [batch:batch+outCh] contains weights laid out as [inCh, outCh]
The output tensor is [1, outCh, 1, batch] fp32.
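The packing described above can be sketched on the host side. This is an illustrative helper, not part of the package; the [batch][inCh] row-major layout assumed for the host-side activations is an assumption.

```go
package main

import "fmt"

// packDynamicMatmul packs activations x (assumed [batch][inCh]
// row-major) and weights w ([inCh][outCh] row-major) into the single
// [1, inCh, 1, batch+outCh] input tensor GenDynamicMatmul expects.
func packDynamicMatmul(x, w []float32, inCh, outCh, batch int) []float32 {
	span := batch + outCh
	packed := make([]float32, inCh*span)
	for c := 0; c < inCh; c++ {
		// spatial [0:batch]: activations laid out as [inCh, batch]
		for b := 0; b < batch; b++ {
			packed[c*span+b] = x[b*inCh+c]
		}
		// spatial [batch:batch+outCh]: weights laid out as [inCh, outCh]
		for o := 0; o < outCh; o++ {
			packed[c*span+batch+o] = w[c*outCh+o]
		}
	}
	return packed
}

func main() {
	// 2 samples, 3 input channels, 2 output channels.
	x := []float32{1, 2, 3, 4, 5, 6} // x[b][c], row-major
	w := []float32{1, 0, 0, 1, 1, 1} // w[c][o], row-major
	fmt.Println(packDynamicMatmul(x, w, 3, 2, 2))
}
```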
func GenFFNBackward ¶ added in v0.3.0
GenFFNBackward generates the backward half of the fused FFN block. Input layout is concat(dffn, h1, h3); output layout is concat(dx, dh1, dh3).
func GenFFNBackwardReLU2 ¶ added in v0.3.2
GenFFNBackwardReLU2 generates the backward half of the fused ReLU² FFN block. Input layout is concat(dffn, h1); output is concat(dx, dh1). The ReLU² derivative is 2*max(0, h1).
func GenFFNForward ¶ added in v0.3.0
GenFFNForward generates a fused FFN block with baked W1/W2/W3 weights. It computes W2(silu(W1(x)) * W3(x)).
func GenFFNForwardRMS ¶ added in v0.3.0
GenFFNForwardRMS generates the full FFN block with internal RMSNorm and the final residual-free output only.
func GenFFNForwardRMSReLU2 ¶ added in v0.3.2
GenFFNForwardRMSReLU2 generates the full FFN block with internal RMSNorm and ReLU² activation. It computes W2(relu(W1(rms_norm(x)))²).
func GenFFNForwardReLU2 ¶ added in v0.3.2
GenFFNForwardReLU2 generates a fused FFN block with ReLU² activation and baked W1/W2 weights. It computes W2(relu(W1(x))²). Unlike the gated SiLU variant, there is no W3 (only 2 weight matrices).
func GenFFNForwardTaps ¶ added in v0.3.0
GenFFNForwardTaps generates a fused FFN block that also returns intermediates. The output layout is concat(out, h1, h3) along the channel dimension.
func GenFFNForwardTapsReLU2 ¶ added in v0.3.2
GenFFNForwardTapsReLU2 generates a fused ReLU² FFN block that also returns the h1 intermediate. Output layout is concat(out, h1) along the channel dimension.
func GenFinalRMSNorm ¶ added in v0.3.0
GenFinalRMSNorm generates a final-layer RMSNorm kernel with baked weights.
func GenFinalRMSNormDynamic ¶ added in v0.3.0
GenFinalRMSNormDynamic generates a final-layer RMSNorm kernel with runtime-provided weights.
func GenGQAExpand ¶
GenGQAExpand generates a MIL text for expanding KV heads for grouped-query attention. It tiles the KV tensor along the head dimension by a factor of qHeads/kvHeads.
func GenIdentity ¶
GenIdentity generates a MIL text for an identity operation (1×1 conv with identity weights).
func GenIdentityFP16IO ¶
GenIdentityFP16IO generates a MIL text for an identity operation with fp16 I/O (no fp32 casts, the ANE reads and writes fp16 directly).
func GenMatmul ¶
GenMatmul generates a MIL text for a matrix multiplication as a 1×1 convolution. This is equivalent to y = x @ W^T for [batch, inCh] -> [batch, outCh].
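The equivalence can be checked with a scalar reference: a 1×1 convolution dots the weight row for each output channel against the input channels at every spatial position, which is exactly y = x @ W^T with batch mapped onto the spatial axis. This sketch is for illustration, not the generated kernel:

```go
package main

import "fmt"

// conv1x1 applies a 1×1 convolution with weights [outCh][inCh]
// (row-major) to an input [inCh][spatial] (row-major). Each output
// element is a dot product over input channels — i.e. y = x @ W^T
// with the batch dimension carried on the spatial axis.
func conv1x1(x, w []float32, inCh, outCh, spatial int) []float32 {
	y := make([]float32, outCh*spatial)
	for o := 0; o < outCh; o++ {
		for s := 0; s < spatial; s++ {
			var acc float32
			for c := 0; c < inCh; c++ {
				acc += w[o*inCh+c] * x[c*spatial+s]
			}
			y[o*spatial+s] = acc
		}
	}
	return y
}

func main() {
	x := []float32{1, 2, 3, 4}  // [2 channels, 2 spatial]
	w := []float32{1, 1, 1, -1} // [2 out, 2 in]
	fmt.Println(conv1x1(x, w, 2, 2, 2))
}
```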
func GenQKVBackward ¶ added in v0.3.0
GenQKVBackward generates the fused QKV backward kernel. Input layout is concat(dq, dk, dv); output layout is dx.
func GenQKVForwardRMS ¶ added in v0.3.0
GenQKVForwardRMS generates the RMSNorm plus QKV projection block. Output layout is concat(q, k, v) along the channel dimension.
func GenRMSNorm ¶
GenRMSNorm generates a MIL text for the overflow-safe 11-op RMSNorm decomposition.
The 11-op sequence prevents fp16 overflow (values >256 cause CPU fallback):
abs → reduce_max → maximum(1e-6) → real_div → square → reduce_mean → add(eps) → sqrt → mul(safe_max) → real_div → mul(weight)
The program takes a single fp16 tensor input [1, channels, 1, spatial] and produces the same shape output. The weight vector is loaded from a BLOBFILE.
func GenRMSNormBackward ¶ added in v0.3.0
GenRMSNormBackward generates the dx half of RMSNorm backward with baked weights. The input is concat(dy, x) along the channel dimension; dw remains a cheap CPU reduction.
func GenRMSNormBackwardDynamic ¶ added in v0.3.0
GenRMSNormBackwardDynamic generates the dx half of RMSNorm backward with runtime-provided weights. The input is concat(dy, x) along the channel dimension; dw remains a cheap CPU reduction.
func GenReadState ¶
GenReadState generates a MIL text for reading a named state buffer. This is used for iOS 18+ stateful inference (e.g., KV cache on ANE).
func GenSDPA ¶
GenSDPA generates a MIL text for scaled dot-product attention. Inputs: Q, K, V of shape [1, nHeads, seqLen, headDim]. Scale is 1/sqrt(headDim).
func GenSDPAApplyForward ¶ added in v0.3.0
GenSDPAApplyForward generates the attention application block. Input0 is x and input1 is concat(q, k, v). Output layout is concat(x2, attn).
func GenSDPABackward1 ¶ added in v0.3.0
GenSDPABackward1 generates the first SDPA backward kernel plus Wo^T. Input layout is concat(q, k, v, dx2); output layout is concat(dv, probs, dp).
func GenSDPABackward2 ¶ added in v0.3.0
GenSDPABackward2 generates the second SDPA backward kernel. Input layout is concat(probs, dp, q, k); output layout is concat(dq, dk).
func GenSDPAForward ¶ added in v0.3.0
GenSDPAForward generates the fused attention forward block and returns x2 only.
func GenSDPAForwardTaps ¶ added in v0.3.0
GenSDPAForwardTaps generates the fused attention forward block with taps. Output layout is concat(x2, q, k, v, attn) along the channel dimension.
func GenScaleFP16IO ¶
GenScaleFP16IO generates a MIL text for a simple multiplication (1 channel, S spatial). Each spatial element is multiplied by the scalar weight.
func GenSoftmaxVocab ¶ added in v0.3.0
GenSoftmaxVocab generates a softmax over the channel dimension.
func GenUpdateState ¶
GenUpdateState generates a MIL text for updating a named state buffer. This emits the coreml_update_state op for iOS 18+ stateful inference.
Types ¶
type BlobDataType ¶
type BlobDataType uint32
BlobDataType identifies the element type in a weight blob entry.
const (
    BlobFloat16 BlobDataType = 1
    BlobFloat32 BlobDataType = 2
    BlobUInt8   BlobDataType = 3
    BlobInt8    BlobDataType = 8
)
type BlobWriter ¶
type BlobWriter struct {
// contains filtered or unexported fields
}
BlobWriter accumulates weight blobs and builds a multi-weight MIL Blob Storage v2 binary.
The format consists of a 64-byte file header, followed by 64-byte per-blob metadata entries, then 64-byte-aligned data segments.
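The offset arithmetic implied by that layout can be sketched as follows. This is an illustrative reconstruction under the stated format (64-byte file header, 64-byte metadata entry per blob, each data segment aligned to 64 bytes); whether the first data segment starts exactly at the next 64-byte boundary is an assumption.

```go
package main

import "fmt"

// align64 rounds n up to the next multiple of 64.
func align64(n uint64) uint64 { return (n + 63) &^ 63 }

// blobOffsets sketches why Offset depends on the total blob count:
// the metadata section (64 bytes per blob) sits between the file
// header and the first data segment, so adding a blob shifts every
// data offset.
func blobOffsets(sizes []uint64) []uint64 {
	offsets := make([]uint64, len(sizes))
	pos := 64 + 64*uint64(len(sizes)) // file header + metadata entries
	for i, sz := range sizes {
		pos = align64(pos)
		offsets[i] = pos
		pos += sz
	}
	return offsets
}

func main() {
	// Three blobs of 10, 100, and 64 bytes.
	fmt.Println(blobOffsets([]uint64{10, 100, 64}))
}
```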
func (*BlobWriter) AddFloat16 ¶
func (w *BlobWriter) AddFloat16(data []float32) int
AddFloat16 converts float32 data to fp16 and appends it as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.
func (*BlobWriter) AddFloat32 ¶
func (w *BlobWriter) AddFloat32(data []float32) int
AddFloat32 appends float32 data as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.
func (*BlobWriter) AddRaw ¶
func (w *BlobWriter) AddRaw(dtype BlobDataType, data []byte) int
AddRaw appends raw byte data as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.
func (*BlobWriter) Build ¶
func (w *BlobWriter) Build() ([]byte, error)
Build produces the complete binary blob.
func (*BlobWriter) Count ¶
func (w *BlobWriter) Count() int
Count returns the number of blobs added.
func (*BlobWriter) Offset ¶
func (w *BlobWriter) Offset(i int) uint64
Offset returns the byte offset where blob i's data starts in the built output. This must be called after all blobs have been added, as the offset depends on the total number of blobs (which determines the metadata section size).