Documentation
Overview ¶
Package mil generates MIL (Model Intermediate Language) programs and weight blobs for Apple Neural Engine compilation.
text := mil.GenConv(16, 16, 1)
blob, _ := mil.BuildWeightBlob(weights, 16, 16)
Index ¶
- func BuildCausalMaskBlob(seq int) ([]byte, error)
- func BuildFP16Blob(data []float32) ([]byte, error)
- func BuildIdentityWeightBlob(channels int) ([]byte, error)
- func BuildRoPECosSinBlobs(seq, headDim int) ([]byte, []byte, error)
- func BuildRoPECosSinBlobsWithTheta(seq, headDim int, theta float64) ([]byte, []byte, error)
- func BuildTransposedWeightBlob(weights []float32, rows, cols int) ([]byte, error)
- func BuildVectorWeightBlob(weights []float32) ([]byte, error)
- func BuildWeightBlob(weights []float32, outCh, inCh int) ([]byte, error)
- func BuildWeightBlobV1(data []float32) ([]byte, error)
- func GenClassifierBackward(dim, vocab, seq int) string
- func GenClassifierForward(dim, vocab, seq int) string
- func GenConv(inCh, outCh, spatial int) string
- func GenConvDynamicFP16(inCh, outCh, spatial int) string
- func GenConvFP16(inCh, outCh, spatial int) string
- func GenConvFP16IO(inCh, outCh, spatial int) string
- func GenConvFP32(inCh, outCh, spatial int) string
- func GenDynamicMatmul(inCh, outCh, batch int) string
- func GenFFNBackward(dim, hidden, seq int) string
- func GenFFNBackwardReLU2(dim, hidden, seq int) string
- func GenFFNForward(dim, hidden, seq int) string
- func GenFFNForwardRMS(dim, hidden, seq int) string
- func GenFFNForwardRMSReLU2(dim, hidden, seq int) string
- func GenFFNForwardReLU2(dim, hidden, seq int) string
- func GenFFNForwardTaps(dim, hidden, seq int) string
- func GenFFNForwardTapsReLU2(dim, hidden, seq int) string
- func GenFinalRMSNorm(dim, seq int) string
- func GenFinalRMSNormDynamic(dim, seq int) string
- func GenGQAExpand(kvHeads, qHeads, headDim, seqLen int) string
- func GenIdentity(channels, spatial int) string
- func GenIdentityFP16IO(channels, spatial int) string
- func GenMatmul(inCh, outCh, spatial int) string
- func GenQKVBackward(dim, heads, seq int) string
- func GenQKVForwardRMS(dim, seq int) string
- func GenRMSNorm(channels, spatial int, eps float64) string
- func GenRMSNormBackward(dim, seq int) string
- func GenRMSNormBackwardDynamic(dim, seq int) string
- func GenReadState(name string, shape [4]int) string
- func GenSDPA(headDim, nHeads, seqLen int) string
- func GenSDPAApplyForward(dim, heads, seq int) string
- func GenSDPABackward1(dim, heads, seq int) string
- func GenSDPABackward2(dim, heads, seq int) string
- func GenSDPAForward(dim, heads, seq int) string
- func GenSDPAForwardTaps(dim, heads, seq int) string
- func GenScaleFP16IO(spatial int) string
- func GenSoftmaxVocab(vocab, seq int) string
- func GenUpdateState(name string, shape [4]int) string
- type BlobDataType
- type BlobWriter
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BuildCausalMaskBlob ¶ added in v0.3.0
BuildCausalMaskBlob builds the upper-triangular fp16 causal mask used by SDPA.
func BuildFP16Blob ¶ added in v0.3.0
BuildFP16Blob builds a generic fp16 MIL BLOBFILE payload from row-major data.
func BuildIdentityWeightBlob ¶
BuildIdentityWeightBlob builds weights for an identity convolution (I matrix).
func BuildRoPECosSinBlobs ¶ added in v0.3.0
BuildRoPECosSinBlobs builds fp16 cosine/sine tables for RoPE with the standard base frequency theta=10000. Each output has shape [1,1,seq,headDim], flattened row-major.
func BuildRoPECosSinBlobsWithTheta ¶ added in v0.3.2
BuildRoPECosSinBlobsWithTheta builds fp16 cosine/sine tables for RoPE with a configurable base frequency theta. Common values: 10000 (original), 100000 (nanochat), 500000 (Llama3). Each output has shape [1,1,seq,headDim], flattened row-major.
func BuildTransposedWeightBlob ¶ added in v0.3.0
BuildTransposedWeightBlob builds a baked-weight blob for the transpose of a row-major matrix. The input matrix is [rows, cols] row-major; the baked tensor is [cols, rows].
func BuildVectorWeightBlob ¶ added in v0.3.0
BuildVectorWeightBlob builds a single-baked-weight blob for a 1D fp16 tensor.
func BuildWeightBlob ¶
BuildWeightBlob constructs the binary weight blob for MIL compilation.
The blob layout matches the ANE's expected format:
- Bytes 0-63: File header (0x01 at offset 0, 0x02 at offset 4)
- Bytes 64-127: Chunk header (0xDEADBEEF magic, data size, data offset)
- Bytes 128+: FP16 weight data
weights must have exactly outCh*inCh elements (OIHW layout, H=W=1).
func BuildWeightBlobV1 ¶
BuildWeightBlobV1 constructs a binary weight blob for a flat 1D weight vector. Unlike BuildWeightBlob which reshapes to OIHW, this stores raw 1D data suitable for RMSNorm weight vectors and other non-convolution weights.
func GenClassifierBackward ¶ added in v0.3.0
GenClassifierBackward generates the classifier backward kernel. It multiplies baked transpose(embed) by dlogits to produce dx.
func GenClassifierForward ¶ added in v0.3.0
GenClassifierForward generates a classifier projection kernel. The embedding matrix is baked as a conv weight tensor with shape [vocab, dim, 1, 1].
func GenConv ¶
GenConv generates a MIL text for a 1×1 convolution kernel with fp16 internal computation. inCh and outCh are channel counts; spatial is the spatial dimension (1 for vectors).
func GenConvDynamicFP16 ¶ added in v0.3.0
GenConvDynamicFP16 generates an fp16 conv graph with runtime-provided weights.
func GenConvFP16 ¶ added in v0.3.0
GenConvFP16 generates a minimal fp16 conv graph matching the working training path.
func GenConvFP16IO ¶
GenConvFP16IO generates a MIL text for a 1×1 convolution with fp16 I/O (no casts).
func GenConvFP32 ¶
GenConvFP32 generates a MIL text for a 1×1 convolution with fp32 weights (no casting).
func GenDynamicMatmul ¶ added in v0.3.0
GenDynamicMatmul generates a weightless MIL graph for y = x*w.
The single input tensor packs activations and weights together as [1, inCh, 1, batch+outCh] fp32:
- spatial [0:batch] contains activations laid out as [inCh, batch]
- spatial [batch:batch+outCh] contains weights laid out as [inCh, outCh]
The output tensor is [1, outCh, 1, batch] fp32.
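The packing described above can be sketched on the host side. This is an illustrative helper, not part of the package; the [batch][inCh] row-major layout assumed for the host-side activations is an assumption.

```go
package main

import "fmt"

// packDynamicMatmul packs activations x (assumed [batch][inCh]
// row-major) and weights w ([inCh][outCh] row-major) into the single
// [1, inCh, 1, batch+outCh] input tensor GenDynamicMatmul expects.
func packDynamicMatmul(x, w []float32, inCh, outCh, batch int) []float32 {
	span := batch + outCh
	packed := make([]float32, inCh*span)
	for c := 0; c < inCh; c++ {
		// spatial [0:batch]: activations laid out as [inCh, batch]
		for b := 0; b < batch; b++ {
			packed[c*span+b] = x[b*inCh+c]
		}
		// spatial [batch:batch+outCh]: weights laid out as [inCh, outCh]
		for o := 0; o < outCh; o++ {
			packed[c*span+batch+o] = w[c*outCh+o]
		}
	}
	return packed
}

func main() {
	// 2 samples, 3 input channels, 2 output channels.
	x := []float32{1, 2, 3, 4, 5, 6} // x[b][c], row-major
	w := []float32{1, 0, 0, 1, 1, 1} // w[c][o], row-major
	fmt.Println(packDynamicMatmul(x, w, 3, 2, 2))
}
```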
func GenFFNBackward ¶ added in v0.3.0
GenFFNBackward generates the backward half of the fused FFN block. Input layout is concat(dffn, h1, h3); output layout is concat(dx, dh1, dh3).
func GenFFNBackwardReLU2 ¶ added in v0.3.2
GenFFNBackwardReLU2 generates the backward half of the fused ReLU² FFN block. Input layout is concat(dffn, h1); output is concat(dx, dh1). The ReLU² derivative is 2*max(0, h1).
func GenFFNForward ¶ added in v0.3.0
GenFFNForward generates a fused FFN block with baked W1/W2/W3 weights. It computes W2(silu(W1(x)) * W3(x)).
func GenFFNForwardRMS ¶ added in v0.3.0
GenFFNForwardRMS generates the full FFN block with internal RMSNorm and the final residual-free output only.
func GenFFNForwardRMSReLU2 ¶ added in v0.3.2
GenFFNForwardRMSReLU2 generates the full FFN block with internal RMSNorm and ReLU² activation. It computes W2(relu(W1(rms_norm(x)))²).
func GenFFNForwardReLU2 ¶ added in v0.3.2
GenFFNForwardReLU2 generates a fused FFN block with ReLU² activation and baked W1/W2 weights. It computes W2(relu(W1(x))²). Unlike the gated SiLU variant, there is no W3 (only 2 weight matrices).
func GenFFNForwardTaps ¶ added in v0.3.0
GenFFNForwardTaps generates a fused FFN block that also returns intermediates. The output layout is concat(out, h1, h3) along the channel dimension.
func GenFFNForwardTapsReLU2 ¶ added in v0.3.2
GenFFNForwardTapsReLU2 generates a fused ReLU² FFN block that also returns the h1 intermediate. Output layout is concat(out, h1) along the channel dimension.
func GenFinalRMSNorm ¶ added in v0.3.0
GenFinalRMSNorm generates a final-layer RMSNorm kernel with baked weights.
func GenFinalRMSNormDynamic ¶ added in v0.3.0
GenFinalRMSNormDynamic generates a final-layer RMSNorm kernel with runtime-provided weights.
func GenGQAExpand ¶
GenGQAExpand generates a MIL text for expanding KV heads for grouped-query attention. It tiles the KV tensor along the head dimension by a factor of qHeads/kvHeads.
func GenIdentity ¶
GenIdentity generates a MIL text for an identity operation (1×1 conv with identity weights).
func GenIdentityFP16IO ¶
GenIdentityFP16IO generates a MIL text for an identity operation with fp16 I/O (no fp32 casts, the ANE reads and writes fp16 directly).
func GenMatmul ¶
GenMatmul generates a MIL text for a matrix multiplication as a 1×1 convolution. This is equivalent to y = x @ W^T for [batch, inCh] -> [batch, outCh].
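The equivalence can be checked with a scalar reference: a 1×1 convolution dots the weight row for each output channel against the input channels at every spatial position, which is exactly y = x @ W^T with batch mapped onto the spatial axis. This sketch is for illustration, not the generated kernel:

```go
package main

import "fmt"

// conv1x1 applies a 1×1 convolution with weights [outCh][inCh]
// (row-major) to an input [inCh][spatial] (row-major). Each output
// element is a dot product over input channels — i.e. y = x @ W^T
// with the batch dimension carried on the spatial axis.
func conv1x1(x, w []float32, inCh, outCh, spatial int) []float32 {
	y := make([]float32, outCh*spatial)
	for o := 0; o < outCh; o++ {
		for s := 0; s < spatial; s++ {
			var acc float32
			for c := 0; c < inCh; c++ {
				acc += w[o*inCh+c] * x[c*spatial+s]
			}
			y[o*spatial+s] = acc
		}
	}
	return y
}

func main() {
	x := []float32{1, 2, 3, 4}  // [2 channels, 2 spatial]
	w := []float32{1, 1, 1, -1} // [2 out, 2 in]
	fmt.Println(conv1x1(x, w, 2, 2, 2))
}
```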
func GenQKVBackward ¶ added in v0.3.0
GenQKVBackward generates the fused QKV backward kernel. Input layout is concat(dq, dk, dv); output layout is dx.
func GenQKVForwardRMS ¶ added in v0.3.0
GenQKVForwardRMS generates the RMSNorm plus QKV projection block. Output layout is concat(q, k, v) along the channel dimension.
func GenRMSNorm ¶
GenRMSNorm generates a MIL text for the overflow-safe 11-op RMSNorm decomposition.
The 11-op sequence prevents fp16 overflow (values >256 cause CPU fallback):
abs → reduce_max → maximum(1e-6) → real_div → square → reduce_mean → add(eps) → sqrt → mul(safe_max) → real_div → mul(weight)
The program takes a single fp16 tensor input [1, channels, 1, spatial] and produces the same shape output. The weight vector is loaded from a BLOBFILE.
func GenRMSNormBackward ¶ added in v0.3.0
GenRMSNormBackward generates the dx half of RMSNorm backward with baked weights. The input is concat(dy, x) along the channel dimension; dw remains a cheap CPU reduction.
func GenRMSNormBackwardDynamic ¶ added in v0.3.0
GenRMSNormBackwardDynamic generates the dx half of RMSNorm backward with runtime-provided weights. The input is concat(dy, x) along the channel dimension; dw remains a cheap CPU reduction.
func GenReadState ¶
GenReadState generates a MIL text for reading a named state buffer. This is used for iOS 18+ stateful inference (e.g., KV cache on ANE).
func GenSDPA ¶
GenSDPA generates a MIL text for scaled dot-product attention. Inputs: Q, K, V of shape [1, nHeads, seqLen, headDim]. Scale is 1/sqrt(headDim).
func GenSDPAApplyForward ¶ added in v0.3.0
GenSDPAApplyForward generates the attention application block. Input0 is x and input1 is concat(q, k, v). Output layout is concat(x2, attn).
func GenSDPABackward1 ¶ added in v0.3.0
GenSDPABackward1 generates the first SDPA backward kernel plus Wo^T. Input layout is concat(q, k, v, dx2); output layout is concat(dv, probs, dp).
func GenSDPABackward2 ¶ added in v0.3.0
GenSDPABackward2 generates the second SDPA backward kernel. Input layout is concat(probs, dp, q, k); output layout is concat(dq, dk).
func GenSDPAForward ¶ added in v0.3.0
GenSDPAForward generates the fused attention forward block and returns x2 only.
func GenSDPAForwardTaps ¶ added in v0.3.0
GenSDPAForwardTaps generates the fused attention forward block with taps. Output layout is concat(x2, q, k, v, attn) along the channel dimension.
func GenScaleFP16IO ¶
GenScaleFP16IO generates a MIL text for a simple multiplication (1 channel, S spatial). Each spatial element is multiplied by the scalar weight.
func GenSoftmaxVocab ¶ added in v0.3.0
GenSoftmaxVocab generates a softmax over the channel dimension.
func GenUpdateState ¶
GenUpdateState generates a MIL text for updating a named state buffer. This emits the coreml_update_state op for iOS 18+ stateful inference.
Types ¶
type BlobDataType ¶
type BlobDataType uint32
BlobDataType identifies the element type in a weight blob entry.
const (
    BlobFloat16 BlobDataType = 1
    BlobFloat32 BlobDataType = 2
    BlobUInt8   BlobDataType = 3
    BlobInt8    BlobDataType = 8
)
type BlobWriter ¶
type BlobWriter struct {
// contains filtered or unexported fields
}
BlobWriter accumulates weight blobs and builds a multi-weight MIL Blob Storage v2 binary.
The format consists of a 64-byte file header, followed by 64-byte per-blob metadata entries, then 64-byte-aligned data segments.
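The offset arithmetic implied by that layout can be sketched as follows. This is an illustrative reconstruction under the stated format (64-byte file header, 64-byte metadata entry per blob, each data segment aligned to 64 bytes); whether the first data segment starts exactly at the next 64-byte boundary is an assumption.

```go
package main

import "fmt"

// align64 rounds n up to the next multiple of 64.
func align64(n uint64) uint64 { return (n + 63) &^ 63 }

// blobOffsets sketches why Offset depends on the total blob count:
// the metadata section (64 bytes per blob) sits between the file
// header and the first data segment, so adding a blob shifts every
// data offset.
func blobOffsets(sizes []uint64) []uint64 {
	offsets := make([]uint64, len(sizes))
	pos := 64 + 64*uint64(len(sizes)) // file header + metadata entries
	for i, sz := range sizes {
		pos = align64(pos)
		offsets[i] = pos
		pos += sz
	}
	return offsets
}

func main() {
	// Three blobs of 10, 100, and 64 bytes.
	fmt.Println(blobOffsets([]uint64{10, 100, 64}))
}
```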
func (*BlobWriter) AddFloat16 ¶
func (w *BlobWriter) AddFloat16(data []float32) int
AddFloat16 converts float32 data to fp16 and appends it as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.
func (*BlobWriter) AddFloat32 ¶
func (w *BlobWriter) AddFloat32(data []float32) int
AddFloat32 appends float32 data as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.
func (*BlobWriter) AddRaw ¶
func (w *BlobWriter) AddRaw(dtype BlobDataType, data []byte) int
AddRaw appends raw byte data as a blob entry. Returns the blob index. Use Offset after all blobs are added to get the data offset.
func (*BlobWriter) Build ¶
func (w *BlobWriter) Build() ([]byte, error)
Build produces the complete binary blob.
func (*BlobWriter) Count ¶
func (w *BlobWriter) Count() int
Count returns the number of blobs added.
func (*BlobWriter) Offset ¶
func (w *BlobWriter) Offset(i int) uint64
Offset returns the byte offset where blob i's data starts in the built output. This must be called after all blobs have been added, as the offset depends on the total number of blobs (which determines the metadata section size).