dynamicmatmul

package
v0.4.4 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 21, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package dynamicmatmul provides compile-once matmul kernels for the Apple Neural Engine (ANE) with runtime-provided weights.

It stages activations and weights into a single ANE input surface, then evaluates y = x*w without recompiling when w changes.


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type EvalStats

type EvalStats struct {
	HWExecutionNS uint64
	Metrics       map[string]float64
}

EvalStats reports per-eval ANE timing.

type Executor

type Executor struct {
	// contains filtered or unexported fields
}

Executor evaluates row-major y = x*w with runtime-provided weights.

x has shape [batch, inDim], w has shape [inDim, outDim], and the result has shape [batch, outDim]. The executor compiles once for a fixed shape and rewrites the packed input surface on each evaluation.
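
As a point of comparison for the shapes above, the following is a minimal CPU sketch of row-major y = x*w. It is not the package's implementation and does not touch the ANE; the name refMatMul is hypothetical and only illustrates the indexing implied by [batch, inDim] and [inDim, outDim].

```go
package main

import "fmt"

// refMatMul is a hypothetical CPU reference for row-major y = x*w.
// x is [batch, inDim], w is [inDim, outDim], the result is [batch, outDim].
func refMatMul(x, w []float32, batch, inDim, outDim int) ([]float32, error) {
	if len(x) != batch*inDim || len(w) != inDim*outDim {
		return nil, fmt.Errorf("shape mismatch: len(x)=%d len(w)=%d", len(x), len(w))
	}
	y := make([]float32, batch*outDim)
	for b := 0; b < batch; b++ {
		for i := 0; i < inDim; i++ {
			xv := x[b*inDim+i]
			for o := 0; o < outDim; o++ {
				y[b*outDim+o] += xv * w[i*outDim+o]
			}
		}
	}
	return y, nil
}

func main() {
	x := []float32{1, 2, 3, 4, 5, 6} // shape [2, 3]
	w := []float32{1, 0, 0, 1, 1, 1} // shape [3, 2]
	y, err := refMatMul(x, w, 2, 3, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(y) // [4 5 10 11]
}
```

A reference like this can be useful for checking executor output within a tolerance, since hardware results may differ slightly from float32 CPU arithmetic.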

func New

func New(batch, inDim, outDim int, opts Options) (*Executor, error)

New compiles a dynamic matmul kernel for the provided shape.

func (*Executor) Close

func (e *Executor) Close()

Close releases the compiled kernel.

func (*Executor) CopyOutputToInput

func (e *Executor) CopyOutputToInput(dst *model.Kernel, dstInput, dstChannel int) error

CopyOutputToInput copies the last evaluated output tensor into a destination kernel input without converting through Go-managed float buffers.

func (*Executor) Eval

func (e *Executor) Eval(x, w []float32) ([]float32, error)

Eval computes y = x*w and returns a new output slice.

func (*Executor) EvalCF

func (e *Executor) EvalCF(xCF []float32) (EvalStats, error)

EvalCF evaluates a channel-first input tensor against previously primed weights and leaves the output resident in the ANE output surfaces.

xCF has shape [inDim, batch].
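
The channel-first shape can be illustrated with a small transposition helper. This is a sketch under one assumption that the documentation does not state: that a [inDim, batch] tensor is flattened so element (i, b) sits at index i*batch+b. The name toChannelFirst is hypothetical and not part of the package.

```go
package main

import "fmt"

// toChannelFirst converts a row-major [batch, inDim] tensor into a
// [inDim, batch] tensor. Assumption: element (i, b) of the channel-first
// tensor is stored at index i*batch+b.
func toChannelFirst(x []float32, batch, inDim int) ([]float32, error) {
	if batch <= 0 || inDim <= 0 || len(x) != batch*inDim {
		return nil, fmt.Errorf("want %d elements, got %d", batch*inDim, len(x))
	}
	xCF := make([]float32, len(x))
	for b := 0; b < batch; b++ {
		for i := 0; i < inDim; i++ {
			xCF[i*batch+b] = x[b*inDim+i]
		}
	}
	return xCF, nil
}

func main() {
	x := []float32{1, 2, 3, 4, 5, 6} // shape [2, 3], row-major
	xCF, err := toChannelFirst(x, 2, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(xCF) // [1 4 2 5 3 6]
}
```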

func (*Executor) EvalCFHW

func (e *Executor) EvalCFHW(xCF []float32) (uint64, error)

EvalCFHW evaluates a channel-first input tensor against previously primed weights and returns only aggregate hardware execution time.

func (*Executor) EvalCFIOInto

func (e *Executor) EvalCFIOInto(dstCF, xCF []float32) (EvalStats, error)

EvalCFIOInto evaluates a channel-first input tensor into a channel-first output tensor using previously primed weights.

xCF has shape [inDim, batch] and dstCF has shape [outDim, batch].
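
To make the two channel-first shapes concrete, here is a CPU sketch of the same product with channel-first input and output. It assumes element (i, b) of xCF is at index i*batch+b and element (o, b) of dstCF is at index o*batch+b, which the documentation does not spell out. The function refEvalCF is hypothetical; it takes the weights as an explicit argument, whereas the executor uses previously primed weights.

```go
package main

import "fmt"

// refEvalCF is a hypothetical CPU reference for a channel-first matmul.
// xCF is [inDim, batch], wIO is [inDim, outDim], dstCF is [outDim, batch].
func refEvalCF(dstCF, xCF, wIO []float32, batch, inDim, outDim int) error {
	if len(dstCF) != outDim*batch || len(xCF) != inDim*batch || len(wIO) != inDim*outDim {
		return fmt.Errorf("shape mismatch: dst=%d x=%d w=%d", len(dstCF), len(xCF), len(wIO))
	}
	for o := 0; o < outDim; o++ {
		for b := 0; b < batch; b++ {
			var sum float32
			for i := 0; i < inDim; i++ {
				sum += wIO[i*outDim+o] * xCF[i*batch+b]
			}
			dstCF[o*batch+b] = sum
		}
	}
	return nil
}

func main() {
	xCF := []float32{1, 4, 2, 5, 3, 6} // shape [3, 2]
	wIO := []float32{1, 0, 0, 1, 1, 1} // shape [3, 2]
	dstCF := make([]float32, 2*2)      // shape [2, 2]
	if err := refEvalCF(dstCF, xCF, wIO, 2, 3, 2); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dstCF) // [4 10 5 11]
}
```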

func (*Executor) EvalCFIOIntoHW

func (e *Executor) EvalCFIOIntoHW(dstCF, xCF []float32) (uint64, error)

EvalCFIOIntoHW evaluates a channel-first input tensor into a channel-first output tensor using previously primed weights and returns only aggregate hardware execution time.

func (*Executor) EvalInto

func (e *Executor) EvalInto(dst, x, w []float32) (EvalStats, error)

EvalInto computes y = x*w into dst.

func (*Executor) EvalOneHotIOInto

func (e *Executor) EvalOneHotIOInto(dst []float32, xs []int) (EvalStats, error)

EvalOneHotIOInto computes y = x*w for one-hot activations encoded by xs and a previously primed IO-layout weight matrix.

xs holds at most batch token ids, each in [0, inDim). Position t selects the input row for batch element t. Batch positions at or beyond len(xs) are treated as zero input.
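
Multiplying a one-hot row by w simply selects one row of w, so the operation can be sketched on the CPU as a row gather. The helper refOneHot below is hypothetical, and it assumes dst is row-major [batch, outDim]; the documentation does not state the layout of dst, so treat that as an illustration only.

```go
package main

import "fmt"

// refOneHot is a hypothetical CPU reference for y = x*w with one-hot x.
// Assumption: dst is row-major [batch, outDim]. wIO is [inDim, outDim].
// Row t of dst receives row xs[t] of wIO; rows past len(xs) are zeroed.
func refOneHot(dst []float32, xs []int, wIO []float32, batch, inDim, outDim int) error {
	if len(dst) != batch*outDim || len(wIO) != inDim*outDim {
		return fmt.Errorf("shape mismatch: dst=%d w=%d", len(dst), len(wIO))
	}
	if len(xs) > batch {
		return fmt.Errorf("got %d token ids, batch is %d", len(xs), batch)
	}
	for i := range dst {
		dst[i] = 0
	}
	for t, id := range xs {
		if id < 0 || id >= inDim {
			return fmt.Errorf("xs[%d]=%d out of range [0, %d)", t, id, inDim)
		}
		copy(dst[t*outDim:(t+1)*outDim], wIO[id*outDim:(id+1)*outDim])
	}
	return nil
}

func main() {
	wIO := []float32{1, 2, 3, 4, 5, 6} // shape [3, 2]
	dst := make([]float32, 3*2)        // batch = 3, outDim = 2
	if err := refOneHot(dst, []int{2, 0}, wIO, 3, 3, 2); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dst) // [5 6 1 2 0 0]
}
```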

func (*Executor) EvalOneHotIOIntoHW

func (e *Executor) EvalOneHotIOIntoHW(dst []float32, xs []int) (uint64, error)

EvalOneHotIOIntoHW computes y = x*w for one-hot activations encoded by xs and a previously primed IO-layout weight matrix, returning only aggregate hardware execution time.

func (*Executor) EvalWithStats

func (e *Executor) EvalWithStats(x, w []float32) ([]float32, EvalStats, error)

EvalWithStats computes y = x*w and returns a new output slice plus timing.

func (*Executor) PrimeWeightsIO

func (e *Executor) PrimeWeightsIO(wIO []float32) error

PrimeWeightsIO copies the full IO-layout weight matrix into the cached ANE input buffers. wIO must be laid out as [inDim, outDim].

func (*Executor) ReadOutputCF

func (e *Executor) ReadOutputCF(dstCF []float32) error

ReadOutputCF reads the last evaluated output tensor in channel-first order.

dstCF has shape [outDim, batch].

func (*Executor) UpdateWeightsIORows

func (e *Executor) UpdateWeightsIORows(wIO []float32, rows []int) error

UpdateWeightsIORows patches a subset of rows in the cached IO-layout weight buffers. rows contains input-channel row ids in [0, inDim).
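
The row-patching idea can be sketched on plain slices. The helper patchRows below is hypothetical, and it assumes that wIO is the full [inDim, outDim] matrix from which only the listed rows are copied; the documentation does not say whether wIO is the full matrix or a packed set of rows, so check the source before relying on either reading.

```go
package main

import "fmt"

// patchRows is a hypothetical sketch of patching a cached weight buffer.
// Assumption: cache and wIO are both full row-major [inDim, outDim]
// matrices, and only the rows listed in rows are copied from wIO.
func patchRows(cache, wIO []float32, rows []int, inDim, outDim int) error {
	if len(cache) != inDim*outDim || len(wIO) != inDim*outDim {
		return fmt.Errorf("shape mismatch: cache=%d w=%d", len(cache), len(wIO))
	}
	// Validate every row id first so the cache is never partially patched.
	for _, r := range rows {
		if r < 0 || r >= inDim {
			return fmt.Errorf("row %d out of range [0, %d)", r, inDim)
		}
	}
	for _, r := range rows {
		copy(cache[r*outDim:(r+1)*outDim], wIO[r*outDim:(r+1)*outDim])
	}
	return nil
}

func main() {
	cache := make([]float32, 3*2)      // shape [3, 2], all zeros
	wIO := []float32{1, 2, 3, 4, 5, 6} // shape [3, 2]
	if err := patchRows(cache, wIO, []int{1}, 3, 2); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cache) // [0 0 3 4 0 0]
}
```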

type Options

type Options struct {
	QoS uint32

	// TileOut forces output-channel tiling when > 0.
	//
	// Each tile compiles a separate kernel with output width <= TileOut.
	// When TileOut == 0, New first tries a single full-width kernel and
	// falls back to tiling only if full-width compilation fails.
	TileOut int
}

Options configures executor creation.
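
One way to picture output-channel tiling is as a split of [0, outDim) into contiguous column ranges no wider than TileOut. The sketch below assumes equal-width contiguous tiles with a shorter final tile; how the package actually partitions the output is not documented here. The name tileRanges is hypothetical. The sketch treats a non-positive tile width as a single full-width tile and does not model the compile-failure fallback described for TileOut == 0.

```go
package main

import "fmt"

// tileRanges splits [0, outDim) into half-open [start, end) ranges of
// width <= tileOut. Assumption: tiles are contiguous and equal width
// except possibly the last. tileOut <= 0 yields one full-width tile.
func tileRanges(outDim, tileOut int) [][2]int {
	if outDim <= 0 {
		return nil
	}
	if tileOut <= 0 || tileOut >= outDim {
		return [][2]int{{0, outDim}}
	}
	var tiles [][2]int
	for start := 0; start < outDim; start += tileOut {
		end := start + tileOut
		if end > outDim {
			end = outDim
		}
		tiles = append(tiles, [2]int{start, end})
	}
	return tiles
}

func main() {
	fmt.Println(tileRanges(10, 4)) // [[0 4] [4 8] [8 10]]
	fmt.Println(tileRanges(10, 0)) // [[0 10]]
}
```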
