package dynamicmatmul

v0.4.4 (latest) · Published: Mar 21, 2026 · License: MIT · Imports: 5 · Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/tmc/apple

Documentation ¶

Overview ¶

Package dynamicmatmul provides compile-once Apple Neural Engine (ANE) matmul kernels whose weights are supplied at run time.

It stages activations and weights into a single ANE input surface, then evaluates y = x*w without recompiling when w changes.

Index ¶

type EvalStats
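type Executor
- func New(batch, inDim, outDim int, opts Options) (*Executor, error)
- func (e *Executor) Close()
- func (e *Executor) CopyOutputToInput(dst *model.Kernel, dstInput, dstChannel int) error
- func (e *Executor) Eval(x, w []float32) ([]float32, error)
- func (e *Executor) EvalCF(xCF []float32) (EvalStats, error)
- func (e *Executor) EvalCFHW(xCF []float32) (uint64, error)
- func (e *Executor) EvalCFIOInto(dstCF, xCF []float32) (EvalStats, error)
- func (e *Executor) EvalCFIOIntoHW(dstCF, xCF []float32) (uint64, error)
- func (e *Executor) EvalInto(dst, x, w []float32) (EvalStats, error)
- func (e *Executor) EvalOneHotIOInto(dst []float32, xs []int) (EvalStats, error)
- func (e *Executor) EvalOneHotIOIntoHW(dst []float32, xs []int) (uint64, error)
- func (e *Executor) EvalWithStats(x, w []float32) ([]float32, EvalStats, error)
- func (e *Executor) PrimeWeightsIO(wIO []float32) error
- func (e *Executor) ReadOutputCF(dstCF []float32) error
- func (e *Executor) UpdateWeightsIORows(wIO []float32, rows []int) error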
type Options

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Executor ¶

type Executor struct {
	// contains filtered or unexported fields
}

Executor evaluates row-major y = x*w with runtime-provided weights.

x has shape [batch, inDim], w has shape [inDim, outDim], and the result has shape [batch, outDim]. The executor compiles once for a fixed shape and rewrites the packed input surface on each evaluation.
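To make the shape contract concrete, here is a minimal CPU reference sketch of row-major y = x*w. It is not part of this package and never touches the ANE; the helper name refMatMul is hypothetical. A reference like this can be useful for checking Executor.Eval results in tests.

```go
package main

import "fmt"

// refMatMul is a hypothetical CPU reference for the documented contract:
// x is row-major [batch, inDim], w is row-major [inDim, outDim], and the
// result is row-major [batch, outDim].
func refMatMul(x, w []float32, batch, inDim, outDim int) ([]float32, error) {
	if len(x) != batch*inDim {
		return nil, fmt.Errorf("x: got %d values, want %d", len(x), batch*inDim)
	}
	if len(w) != inDim*outDim {
		return nil, fmt.Errorf("w: got %d values, want %d", len(w), inDim*outDim)
	}
	y := make([]float32, batch*outDim)
	for b := 0; b < batch; b++ {
		for i := 0; i < inDim; i++ {
			xv := x[b*inDim+i]
			// Accumulate x[b][i] * w[i][:] into output row b.
			for o := 0; o < outDim; o++ {
				y[b*outDim+o] += xv * w[i*outDim+o]
			}
		}
	}
	return y, nil
}

func main() {
	// batch=2, inDim=3, outDim=2
	x := []float32{1, 2, 3, 4, 5, 6}
	w := []float32{1, 0, 0, 1, 1, 1}
	y, err := refMatMul(x, w, 2, 3, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(y) // [4 5 10 11]
}
```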

func New ¶

func New(batch, inDim, outDim int, opts Options) (*Executor, error)

New compiles a dynamic matmul kernel for the provided shape.

func (*Executor) Close ¶

func (e *Executor) Close()

Close releases the compiled kernel.

func (*Executor) CopyOutputToInput ¶

func (e *Executor) CopyOutputToInput(dst *model.Kernel, dstInput, dstChannel int) error

CopyOutputToInput copies the last evaluated output tensor into a destination kernel input without converting through Go-managed float buffers.

func (*Executor) Eval ¶

func (e *Executor) Eval(x, w []float32) ([]float32, error)

Eval computes y = x*w and returns a new output slice.

func (*Executor) EvalCF ¶

func (e *Executor) EvalCF(xCF []float32) (EvalStats, error)

EvalCF evaluates a channel-first input tensor against previously primed weights and leaves the output resident in the ANE output surfaces.

xCF has shape [inDim, batch].
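The channel-first entry points expect xCF as [inDim, batch]. Assuming both layouts are flattened row-major (the documentation states only the shapes), converting a row-major x [batch, inDim] to xCF is a plain transpose. The transpose helper below is a hypothetical sketch, not part of this package; the same function also turns a channel-first output [outDim, batch] back into row-major [batch, outDim].

```go
package main

import "fmt"

// transpose is a hypothetical helper that converts a row-major [rows, cols]
// matrix into row-major [cols, rows]. Applied to x [batch, inDim] it yields
// the channel-first [inDim, batch] layout; applied to a channel-first output
// [outDim, batch] it yields row-major [batch, outDim].
func transpose(src []float32, rows, cols int) []float32 {
	if len(src) != rows*cols {
		panic(fmt.Sprintf("transpose: got %d values, want %d", len(src), rows*cols))
	}
	dst := make([]float32, rows*cols)
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			dst[c*rows+r] = src[r*cols+c]
		}
	}
	return dst
}

func main() {
	// x is [batch=2, inDim=3] row-major.
	x := []float32{1, 2, 3, 4, 5, 6}
	xCF := transpose(x, 2, 3) // [inDim=3, batch=2]
	fmt.Println(xCF)          // [1 4 2 5 3 6]
}
```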

func (*Executor) EvalCFHW ¶

func (e *Executor) EvalCFHW(xCF []float32) (uint64, error)

EvalCFHW evaluates a channel-first input tensor against previously primed weights and returns only aggregate hardware execution time.

func (*Executor) EvalCFIOInto ¶

func (e *Executor) EvalCFIOInto(dstCF, xCF []float32) (EvalStats, error)

EvalCFIOInto evaluates a channel-first input tensor into a channel-first output tensor using previously primed weights.

xCF has shape [inDim, batch] and dstCF has shape [outDim, batch].
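As a hedged reference for the channel-first contract, the sketch below computes the product directly in [outDim, batch] layout from xCF [inDim, batch] and an IO-layout w [inDim, outDim], writing into a caller-provided buffer the way the *Into methods do. The name refMatMulCF and the row-major flattening of each shape are assumptions; the package's own kernel runs on the ANE and may differ numerically.

```go
package main

import "fmt"

// refMatMulCF is a hypothetical CPU reference for the channel-first contract:
// xCF is [inDim, batch], wIO is [inDim, outDim], and dstCF is [outDim, batch],
// all flattened row-major. It overwrites dstCF instead of allocating.
func refMatMulCF(dstCF, xCF, wIO []float32, batch, inDim, outDim int) error {
	if len(xCF) != inDim*batch || len(wIO) != inDim*outDim || len(dstCF) != outDim*batch {
		return fmt.Errorf("refMatMulCF: shape mismatch")
	}
	for k := range dstCF {
		dstCF[k] = 0
	}
	for i := 0; i < inDim; i++ {
		for o := 0; o < outDim; o++ {
			wv := wIO[i*outDim+o]
			// dstCF[o][b] += w[i][o] * xCF[i][b] for every batch element b.
			for b := 0; b < batch; b++ {
				dstCF[o*batch+b] += wv * xCF[i*batch+b]
			}
		}
	}
	return nil
}

func main() {
	// x = [[1 2 3] [4 5 6]] stored channel-first as [inDim=3, batch=2].
	xCF := []float32{1, 4, 2, 5, 3, 6}
	w := []float32{1, 0, 0, 1, 1, 1}
	dst := make([]float32, 2*2)
	if err := refMatMulCF(dst, xCF, w, 2, 3, 2); err != nil {
		panic(err)
	}
	fmt.Println(dst) // [4 10 5 11]
}
```

The result [4 10 5 11] is the row-major product [[4 5] [10 11]] stored as [outDim, batch].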

func (*Executor) EvalCFIOIntoHW ¶

func (e *Executor) EvalCFIOIntoHW(dstCF, xCF []float32) (uint64, error)

EvalCFIOIntoHW evaluates a channel-first input tensor into a channel-first output tensor using previously primed weights and returns only aggregate hardware execution time.

func (*Executor) EvalInto ¶

func (e *Executor) EvalInto(dst, x, w []float32) (EvalStats, error)

EvalInto computes y = x*w into dst.

func (*Executor) EvalOneHotIOInto ¶

func (e *Executor) EvalOneHotIOInto(dst []float32, xs []int) (EvalStats, error)

EvalOneHotIOInto computes y = x*w for one-hot activations encoded by xs and a previously primed IO-layout weight matrix.

xs holds at most batch token ids, each in [0, inDim). xs[t] is the index of the single 1 in the input row for batch element t, so output row t equals row xs[t] of w. Batch positions at or beyond len(xs) are treated as zero input.
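Because a one-hot row times w just selects one row of w, the one-hot path is equivalent to a row gather (an embedding lookup). The CPU sketch below illustrates that equivalence under stated assumptions: refOneHot is a hypothetical name, w is flattened row-major [inDim, outDim], and dst is assumed to be row-major [batch, outDim] (the documentation does not state dst's layout).

```go
package main

import "fmt"

// refOneHot is a hypothetical CPU reference for the one-hot contract.
// wIO is [inDim, outDim] row-major; dst is assumed row-major [batch, outDim].
// Batch element t has a single 1 at input index xs[t], so output row t is a
// copy of row xs[t] of wIO; rows t >= len(xs) see zero input and stay zero.
func refOneHot(dst, wIO []float32, xs []int, batch, inDim, outDim int) error {
	if len(xs) > batch {
		return fmt.Errorf("refOneHot: %d ids exceed batch %d", len(xs), batch)
	}
	if len(dst) != batch*outDim || len(wIO) != inDim*outDim {
		return fmt.Errorf("refOneHot: shape mismatch")
	}
	// Validate all ids before writing so a bad id leaves dst unchanged.
	for _, id := range xs {
		if id < 0 || id >= inDim {
			return fmt.Errorf("refOneHot: id %d out of range [0, %d)", id, inDim)
		}
	}
	for k := range dst {
		dst[k] = 0
	}
	for t, id := range xs {
		copy(dst[t*outDim:(t+1)*outDim], wIO[id*outDim:(id+1)*outDim])
	}
	return nil
}

func main() {
	wIO := []float32{1, 0, 0, 1, 1, 1} // inDim=3, outDim=2
	dst := make([]float32, 2*2)         // batch=2
	if err := refOneHot(dst, wIO, []int{2}, 2, 3, 2); err != nil {
		panic(err)
	}
	fmt.Println(dst) // [1 1 0 0]
}
```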

func (*Executor) EvalOneHotIOIntoHW ¶

func (e *Executor) EvalOneHotIOIntoHW(dst []float32, xs []int) (uint64, error)

EvalOneHotIOIntoHW computes y = x*w for one-hot activations encoded by xs and a previously primed IO-layout weight matrix, returning only aggregate hardware execution time.

func (*Executor) EvalWithStats ¶

func (e *Executor) EvalWithStats(x, w []float32) ([]float32, EvalStats, error)

EvalWithStats computes y = x*w and returns a new output slice plus timing.

func (*Executor) PrimeWeightsIO ¶

func (e *Executor) PrimeWeightsIO(wIO []float32) error

PrimeWeightsIO copies the full IO-layout weight matrix into the cached ANE input buffers. wIO must be laid out as [inDim, outDim].

func (*Executor) ReadOutputCF ¶

func (e *Executor) ReadOutputCF(dstCF []float32) error

ReadOutputCF reads the last evaluated output tensor in channel-first order.

dstCF has shape [outDim, batch].

func (*Executor) UpdateWeightsIORows ¶

func (e *Executor) UpdateWeightsIORows(wIO []float32, rows []int) error

UpdateWeightsIORows patches a subset of rows in the cached IO-layout weight buffers. rows contains input-channel row ids in [0, inDim).
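The idea behind row patching is that only the touched input-channel rows of the cached [inDim, outDim] buffer need rewriting. The CPU sketch below models that with an ordinary slice standing in for the cached ANE buffers. It assumes wIO carries the full [inDim, outDim] matrix (the documentation does not say whether wIO is the full matrix or only the patched rows), and the name patchRows is hypothetical.

```go
package main

import "fmt"

// patchRows is a hypothetical CPU analogue of patching selected rows of a
// cached IO-layout weight buffer. It assumes wIO holds the full
// [inDim, outDim] matrix (row-major) and copies only the listed rows into
// cache, leaving all other rows untouched.
func patchRows(cache, wIO []float32, rows []int, inDim, outDim int) error {
	n := inDim * outDim
	if len(cache) != n || len(wIO) != n {
		return fmt.Errorf("patchRows: cache and wIO must hold %d values", n)
	}
	// Validate every row id before writing so a bad id leaves cache unchanged.
	for _, r := range rows {
		if r < 0 || r >= inDim {
			return fmt.Errorf("patchRows: row %d out of range [0, %d)", r, inDim)
		}
	}
	for _, r := range rows {
		copy(cache[r*outDim:(r+1)*outDim], wIO[r*outDim:(r+1)*outDim])
	}
	return nil
}

func main() {
	cache := make([]float32, 3*2) // previously primed weights (all zero here)
	wIO := []float32{1, 2, 3, 4, 5, 6}
	if err := patchRows(cache, wIO, []int{0, 2}, 3, 2); err != nil {
		panic(err)
	}
	fmt.Println(cache) // [1 2 0 0 5 6]
}
```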

type Options ¶

type Options struct {
	QoS uint32

	// TileOut forces output-channel tiling when > 0.
	//
	// Each tile compiles a separate kernel with output width <= TileOut.
	// When TileOut == 0, New first tries a single full-width kernel and
	// falls back to tiling only if full-width compilation fails.
	TileOut int
}

Options configures executor creation.
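The TileOut contract (each tile has output width <= TileOut) amounts to splitting the outDim output channels into contiguous column ranges. The sketch below shows one such split; tileRanges and colRange are hypothetical names, and the actual tile boundaries chosen by New are not documented here. It also does not model New's fallback from a failed full-width compile.

```go
package main

import "fmt"

// colRange is a half-open range [Start, End) of output channels.
type colRange struct{ Start, End int }

// tileRanges is a hypothetical helper that splits outDim output channels into
// contiguous tiles of width at most tileOut. tileOut <= 0 is treated here as
// "one full-width tile"; the real New additionally falls back to tiling when
// full-width compilation fails, which this sketch does not model.
func tileRanges(outDim, tileOut int) []colRange {
	if tileOut <= 0 || tileOut >= outDim {
		return []colRange{{0, outDim}}
	}
	var tiles []colRange
	for s := 0; s < outDim; s += tileOut {
		e := s + tileOut
		if e > outDim {
			e = outDim
		}
		tiles = append(tiles, colRange{s, e})
	}
	return tiles
}

func main() {
	fmt.Println(tileRanges(10, 4)) // [{0 4} {4 8} {8 10}]
	fmt.Println(tileRanges(10, 0)) // [{0 10}]
}
```

Splitting by output channel works because each output column depends only on the matching column of w: a tile covering [Start, End) computes y[:, Start:End] = x * w[:, Start:End], and concatenating the tile outputs along the output dimension reproduces the full y.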
