attention

package
v0.1.0
Published: Aug 4, 2025 License: Apache-2.0 Imports: 10 Imported by: 1

Documentation

Overview

Package attention provides attention mechanisms for neural networks.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GroupedQueryAttention

type GroupedQueryAttention[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

GroupedQueryAttention implements the Grouped Query Attention mechanism.

func NewGroupedQueryAttention

func NewGroupedQueryAttention[T tensor.Numeric](engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim, numQueryHeads, numKeyValueHeads int) (*GroupedQueryAttention[T], error)

NewGroupedQueryAttention creates a new GroupedQueryAttention layer.

modelDim: The dimension of the input and output of the block (d_model).
numQueryHeads: The number of query heads.
numKeyValueHeads: The number of key/value heads.
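
Example (a sketch, not from the package documentation): constructing a layer with a model dimension of 512, 8 query heads, and 2 key/value heads. The engine and ops values are assumed to come from this module's compute and numeric packages, and the head counts are illustrative; grouped query attention conventionally requires numQueryHeads to be a multiple of numKeyValueHeads.

	// engine (compute.Engine[float32]) and ops (numeric.Arithmetic[float32]) are
	// assumed to have been created elsewhere from this module's compute and
	// numeric packages.
	gqa, err := attention.NewGroupedQueryAttention[float32](engine, ops, 512, 8, 2)
	if err != nil {
		panic(err) // sketch: real code would handle the configuration error
	}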

func (*GroupedQueryAttention[T]) Backward

func (gqa *GroupedQueryAttention[T]) Backward(ctx context.Context, dOut *tensor.Tensor[T], inputs ...*tensor.Tensor[T]) ([]*tensor.Tensor[T], error)

Backward computes the gradients for GroupedQueryAttention.
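
Continuing the sketch above, a backward call might look like the following; it assumes dOut has the same shape as the layer's output and that the original forward input is passed as the trailing argument (the exact expectations are not spelled out on this page).

	// dOut: gradient of the loss with respect to the layer's output.
	// input: the tensor originally passed to Forward (an assumption here).
	grads, err := gqa.Backward(context.Background(), dOut, input)
	if err != nil {
		panic(err)
	}
	_ = grads // gradients with respect to the inputs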

func (*GroupedQueryAttention[T]) Forward

func (gqa *GroupedQueryAttention[T]) Forward(ctx context.Context, inputs ...*tensor.Tensor[T]) (*tensor.Tensor[T], error)

Forward computes the grouped query attention.
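
Forward-pass sketch, assuming a single self-attention input of shape (batch_size, seq_len, modelDim); the number and shapes of inputs Forward accepts are assumptions, since they are not documented on this page.

	// input is assumed to be a *tensor.Tensor[float32] of shape
	// (batch_size, seq_len, modelDim), e.g. (32, 128, 512).
	out, err := gqa.Forward(context.Background(), input)
	if err != nil {
		panic(err)
	}
	_ = out // attention output; see OutputShape for its dimensions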

func (*GroupedQueryAttention[T]) OutputShape

func (gqa *GroupedQueryAttention[T]) OutputShape(inputShapes ...[]int) ([]int, error)

OutputShape returns the output shape of the GroupedQueryAttention.
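
Sketch, assuming the input shape is given as (batch_size, seq_len, modelDim):

	shape, err := gqa.OutputShape([]int{32, 128, 512})
	if err != nil {
		panic(err)
	}
	fmt.Println(shape) // for a standard attention block this should match the input shape (assumption)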

func (*GroupedQueryAttention[T]) Parameters

func (gqa *GroupedQueryAttention[T]) Parameters() []graph.Parameter[T]

Parameters returns the parameters of the GroupedQueryAttention layer.
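
Sketch of inspecting the layer's trainable parameters; the fields of graph.Parameter are not documented on this page, so only the slice itself is used.

	params := gqa.Parameters()
	fmt.Printf("GroupedQueryAttention exposes %d parameters\n", len(params))
	// Typically these would be registered with an optimizer during training.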

type ScaledDotProductAttention

type ScaledDotProductAttention[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

ScaledDotProductAttention implements the scaled dot-product attention mechanism.
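
For reference, scaled dot-product attention in standard notation (with d_k corresponding to the layer's headDim) is:

	\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V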

func NewScaledDotProductAttention

func NewScaledDotProductAttention[T tensor.Numeric](engine compute.Engine[T], headDim int) *ScaledDotProductAttention[T]

NewScaledDotProductAttention creates a new ScaledDotProductAttention layer.
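
Construction sketch; the head dimension of 64 is illustrative, and engine is assumed to come from this module's compute package.

	sdpa := attention.NewScaledDotProductAttention[float32](engine, 64)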

func (*ScaledDotProductAttention[T]) Backward

func (sdpa *ScaledDotProductAttention[T]) Backward(_ context.Context, _ *tensor.Tensor[T], _, _, _ *tensor.Tensor[T]) ([]*tensor.Tensor[T], error)

Backward computes the gradients for ScaledDotProductAttention. dOut is the gradient from the subsequent layer.

func (*ScaledDotProductAttention[T]) Forward

func (sdpa *ScaledDotProductAttention[T]) Forward(ctx context.Context, q, k, v *tensor.Tensor[T]) (*tensor.Tensor[T], error)

Forward computes the scaled dot-product attention. Q, K, V are expected to be 3D tensors (batch_size, seq_len, head_dim).
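
Sketch of a forward pass with q, k, v as 3D tensors of shape (batch_size, seq_len, head_dim), as documented above; how the tensors are created is an assumption, since the tensor package's constructors are not shown here.

	// q, k, v are assumed to be *tensor.Tensor[float32] values of shape
	// (batch_size, seq_len, head_dim), e.g. (32, 128, 64).
	out, err := sdpa.Forward(context.Background(), q, k, v)
	if err != nil {
		panic(err)
	}
	_ = out // shape (batch_size, seq_len, head_dim), as is standard for scaled dot-product attention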
