Documentation ¶
Overview ¶
Package attention provides attention mechanisms for neural networks.
Index ¶
- type GroupedQueryAttention
- func NewGroupedQueryAttention[T tensor.Numeric](engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim, numQueryHeads, numKeyValueHeads int) (*GroupedQueryAttention[T], error)
- func (gqa *GroupedQueryAttention[T]) Backward(ctx context.Context, dOut *tensor.Tensor[T], inputs ...*tensor.Tensor[T]) ([]*tensor.Tensor[T], error)
- func (gqa *GroupedQueryAttention[T]) Forward(ctx context.Context, inputs ...*tensor.Tensor[T]) (*tensor.Tensor[T], error)
- func (gqa *GroupedQueryAttention[T]) OutputShape(inputShapes ...[]int) ([]int, error)
- func (gqa *GroupedQueryAttention[T]) Parameters() []graph.Parameter[T]
- type ScaledDotProductAttention
- func NewScaledDotProductAttention[T tensor.Numeric](engine compute.Engine[T], headDim int) *ScaledDotProductAttention[T]
- func (sdpa *ScaledDotProductAttention[T]) Backward(_ context.Context, _ *tensor.Tensor[T], _, _, _ *tensor.Tensor[T]) ([]*tensor.Tensor[T], error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type GroupedQueryAttention ¶
type GroupedQueryAttention[T tensor.Numeric] struct { // contains filtered or unexported fields }
GroupedQueryAttention implements the Grouped Query Attention (GQA) mechanism, in which several query heads share each key/value head. This reduces the number of key/value projections (and the size of the key/value cache at inference time) compared with standard multi-head attention, while keeping the full number of query heads.
func NewGroupedQueryAttention ¶
func NewGroupedQueryAttention[T tensor.Numeric](engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim, numQueryHeads, numKeyValueHeads int) (*GroupedQueryAttention[T], error)
NewGroupedQueryAttention creates a new GroupedQueryAttention layer.

modelDim: The dimension of the input and output of the block (d_model).
numQueryHeads: The number of query heads.
numKeyValueHeads: The number of key/value heads.
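A minimal construction sketch follows. The import paths and the engine/ops constructors are assumptions (the module path is not shown on this page); substitute the real ones for this repository.

package main

import (
	"fmt"
	"log"

	// Assumed import paths; replace with this package's real module path.
	"example.com/project/compute"
	"example.com/project/layers/attention"
	"example.com/project/numeric"
)

func main() {
	// Hypothetical constructors for the compute engine and arithmetic ops;
	// the real package may expose different names.
	engine := compute.NewCPUEngine[float32]()
	ops := numeric.NewFloat32Ops()

	// modelDim=512, with 8 query heads sharing 2 key/value heads, i.e.
	// each KV head serves 4 query heads. numQueryHeads presumably must be
	// divisible by numKeyValueHeads, hence the error return.
	gqa, err := attention.NewGroupedQueryAttention[float32](engine, ops, 512, 8, 2)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("trainable parameter tensors: %d\n", len(gqa.Parameters()))
}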
func (*GroupedQueryAttention[T]) Backward ¶
func (gqa *GroupedQueryAttention[T]) Backward(ctx context.Context, dOut *tensor.Tensor[T], inputs ...*tensor.Tensor[T]) ([]*tensor.Tensor[T], error)
Backward computes the gradients for GroupedQueryAttention.
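A hedged call sketch (dOut and x are hypothetical tensors; the assumption here is that inputs mirrors whatever was passed to Forward):

// dOut: gradient of the loss with respect to Forward's output.
// x: the same input tensor(s) that were passed to Forward.
grads, err := gqa.Backward(ctx, dOut, x)
if err != nil {
	log.Fatal(err)
}
// grads is expected to hold one gradient tensor per input.
_ = grads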
func (*GroupedQueryAttention[T]) Forward ¶
func (gqa *GroupedQueryAttention[T]) Forward(ctx context.Context, inputs ...*tensor.Tensor[T]) (*tensor.Tensor[T], error)
Forward computes the grouped query attention.
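Continuing the construction sketch above, a forward pass might look like this. It assumes a single self-attention input of shape [batch, seqLen, modelDim] and an assumed tensor.New constructor; the variadic signature may also accept separate query/key/value tensors.

// x: hypothetical input of shape [batch=2, seqLen=16, modelDim=512].
// tensor.New is an assumed constructor name for this tensor package.
x, err := tensor.New[float32]([]int{2, 16, 512}, nil)
if err != nil {
	log.Fatal(err)
}
out, err := gqa.Forward(context.Background(), x)
if err != nil {
	log.Fatal(err)
}
// GQA preserves the model dimension, so out should also be [2 16 512].
_ = out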
func (*GroupedQueryAttention[T]) OutputShape ¶
func (gqa *GroupedQueryAttention[T]) OutputShape(inputShapes ...[]int) ([]int, error)
OutputShape returns the output shape of the GroupedQueryAttention.
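Since the layer maps d_model in to d_model out, a shape query along these lines should return the input shape unchanged (the shape here is illustrative):

shape, err := gqa.OutputShape([]int{2, 16, 512})
if err != nil {
	log.Fatal(err)
}
// shape is expected to equal the input shape: [2 16 512].
_ = shape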
func (*GroupedQueryAttention[T]) Parameters ¶
func (gqa *GroupedQueryAttention[T]) Parameters() []graph.Parameter[T]
Parameters returns the parameters of the GroupedQueryAttention layer.
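A typical use is handing the layer's trainable tensors to an optimizer. A minimal sketch (the fields of graph.Parameter are not documented on this page, so only the count is shown):

params := gqa.Parameters()
// A GQA block typically holds query/key/value/output projections, so a
// handful of parameter tensors is expected here.
fmt.Printf("parameters: %d\n", len(params))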
type ScaledDotProductAttention ¶
type ScaledDotProductAttention[T tensor.Numeric] struct { // contains filtered or unexported fields }
ScaledDotProductAttention implements the scaled dot-product attention mechanism: Attention(Q, K, V) = softmax(Q·Kᵀ/√headDim)·V.
func NewScaledDotProductAttention ¶
func NewScaledDotProductAttention[T tensor.Numeric](engine compute.Engine[T], headDim int) *ScaledDotProductAttention[T]
NewScaledDotProductAttention creates a new ScaledDotProductAttention layer.
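For reference, the computation this layer performs can be sketched in plain Go for a single head. This illustrates the underlying math on 2-D slices; it is not this package's implementation, which operates on *tensor.Tensor[T] through the compute engine.

package main

import (
	"fmt"
	"math"
)

// scaledDotProductAttention computes softmax(Q·Kᵀ/√d)·V for a single head.
// Q is [n×d], K and V are [m×d]; the result is [n×d].
func scaledDotProductAttention(Q, K, V [][]float64, d int) [][]float64 {
	n, m := len(Q), len(K)
	scale := 1.0 / math.Sqrt(float64(d))
	out := make([][]float64, n)
	for i := 0; i < n; i++ {
		// Scaled dot-product scores of query i against every key.
		scores := make([]float64, m)
		maxScore := math.Inf(-1)
		for j := 0; j < m; j++ {
			for k := 0; k < d; k++ {
				scores[j] += Q[i][k] * K[j][k]
			}
			scores[j] *= scale
			if scores[j] > maxScore {
				maxScore = scores[j]
			}
		}
		// Numerically stable softmax over the scores.
		sum := 0.0
		for j := range scores {
			scores[j] = math.Exp(scores[j] - maxScore)
			sum += scores[j]
		}
		// Output row i is the softmax-weighted sum of the value rows.
		out[i] = make([]float64, d)
		for j := 0; j < m; j++ {
			w := scores[j] / sum
			for k := 0; k < d; k++ {
				out[i][k] += w * V[j][k]
			}
		}
	}
	return out
}

func main() {
	Q := [][]float64{{1, 0}, {0, 1}}
	K := [][]float64{{1, 0}, {0, 1}}
	V := [][]float64{{1, 2}, {3, 4}}
	fmt.Println(scaledDotProductAttention(Q, K, V, 2))
}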
func (*ScaledDotProductAttention[T]) Backward ¶
func (sdpa *ScaledDotProductAttention[T]) Backward(_ context.Context, _ *tensor.Tensor[T], _, _, _ *tensor.Tensor[T]) ([]*tensor.Tensor[T], error)
Backward computes the gradients of ScaledDotProductAttention with respect to its query, key, and value inputs. The second argument (dOut) is the gradient flowing back from the subsequent layer.
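The gradients follow from the chain rule through out = A·V with A = softmax(Q·Kᵀ/√d). Extending the plain-Go sketch above (again an illustration of the math, not this package's code):

// matmul multiplies a [p×q] by b [q×r].
func matmul(a, b [][]float64) [][]float64 {
	p, q, r := len(a), len(b), len(b[0])
	out := make([][]float64, p)
	for i := 0; i < p; i++ {
		out[i] = make([]float64, r)
		for k := 0; k < q; k++ {
			for j := 0; j < r; j++ {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out
}

// transpose returns aᵀ.
func transpose(a [][]float64) [][]float64 {
	out := make([][]float64, len(a[0]))
	for j := range out {
		out[j] = make([]float64, len(a))
		for i := range a {
			out[j][i] = a[i][j]
		}
	}
	return out
}

// attentionBackward returns dQ, dK, dV for out = A·V, A = softmax(Q·Kᵀ/√d),
// given the attention weights A saved from the forward pass and dOut = ∂L/∂out.
func attentionBackward(Q, K, V, A, dOut [][]float64, d int) (dQ, dK, dV [][]float64) {
	scale := 1.0 / math.Sqrt(float64(d))
	dV = matmul(transpose(A), dOut)  // out = A·V ⇒ dV = Aᵀ·dOut
	dA := matmul(dOut, transpose(V)) //           ⇒ dA = dOut·Vᵀ
	// Softmax backward, row by row: dS_ij = A_ij·(dA_ij − Σ_k A_ik·dA_ik).
	// The 1/√d factor from S = Q·Kᵀ/√d is folded in here.
	dS := make([][]float64, len(A))
	for i := range A {
		dot := 0.0
		for j := range A[i] {
			dot += A[i][j] * dA[i][j]
		}
		dS[i] = make([]float64, len(A[i]))
		for j := range A[i] {
			dS[i][j] = A[i][j] * (dA[i][j] - dot) * scale
		}
	}
	dQ = matmul(dS, K)            // dQ = dS·K
	dK = matmul(transpose(dS), Q) // dK = dSᵀ·Q
	return dQ, dK, dV
}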