bert

package
v0.3.0-alpha.3
Published: May 10, 2026 | License: Apache-2.0 | Imports: 24 | Imported by: 0

Documentation

Constants

const DefaultMaxLen = 512

DefaultMaxLen is the default maximum sequence length for BERT models.

const DefaultModel = "bge-small-en-v1.5"

DefaultModel is the default BERT model shipped with Gramaton.

const DefaultModelRepo = "BAAI/bge-small-en-v1.5"

DefaultModelRepo is the HuggingFace repository for the default model.

Variables

This section is empty.

Functions

func AddBias

func AddBias(out, bias []float32, rows, cols int)

AddBias adds a [cols] bias vector to each row of out [rows, cols] in-place.

func EnsureModel

func EnsureModel(ctx context.Context, repo, model string, onProgress func(string)) error

EnsureModel checks whether the model files exist locally and, if not, downloads them from HuggingFace Hub. Before returning, it always verifies every file's SHA256 against its sidecar, which catches on-disk corruption between runs.

Integrity model:

  • Downloads verify Content-Length and write a SHA256 sidecar on success.
  • Subsequent loads recompute the hash and compare against the sidecar.
  • Mismatch quarantines the bad file (renamed to .suspect.<unix-ts>) and returns an error; restarting will re-download cleanly while preserving the suspect bytes for forensic analysis.
  • File present without sidecar (e.g., manually placed) bootstraps the sidecar with a warning log.

This is trust-on-first-use: the first download is whatever HF serves. Subsequent corruption, truncation, or tampering is caught.

func GELU

func GELU(x []float32)

GELU applies the Gaussian Error Linear Unit activation in-place. Uses the tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
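As a reference for the formula above, here is a minimal scalar sketch of the tanh approximation. The name `geluTanh` is hypothetical, and the package's own implementation may be structured differently (for example, vectorized).

```go
package main

import "math"

// geluTanh applies the tanh approximation of GELU to each element in place:
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
func geluTanh(x []float32) {
	const c = 0.7978845608028654 // sqrt(2/pi)
	for i, v := range x {
		f := float64(v)
		x[i] = float32(0.5 * f * (1 + math.Tanh(c*(f+0.044715*f*f*f))))
	}
}
```

Because tanh is odd, this approximation satisfies GELU(x) - GELU(-x) = x, which makes a convenient sanity check.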

func L2Normalize

func L2Normalize(x []float32)

L2Normalize normalizes a vector in-place to unit length.
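A minimal sketch of the operation, under the assumption that a zero vector is left untouched (the documentation does not say how the package handles that case). `l2NormalizeRef` is a hypothetical name.

```go
package main

import "math"

// l2NormalizeRef scales x in place so that its Euclidean norm is 1.
// A zero vector is left unchanged to avoid dividing by zero; that choice
// is an assumption of this sketch.
func l2NormalizeRef(x []float32) {
	var sum float64
	for _, v := range x {
		sum += float64(v) * float64(v)
	}
	if sum == 0 {
		return
	}
	inv := float32(1 / math.Sqrt(sum))
	for i := range x {
		x[i] *= inv
	}
}
```

With unit-length vectors, cosine similarity between two embeddings reduces to a plain dot product.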

func LayerNorm

func LayerNorm(x, weight, bias []float32, n, dim int, eps float64)

LayerNorm applies layer normalization in-place over the last dimension. x is [n, dim]. weight and bias are [dim]. eps is typically 1e-12 for BERT.
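The computation can be sketched per row as below. This is a reference illustration with a hypothetical name (`layerNormRef`); it assumes row-major layout and the biased (population) variance that BERT-style layer norm conventionally uses.

```go
package main

import "math"

// layerNormRef normalizes each row of x ([n, dim], row-major) to zero mean
// and unit variance, then applies the per-feature scale and shift.
func layerNormRef(x, weight, bias []float32, n, dim int, eps float64) {
	for r := 0; r < n; r++ {
		row := x[r*dim : (r+1)*dim]
		var mean float64
		for _, v := range row {
			mean += float64(v)
		}
		mean /= float64(dim)
		var variance float64
		for _, v := range row {
			d := float64(v) - mean
			variance += d * d
		}
		variance /= float64(dim) // biased (population) variance
		inv := 1 / math.Sqrt(variance+eps)
		for i, v := range row {
			row[i] = float32((float64(v)-mean)*inv)*weight[i] + bias[i]
		}
	}
}
```

The eps term keeps the division finite when a row is constant; in that case every normalized value is 0 and the output equals bias.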

func MatMul

func MatMul(a, bT []float32, M, K, N int, out []float32)

MatMul computes C = A * B^T where A is [M, K] and bT is [N, K] (transposed). Output is written to out [M, N] which must be pre-allocated and zeroed.

On amd64 with AVX2 + FMA3, dispatches the 4x4 tile body to an assembly kernel for ~4-6x speedup over pure Go on BERT-sized matrices. Falls back to the pure-Go tiled implementation for matrices too small to benefit, for remainder rows/columns, and for pre-Haswell/Rosetta hosts without AVX2.
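For checking results against the optimized paths, a naive reference version is handy. The sketch below uses a hypothetical name (`matMulRef`) and assumes row-major layout. It accumulates into out with `+=`, which is one reading of why the contract above requires out to be zeroed; the package's kernels may differ in detail.

```go
package main

// matMulRef accumulates A * B^T into out. a is [M, K], bT is [N, K], and
// out is [M, N], all row-major. Because it accumulates with +=, out must
// be zeroed by the caller.
func matMulRef(a, bT []float32, M, K, N int, out []float32) {
	for i := 0; i < M; i++ {
		ai := a[i*K : (i+1)*K]
		for j := 0; j < N; j++ {
			bj := bT[j*K : (j+1)*K]
			var sum float32
			for k := range ai {
				sum += ai[k] * bj[k]
			}
			out[i*N+j] += sum
		}
	}
}
```

Storing B transposed means both inner-loop operands are contiguous rows, which is what makes the dot product friendly to caches and SIMD loads.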

func MatMulAdd

func MatMulAdd(a, bT []float32, M, K, N int, bias, out []float32)

MatMulAdd computes out = A * B^T + bias, where bias is [N] and broadcast across rows. out must be pre-zeroed before the matmul.

func ModelDir

func ModelDir(model string) string

ModelDir returns the cache directory for a model. Default: ~/.gramaton/models/<model>/

func Softmax

func Softmax(x []float32, rows, cols int)

Softmax applies row-wise softmax in-place. x is [rows, cols]. Numerically stable: subtracts row max before exponentiation.

func SoftmaxMasked

func SoftmaxMasked(x []float32, rows, cols int, mask []int32)

SoftmaxMasked applies row-wise softmax in-place with an attention mask. x is [rows, cols]. mask is [cols] where 0 means masked (set to -inf before softmax).
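A minimal sketch of the masked, numerically stable softmax described above. `softmaxMaskedRef` is a hypothetical name, and the all-masked fallback (emit zeros instead of NaN) is an assumption, since the documentation does not cover that case.

```go
package main

import "math"

// softmaxMaskedRef applies a numerically stable softmax to each row of x
// ([rows, cols], row-major). Columns where mask[c] == 0 are treated as -inf
// and therefore receive probability 0.
func softmaxMaskedRef(x []float32, rows, cols int, mask []int32) {
	negInf := float32(math.Inf(-1))
	for r := 0; r < rows; r++ {
		row := x[r*cols : (r+1)*cols]
		maxVal := negInf
		for c := range row {
			if mask[c] == 0 {
				row[c] = negInf
			} else if row[c] > maxVal {
				maxVal = row[c]
			}
		}
		if maxVal == negInf {
			// Every column is masked; emit zeros rather than NaN.
			// This fallback is an assumption of the sketch.
			for c := range row {
				row[c] = 0
			}
			continue
		}
		var sum float64
		for c, v := range row {
			e := math.Exp(float64(v - maxVal)) // exp(-inf) is 0 for masked columns
			row[c] = float32(e)
			sum += e
		}
		for c := range row {
			row[c] = float32(float64(row[c]) / sum)
		}
	}
}
```

Subtracting the row maximum keeps every exponent at or below 0, so inputs such as 1000 and 1001 do not overflow.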

func ZeroSlice

func ZeroSlice(x []float32)

ZeroSlice sets all elements to zero.

Types

type AttentionWeights

type AttentionWeights struct {
	Q LinearWeights // [HiddenSize, HiddenSize]
	K LinearWeights
	V LinearWeights
	O LinearWeights // output projection
}

AttentionWeights holds Q, K, V projection weights and the output projection for multi-head self-attention.

type EmbeddingWeights

type EmbeddingWeights struct {
	Word      []float32 // [VocabSize, HiddenSize]
	Position  []float32 // [MaxPositionEmbeds, HiddenSize]
	TokenType []float32 // [2, HiddenSize]
	LNWeight  []float32 // [HiddenSize]
	LNBias    []float32 // [HiddenSize]
}

EmbeddingWeights holds the token, position, and segment embedding tables plus the post-embedding layer norm.

type EncoderLayer

type EncoderLayer struct {
	Attn    AttentionWeights
	AttnLN  LayerNormWeights
	FFNUp   LinearWeights // HiddenSize -> IntermediateSize
	FFNDown LinearWeights // IntermediateSize -> HiddenSize
	FFNLN   LayerNormWeights
}

EncoderLayer holds weights for one transformer encoder layer.

type LayerNormWeights

type LayerNormWeights struct {
	Weight []float32 // [HiddenSize]
	Bias   []float32 // [HiddenSize]
}

LayerNormWeights holds weight and bias for layer normalization.

type LinearWeights

type LinearWeights struct {
	Weight []float32 // [Out, In] -- already transposed for MatMul
	Bias   []float32 // [Out]
}

LinearWeights holds weight and bias for a linear layer. Weight is stored as [OutFeatures, InFeatures] (transposed for MatMul).

type Model

type Model struct {
	Config    ModelConfig
	Embedding EmbeddingWeights
	Layers    []EncoderLayer
}

Model holds the weights and configuration for BERT inference. Weights are backed by mmap'd safetensors data (zero-copy).

Read-only after LoadModel. Concurrent Forward calls are safe when each caller supplies its own Scratch.

func LoadModel

func LoadModel(st *SafeTensors, cfg ModelConfig) (*Model, error)

LoadModel loads a BERT model from a safetensors file using the given config.

func (*Model) Forward

func (m *Model) Forward(s *Scratch, tokenIDs, attentionMask []int32) []float32

Forward runs the BERT encoder and returns the CLS embedding (L2-normalized). tokenIDs and attentionMask must be the same length (<= MaxPositionEmbeds).

The caller supplies a Scratch sized for the model's MaxPositionEmbeds. Each goroutine calling Forward concurrently must use its own Scratch.

type ModelConfig

type ModelConfig struct {
	HiddenSize        int     `json:"hidden_size"`
	NumAttentionHeads int     `json:"num_attention_heads"`
	IntermediateSize  int     `json:"intermediate_size"`
	NumHiddenLayers   int     `json:"num_hidden_layers"`
	MaxPositionEmbeds int     `json:"max_position_embeddings"`
	VocabSize         int     `json:"vocab_size"`
	LayerNormEps      float64 `json:"layer_norm_eps"`
}

ModelConfig holds the BERT model hyperparameters, typically loaded from config.json in the model directory.

func ParseModelConfig

func ParseModelConfig(data []byte) (ModelConfig, error)

ParseModelConfig reads a HuggingFace config.json file.
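To illustrate the mapping from config.json to the documented fields, the sketch below decodes a sample with encoding/json. `modelConfig` and `parseConfig` are hypothetical stand-ins that mirror the documented struct tags; the divisibility check is an assumption about useful validation, not documented behavior, and the sample values are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelConfig mirrors the documented ModelConfig fields and JSON tags.
type modelConfig struct {
	HiddenSize        int     `json:"hidden_size"`
	NumAttentionHeads int     `json:"num_attention_heads"`
	IntermediateSize  int     `json:"intermediate_size"`
	NumHiddenLayers   int     `json:"num_hidden_layers"`
	MaxPositionEmbeds int     `json:"max_position_embeddings"`
	VocabSize         int     `json:"vocab_size"`
	LayerNormEps      float64 `json:"layer_norm_eps"`
}

// sampleConfig is an illustrative config.json; keys without a matching
// struct field (such as model_type) are ignored by encoding/json.
const sampleConfig = `{
  "model_type": "bert",
  "hidden_size": 384,
  "num_attention_heads": 12,
  "intermediate_size": 1536,
  "num_hidden_layers": 12,
  "max_position_embeddings": 512,
  "vocab_size": 30522,
  "layer_norm_eps": 1e-12
}`

// parseConfig decodes the JSON and rejects head counts that do not divide
// the hidden size (an assumed sanity check).
func parseConfig(data []byte) (modelConfig, error) {
	var cfg modelConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return modelConfig{}, fmt.Errorf("parse config.json: %w", err)
	}
	if cfg.NumAttentionHeads <= 0 || cfg.HiddenSize%cfg.NumAttentionHeads != 0 {
		return modelConfig{}, fmt.Errorf("hidden_size %d is not divisible by %d attention heads",
			cfg.HiddenSize, cfg.NumAttentionHeads)
	}
	return cfg, nil
}
```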

type Provider

type Provider struct {
	// contains filtered or unexported fields
}

Provider implements embed.Provider using a pure Go BERT inference engine. Default model is bge-small-en-v1.5 (384-dim, 12-layer BERT encoder).

Thread-safety (RWMutex pattern):

  • Embed takes RLock for the duration of each per-text Encode + Forward. Multiple goroutines can hold RLocks concurrently; the model is read-only after LoadModel and each Embed iteration uses its own Scratch from the pool, so concurrent Forward is safe.
  • Close takes the full Lock and blocks until every in-flight RLock holder releases. After Close returns, model, tokenizer, scratchPool, and st are all nil; Embed checks under RLock and returns "bert: provider closed" cleanly without a segfault.

Critical: the RLock must wrap BOTH the nil-check AND Encode + Forward. Releasing RLock between them would let Close Munmap the safetensors region while Forward is mid-read of float32 slices that point into mmap'd memory.

scratchPool holds Scratch instances reused across Forward calls. Each Embed iteration acquires a Scratch from the pool, runs Forward, returns the Scratch to the pool. Concurrent Embed goroutines each get their own Scratch instance.

Memory bound: each Scratch is ~14MB at maxSeq=512, hidden=384, intermediate=1536, heads=12. The pool grows under contention and shrinks during idle (sync.Pool semantics; entries are GC-eligible when not referenced). Under inner-loop fanout, peak live Scratches per Provider is bounded by maxWorkers (default min(GOMAXPROCS, 8), so at most 8 x ~14MB = ~112MB), plus one per concurrent caller goroutine holding RLock.
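The locking discipline described above can be sketched with a small stand-in type. Everything here is hypothetical: `guarded` plays the role of Provider, `data` stands in for the mmap-backed weights, and `pool` for scratchPool. The key point is that the read lock covers both the nil check and the read.

```go
package main

import (
	"errors"
	"sync"
)

var errClosed = errors.New("bert: provider closed")

// guarded is a hypothetical stand-in for Provider.
type guarded struct {
	mu   sync.RWMutex
	data []float32  // stands in for weights backed by mmap'd memory
	pool *sync.Pool // stands in for scratchPool
}

func newGuarded(data []float32) *guarded {
	n := len(data)
	return &guarded{
		data: data,
		pool: &sync.Pool{New: func() any { return make([]float32, n) }},
	}
}

// sum holds the read lock across BOTH the nil check and the read of data,
// so shutdown cannot release the backing memory in between.
func (g *guarded) sum() (float32, error) {
	g.mu.RLock()
	defer g.mu.RUnlock()
	if g.data == nil {
		return 0, errClosed
	}
	buf := g.pool.Get().([]float32) // per-call scratch, never shared
	defer g.pool.Put(buf)
	copy(buf, g.data)
	var s float32
	for _, v := range buf {
		s += v
	}
	return s, nil
}

// shutdown waits for all readers to release, then drops every reference.
func (g *guarded) shutdown() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.data, g.pool = nil, nil
	return nil
}
```

If `sum` released the read lock after the nil check and reacquired it before reading, `shutdown` could run in the gap. With real mmap'd memory that gap is where the segfault would come from.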

func New

func New(cfg config.EmbeddingConfig) (*Provider, error)

New creates a BERT embedding provider. Downloads the model from HuggingFace on first use if not cached locally.

func (*Provider) Close

func (p *Provider) Close() error

Close releases the mmap'd safetensors file. Takes the full write Lock; blocks until every concurrent Embed holding RLock has released. Without this guard, a concurrent Forward could read float32 slices that point into the mmap'd region after Munmap, causing a segfault.

Callers must NOT call Embed after Close returns. The model, tokenizer, and scratchPool fields are set to nil so that subsequent misuse returns "bert: provider closed" rather than silently corrupting state.

func (*Provider) ContextWindow

func (p *Provider) ContextWindow() int

ContextWindow returns the model's maximum sequence length in tokens.

func (*Provider) Embed

func (p *Provider) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed generates embeddings for the given texts. Returns one vector per input text in the same order. Returns nil, nil for empty input.

Concurrency model:

  • Single text (most common path; called from chunking and search in tight loops): runs inline without spawning goroutines.
  • Multiple texts: bounded errgroup fanout. Each text's Encode + Forward runs in its own goroutine, holding RLock for the duration. Worker count bounded by Provider.maxWorkers (default min(GOMAXPROCS, 8)).

On any goroutine error (provider closed, ctx cancelled), the errgroup's context is cancelled, in-flight goroutines exit at their next ctx check, and Embed returns (nil, err).

func (*Provider) ModelID

func (p *Provider) ModelID() string

ModelID returns the model identifier for embedding provenance tracking.

type SafeTensors

type SafeTensors struct {
	// contains filtered or unexported fields
}

SafeTensors provides zero-copy access to tensors stored in the HuggingFace safetensors format. The file is mmap'd read-only; float32 tensor data is accessed directly without copying.

Format: [8-byte header_len (uint64 LE)] [JSON header] [tensor data]
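The layout above can be decoded in a few lines. This sketch works on an in-memory byte slice rather than an mmap'd file, and `tensorInfo` and `parseHeader` are hypothetical names. The per-tensor fields (dtype, shape, data_offsets) follow the published safetensors format.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
)

// tensorInfo describes one entry of the safetensors JSON header.
type tensorInfo struct {
	DType       string   `json:"dtype"`
	Shape       []int    `json:"shape"`
	DataOffsets [2]int64 `json:"data_offsets"` // [begin, end) within the data section
}

// parseHeader splits a safetensors byte image into its tensor directory
// and the raw data section that follows the header.
func parseHeader(buf []byte) (map[string]tensorInfo, []byte, error) {
	if len(buf) < 8 {
		return nil, nil, errors.New("safetensors: file shorter than 8 bytes")
	}
	n := binary.LittleEndian.Uint64(buf[:8])
	if n > uint64(len(buf)-8) {
		return nil, nil, errors.New("safetensors: header length exceeds file size")
	}
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(buf[8:8+n], &raw); err != nil {
		return nil, nil, err
	}
	tensors := make(map[string]tensorInfo, len(raw))
	for name, msg := range raw {
		if name == "__metadata__" {
			continue // free-form string map, not a tensor
		}
		var ti tensorInfo
		if err := json.Unmarshal(msg, &ti); err != nil {
			return nil, nil, err
		}
		tensors[name] = ti
	}
	return tensors, buf[8+n:], nil
}
```

Validating the header length against the file size before slicing matters for untrusted files, since the 8-byte prefix is attacker-controlled input.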

func OpenSafeTensors

func OpenSafeTensors(path string) (*SafeTensors, error)

OpenSafeTensors opens a safetensors file via mmap for zero-copy access.

func (*SafeTensors) Close

func (st *SafeTensors) Close() error

Close unmaps the file and releases resources.

func (*SafeTensors) GetFloat32

func (st *SafeTensors) GetFloat32(name string) ([]float32, []int, error)

GetFloat32 returns a float32 slice backed by the mmap'd data for the named tensor. The returned slice is valid until Close is called. The tensor must have dtype "F32".

func (*SafeTensors) Has

func (st *SafeTensors) Has(name string) bool

Has reports whether the named tensor exists.

func (*SafeTensors) Names

func (st *SafeTensors) Names() []string

Names returns all tensor names (excluding __metadata__).

type Scratch

type Scratch struct {
	// contains filtered or unexported fields
}

Scratch holds pre-allocated buffers used by Forward. Each Forward call writes every buffer location before reading (verified in Layer A audit), so a recycled Scratch is safe to reuse across calls without zeroing.

Each goroutine running Forward must supply its own Scratch. Concurrent reuse of the same Scratch instance corrupts output.

func NewScratch

func NewScratch(maxSeq int, cfg ModelConfig) *Scratch

NewScratch allocates a Scratch sized for the given configuration. Sized for max sequence length so the same Scratch handles any input up to MaxPositionEmbeds tokens.

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer implements BERT's WordPiece tokenization pipeline. Supports loading from HuggingFace tokenizer.json or plain vocab.txt.

func NewTokenizerFromJSON

func NewTokenizerFromJSON(data []byte) (*Tokenizer, error)

NewTokenizerFromJSON parses a HuggingFace tokenizer.json file and returns a configured Tokenizer. This is the preferred loading method as it captures the model's exact normalizer and vocab settings.

func NewTokenizerFromVocab

func NewTokenizerFromVocab(data []byte) (*Tokenizer, error)

NewTokenizerFromVocab parses a plain vocab.txt file (one token per line, indexed by line number). Uses default BERT normalizer settings.

func (*Tokenizer) Encode

func (t *Tokenizer) Encode(text string) (ids, mask, types []int32)

Encode tokenizes a text string and returns input tensors for BERT. Returns token IDs, attention mask (1 for real tokens, 0 for padding), and token type IDs (all 0 for single-segment input). The output is truncated to maxLen and does NOT include padding -- the caller can pad if needed for batching.
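Since Encode leaves padding to the caller, a batching helper might look like the sketch below. `padTo` is a hypothetical function, not part of this package. The pad ID is passed in explicitly because it depends on the vocabulary (0 is [PAD] in the standard BERT vocab, but treat that as an assumption for any given model).

```go
package main

// padTo right-pads ids and mask to length n. Padded positions get padID in
// ids and 0 in mask, so attention ignores them. Inputs longer than n are
// truncated by copy.
func padTo(ids, mask []int32, n int, padID int32) ([]int32, []int32) {
	outIDs := make([]int32, n)
	outMask := make([]int32, n) // zero value already means "padding"
	for i := range outIDs {
		outIDs[i] = padID
	}
	copy(outIDs, ids)
	copy(outMask, mask)
	return outIDs, outMask
}
```

For a batch, n would typically be the longest sequence in that batch rather than the tokenizer's maximum length, which avoids wasted computation on padding.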

func (*Tokenizer) MaxLen

func (t *Tokenizer) MaxLen() int

MaxLen returns the maximum sequence length.

func (*Tokenizer) SetMaxLen

func (t *Tokenizer) SetMaxLen(n int)

SetMaxLen overrides the tokenizer's maximum sequence length. Used by the provider to clamp tokenizer truncation to the model's MaxPositionEmbeds when tokenizer.json declares a larger value than the model can actually process. Without this clamp, model.Forward panics with a slice-bounds error on the scratch buffers.

func (*Tokenizer) VocabSize

func (t *Tokenizer) VocabSize() int

VocabSize returns the number of tokens in the vocabulary.
