bert

package

v0.3.0-alpha.3 Published: May 10, 2026 License: Apache-2.0 Imports: 24 Imported by: 0

Repository

github.com/gramaton-ai/gramaton

Documentation ¶

Index ¶

Constants
func AddBias(out, bias []float32, rows, cols int)
func EnsureModel(ctx context.Context, repo, model string, onProgress func(string)) error
func GELU(x []float32)
func L2Normalize(x []float32)
func LayerNorm(x, weight, bias []float32, n, dim int, eps float64)
func MatMul(a, bT []float32, M, K, N int, out []float32)
func MatMulAdd(a, bT []float32, M, K, N int, bias, out []float32)
func ModelDir(model string) string
func Softmax(x []float32, rows, cols int)
func SoftmaxMasked(x []float32, rows, cols int, mask []int32)
func ZeroSlice(x []float32)
type AttentionWeights
type EmbeddingWeights
type EncoderLayer
type LayerNormWeights
type LinearWeights
type Model
- func LoadModel(st *SafeTensors, cfg ModelConfig) (*Model, error)
- func (m *Model) Forward(s *Scratch, tokenIDs, attentionMask []int32) []float32
type ModelConfig
- func ParseModelConfig(data []byte) (ModelConfig, error)
type Provider
- func New(cfg config.EmbeddingConfig) (*Provider, error)
- func (p *Provider) Close() error
- func (p *Provider) ContextWindow() int
- func (p *Provider) Embed(ctx context.Context, texts []string) ([][]float32, error)
- func (p *Provider) ModelID() string
type SafeTensors
- func OpenSafeTensors(path string) (*SafeTensors, error)
- func (st *SafeTensors) Close() error
- func (st *SafeTensors) GetFloat32(name string) ([]float32, []int, error)
- func (st *SafeTensors) Has(name string) bool
- func (st *SafeTensors) Names() []string
type Scratch
- func NewScratch(maxSeq int, cfg ModelConfig) *Scratch
type Tokenizer
- func NewTokenizerFromJSON(data []byte) (*Tokenizer, error)
- func NewTokenizerFromVocab(data []byte) (*Tokenizer, error)
- func (t *Tokenizer) Encode(text string) (ids, mask, types []int32)
- func (t *Tokenizer) MaxLen() int
- func (t *Tokenizer) SetMaxLen(n int)
- func (t *Tokenizer) VocabSize() int

Constants ¶

const DefaultMaxLen = 512

DefaultMaxLen is the default maximum sequence length for BERT models.

const DefaultModel = "bge-small-en-v1.5"

DefaultModel is the default BERT model shipped with Gramaton.

const DefaultModelRepo = "BAAI/bge-small-en-v1.5"

DefaultModelRepo is the HuggingFace repository for the default model.

Variables ¶

This section is empty.

Functions ¶

func AddBias ¶

func AddBias(out, bias []float32, rows, cols int)

AddBias adds a [cols] bias vector to each row of out [rows, cols] in-place.

func EnsureModel ¶

func EnsureModel(ctx context.Context, repo, model string, onProgress func(string)) error

EnsureModel checks if the model files exist locally. If not, downloads them from HuggingFace Hub. Always verifies every file's SHA256 against its sidecar before returning -- catches on-disk corruption between runs.

Integrity model:

- Downloads verify Content-Length and write a SHA256 sidecar on success.
- Subsequent loads recompute the hash and compare against the sidecar.
- Mismatch quarantines the bad file (renamed to .suspect.<unix-ts>) and returns an error; restarting will re-download cleanly while preserving the suspect bytes for forensic analysis.
- File present without sidecar (e.g., manually placed) bootstraps the sidecar with a warning log.

This is trust-on-first-use: the first download is whatever HF serves. Subsequent corruption, truncation, or tampering is caught.

func GELU ¶

func GELU(x []float32)

GELU applies the Gaussian Error Linear Unit activation in-place. Uses the tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))

func L2Normalize ¶

func L2Normalize(x []float32)

L2Normalize normalizes a vector in-place to unit length.

func LayerNorm ¶

func LayerNorm(x, weight, bias []float32, n, dim int, eps float64)

LayerNorm applies layer normalization in-place over the last dimension. x is [n, dim]. weight and bias are [dim]. eps is typically 1e-12 for BERT.

func MatMul ¶

func MatMul(a, bT []float32, M, K, N int, out []float32)

MatMul computes C = A * B^T where A is [M, K] and bT is [N, K] (transposed). Output is written to out [M, N] which must be pre-allocated and zeroed.

On amd64 with AVX2 + FMA3, dispatches the 4x4 tile body to an assembly kernel for ~4-6x speedup over pure Go on BERT-sized matrices. Falls back to the pure-Go tiled implementation for matrices too small to benefit, for remainder rows/columns, and for pre-Haswell/Rosetta hosts without AVX2.
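The layout convention (A row-major [M, K], bT row-major [N, K]) can be illustrated with a naive reference loop. This sketch is untiled and unoptimized, and matMul is a local name. Accumulating into out is an assumption made here because it would explain the pre-zeroed requirement; the package's kernels may be organized differently.

```go
package main

import "fmt"

// matMul computes out += A * B^T for row-major a [M, K] and bT [N, K].
// Each output element is a dot product of a row of a with a row of bT,
// so both operands are read contiguously.
func matMul(a, bT []float32, M, K, N int, out []float32) {
	for i := 0; i < M; i++ {
		for j := 0; j < N; j++ {
			var acc float32
			for k := 0; k < K; k++ {
				acc += a[i*K+k] * bT[j*K+k]
			}
			out[i*N+j] += acc // accumulates, hence out must start zeroed
		}
	}
}

func main() {
	a := []float32{1, 2, 3, 4}  // A  = [[1 2] [3 4]]
	bT := []float32{5, 6, 7, 8} // B^T = [[5 6] [7 8]]
	out := make([]float32, 4)   // make zeroes the slice
	matMul(a, bT, 2, 2, 2, out)
	fmt.Println(out) // [17 23 39 53]
}
```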

func MatMulAdd ¶

func MatMulAdd(a, bT []float32, M, K, N int, bias, out []float32)

MatMulAdd computes out = A * B^T + bias, where bias is [N] and broadcast across rows. out must be pre-zeroed before the matmul.

func ModelDir ¶

func ModelDir(model string) string

ModelDir returns the cache directory for a model. Default: ~/.gramaton/models/<model>/

func Softmax ¶

func Softmax(x []float32, rows, cols int)

Softmax applies row-wise softmax in-place. x is [rows, cols]. Numerically stable: subtracts row max before exponentiation.

func SoftmaxMasked ¶

func SoftmaxMasked(x []float32, rows, cols int, mask []int32)

SoftmaxMasked applies row-wise softmax in-place with an attention mask. x is [rows, cols]. mask is [cols] where 0 means masked (set to -inf before softmax).

func ZeroSlice ¶

func ZeroSlice(x []float32)

ZeroSlice sets all elements to zero.

Types ¶

type AttentionWeights ¶

type AttentionWeights struct {
	Q LinearWeights // [HiddenSize, HiddenSize]
	K LinearWeights
	V LinearWeights
	O LinearWeights // output projection
}

AttentionWeights holds Q, K, V projection weights and the output projection for multi-head self-attention.

type EmbeddingWeights ¶

type EmbeddingWeights struct {
	Word      []float32 // [VocabSize, HiddenSize]
	Position  []float32 // [MaxPositionEmbeds, HiddenSize]
	TokenType []float32 // [2, HiddenSize]
	LNWeight  []float32 // [HiddenSize]
	LNBias    []float32 // [HiddenSize]
}

EmbeddingWeights holds the token, position, and segment embedding tables plus the post-embedding layer norm.

type EncoderLayer ¶

type EncoderLayer struct {
	Attn    AttentionWeights
	AttnLN  LayerNormWeights
	FFNUp   LinearWeights // HiddenSize -> IntermediateSize
	FFNDown LinearWeights // IntermediateSize -> HiddenSize
	FFNLN   LayerNormWeights
}

EncoderLayer holds weights for one transformer encoder layer.

type LayerNormWeights ¶

type LayerNormWeights struct {
	Weight []float32 // [HiddenSize]
	Bias   []float32 // [HiddenSize]
}

LayerNormWeights holds weight and bias for layer normalization.

type LinearWeights ¶

type LinearWeights struct {
	Weight []float32 // [Out, In] -- already transposed for MatMul
	Bias   []float32 // [Out]
}

LinearWeights holds weight and bias for a linear layer. Weight is stored as [OutFeatures, InFeatures] (transposed for MatMul).

type Model ¶

type Model struct {
	Config    ModelConfig
	Embedding EmbeddingWeights
	Layers    []EncoderLayer
}

Model holds the weights and configuration for BERT inference. Weights are backed by mmap'd safetensors data (zero-copy).

Read-only after LoadModel. Concurrent Forward calls are safe when each caller supplies its own Scratch.

func LoadModel ¶

func LoadModel(st *SafeTensors, cfg ModelConfig) (*Model, error)

LoadModel loads a BERT model from a safetensors file using the given config.

func (*Model) Forward ¶

func (m *Model) Forward(s *Scratch, tokenIDs, attentionMask []int32) []float32

Forward runs the BERT encoder and returns the CLS embedding (L2-normalized). tokenIDs and attentionMask must be the same length (<= MaxPositionEmbeds).

The caller supplies a Scratch sized for the model's MaxPositionEmbeds. Each goroutine calling Forward concurrently must use its own Scratch.

type ModelConfig ¶

type ModelConfig struct {
	HiddenSize        int     `json:"hidden_size"`
	NumAttentionHeads int     `json:"num_attention_heads"`
	IntermediateSize  int     `json:"intermediate_size"`
	NumHiddenLayers   int     `json:"num_hidden_layers"`
	MaxPositionEmbeds int     `json:"max_position_embeddings"`
	VocabSize         int     `json:"vocab_size"`
	LayerNormEps      float64 `json:"layer_norm_eps"`
}

ModelConfig holds the BERT model hyperparameters, typically loaded from config.json in the model directory.

func ParseModelConfig ¶

func ParseModelConfig(data []byte) (ModelConfig, error)

ParseModelConfig reads a HuggingFace config.json file.

type Provider ¶

type Provider struct {
	// contains filtered or unexported fields
}

Provider implements embed.Provider using a pure Go BERT inference engine. Default model is bge-small-en-v1.5 (384-dim, 12-layer BERT encoder).

Thread-safety (RWMutex pattern):

- Embed takes RLock for the duration of each per-text Encode + Forward. Multiple goroutines can hold RLocks concurrently; the model is read-only after LoadModel and each Embed iteration uses its own Scratch from the pool, so concurrent Forward is safe.
- Close takes the full Lock and blocks until every in-flight RLock holder releases. After Close returns, model/tokenizer/scratchPool/st are all nil; Embed checks under RLock and returns "bert: provider closed" cleanly without a segfault.

Critical: the RLock must wrap BOTH the nil-check AND Encode + Forward. Releasing RLock between them would let Close Munmap the safetensors region while Forward is mid-read of float32 slices that point into mmap'd memory.

scratchPool holds Scratch instances reused across Forward calls. Each Embed iteration acquires a Scratch from the pool, runs Forward, returns the Scratch to the pool. Concurrent Embed goroutines each get their own Scratch instance.

Memory bound: each Scratch is ~14MB at maxSeq=512, hidden=384, intermediate=1536, heads=12. The pool grows under contention and shrinks during idle (sync.Pool semantics; entries are GC-eligible when not referenced). Under inner-loop fanout, peak live Scratches per Provider is bounded by maxWorkers (default min(GOMAXPROCS, 8), i.e. at most 8 x ~14MB = ~112MB), plus one per concurrent caller goroutine holding RLock.
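The locking discipline described above can be sketched with a toy type. Everything here is hypothetical: provider, scratch, embed, and close stand in for the unexported internals, and a plain slice plays the role of the mmap'd weights. The point is only that the read lock spans both the nil check and the read of shared memory.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosed = errors.New("bert: provider closed")

// scratch is a per-call work buffer; weights stands in for mmap'd tensors.
type scratch struct{ buf []float32 }

type provider struct {
	mu      sync.RWMutex
	weights []float32
	pool    *sync.Pool
}

func newProvider(weights []float32) *provider {
	n := len(weights)
	return &provider{
		weights: weights,
		pool: &sync.Pool{New: func() interface{} {
			return &scratch{buf: make([]float32, n)}
		}},
	}
}

// embed holds RLock across the nil check AND the read of weights, so close
// cannot release the backing memory in between.
func (p *provider) embed(scale float32) ([]float32, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if p.weights == nil {
		return nil, errClosed
	}
	s := p.pool.Get().(*scratch)
	defer p.pool.Put(s)
	for i, w := range p.weights {
		s.buf[i] = w * scale
	}
	// Copy the result out before the scratch is recycled.
	return append([]float32(nil), s.buf...), nil
}

// close waits for every in-flight embed, then drops the shared state.
func (p *provider) close() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.weights, p.pool = nil, nil
}

func main() {
	p := newProvider([]float32{1, 2, 3})
	var wg sync.WaitGroup
	for i := 1; i <= 4; i++ {
		wg.Add(1)
		go func(scale float32) {
			defer wg.Done()
			if _, err := p.embed(scale); err != nil {
				fmt.Println("unexpected:", err)
			}
		}(float32(i))
	}
	wg.Wait()
	p.close()
	_, err := p.embed(1)
	fmt.Println(err) // bert: provider closed
}
```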

func New ¶

func New(cfg config.EmbeddingConfig) (*Provider, error)

New creates a BERT embedding provider. Downloads the model from HuggingFace on first use if not cached locally.

func (*Provider) Close ¶

func (p *Provider) Close() error

Close releases the mmap'd safetensors file. Takes the full write Lock; blocks until every concurrent Embed holding RLock has released. Without this guard, a concurrent Forward could read float32 slices that point into the mmap'd region after Munmap, causing a segfault.

Callers must NOT call Embed after Close returns; the model, tokenizer, and scratchPool fields are set to nil so that subsequent misuse returns "bert: provider closed" rather than silently corrupting memory.

func (*Provider) ContextWindow ¶

func (p *Provider) ContextWindow() int

ContextWindow returns the model's maximum sequence length in tokens.

func (*Provider) Embed ¶

func (p *Provider) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed generates embeddings for the given texts. Returns one vector per input text in the same order. Returns nil, nil for empty input.

Concurrency model:

- Single text (most common path; called from chunking and search in tight loops): runs inline without spawning goroutines.
- Multiple texts: bounded errgroup fanout. Each text's Encode + Forward runs in its own goroutine, holding RLock for the duration. Worker count bounded by Provider.maxWorkers (default min(GOMAXPROCS, 8)).

On any goroutine error (provider closed, ctx cancelled), the errgroup's context is cancelled, in-flight goroutines exit at their next ctx check, and Embed returns (nil, err).

func (*Provider) ModelID ¶

func (p *Provider) ModelID() string

ModelID returns the model identifier for embedding provenance tracking.

type SafeTensors ¶

type SafeTensors struct {
	// contains filtered or unexported fields
}

SafeTensors provides zero-copy access to tensors stored in the HuggingFace safetensors format. The file is mmap'd read-only; float32 tensor data is accessed directly without copying.

Format: [8-byte header_len (uint64 LE)] [JSON header] [tensor data]
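The layout above can be parsed with a few lines of standard-library Go. The sketch below works on an in-memory byte slice and copies tensor values out, so it is not zero-copy and does not use mmap. parseSafeTensors, readFloat32, and buildFile are hypothetical helpers for this example. The per-tensor header fields (dtype, shape, data_offsets relative to the start of the data section) follow the published safetensors format.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
	"math"
)

type tensorInfo struct {
	DType       string `json:"dtype"`
	Shape       []int  `json:"shape"`
	DataOffsets [2]int `json:"data_offsets"` // [begin, end) within the data section
}

// parseSafeTensors splits buf into the tensor table and the raw data section.
func parseSafeTensors(buf []byte) (map[string]tensorInfo, []byte, error) {
	if len(buf) < 8 {
		return nil, nil, errors.New("file shorter than 8-byte length prefix")
	}
	n := binary.LittleEndian.Uint64(buf[:8])
	if n > uint64(len(buf)-8) {
		return nil, nil, errors.New("header length exceeds file size")
	}
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(buf[8:8+n], &raw); err != nil {
		return nil, nil, err
	}
	tensors := make(map[string]tensorInfo)
	for name, msg := range raw {
		if name == "__metadata__" { // free-form metadata, not a tensor
			continue
		}
		var ti tensorInfo
		if err := json.Unmarshal(msg, &ti); err != nil {
			return nil, nil, err
		}
		tensors[name] = ti
	}
	return tensors, buf[8+n:], nil
}

// readFloat32 decodes a little-endian F32 tensor (copying, unlike mmap access).
func readFloat32(data []byte, ti tensorInfo) ([]float32, error) {
	if ti.DType != "F32" {
		return nil, fmt.Errorf("dtype %q is not F32", ti.DType)
	}
	b, e := ti.DataOffsets[0], ti.DataOffsets[1]
	if b < 0 || e < b || e > len(data) || (e-b)%4 != 0 {
		return nil, errors.New("bad data_offsets")
	}
	out := make([]float32, (e-b)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(data[b+4*i:]))
	}
	return out, nil
}

// buildFile assembles a tiny in-memory file for the demo.
func buildFile(header string, values []float32) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, uint64(len(header)))
	buf = append(buf, header...)
	var tmp [4]byte
	for _, v := range values {
		binary.LittleEndian.PutUint32(tmp[:], math.Float32bits(v))
		buf = append(buf, tmp[:]...)
	}
	return buf
}

func main() {
	header := `{"__metadata__":{"format":"pt"},"w":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}`
	tensors, data, err := parseSafeTensors(buildFile(header, []float32{1.5, -2}))
	if err != nil {
		panic(err)
	}
	vals, err := readFloat32(data, tensors["w"])
	fmt.Println(tensors["w"].Shape, vals, err)
}
```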

func OpenSafeTensors ¶

func OpenSafeTensors(path string) (*SafeTensors, error)

OpenSafeTensors opens a safetensors file via mmap for zero-copy access.

func (*SafeTensors) Close ¶

func (st *SafeTensors) Close() error

Close unmaps the file and releases resources.

func (*SafeTensors) GetFloat32 ¶

func (st *SafeTensors) GetFloat32(name string) ([]float32, []int, error)

GetFloat32 returns a float32 slice backed by the mmap'd data for the named tensor. The returned slice is valid until Close is called. The tensor must have dtype "F32".

func (*SafeTensors) Has ¶

func (st *SafeTensors) Has(name string) bool

Has reports whether the named tensor exists.

func (*SafeTensors) Names ¶

func (st *SafeTensors) Names() []string

Names returns all tensor names (excluding __metadata__).

type Scratch ¶

type Scratch struct {
	// contains filtered or unexported fields
}

Scratch holds pre-allocated buffers used by Forward. Each Forward call writes every buffer location before reading (verified in Layer A audit), so a recycled Scratch is safe to reuse across calls without zeroing.

Each goroutine running Forward must supply its own Scratch. Concurrent reuse of the same Scratch instance corrupts output.

func NewScratch ¶

func NewScratch(maxSeq int, cfg ModelConfig) *Scratch

NewScratch allocates a Scratch sized for the given configuration. Sized for max sequence length so the same Scratch handles any input up to MaxPositionEmbeds tokens.

type Tokenizer ¶

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer implements BERT's WordPiece tokenization pipeline. Supports loading from HuggingFace tokenizer.json or plain vocab.txt.

func NewTokenizerFromJSON ¶

func NewTokenizerFromJSON(data []byte) (*Tokenizer, error)

NewTokenizerFromJSON parses a HuggingFace tokenizer.json file and returns a configured Tokenizer. This is the preferred loading method as it captures the model's exact normalizer and vocab settings.

func NewTokenizerFromVocab ¶

func NewTokenizerFromVocab(data []byte) (*Tokenizer, error)

NewTokenizerFromVocab parses a plain vocab.txt file (one token per line, indexed by line number). Uses default BERT normalizer settings.

func (*Tokenizer) Encode ¶

func (t *Tokenizer) Encode(text string) (ids, mask, types []int32)

Encode tokenizes a text string and returns input tensors for BERT. Returns token IDs, attention mask (1 for real tokens, 0 for padding), and token type IDs (all 0 for single-segment input). The output is truncated to maxLen and does NOT include padding, so every returned mask entry is 1 -- the caller can pad (with mask 0) if needed for batching.
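The core of WordPiece is a greedy longest-match-first search over the vocabulary, with "##" marking continuation pieces. The sketch below shows that step plus the [CLS]/[SEP] framing and truncation. It is heavily simplified: lowercasing and whitespace splitting stand in for the real normalizer and pre-tokenizer (no punctuation splitting or accent stripping), the vocabulary is a toy, and wordPiece and encode are hypothetical names. It assumes maxLen >= 2.

```go
package main

import (
	"fmt"
	"strings"
)

// wordPiece splits one word into vocabulary IDs by repeatedly taking the
// longest matching prefix; non-initial pieces are looked up with a "##" prefix.
// If any position has no match, the whole word maps to unk.
func wordPiece(word string, vocab map[string]int32, unk int32) []int32 {
	runes := []rune(word)
	var ids []int32
	for start := 0; start < len(runes); {
		end, found := len(runes), false
		var id int32
		for ; end > start; end-- {
			piece := string(runes[start:end])
			if start > 0 {
				piece = "##" + piece
			}
			if v, ok := vocab[piece]; ok {
				id, found = v, true
				break
			}
		}
		if !found {
			return []int32{unk}
		}
		ids = append(ids, id)
		start = end
	}
	return ids
}

// encode frames the pieces with [CLS] and [SEP], truncating to maxLen tokens.
// No padding is added, so mask is all 1s and types is all 0s.
func encode(text string, vocab map[string]int32, maxLen int) (ids, mask, types []int32) {
	var pieces []int32
	for _, w := range strings.Fields(strings.ToLower(text)) {
		pieces = append(pieces, wordPiece(w, vocab, vocab["[UNK]"])...)
	}
	if len(pieces) > maxLen-2 {
		pieces = pieces[:maxLen-2]
	}
	ids = append(ids, vocab["[CLS]"])
	ids = append(ids, pieces...)
	ids = append(ids, vocab["[SEP]"])
	mask = make([]int32, len(ids))
	for i := range mask {
		mask[i] = 1
	}
	types = make([]int32, len(ids))
	return ids, mask, types
}

func main() {
	vocab := map[string]int32{
		"[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
		"un": 3, "##aff": 4, "##able": 5, "hello": 6,
	}
	ids, mask, types := encode("Hello unaffable", vocab, 512)
	fmt.Println(ids, mask, types) // [1 6 3 4 5 2] [1 1 1 1 1 1] [0 0 0 0 0 0]
}
```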

func (*Tokenizer) MaxLen ¶

func (t *Tokenizer) MaxLen() int

MaxLen returns the maximum sequence length.

func (*Tokenizer) SetMaxLen ¶

func (t *Tokenizer) SetMaxLen(n int)

SetMaxLen overrides the tokenizer's maximum sequence length. Used by the provider to clamp tokenizer truncation to the model's MaxPositionEmbeds when tokenizer.json declares a larger value than the model can actually process. Without this clamp, model.Forward panics with a slice-bounds error on the scratch buffers.

func (*Tokenizer) VocabSize ¶

func (t *Tokenizer) VocabSize() int

VocabSize returns the number of tokens in the vocabulary.
