scoring

package
v0.3.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 26, 2026 License: MIT Imports: 1 Imported by: 0

Documentation

Constants

const DefaultBM25B = 0.75

DefaultBM25B is the default document length normalization parameter.

const DefaultBM25K1 = 1.2

DefaultBM25K1 is the default term frequency saturation parameter.

Variables

This section is empty.

Functions

This section is empty.

Types

type BM25Scorer

type BM25Scorer struct {
	K1 float64
	B  float64
}

BM25Scorer computes BM25 scores with configurable parameters.

func NewBM25Scorer

func NewBM25Scorer(k1, b float64) *BM25Scorer

NewBM25Scorer creates a BM25Scorer with the given parameters. If k1 or b is <= 0, the default values (DefaultBM25K1 = 1.2, DefaultBM25B = 0.75) are used.

func (*BM25Scorer) Score

func (s *BM25Scorer) Score(queryTerms []string, docTermFreq map[string]int, docLen float64, avgDocLen float64, docCount int, docFreq map[string]int) float64

Score computes the BM25 score for a document given query terms. It computes IDF on the fly from docFreq and docCount.

Parameters:

  • queryTerms: tokenized query terms
  • docTermFreq: term frequency map for the document being scored
  • docLen: the document's length (total term count)
  • avgDocLen: average document length across the corpus
  • docCount: total number of documents in the corpus (N)
  • docFreq: number of documents containing each term (n(t))

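Uses the standard BM25 formula:

sum over t in query of: IDF(t) * (tf(t) * (k1+1)) / (tf(t) + k1 * (1 - b + b * dl/avgdl))

with IDF(t) = log((N - n(t) + 0.5) / (n(t) + 0.5) + 1)

where tf(t) is the frequency of term t in the document (docTermFreq), dl is docLen, avgdl is avgDocLen, N is docCount, and n(t) is the docFreq entry for t.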

func (*BM25Scorer) ScoreWithIDF

func (s *BM25Scorer) ScoreWithIDF(queryTerms []string, docTermFreq map[string]int, idf map[string]float64, docLen float64, avgDocLen float64) float64

ScoreWithIDF is a convenience method for callers that already have precomputed IDF values and know the document length directly.

Parameters:

  • queryTerms: tokenized query terms
  • docTermFreq: term frequency map for the document
  • idf: precomputed IDF value for each term
  • docLen: the document's length (total term count)
  • avgDocLen: average document length across the corpus
