ranker

package
v3.1.0 Latest
Published: Mar 3, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Constants

const (
	LocationBoostValue = 0.05
	DefaultScoreValue  = 0.01
	BytesWordDivisor   = 2
)

Base value that determines how much location matches are boosted.

const StopwordDampenFactor = 0.1

StopwordDampenFactor is the multiplier applied to stopword term scores. A value of 0.1 means stopwords contribute 10% of their original weight.
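
As a rough illustration of the multiplier described above, the sketch below scales a stopword's score by 0.1. The dampen helper and its parameters are hypothetical and not part of this package; only the constant's value is taken from the documentation.

```go
package main

import "fmt"

// StopwordDampenFactor mirrors the documented constant value.
const StopwordDampenFactor = 0.1

// dampen is a hypothetical helper: it scales a term score down when the
// term is a stopword and leaves it unchanged otherwise.
func dampen(score float64, isStopword bool) float64 {
	if isStopword {
		return score * StopwordDampenFactor
	}
	return score
}

func main() {
	fmt.Printf("%.2f\n", dampen(2.0, true))  // stopword: 10% of the original score
	fmt.Printf("%.2f\n", dampen(2.0, false)) // other terms: unchanged
}
```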

Variables

This section is empty.

Functions

func AllStopwords

func AllStopwords(language string, matchLocations map[string][][]int) bool

AllStopwords returns true only if every key in matchLocations is a stopword for the given language. Returns false for unknown languages or empty maps. This is used as a safeguard: when the entire query consists of stopwords, no dampening is applied (enabling keyword pattern searches).
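
The check described above can be sketched as follows. The stopword table, the "Go" language key, and the lower-case helper names are assumptions made for illustration; the package's real stopword lists are not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// stopwords is a tiny, hypothetical table used only for this sketch.
var stopwords = map[string]map[string]bool{
	"Go": {"func": true, "return": true, "if": true},
}

// isStopword does a case-insensitive lookup. Unknown languages and empty
// words yield false because the lookups simply miss.
func isStopword(language, word string) bool {
	return stopwords[language][strings.ToLower(word)]
}

// allStopwords reports whether every key of matchLocations is a stopword.
// An empty map yields false.
func allStopwords(language string, matchLocations map[string][][]int) bool {
	if len(matchLocations) == 0 {
		return false
	}
	for term := range matchLocations {
		if !isStopword(language, term) {
			return false
		}
	}
	return true
}

func main() {
	mixed := map[string][][]int{"func": {{0, 4}}, "parse": {{5, 10}}}
	only := map[string][][]int{"func": {{0, 4}}, "Return": {{12, 18}}}
	fmt.Println(allStopwords("Go", mixed)) // false: "parse" is not a stopword
	fmt.Println(allStopwords("Go", only))  // true: a caller would skip dampening
}
```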

func CalculateDocumentFrequency

func CalculateDocumentFrequency(results []*common.FileJob) map[string]int

CalculateDocumentFrequency calculates the document frequency for every word across all documents, that is, the number of documents in which each term appears. This is mostly used for TF-IDF calculation.

func CalculateDocumentTermFrequency

func CalculateDocumentTermFrequency(results []*common.FileJob) map[string]int

CalculateDocumentTermFrequency calculates the document term frequency for all words across all documents, letting us know how many times a term appears across the corpus. This is mostly used for snippet extraction.
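
To make the difference between the two counts concrete, here is a minimal sketch. The doc type is a hypothetical stand-in for common.FileJob, whose fields are not shown here, and it assumes each document carries a term-to-match-locations map like the one used elsewhere in this package.

```go
package main

import "fmt"

// doc is a hypothetical stand-in for common.FileJob.
type doc struct {
	MatchLocations map[string][][]int // term -> match spans in this document
}

// documentFrequency counts, per term, how many documents contain it.
func documentFrequency(docs []doc) map[string]int {
	df := make(map[string]int)
	for _, d := range docs {
		for term, locs := range d.MatchLocations {
			if len(locs) > 0 {
				df[term]++
			}
		}
	}
	return df
}

// termFrequency counts, per term, the total number of matches in the corpus.
func termFrequency(docs []doc) map[string]int {
	tf := make(map[string]int)
	for _, d := range docs {
		for term, locs := range d.MatchLocations {
			if len(locs) > 0 {
				tf[term] += len(locs)
			}
		}
	}
	return tf
}

func main() {
	docs := []doc{
		{MatchLocations: map[string][][]int{"rank": {{0, 4}, {9, 13}}, "score": {{20, 25}}}},
		{MatchLocations: map[string][][]int{"rank": {{3, 7}}}},
	}
	fmt.Println(documentFrequency(docs)["rank"]) // documents containing "rank"
	fmt.Println(termFrequency(docs)["rank"])     // total "rank" matches
}
```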

func ClassifyMatchLocations

func ClassifyMatchLocations(
	content []byte,
	matchLocations map[string][][]int,
	language string,
) (declarations, usages map[string][][]int)

ClassifyMatchLocations classifies each match location as a declaration or usage based on the line it appears on. Returns two maps: declarations and usages, each containing the match locations that fall on declaration/usage lines respectively.

If the language has no declaration patterns, all matches are returned as usages (conservative: we can't identify declarations without patterns).
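
A line-based classification along these lines might look like the sketch below. The prefix table, the "Go" language key, and the helper names are hypothetical; the package's actual declaration patterns are not listed in this documentation.

```go
package main

import (
	"bytes"
	"fmt"
)

// declPrefixes is a hypothetical pattern table used only for this sketch.
var declPrefixes = map[string][][]byte{
	"Go": {[]byte("func "), []byte("type "), []byte("var "), []byte("const ")},
}

// isDeclarationLine reports whether a left-trimmed line starts with one of
// the language's declaration prefixes.
func isDeclarationLine(trimmed []byte, language string) bool {
	for _, p := range declPrefixes[language] {
		if bytes.HasPrefix(trimmed, p) {
			return true
		}
	}
	return false
}

// classify splits match spans into declarations and usages according to the
// line each match starts on. Languages without patterns yield only usages.
func classify(content []byte, locs map[string][][]int, language string) (decls, usages map[string][][]int) {
	decls, usages = map[string][][]int{}, map[string][][]int{}
	for term, spans := range locs {
		for _, s := range spans {
			if len(s) < 2 || s[0] < 0 || s[0] > len(content) {
				continue // skip malformed spans
			}
			// Locate the bounds of the line containing the match start.
			start := bytes.LastIndexByte(content[:s[0]], '\n') + 1
			end := bytes.IndexByte(content[s[0]:], '\n')
			if end < 0 {
				end = len(content)
			} else {
				end += s[0]
			}
			line := bytes.TrimLeft(content[start:end], " \t")
			if isDeclarationLine(line, language) {
				decls[term] = append(decls[term], s)
			} else {
				usages[term] = append(usages[term], s)
			}
		}
	}
	return decls, usages
}

func main() {
	src := []byte("func add() {}\n\tadd()\n")
	// "add" spans bytes 5-8 on the declaration line and 15-18 on the call line.
	locs := map[string][][]int{"add": {{5, 8}, {15, 18}}}
	decls, usages := classify(src, locs, "Go")
	fmt.Println(len(decls["add"]), len(usages["add"]))
}
```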

func ComputeMatchHash

func ComputeMatchHash(fj *common.FileJob) string

ComputeMatchHash returns a SHA-256 hex digest of the concatenated matched byte regions in fj, sorted by position. Returns "" if there are no match locations or no content.
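
The hashing step can be sketched as below. This version takes the content and match map directly rather than a common.FileJob, and it assumes each location is a half-open [start, end) byte span; both are assumptions, since neither detail is documented here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// matchHash gathers all valid spans, sorts them by position, hashes the
// bytes they cover, and returns the hex digest. It returns "" when there
// is no content or no usable span.
func matchHash(content []byte, locs map[string][][]int) string {
	if len(content) == 0 {
		return ""
	}
	var spans [][]int
	for _, ls := range locs {
		for _, s := range ls {
			if len(s) >= 2 && s[0] >= 0 && s[0] <= s[1] && s[1] <= len(content) {
				spans = append(spans, s)
			}
		}
	}
	if len(spans) == 0 {
		return ""
	}
	// Sort by start, then end, so map iteration order cannot affect the hash.
	sort.Slice(spans, func(i, j int) bool {
		if spans[i][0] != spans[j][0] {
			return spans[i][0] < spans[j][0]
		}
		return spans[i][1] < spans[j][1]
	})
	h := sha256.New()
	for _, s := range spans {
		h.Write(content[s[0]:s[1]])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := []byte("foo bar")
	b := []byte("xx foo")
	ha := matchHash(a, map[string][][]int{"foo": {{0, 3}}})
	hb := matchHash(b, map[string][][]int{"foo": {{3, 6}}})
	fmt.Println(ha == hb) // true: both hash the bytes "foo"
}
```

Two files whose matched regions contain the same bytes produce the same digest even when the surrounding content differs, which is the property deduplication relies on.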

func DeduplicateResults

func DeduplicateResults(results []*common.FileJob) []*common.FileJob

DeduplicateResults groups results by their MatchHash (computed if not already set), keeps the first result in each group (highest-scored, assuming the input is already sorted by score descending), and populates DuplicateCount and DuplicateLocations on the representative.
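
The grouping can be sketched as follows. The result type is a hypothetical stand-in for common.FileJob: the MatchHash, DuplicateCount, and DuplicateLocations names come from the description above, but their types, the Location field, and the handling of results with an empty hash are assumptions.

```go
package main

import "fmt"

// result is a hypothetical stand-in for common.FileJob.
type result struct {
	Location           string
	MatchHash          string
	DuplicateCount     int
	DuplicateLocations []string
}

// dedupe keeps the first result for each hash and records later ones on it.
// The input is assumed to be sorted by score, highest first.
func dedupe(results []*result) []*result {
	seen := make(map[string]*result)
	var out []*result
	for _, r := range results {
		if r.MatchHash == "" {
			out = append(out, r) // assumption: nothing to group on, keep as-is
			continue
		}
		if rep, ok := seen[r.MatchHash]; ok {
			rep.DuplicateCount++
			rep.DuplicateLocations = append(rep.DuplicateLocations, r.Location)
			continue
		}
		seen[r.MatchHash] = r
		out = append(out, r)
	}
	return out
}

func main() {
	in := []*result{
		{Location: "a/main.go", MatchHash: "h1"},
		{Location: "vendor/a/main.go", MatchHash: "h1"},
		{Location: "b/util.go", MatchHash: "h2"},
	}
	for _, r := range dedupe(in) {
		fmt.Println(r.Location, r.DuplicateCount, r.DuplicateLocations)
	}
}
```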

func HasDeclarationPatterns

func HasDeclarationPatterns(language string) bool

HasDeclarationPatterns returns true if the language has declaration patterns defined. Used to determine if filtering is possible.

func HasTestIntent

func HasTestIntent(queryTerms []string) bool

HasTestIntent returns true if any of the query terms indicate the user is searching for test-related code.

func IsDeclarationLine

func IsDeclarationLine(trimmedLine []byte, language string) bool

IsDeclarationLine checks if a line of code is a declaration based on language-specific heuristics. The line should have leading whitespace already trimmed.

func IsStopword

func IsStopword(language, word string) bool

IsStopword returns true if word is a common syntactic keyword for the given language. The check is case-insensitive. Returns false for unknown languages or empty inputs.

func IsTestFile

func IsTestFile(path string) bool

IsTestFile returns true if the file path looks like a test file based on common naming conventions across languages.
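
A path check of this kind might be sketched as below. The conventions listed are common ones chosen for illustration and are an assumption; the package's actual rule set is not shown in this documentation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// looksLikeTestFile applies a few widely used naming conventions to the
// file's base name. It is a hypothetical helper, not the package's function.
func looksLikeTestFile(path string) bool {
	base := strings.ToLower(filepath.Base(path))
	switch {
	case strings.HasSuffix(base, "_test.go"): // Go
		return true
	case strings.HasPrefix(base, "test_") && strings.HasSuffix(base, ".py"): // Python
		return true
	case strings.Contains(base, ".test.") || strings.Contains(base, ".spec."): // JavaScript/TypeScript
		return true
	}
	return false
}

func main() {
	for _, p := range []string{"pkg/ranker/ranker_test.go", "src/app.spec.ts", "pkg/ranker/ranker.go"} {
		fmt.Println(p, looksLikeTestFile(p))
	}
}
```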

func RankResults

func RankResults(rankerName string, corpusCount int, results []*common.FileJob, structuralCfg *StructuralConfig, gravityStrength float64, noiseSensitivity float64, testPenalty float64, testIntent bool) []*common.FileJob

RankResults applies chained ranking to the search results to score them, then sorts the results and returns them. The rankerName parameter selects the algorithm: "simple", "bm25", "tfidf", or "structural"; any other value falls back to classic TF-IDF. structuralCfg is used only when rankerName is "structural" and may be nil otherwise.
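
The score-then-sort flow can be illustrated with textbook TF-IDF. This is a sketch under stated assumptions, not the package's formula: the scored type, the rankTFIDF name, and the sum of tf * ln(N/df) weighting are all chosen for illustration, and the gravity, noise, and test-penalty parameters are not modeled.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored is a hypothetical document record for this sketch.
type scored struct {
	Name  string
	TF    map[string]int // term -> occurrences in this document
	Score float64
}

// rankTFIDF scores each document with sum(tf * ln(N/df)) and sorts the
// slice by score, highest first.
func rankTFIDF(corpusCount int, df map[string]int, docs []*scored) []*scored {
	for _, d := range docs {
		d.Score = 0
		for term, tf := range d.TF {
			if n := df[term]; n > 0 && corpusCount > 0 {
				d.Score += float64(tf) * math.Log(float64(corpusCount)/float64(n))
			}
		}
	}
	sort.SliceStable(docs, func(i, j int) bool { return docs[i].Score > docs[j].Score })
	return docs
}

func main() {
	docs := []*scored{
		{Name: "a.go", TF: map[string]int{"ranker": 1}},
		{Name: "b.go", TF: map[string]int{"ranker": 3}},
	}
	for _, d := range rankTFIDF(10, map[string]int{"ranker": 2}, docs) {
		fmt.Printf("%s %.3f\n", d.Name, d.Score)
	}
}
```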

func SupportedDeclarationLanguages

func SupportedDeclarationLanguages() []string

SupportedDeclarationLanguages returns the list of languages that have declaration patterns defined.

Types

type DeclarationPattern

type DeclarationPattern struct {
	Prefix []byte // bytes that the trimmed line must start with
}

DeclarationPattern represents a line-start pattern that indicates a declaration.

type StructuralConfig

type StructuralConfig struct {
	WeightCode    float64
	WeightComment float64
	WeightString  float64
	OnlyCode      bool
	OnlyComments  bool
	OnlyStrings   bool
}

StructuralConfig holds weights and filters for the structural ranker.

func DefaultStructuralConfig

func DefaultStructuralConfig() StructuralConfig

DefaultStructuralConfig returns a StructuralConfig with sensible defaults.
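
One way such weights and filters could combine is sketched below. The struct is copied from the definition above; the weightedScore helper, the way the Only* flags zero out the other regions, and the example weight values are assumptions, since the ranker's actual logic and default values are not documented here.

```go
package main

import "fmt"

// StructuralConfig is copied from the type definition above.
type StructuralConfig struct {
	WeightCode    float64
	WeightComment float64
	WeightString  float64
	OnlyCode      bool
	OnlyComments  bool
	OnlyStrings   bool
}

// weightedScore is a hypothetical helper: it takes match counts per region
// and combines them using the configured weights. An Only* flag drops the
// counts from the other regions before weighting.
func weightedScore(cfg StructuralConfig, code, comment, str int) float64 {
	switch {
	case cfg.OnlyCode:
		comment, str = 0, 0
	case cfg.OnlyComments:
		code, str = 0, 0
	case cfg.OnlyStrings:
		code, comment = 0, 0
	}
	return float64(code)*cfg.WeightCode +
		float64(comment)*cfg.WeightComment +
		float64(str)*cfg.WeightString
}

func main() {
	// Example weights only; these are not the package defaults.
	cfg := StructuralConfig{WeightCode: 1, WeightComment: 0.5, WeightString: 0.25}
	fmt.Println(weightedScore(cfg, 2, 2, 4))
	cfg.OnlyCode = true
	fmt.Println(weightedScore(cfg, 2, 2, 4))
}
```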
