Documentation
Index
- Constants
- func AllStopwords(language string, matchLocations map[string][][]int) bool
- func CalculateDocumentFrequency(results []*common.FileJob) map[string]int
- func CalculateDocumentTermFrequency(results []*common.FileJob) map[string]int
- func ClassifyMatchLocations(content []byte, matchLocations map[string][][]int, language string) (declarations, usages map[string][][]int)
- func ComputeMatchHash(fj *common.FileJob) string
- func DeduplicateResults(results []*common.FileJob) []*common.FileJob
- func HasDeclarationPatterns(language string) bool
- func HasTestIntent(queryTerms []string) bool
- func IsDeclarationLine(trimmedLine []byte, language string) bool
- func IsStopword(language, word string) bool
- func IsTestFile(path string) bool
- func RankResults(rankerName string, corpusCount int, results []*common.FileJob, ...) []*common.FileJob
- func SupportedDeclarationLanguages() []string
- type DeclarationPattern
- type StructuralConfig
  - func DefaultStructuralConfig() StructuralConfig
Constants
const (
	LocationBoostValue = 0.05
	DefaultScoreValue  = 0.01
	BytesWordDivisor   = 2
)
Base value used to determine how much location matches should be boosted.
const StopwordDampenFactor = 0.1
StopwordDampenFactor is the multiplier applied to stopword term scores. A value of 0.1 means stopwords contribute 10% of their original weight.
Variables
This section is empty.
Functions
func AllStopwords
func AllStopwords(language string, matchLocations map[string][][]int) bool
AllStopwords returns true only if every key in matchLocations is a stopword for the given language. Returns false for unknown languages or empty maps. This is used as a safeguard: when the entire query consists of stopwords, no dampening is applied (enabling keyword pattern searches).
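A minimal sketch of how the stopword check and this safeguard could combine with StopwordDampenFactor. The stopword table, the "Go" language key, and the helpers isStopword, allStopwords and termWeight are hypothetical stand-ins, not this package's code; only the 0.1 factor and the documented rules (case-insensitive lookup, false for unknown languages or empty maps, no dampening when every term is a stopword) come from this page.

```go
package main

import (
	"fmt"
	"strings"
)

// StopwordDampenFactor matches the documented constant.
const StopwordDampenFactor = 0.1

// stopwords is a hypothetical keyword table keyed by language name.
var stopwords = map[string]map[string]bool{
	"Go": {"func": true, "return": true, "if": true, "var": true},
}

// isStopword is a case-insensitive lookup; unknown languages or empty
// inputs return false.
func isStopword(language, word string) bool {
	set, ok := stopwords[language]
	if !ok || word == "" {
		return false
	}
	return set[strings.ToLower(word)]
}

// allStopwords reports whether every query term is a stopword, returning
// false for unknown languages and empty maps.
func allStopwords(language string, matchLocations map[string][][]int) bool {
	if _, ok := stopwords[language]; !ok || len(matchLocations) == 0 {
		return false
	}
	for term := range matchLocations {
		if !isStopword(language, term) {
			return false
		}
	}
	return true
}

// termWeight dampens a stopword's score unless the whole query is stopwords.
func termWeight(language, term string, score float64, queryAllStopwords bool) float64 {
	if !queryAllStopwords && isStopword(language, term) {
		return score * StopwordDampenFactor
	}
	return score
}

func main() {
	mixed := map[string][][]int{"func": {{0, 4}}, "parser": {{5, 11}}}
	keywordsOnly := map[string][][]int{"func": {{0, 4}}, "return": {{10, 16}}}

	fmt.Println(allStopwords("Go", mixed))        // false
	fmt.Println(allStopwords("Go", keywordsOnly)) // true

	// "func" is dampened in the mixed query but keeps full weight when the
	// query consists only of keywords.
	fmt.Println(termWeight("Go", "func", 2.0, allStopwords("Go", mixed)))        // 0.2
	fmt.Println(termWeight("Go", "func", 2.0, allStopwords("Go", keywordsOnly))) // 2
}
```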
func CalculateDocumentFrequency
func CalculateDocumentFrequency(results []*common.FileJob) map[string]int
CalculateDocumentFrequency calculates the document frequency of every word across all documents, that is, the number of documents in which each term appears. This is mostly used for TF-IDF calculation.
func CalculateDocumentTermFrequency
func CalculateDocumentTermFrequency(results []*common.FileJob) map[string]int
CalculateDocumentTermFrequency calculates the document term frequency for all words across all documents, letting us know how many times a term appears across the corpus. This is mostly used for snippet extraction.
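The difference between the two frequency maps can be sketched as follows. fileJob is a hypothetical stand-in for common.FileJob that keeps only per-term match locations, and counting a document only when it has at least one location for the term is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// fileJob is a hypothetical stand-in for common.FileJob that keeps only the
// per-term match locations as [start, end) byte offsets.
type fileJob struct {
	Location       string
	MatchLocations map[string][][]int
}

// documentFrequency counts how many documents contain each term at least once.
func documentFrequency(results []*fileJob) map[string]int {
	df := map[string]int{}
	for _, r := range results {
		for term, locs := range r.MatchLocations {
			if len(locs) > 0 {
				df[term]++
			}
		}
	}
	return df
}

// corpusTermFrequency counts every occurrence of each term across all documents.
func corpusTermFrequency(results []*fileJob) map[string]int {
	tf := map[string]int{}
	for _, r := range results {
		for term, locs := range r.MatchLocations {
			tf[term] += len(locs)
		}
	}
	return tf
}

func main() {
	results := []*fileJob{
		{Location: "a.go", MatchLocations: map[string][][]int{"cache": {{0, 5}, {20, 25}}}},
		{Location: "b.go", MatchLocations: map[string][][]int{"cache": {{3, 8}}, "lru": {{9, 12}}}},
	}
	df := documentFrequency(results)
	tf := corpusTermFrequency(results)
	fmt.Println(df["cache"], tf["cache"]) // 2 3

	// Textbook IDF, log(N/df): rarer terms get larger weights.
	n := float64(len(results))
	fmt.Printf("idf(cache)=%.3f idf(lru)=%.3f\n",
		math.Log(n/float64(df["cache"])), math.Log(n/float64(df["lru"])))
}
```

The final line uses a textbook log(N/df) IDF for illustration; the package's own TF-IDF and BM25 weighting may differ.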
func ClassifyMatchLocations
func ClassifyMatchLocations(
	content []byte,
	matchLocations map[string][][]int,
	language string,
) (declarations, usages map[string][][]int)
ClassifyMatchLocations classifies each match location as a declaration or usage based on the line it appears on. Returns two maps: declarations and usages, each containing the match locations that fall on declaration/usage lines respectively.
If the language has no declaration patterns, all matches are returned as usages (conservative: we can't identify declarations without patterns).
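A sketch of the classification, assuming match locations are [start, end) byte offsets and that a match belongs to the line containing its start offset. The prefix table, the "Go" key and the classify and lineAt helpers are hypothetical; the fallback for languages without patterns follows the rule above.

```go
package main

import (
	"bytes"
	"fmt"
)

// declPrefixes is a hypothetical pattern table keyed by language name.
var declPrefixes = map[string][][]byte{
	"Go": {[]byte("func "), []byte("type "), []byte("var "), []byte("const ")},
}

// lineAt returns the line containing offset pos, with leading whitespace trimmed.
func lineAt(content []byte, pos int) []byte {
	start := bytes.LastIndexByte(content[:pos], '\n') + 1
	end := len(content)
	if i := bytes.IndexByte(content[pos:], '\n'); i >= 0 {
		end = pos + i
	}
	return bytes.TrimLeft(content[start:end], " \t")
}

// classify splits match locations into declarations and usages.
func classify(content []byte, matchLocations map[string][][]int, language string) (decls, usages map[string][][]int) {
	decls = map[string][][]int{}
	usages = map[string][][]int{}
	prefixes, ok := declPrefixes[language]
	if !ok {
		// No patterns: conservatively treat every match as a usage.
		for term, locs := range matchLocations {
			usages[term] = append(usages[term], locs...)
		}
		return decls, usages
	}
	for term, locs := range matchLocations {
		for _, loc := range locs {
			line := lineAt(content, loc[0])
			isDecl := false
			for _, p := range prefixes {
				if bytes.HasPrefix(line, p) {
					isDecl = true
					break
				}
			}
			if isDecl {
				decls[term] = append(decls[term], loc)
			} else {
				usages[term] = append(usages[term], loc)
			}
		}
	}
	return decls, usages
}

func main() {
	src := []byte("func Parse() {}\n\tx := Parse()\n")
	locs := map[string][][]int{"Parse": {{5, 10}, {22, 27}}}
	d, u := classify(src, locs, "Go")
	fmt.Println(d["Parse"], u["Parse"]) // [[5 10]] [[22 27]]
}
```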
func ComputeMatchHash
func ComputeMatchHash(fj *common.FileJob) string
ComputeMatchHash returns a SHA-256 hex digest of the concatenated matched byte regions in fj, sorted by position. Returns "" if there are no match locations or no content.
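A sketch of the hashing step using Go's crypto/sha256 and encoding/hex, assuming each location is a [start, end) pair. matchHash is a hypothetical helper that takes content and locations directly instead of a *common.FileJob. Sorting by position makes the digest independent of Go's randomized map iteration order, so two files whose matched text is identical hash the same even when the matches sit at different offsets.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// matchHash hashes the matched byte regions in position order.
func matchHash(content []byte, matchLocations map[string][][]int) string {
	if len(content) == 0 || len(matchLocations) == 0 {
		return ""
	}
	var locs [][]int
	for _, l := range matchLocations {
		locs = append(locs, l...)
	}
	if len(locs) == 0 {
		return ""
	}
	// Sort by start, then end, so map iteration order cannot change the hash.
	sort.Slice(locs, func(i, j int) bool {
		if locs[i][0] != locs[j][0] {
			return locs[i][0] < locs[j][0]
		}
		return locs[i][1] < locs[j][1]
	})
	h := sha256.New()
	for _, l := range locs {
		h.Write(content[l[0]:l[1]])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := matchHash([]byte("foo bar foo"), map[string][][]int{"foo": {{0, 3}, {8, 11}}, "bar": {{4, 7}}})
	b := matchHash([]byte("xxfoo bar foo"), map[string][][]int{"bar": {{6, 9}}, "foo": {{2, 5}, {10, 13}}})
	fmt.Println(a == b) // true: same matched text in the same order
}
```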
func DeduplicateResults
func DeduplicateResults(results []*common.FileJob) []*common.FileJob
DeduplicateResults groups results by their MatchHash (computed if not already set), keeps the first result in each group (highest-scored, assuming the input is already sorted by score descending), and populates DuplicateCount and DuplicateLocations on the representative.
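A minimal sketch of the grouping, using a hypothetical result struct whose field names mirror those mentioned above (MatchHash, DuplicateCount, DuplicateLocations); their real types in common.FileJob are assumptions here, as is leaving results with an empty hash unmerged. Hash computation is omitted, so hashes are supplied pre-set.

```go
package main

import "fmt"

// result is a hypothetical stand-in for common.FileJob with only the fields
// deduplication needs.
type result struct {
	Location           string
	Score              float64
	MatchHash          string
	DuplicateCount     int
	DuplicateLocations []string
}

// dedupe keeps the first result per hash; input must be sorted by score descending.
func dedupe(results []*result) []*result {
	firstByHash := map[string]*result{}
	var out []*result
	for _, r := range results {
		if r.MatchHash == "" {
			// Assumption: nothing to compare on, so keep the result as-is.
			out = append(out, r)
			continue
		}
		if rep, ok := firstByHash[r.MatchHash]; ok {
			// Fold this result into the higher-scored representative.
			rep.DuplicateCount++
			rep.DuplicateLocations = append(rep.DuplicateLocations, r.Location)
			continue
		}
		firstByHash[r.MatchHash] = r
		out = append(out, r)
	}
	return out
}

func main() {
	in := []*result{
		{Location: "src/a.go", Score: 0.9, MatchHash: "h1"},
		{Location: "vendor/a.go", Score: 0.8, MatchHash: "h1"},
		{Location: "src/b.go", Score: 0.5, MatchHash: "h2"},
	}
	out := dedupe(in)
	fmt.Println(len(out), out[0].DuplicateCount, out[0].DuplicateLocations) // 2 1 [vendor/a.go]
}
```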
func HasDeclarationPatterns
func HasDeclarationPatterns(language string) bool
HasDeclarationPatterns returns true if the language has declaration patterns defined. It is used to determine whether declaration filtering is possible.
func HasTestIntent
func HasTestIntent(queryTerms []string) bool
HasTestIntent returns true if any of the query terms indicate the user is searching for test-related code.
func IsDeclarationLine
func IsDeclarationLine(trimmedLine []byte, language string) bool
IsDeclarationLine checks if a line of code is a declaration based on language-specific heuristics. The line should have leading whitespace already trimmed.
func IsStopword
func IsStopword(language, word string) bool
IsStopword returns true if word is a common syntactic keyword for the given language. The check is case-insensitive. Returns false for unknown languages or empty inputs.
func IsTestFile
func IsTestFile(path string) bool
IsTestFile returns true if the file path looks like a test file based on common naming conventions across languages.
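The two test heuristics, HasTestIntent and IsTestFile, might look like the sketch below. The conventions checked (Go's _test.go, pytest's test_*.py, .test. and .spec. infixes, and test, tests or __tests__ directories) and the intent keywords are illustrative assumptions, not the package's actual lists; isTestFile and hasTestIntent are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isTestFile checks a hypothetical subset of common test naming conventions.
func isTestFile(p string) bool {
	p = strings.ReplaceAll(p, "\\", "/") // normalise Windows separators
	base := strings.ToLower(path.Base(p))
	switch {
	case strings.HasSuffix(base, "_test.go"): // Go
		return true
	case strings.HasPrefix(base, "test_") && strings.HasSuffix(base, ".py"): // pytest
		return true
	case strings.Contains(base, ".test.") || strings.Contains(base, ".spec."): // JS/TS
		return true
	}
	// Files inside a dedicated test directory.
	for _, dir := range strings.Split(strings.ToLower(path.Dir(p)), "/") {
		if dir == "test" || dir == "tests" || dir == "__tests__" {
			return true
		}
	}
	return false
}

// hasTestIntent checks query terms against a hypothetical keyword list.
func hasTestIntent(queryTerms []string) bool {
	for _, t := range queryTerms {
		switch strings.ToLower(t) {
		case "test", "tests", "testing", "spec", "mock":
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isTestFile("pkg/ranker/ranker_test.go"))    // true
	fmt.Println(isTestFile("src/__tests__/app.js"))         // true
	fmt.Println(isTestFile("pkg/ranker/ranker.go"))         // false
	fmt.Println(hasTestIntent([]string{"Mock", "client"})) // true
}
```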
func RankResults
func RankResults(rankerName string, corpusCount int, results []*common.FileJob, structuralCfg *StructuralConfig, gravityStrength float64, noiseSensitivity float64, testPenalty float64, testIntent bool) []*common.FileJob
RankResults applies chained ranking to the search results to score each one, then sorts the results by score and returns them. The rankerName parameter selects the algorithm: "simple", "bm25", "tfidf", "structural", or anything else for classic TF-IDF. structuralCfg is used only when rankerName is "structural" and may be nil otherwise.
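For the "bm25" option, the per-term contribution of the standard Okapi BM25 formula can be sketched as below. The IDF variant, log(1 + (N - n + 0.5)/(n + 0.5)) as used by Lucene, and the k1 = 1.2, b = 0.75 values are common textbook choices, not confirmed parameters of this package; bm25Term is a hypothetical helper, and a document's score would be the sum over query terms.

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term scores one term in one document with Okapi BM25.
// tf: occurrences in the document; docLen/avgDocLen: length normalization;
// docFreq: documents containing the term; corpusCount: total documents.
func bm25Term(tf, docLen, avgDocLen float64, docFreq, corpusCount int, k1, b float64) float64 {
	n := float64(docFreq)
	N := float64(corpusCount)
	idf := math.Log(1 + (N-n+0.5)/(n+0.5))
	norm := tf + k1*(1-b+b*docLen/avgDocLen)
	return idf * tf * (k1 + 1) / norm
}

func main() {
	// Same term and document frequency; only the document length differs.
	short := bm25Term(3, 100, 200, 5, 100, 1.2, 0.75)
	long := bm25Term(3, 400, 200, 5, 100, 1.2, 0.75)
	fmt.Printf("short=%.4f long=%.4f\n", short, long)
	fmt.Println(short > long) // true: longer documents are normalized down
}
```

Unlike raw TF-IDF, the k1 term saturates repeated occurrences and b controls how strongly long files are penalized.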
func SupportedDeclarationLanguages
func SupportedDeclarationLanguages() []string
SupportedDeclarationLanguages returns the list of languages that have declaration patterns defined.
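A sketch of how SupportedDeclarationLanguages and HasDeclarationPatterns could share a single registry. The declarationPatterns map, its contents and the lowercase helper names are hypothetical, and sorting the returned languages is an assumption added to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// declarationPattern mirrors the documented DeclarationPattern type.
type declarationPattern struct {
	Prefix []byte
}

// declarationPatterns is a hypothetical registry keyed by language name.
var declarationPatterns = map[string][]declarationPattern{
	"Go":     {{Prefix: []byte("func ")}, {Prefix: []byte("type ")}},
	"Python": {{Prefix: []byte("def ")}, {Prefix: []byte("class ")}},
}

// hasDeclarationPatterns reports whether any pattern is registered for language.
func hasDeclarationPatterns(language string) bool {
	return len(declarationPatterns[language]) > 0
}

// supportedDeclarationLanguages returns the registry keys in sorted order,
// since Go map iteration order is randomized.
func supportedDeclarationLanguages() []string {
	langs := make([]string, 0, len(declarationPatterns))
	for lang := range declarationPatterns {
		langs = append(langs, lang)
	}
	sort.Strings(langs)
	return langs
}

func main() {
	fmt.Println(supportedDeclarationLanguages()) // [Go Python]
	fmt.Println(hasDeclarationPatterns("Ruby"))   // false
}
```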
Types
type DeclarationPattern
type DeclarationPattern struct {
	Prefix []byte // bytes that the trimmed line must start with
}
DeclarationPattern represents a line-start pattern that indicates a declaration.
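A sketch of a prefix check built on DeclarationPattern. Unlike the documented IsDeclarationLine, the hypothetical isDeclarationLine below takes a pattern slice instead of a language name, and the Go prefixes listed are illustrative. The caller trims leading whitespace first, as the IsDeclarationLine documentation requires.

```go
package main

import (
	"bytes"
	"fmt"
)

// DeclarationPattern mirrors the documented type.
type DeclarationPattern struct {
	Prefix []byte // bytes that the trimmed line must start with
}

// goPatterns is a hypothetical pattern list for Go.
var goPatterns = []DeclarationPattern{
	{Prefix: []byte("func ")},
	{Prefix: []byte("type ")},
	{Prefix: []byte("var ")},
	{Prefix: []byte("const ")},
}

// isDeclarationLine reports whether an already-trimmed line starts with any pattern.
func isDeclarationLine(trimmedLine []byte, patterns []DeclarationPattern) bool {
	for _, p := range patterns {
		if bytes.HasPrefix(trimmedLine, p.Prefix) {
			return true
		}
	}
	return false
}

func main() {
	raw := []byte("\tfunc (s *Server) Start() error {")
	trimmed := bytes.TrimLeft(raw, " \t")
	fmt.Println(isDeclarationLine(trimmed, goPatterns)) // true

	// The trailing space in "type " keeps identifiers like typeName from matching.
	fmt.Println(isDeclarationLine([]byte("typeName := 1"), goPatterns)) // false
}
```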
type StructuralConfig
type StructuralConfig struct {
	WeightCode    float64
	WeightComment float64
	WeightString  float64
	OnlyCode      bool
	OnlyComments  bool
	OnlyStrings   bool
}
StructuralConfig holds weights and filters for the structural ranker.
func DefaultStructuralConfig
func DefaultStructuralConfig() StructuralConfig
DefaultStructuralConfig returns a StructuralConfig with sensible defaults.
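A sketch of one way the StructuralConfig weights and Only* filters could be applied to individual matches. The regionKind type, the regionWeight helper, and the rule that any Only* flag zeroes out the other regions are assumptions for illustration; the weights used below are arbitrary, not the values DefaultStructuralConfig returns.

```go
package main

import "fmt"

// StructuralConfig mirrors the documented type.
type StructuralConfig struct {
	WeightCode    float64
	WeightComment float64
	WeightString  float64
	OnlyCode      bool
	OnlyComments  bool
	OnlyStrings   bool
}

// regionKind is a hypothetical label for where a match falls in the file.
type regionKind int

const (
	inCode regionKind = iota
	inComment
	inString
)

// regionWeight returns the multiplier for a match in the given region, or 0
// when an Only* filter is set and excludes that region.
func regionWeight(cfg StructuralConfig, kind regionKind) float64 {
	filtering := cfg.OnlyCode || cfg.OnlyComments || cfg.OnlyStrings
	switch kind {
	case inCode:
		if filtering && !cfg.OnlyCode {
			return 0
		}
		return cfg.WeightCode
	case inComment:
		if filtering && !cfg.OnlyComments {
			return 0
		}
		return cfg.WeightComment
	case inString:
		if filtering && !cfg.OnlyStrings {
			return 0
		}
		return cfg.WeightString
	}
	return 0
}

func main() {
	// Arbitrary illustrative weights, not the package defaults.
	cfg := StructuralConfig{WeightCode: 1.0, WeightComment: 0.25, WeightString: 0.5}
	matches := []regionKind{inCode, inComment, inString}

	total := 0.0
	for _, m := range matches {
		total += regionWeight(cfg, m)
	}
	fmt.Println(total) // 1.75

	cfg.OnlyComments = true
	total = 0
	for _, m := range matches {
		total += regionWeight(cfg, m)
	}
	fmt.Println(total) // 0.25
}
```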