Documentation
Constants
const (
	SnipSideMax int = 10 // Defines the maximum bytes either side of the match we are willing to return

	// The below are used for adding boosts to match conditions of snippets to hopefully produce the best match
	PhraseHeavyBoost = 20
	SpaceBoundBoost  = 5
	ExactMatchBoost  = 5

	// Below is used to control CPU burn time trying to find the most relevant snippet
	RelevanceCutoff = 10_000
)
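To make "maximum bytes either side of the match" concrete, here is a minimal sketch that assumes a plain byte-range clamp around a single match. The helper name clampAroundMatch is hypothetical and is not part of this package; it only illustrates how a SnipSideMax-style limit bounds the returned text.

```go
package main

import "fmt"

// snipSideMax mirrors the documented SnipSideMax constant (10 bytes).
const snipSideMax = 10

// clampAroundMatch is a hypothetical helper: it returns the matched text plus
// at most side bytes of context on either side, clipped to the bounds of
// content. It slices by bytes, so it can split a multi-byte UTF-8 character.
func clampAroundMatch(content string, start, end, side int) string {
	from := start - side
	if from < 0 {
		from = 0
	}
	to := end + side
	if to > len(content) {
		to = len(content)
	}
	return content[from:to]
}

func main() {
	content := "the quick brown fox jumps over the lazy dog"
	// "fox" occupies bytes [16, 19).
	fmt.Println(clampAroundMatch(content, 16, 19, snipSideMax)) // prints "ick brown fox jumps ove"
}
```

A real implementation would likely also snap the edges to word or rune boundaries; this sketch deliberately does not.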
Functions
func AddPhraseMatchLocations
AddPhraseMatchLocations case-insensitively searches the file content for the full query string and adds any hits to the MatchLocations map. Because the key for these hits is the whole phrase, which is long, lines containing the exact phrase get a natural boost. It only operates on multi-word queries (queries containing a space).
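The behaviour described above can be sketched as follows. The real signature of AddPhraseMatchLocations is not shown here, so addPhraseMatchLocationsSketch and its parameters are hypothetical. The sketch assumes match locations are stored as map[term][][]int of [start, end] byte offsets, and that the content is ASCII so that lowercasing does not shift byte offsets.

```go
package main

import (
	"fmt"
	"strings"
)

// addPhraseMatchLocationsSketch is a hypothetical stand-in for
// AddPhraseMatchLocations. It records every non-overlapping, case-insensitive
// occurrence of the full query under the query itself as the map key.
// Assumes ASCII content so lowercased offsets equal original offsets.
func addPhraseMatchLocationsSketch(content, query string, locs map[string][][]int) {
	// Only multi-word queries (containing a space) are considered.
	if !strings.Contains(query, " ") {
		return
	}
	lowerContent := strings.ToLower(content)
	lowerQuery := strings.ToLower(query)
	offset := 0
	for {
		idx := strings.Index(lowerContent[offset:], lowerQuery)
		if idx == -1 {
			return
		}
		start := offset + idx
		end := start + len(lowerQuery)
		// The whole phrase is the key, so this entry has a long key.
		locs[query] = append(locs[query], []int{start, end})
		offset = end
	}
}

func main() {
	locs := map[string][][]int{}
	addPhraseMatchLocationsSketch("Hello World and hello world again", "hello world", locs)
	fmt.Println(locs)
}
```

Single-word queries return immediately, which matches the documented restriction to multi-word queries.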
func SnippetModeForExtension
SnippetModeForExtension returns the snippet mode appropriate for a given file extension (without the leading dot). Prose files get "snippet" (free text); code files get "lines" (line-based, with line numbers).
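A minimal sketch of this mapping is shown below. The set of prose extensions, the case-insensitive lookup, and the choice to treat unknown extensions as code are all assumptions for illustration; the package's actual list is not documented here, and snippetModeSketch is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// proseExtensions is an assumed, illustrative set of free-text formats.
var proseExtensions = map[string]bool{
	"md": true, "txt": true, "rst": true, "adoc": true,
}

// snippetModeSketch returns "snippet" for prose extensions and "lines"
// otherwise. ext is given without the leading dot.
func snippetModeSketch(ext string) string {
	if proseExtensions[strings.ToLower(ext)] {
		return "snippet"
	}
	return "lines"
}

func main() {
	fmt.Println(snippetModeSketch("md"), snippetModeSketch("go"))
}
```

A lookup table keeps the decision cheap and easy to extend; defaulting to "lines" assumes that unrecognised files are more likely to be code than prose.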
Types
type LineResult
type LineResult struct {
	LineNumber int     // 1-based line number
	Content    string  // plain text of the line
	Locs       [][]int // match positions within the line [start, end]
	Score      float64
}
LineResult represents a single line from a search result, with match positions for highlighting.
func FindMatchingLines
func FindMatchingLines(res *common.FileJob, surroundLines int) []LineResult
FindMatchingLines uses the match locations already computed by the search pipeline to find the most relevant matching lines in a file. It returns those lines together with surrounding context lines (controlled by surroundLines), sorted by line number.
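The line-based approach can be sketched roughly as follows. This is a simplified illustration, not the package's implementation: it takes plain content and byte-offset matches instead of a *common.FileJob, scores each line by its number of matches (an assumed scoring rule), keeps the top maxLines lines, adds surround lines of context on each side, and returns the result in line-number order. findMatchingLinesSketch, maxLines, and lineResult are hypothetical names; lineResult mirrors the documented LineResult fields.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lineResult mirrors the documented LineResult fields.
type lineResult struct {
	LineNumber int     // 1-based line number
	Content    string  // plain text of the line
	Locs       [][]int // [start, end] byte offsets within the line
	Score      float64
}

// findMatchingLinesSketch scores lines by match count, keeps the best
// maxLines, and adds surround lines of context either side.
// Matches that span a line break are ignored.
func findMatchingLinesSketch(content string, matches [][]int, surround, maxLines int) []lineResult {
	lines := strings.Split(content, "\n")
	results := make([]lineResult, len(lines))
	lineStart := 0
	for i, l := range lines {
		results[i] = lineResult{LineNumber: i + 1, Content: l}
		lineEnd := lineStart + len(l)
		for _, m := range matches {
			if m[0] >= lineStart && m[1] <= lineEnd {
				results[i].Locs = append(results[i].Locs, []int{m[0] - lineStart, m[1] - lineStart})
				results[i].Score++
			}
		}
		lineStart = lineEnd + 1 // skip the '\n'
	}

	// Rank matching lines by score, breaking ties in favour of earlier lines.
	var ranked []int
	for i, r := range results {
		if r.Score > 0 {
			ranked = append(ranked, i)
		}
	}
	sort.Slice(ranked, func(a, b int) bool {
		ra, rb := results[ranked[a]], results[ranked[b]]
		if ra.Score != rb.Score {
			return ra.Score > rb.Score
		}
		return ranked[a] < ranked[b]
	})
	if len(ranked) > maxLines {
		ranked = ranked[:maxLines]
	}

	// Mark chosen lines plus context; iterating in order yields sorted output.
	keep := map[int]bool{}
	for _, i := range ranked {
		for j := i - surround; j <= i+surround; j++ {
			if j >= 0 && j < len(results) {
				keep[j] = true
			}
		}
	}
	var out []lineResult
	for i := range results {
		if keep[i] {
			out = append(out, results[i])
		}
	}
	return out
}

func main() {
	content := "alpha\nbeta foo\ngamma\ndelta foo foo\nepsilon"
	matches := [][]int{{11, 14}, {27, 30}, {31, 34}} // the three "foo" hits
	for _, r := range findMatchingLinesSketch(content, matches, 1, 1) {
		fmt.Println(r.LineNumber, r.Content, r.Locs)
	}
}
```

With one line allowed and one line of context, the sketch keeps line 4 (two matches) plus lines 3 and 5 as context.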
type Snippet
type Snippet struct {
	Content   string
	StartPos  int
	EndPos    int
	Score     float64
	LineStart int
	LineEnd   int
}
func ExtractRelevant
func ExtractRelevant(res *common.FileJob, documentFrequencies map[string]int, relLength int) []Snippet
ExtractRelevant searches the match locations using a sliding-window style algorithm. It "brute forces" the solution: for every location it has, it looks for all matches that fall within the supplied length (relLength) and ranks that window by how many matches it contains.
The ranking uses the document frequencies kept for TF/IDF ranking, combined with various other checks.
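The brute-force window search can be sketched as below. This is a simplified illustration under stated assumptions, not the package's algorithm: termMatch, window, and bestWindowSketch are hypothetical; each match is weighted by 1/df (rarer terms count more) as a stand-in for the real TF/IDF-based ranking; none of the package's other checks are modelled; and only the single best window is returned rather than a []Snippet.

```go
package main

import (
	"fmt"
	"sort"
)

// termMatch is a hypothetical representation of one match location.
type termMatch struct {
	Term       string
	Start, End int // byte offsets, End exclusive
}

// window is a candidate span and its score.
type window struct {
	Start, End int
	Score      float64
}

// bestWindowSketch treats every match as the start of a window of relLength
// bytes, sums a 1/df weight for each match that fits entirely inside it,
// and keeps the highest-scoring window.
func bestWindowSketch(matches []termMatch, docFreq map[string]int, relLength int) window {
	sorted := append([]termMatch(nil), matches...)
	sort.Slice(sorted, func(a, b int) bool { return sorted[a].Start < sorted[b].Start })

	var best window
	for i, first := range sorted {
		limit := first.Start + relLength
		w := window{Start: first.Start, End: first.End}
		for _, m := range sorted[i:] {
			if m.Start >= limit {
				break // later matches start even further right
			}
			if m.End > limit {
				continue // starts inside but overruns the window
			}
			df := docFreq[m.Term]
			if df < 1 {
				df = 1 // avoid dividing by zero for unknown terms
			}
			w.Score += 1.0 / float64(df)
			if m.End > w.End {
				w.End = m.End
			}
		}
		if w.Score > best.Score {
			best = w
		}
	}
	return best
}

func main() {
	matches := []termMatch{{"foo", 0, 3}, {"bar", 5, 8}, {"foo", 40, 43}, {"baz", 45, 48}}
	df := map[string]int{"foo": 10, "bar": 1, "baz": 2}
	fmt.Println(bestWindowSketch(matches, df, 20))
}
```

Trying every match as a window start is quadratic in the number of matches in the worst case, which is presumably why the package exposes a RelevanceCutoff constant to bound CPU time; the sketch has no such cap.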