Documentation ¶
Overview ¶
Package utils provides shared, reusable algorithms. This file implements a generic BM25 search engine.
Usage:
type MyDoc struct { ID string; Body string }
corpus := []MyDoc{...}
engine := utils.NewBM25Engine(corpus, func(d MyDoc) string {
	return d.ID + " " + d.Body
})
results := engine.Search("my query", 5)
Index ¶
- Constants
- func AudioFormat(path string) (string, error)
- func CalculateDefaultMaxContextRunes(contextWindow int) int
- func CreateHTTPClient(proxyURL string, timeout time.Duration) (*http.Client, error)
- func DerefStr(s *string, fallback string) string
- func DoRequestWithRetry(client *http.Client, req *http.Request) (*http.Response, error)
- func DownloadFile(urlStr, filename string, opts DownloadOptions) string
- func DownloadFileSimple(url, filename string) string
- func DownloadToFile(ctx context.Context, client *http.Client, req *http.Request, maxBytes int64) (string, error)
- func ExtractZipFile(zipPath string, targetDir string) error
- func FitToolFeedbackMessage(content string, maxLen int) string
- func FormatToolFeedbackMessage(toolName, explanation, argsPreview string) string
- func HtmlToMarkdown(htmlStr string) (string, error)
- func IsAudioFile(filename, contentType string) bool
- func MeasureContextRunes(messages []providers.Message) int
- func ResolveMaxContextRunes(configValue, contextWindow int) int
- func SanitizeFilename(filename string) string
- func SanitizeMessageContent(input string) string
- func SetDisableTruncation(enabled bool)
- func ToolCallExplanationDuplicatesContent(content string, toolCalls []providers.ToolCall) bool
- func Truncate(s string, maxLen int) string
- func TruncateContextSmart(messages []providers.Message, maxRunes int) []providers.Message
- func ValidateSkillIdentifier(identifier string) error
- func VisibleToolCallArgumentsPreview(tc providers.ToolCall, maxLen int) string
- func VisibleToolCallNameAndArguments(tc providers.ToolCall) (string, string)
- type BM25Engine
- type BM25Option
- type BM25Result
- type DownloadOptions
- type VisibleToolCall
- type VisibleToolCallExtraContent
- type VisibleToolCallFunction
Constants ¶
const (
	// DefaultBM25K1 is the term-frequency saturation factor (typical range 1.2–2.0).
	// Higher values give more weight to repeated terms.
	DefaultBM25K1 = 1.2
	// DefaultBM25B is the document-length normalization factor (0 = none, 1 = full).
	DefaultBM25B = 0.75
)
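To make the roles of the two constants concrete, the following sketch computes the textbook Okapi BM25 contribution of a single query term. It is not taken from this package's source: the names idf and termScore, and the choice of the non-negative IDF variant, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

const (
	k1 = 1.2  // mirrors DefaultBM25K1
	b  = 0.75 // mirrors DefaultBM25B
)

// idf uses the non-negative variant ln(1 + (N - df + 0.5)/(df + 0.5)).
// Which IDF variant the package uses is not documented; this is an assumption.
func idf(numDocs, docFreq int) float64 {
	n, df := float64(numDocs), float64(docFreq)
	return math.Log(1 + (n-df+0.5)/(df+0.5))
}

// termScore returns the BM25 contribution of one query term to one document.
// tf is the term's frequency in the document; docLen and avgDocLen are in tokens.
func termScore(tf, docLen, avgDocLen float64, numDocs, docFreq int) float64 {
	// b controls how strongly long documents are penalized.
	norm := 1 - b + b*docLen/avgDocLen
	// k1 controls how quickly repeated occurrences stop adding score.
	return idf(numDocs, docFreq) * tf * (k1 + 1) / (tf + k1*norm)
}

func main() {
	// Going from 1 to 2 occurrences adds more score than going from 9 to 10.
	for _, tf := range []float64{1, 2, 9, 10} {
		fmt.Printf("tf=%v score=%.4f\n", tf, termScore(tf, 100, 100, 1000, 10))
	}
}
```

With a larger k1 the score keeps growing for longer as tf increases; with b = 0 the docLen/avgDocLen ratio drops out entirely.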
const ToolFeedbackContinuationHint = "Continuing the current task."
Variables ¶
This section is empty.
Functions ¶
func AudioFormat ¶ added in v0.2.4
func CalculateDefaultMaxContextRunes ¶ added in v0.2.4
CalculateDefaultMaxContextRunes computes a default context limit based on the model's context window. Strategy: use 75% of the context window (in tokens) and convert that to a rune estimate.
Token-to-rune conversion ratios (conservative estimates):
- English: ~4 chars per token
- Chinese: ~1.5-2 chars per token
- Mixed: ~3 chars per token (used here for safety)
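The strategy above amounts to simple arithmetic, sketched below. The function name defaultMaxContextRunes is hypothetical, and the real function may round differently or handle non-positive context windows in a way this sketch does not.

```go
package main

import "fmt"

// defaultMaxContextRunes keeps 75% of the context window (in tokens) and
// converts it to runes using the conservative mixed-language ratio of 3.
func defaultMaxContextRunes(contextWindow int) int {
	const runesPerToken = 3 // "Mixed: ~3 chars per token"
	usableTokens := contextWindow * 3 / 4
	return usableTokens * runesPerToken
}

func main() {
	// 8192 tokens -> 6144 usable tokens -> 18432 runes.
	fmt.Println(defaultMaxContextRunes(8192))
}
```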
func CreateHTTPClient ¶ added in v0.2.3
CreateHTTPClient creates an HTTP client with optional proxy support. If proxyURL is empty, it uses the system environment proxy settings. Supported proxy schemes: http, https, socks5, socks5h.
func DerefStr ¶ added in v0.2.0
DerefStr dereferences a pointer to a string and returns the value or a fallback if the pointer is nil.
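The documented behavior fits in a few lines. This is a minimal sketch with a hypothetical lowercase name, not the package's source.

```go
package main

import "fmt"

// derefStr returns *s, or fallback when s is nil.
func derefStr(s *string, fallback string) string {
	if s == nil {
		return fallback
	}
	return *s
}

func main() {
	name := "alice"
	fmt.Println(derefStr(&name, "unknown")) // alice
	fmt.Println(derefStr(nil, "unknown"))   // unknown
}
```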
func DoRequestWithRetry ¶ added in v0.2.0
func DownloadFile ¶
func DownloadFile(urlStr, filename string, opts DownloadOptions) string
DownloadFile downloads a file from URL to a local temp directory. Returns the local file path or empty string on error.
func DownloadFileSimple ¶
DownloadFileSimple is a simplified version of DownloadFile that takes no options.
func DownloadToFile ¶ added in v0.2.0
func DownloadToFile(ctx context.Context, client *http.Client, req *http.Request, maxBytes int64) (string, error)
DownloadToFile streams an HTTP response body to a temporary file in small chunks (~32KB), keeping peak memory usage constant regardless of file size.
Parameters:
- ctx: context for cancellation/timeout
- client: HTTP client to use (caller controls timeouts, transport, etc.)
- req: fully prepared *http.Request (method, URL, headers, etc.)
- maxBytes: maximum bytes to download; 0 means no limit
Returns the path to the temporary file. The caller is responsible for removing it when done (defer os.Remove(path)).
On any error the temp file is cleaned up automatically.
func ExtractZipFile ¶ added in v0.2.0
ExtractZipFile extracts a ZIP archive from disk to targetDir. It reads entries one at a time from disk, keeping memory usage minimal.
Security: rejects path traversal attempts and symlinks.
func FitToolFeedbackMessage ¶ added in v0.2.8
FitToolFeedbackMessage keeps tool feedback within a single outbound message. It preserves the first line when possible and truncates the explanation body instead of letting the message be split into multiple chunks.
func FormatToolFeedbackMessage ¶ added in v0.2.7
FormatToolFeedbackMessage renders a tool feedback message for chat channels. It keeps the tool name on the first line for animation and can include both a human explanation and the serialized tool arguments in the body.
func HtmlToMarkdown ¶ added in v0.2.4
func IsAudioFile ¶
IsAudioFile checks if a file is an audio file based on its filename extension and content type.
func MeasureContextRunes ¶ added in v0.2.4
MeasureContextRunes calculates the total rune count of a message list. Includes content, reasoning content, and estimates for tool calls.
func ResolveMaxContextRunes ¶ added in v0.2.4
ResolveMaxContextRunes determines the final MaxContextRunes value to use. Priority: explicit config value > value auto-calculated from the context window > conservative default.
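The priority order reads naturally as a chain of checks. In this sketch, treating non-positive inputs as "unset" and the fallback value of 8000 are assumptions; neither is documented above.

```go
package main

import "fmt"

// fallbackMaxContextRunes is a hypothetical conservative default.
const fallbackMaxContextRunes = 8000

// resolveMaxContextRunes picks the first available source, in priority order.
func resolveMaxContextRunes(configValue, contextWindow int) int {
	if configValue > 0 {
		return configValue // 1. explicit config wins
	}
	if contextWindow > 0 {
		return contextWindow * 3 / 4 * 3 // 2. 75% of the window at ~3 runes per token
	}
	return fallbackMaxContextRunes // 3. nothing known about the model
}

func main() {
	fmt.Println(resolveMaxContextRunes(5000, 8192)) // 5000
	fmt.Println(resolveMaxContextRunes(0, 8192))    // 18432
	fmt.Println(resolveMaxContextRunes(0, 0))       // 8000
}
```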
func SanitizeFilename ¶
SanitizeFilename removes potentially dangerous characters from a filename and returns a safe version for local filesystem storage.
func SanitizeMessageContent ¶ added in v0.2.0
SanitizeMessageContent removes Unicode control characters, format characters (RTL overrides, zero-width characters), and other non-graphic characters that could confuse an LLM or cause display issues in the agent UI.
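A filter of this kind can be built on the unicode package, as in the sketch below. The name sanitize is hypothetical, and keeping newline, tab, and carriage return is an assumption; the real function may treat them differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitize drops control, format (category Cf), and other non-graphic runes.
func sanitize(input string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r == '\n' || r == '\t' || r == '\r':
			return r // assumption: ordinary whitespace controls are kept
		case unicode.IsControl(r), unicode.Is(unicode.Cf, r), !unicode.IsGraphic(r):
			return -1 // a negative return value removes the rune
		}
		return r
	}, input)
}

func main() {
	// Contains a zero-width space (U+200B), an RTL override (U+202E), and a NUL byte.
	dirty := "a\u200Bb\u202Ec\x00d\n"
	fmt.Printf("%q\n", sanitize(dirty)) // "abcd\n"
}
```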
func SetDisableTruncation ¶ added in v0.2.2
func SetDisableTruncation(enabled bool)
SetDisableTruncation globally enables or disables string truncation.
func ToolCallExplanationDuplicatesContent ¶ added in v0.2.8
func Truncate ¶
Truncate returns a truncated version of s with at most maxLen runes. Handles multi-byte Unicode characters properly. If the string is truncated, "..." is appended to indicate truncation.
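Truncating by runes rather than bytes avoids cutting a multi-byte character in half. In this sketch the "..." counts toward maxLen, which is one reading of "at most maxLen runes"; the handling of very small maxLen values is likewise an assumption.

```go
package main

import "fmt"

// truncate shortens s to at most maxLen runes, ending in "..." when cut.
func truncate(s string, maxLen int) string {
	runes := []rune(s) // index by rune, not by byte
	if len(runes) <= maxLen {
		return s
	}
	if maxLen <= 3 {
		// No room for an ellipsis: hard cut (assumption).
		if maxLen < 0 {
			maxLen = 0
		}
		return string(runes[:maxLen])
	}
	return string(runes[:maxLen-3]) + "..."
}

func main() {
	fmt.Println(truncate("héllo wörld", 8))    // héllo...
	fmt.Println(truncate("日本語のテキスト", 5)) // 日本...
	fmt.Println(truncate("short", 10))         // short
}
```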
func TruncateContextSmart ¶ added in v0.2.4
TruncateContextSmart intelligently truncates message history to fit within maxRunes.
Strategy:
- Always preserve system messages (they define the agent's behavior)
- Keep the most recent messages (they contain current context)
- Drop older middle messages when necessary
- Insert a truncation notice to inform the LLM
Returns the truncated message list.
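The strategy above can be approximated as follows. This is a simplified sketch, not the package's algorithm: the message type stands in for providers.Message, the notice text is invented, and system messages are moved to the front rather than kept in place.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// message is a hypothetical stand-in for providers.Message.
type message struct {
	Role    string
	Content string
}

// truncateContext keeps system messages plus the newest messages that fit.
func truncateContext(msgs []message, maxRunes int) []message {
	total := 0
	for _, m := range msgs {
		total += utf8.RuneCountInString(m.Content)
	}
	if total <= maxRunes {
		return msgs // already within budget
	}

	notice := message{Role: "system", Content: "[Earlier messages were truncated.]"}
	budget := maxRunes - utf8.RuneCountInString(notice.Content)

	// System messages are always kept and charged against the budget first.
	var system []message
	for _, m := range msgs {
		if m.Role == "system" {
			system = append(system, m)
			budget -= utf8.RuneCountInString(m.Content)
		}
	}

	// Walk from newest to oldest, keeping messages while they still fit.
	var recent []message
	for i := len(msgs) - 1; i >= 0; i-- {
		m := msgs[i]
		if m.Role == "system" {
			continue
		}
		n := utf8.RuneCountInString(m.Content)
		if n > budget {
			break // this message and everything older is dropped
		}
		budget -= n
		recent = append(recent, m)
	}
	// Restore chronological order.
	for i, j := 0, len(recent)-1; i < j; i, j = i+1, j-1 {
		recent[i], recent[j] = recent[j], recent[i]
	}

	out := append(system, notice)
	return append(out, recent...)
}

func main() {
	long := ""
	for i := 0; i < 200; i++ {
		long += "x"
	}
	msgs := []message{
		{"system", "be brief"},
		{"user", long},
		{"assistant", "ok"},
		{"user", "latest"},
	}
	for _, m := range truncateContext(msgs, 100) {
		fmt.Printf("%s: %d runes\n", m.Role, utf8.RuneCountInString(m.Content))
	}
}
```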
func ValidateSkillIdentifier ¶ added in v0.2.0
ValidateSkillIdentifier validates that the given skill identifier (slug or registry name) is non-empty and does not contain path separators ("/", "\\") or ".." for security.
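The rules described amount to three string checks. This sketch uses a hypothetical function name and invented error messages; the real function's error values are not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateSkillIdentifier rejects empty identifiers and anything that
// could be used to escape a directory when joined into a path.
func validateSkillIdentifier(identifier string) error {
	if identifier == "" {
		return errors.New("identifier is required")
	}
	if strings.ContainsAny(identifier, `/\`) || strings.Contains(identifier, "..") {
		return fmt.Errorf("identifier %q must not contain path separators or \"..\"", identifier)
	}
	return nil
}

func main() {
	for _, id := range []string{"web-search", "../secrets", `a\b`, ""} {
		fmt.Printf("%q -> %v\n", id, validateSkillIdentifier(id))
	}
}
```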
func VisibleToolCallArgumentsPreview ¶ added in v0.2.8
Types ¶
type BM25Engine ¶ added in v0.2.2
type BM25Engine[T any] struct {
	// contains filtered or unexported fields
}
BM25Engine is a BM25 search engine over a generic corpus. T is the document type; the caller supplies a TextFunc that extracts the searchable text from each document.
The engine precomputes its index once at construction time and reuses it for subsequent searches. If the corpus content changes, construct a new engine.
func NewBM25Engine ¶ added in v0.2.2
func NewBM25Engine[T any](corpus []T, textFunc func(T) string, opts ...BM25Option) *BM25Engine[T]
NewBM25Engine creates a BM25Engine for the given corpus.
- corpus: slice of documents of any type T.
- textFunc: function that returns the searchable text for a document.
- opts: optional tuning (WithK1, WithB).
The corpus slice is referenced, not copied. Callers must not mutate it concurrently with Search().
func (*BM25Engine[T]) Search ¶ added in v0.2.2
func (e *BM25Engine[T]) Search(query string, topK int) []BM25Result[T]
Search ranks the corpus against query and returns the top-k results. Returns an empty slice (not nil) when there are no matches.
Complexity: O(|Q|×avgPostingLen + candidates × log k) per search after the one-time indexing work performed by NewBM25Engine.
type BM25Option ¶ added in v0.2.2
type BM25Option func(*bm25Config)
BM25Option is a functional option to configure a BM25Engine.
func WithB ¶ added in v0.2.2
func WithB(b float64) BM25Option
WithB overrides the document-length normalization factor (default 0.75).
func WithK1 ¶ added in v0.2.2
func WithK1(k1 float64) BM25Option
WithK1 overrides the term-frequency saturation constant (default 1.2).
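BM25Option follows Go's functional options pattern. Because bm25Config is unexported, its fields are not visible in this documentation; the sketch below uses a hypothetical config struct with assumed field names k1 and b to show how such options are typically applied over defaults.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the unexported bm25Config.
type config struct {
	k1 float64
	b  float64
}

// option mirrors the shape of BM25Option: a function that mutates the config.
type option func(*config)

func withK1(k1 float64) option { return func(c *config) { c.k1 = k1 } }
func withB(b float64) option   { return func(c *config) { c.b = b } }

// newConfig starts from the documented defaults and applies options in order.
func newConfig(opts ...option) config {
	c := config{k1: 1.2, b: 0.75}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newConfig())           // {k1:1.2 b:0.75}
	fmt.Printf("%+v\n", newConfig(withB(0.5))) // {k1:1.2 b:0.5}
}
```

The pattern lets callers override only the parameters they care about, which is why NewBM25Engine can accept zero or more options after its required arguments.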
type BM25Result ¶ added in v0.2.2
BM25Result is a single ranked result from a Search call.
type DownloadOptions ¶
type DownloadOptions struct {
Timeout time.Duration
ExtraHeaders map[string]string
LoggerPrefix string
ProxyURL string
}
DownloadOptions holds optional parameters for downloading files.
type VisibleToolCall ¶ added in v0.2.8
type VisibleToolCall struct {
ID string `json:"id,omitempty"`
Type string `json:"type,omitempty"`
Function *VisibleToolCallFunction `json:"function,omitempty"`
ExtraContent *VisibleToolCallExtraContent `json:"extra_content,omitempty"`
}
func BuildVisibleToolCalls ¶ added in v0.2.8
func BuildVisibleToolCalls(toolCalls []providers.ToolCall, maxArgsLen int) []VisibleToolCall
type VisibleToolCallExtraContent ¶ added in v0.2.8
type VisibleToolCallExtraContent struct {
ToolFeedbackExplanation string `json:"tool_feedback_explanation,omitempty"`
}