Documentation ¶
Overview ¶
Package sanitize provides shared identifier sanitization for collection names.
Collection names in vector stores (Qdrant, chromem) must match:
^[a-z0-9_]{1,64}$
This package ensures all identifiers conform to this requirement.
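As a quick illustration of the constraint above, the following sketch checks candidate names against the quoted pattern. The helper isValidCollectionName is hypothetical and is not part of package sanitize; it only shows what the pattern accepts and rejects.

```go
package main

import (
	"fmt"
	"regexp"
)

// collectionNamePattern is the constraint quoted in the overview.
var collectionNamePattern = regexp.MustCompile(`^[a-z0-9_]{1,64}$`)

// isValidCollectionName is a hypothetical helper (not exported by package
// sanitize) that reports whether name already satisfies the pattern.
func isValidCollectionName(name string) bool {
	return collectionNamePattern.MatchString(name)
}

func main() {
	for _, name := range []string{"github_com_user", "My Project!", ""} {
		fmt.Printf("%q valid: %v\n", name, isValidCollectionName(name))
	}
}
```

Names with uppercase letters, spaces, punctuation, or zero length fail the check, which is why raw tenant and project strings need sanitizing first.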
Constants ¶
const (
	// MaxIdentifierLength is the maximum length for collection name components.
	// Qdrant and chromem require collection names to be 1-64 characters.
	MaxIdentifierLength = 64

	// HashSuffixLength is the length of the hash suffix added to truncated identifiers.
	// Format: _<8-char-hash> = 9 characters total
	HashSuffixLength = 9

	// DefaultIdentifier is used when sanitization produces an empty result.
	DefaultIdentifier = "default"
)
Variables ¶
This section is empty.
Functions ¶
func CollectionName ¶
CollectionName builds a collection name from tenant and project components.
Format: {sanitized_tenant}_{sanitized_project}_{suffix}
Example: CollectionName("github.com/user", "my-project", "codebase") -> "github_com_user_my_project_codebase"
The result is guaranteed to be valid for vector store collection names.
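The composition described above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the signature (tenant, project, suffix string) string is inferred from the documented example, sanitizePart is a simplified stand-in for Identifier, the suffix is passed through unchanged because the documented format does not mark it as sanitized, and the handling of over-long results is omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches any run of characters outside a-z and 0-9, underscores
// included, so one replacement both substitutes and collapses.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// sanitizePart is a simplified, hypothetical stand-in for Identifier.
// It does not truncate long inputs.
func sanitizePart(s string) string {
	s = strings.Trim(nonAlnum.ReplaceAllString(strings.ToLower(s), "_"), "_")
	if s == "" {
		return "default"
	}
	return s
}

// collectionNameSketch joins the sanitized tenant and project with the
// suffix, following the documented format. The signature is an assumption.
func collectionNameSketch(tenant, project, suffix string) string {
	return sanitizePart(tenant) + "_" + sanitizePart(project) + "_" + suffix
}

func main() {
	fmt.Println(collectionNameSketch("github.com/user", "my-project", "codebase"))
	// prints "github_com_user_my_project_codebase"
}
```

Because this sketch skips length handling, it can exceed 64 characters for long inputs; the real function is documented to always return a valid name.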
func Identifier ¶
Identifier sanitizes a string for use in collection names.
Rules applied:
- Converts to lowercase
- Replaces invalid characters with underscores
- Collapses multiple underscores
- Trims leading/trailing underscores
- Truncates to MaxIdentifierLength with hash suffix if too long
- Returns DefaultIdentifier if result would be empty
Examples:
"github.com/user" -> "github_com_user"
"My Project!" -> "my_project"
"" or "!!!" -> "default"
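The rules listed above can be sketched in a few lines. This is a minimal illustration, not the package's source: the function name identifierSketch is hypothetical, and the hash is assumed to be the first 8 hex characters of a SHA-256 digest, since the documentation only specifies an 8-character hash preceded by an underscore.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// Local copies of the documented constants.
const (
	maxIdentifierLength = 64
	hashSuffixLength    = 9 // "_" plus 8 hash characters
	defaultIdentifier   = "default"
)

// nonAlnum matches any run of characters outside a-z and 0-9, underscores
// included, so one replacement both substitutes and collapses.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// identifierSketch applies the documented rules in order.
func identifierSketch(s string) string {
	s = strings.ToLower(s)
	s = nonAlnum.ReplaceAllString(s, "_")
	s = strings.Trim(s, "_")
	if s == "" {
		return defaultIdentifier
	}
	if len(s) > maxIdentifierLength {
		// Assumption: SHA-256 of the sanitized string; the real hash may differ.
		sum := sha256.Sum256([]byte(s))
		suffix := "_" + hex.EncodeToString(sum[:])[:hashSuffixLength-1]
		// Keep 55 characters, drop a trailing underscore, then add the suffix.
		prefix := strings.TrimRight(s[:maxIdentifierLength-hashSuffixLength], "_")
		s = prefix + suffix
	}
	return s
}

func main() {
	fmt.Println(identifierSketch("github.com/user")) // github_com_user
	fmt.Println(identifierSketch("My Project!"))     // my_project
	fmt.Println(identifierSketch("!!!"))             // default
	fmt.Println(len(identifierSketch(strings.Repeat("a", 100))))
}
```

Appending a hash of the full sanitized string keeps two long inputs that share the same first 55 characters from mapping to the same identifier, which plain truncation would not.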
Types ¶
This section is empty.