Documentation
Overview ¶
Package sanitize provides shared identifier sanitization and input validation.
Collection names in vector stores (Qdrant, chromem) must match ^[a-z0-9_]{1,64}$. This package ensures that all identifiers conform to this requirement.
Index ¶
- Constants
- Variables
- func CollectionName(tenant, project, suffix string) string
- func Identifier(s string) string
- func SafeBasename(path string) (string, error)
- func SanitizeAndValidateTenantID(id string) (string, error)
- func ValidateGlobPattern(pattern string) error
- func ValidateGlobPatterns(patterns []string) error
- func ValidatePath(path, allowedRoot string) (string, error)
- func ValidateProjectID(id string) error
- func ValidateProjectPath(path string) (string, error)
- func ValidateTeamID(id string) error
- func ValidateTenantID(id string) error
Constants ¶
const (
	// MaxIdentifierLength is the maximum length for collection name components.
	// Qdrant and chromem require collection names to be 1-64 characters.
	MaxIdentifierLength = 64

	// HashSuffixLength is the length of the hash suffix added to truncated identifiers.
	// Format: _<8-char-hash> = 9 characters total
	HashSuffixLength = 9

	// DefaultIdentifier is used when sanitization produces an empty result.
	DefaultIdentifier = "default"
)
Variables ¶
var (
	// ErrPathTraversal indicates a path contains directory traversal sequences.
	ErrPathTraversal = errors.New("path contains directory traversal")

	// ErrAbsolutePath indicates an absolute path was provided where relative was expected.
	ErrAbsolutePath = errors.New("absolute path not allowed")

	// ErrInvalidTenantID indicates the tenant ID format is invalid.
	ErrInvalidTenantID = errors.New("invalid tenant ID format")

	// ErrInvalidTeamID indicates the team ID format is invalid.
	ErrInvalidTeamID = errors.New("invalid team ID format")

	// ErrInvalidProjectID indicates the project ID format is invalid.
	ErrInvalidProjectID = errors.New("invalid project ID format")

	// ErrInvalidPattern indicates a glob/regex pattern is dangerous.
	ErrInvalidPattern = errors.New("invalid or dangerous pattern")

	// ErrEmptyPath indicates an empty path was provided.
	ErrEmptyPath = errors.New("path cannot be empty")
)
Validation errors for security checks.
Functions ¶
func CollectionName ¶
CollectionName builds a collection name from tenant and project components.
Format: {sanitized_tenant}_{sanitized_project}_{suffix}
Example:
CollectionName("github.com/user", "my-project", "codebase") -> "github_com_user_my_project_codebase"
The result is guaranteed to be valid for vector store collection names.
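The format above can be illustrated with a small standalone sketch. It does not import this package: sketchPart and sketchCollectionName are hypothetical stand-ins, the suffix is passed through unchanged (the documented format does not mark it as sanitized), and length handling is left out because the documentation does not describe how an over-long joined name is shortened.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// nonIdentRun matches any run of characters outside [a-z0-9],
	// including underscores, so runs collapse to one underscore.
	nonIdentRun = regexp.MustCompile(`[^a-z0-9]+`)

	// collectionNamePattern is the requirement quoted in the overview.
	collectionNamePattern = regexp.MustCompile(`^[a-z0-9_]{1,64}$`)
)

// sketchPart is a hypothetical, simplified sanitizer for one component.
// It does not truncate, unlike the documented Identifier function.
func sketchPart(s string) string {
	s = nonIdentRun.ReplaceAllString(strings.ToLower(s), "_")
	s = strings.Trim(s, "_")
	if s == "" {
		return "default"
	}
	return s
}

// sketchCollectionName joins the sanitized tenant and project with the suffix.
// Assumption: the suffix is used as given.
func sketchCollectionName(tenant, project, suffix string) string {
	return sketchPart(tenant) + "_" + sketchPart(project) + "_" + suffix
}

func main() {
	name := sketchCollectionName("github.com/user", "my-project", "codebase")
	fmt.Println(name)                                    // github_com_user_my_project_codebase
	fmt.Println(collectionNamePattern.MatchString(name)) // true
}
```

Checking the result against the pattern, as main does here, is a cheap way to confirm a name before handing it to a vector store.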
func Identifier ¶
Identifier sanitizes a string for use in collection names.
Rules applied:
- Converts to lowercase
- Replaces invalid characters with underscores
- Collapses multiple underscores
- Trims leading/trailing underscores
- Truncates to MaxIdentifierLength with hash suffix if too long
- Returns DefaultIdentifier if result would be empty
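The rules above can be sketched in standalone Go. This is a minimal illustration, not the package's implementation: sketchIdentifier is a hypothetical name, and the use of SHA-256 over the sanitized string for the 8-character hash is an assumption, since the documentation does not say which hash is used or what input it covers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

const (
	maxIdentifierLength = 64
	hashSuffixLength    = 9 // "_" plus 8 hash characters
	defaultIdentifier   = "default"
)

// invalidRun matches runs of anything outside [a-z0-9]. Underscores are
// included so that replacing and collapsing happen in one pass.
var invalidRun = regexp.MustCompile(`[^a-z0-9]+`)

// sketchIdentifier is a hypothetical re-creation of the documented rules.
func sketchIdentifier(s string) string {
	s = strings.ToLower(s)
	s = invalidRun.ReplaceAllString(s, "_") // replace invalid characters, collapse runs
	s = strings.Trim(s, "_")                // trim leading/trailing underscores
	if s == "" {
		return defaultIdentifier
	}
	if len(s) > maxIdentifierLength {
		// Assumption: SHA-256 of the sanitized string, first 8 hex characters.
		sum := sha256.Sum256([]byte(s))
		prefix := strings.TrimRight(s[:maxIdentifierLength-hashSuffixLength], "_")
		s = prefix + "_" + hex.EncodeToString(sum[:])[:8]
	}
	return s
}

func main() {
	for _, in := range []string{"github.com/user", "My Project!", "!!!"} {
		fmt.Printf("%q -> %q\n", in, sketchIdentifier(in))
	}
}
```

Appending a hash to truncated identifiers keeps two long inputs that share a prefix from collapsing to the same name.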
Examples:
"github.com/user" -> "github_com_user"
"My Project!" -> "my_project"
"" or "!!!" -> "default"
func SafeBasename ¶ added in v0.4.0
SafeBasename returns the base name of a path after validation. This is a secure replacement for filepath.Base() on untrusted input.
func SanitizeAndValidateTenantID ¶ added in v0.4.0
SanitizeAndValidateTenantID sanitizes a tenant ID and validates the result. This is the recommended way to process user-provided tenant IDs.
func ValidateGlobPattern ¶ added in v0.4.0
ValidateGlobPattern checks a glob pattern for dangerous constructs. Returns nil if the pattern is safe, or an error describing the issue.
func ValidateGlobPatterns ¶ added in v0.4.0
ValidateGlobPatterns validates a slice of glob patterns.
func ValidatePath ¶ added in v0.4.0
ValidatePath checks a path for security issues and returns the cleaned, absolute path, or an error if a check fails:
- The path must not contain directory traversal (..)
- The path is resolved to an absolute path and must stay within the expected root
If allowedRoot is empty, only traversal checks are performed. If allowedRoot is provided, the path must resolve within that directory.
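A root-containment check of this kind might look like the following standalone sketch. The name sketchValidatePath and the local error values are hypothetical. Which error the package returns for a path outside allowedRoot is an assumption, as is resolving relative paths against the working directory. The sketch does not resolve symlinks.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// Local stand-ins mirroring the documented error messages.
var (
	errPathTraversal = errors.New("path contains directory traversal")
	errEmptyPath     = errors.New("path cannot be empty")
)

// sketchValidatePath is a hypothetical illustration of the documented checks.
func sketchValidatePath(path, allowedRoot string) (string, error) {
	if path == "" {
		return "", errEmptyPath
	}
	// Reject any ".." element before cleaning, so traversal is never normalized away.
	isSep := func(r rune) bool { return r == '/' || r == '\\' }
	for _, part := range strings.FieldsFunc(path, isSep) {
		if part == ".." {
			return "", errPathTraversal
		}
	}
	abs, err := filepath.Abs(path) // Abs also cleans the path
	if err != nil {
		return "", err
	}
	if allowedRoot == "" {
		return abs, nil // only the traversal check applies
	}
	root, err := filepath.Abs(allowedRoot)
	if err != nil {
		return "", err
	}
	// A relative path from root that climbs upward means abs lies outside root.
	rel, err := filepath.Rel(root, abs)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errPathTraversal // assumption: same error as for ".."
	}
	return abs, nil
}

func main() {
	for _, p := range []string{"/srv/data/a.txt", "/etc/passwd", "../secrets"} {
		abs, err := sketchValidatePath(p, "/srv/data")
		fmt.Printf("%q -> %q, err=%v\n", p, abs, err)
	}
}
```

Comparing with filepath.Rel rather than a plain string prefix avoids treating /srv/database as if it were inside /srv/data.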
func ValidateProjectID ¶ added in v0.4.0
ValidateProjectID checks that a project ID conforms to the expected format. Project IDs follow the same rules as tenant IDs.
func ValidateProjectPath ¶ added in v0.4.0
ValidateProjectPath validates a project path for MCP tool use. Returns the validated absolute path.
func ValidateTeamID ¶ added in v0.4.0
ValidateTeamID checks that a team ID conforms to the expected format. Team IDs follow the same rules as tenant IDs.
func ValidateTenantID ¶ added in v0.4.0
ValidateTenantID checks that a tenant ID conforms to the expected format. Tenant IDs must contain only lowercase letters, digits, and underscores, and be 1-64 characters long.
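The format rule can be expressed as a single regular expression. The sketch below is standalone and hypothetical: sketchValidateTenantID and errInvalidTenantID are local stand-ins, and it assumes the tenant ID rule is exactly the collection name pattern from the overview. Wrapping the sentinel error with %w is a design choice of the sketch, not confirmed behavior of the package.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// errInvalidTenantID mirrors the documented message of ErrInvalidTenantID.
var errInvalidTenantID = errors.New("invalid tenant ID format")

// tenantIDPattern: lowercase letters, digits, underscores, 1-64 characters.
var tenantIDPattern = regexp.MustCompile(`^[a-z0-9_]{1,64}$`)

// sketchValidateTenantID is a hypothetical illustration of the documented rule.
func sketchValidateTenantID(id string) error {
	if !tenantIDPattern.MatchString(id) {
		// Wrap the sentinel so callers can still match it with errors.Is.
		return fmt.Errorf("%w: %q", errInvalidTenantID, id)
	}
	return nil
}

func main() {
	for _, id := range []string{"acme_corp", "Acme-Corp", ""} {
		err := sketchValidateTenantID(id)
		fmt.Printf("%q rejected=%v\n", id, errors.Is(err, errInvalidTenantID))
	}
}
```

With a sentinel error, callers can branch on the failure kind using errors.Is instead of comparing message strings.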