sanitize

package
v0.4.0
Warning

This package is not in the latest version of its module.

Published: Feb 1, 2026 License: GPL-3.0 Imports: 7 Imported by: 0

Documentation

Overview

Package sanitize provides shared identifier sanitization for collection names, along with input validation.

Collection names in vector stores (Qdrant, chromem) must match ^[a-z0-9_]{1,64}$. This package ensures that all identifiers conform to this requirement.

Constants

const (
	// MaxIdentifierLength is the maximum length for collection name components.
	// Qdrant and chromem require collection names to be 1-64 characters.
	MaxIdentifierLength = 64

	// HashSuffixLength is the length of the hash suffix added to truncated identifiers.
	// Format: _<8-char-hash> = 9 characters total
	HashSuffixLength = 9

	// DefaultIdentifier is used when sanitization produces an empty result.
	DefaultIdentifier = "default"
)

Variables

var (
	// ErrPathTraversal indicates a path contains directory traversal sequences.
	ErrPathTraversal = errors.New("path contains directory traversal")

	// ErrAbsolutePath indicates an absolute path was provided where relative was expected.
	ErrAbsolutePath = errors.New("absolute path not allowed")

	// ErrInvalidTenantID indicates the tenant ID format is invalid.
	ErrInvalidTenantID = errors.New("invalid tenant ID format")

	// ErrInvalidTeamID indicates the team ID format is invalid.
	ErrInvalidTeamID = errors.New("invalid team ID format")

	// ErrInvalidProjectID indicates the project ID format is invalid.
	ErrInvalidProjectID = errors.New("invalid project ID format")

	// ErrInvalidPattern indicates a glob/regex pattern is dangerous.
	ErrInvalidPattern = errors.New("invalid or dangerous pattern")

	// ErrEmptyPath indicates an empty path was provided.
	ErrEmptyPath = errors.New("path cannot be empty")
)

Validation errors for security checks.

Functions

func CollectionName

func CollectionName(tenant, project, suffix string) string

CollectionName builds a collection name from tenant and project components.

Format: {sanitized_tenant}_{sanitized_project}_{suffix}

Example:

CollectionName("github.com/user", "my-project", "codebase") -> "github_com_user_my_project_codebase"

The result is guaranteed to be valid for vector store collection names.
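The composition described by the format string can be sketched as follows. This is not the package's implementation: simplify and collectionName are hypothetical names, the suffix is assumed to be passed through unchanged as the format string suggests, and length truncation is left out to keep the example short.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches runs of characters outside [a-z0-9]. Underscores are
// included in the match, so runs of them collapse into a single "_".
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// simplify is a reduced stand-in for identifier sanitization: lowercase,
// replace invalid runs with "_", trim leading and trailing underscores.
func simplify(s string) string {
	s = nonAlnum.ReplaceAllString(strings.ToLower(s), "_")
	return strings.Trim(s, "_")
}

// collectionName joins the sanitized tenant and project with the suffix.
// Assumption: the suffix is used as given.
func collectionName(tenant, project, suffix string) string {
	return simplify(tenant) + "_" + simplify(project) + "_" + suffix
}

func main() {
	fmt.Println(collectionName("github.com/user", "my-project", "codebase"))
}
```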

func Identifier

func Identifier(s string) string

Identifier sanitizes a string for use in collection names.

Rules applied:

  • Converts to lowercase
  • Replaces invalid characters with underscores
  • Collapses multiple underscores
  • Trims leading/trailing underscores
  • Truncates to MaxIdentifierLength with hash suffix if too long
  • Returns DefaultIdentifier if result would be empty

Examples:

"github.com/user" -> "github_com_user"
"My Project!"     -> "my_project"
"" or "!!!"       -> "default"

func SafeBasename added in v0.4.0

func SafeBasename(path string) (string, error)

SafeBasename returns the base name of a path after validation. This is a secure replacement for filepath.Base() on untrusted input.

func SanitizeAndValidateTenantID added in v0.4.0

func SanitizeAndValidateTenantID(id string) (string, error)

SanitizeAndValidateTenantID sanitizes a tenant ID and validates the result. This is the recommended way to process user-provided tenant IDs.

func ValidateGlobPattern added in v0.4.0

func ValidateGlobPattern(pattern string) error

ValidateGlobPattern checks a glob pattern for dangerous constructs. Returns nil if the pattern is safe, or an error describing the issue.

func ValidateGlobPatterns added in v0.4.0

func ValidateGlobPatterns(patterns []string) error

ValidateGlobPatterns validates a slice of glob patterns.

func ValidatePath added in v0.4.0

func ValidatePath(path, allowedRoot string) (string, error)

ValidatePath checks a path for security issues:

  • Rejects directory traversal (..)
  • Resolves the path to an absolute path and verifies that it stays within the allowed root

It returns the cleaned, absolute path or an error.

If allowedRoot is empty, only traversal checks are performed. If allowedRoot is provided, the path must resolve within that directory.

func ValidateProjectID added in v0.4.0

func ValidateProjectID(id string) error

ValidateProjectID checks that a project ID conforms to expected format. Project IDs follow the same rules as tenant IDs.

func ValidateProjectPath added in v0.4.0

func ValidateProjectPath(path string) (string, error)

ValidateProjectPath validates a project path for MCP tool use. Returns the validated absolute path.

func ValidateTeamID added in v0.4.0

func ValidateTeamID(id string) error

ValidateTeamID checks that a team ID conforms to expected format. Team IDs follow the same rules as tenant IDs.

func ValidateTenantID added in v0.4.0

func ValidateTenantID(id string) error

ValidateTenantID checks that a tenant ID conforms to the expected format. Tenant IDs must consist of lowercase letters, digits, and underscores, and be 1-64 characters long.

Types

This section is empty.
