sanitize

package
v0.3.4
Warning: This package is not in the latest version of its module.
Published: Jan 14, 2026 | License: GPL-3.0 | Imports: 3 | Imported by: 0

Documentation

Overview

Package sanitize provides shared identifier sanitization for collection names.

Collection names in vector stores (Qdrant, chromem) must match the pattern ^[a-z0-9_]{1,64}$. This package ensures that all identifiers conform to this requirement.
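As a rough illustration of the constraint above, the pattern can be checked directly with the standard regexp package. The helper name isValidCollectionName is hypothetical and is not part of package sanitize; this is only a sketch of what "conforming" means.

```go
package main

import (
	"fmt"
	"regexp"
)

// collectionNamePattern mirrors the pattern quoted in the overview:
// lowercase letters, digits, and underscores, 1 to 64 characters long.
var collectionNamePattern = regexp.MustCompile(`^[a-z0-9_]{1,64}$`)

// isValidCollectionName is a hypothetical helper (not part of package
// sanitize) that reports whether name satisfies the pattern.
func isValidCollectionName(name string) bool {
	return collectionNamePattern.MatchString(name)
}

func main() {
	for _, name := range []string{"github_com_user", "My Project!", ""} {
		fmt.Printf("%q valid: %v\n", name, isValidCollectionName(name))
	}
}
```

A check like this could be handy in tests that assert every generated name is acceptable to the vector store.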


Constants

const (
	// MaxIdentifierLength is the maximum length for collection name components.
	// Qdrant and chromem require collection names to be 1-64 characters.
	MaxIdentifierLength = 64

	// HashSuffixLength is the length of the hash suffix added to truncated identifiers.
	// Format: _<8-char-hash> = 9 characters total
	HashSuffixLength = 9

	// DefaultIdentifier is used when sanitization produces an empty result.
	DefaultIdentifier = "default"
)

Variables

This section is empty.

Functions

func CollectionName

func CollectionName(tenant, project, suffix string) string

CollectionName builds a collection name from tenant, project, and suffix components.

Format: {sanitized_tenant}_{sanitized_project}_{suffix}

Example:

CollectionName("github.com/user", "my-project", "codebase")
-> "github_com_user_my_project_codebase"

The result is guaranteed to be valid for vector store collection names.
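To make the format concrete, the sketch below joins three parts with underscores. The helper joinParts is hypothetical and assumes each part has already been sanitized. It shows only the layout of the name, not how the package enforces the overall length limit.

```go
package main

import (
	"fmt"
	"strings"
)

// joinParts is a hypothetical helper, not part of package sanitize.
// It assumes tenant and project are already sanitized and only
// illustrates the {tenant}_{project}_{suffix} layout.
func joinParts(tenant, project, suffix string) string {
	return strings.Join([]string{tenant, project, suffix}, "_")
}

func main() {
	// Parts as they would look after sanitizing "github.com/user" and "my-project".
	fmt.Println(joinParts("github_com_user", "my_project", "codebase"))
}
```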

func Identifier

func Identifier(s string) string

Identifier sanitizes a string for use in collection names.

Rules applied:

  • Converts to lowercase
  • Replaces invalid characters with underscores
  • Collapses multiple underscores
  • Trims leading/trailing underscores
  • Truncates to MaxIdentifierLength with hash suffix if too long
  • Returns DefaultIdentifier if result would be empty

Examples:

"github.com/user" -> "github_com_user"
"My Project!"     -> "my_project"
"" or "!!!"       -> "default"

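The character-level rules listed for Identifier can be sketched with the standard library as follows. The function sanitizeSketch is hypothetical and is not the package's implementation. It covers lowercasing, replacement, collapsing, trimming, and the default fallback, and leaves out the length truncation step.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidRun matches one or more characters outside [a-z0-9]. Because
// underscores are included in the match, runs of invalid characters and
// repeated underscores collapse to a single underscore in one pass.
var invalidRun = regexp.MustCompile(`[^a-z0-9]+`)

// sanitizeSketch is a hypothetical stand-in for Identifier. It does not
// perform the truncation-with-hash step.
func sanitizeSketch(s string) string {
	s = strings.ToLower(s)                  // convert to lowercase
	s = invalidRun.ReplaceAllString(s, "_") // replace and collapse
	s = strings.Trim(s, "_")                // trim leading/trailing underscores
	if s == "" {
		return "default" // mirrors DefaultIdentifier
	}
	return s
}

func main() {
	for _, in := range []string{"github.com/user", "My Project!", "", "!!!"} {
		fmt.Printf("%q -> %q\n", in, sanitizeSketch(in))
	}
}
```

Replacing whole runs in a single regular expression keeps the collapse step from needing a separate pass, at the cost of treating existing underscores the same as invalid characters.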
Types

This section is empty.
