sanitize

package

Version: v0.4.12 (latest) Published: Jun 19, 2026 License: AGPL-3.0 Imports: 3 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/eleboucher/memini

Documentation

Overview

Package sanitize provides write-path content hygiene for memini: stripping unambiguous corruption (always-on) and detecting "script-salad" garble (opt-in). It exists because ingestion stores whatever a harness sends, and an upstream model/harness glitch can hand memini a garbled digest that then surfaces in recall verbatim.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Clean

func Clean(s string) string

Clean strips unambiguous corruption from text before it is embedded or persisted: invalid UTF-8 byte sequences, the U+FFFD replacement character, C0/C1 control codes (except tab, newline, and carriage return), and Unicode noncharacter code points. It deliberately does NOT touch valid printable text in any language: legitimate Chinese, Japanese, Arabic, or emoji content passes through unchanged. A string that is pure binary garbage cleans to an empty or near-empty result, which the caller can then reject.
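The filtering rules listed above can be approximated with the standard library alone. This is a sketch under stated assumptions, not the package's source: `cleanSketch`, `isStrippedControl`, and `isNoncharacter` are hypothetical names, and the treatment of DEL (U+007F), which the description does not mention, is a guess.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// isStrippedControl reports whether r is a C0 (U+0000..U+001F) or
// C1 (U+0080..U+009F) control code other than tab, newline, or carriage
// return. Assumption: DEL (U+007F) is left alone, since the description
// names only C0/C1.
func isStrippedControl(r rune) bool {
	if r == '\t' || r == '\n' || r == '\r' {
		return false
	}
	return r <= 0x1F || (r >= 0x80 && r <= 0x9F)
}

// isNoncharacter reports whether r is one of the 66 Unicode noncharacters:
// U+FDD0..U+FDEF, plus the last two code points of every plane.
func isNoncharacter(r rune) bool {
	return (r >= 0xFDD0 && r <= 0xFDEF) || r&0xFFFE == 0xFFFE
}

// cleanSketch drops the categories named in the description and copies
// every other rune through unchanged.
func cleanSketch(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		// Ranging over a string yields utf8.RuneError (U+FFFD) for each
		// invalid byte, so this one check covers both invalid UTF-8 and a
		// literal replacement character.
		if r == utf8.RuneError || isStrippedControl(r) || isNoncharacter(r) {
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", cleanSketch("ok\x00 \xff中文\uFFFD\tend"))
	fmt.Printf("%q\n", cleanSketch("\xff\xfe\x01")) // pure garbage
}
```

Because the second input cleans to an empty string, a caller can reject it with a simple emptiness check after cleaning.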

func Garbled

func Garbled(s string) bool

Garbled reports whether text looks like script-salad — Latin glued to CJK glued to Cyrillic with no separators, the signature of garbled multilingual model output (e.g. `I'm这个家制品 with在上世纪`). It is a heuristic, not a proof: it CANNOT tell semantically-random mixing from a rare legitimate case, so callers must treat a positive as "downrank/flag", never "delete". It is off by default for exactly this reason.

Only *glued* transitions between two different real scripts count — a space or punctuation break resets adjacency, so ordinary code-switching ("the 这个 thing") scores zero. Han, kana, and hangul collapse into one CJK bucket so legitimate Japanese (Han+kana) and CJK-with-embedded-Latin tech terms ("使用React框架") stay well under the threshold.

Types

This section is empty.

Source Files

sanitize.go
