c1zsanitize

package
v0.12.2
Warning

This package is not in the latest version of its module.

Published: Jun 3, 2026 License: Apache-2.0 Imports: 25 Imported by: 0

Documentation

Overview

Package c1zsanitize transforms a real .c1z snapshot into an identity-stripped copy whose graph topology, cardinalities, and annotation structure are preserved. The output is suitable for shipping to internal development environments where the original customer data must not appear.

The whole transform is driven by a single per-c1z HMAC-SHA256 secret. The same input maps to the same output within one c1z, so cross-references stay coherent. Outputs differ across c1zs whose secrets differ, so an attacker holding multiple sanitized outputs cannot correlate them.

v0.1 reads and writes the v1/v2 sqlite-zstd .c1z format via connectorstore.Reader and connectorstore.Writer. Output in the v3 (c1z3) format will land in v0.2 once the storage-engine-v4 PR stack merges.

Index

Constants

const MinSecretBytes = 32

MinSecretBytes is the minimum length of a per-c1z secret. Anything shorter is rejected by Sanitize; in practice operators should use 32 random bytes from a CSPRNG.
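To illustrate the advice above, here is a minimal sketch that draws a secret from Go's crypto/rand (the standard library CSPRNG) and applies the documented length rule. The helpers newSecret and checkSecret are hypothetical and not part of c1zsanitize. minSecretBytes only mirrors MinSecretBytes.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// minSecretBytes mirrors c1zsanitize.MinSecretBytes (32).
const minSecretBytes = 32

// newSecret is a hypothetical helper that reads minSecretBytes bytes
// from crypto/rand.
func newSecret() ([]byte, error) {
	secret := make([]byte, minSecretBytes)
	if _, err := rand.Read(secret); err != nil {
		return nil, fmt.Errorf("reading CSPRNG: %w", err)
	}
	return secret, nil
}

// checkSecret is a hypothetical helper applying the documented rule:
// anything shorter than the minimum is rejected.
func checkSecret(secret []byte) error {
	if len(secret) < minSecretBytes {
		return errors.New("secret shorter than minSecretBytes")
	}
	return nil
}

func main() {
	secret, err := newSecret()
	if err != nil {
		panic(err)
	}
	if err := checkSecret(secret); err != nil {
		panic(err)
	}
	// Print only the length; a real secret should never be logged.
	fmt.Println(len(secret), "random bytes")
}
```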

Variables

This section is empty.

Functions

func LoadOrGenerateSecret

func LoadOrGenerateSecret(flagPath, outPath string) ([]byte, bool, error)

LoadOrGenerateSecret returns the per-c1z HMAC secret. When flagPath is set, it loads that file and length-checks it. Otherwise, it mints a fresh CSPRNG secret and writes it next to outPath. It refuses to overwrite an existing secret file, so a prior run's reversible mapping is never silently replaced. The bool result (generated) reports whether a new secret was minted, so the caller can tell the operator to archive it.

func Sanitize

func Sanitize(ctx context.Context, src connectorstore.Reader, dst connectorstore.Writer, opts Options) error

Sanitize copies records from src to dst, transforming identifiers, names, free text, emails, and timestamps under the per-c1z secret. One destination sync is opened per source sync; parent_sync_id linkage is preserved via a srcSyncID → dstSyncID map maintained for the duration of the call.

func SanitizeID

func SanitizeID(secret []byte, input string) string

SanitizeID returns a deterministic, irreversible transform of input under the per-c1z secret. The same input always maps to the same output within a c1z. Outputs differ across c1zs whose secrets differ.

Empty input returns empty output so callers can transform optional fields without checking presence first.
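A deterministic, secret-keyed transform of this kind is typically built on HMAC-SHA256, the primitive the overview names. The sketch below assumes that shape but is not the package's implementation. The hypothetical sanitizeID uses hex encoding and full-length output purely as illustrative choices.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sanitizeID is a hypothetical HMAC-SHA256 ID transform. Hex encoding and
// the untruncated 64-character output are assumptions for illustration.
func sanitizeID(secret []byte, input string) string {
	if input == "" {
		return "" // optional fields stay empty, mirroring the documented rule
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(input)) // hash.Hash.Write never returns an error
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secretA := []byte("0123456789abcdef0123456789abcdef") // 32 bytes; demo only, not random
	secretB := []byte("fedcba9876543210fedcba9876543210")
	fmt.Println(sanitizeID(secretA, "user-42") == sanitizeID(secretA, "user-42")) // true
	fmt.Println(sanitizeID(secretA, "user-42") == sanitizeID(secretB, "user-42")) // false
	fmt.Printf("%q\n", sanitizeID(secretA, ""))                                  // ""
}
```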

func SecretPath

func SecretPath(flagPath, outPath string) string

SecretPath returns where the per-c1z HMAC secret is read from or written to: the explicit flag path when set, otherwise a file next to the sanitized output.

Types

type Options

type Options struct {
	// Secret is the per-c1z HMAC key. Must be at least MinSecretBytes.
	// The operator chooses whether to archive or discard it; the
	// sanitizer never persists it on its own.
	Secret []byte

	// TimestampAnchor is the wall-clock value the newest timestamp in
	// the source c1z lands on. All other timestamps shift by the same
	// delta, so the intervals between timestamps are preserved.
	// Defaults to time.Now() when zero.
	TimestampAnchor time.Time

	// AllowUnknownAnnotations controls behavior when an annotation's
	// Any type URL is not in the handler registry. The zero value is
	// the safe default: unknown annotations are dropped and a log line
	// names the type URL, so a newly-added annotation type carrying
	// customer data can never pass through unsanitized. Set true to
	// pass unknown annotations through unchanged — convenient for
	// development against new annotation types, dangerous on real
	// customer data.
	AllowUnknownAnnotations bool
}

Options configures a sanitization run.
