visbaseline

package
v1.18.14
Published: May 3, 2026 License: BSD-3-Clause Imports: 18 Imported by: 0

Documentation

Overview

Package visbaseline captures and diffs unauthenticated vs authenticated PeeringDB API responses for all 13 object types, with a strict no-PII-in-repo guarantee enforced by the Redact function in this package.

The central hard constraint (phase 57 D-02) is: authenticated PeeringDB data must NOT be committed in raw form. Authenticated responses carry email, phone, and legal-name data that upstream withholds from anonymous callers; this package inspects those responses in memory, emits a placeholder-only shape, and produces a structural diff report for downstream test fixtures.

Constants

const (
	PlaceholderString = "<auth-only:string>"
	PlaceholderNumber = "<auth-only:number>"
	PlaceholderBool   = "<auth-only:bool>"
)

Placeholder strings substituted for redacted values in committed auth fixtures. The "<auth-only:TYPE>" format is intentionally not valid JSON in any identifier position, making it impossible to confuse with real data and trivial to grep for in CI audits.

const DefaultStatePath = "/tmp/pdb-vis-capture-state.json"

DefaultStatePath is the canonical location of the capture checkpoint file. /tmp is used deliberately so the checkpoint survives process exit but not reboots, matching operator expectations for a one-shot CLI.

const ReportSchemaVersion = 1

ReportSchemaVersion is the frozen schema version for diff.json. Bump on any non-additive change to the emitted structure.

Variables

var AllTypes = []string{
	"campus", "carrier", "carrierfac", "fac", "ix", "ixfac",
	"ixlan", "ixpfx", "net", "netfac", "netixlan", "org", "poc",
}

AllTypes is the canonical list of PeeringDB object types captured in the visibility baseline walk. Mirrored from cmd/pdbcompat-check/main.go (lines 22-26) because Go import-direction hygiene forbids `internal` packages from importing `cmd` packages.

var PIIFields = []string{
	"address1",
	"address2",
	"city",
	"email",
	"latitude",
	"longitude",
	"name",
	"phone",
	"policy_email",
	"policy_phone",
	"sales_email",
	"sales_phone",
	"state",
	"tech_email",
	"tech_phone",
	"zipcode",
}

PIIFields is the authoritative allow-list of field names that carry personal data in PeeringDB responses. Any field whose JSON name appears in this list is ALWAYS replaced with a placeholder in redacted auth fixtures, regardless of whether the same field is present in the corresponding anonymous response.

Derived from internal/peeringdb/types.go: Organization, Facility, InternetExchange, Poc, Campus, and association tables (netfac, ixfac). The list is deliberately conservative — a field is PII if it identifies or locates an individual or organisation's physical presence.

Deliberately excluded (not PII by this policy):

  • notes: business-owned free text, publishable.
  • country: geographic region, not a locating identifier on its own.
  • website, url, looking_glass, route_server: publishable URLs.
  • name_long, aka: organisation/network display names, not personal names.

The list is sorted for stable review diffs when it is updated.

var ProdTypes = []string{"net", "org", "poc"}

ProdTypes is the reduced list used for the prod confirmation pass per phase 57 D-04. Only the high-signal privacy types are fetched against production to stay well inside the rate-limit quota.

Functions

func BuildReport

func BuildReport(ctx context.Context, cfg BuildReportConfig) error

BuildReport walks BaselineRoot, runs Diff per type, and emits DIFF.md + diff.json at OutDir. See BuildReportConfig for the dual-shape contract.

GO-CFG-1 fail-fast validation:

  • BaselineRoot must be non-empty and a readable directory.
  • OutDir must be non-empty and must NOT be the filesystem root.
  • Single-target shape requires both anon/ and auth/ subdirs present.
  • Multi-target shape requires at least one target subdir with both anon/ and auth/ present.

func CleanupStatePath

func CleanupStatePath(path string) error

CleanupStatePath removes the checkpoint file. A missing file is not an error — removal is idempotent so callers (New / Run completion) do not have to pre-check existence.

func IsPIIField

func IsPIIField(name string) bool

IsPIIField reports whether the given JSON field name is in the PII allow-list. Comparison is exact (case-sensitive) to match PeeringDB's lowercase-underscore field naming convention.
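
A minimal sketch of an exact, case-sensitive lookup is shown below. The list is copied from PIIFields; `isPIIField` is a hypothetical stand-in, and whether the real function uses a linear scan, a binary search, or a map is not stated in this documentation.

```go
package main

import (
	"fmt"
	"slices"
)

// piiFields copies the PIIFields list documented above.
var piiFields = []string{
	"address1", "address2", "city", "email", "latitude", "longitude",
	"name", "phone", "policy_email", "policy_phone", "sales_email",
	"sales_phone", "state", "tech_email", "tech_phone", "zipcode",
}

// isPIIField is a hypothetical stand-in for IsPIIField.
// The comparison is exact, so "Email" does not match "email".
func isPIIField(name string) bool {
	return slices.Contains(piiFields, name)
}

func main() {
	for _, name := range []string{"email", "Email", "notes"} {
		fmt.Printf("%s: %t\n", name, isPIIField(name))
	}
}
```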

func Redact

func Redact(anonBytes, authBytes []byte) ([]byte, error)

Redact applies the phase 57 redaction policy to an authenticated PeeringDB response and returns bytes safe to commit. The policy:

  1. Preserve envelope shape, row count, field names, integer row ids, and the `visible` enum value (controlled vocabulary we need for the diff).
  2. Replace any value whose field name appears in PIIFields with a typed placeholder, regardless of whether the field appears in anon. This is defence-in-depth against upstream anon bugs.
  3. For non-PII fields, replace values with a typed placeholder when:
       - the auth row has no matching anon row (match by `id`), OR
       - the anon row lacks the field name.
     Otherwise keep the auth value verbatim (anon already publishes it).
  4. Re-marshal with json.MarshalIndent. Go's encoding/json sorts map keys when marshaling maps, so the result is deterministic: the same input bytes always produce the same output bytes.

Redact is a pure function: no filesystem, no network, no global state, no goroutines. Errors from json.Unmarshal / json.MarshalIndent are wrapped with %w. Input bytes are never echoed in error messages.

func RedactDir

func RedactDir(ctx context.Context, cfg RedactDirConfig) error

RedactDir walks a raw-auth staging tree, pairs each page with its anon counterpart, runs Redact, and writes the redacted bytes under Dst.

Errors are fail-fast: a missing anon counterpart, a malformed JSON, or a failed write halts the walk. Partial output under Dst is left as-is for the operator to inspect — the caller is expected to `rm -rf` Dst before a retry. Raw auth input bytes are never echoed in error messages (T-57-05).

RedactDir honours ctx between files: cancellation mid-walk returns ctx.Err() wrapped.

func WriteJSON

func WriteJSON(w io.Writer, rep Report) error

WriteJSON serialises the report as JSON with stable (alphabetical) map key ordering. The "generated" field is taken from rep.GeneratedAt; callers that want deterministic output across runs should stamp a fixed time.

WriteJSON normalises nil Fields slices to []FieldDelta{} so they marshal as "[]" rather than "null". Empty-array is semantically "no deltas observed"; the schema stays stable for downstream JSON consumers that expect arrays. See 57-03-PLAN.md Task 2 note.

HTML escaping is disabled so that placeholder strings such as "<auth-only:string>" remain greppable in diff.json and committed fixtures — json.Encoder's default escapes "<" and ">" to "\u003c"/"\u003e" which would defeat the CI grep audit for placeholder sentinels. Matches the encoder configuration in internal/visbaseline/redact.go.

func WriteMarkdown

func WriteMarkdown(w io.Writer, rep Report) error

WriteMarkdown emits one table per PeeringDB type, preceded by a TOC. Types are sorted alphabetically; field deltas within each type are emitted in their existing slice order (Diff already sorts alphabetically).

Types

type BuildReportConfig

type BuildReportConfig struct {
	BaselineRoot string
	OutDir       string
	GeneratedAt  time.Time // optional; defaults to time.Now().UTC()
	Logger       *slog.Logger
}

BuildReportConfig parameterises a BuildReport run. Grouped per GO-CS-5.

BaselineRoot is resolved in one of two shapes:

  1. Single-target: BaselineRoot directly contains "anon/" and "auth/" subdirs (e.g. testdata/visibility-baseline/beta). BuildReport emits a single DIFF.md + diff.json at OutDir covering the type-level deltas for that target, with top-level keys namespaced by type name only.

  2. Multi-target: BaselineRoot contains per-target subdirs each with "anon/" + "auth/" (e.g. testdata/visibility-baseline/ containing beta/ and prod/). BuildReport emits a unified DIFF.md + diff.json at OutDir whose type keys are namespaced "{target}/{type}" AND emits per-target auxiliary DIFF-{target}.md files for reviewers who only want one target at a time.

The caller picks the shape by choosing BaselineRoot; BuildReport auto-detects which one is present. An ambiguous tree (both direct anon/auth subdirs AND per-target subdirs) is rejected as a user error.

type Capture

type Capture struct {
	// contains filtered or unexported fields
}

Capture coordinates the per-tuple walk of PeeringDB pages for the phase 57 visibility baseline. Construct via New; run with Run.

func New

func New(cfg Config) (*Capture, error)

New validates cfg and constructs a Capture. Fails fast when required fields are missing or when auth mode is requested without an API key — no HTTP calls happen in New.

func (*Capture) Run

func (c *Capture) Run(ctx context.Context) (string, error)

Run executes the capture walk. Returns the path to the private /tmp directory holding raw auth bytes (for the redactor to consume in plan 03), or an error. Context cancellation is honoured between tuples and during rate-limit sleeps — returned error wraps ctx.Err().

On successful completion the checkpoint file is removed. On early exit (error, cancel) the checkpoint persists so a later invocation can resume.

type Config

type Config struct {
	// Target is the capture label ("beta" | "prod"). Used only as a tag
	// inside State tuples; the actual base URL is BaseURL.
	Target string

	// BaseURL is the PeeringDB root (e.g. https://beta.peeringdb.com). No
	// trailing slash. Must be non-empty.
	BaseURL string

	// Modes is the subset of {"anon", "auth"} to capture.
	Modes []string

	// Types is the subset of PeeringDB object types to walk. Use AllTypes
	// for beta; ProdTypes for prod.
	Types []string

	// Pages is the number of pages per (mode, type) to fetch. Defaults to
	// 2 when <1. The phase 57 baseline uses 2.
	Pages int

	// OutDir is the REPO-SIDE output root for anon fixtures. Auth fixtures
	// are written EXCLUSIVELY under a private /tmp directory created per
	// run — NEVER under OutDir. This invariant is enforced by the capture
	// loop and asserted by TestCaptureWritesAuthBytesToTmpOnly.
	OutDir string

	// APIKey is the PeeringDB API key. Required when Modes contains "auth".
	// Must never be logged (T-57-05); the Capture implementation uses the
	// peeringdb.Client's built-in Authorization header path, which does
	// not log the value.
	APIKey string

	// StatePath is the checkpoint file path. Defaults to DefaultStatePath
	// when empty.
	StatePath string

	// Logger receives structured capture events. Required.
	Logger *slog.Logger

	// InterTupleDelay is an optional pause between tuples. Tests set this
	// small enough to keep wall-clock low; prod runs leave it zero (the
	// peeringdb.Client rate limiter already paces requests).
	InterTupleDelay time.Duration

	// RateLimitJitter is the extra pause added on top of RateLimitError
	// RetryAfter before re-fetching the same tuple. Defaults to 5 seconds
	// in prod; tests override to 1ms to keep runtime low.
	RateLimitJitter time.Duration

	// ClientOverride, when non-nil, is used by New instead of constructing
	// a *peeringdb.Client internally. TESTS ONLY. Prod callers must leave
	// nil.
	//
	// Rationale: tests need to call client.SetRateLimit and
	// SetRetryBaseDelay before Run to keep wall-clock under 1 second. A
	// package-global mutable hook is race-prone under t.Parallel(); a
	// per-Config field has no shared mutable state.
	//
	// When Modes contains "auth" and ClientOverride is also set, the
	// override is used for BOTH anon and auth clients. Tests that need
	// to assert auth-header presence should construct the override with
	// WithAPIKey and inspect the request on the httptest server side.
	ClientOverride *peeringdb.Client

	// PromptReader is the source for the Resume/Restart prompt. Defaults
	// to os.Stdin when nil. Tests inject strings.NewReader.
	PromptReader io.Reader

	// PromptWriter is the sink for the Resume/Restart prompt text.
	// Defaults to os.Stderr when nil. Tests inject io.Discard.
	PromptWriter io.Writer
}

Config parameterises a capture run. Grouped per GO-CS-5 (input structs for >2-arg callers).

Required fields: Target, BaseURL, Modes, Types, OutDir, Logger. Optional fields fall back to documented defaults. When Modes contains "auth", APIKey MUST be set or New returns an error.

type FieldDelta

type FieldDelta struct {
	Name          string `json:"name"`
	AuthOnly      bool   `json:"auth_only"`
	Placeholder   string `json:"placeholder,omitempty"`     // "<auth-only:TYPE>" sentinel
	RowsAdded     int    `json:"rows_added,omitempty"`      // count of auth rows with this field absent-in-anon
	ValueSetDrift bool   `json:"value_set_drift,omitempty"` // true for "visible" when new enum values appear
	IsPII         bool   `json:"is_pii,omitempty"`          // IsPIIField(Name) at construction time
}

FieldDelta describes a single field-level observation. It DOES NOT carry field values, lengths, hashes, or any signal that could fingerprint the underlying data. See threat T-57-02 and 57-RESEARCH.md Pitfall 4.

type PromptAnswer

type PromptAnswer int

PromptAnswer enumerates operator responses to the resume/restart prompt.

const (
	// Restart is the safe default — discard old state and enumerate fresh
	// tuples. Returned on EOF, empty input, unrecognised keywords.
	Restart PromptAnswer = iota
	// Resume continues with the saved state, skipping Done=true tuples.
	Resume
)

func PromptResumeOrRestart

func PromptResumeOrRestart(r io.Reader, w io.Writer) PromptAnswer

PromptResumeOrRestart asks the operator via r (typically os.Stdin) whether to resume from a saved checkpoint or restart. Writes the prompt text to w (typically os.Stderr so it does not contaminate stdout pipelines). Reads one line via bufio.NewScanner.

Safe defaults: any input that is not recognised as a resume keyword returns Restart. This includes EOF, empty lines, garbage, and "no"/"restart". The bias is "when in doubt, start fresh" — a replayed fetch is cheap; a skipped tuple is a silent correctness failure.

Resume keywords (case-insensitive, TrimSpace): "resume", "r", "continue", "c".

type RedactDirConfig

type RedactDirConfig struct {
	AuthSrc string
	AnonDir string
	Dst     string
	Logger  *slog.Logger
}

RedactDirConfig parameterises a RedactDir run. Grouped per GO-CS-5 because the caller passes more than two arguments.

Layout expectations:

  • AuthSrc points at a raw-auth staging tree laid down by Capture, i.e. .../auth/api/{type}/page-N.json. Both the "auth/api/..." subpath and the flat "api/..." form are accepted: RedactDir walks AuthSrc and treats every *.json file whose relative path contains ".../{type}/page-N.json" as a page to redact.
  • AnonDir points at the repo-side anon mirror, i.e. .../anon/api/{type}/page-N.json. RedactDir maps each auth page to its anon counterpart by {type}+{page}; missing pairs are an error (an anon page is the only source of truth for "which fields are already public", so skipping it would over-disclose).
  • Dst is the repo-side redacted auth destination, i.e. .../auth/api/{type}/page-N.json. RedactDir creates directories as needed with mode 0700 and writes files with mode 0600.

type Report

type Report struct {
	SchemaVersion int                   `json:"schema_version"`
	GeneratedAt   time.Time             `json:"generated"`
	Targets       []string              `json:"targets"`
	Types         map[string]TypeReport `json:"types"`
}

Report is the per-run diff artifact. It carries aggregate counts and field-level deltas for each PeeringDB type, but NEVER actual field values (except for the controlled enum `visible`). See threat T-57-02.

type State

type State struct {
	Version int     `json:"version"`
	Tuples  []Tuple `json:"tuples"`
}

State is the capture checkpoint.

State carries ONLY tuple metadata. No response bytes, no API keys, no payload. See threat T-57-04 in the phase 57-02 threat register: the checkpoint file is written to /tmp and could be read by other users on multi-tenant hosts, so it must never contain sensitive data. The TestCheckpointContainsNoPayload unit test enforces this invariant by asserting the serialised top-level and tuple key sets are exactly the declared JSON tags.

func LoadState

func LoadState(path string) (*State, error)

LoadState reads and deserialises a State from path. Returns a wrapped os.ErrNotExist when the file is missing (callers may errors.Is-check), a wrapped json error on parse failure, and an error for unsupported versions.

func (*State) Advance

func (s *State) Advance(t Tuple, path string) error

Advance flips the matching tuple's Done flag to true and persists via Save. Matching is by (Target, Mode, Type, Page). Returns error if no tuple matches.

func (*State) PendingTuples

func (s *State) PendingTuples() []Tuple

PendingTuples returns the Done=false tuples in their existing slice order. The returned slice is a copy — callers may mutate without affecting State.
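
The Advance and PendingTuples semantics described above can be sketched in memory. The lowercase types are hypothetical copies of State and Tuple, and persistence via Save is deliberately left out of this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// tuple and state are hypothetical in-memory copies of Tuple and State.
type tuple struct {
	Target, Mode, Type string
	Page               int
	Done               bool
}

type state struct{ Tuples []tuple }

// advance marks the tuple matching (Target, Mode, Type, Page) as done.
// Writing the checkpoint to disk is omitted here.
func (s *state) advance(t tuple) error {
	for i := range s.Tuples {
		c := &s.Tuples[i]
		if c.Target == t.Target && c.Mode == t.Mode && c.Type == t.Type && c.Page == t.Page {
			c.Done = true
			return nil
		}
	}
	return errors.New("no matching tuple")
}

// pendingTuples returns a fresh slice of the tuples that are not done yet.
func (s *state) pendingTuples() []tuple {
	out := make([]tuple, 0, len(s.Tuples))
	for _, t := range s.Tuples {
		if !t.Done {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	s := &state{Tuples: []tuple{
		{Target: "beta", Mode: "anon", Type: "net", Page: 1},
		{Target: "beta", Mode: "anon", Type: "net", Page: 2},
	}}
	if err := s.advance(tuple{Target: "beta", Mode: "anon", Type: "net", Page: 1}); err != nil {
		panic(err)
	}
	fmt.Println(len(s.pendingTuples())) // one tuple left
}
```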

func (*State) Save

func (s *State) Save(path string) error

Save serialises s to path atomically. The write goes to path+".tmp" first with mode 0600, then os.Rename moves it into place. POSIX rename on the same filesystem is atomic — a concurrent reader sees either the old state or the new state, never a partial write.

Version is auto-stamped to the current schema version on Save if unset.

type Tuple

type Tuple struct {
	Target string `json:"target"` // "beta" | "prod"
	Mode   string `json:"mode"`   // "anon" | "auth"
	Type   string `json:"type"`   // PeeringDB object type (poc, net, …)
	Page   int    `json:"page"`   // 1-based page number
	Done   bool   `json:"done"`   // true once bytes are on disk
}

Tuple identifies one unit of work in the capture walk: one (target, mode, type, page) combination. Tuples are created up-front by EnumerateTuples and flipped to Done=true as each successful fetch+write completes.

Field names map to lowercase JSON keys so the persisted file uses the PeeringDB-style lowercase convention and so the T-57-04 whitelist test can assert exact-key equality.

func EnumerateTuples

func EnumerateTuples(target string, modes []string, types []string, pages int) []Tuple

EnumerateTuples produces the ordered (mode outer, type middle, page inner) tuple space for a capture run. Order matters: anon fetches come before auth fetches so an interrupted partial run keeps the anon-only signal intact for downstream inspection. Within a mode, types are iterated alphabetically; within a type, pages ascend.

Returns tuples with Done=false; Target is stamped on every tuple.
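
The nesting order can be sketched with three loops. `enumerateTuples` is a hypothetical stand-in; this version iterates modes and types in the order given, which assumes the caller passes "anon" before "auth" and an already sorted type list such as AllTypes.

```go
package main

import "fmt"

// tuple is a hypothetical copy of Tuple.
type tuple struct {
	Target, Mode, Type string
	Page               int
	Done               bool
}

// enumerateTuples nests mode (outer), type (middle), page (inner).
// Input slices are used in the order given; no sorting happens here.
func enumerateTuples(target string, modes, types []string, pages int) []tuple {
	var out []tuple
	for _, mode := range modes {
		for _, typ := range types {
			for page := 1; page <= pages; page++ {
				out = append(out, tuple{Target: target, Mode: mode, Type: typ, Page: page})
			}
		}
	}
	return out
}

func main() {
	tuples := enumerateTuples("beta", []string{"anon", "auth"}, []string{"net", "org"}, 2)
	for _, t := range tuples {
		fmt.Printf("%s/%s/%s/page-%d\n", t.Target, t.Mode, t.Type, t.Page)
	}
}
```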

func (Tuple) String

func (t Tuple) String() string

String returns a deterministic stringification suitable for log attributes and error wrapping: "{target}/{mode}/{type}/page-{N}".

type TypeReport

type TypeReport struct {
	AnonRowCount      int          `json:"anon_row_count"`
	AuthRowCount      int          `json:"auth_row_count"`
	AuthOnlyRowCount  int          `json:"auth_only_row_count"`
	VisibleValuesAnon []string     `json:"visible_values_anon,omitempty"`
	VisibleValuesAuth []string     `json:"visible_values_auth,omitempty"`
	Fields            []FieldDelta `json:"fields"`
}

TypeReport describes the anon vs auth delta for a single PeeringDB type. Row counts + field names + the controlled `visible` enum only — no values, no lengths, no hashes.

func Diff

func Diff(typeName string, anonBytes, authBytes []byte) (TypeReport, error)

Diff compares an anon/auth envelope pair for a single PeeringDB type and returns a TypeReport. The caller is responsible for assembling TypeReports into a full Report (covering all 13 types and both targets).

Diff is a pure function: same input bytes yield byte-stable output.
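
A heavily simplified sketch of a names-only comparison is shown below. It is not the Diff algorithm: `authOnlyFields` is a hypothetical helper that only lists field names seen in auth rows but in no anon row, and the `{"data": [...]}` envelope shape is an assumption. Values stay as undecoded json.RawMessage, so they are never inspected.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// envelope assumes a {"data": [...]} response shape.
type envelope struct {
	Data []map[string]json.RawMessage `json:"data"`
}

// fieldNames collects the set of field names used by any row.
func fieldNames(raw []byte) (map[string]bool, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("unmarshal envelope: %w", err)
	}
	names := make(map[string]bool)
	for _, row := range env.Data {
		for name := range row {
			names[name] = true
		}
	}
	return names, nil
}

// authOnlyFields returns, sorted, the names present in auth but absent from anon.
func authOnlyFields(anonBytes, authBytes []byte) ([]string, error) {
	anon, err := fieldNames(anonBytes)
	if err != nil {
		return nil, err
	}
	auth, err := fieldNames(authBytes)
	if err != nil {
		return nil, err
	}
	var out []string
	for name := range auth {
		if !anon[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out) // sorted output keeps the result stable across runs
	return out, nil
}

func main() {
	anon := []byte(`{"data":[{"id":1,"role":"NOC"}]}`)
	auth := []byte(`{"data":[{"id":1,"role":"NOC","phone":"x","email":"y"}]}`)
	names, err := authOnlyFields(anon, auth)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```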
