Documentation ¶
Overview ¶
Package visbaseline captures and diffs unauthenticated vs authenticated PeeringDB API responses for all 13 object types, with a strict no-PII-in-repo guarantee enforced by the Redact function in this package.
The central hard constraint (phase 57 D-02) is: authenticated PeeringDB data must NOT be committed in raw form. Authenticated responses carry email, phone, and legal-name data that upstream withholds from anonymous callers; this package inspects those responses in memory, emits a placeholder-only shape, and produces a structural diff report for downstream test fixtures.
Index ¶
- Constants
- Variables
- func BuildReport(ctx context.Context, cfg BuildReportConfig) error
- func CleanupStatePath(path string) error
- func IsPIIField(name string) bool
- func Redact(anonBytes, authBytes []byte) ([]byte, error)
- func RedactDir(ctx context.Context, cfg RedactDirConfig) error
- func WriteJSON(w io.Writer, rep Report) error
- func WriteMarkdown(w io.Writer, rep Report) error
- type BuildReportConfig
- type Capture
- type Config
- type FieldDelta
- type PromptAnswer
- type RedactDirConfig
- type Report
- type State
- type Tuple
- type TypeReport
Constants ¶
const (
	PlaceholderString = "<auth-only:string>"
	PlaceholderNumber = "<auth-only:number>"
	PlaceholderBool   = "<auth-only:bool>"
)
Placeholder strings substituted for redacted values in committed auth fixtures. The "<auth-only:TYPE>" format is intentionally not valid JSON in any identifier position, making it impossible to confuse with real data and trivial to grep for in CI audits.
const DefaultStatePath = "/tmp/pdb-vis-capture-state.json"
DefaultStatePath is the canonical location of the capture checkpoint file. /tmp is used deliberately so the checkpoint survives process exit but not reboots, matching operator expectations for a one-shot CLI.
const ReportSchemaVersion = 1
ReportSchemaVersion is the frozen schema version for diff.json. Bump on any non-additive change to the emitted structure.
Variables ¶
var AllTypes = []string{
"campus", "carrier", "carrierfac", "fac", "ix", "ixfac",
"ixlan", "ixpfx", "net", "netfac", "netixlan", "org", "poc",
}
AllTypes is the canonical list of PeeringDB object types captured in the visibility baseline walk. Mirrored from cmd/pdbcompat-check/main.go (lines 22-26) because Go import-direction hygiene forbids `internal` packages from importing `cmd` packages.
var PIIFields = []string{
"address1",
"address2",
"city",
"email",
"latitude",
"longitude",
"name",
"phone",
"policy_email",
"policy_phone",
"sales_email",
"sales_phone",
"state",
"tech_email",
"tech_phone",
"zipcode",
}
PIIFields is the authoritative allow-list of field names that carry personal data in PeeringDB responses. Any field whose JSON name appears in this list is ALWAYS replaced with a placeholder in redacted auth fixtures, regardless of whether the same field is present in the corresponding anonymous response.
Derived from internal/peeringdb/types.go: Organization, Facility, InternetExchange, Poc, Campus, and association tables (netfac, ixfac). The list is deliberately conservative — a field is PII if it identifies or locates an individual or organisation's physical presence.
Deliberately excluded (not PII by this policy):
- notes: business-owned free text, publishable.
- country: geographic region, not a locating identifier on its own.
- website, url, looking_glass, route_server: publishable URLs.
- name_long, aka: organisation/network display names, not personal names.
The list is sorted for stable review diffs when it is updated.
var ProdTypes = []string{"net", "org", "poc"}
ProdTypes is the reduced list used for the prod confirmation pass per phase 57 D-04. Only the high-signal privacy types are fetched against production to stay well inside the rate-limit quota.
Functions ¶
func BuildReport ¶
func BuildReport(ctx context.Context, cfg BuildReportConfig) error
BuildReport walks BaselineRoot, runs Diff per type, and emits DIFF.md + diff.json at OutDir. See BuildReportConfig for the dual-shape contract.
GO-CFG-1 fail-fast validation:
- BaselineRoot must be non-empty and a readable directory.
- OutDir must be non-empty and must NOT be the filesystem root.
- Single-target shape requires both anon/ and auth/ subdirs present.
- Multi-target shape requires at least one target subdir with both anon/ and auth/ present.
func CleanupStatePath ¶
func CleanupStatePath(path string) error
CleanupStatePath removes the checkpoint file. A missing file is not an error — removal is idempotent so callers (New / Run completion) do not have to pre-check existence.
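The idempotent removal described above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual code: `cleanup` and `cleanupTwice` are hypothetical names, and the scratch file stands in for the real checkpoint.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// cleanup is a hypothetical stand-in for CleanupStatePath: it removes the
// file and treats "already gone" as success.
func cleanup(path string) error {
	if err := os.Remove(path); err != nil && !errors.Is(err, os.ErrNotExist) {
		return fmt.Errorf("remove checkpoint: %w", err)
	}
	return nil
}

// cleanupTwice creates a scratch checkpoint and removes it twice. The
// second call exercises the missing-file path.
func cleanupTwice() error {
	dir, err := os.MkdirTemp("", "state-*")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "state.json")
	if err := os.WriteFile(path, []byte("{}"), 0o600); err != nil {
		return err
	}
	if err := cleanup(path); err != nil {
		return err
	}
	return cleanup(path)
}

func main() {
	fmt.Println("error:", cleanupTwice())
}
```

Because a missing file is swallowed, a caller can run the cleanup unconditionally on every successful completion.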
func IsPIIField ¶
func IsPIIField(name string) bool
IsPIIField reports whether the given JSON field name is in the PII allow-list. Comparison is exact (case-sensitive) to match PeeringDB's lowercase-underscore field naming convention.
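One way to implement an exact, case-sensitive lookup is a binary search over the sorted list. The sketch below assumes that approach; the real function may use a map or a linear scan instead. `piiFields` and `isPIIField` are hypothetical local copies, not the exported identifiers.

```go
package main

import (
	"fmt"
	"sort"
)

// piiFields mirrors the sorted PIIFields list documented above.
var piiFields = []string{
	"address1", "address2", "city", "email", "latitude", "longitude",
	"name", "phone", "policy_email", "policy_phone", "sales_email",
	"sales_phone", "state", "tech_email", "tech_phone", "zipcode",
}

// isPIIField is a hypothetical re-implementation: an exact,
// case-sensitive binary search that relies on the list being sorted.
func isPIIField(name string) bool {
	i := sort.SearchStrings(piiFields, name)
	return i < len(piiFields) && piiFields[i] == name
}

func main() {
	for _, n := range []string{"email", "Email", "name_long", "zipcode"} {
		fmt.Printf("%-10s %v\n", n, isPIIField(n))
	}
}
```

Note that "Email" and "name_long" are both rejected: the first because matching is case-sensitive, the second because only whole names match.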
func Redact ¶
func Redact(anonBytes, authBytes []byte) ([]byte, error)
Redact applies the phase 57 redaction policy to an authenticated PeeringDB response and returns bytes safe to commit. The policy:
- Preserve envelope shape, row count, field names, integer row ids, and the `visible` enum value (controlled vocabulary we need for the diff).
- Replace any value whose field name appears in PIIFields with a typed placeholder, regardless of whether the field appears in anon. This is defence-in-depth against upstream anon bugs.
- For non-PII fields, replace values with a typed placeholder when:
  - the auth row has no matching anon row (match by `id`), OR
  - the anon row lacks the field name.
  Otherwise keep the auth value verbatim (anon already publishes it).
- Re-marshal with json.MarshalIndent. Go's encoding/json sorts map keys when marshalling, so the result is deterministic: the same input bytes always produce the same output bytes.
Redact is a pure function: no filesystem, no network, no global state, no goroutines. Errors from json.Unmarshal / json.MarshalIndent are wrapped with %w. Input bytes are never echoed in error messages.
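The per-field rules of the policy can be sketched for a single row as follows. This is an illustration under stated assumptions, not the package's implementation: `redactRow`, `placeholder`, and the three-entry `pii` map are hypothetical, rows are assumed to decode into `map[string]any`, and every value that is neither a number nor a bool is assumed to get the string placeholder.

```go
package main

import (
	"encoding/json"
	"os"
)

// pii is a small stand-in for the PIIFields list; the real list is longer.
var pii = map[string]bool{"email": true, "name": true, "phone": true}

// placeholder picks a typed sentinel from the decoded JSON value.
// Mapping everything that is not a number or bool to the string
// placeholder is an assumption of this sketch.
func placeholder(v any) string {
	switch v.(type) {
	case float64:
		return "<auth-only:number>"
	case bool:
		return "<auth-only:bool>"
	default:
		return "<auth-only:string>"
	}
}

// redactRow applies the documented per-field rules to one auth row.
// anon is the anon row with the same id, or nil when there is none.
func redactRow(anon, auth map[string]any) map[string]any {
	out := make(map[string]any, len(auth))
	for k, v := range auth {
		_, inAnon := anon[k] // reading a nil map is safe in Go
		switch {
		case k == "id" || k == "visible":
			out[k] = v // preserved: row ids and the visible enum
		case pii[k]:
			out[k] = placeholder(v) // PII is always redacted
		case anon == nil || !inAnon:
			out[k] = placeholder(v) // auth-only row or auth-only field
		default:
			out[k] = v // anon already publishes this field
		}
	}
	return out
}

func main() {
	anon := map[string]any{"id": 1.0, "role": "NOC", "name": "public name"}
	auth := map[string]any{
		"id": 1.0, "role": "NOC", "name": "Jane Doe",
		"email": "jane@example.net", "visible": "Users",
	}
	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false) // keep the placeholders readable
	enc.SetIndent("", "  ")
	if err := enc.Encode(redactRow(anon, auth)); err != nil {
		panic(err)
	}
}
```

In this sketch "name" is redacted even though the anon row publishes it, which is the defence-in-depth behaviour the policy describes for PII fields.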
func RedactDir ¶
func RedactDir(ctx context.Context, cfg RedactDirConfig) error
RedactDir walks a raw-auth staging tree, pairs each page with its anon counterpart, runs Redact, and writes the redacted bytes under Dst.
Errors are fail-fast: a missing anon counterpart, malformed JSON, or a failed write halts the walk. Partial output under Dst is left as-is for the operator to inspect; the caller is expected to `rm -rf` Dst before a retry. Raw auth input bytes are never echoed in error messages (T-57-05).
RedactDir honours ctx between files: cancellation mid-walk returns ctx.Err() wrapped.
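Checking the context between files usually looks like the loop below. This is a generic sketch of the pattern, not RedactDir itself: `walkFiles`, `cancelAfterFirst`, and the file names are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// walkFiles calls process for each file and checks ctx before each one,
// so a cancellation takes effect at the next file boundary.
func walkFiles(ctx context.Context, files []string, process func(string) error) error {
	for _, f := range files {
		if err := ctx.Err(); err != nil {
			return fmt.Errorf("walk cancelled before %s: %w", f, err)
		}
		if err := process(f); err != nil {
			return fmt.Errorf("process %s: %w", f, err)
		}
	}
	return nil
}

// cancelAfterFirst walks three hypothetical pages and cancels the context
// while the first one is being processed.
func cancelAfterFirst() (int, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	processed := 0
	files := []string{"net/page-1.json", "net/page-2.json", "org/page-1.json"}
	err := walkFiles(ctx, files, func(string) error {
		processed++
		cancel()
		return nil
	})
	return processed, errors.Is(err, context.Canceled)
}

func main() {
	n, cancelled := cancelAfterFirst()
	fmt.Println(n, cancelled) // prints "1 true"
}
```

Wrapping with %w keeps the cancellation detectable through errors.Is, which is what lets a caller distinguish an operator interrupt from a redaction failure.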
func WriteJSON ¶
func WriteJSON(w io.Writer, rep Report) error
WriteJSON serialises the report as JSON with stable (alphabetical) map key ordering. The "generated" field is taken from rep.GeneratedAt; callers that want deterministic output across runs should stamp a fixed time.
WriteJSON normalises nil Fields slices to []FieldDelta{} so they marshal as "[]" rather than "null". Empty-array is semantically "no deltas observed"; the schema stays stable for downstream JSON consumers that expect arrays. See 57-03-PLAN.md Task 2 note.
HTML escaping is disabled so that placeholder strings such as "<auth-only:string>" remain greppable in diff.json and committed fixtures — json.Encoder's default escapes "<" and ">" to "\u003c"/"\u003e" which would defeat the CI grep audit for placeholder sentinels. Matches the encoder configuration in internal/visbaseline/redact.go.
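Both behaviours, nil-slice normalisation and disabled HTML escaping, can be reproduced with a small encoder setup. The sketch below uses cut-down stand-in types (`delta`, `typeReport`) and a hypothetical `encode` helper; it shows the encoding/json mechanics only, not WriteJSON's actual code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// delta and typeReport are cut-down stand-ins for FieldDelta and TypeReport.
type delta struct {
	Name        string `json:"name"`
	Placeholder string `json:"placeholder,omitempty"`
}

type typeReport struct {
	Fields []delta `json:"fields"`
}

// encode normalises a nil Fields slice and disables HTML escaping.
func encode(tr typeReport) (string, error) {
	if tr.Fields == nil {
		tr.Fields = []delta{} // marshals as [] instead of null
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep "<" and ">" literal
	if err := enc.Encode(tr); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	empty, err := encode(typeReport{})
	if err != nil {
		panic(err)
	}
	fmt.Print(empty) // {"fields":[]}

	full, err := encode(typeReport{Fields: []delta{
		{Name: "email", Placeholder: "<auth-only:string>"},
	}})
	if err != nil {
		panic(err)
	}
	fmt.Print(full) // placeholder stays as <auth-only:string>

	// For contrast: json.Marshal escapes angle brackets by default.
	escaped, err := json.Marshal("<auth-only:string>")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(escaped)) // "\u003cauth-only:string\u003e"
}
```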
Types ¶
type BuildReportConfig ¶
type BuildReportConfig struct {
BaselineRoot string
OutDir string
GeneratedAt time.Time // optional; defaults to time.Now().UTC()
Logger *slog.Logger
}
BuildReportConfig parameterises a BuildReport run. Grouped per GO-CS-5.
BaselineRoot is resolved in one of two shapes:
Single-target: BaselineRoot directly contains "anon/" and "auth/" subdirs (e.g. testdata/visibility-baseline/beta). BuildReport emits a single DIFF.md + diff.json at OutDir covering the type-level deltas for that target, with top-level keys namespaced by type name only.
Multi-target: BaselineRoot contains per-target subdirs each with "anon/" + "auth/" (e.g. testdata/visibility-baseline/ containing beta/ and prod/). BuildReport emits a unified DIFF.md + diff.json at OutDir whose type keys are namespaced "{target}/{type}" AND emits per-target auxiliary DIFF-{target}.md files for reviewers who only want one target at a time.
The caller picks the shape by choosing BaselineRoot; BuildReport auto-detects which one is present. An ambiguous tree (both direct anon/auth subdirs AND per-target subdirs) is rejected as a user error.
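The auto-detection rule can be sketched as a directory probe. This is a hedged illustration of the described behaviour; `hasPair`, `detectShape`, `mkTree`, the returned shape labels, and the error texts are all hypothetical and not taken from the package.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// hasPair reports whether dir contains both anon/ and auth/ subdirectories.
func hasPair(dir string) bool {
	for _, sub := range []string{"anon", "auth"} {
		fi, err := os.Stat(filepath.Join(dir, sub))
		if err != nil || !fi.IsDir() {
			return false
		}
	}
	return true
}

// detectShape returns "single" or "multi", or an error when the tree is
// ambiguous or contains no anon/ + auth/ pair at all.
func detectShape(root string) (string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return "", fmt.Errorf("read baseline root: %w", err)
	}
	targets := 0
	for _, e := range entries {
		if e.IsDir() && hasPair(filepath.Join(root, e.Name())) {
			targets++
		}
	}
	single := hasPair(root)
	switch {
	case single && targets > 0:
		return "", errors.New("ambiguous baseline root: both shapes present")
	case single:
		return "single", nil
	case targets > 0:
		return "multi", nil
	default:
		return "", errors.New("no anon/ + auth/ pair found")
	}
}

// mkTree builds a scratch baseline tree for the demo.
func mkTree(dirs ...string) (string, error) {
	root, err := os.MkdirTemp("", "baseline-*")
	if err != nil {
		return "", err
	}
	for _, d := range dirs {
		if err := os.MkdirAll(filepath.Join(root, d), 0o700); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := mkTree("beta/anon", "beta/auth", "prod/anon", "prod/auth")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	fmt.Println(detectShape(root))
}
```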
type Capture ¶
type Capture struct {
// contains filtered or unexported fields
}
Capture coordinates the per-tuple walk of PeeringDB pages for the phase 57 visibility baseline. Construct via New; run with Run.
func New ¶
New validates cfg and constructs a Capture. Fails fast when required fields are missing or when auth mode is requested without an API key — no HTTP calls happen in New.
func (*Capture) Run ¶
Run executes the capture walk. Returns the path to the private /tmp directory holding raw auth bytes (for the redactor to consume in plan 03), or an error. Context cancellation is honoured between tuples and during rate-limit sleeps — returned error wraps ctx.Err().
On successful completion the checkpoint file is removed. On early exit (error, cancel) the checkpoint persists so a later invocation can resume.
type Config ¶
type Config struct {
// Target is the capture label ("beta" | "prod"). Used only as a tag
// inside State tuples; the actual base URL is BaseURL.
Target string
// BaseURL is the PeeringDB root (e.g. https://beta.peeringdb.com). No
// trailing slash. Must be non-empty.
BaseURL string
// Modes is the subset of {"anon", "auth"} to capture.
Modes []string
// Types is the subset of PeeringDB object types to walk. Use AllTypes
// for beta; ProdTypes for prod.
Types []string
// Pages is the number of pages per (mode, type) to fetch. Defaults to
// 2 when <1. The phase 57 baseline uses 2.
Pages int
// OutDir is the REPO-SIDE output root for anon fixtures. Auth fixtures
// are written EXCLUSIVELY under a private /tmp directory created per
// run — NEVER under OutDir. This invariant is enforced by the capture
// loop and asserted by TestCaptureWritesAuthBytesToTmpOnly.
OutDir string
// APIKey is the PeeringDB API key. Required when Modes contains "auth".
// Must never be logged (T-57-05); the Capture implementation uses the
// peeringdb.Client's built-in Authorization header path, which does
// not log the value.
APIKey string
// StatePath is the checkpoint file path. Defaults to DefaultStatePath
// when empty.
StatePath string
// Logger receives structured capture events. Required.
Logger *slog.Logger
// InterTupleDelay is an optional pause between tuples. Tests set this
// small enough to keep wall-clock low; prod runs leave it zero (the
// peeringdb.Client rate limiter already paces requests).
InterTupleDelay time.Duration
// RateLimitJitter is the extra pause added on top of RateLimitError
// RetryAfter before re-fetching the same tuple. Defaults to 5 seconds
// in prod; tests override to 1ms to keep runtime low.
RateLimitJitter time.Duration
// ClientOverride, when non-nil, is used by New instead of constructing
// a *peeringdb.Client internally. TESTS ONLY. Prod callers must leave
// nil.
//
// Rationale: tests need to call client.SetRateLimit and
// SetRetryBaseDelay before Run to keep wall-clock under 1 second. A
// package-global mutable hook is race-prone under t.Parallel(); a
// per-Config field has no shared mutable state.
//
// When both Modes contains "auth" and ClientOverride is set, the
// override is used for BOTH anon and auth clients — tests that need
// to assert auth-header presence should set the override WithAPIKey
// and inspect the request on the httptest server side.
ClientOverride *peeringdb.Client
// PromptReader is the source for the Resume/Restart prompt. Defaults
// to os.Stdin when nil. Tests inject strings.NewReader.
PromptReader io.Reader
// PromptWriter is the sink for the Resume/Restart prompt text.
// Defaults to os.Stderr when nil. Tests inject io.Discard.
PromptWriter io.Writer
}
Config parameterises a capture run. Grouped per GO-CS-5 (input structs for >2-arg callers).
Required fields: Target, BaseURL, Modes, Types, OutDir, Logger. Optional fields fall back to documented defaults. When Modes contains "auth", APIKey MUST be set or New returns an error.
type FieldDelta ¶
type FieldDelta struct {
Name string `json:"name"`
AuthOnly bool `json:"auth_only"`
Placeholder string `json:"placeholder,omitempty"` // "<auth-only:TYPE>" sentinel
RowsAdded int `json:"rows_added,omitempty"` // count of auth rows with this field absent-in-anon
ValueSetDrift bool `json:"value_set_drift,omitempty"` // true for "visible" when new enum values appear
IsPII bool `json:"is_pii,omitempty"` // IsPIIField(Name) at construction time
}
FieldDelta describes a single field-level observation. It DOES NOT carry field values, lengths, hashes, or any signal that could fingerprint the underlying data. See threat T-57-02 and 57-RESEARCH.md Pitfall 4.
type PromptAnswer ¶
type PromptAnswer int
PromptAnswer enumerates operator responses to the resume/restart prompt.
const (
	// Restart is the safe default: discard old state and enumerate fresh
	// tuples. Returned on EOF, empty input, unrecognised keywords.
	Restart PromptAnswer = iota
	// Resume continues with the saved state, skipping Done=true tuples.
	Resume
)
func PromptResumeOrRestart ¶
func PromptResumeOrRestart(r io.Reader, w io.Writer) PromptAnswer
PromptResumeOrRestart asks the operator via r (typically os.Stdin) whether to resume from a saved checkpoint or restart. Writes the prompt text to w (typically os.Stderr so it does not contaminate stdout pipelines). Reads one line via bufio.NewScanner.
Safe defaults: any input that is not recognised as a resume keyword returns Restart. This includes EOF, empty lines, garbage, and "no"/"restart". The bias is "when in doubt, start fresh" — a replayed fetch is cheap; a skipped tuple is a silent correctness failure.
Resume keywords (case-insensitive, TrimSpace): "resume", "r", "continue", "c".
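The safe-default parsing described above might look like the following. It is a sketch built from the documented behaviour only: the lowercase names (`answer`, `promptResumeOrRestart`, `answerFor`) are hypothetical, and the prompt text is invented for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

type answer int

const (
	restart answer = iota // zero value, so "unknown" means restart
	resume
)

// promptResumeOrRestart reads one line and returns resume only for a
// recognised keyword. Everything else, including EOF, means restart.
func promptResumeOrRestart(r io.Reader, w io.Writer) answer {
	fmt.Fprint(w, "Checkpoint found. Resume or restart? [restart]: ")
	sc := bufio.NewScanner(r)
	if !sc.Scan() {
		return restart // EOF or read error
	}
	switch strings.ToLower(strings.TrimSpace(sc.Text())) {
	case "resume", "r", "continue", "c":
		return resume
	default:
		return restart
	}
}

// answerFor feeds a fixed input string to the prompt and discards the
// prompt text, the way a test would.
func answerFor(input string) answer {
	return promptResumeOrRestart(strings.NewReader(input), io.Discard)
}

func main() {
	for _, in := range []string{"Resume\n", "  c  \n", "no\n", ""} {
		fmt.Printf("%q -> %d\n", in, answerFor(in))
	}
}
```

Making restart the zero value of the enum means an uninitialised answer also falls on the safe side.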
type RedactDirConfig ¶
RedactDirConfig parameterises a RedactDir run. Grouped per GO-CS-5 because the caller passes more than two arguments.
Layout expectations:
- AuthSrc points at a raw-auth staging tree laid down by Capture, i.e. .../auth/api/{type}/page-N.json. Both the "auth/api/..." subpath and the flat "api/..." form are accepted: RedactDir walks AuthSrc and treats every *.json file whose relative path contains ".../{type}/page-N.json" as a page to redact.
- AnonDir points at the repo-side anon mirror, i.e. .../anon/api/{type}/page-N.json. RedactDir maps each auth page to its anon counterpart by {type}+{page}; missing pairs are an error (an anon page is the only source of truth for "which fields are already public", so skipping it would over-disclose).
- Dst is the repo-side redacted auth destination, i.e. .../auth/api/{type}/page-N.json. RedactDir creates directories as needed with mode 0700 and writes files with mode 0600.
type Report ¶
type Report struct {
SchemaVersion int `json:"schema_version"`
GeneratedAt time.Time `json:"generated"`
Targets []string `json:"targets"`
Types map[string]TypeReport `json:"types"`
}
Report is the per-run diff artifact. It carries aggregate counts and field-level deltas for each PeeringDB type, but NEVER actual field values (except for the controlled enum `visible`). See threat T-57-02.
type State ¶
State is the capture checkpoint.
State carries ONLY tuple metadata. No response bytes, no API keys, no payload. See threat T-57-04 in the phase 57-02 threat register: the checkpoint file is written to /tmp and could be read by other users on multi-tenant hosts, so it must never contain sensitive data. The TestCheckpointContainsNoPayload unit test enforces this invariant by asserting the serialised top-level and tuple key sets are exactly the declared JSON tags.
func LoadState ¶
LoadState reads and deserialises a State from path. Returns a wrapped os.ErrNotExist when the file is missing (callers may errors.Is-check), a wrapped json error on parse failure, and an error for unsupported versions.
func (*State) Advance ¶
Advance flips the matching tuple's Done flag to true and persists via Save. Matching is by (Target, Mode, Type, Page). Returns an error if no tuple matches.
func (*State) PendingTuples ¶
PendingTuples returns the Done=false tuples in their existing slice order. The returned slice is a copy — callers may mutate without affecting State.
func (*State) Save ¶
Save serialises s to path atomically. The write goes to path+".tmp" first with mode 0600, then os.Rename moves it into place. POSIX rename on the same filesystem is atomic — a concurrent reader sees either the old state or the new state, never a partial write.
Version is auto-stamped to the current schema version on Save if unset.
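The write-then-rename sequence can be sketched as below. `saveAtomic` and `demo` are hypothetical helpers, and the `{"version": N}` payload is a placeholder since the State field layout is not shown on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// saveAtomic writes v as JSON to path+".tmp" with mode 0600 and then
// renames it over path, so readers never see a partially written file.
func saveAtomic(path string, v any) error {
	b, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return fmt.Errorf("marshal state: %w", err)
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, b, 0o600); err != nil {
		return fmt.Errorf("write temp state: %w", err)
	}
	if err := os.Rename(tmp, path); err != nil {
		return fmt.Errorf("rename state into place: %w", err)
	}
	return nil
}

// demo saves twice into a scratch directory and reports the final file
// content plus whether a stray ".tmp" file was left behind.
func demo() (string, bool, error) {
	dir, err := os.MkdirTemp("", "state-*")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "state.json")
	for _, version := range []int{1, 2} {
		// The payload shape is an assumption made for this example.
		if err := saveAtomic(path, map[string]int{"version": version}); err != nil {
			return "", false, err
		}
	}
	b, err := os.ReadFile(path)
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(path + ".tmp")
	return string(b), statErr == nil, nil
}

func main() {
	content, tmpLeft, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
	fmt.Println("tmp left behind:", tmpLeft)
}
```

Keeping the temp file next to the destination keeps both on the same filesystem, which is the condition the atomicity guarantee depends on.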
type Tuple ¶
type Tuple struct {
Target string `json:"target"` // "beta" | "prod"
Mode string `json:"mode"` // "anon" | "auth"
Type string `json:"type"` // PeeringDB object type (poc, net, …)
Page int `json:"page"` // 1-based page number
Done bool `json:"done"` // true once bytes are on disk
}
Tuple identifies one unit of work in the capture walk: one (target, mode, type, page) combination. Tuples are created up-front by EnumerateTuples and flipped to Done=true as each successful fetch+write completes.
Field names map to lowercase JSON keys so the persisted file uses the PeeringDB-style lowercase convention and so the T-57-04 whitelist test can assert exact-key equality.
func EnumerateTuples ¶
EnumerateTuples produces the ordered (mode outer, type middle, page inner) tuple space for a capture run. Order matters: anon fetches come before auth fetches so an interrupted partial run keeps the anon-only signal intact for downstream inspection. Within a mode, types are iterated alphabetically; within a type, pages ascend.
Returns tuples with Done=false; Target is stamped on every tuple.
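The ordering can be sketched as three nested loops. The `enumerate` signature below is a guess for illustration (the real EnumerateTuples signature is not shown here), and forcing "anon" before "auth" regardless of the caller's slice order is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// tuple mirrors the documented Tuple fields.
type tuple struct {
	Target string
	Mode   string
	Type   string
	Page   int
	Done   bool
}

// enumerate builds the tuple space: mode outer (anon first), type middle
// (alphabetical), page inner (ascending, 1-based).
func enumerate(target string, modes, types []string, pages int) []tuple {
	want := make(map[string]bool, len(modes))
	for _, m := range modes {
		want[m] = true
	}
	sorted := append([]string(nil), types...) // do not reorder the caller's slice
	sort.Strings(sorted)

	var out []tuple
	for _, mode := range []string{"anon", "auth"} {
		if !want[mode] {
			continue
		}
		for _, typ := range sorted {
			for page := 1; page <= pages; page++ {
				out = append(out, tuple{Target: target, Mode: mode, Type: typ, Page: page})
			}
		}
	}
	return out
}

func main() {
	for _, t := range enumerate("beta", []string{"auth", "anon"}, []string{"poc", "net"}, 2) {
		fmt.Printf("%s %s %s page-%d done=%v\n", t.Target, t.Mode, t.Type, t.Page, t.Done)
	}
}
```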
type TypeReport ¶
type TypeReport struct {
AnonRowCount int `json:"anon_row_count"`
AuthRowCount int `json:"auth_row_count"`
AuthOnlyRowCount int `json:"auth_only_row_count"`
VisibleValuesAnon []string `json:"visible_values_anon,omitempty"`
VisibleValuesAuth []string `json:"visible_values_auth,omitempty"`
Fields []FieldDelta `json:"fields"`
}
TypeReport describes the anon vs auth delta for a single PeeringDB type. Row counts + field names + the controlled `visible` enum only — no values, no lengths, no hashes.
func Diff ¶
func Diff(typeName string, anonBytes, authBytes []byte) (TypeReport, error)
Diff compares an anon/auth envelope pair for a single PeeringDB type and returns a TypeReport. The caller is responsible for assembling TypeReports into a full Report (covering all 13 types and both targets).
Diff is a pure function: same input bytes yield byte-stable output.
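A values-free comparison can be approximated by looking only at field names. The sketch below is not the package's Diff: `authOnlyFields` and `fieldNames` are hypothetical, and it assumes rows sit in a top-level "data" array of the response envelope. Row values are kept as json.RawMessage and never decoded or returned.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// envelope models the response wrapper, with rows assumed under "data".
// Values stay as raw bytes and are never inspected.
type envelope struct {
	Data []map[string]json.RawMessage `json:"data"`
}

// fieldNames collects the set of field names used by any row.
func fieldNames(raw []byte) (map[string]bool, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("decode envelope: %w", err)
	}
	names := make(map[string]bool)
	for _, row := range env.Data {
		for k := range row {
			names[k] = true
		}
	}
	return names, nil
}

// authOnlyFields returns the sorted names present in auth rows but in no
// anon row. Sorting makes the output stable for identical inputs.
func authOnlyFields(anonBytes, authBytes []byte) ([]string, error) {
	anon, err := fieldNames(anonBytes)
	if err != nil {
		return nil, err
	}
	auth, err := fieldNames(authBytes)
	if err != nil {
		return nil, err
	}
	var out []string
	for k := range auth {
		if !anon[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	anon := []byte(`{"meta":{},"data":[{"id":1,"role":"NOC"}]}`)
	auth := []byte(`{"meta":{},"data":[{"id":1,"role":"NOC","phone":"x","email":"y"}]}`)
	fields, err := authOnlyFields(anon, auth)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields) // [email phone]
}
```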