injectioncorpus

package
v1.0.0-beta.113 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 19, 2026 License: MIT Imports: 8 Imported by: 0

README

injection_corpus

Labeled prompt-injection examples consumed by the agentic-governance embedding classifier. Phase 2 of ADR-043.

File format

JSONL. One record per line. Lines starting with # are treated as comments and skipped. Schema:

{
  "id": "stable-record-id",
  "text": "the labeled input",
  "signal": "instruction-override",
  "source": "internal-seed-v0/instruction_override"
}
Field Required Notes
id yes Stable identifier — used as governance.injection.top_match_id when this record is the nearest neighbor. Phase 3 detonator writes hex sha256 of text.
text yes The labeled input (injection attempt or benign counter-example).
signal yes One of the buckets enumerated in ADR-043 line 206. Becomes governance.injection.signal on match.
source no Provenance string. Persisted for audit; not used in classification.

Signal buckets (ADR-043)

Signal Meaning
instruction-override Attempt to redirect, replace, or invalidate the prior instruction context. Covers the eight regex categories in the legacy injection_patterns.go.
network-egress Markdown-URL exfil, instructions to call outbound network APIs.
data-access Instructions to enumerate or summarize data the agent has access to (sources, queries, profiles).
code-exec Instructions targeting code execution / shell tooling.
filesystem-read Instructions targeting filesystem reads.
exfil-email Instructions to send email or messages out of band.
secret-access Instructions targeting credentials, env vars, API keys.
cred-enum Instructions to enumerate users, identities, accounts.
benign Legitimate text that uses adjacent vocabulary; teaches the boundary.

Bootstrap source: internal_seed_v0.jsonl

A small (~35 records) hand-curated seed, sourced from:

  1. The Examples []string fields of DefaultInjectionPatterns in injection_patterns.go — the existing regex placeholder. These are direct-injection examples.
  2. Hand-authored indirect-injection examples covering the OSINT threat shapes ADR-043 centers on: page-embedded directives, markdown-URL exfil, tool-shadowing.
  3. Hand-authored benign counter-examples: legitimate text that uses words like system, instructions, override, rules, base64, etc. without being an injection.

This seed exists to prove the loader contract end-to-end and to provide a non-trivial smoke-test corpus. It is not the production corpus. The Phase 2 measurement protocol will demonstrate what this seed can and cannot detect, and the Phase 3 detonator will broaden distribution against real OSINT scrape input.

Adding additional sources

Future PRs:

  • deepset_v1.jsonl — vendored subset of deepset/prompt-injections (CC-BY-4.0, attribution required).
  • greshake_v1.jsonl — derived from Greshake scenarios/ (greshake/llm-security, MIT). Indirect-injection gold.
  • detonator-{tenant}.jsonl — written by the Phase 3 detonator; per-tenant in production.

Each additional source gets its own Source entry in the loader configuration; the classifier aggregates across all of them. License attribution lives in docs/operations/NN-injection-corpus-attribution.md per the ADR-043 §"Bootstrap corpus" section.

Documentation

Overview

Package injectioncorpus loads labeled injection examples into the shape consumed by graph/query.EmbeddingClassifier.

Phase 2 of ADR-043: one bootstrap source, JSONL on disk, normalized to []*query.DomainExamples on read. JSONL is chosen over single-JSON so the Phase 3 detonator can append labels without rewriting the file.

One record per line:

{"id": "sha256-hex", "text": "...", "signal": "instruction-override", "source": "internal-seed-v0"}

The `text` field becomes Example.Query (the substrate predates injection use; the field name is a query-router legacy). The `signal` field becomes Example.Intent — that is the surface rules match on via governance.injection.signal.

Index

Constants

View Source
const (
	OptionKeyID     = "id"
	OptionKeySource = "source"
)

OptionKey constants name the keys used inside Example.Options for corpus-loaded records. Phase 2b runtime-side accessors read these when building governance.injection.top_match_id triples; centralising the key names here keeps the contract typo-proof.

Variables

This section is empty.

Functions

func Load

func Load(sources []Source) ([]*query.DomainExamples, error)

Load reads one or more JSONL corpus files and returns the result as the []*query.DomainExamples shape the classifier consumes.

Aggregates errors across files via errors.Join so a misconfigured deployment sees every problem on a single boot. Also detects duplicate record IDs across sources — Phase 3 detonator workers and vendored public corpora overlapping the internal seed are realistic collision paths, and silent dedup hides which record actually became the nearest-neighbor.

Types

type Record

type Record struct {
	// ID is a stable identifier for the record. Recommended: hex
	// sha256 of the text. Becomes governance.injection.top_match_id
	// when this record is the nearest neighbor.
	ID string `json:"id"`

	// Text is the labeled input (the injection attempt, or a
	// benign counter-example).
	Text string `json:"text"`

	// Signal is the bucket the text belongs to (one of the values
	// enumerated in ADR-043 line 206). The classifier emits this
	// as governance.injection.signal on match.
	Signal string `json:"signal"`

	// Source identifies the origin of the record (e.g.,
	// "internal-seed-v0", "deepset/prompt-injections@<sha>").
	// Persisted for provenance; not used in classification.
	Source string `json:"source,omitempty"`
}

Record is the on-disk JSONL shape. Kept in this package rather than graph/query so the corpus format can evolve without touching the query-router substrate.

type Source

type Source struct {
	// Domain is a free-form tag (e.g., "injection-internal-seed",
	// "injection-deepset"). Surfaces in DomainExamples.Domain.
	Domain string

	// Version is the corpus revision tag (e.g., "v0.1").
	Version string

	// Path is the JSONL file to load.
	Path string
}

Source describes one corpus file plus the domain label that will be attached to its examples. The domain label is metadata only — the classifier aggregates examples across domains.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL