Documentation ¶
Overview ¶
Package realworld is the real-world ingest subsystem: it exports the public metadata and git history of a handful of the largest GitHub-native repositories into a pinned, normalized corpus, then seeds that corpus into a Githome instance so the read, write, search, git, and event paths can be exercised at a scale the small development fixture never reaches.
The subsystem is two stages that meet at one on-disk format, the Corpus:
- Stage A (export) turns a live source — a git mirror, the public GraphQL API, the GH Archive event stream — into a normalized Corpus written to a snapshot directory, pinned by a manifest so a later upstream change cannot silently move the numbers.
- Stage B (seed) reads a Corpus snapshot and writes it into a target store and git store through the bulk-seed write path, preserving the real numbers and timestamps.
Stage A needs network and, for the bulk API, credentials, so it is not exercised in unit tests; Stage B runs entirely against a local snapshot and a SQLite store, so the seeding, pseudonymization, replay, and capture logic is fully testable on the small fixtures checked in under testdata.
Index ¶
- Constants
- Variables
- func ExportToSnapshot(ctx context.Context, ex Exporter, refs []RepoRef, m *Manifest, root string) error
- func SelectSampleNumbers(numbers []int64) []int64
- func WriteCorpus(root string, c *Corpus) error
- type CaptureItem
- type CaptureKind
- type Checkpoint
- type Comment
- type CommitStatus
- type Corpus
- type DropNote
- type Exporter
- type FixtureExporter
- type GitMirrorPlan
- type Issue
- type Label
- type Manifest
- type OpClass
- type PRFile
- type Provenance
- type Pseudonymizer
- type PullRequest
- type RateBudget
- type ReactorPool
- type ReplayMode
- type ReplayPlan
- type RepoManifest
- type RepoRef
- type RequestMix
- type Review
- type ReviewComment
- type ScheduledRequest
- type SeedResult
- type TimelineEvent
Constants ¶
const ManifestName = "realworld-manifest.json"
ManifestName is the manifest filename at the root of a snapshot directory.
const ManifestSchema = 1
ManifestSchema is the manifest format version, bumped when the manifest shape changes so an old reader refuses a corpus it cannot interpret.
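The refusal described above can be sketched as a version check on the decoded `schema` key. This is a minimal illustration, not the package's LoadManifest: `checkSchema` is a hypothetical helper, and treating any version other than the reader's own as unreadable is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifestSchema mirrors the documented ManifestSchema constant.
const manifestSchema = 1

// checkSchema is a hypothetical helper, not part of the package API. It decodes
// only the "schema" key and rejects any version this reader was not built for.
func checkSchema(raw []byte) error {
	var header struct {
		Schema int `json:"schema"`
	}
	if err := json.Unmarshal(raw, &header); err != nil {
		return fmt.Errorf("decode manifest: %w", err)
	}
	if header.Schema != manifestSchema {
		return fmt.Errorf("manifest schema %d, reader supports %d", header.Schema, manifestSchema)
	}
	return nil
}

func main() {
	fmt.Println(checkSchema([]byte(`{"schema": 1}`))) // accepted
	fmt.Println(checkSchema([]byte(`{"schema": 2}`))) // refused with an error
}
```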
Variables ¶
var DefaultReactorPool = ReactorPool{Size: 200, Seed: 0x6e7e}
DefaultReactorPool is the reactor pool a corpus uses unless a manifest overrides it: 200 synthetic reactors, a fixed assignment seed.
var ErrRequiresNetwork = errors.New("realworld: this export source requires network access and credentials")
ErrRequiresNetwork is returned by an exporter whose source needs network or credentials that are not configured, so a caller in an offline or unit-test environment gets a clear signal rather than a silent empty corpus.
Functions ¶
func ExportToSnapshot ¶
func ExportToSnapshot(ctx context.Context, ex Exporter, refs []RepoRef, m *Manifest, root string) error
ExportToSnapshot runs an exporter over every repo in refs, writes each corpus into the snapshot root, and fills the manifest with the measured row counts and the provenance. A repo whose source is unreachable (ErrRequiresNetwork) is recorded as a drop rather than failing the whole run, so an offline build produces an honest partial snapshot the manifest names as partial. The manifest the caller passes is updated in place and saved at the root.
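The drop-instead-of-fail behavior can be pictured with a small self-contained loop. Everything here is a stand-in: `exportFunc`, `exportAll`, `runOffline`, and the repo names are hypothetical, and the exporter returns a row count rather than a Corpus to keep the sketch short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errRequiresNetwork stands in for the package's ErrRequiresNetwork.
var errRequiresNetwork = errors.New("sketch: source requires network access")

// dropNote mirrors the What and Reason fields of DropNote.
type dropNote struct{ What, Reason string }

// exportFunc is a hypothetical, simplified exporter: it returns a row count
// instead of a full corpus.
type exportFunc func(ctx context.Context, repo string) (int, error)

// exportAll records an unreachable repo as a drop and keeps going; any other
// error aborts the run.
func exportAll(ctx context.Context, export exportFunc, repos []string) (map[string]int, []dropNote, error) {
	rows := map[string]int{}
	var dropped []dropNote
	for _, repo := range repos {
		n, err := export(ctx, repo)
		switch {
		case errors.Is(err, errRequiresNetwork):
			dropped = append(dropped, dropNote{What: repo, Reason: err.Error()})
		case err != nil:
			return nil, nil, fmt.Errorf("export %s: %w", repo, err)
		default:
			rows[repo] = n
		}
	}
	return rows, dropped, nil
}

// runOffline drives exportAll with a fake exporter that can reach only the
// local fixture repo.
func runOffline() (map[string]int, []dropNote, error) {
	fake := func(_ context.Context, repo string) (int, error) {
		if repo == "example/remote" {
			return 0, errRequiresNetwork
		}
		return 3, nil
	}
	return exportAll(context.Background(), fake, []string{"example/fixture", "example/remote"})
}

func main() {
	rows, dropped, err := runOffline()
	fmt.Println(rows, dropped, err)
}
```

Using `errors.Is` rather than `==` keeps the check working if an exporter wraps the sentinel with extra context.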
func SelectSampleNumbers ¶
func SelectSampleNumbers(numbers []int64) []int64
SelectSampleNumbers returns the earliest, middle, and latest of a set of numbers, deduplicated and sorted. A short set returns just its distinct members. This is the "span the range" rule: the differ checks the oldest row, a middle row, and the newest, so a regression that only touches old or only touches new rows is still caught.
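A minimal sketch of the "span the range" rule follows. `selectSample` is a hypothetical stand-in, and picking the element at index `len/2` of the sorted distinct set as the "middle" is an assumption; the package may choose differently.

```go
package main

import (
	"fmt"
	"sort"
)

// selectSample is a hypothetical stand-in for SelectSampleNumbers: it returns
// the smallest, a middle, and the largest distinct value, in ascending order.
func selectSample(numbers []int64) []int64 {
	sorted := append([]int64(nil), numbers...) // copy so the input is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	// Drop duplicates from the sorted copy.
	var distinct []int64
	for _, n := range sorted {
		if len(distinct) == 0 || distinct[len(distinct)-1] != n {
			distinct = append(distinct, n)
		}
	}
	if len(distinct) <= 3 {
		return distinct // a short set returns just its distinct members
	}
	// Assumption: "middle" is the element at index len/2.
	return []int64{distinct[0], distinct[len(distinct)/2], distinct[len(distinct)-1]}
}

func main() {
	fmt.Println(selectSample([]int64{42, 7, 7, 1003, 512, 19})) // [7 42 1003]
	fmt.Println(selectSample([]int64{5, 5}))                    // [5]
}
```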
func WriteCorpus ¶
func WriteCorpus(root string, c *Corpus) error
WriteCorpus writes one corpus into a snapshot directory, creating the per-repo layout. It does not write the manifest; the caller owns the manifest because it spans every repo in the snapshot.
Types ¶
type CaptureItem ¶
type CaptureItem struct {
Kind CaptureKind `json:"kind"`
Number int64 `json:"number"`
CapturedAt time.Time `json:"captured_at,omitzero"`
}
CaptureItem is one entry in the capture plan: a kind and the subject number to fetch. CapturedAt is zero in the plan and stamped when the response is actually captured.
func BuildCapturePlan ¶
func BuildCapturePlan(c *Corpus) []CaptureItem
BuildCapturePlan selects the golden-response sample for a corpus: the earliest/middle/latest issue, pull request, comment, timeline, review, and status, so the differ has old and new rows of every kind. The plan is the differ-capture sample manifest the build records.
type CaptureKind ¶
type CaptureKind string
CaptureKind names one kind of golden response.
const (
	CaptureIssue    CaptureKind = "issue"
	CapturePull     CaptureKind = "pull"
	CaptureComment  CaptureKind = "comment"
	CaptureTimeline CaptureKind = "timeline"
	CaptureReview   CaptureKind = "review"
	CaptureStatus   CaptureKind = "status"
)
The capture kinds: one golden response per kind, sampled across the number range so the differ checks old and new rows of each.
type Checkpoint ¶
Checkpoint is the resumable-export journal: it records which repo/table pairs an export run has finished, so a run interrupted by rate-limit exhaustion or a crash resumes at the first unfinished pair instead of re-exporting from the top. It is a plain value the caller persists alongside the snapshot.
func (*Checkpoint) IsDone ¶
func (c *Checkpoint) IsDone(ref RepoRef, table string) bool
IsDone reports whether ref's table has already been exported.
func (*Checkpoint) Mark ¶
func (c *Checkpoint) Mark(ref RepoRef, table string)
Mark records ref's table as exported.
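One way to picture the resume behavior is a set keyed by repo and table. The types below are hypothetical stand-ins (the real Checkpoint's fields are not shown in this documentation), and the table names are illustrative.

```go
package main

import "fmt"

// repoRef and checkpoint are hypothetical stand-ins for RepoRef and Checkpoint.
type repoRef struct{ Owner, Name string }

type checkpoint struct {
	done map[string]bool // key: "owner/name:table"
}

func key(ref repoRef, table string) string {
	return ref.Owner + "/" + ref.Name + ":" + table
}

// IsDone reports whether the repo/table pair was already exported.
func (c *checkpoint) IsDone(ref repoRef, table string) bool {
	return c.done[key(ref, table)] // reading a nil map is safe and yields false
}

// Mark records the repo/table pair as exported.
func (c *checkpoint) Mark(ref repoRef, table string) {
	if c.done == nil {
		c.done = map[string]bool{}
	}
	c.done[key(ref, table)] = true
}

func main() {
	ref := repoRef{Owner: "example", Name: "repo"}
	cp := &checkpoint{}
	cp.Mark(ref, "issues") // pretend an interrupted run already finished issues

	for _, table := range []string{"issues", "comments", "reviews"} {
		if cp.IsDone(ref, table) {
			fmt.Println("skip  ", table)
			continue
		}
		fmt.Println("export", table)
		cp.Mark(ref, table)
	}
}
```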
type Comment ¶
type Comment struct {
ID int64 `json:"id"`
IssueNumber int64 `json:"issue_number"`
Author string `json:"author"`
Body string `json:"body"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
Reactions map[string]int `json:"reactions,omitempty"`
AuthorAssociation string `json:"author_association,omitempty"`
}
Comment is one conversation comment. ID is the dataset id, used to order the db_id allocation deterministically; IssueNumber joins it to its issue.
type CommitStatus ¶
type CommitStatus struct {
SHA string `json:"sha"`
Context string `json:"context"`
State string `json:"state"`
Description string `json:"description,omitempty"`
TargetURL string `json:"target_url,omitempty"`
CreatedAt time.Time `json:"created_at"`
}
CommitStatus is one external pass/fail report against a head sha under a context. An automation-heavy repo carries many contexts per sha.
type Corpus ¶
type Corpus struct {
Repo RepoRef `json:"repo"`
Issues []Issue `json:"issues"`
PullRequests []PullRequest `json:"pull_requests"`
Comments []Comment `json:"comments"`
Reviews []Review `json:"reviews"`
ReviewComments []ReviewComment `json:"review_comments"`
TimelineEvents []TimelineEvent `json:"timeline_events"`
PRFiles []PRFile `json:"pr_files"`
CommitStatuses []CommitStatus `json:"commit_statuses"`
}
Corpus is the normalized metadata of one repository: the eight tables the public dataset and the GraphQL export both reduce to, with people named by login string and cross-references named by number or id. Stage B resolves the logins to user pks and the numbers to row pks as it seeds. A Corpus is the unit Stage A writes and Stage B reads; one snapshot directory holds one Corpus per repository.
func ReadCorpus ¶
ReadCorpus reads one repo's corpus back from a snapshot directory.
type DropNote ¶
type DropNote struct {
What string `json:"what"`
Count int `json:"count,omitempty"`
Reason string `json:"reason"`
}
DropNote records one bounded or skipped piece of a corpus build, with the reason, so coverage is never silently capped.
type Exporter ¶
type Exporter interface {
// Export returns the corpus for ref, or ErrRequiresNetwork when the source
// is not reachable in this environment.
Export(ctx context.Context, ref RepoRef) (*Corpus, error)
// Source names the exporter for logs and the manifest provenance.
Source() string
}
Exporter produces the metadata corpus for one repository from one source.
type FixtureExporter ¶
type FixtureExporter struct {
Root string
}
FixtureExporter reads a corpus back from a snapshot directory. It is the offline source: a previously exported snapshot, or a small checked-in fixture, re-read as if freshly exported. It is the exporter the tests and the seed-only CLI path use.
func (FixtureExporter) Source ¶
func (e FixtureExporter) Source() string
Source identifies the fixture exporter.
type GitMirrorPlan ¶
GitMirrorPlan is the recipe to mirror a repository's history into a git store, expressed as the commands to run rather than run inline, so the plan is testable and the network/disk-heavy execution is an explicit, separate step. The maintenance pass (repack, bitmap, commit-graph, multi-pack-index) is what makes a freshly cloned giant serve cold reads at the same speed a warmed long-lived repository does.
func (GitMirrorPlan) Commands ¶
func (p GitMirrorPlan) Commands(dest string) [][]string
Commands returns the git invocations the plan runs, in order: a bare mirror clone, a reset of the advertised tip to the pin so a fetch benchmark has real new commits to deliver, and the maintenance pass. dest is the bare repo path in the git store.
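To make the commands-as-data idea concrete, here is a sketch of what such a plan might return. The struct fields, the use of `git update-ref` to move the advertised tip, and the exact maintenance flags are all assumptions; only the general order (clone, reset, maintenance) comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorPlan is a hypothetical stand-in for GitMirrorPlan; its fields are
// assumptions made for this sketch.
type mirrorPlan struct {
	MirrorURL string
	Branch    string
	PinnedSHA string
}

// commands returns argv slices to run in order. Nothing is executed here, so
// the plan can be inspected and tested without network or disk.
func (p mirrorPlan) commands(dest string) [][]string {
	return [][]string{
		// Bare mirror clone into the git store.
		{"git", "clone", "--mirror", p.MirrorURL, dest},
		// Assumed way to reset the advertised tip to the pin.
		{"git", "-C", dest, "update-ref", "refs/heads/" + p.Branch, p.PinnedSHA},
		// Maintenance pass: repack with bitmaps, commit-graph, multi-pack-index.
		{"git", "-C", dest, "repack", "-a", "-d", "--write-bitmap-index"},
		{"git", "-C", dest, "commit-graph", "write", "--reachable"},
		{"git", "-C", dest, "multi-pack-index", "write"},
	}
}

func main() {
	plan := mirrorPlan{
		MirrorURL: "https://example.invalid/example/repo.git", // placeholder URL
		Branch:    "main",
		PinnedSHA: "0123456789abcdef0123456789abcdef01234567", // placeholder pin
	}
	for _, cmd := range plan.commands("/tmp/store/example/repo.git") {
		fmt.Println(strings.Join(cmd, " "))
	}
}
```

Returning argv slices rather than shell strings avoids quoting problems when a caller later hands them to `os/exec`.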
type Issue ¶
type Issue struct {
Number int64 `json:"number"`
NodeID string `json:"node_id,omitempty"`
IsPullRequest bool `json:"is_pull_request"`
Title string `json:"title"`
Body string `json:"body,omitempty"`
State string `json:"state"`
StateReason string `json:"state_reason,omitempty"`
Author string `json:"author"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
ClosedAt *time.Time `json:"closed_at,omitempty"`
Labels []Label `json:"labels,omitempty"`
Assignees []string `json:"assignees,omitempty"`
MilestoneTitle string `json:"milestone_title,omitempty"`
MilestoneNumber int64 `json:"milestone_number,omitempty"`
Reactions map[string]int `json:"reactions,omitempty"`
CommentCount int `json:"comment_count"`
Locked bool `json:"locked,omitempty"`
LockReason string `json:"lock_reason,omitempty"`
}
Issue is one row of the shared issue/PR table: an issue when IsPullRequest is false, the issue half of a pull request when true. Number is the per-repo number preserved verbatim. Reactions are counts per content (`{"+1": 5, "heart": 2}`), materialized into rows against the reactor pool at seed time. NodeID is the dataset's node id, recorded for provenance but never written: Githome mints its own GraphQL ids.
type Label ¶
type Label struct {
Name string `json:"name"`
Color string `json:"color,omitempty"`
Description string `json:"description,omitempty"`
}
Label is one label carried on an issue, deduped per repository at seed time.
type Manifest ¶
type Manifest struct {
Schema int `json:"schema"`
// Note is a human description of this corpus build; it is not load-bearing.
Note string `json:"note,omitempty"`
// DatasetRevision pins the metadata source (the dataset repo commit, or the
// GraphQL export run id). It is the metadata analog of the per-repo SHA.
DatasetRevision string `json:"dataset_revision"`
// FixtureTier names the tier this corpus serves: rw-smoke, rw-meta,
// rw-write, rw-git, or rw-full. Tiers bound how much a CI leg loads.
FixtureTier string `json:"fixture_tier"`
// Pseudonymized is true when logins and bodies were run through the
// pseudonymizer, so the corpus carries no real identities.
Pseudonymized bool `json:"pseudonymized"`
// Reactor records the bounded synthetic reactor pool the seeder materializes
// reaction counts against; reactions are the one MODELED count in a corpus.
Reactor ReactorPool `json:"reactor"`
// Repos is one entry per repository in this corpus.
Repos []RepoManifest `json:"repos"`
// SeederVersion and SchemaVersion pin the tooling and the store schema the
// corpus was built against, the rest of the reproducibility checklist.
SeederVersion string `json:"seeder_version,omitempty"`
SchemaVersion int `json:"schema_version,omitempty"`
// Dropped records anything this build bounded or skipped — a truncated
// table, an unreachable source, a sampled range — so a partial corpus never
// reads as a complete one.
Dropped []DropNote `json:"dropped,omitempty"`
}
Manifest pins a corpus and records what was measured and what was synthesized, so a corpus is reproducible and no reader mistakes a modeled value for a real one. It is the single file that freezes the corpus: the dataset revision and the per-repo git pins are its OFFICIAL anchors, the reactor pool and any pseudonymization are its MODELED notes, and Measured holds the row counts the seeder actually wrote rather than any count asserted up front.
func LoadManifest ¶
LoadManifest reads and validates a manifest from disk.
func NewManifest ¶
NewManifest builds a manifest for a tier with the default reactor pool and the current schema version, ready for the seeder to fill Measured into.
type OpClass ¶
type OpClass string
OpClass is one of the five operation classes the SLOs are stated against. The per-repo mix weights how often each class appears in a synthetic replay.
const (
	// OpXCond is a conditional read: a GET that the client expects to answer
	// 304 from an ETag or since-cursor (the poll flood).
	OpXCond OpClass = "X-cond"
	// OpRMeta is a metadata read: an issue view, a list page, a PR view.
	OpRMeta OpClass = "R-meta"
	// OpRGit is a git read served over HTTP: a tree or blob fetch.
	OpRGit OpClass = "R-git"
	// OpTGit is a git transport operation: a clone or fetch.
	OpTGit OpClass = "T-git"
	// OpWMeta is a metadata write: open an issue, comment, merge a PR, apply a
	// label.
	OpWMeta OpClass = "W-meta"
)
type PRFile ¶
type PRFile struct {
PRNumber int64 `json:"pr_number"`
Path string `json:"path"`
Additions int `json:"additions"`
Deletions int `json:"deletions"`
Status string `json:"status"`
PreviousFilename string `json:"previous_filename,omitempty"`
}
PRFile is one changed file of a pull request. These are not seeded as state: a PR's file list is derived from the git diff at request time. They are kept as a correctness oracle so the diff path can be checked against recorded add/delete counts.
type Provenance ¶
type Provenance string
Provenance records where a corpus value came from, so a reader never mistakes a modeled number for a measured one. It is carried in the manifest per repo and per synthesized field.
const (
	// Official is copied verbatim from the public source: a real number, a real
	// timestamp, a real body.
	Official Provenance = "OFFICIAL"
	// Derived is computed from official data by a documented rule: a comment
	// count recounted from the comments, an event payload rendered from typed
	// columns.
	Derived Provenance = "DERIVED"
	// Modeled is synthesized to hit a realistic shape where the public source
	// has no per-row truth: the reaction reactor pool, the synthetic webhook
	// subscriptions, a pseudonymized login.
	Modeled Provenance = "MODELED"
)
type Pseudonymizer ¶
type Pseudonymizer struct {
// RedactBodies replaces issue, comment, and review bodies with a
// length-preserving placeholder so the corpus carries no real prose while
// the marshaled-payload size stays realistic.
RedactBodies bool
// contains filtered or unexported fields
}
Pseudonymizer rewrites a corpus so it carries no real identities: every person login becomes a stable synthetic handle, and, when RedactBodies is set, every free-text body becomes a length-preserving placeholder. It is a pure transform — same input, same output — so a pseudonymized corpus is as reproducible as the original, and the mapping is recorded so a captured response can be compared field for field.
The repository's own owner and name are not pseudonymized: they identify the repository, not a person, and the manifest already records them. A login that also happens to be the owner is still rewritten where it appears as an author, actor, or assignee, so no real handle survives in the issue and event bodies.
func NewPseudonymizer ¶
func NewPseudonymizer(redactBodies bool) *Pseudonymizer
NewPseudonymizer returns a pseudonymizer with an empty mapping.
func (*Pseudonymizer) Apply ¶
func (p *Pseudonymizer) Apply(c *Corpus) *Corpus
Apply returns a pseudonymized copy of the corpus. The original is left unchanged. Logins are assigned in the corpus's first-seen order so the mapping is deterministic.
func (*Pseudonymizer) Mapping ¶
func (p *Pseudonymizer) Mapping() map[string]string
Mapping returns the login-to-pseudonym map built so far, copied so the caller cannot mutate the pseudonymizer's state.
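The first-seen-order mapping and the length-preserving redaction can be sketched as follows. The handle format (`user-0001`), the filler character, and preserving byte length rather than rune count are assumptions; the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// pseudonymizer is a hypothetical stand-in for the package's Pseudonymizer.
type pseudonymizer struct {
	redactBodies bool
	mapping      map[string]string // real login -> synthetic handle
}

func newPseudonymizer(redactBodies bool) *pseudonymizer {
	return &pseudonymizer{redactBodies: redactBodies, mapping: map[string]string{}}
}

// handle returns a stable synthetic handle, numbered in first-seen order so
// the same input sequence always yields the same mapping.
func (p *pseudonymizer) handle(login string) string {
	if login == "" {
		return ""
	}
	if h, ok := p.mapping[login]; ok {
		return h
	}
	h := fmt.Sprintf("user-%04d", len(p.mapping)+1) // assumed handle format
	p.mapping[login] = h
	return h
}

// body replaces prose with a placeholder of the same byte length, so the
// marshaled payload size stays close to the original.
func (p *pseudonymizer) body(s string) string {
	if !p.redactBodies {
		return s
	}
	return strings.Repeat("x", len(s))
}

func main() {
	p := newPseudonymizer(true)
	for _, login := range []string{"alice", "bob", "alice"} {
		fmt.Println(login, "->", p.handle(login))
	}
	fmt.Println(p.body("fix the parser"))
}
```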
type PullRequest ¶
type PullRequest struct {
Number int64 `json:"number"`
Merged bool `json:"merged"`
MergedAt *time.Time `json:"merged_at,omitempty"`
MergedBy string `json:"merged_by,omitempty"`
MergeCommitSHA string `json:"merge_commit_sha,omitempty"`
BaseRef string `json:"base_ref"`
HeadRef string `json:"head_ref"`
HeadSHA string `json:"head_sha"`
Additions int `json:"additions"`
Deletions int `json:"deletions"`
ChangedFiles int `json:"changed_files"`
Draft bool `json:"draft,omitempty"`
MaintainerCanModify bool `json:"maintainer_can_modify,omitempty"`
}
PullRequest is the PR-only extension joined to an Issue on Number.
type RateBudget ¶
type RateBudget struct {
// contains filtered or unexported fields
}
RateBudget bounds how many API points an export run may spend, the way the GraphQL API meters cost in points rather than requests. It is honest about exhaustion: once the budget is spent, Spend refuses further work so an export stops and checkpoints rather than hammering a throttled endpoint.
func NewRateBudget ¶
func NewRateBudget(total int) *RateBudget
NewRateBudget returns a budget of total points.
func (*RateBudget) Remaining ¶
func (b *RateBudget) Remaining() int
Remaining reports how many points are left, or a large number when the budget is unbounded (total <= 0).
func (*RateBudget) Spend ¶
func (b *RateBudget) Spend(n int) bool
Spend charges n points, returning false when the budget cannot cover them. The caller checkpoints and stops on false rather than proceeding.
func (*RateBudget) Spent ¶
func (b *RateBudget) Spent() int
Spent reports how many points have been charged.
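A point budget with these three methods might look like the sketch below. The type is a hypothetical stand-in; in particular, refusing a charge without deducting anything and returning `math.MaxInt` for an unbounded budget are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// rateBudget is a hypothetical stand-in for RateBudget.
type rateBudget struct {
	total int // total <= 0 means unbounded
	spent int
}

// Spend charges n points, or returns false and charges nothing when the
// remaining budget cannot cover them.
func (b *rateBudget) Spend(n int) bool {
	if b.total > 0 && b.spent+n > b.total {
		return false
	}
	b.spent += n
	return true
}

// Remaining reports the points left; an unbounded budget reports a large number.
func (b *rateBudget) Remaining() int {
	if b.total <= 0 {
		return math.MaxInt
	}
	return b.total - b.spent
}

// Spent reports the points charged so far.
func (b *rateBudget) Spent() int { return b.spent }

func main() {
	b := &rateBudget{total: 10}
	for page := 1; page <= 3; page++ {
		if !b.Spend(4) {
			fmt.Println("budget exhausted before page", page, "- checkpoint and stop")
			break
		}
		fmt.Println("fetched page", page, "remaining", b.Remaining())
	}
}
```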
type ReactorPool ¶
ReactorPool is the synthetic identity pool reaction counts are materialized against. Size bounds how many reactor users exist; Seed fixes the assignment so two builds produce the same rows.
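How a count such as `{"+1": 5}` could turn into five deterministic reaction rows is sketched below. The assignment scheme (an FNV hash of seed, subject, and content choosing a starting reactor, then consecutive reactors) is purely an assumption for illustration, as are the field types; only Size, Seed, and the determinism requirement come from the documentation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// reactorPool is a hypothetical stand-in for ReactorPool; the field types are
// assumptions.
type reactorPool struct {
	Size int
	Seed uint64
}

// reactorsFor picks count distinct reactor indexes for one subject and
// reaction content. The same inputs always give the same indexes.
func (p reactorPool) reactorsFor(subject int64, content string, count int) []int {
	if p.Size <= 0 || count <= 0 {
		return nil
	}
	if count > p.Size {
		count = p.Size // a bounded pool cannot supply more distinct reactors
	}
	h := fnv.New64a()
	fmt.Fprintf(h, "%d/%d/%s", p.Seed, subject, content)
	start := int(h.Sum64() % uint64(p.Size))

	ids := make([]int, count)
	for i := range ids {
		ids[i] = (start + i) % p.Size // consecutive reactors, wrapping around
	}
	return ids
}

func main() {
	pool := reactorPool{Size: 200, Seed: 0x6e7e}
	fmt.Println(pool.reactorsFor(1234, "+1", 5))
	fmt.Println(pool.reactorsFor(1234, "heart", 2))
}
```

Capping at the pool size is the kind of bound a DropNote would be expected to record.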
type ReplayMode ¶
type ReplayMode string
ReplayMode names how a schedule's arrivals were generated.
const (
	// SyntheticMix drives the request mix at a fixed rate; it answers whether the
	// SLOs hold at a chosen size and rate.
	SyntheticMix ReplayMode = "synthetic-mix"
	// TraceDriven replays the real event timeline, time-compressed, so the load
	// carries the real burstiness rather than a smooth arrival.
	TraceDriven ReplayMode = "trace-driven"
)
type ReplayPlan ¶
type ReplayPlan struct {
Repo string `json:"repo"`
Mode ReplayMode `json:"mode"`
Mix RequestMix `json:"mix,omitempty"`
Compression float64 `json:"compression,omitempty"`
ReadWriteRatio int `json:"read_write_ratio,omitempty"`
Requests []ScheduledRequest `json:"requests"`
}
ReplayPlan is the full schedule for one repo, plus the parameters that shaped it, so a run is reproducible and a reviewer can see why the load looks the way it does. Compression and ReadWriteRatio are recorded for the trace-driven mode per the no-silent-caps rule.
func PlanSyntheticMix ¶
func PlanSyntheticMix(repo string, mix RequestMix, count, rps int) ReplayPlan
PlanSyntheticMix builds a fixed-rate schedule that holds the repo's request mix exactly over count requests at rps requests per second. Arrivals are evenly spaced (the harness's open model adds no jitter of its own), and the class of each arrival is assigned by the largest-remainder method so the realized class counts match the mix proportions exactly and deterministically — no RNG, so two builds of the same plan are identical.
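The largest-remainder step can be illustrated on its own. In this sketch `weight` and `apportion` are hypothetical names and the percentages are made up for the example; it assumes whole-number weights that sum to 100, as RequestMix is described.

```go
package main

import (
	"fmt"
	"sort"
)

// weight is a hypothetical (class, percent) pair standing in for one entry of
// a RequestMix.
type weight struct {
	Class   string
	Percent int
}

// apportion splits count requests across classes with the largest-remainder
// method: each class gets the floor of its exact share, then the leftover
// requests go to the classes with the largest fractional remainders.
// Assumes the percents sum to 100.
func apportion(mix []weight, count int) []int {
	if len(mix) == 0 {
		return nil
	}
	quota := make([]int, len(mix))
	rem := make([]int, len(mix))
	order := make([]int, len(mix))
	assigned := 0
	for i, w := range mix {
		quota[i] = count * w.Percent / 100 // floor of the exact share
		rem[i] = count * w.Percent % 100   // remainder, in hundredths
		order[i] = i
		assigned += quota[i]
	}
	// Stable sort keeps mix order on ties, so the result is deterministic.
	sort.SliceStable(order, func(a, b int) bool { return rem[order[a]] > rem[order[b]] })
	for i := 0; i < count-assigned; i++ {
		quota[order[i]]++
	}
	return quota
}

func main() {
	mix := []weight{ // illustrative weights only
		{"X-cond", 50}, {"R-meta", 30}, {"R-git", 10}, {"T-git", 5}, {"W-meta", 5},
	}
	fmt.Println(apportion(mix, 7)) // [4 2 1 0 0]
}
```

With 7 requests the exact shares are 3.5, 2.1, 0.7, 0.35, and 0.35; the floors sum to 5, and the two leftover requests go to the remainders 0.7 and 0.5. Evenly spaced arrivals would then place request i at offset i × (1 second / rps).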
func PlanTraceDriven ¶
func PlanTraceDriven(c *Corpus, compression float64, readWriteRatio int) ReplayPlan
PlanTraceDriven builds a schedule from the corpus's real event timeline. It extracts every state-changing event with its real timestamp (issue opened, comment added, PR merged, every timeline event), sorts by time, and time-compresses by compression so a year of history replays in a tractable wall-clock while the relative burstiness is preserved — a release-day spike stays a spike. Between writes it injects readWriteRatio reads of the repo's read classes, so the replay is not write-only. The compression factor and the ratio are carried on the plan so the run records them.
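Time compression and read injection can be sketched as below. The types are hypothetical stand-ins; treating every event as a `W-meta` write, injecting `R-meta` reads only, and spreading reads evenly across the gap to the next write are assumptions about details the description leaves open. The sketch also assumes `compression > 0`.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// event and scheduled are hypothetical stand-ins for a timeline row and a
// ScheduledRequest.
type event struct{ At time.Time }

type scheduled struct {
	Offset time.Duration
	Class  string
}

// planTrace sorts events by time, divides each offset from the first event by
// compression, and places readsPerWrite reads evenly in the gap before the
// next write. Relative gaps, and so burstiness, are preserved.
func planTrace(events []event, compression float64, readsPerWrite int) []scheduled {
	sorted := append([]event(nil), events...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].At.Before(sorted[j].At) })

	offset := func(i int) time.Duration {
		return time.Duration(float64(sorted[i].At.Sub(sorted[0].At)) / compression)
	}
	var out []scheduled
	for i := range sorted {
		at := offset(i)
		out = append(out, scheduled{Offset: at, Class: "W-meta"})
		if i+1 == len(sorted) {
			break // no following write to space reads against
		}
		gap := offset(i+1) - at
		for r := 1; r <= readsPerWrite; r++ {
			readAt := at + gap*time.Duration(r)/time.Duration(readsPerWrite+1)
			out = append(out, scheduled{Offset: readAt, Class: "R-meta"})
		}
	}
	return out
}

// demoTrace compresses three events spanning a day by a factor of 3600, so
// one hour of history replays in one second.
func demoTrace() []scheduled {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	events := []event{{t0.Add(24 * time.Hour)}, {t0}, {t0.Add(time.Hour)}}
	return planTrace(events, 3600, 1)
}

func main() {
	for _, s := range demoTrace() {
		fmt.Println(s.Offset, s.Class)
	}
}
```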
type RepoManifest ¶
type RepoManifest struct {
Repo RepoRef `json:"repo"`
Provenance Provenance `json:"provenance"`
Rows map[string]int `json:"rows,omitempty"`
// GitBytes and TreeEntries are the measured git artifact; zero until a
// mirror is cloned and measured.
GitBytes int64 `json:"git_bytes,omitempty"`
TreeEntries int `json:"tree_entries,omitempty"`
}
RepoManifest pins one repository's git side and records the measured row counts of its metadata, with the provenance of the whole entry.
type RepoRef ¶
type RepoRef struct {
Owner string `json:"owner"`
Name string `json:"name"`
DefaultBranch string `json:"default_branch"`
MirrorURL string `json:"mirror_url,omitempty"`
PinnedSHA string `json:"pinned_sha,omitempty"`
}
RepoRef identifies one repository in a corpus and pins its git history. The owner and name preserve the real namespace so URLs match real GitHub paths; MirrorURL and PinnedSHA freeze the git side the way DatasetRevision freezes the metadata side.
func ReadRepoRef ¶
ReadRepoRef reads only the pinned RepoRef (repo.json) for one repo in a snapshot, without loading any table. It lets a streaming caller record the pinned git side in a manifest without materializing the corpus; a missing or unreadable repo.json falls back to the ref passed in.
type RequestMix ¶
RequestMix is the per-class weighting of a repo's load, in whole-number percentages that sum to 100. It is a MODELED shape: it is derived from the repo's real read/write character and the subsystem that repo stresses, not measured per request, so it is recorded in the manifest as such.
func MixFor ¶
func MixFor(nwo string) RequestMix
MixFor returns the request mix for a repo by owner/name, falling back to the aggregate platform mix.
type Review ¶
type Review struct {
ID int64 `json:"id"`
PRNumber int64 `json:"pr_number"`
Author string `json:"author"`
State string `json:"state"`
Body string `json:"body,omitempty"`
SubmittedAt *time.Time `json:"submitted_at,omitempty"`
CommitID string `json:"commit_id,omitempty"`
}
Review is one act of reviewing a pull request.
type ReviewComment ¶
type ReviewComment struct {
ID int64 `json:"id"`
PRNumber int64 `json:"pr_number"`
ReviewID int64 `json:"review_id"`
Author string `json:"author"`
Body string `json:"body"`
Path string `json:"path"`
Line *int64 `json:"line,omitempty"`
Side string `json:"side,omitempty"`
DiffHunk string `json:"diff_hunk,omitempty"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
InReplyToID *int64 `json:"in_reply_to_id,omitempty"`
}
ReviewComment is one inline review comment anchored to a diff line. ReviewID joins it to its review; InReplyToID threads a reply under the comment that started a conversation, so threads reassemble.
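Reassembling threads from InReplyToID amounts to grouping replies under the comment that started the conversation. The sketch below uses a trimmed, hypothetical `reviewComment` type and assumes every reply points directly at its thread's first comment, as the description suggests.

```go
package main

import "fmt"

// reviewComment keeps only the fields this sketch needs.
type reviewComment struct {
	ID          int64
	InReplyToID *int64
}

// threadReplies maps each thread-starting comment id to the ids of its
// replies, in input order. A root with no replies maps to an empty slice.
func threadReplies(comments []reviewComment) map[int64][]int64 {
	threads := map[int64][]int64{}
	for _, c := range comments {
		if c.InReplyToID == nil {
			if _, ok := threads[c.ID]; !ok {
				threads[c.ID] = nil // register the root even without replies
			}
			continue
		}
		root := *c.InReplyToID
		threads[root] = append(threads[root], c.ID)
	}
	return threads
}

// ptr returns a pointer to v, for building optional fields in literals.
func ptr(v int64) *int64 { return &v }

func main() {
	comments := []reviewComment{
		{ID: 10},
		{ID: 11, InReplyToID: ptr(10)},
		{ID: 20},
		{ID: 12, InReplyToID: ptr(10)},
	}
	threads := threadReplies(comments)
	fmt.Println(threads[10], threads[20])
}
```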
type ScheduledRequest ¶
type ScheduledRequest struct {
Offset time.Duration `json:"offset"`
Class OpClass `json:"class"`
Repo string `json:"repo"`
Number int64 `json:"number,omitempty"`
}
ScheduledRequest is one planned request: its offset from the start of the replay, its operation class, the repo it targets, and the subject it touches (an issue or PR number for a metadata op, zero for a transport op). It is the unit the load harness fires.
type SeedResult ¶
SeedResult reports what a corpus seed wrote, so the caller can fold it into the manifest as the measured artifact.
func SeedCorpus ¶
func SeedCorpus(ctx context.Context, st *store.Store, c *Corpus, reactor ReactorPool) (*SeedResult, error)
SeedCorpus writes one corpus into a store through the bulk-seed write path, preserving every per-repo number and timestamp. The whole repository loads in one transaction so it lands whole or rolls back. The caller migrates the store first (or passes a migrated one); SeedCorpus does not migrate, so a multi-repo seed shares one schema.
Determinism: the tables are seeded in a fixed order (issues by number, comments and reviews by id, and so on), so the db_id sequence advances the same way on every run, and reactions are materialized against the bounded reactor pool with a fixed assignment, so two seeds of the same corpus produce identical databases.
func SeedSnapshot ¶
func SeedSnapshot(ctx context.Context, st *store.Store, root string, ref RepoRef, reactor ReactorPool) (*SeedResult, error)
SeedSnapshot seeds one repo's corpus straight from its on-disk snapshot, streaming each table from disk and releasing it before the next, so the seeder never holds the whole corpus in memory at once. Peak memory is one table plus the foreign-key resolution maps, not the sum of every table's rows, which is what lets a multi-hundred-thousand-row repo seed without loading gigabytes of bodies into RAM. It is the scale counterpart to SeedCorpus, which takes an already-materialized corpus and is the path the in-memory and pseudonymized flows use.
type TimelineEvent ¶
type TimelineEvent struct {
ID int64 `json:"id"`
IssueNumber int64 `json:"issue_number"`
EventType string `json:"event_type"`
Actor string `json:"actor,omitempty"`
CreatedAt time.Time `json:"created_at"`
LabelName string `json:"label_name,omitempty"`
LabelColor string `json:"label_color,omitempty"`
Assignee string `json:"assignee_login,omitempty"`
Milestone string `json:"milestone_title,omitempty"`
TitleFrom string `json:"title_from,omitempty"`
TitleTo string `json:"title_to,omitempty"`
RefType string `json:"ref_type,omitempty"`
RefNumber int64 `json:"ref_number,omitempty"`
LockReason string `json:"lock_reason,omitempty"`
Data map[string]any `json:"data,omitempty"`
}
TimelineEvent is one lifecycle event. EventType maps to the issue_events event column; the typed columns and Data blob render into the event payload. This is the largest table in an automation-heavy corpus.