flakereport

package

v1.31.0 (latest) | Published: Apr 29, 2026 | License: MIT | Imports: 23 | Imported by: 0

Repository

github.com/temporalio/temporal

Documentation ¶

Overview ¶

Package flakereport implements Bayesian commit bisection for identifying which commit most likely introduced a flaky test in a CI system.

Problem ¶

A flaky test is one whose failure probability changes at some point in the commit history — typically because a code change introduced a race condition, timing sensitivity, or environmental dependency. Given a rolling window of CI runs (each associated with the HEAD commit at the time), we want to rank candidate commits by their posterior probability of being the "transition commit" that caused the flakiness.

Data ¶

The raw input is a set of test runs downloaded from GitHub Actions artifacts. Each run records whether a specific test passed or failed, and is tagged with the workflow RunID. RunIDs are mapped to commit SHAs via the GitHub Actions API (WorkflowRun.HeadSHA). Runs are then grouped into CommitObservation records: for each (test, commit SHA) pair we count the number of passing and failing runs.
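The grouping step can be sketched as follows. This is a minimal illustration, not the package's implementation: the `run`, `counts`, and `key` types and the `groupByCommit` function are hypothetical stand-ins, and silently skipping runs whose RunID has no known commit SHA is an assumption.

```go
package main

import "fmt"

// run is a hypothetical, simplified stand-in for one test execution.
type run struct {
	Test   string
	RunID  int64
	Failed bool
}

// counts mirrors the Passes/Fails fields of CommitObservation.
type counts struct {
	Passes, Fails int
}

// key identifies one (test, commit SHA) pair.
type key struct {
	Test, SHA string
}

// groupByCommit tallies passes and failures per (test, commit) pair.
// Runs whose RunID has no known commit SHA are skipped (an assumption).
func groupByCommit(runs []run, shaByRun map[int64]string) map[key]counts {
	out := make(map[key]counts)
	for _, r := range runs {
		sha, ok := shaByRun[r.RunID]
		if !ok {
			continue
		}
		k := key{Test: r.Test, SHA: sha}
		c := out[k]
		if r.Failed {
			c.Fails++
		} else {
			c.Passes++
		}
		out[k] = c
	}
	return out
}

func main() {
	shaByRun := map[int64]string{1: "aaa", 2: "aaa", 3: "bbb"}
	runs := []run{
		{"TestFoo", 1, false},
		{"TestFoo", 2, true},
		{"TestFoo", 3, true},
		{"TestFoo", 4, true}, // RunID 4 has no known SHA, so it is skipped
	}
	obs := groupByCommit(runs, shaByRun)
	fmt.Println(obs[key{"TestFoo", "aaa"}], obs[key{"TestFoo", "bbb"}])
}
```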

Model ¶

The inference model assumes exactly one transition commit c in the history:

- Before commit c: the test has a background failure probability p_before.
- At commit c and all later commits: the test has an elevated failure probability p_after.

Neither p_before nor p_after is known, so both are assigned independent uniform (Beta(1,1)) priors. The model treats them as nuisance parameters and marginalizes them out, yielding the Beta-Binomial marginal likelihood:

P(data | transition at c) = BetaBinomial(n_before, k_before; 1, 1)
                           × BetaBinomial(n_after,  k_after;  1, 1)

where n_before / k_before are the total runs / failures before commit c, and n_after / k_after are the total runs / failures at and after commit c. With Beta(a, b) denoting the Beta function Γ(a)Γ(b)/Γ(a+b), the closed form is:

BetaBinomial(n, k; α, β) = Beta(k+α, n-k+β) / Beta(α, β)
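The closed form can be checked by integrating the per-run Bernoulli likelihood against the Beta prior. The derivation below assumes the likelihood is taken over the observed sequence of runs, which is why no binomial coefficient appears; that reading is inferred from the formula above rather than stated by the package.

```latex
\begin{aligned}
P(\text{data} \mid \alpha, \beta)
  &= \int_0^1 p^{k}(1-p)^{n-k}\,
     \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}\,dp
   = \frac{B(k+\alpha,\; n-k+\beta)}{B(\alpha,\beta)}, \\
\text{and for } \alpha=\beta=1:\quad
  &\frac{B(k+1,\; n-k+1)}{B(1,1)} = \frac{k!\,(n-k)!}{(n+1)!}.
\end{aligned}
```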

All arithmetic is performed in log-space to avoid floating-point underflow, with the log-sum-exp trick applied during normalization.
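A log-space evaluation of this marginal likelihood might look like the sketch below. It uses `math.Lgamma` from the standard library; the function names `logBeta` and `logMarginal` are hypothetical and not taken from the package.

```go
package main

import (
	"fmt"
	"math"
)

// logBeta returns log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a+b).
// The sign results of math.Lgamma are ignored because a and b are positive here.
func logBeta(a, b float64) float64 {
	la, _ := math.Lgamma(a)
	lb, _ := math.Lgamma(b)
	lab, _ := math.Lgamma(a + b)
	return la + lb - lab
}

// logMarginal returns log[ B(k+alpha, n-k+beta) / B(alpha, beta) ],
// the log marginal likelihood of k failures in n runs.
func logMarginal(n, k int, alpha, beta float64) float64 {
	return logBeta(float64(k)+alpha, float64(n-k)+beta) - logBeta(alpha, beta)
}

func main() {
	// With a uniform prior, 1 failure in 2 runs gives B(2,2)/B(1,1) = 1/6.
	fmt.Printf("%.4f\n", math.Exp(logMarginal(2, 1, 1, 1)))
}
```

Working with `Lgamma` keeps the values finite even for thousands of runs, where the raw Beta function would underflow to zero.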

Commit Priors ¶

The prior probability that a given commit is the transition commit is not uniform across all commits. Commits that only touch documentation, CI configuration, or test files cannot plausibly affect production code behaviour, so they receive a reduced prior weight (down to 0.05×). Commits that touch source code receive the default weight of 1.0. The changed-file lists that drive these heuristic weights are fetched from the GitHub API in parallel, and the weights are applied before running the inference.
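One way such a heuristic could be written is sketched below. The path patterns, the single 0.05 weight, and the names `isLowImpact` and `priorWeight` are all assumptions made for illustration; the package's actual rules may differ and may use intermediate weights.

```go
package main

import (
	"fmt"
	"strings"
)

// isLowImpact reports whether a path matches one of the hypothetical
// "cannot affect production code" patterns used in this sketch.
func isLowImpact(path string) bool {
	return strings.HasPrefix(path, ".github/") ||
		strings.HasPrefix(path, "docs/") ||
		strings.HasSuffix(path, ".md") ||
		strings.HasSuffix(path, "_test.go")
}

// priorWeight returns 0.05 when every changed file is low impact and
// 1.0 otherwise. An empty file list keeps the default weight (an assumption).
func priorWeight(files []string) (weight float64, note string) {
	if len(files) == 0 {
		return 1.0, ""
	}
	for _, f := range files {
		if !isLowImpact(f) {
			return 1.0, ""
		}
	}
	return 0.05, "only touches docs, CI, or test files; deprioritized"
}

func main() {
	w, note := priorWeight([]string{".github/workflows/ci.yml", "README.md"})
	fmt.Println(w, note)
	w, _ = priorWeight([]string{"service/history/handler.go", "README.md"})
	fmt.Println(w)
}
```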

Inference Algorithm ¶

For N commits with observations, the algorithm runs in O(N) time using prefix sums:

1. Compute prefix sums of failures and passes across the commit sequence.
2. For each candidate transition commit i, use the prefix sums to split the data into "before" and "after" segments and evaluate the log marginal likelihood.
3. Add the log prior weight to each log-likelihood.
4. Normalize via log-sum-exp to obtain posterior probabilities that sum to 1.
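The four steps can be sketched end to end as below. This is an illustrative reconstruction under stated assumptions, not the package's code: the `obs` type and the `posterior`, `logBeta`, and `logML` functions are hypothetical names, commits are assumed to be in chronological order, and every prior weight is assumed to be strictly positive.

```go
package main

import (
	"fmt"
	"math"
)

// obs is a hypothetical stand-in for CommitObservation.
type obs struct {
	Passes, Fails int
	Prior         float64 // must be > 0 in this sketch
}

// logBeta returns log B(a, b) via log-gamma.
func logBeta(a, b float64) float64 {
	la, _ := math.Lgamma(a)
	lb, _ := math.Lgamma(b)
	lab, _ := math.Lgamma(a + b)
	return la + lb - lab
}

// logML is the uniform-prior log marginal likelihood of k failures in n runs.
// log B(1, 1) is zero, so the denominator drops out.
func logML(n, k int) float64 {
	return logBeta(float64(k)+1, float64(n-k)+1)
}

// posterior returns P(transition at commit i | data) for each commit.
func posterior(commits []obs) []float64 {
	n := len(commits)
	// Step 1: prefix sums; runs[i] and fails[i] cover commits[0:i].
	runs := make([]int, n+1)
	fails := make([]int, n+1)
	for i, c := range commits {
		runs[i+1] = runs[i] + c.Passes + c.Fails
		fails[i+1] = fails[i] + c.Fails
	}
	// Steps 2 and 3: log marginal likelihood plus log prior per candidate.
	logPost := make([]float64, n)
	maxLog := math.Inf(-1)
	for i, c := range commits {
		nb, kb := runs[i], fails[i]           // strictly before commit i
		na, ka := runs[n]-nb, fails[n]-kb     // at and after commit i
		logPost[i] = logML(nb, kb) + logML(na, ka) + math.Log(c.Prior)
		maxLog = math.Max(maxLog, logPost[i])
	}
	// Step 4: log-sum-exp normalization, shifting by the maximum.
	sum := 0.0
	for _, lp := range logPost {
		sum += math.Exp(lp - maxLog)
	}
	post := make([]float64, n)
	for i, lp := range logPost {
		post[i] = math.Exp(lp-maxLog) / sum
	}
	return post
}

func main() {
	// Two clean commits followed by one where half the runs fail.
	post := posterior([]obs{{10, 0, 1}, {10, 0, 1}, {5, 5, 1}})
	for i, p := range post {
		fmt.Printf("commit %d: %.4f\n", i, p)
	}
}
```

Each candidate costs a constant number of log-gamma evaluations once the prefix sums exist, which is where the O(N) bound comes from.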

Output ¶

Commits are ranked by posterior probability. Only commits above a configurable minimum probability threshold (default 0.50) are reported. Results are surfaced in the GitHub Actions step summary, a markdown report artifact, and optionally a Slack message listing the "hottest" commits (those with the highest aggregate posterior probability across all analyzed tests).
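A sketch of the ranking and "hottest commit" aggregation follows. The `suspect` type and both functions are hypothetical, and treating "aggregate posterior probability" as a plain sum across tests is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// suspect is a hypothetical, trimmed-down BisectResult.
type suspect struct {
	SHA         string
	Probability float64
}

// topSuspects keeps suspects whose probability exceeds minProb,
// sorted by probability in descending order.
func topSuspects(all []suspect, minProb float64) []suspect {
	var kept []suspect
	for _, s := range all {
		if s.Probability > minProb {
			kept = append(kept, s)
		}
	}
	sort.Slice(kept, func(i, j int) bool {
		return kept[i].Probability > kept[j].Probability
	})
	return kept
}

// hottest sums posterior probability per commit across all tests
// (summation is an assumption) and sorts the totals in descending order.
func hottest(perTest map[string][]suspect) []suspect {
	total := make(map[string]float64)
	for _, suspects := range perTest {
		for _, s := range suspects {
			total[s.SHA] += s.Probability
		}
	}
	out := make([]suspect, 0, len(total))
	for sha, p := range total {
		out = append(out, suspect{SHA: sha, Probability: p})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Probability != out[j].Probability {
			return out[i].Probability > out[j].Probability
		}
		return out[i].SHA < out[j].SHA // deterministic tie-break
	})
	return out
}

func main() {
	fmt.Println(topSuspects([]suspect{{"a", 0.2}, {"b", 0.9}, {"c", 0.6}}, 0.5))
	fmt.Println(hottest(map[string][]suspect{
		"TestA": {{"x", 0.5}, {"y", 0.25}},
		"TestB": {{"x", 0.25}},
	}))
}
```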

Limitations ¶

The model assumes a single transition in the observed window. If multiple commits each contributed independently to flakiness, or if the failure rate oscillates, the model may identify the wrong commit or produce a diffuse posterior with no strong suspect. The model also requires a minimum of 5 failures and 30 total runs before it will attempt bisection on a test, to avoid over-fitting noisy data.

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func NewCliApp ¶

func NewCliApp() *cli.App

NewCliApp creates a new instance of the CLI application.

Types ¶

type ArtifactJob ¶

type ArtifactJob struct {
	Repo         string
	RunID        int64
	RunCreatedAt time.Time
	Artifact     WorkflowArtifact
	TempDir      string
	RunNumber    int
	TotalRuns    int
	ArtifactNum  int
}

ArtifactJob represents a job to download and process an artifact

type ArtifactResult ¶

type ArtifactResult struct {
	Failures []TestFailure
	AllRuns  []TestRun
	Error    error
}

ArtifactResult represents the result of processing an artifact

type ArtifactsResponse ¶

type ArtifactsResponse struct {
	TotalCount int                `json:"total_count"`
	Artifacts  []WorkflowArtifact `json:"artifacts"`
}

ArtifactsResponse represents the GitHub API response for artifacts

type BisectConfig ¶

type BisectConfig struct {
	Repo           string
	TopN           int // max tests to analyze; 0 = all qualifying tests
	MinFailures    int
	MinRuns        int
	MinProbability float64 // only report tests whose top suspect exceeds this (0–1); 0 = report all
}

BisectConfig holds configuration for a bisect analysis run.

type BisectResult ¶

type BisectResult struct {
	CommitSHA     string
	CommitIdx     int
	Probability   float64 // posterior P(this commit introduced the flakiness)
	PassesBefore  int
	FailsBefore   int
	PassesAfter   int
	FailsAfter    int
	CommitTitle   string
	CommitAuthor  string
	CommitDate    string // formatted date of the commit, e.g. "2024-01-15"
	HeuristicNote string // e.g. "only touches .github/ — deprioritized"
}

BisectResult is one candidate culprit commit with its posterior probability.

type CommitMeta ¶

type CommitMeta struct {
	SHA         string
	Title       string
	Author      string
	CommittedAt time.Time
	Files       []string // relative paths of changed files
}

CommitMeta holds commit metadata and changed-file info fetched from the GitHub API endpoint GET /repos/{owner}/{repo}/commits/{sha}.

type CommitObservation ¶

type CommitObservation struct {
	CommitSHA     string
	CommitIdx     int     // chronological index (0 = oldest)
	Prior         float64 // prior weight (1.0 = uniform; adjusted by heuristics)
	HeuristicNote string  // reason for prior adjustment, if any
	Passes        int
	Fails         int
}

CommitObservation holds aggregated pass/fail data for a single (test, commit) pair.

type FailedTestRecord ¶

type FailedTestRecord struct {
	SuiteName   string `json:"suite_name"`
	TestName    string `json:"test_name"`
	FailureDate string `json:"failure_date"`
	Link        string `json:"link"`
	FailureType string `json:"failure_type"`
}

FailedTestRecord represents a single test failure for the failures.json analytics export

type ReportSummary ¶

type ReportSummary struct {
	FlakyTests         []TestReport
	Timeouts           []TestReport  // Tests ending with "(timeout)"
	Crashes            []TestReport  // Tests containing "crash"
	CIBreakers         []TestReport  // Tests that failed all retries (3x) in a single job
	Suites             []SuiteReport // Per-suite flake breakdown
	TotalFailures      int           // Total raw failure count
	TotalTestRuns      int           // Total test executions (all tests, all runs)
	OverallFailureRate float64       // Overall failures per 1000 test runs
	TotalFlakyCount    int           // Total flaky tests (not just top 10)
	TotalWorkflowRuns  int           // Total workflow runs analyzed
	SuccessfulRuns     int           // Workflow runs that succeeded
}

ReportSummary contains all processed report data

type SlackBlock ¶

type SlackBlock struct {
	Type   string      `json:"type"`
	Text   *SlackText  `json:"text,omitempty"`
	Fields []SlackText `json:"fields,omitempty"`
}

type SlackMessage ¶

type SlackMessage struct {
	Text   string       `json:"text"`
	Blocks []SlackBlock `json:"blocks"`
}

SlackMessage represents a Slack Block Kit message.

type SlackText ¶

type SlackText struct {
	Type string `json:"type"`
	Text string `json:"text"`
}
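As an illustration of how the three Slack types fit together, the sketch below builds a message with one section block per line and serializes it. The type definitions are copied from this page; `sectionMessage` and `render` are hypothetical helpers, and the `"section"` and `"mrkdwn"` values are standard Slack Block Kit type names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type SlackText struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

type SlackBlock struct {
	Type   string      `json:"type"`
	Text   *SlackText  `json:"text,omitempty"`
	Fields []SlackText `json:"fields,omitempty"`
}

type SlackMessage struct {
	Text   string       `json:"text"`
	Blocks []SlackBlock `json:"blocks"`
}

// sectionMessage builds a message with one mrkdwn section block per line.
// fallback is the plain-text summary carried in the top-level Text field.
func sectionMessage(fallback string, lines []string) SlackMessage {
	msg := SlackMessage{Text: fallback}
	for _, l := range lines {
		msg.Blocks = append(msg.Blocks, SlackBlock{
			Type: "section",
			Text: &SlackText{Type: "mrkdwn", Text: l},
		})
	}
	return msg
}

// render serializes the message to the JSON payload form.
func render(msg SlackMessage) (string, error) {
	b, err := json.Marshal(msg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := render(sectionMessage("Flaky test suspects", []string{"*abc1234* 0.97"}))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because Fields carries `omitempty`, a block that only sets Text serializes without a `fields` key.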

type SuiteReport ¶

type SuiteReport struct {
	SuiteName   string    // Test suite name from JUnit XML
	FlakeRate   float64   // Percentage of job executions with at least one non-retry failure
	FailedRuns  int       // Number of job executions with at least one non-retry failure
	TotalRuns   int       // Total number of job executions where this suite appeared
	LastFailure time.Time // Timestamp of the most recent failure
}

SuiteReport represents aggregated flake data for a test suite

type TestBisectReport ¶

type TestBisectReport struct {
	TestName    string
	TopSuspects []BisectResult // sorted by Probability descending
	TotalObs    int            // total observations (pass + fail) used
	Skipped     bool           // true if below signal or confidence threshold
}

TestBisectReport is the full bisect output for a single test.

type TestFailure ¶

type TestFailure struct {
	ClassName  string    // Test class/module name
	Name       string    // Test function name
	SuiteName  string    // Top-level test suite name
	ArtifactID string    // Artifact identifier from GitHub
	RunID      int64     // GitHub Actions run ID
	JobID      string    // GitHub Actions job ID (or "unknown")
	MatrixName string    // DB config name from artifact name (e.g. "sqlite", "cassandra")
	Timestamp  time.Time // When the workflow run was created
}

TestFailure represents a single test failure extracted from JUnit XML

type TestReport ¶

type TestReport struct {
	TestName     string    // Normalized test name (retry suffix stripped)
	FailureCount int       // Total number of failures
	TotalRuns    int       // Total number of times this test ran (including successes)
	GitHubURLs   []string  // Up to max_links failure URLs
	LastFailure  time.Time // Timestamp of the most recent failure
}

TestReport represents aggregated failures for a single test

type TestRun ¶

type TestRun struct {
	SuiteName  string // Top-level test suite name
	Name       string // Test name
	Failed     bool   // Whether the test failed
	Skipped    bool   // Whether the test was skipped
	RunID      int64  // Workflow run ID
	JobID      string // GitHub Actions job ID (unique per matrix job/shard)
	MatrixName string // DB config name from artifact name (e.g. "sqlite", "cassandra")
}

TestRun represents a test execution (success or failure)

type WorkflowArtifact ¶

type WorkflowArtifact struct {
	ID        int64     `json:"id"`
	Name      string    `json:"name"`
	CreatedAt time.Time `json:"created_at"`
	Expired   bool      `json:"expired"`
}

WorkflowArtifact represents a downloadable artifact

type WorkflowRun ¶

type WorkflowRun struct {
	ID         int64     `json:"id"`
	Number     int       `json:"run_number"`
	CreatedAt  time.Time `json:"created_at"`
	Status     string    `json:"status"`
	Conclusion string    `json:"conclusion"`
	HeadBranch string    `json:"head_branch"`
	HeadSHA    string    `json:"head_sha"`
}

WorkflowRun represents a GitHub Actions workflow run
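The RunID to commit SHA mapping mentioned in the overview could be built from a decoded API response roughly as follows. The WorkflowRun definition is copied from this page; the `runsResponse` wrapper and `shaByRunID` function are hypothetical, with the wrapper's field names following the `total_count` / `workflow_runs` shape of GitHub's list-workflow-runs response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type WorkflowRun struct {
	ID         int64     `json:"id"`
	Number     int       `json:"run_number"`
	CreatedAt  time.Time `json:"created_at"`
	Status     string    `json:"status"`
	Conclusion string    `json:"conclusion"`
	HeadBranch string    `json:"head_branch"`
	HeadSHA    string    `json:"head_sha"`
}

// runsResponse is a hypothetical wrapper for a list-workflow-runs response.
type runsResponse struct {
	TotalCount   int           `json:"total_count"`
	WorkflowRuns []WorkflowRun `json:"workflow_runs"`
}

// shaByRunID decodes a response body and maps each run ID to its head SHA.
func shaByRunID(body []byte) (map[int64]string, error) {
	var resp runsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make(map[int64]string, len(resp.WorkflowRuns))
	for _, r := range resp.WorkflowRuns {
		out[r.ID] = r.HeadSHA
	}
	return out, nil
}

func main() {
	body := []byte(`{"total_count":2,"workflow_runs":[
		{"id":101,"run_number":7,"created_at":"2026-04-01T12:00:00Z",
		 "status":"completed","conclusion":"success","head_branch":"main","head_sha":"aaa111"},
		{"id":102,"run_number":8,"created_at":"2026-04-01T13:00:00Z",
		 "status":"completed","conclusion":"failure","head_branch":"main","head_sha":"bbb222"}]}`)
	m, err := shaByRunID(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(m[101], m[102])
}
```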
