couples

package
v1.0.0-rc.1 Latest
Warning

This package is not in the latest version of its module.

Published: May 13, 2026 License: Apache-2.0, Apache-2.0 Imports: 28 Imported by: 0

README

Couples Analysis

Preface

In a software system, components that change together stay together. This logical coupling often differs from the static dependency coupling visible in the code.

Problem

Sometimes two files are logically coupled (e.g., a frontend view and a backend API handler) but have no direct static reference. When one changes, the other must change. If a developer forgets this, bugs occur.

How the analyzer solves it

The Couples analyzer looks at the commit history to find files that are frequently modified in the same commit ("co-changed"). It also looks at developers who work on the same files.

Historical context

Logical coupling (also called evolutionary coupling) analysis has been a research topic since the late 1990s. It reveals hidden dependencies that static analysis misses.

Real world examples

  • Hidden Dependencies: Discovering that changing Config.java almost always requires changing DeployScript.sh.
  • Team Coordination: Identifying developers who should coordinate because they frequently touch the same code areas.

How the analyzer works here

  1. Commit Analysis: For each commit, it lists the set of changed files.
  2. Co-occurrence Matrix: It builds a matrix where matrix[fileA][fileB] counts how many commits included both fileA and fileB.
  3. Developer Coupling: Similarly, it tracks which developers edit the same files, building a developer-developer interaction matrix.
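
The first two steps above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the package's implementation: the function name coChangeMatrix is hypothetical, each commit is modeled as a plain list of changed paths with no duplicates, and the diagonal entry matrix[f][f] is assumed to count the commits that touched f.

```go
package main

import "fmt"

// coChangeMatrix counts, for every ordered pair of files, how many commits
// changed both. Each commit is given as the list of file paths it changed
// (assumed to contain no duplicates).
func coChangeMatrix(commits [][]string) map[string]map[string]int {
	matrix := make(map[string]map[string]int)
	for _, files := range commits {
		for _, a := range files {
			if matrix[a] == nil {
				matrix[a] = make(map[string]int)
			}
			for _, b := range files {
				// When a == b this increments the diagonal, i.e. the
				// number of commits that touched the file at all.
				matrix[a][b]++
			}
		}
	}
	return matrix
}

func main() {
	commits := [][]string{
		{"Config.java", "DeployScript.sh"},
		{"Config.java", "DeployScript.sh", "README.md"},
		{"README.md"},
	}
	m := coChangeMatrix(commits)
	fmt.Println(m["Config.java"]["DeployScript.sh"]) // 2
	fmt.Println(m["Config.java"]["README.md"])       // 1
	fmt.Println(m["README.md"]["README.md"])         // 2
}
```

The nested loop is quadratic in the number of files per commit, which is why very large commits dominate both time and memory.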

Limitations

  • Commit Granularity: It assumes that atomic commits represent logical units of work. "Squash commits" or bad commit discipline can skew results.
  • Matrix Size: The file matrix can get very large (O(N²) entries for N files) in large repositories.

Further plans

  • Temporal coupling: detecting files changed within a short time window but not necessarily in the same commit.

Documentation

Overview

Package couples provides the couples analyzer, which identifies co-change coupling between files and developers from commit history.

Constants

const (
	ConfigCouplesCouplingThresholdHigh      = "Couples.CouplingThresholdHigh"
	ConfigCouplesOwnershipFewThreshold      = "Couples.OwnershipFewThreshold"
	ConfigCouplesOwnershipModerateThreshold = "Couples.OwnershipModerateThreshold"
	ConfigCouplesBatchCouplingThreshold     = "Couples.BatchCouplingThreshold"
	ConfigCouplesHLLPrecision               = "Couples.HLLPrecision"
	ConfigCouplesTopKPerFile                = "Couples.TopKPerFile"
	ConfigCouplesMinEdgeWeight              = "Couples.MinEdgeWeight"
)

Configuration option keys for the couples analyzer.

const (
	ReportSectionTitle = "COUPLES"

	MetricTotalFiles      = "Total Files"
	MetricTotalDevelopers = "Total Developers"
	MetricTotalCoChanges  = "Total Co-Changes"
	MetricHighlyCoupled   = "Highly Coupled Pairs"
	MetricAvgCoupling     = "Avg Coupling"

	// DistStrongMin is the minimum coupling strength for "Strong" distribution bucket.
	DistStrongMin   = 0.7
	DistModerateMin = 0.4
	DistWeakMin     = 0.1
	DistLabelStrong = "Strong (>70%)"
	DistLabelMod    = "Moderate (40-70%)"
	DistLabelWeak   = "Weak (10-40%)"
	DistLabelNone   = "Minimal (<10%)"

	// IssueSeverityHighMin is the minimum coupling strength for "high" severity issues.
	IssueSeverityHighMin = 0.7
	IssueSeverityMedMin  = 0.4

	DefaultStatusMsg = "No coupling data available"
)

Section rendering constants.
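
As a rough illustration of how the distribution thresholds and labels above might fit together, the sketch below maps a coupling strength in [0, 1] to a label. The helper distLabel is hypothetical, and treating each minimum as an inclusive lower bound is an assumption; the package's own bucketing logic is not shown on this page.

```go
package main

import "fmt"

// Local copies of the documented thresholds and labels.
const (
	distStrongMin   = 0.7
	distModerateMin = 0.4
	distWeakMin     = 0.1
)

// distLabel returns the distribution label for a coupling strength.
// Assumption: each *Min constant is an inclusive lower bound.
func distLabel(strength float64) string {
	switch {
	case strength >= distStrongMin:
		return "Strong (>70%)"
	case strength >= distModerateMin:
		return "Moderate (40-70%)"
	case strength >= distWeakMin:
		return "Weak (10-40%)"
	default:
		return "Minimal (<10%)"
	}
}

func main() {
	for _, s := range []float64{0.85, 0.5, 0.2, 0.05} {
		fmt.Printf("%.2f -> %s\n", s, distLabel(s))
	}
}
```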

const (
	KindFileCoupling = "file_coupling"
	KindDevMatrix    = "dev_matrix"
	KindOwnership    = "ownership"
	KindAggregate    = "aggregate"
)

Store record kind constants.

const (
	DefaultTopKPerFile   = 100
	DefaultMinEdgeWeight = 2
	DefaultMaxDevs       = 20
)

Default limits for bounded store output.

const (
	// CouplesMaximumMeaningfulContextSize is the maximum number of files in a commit
	// to consider for coupling analysis. Commits exceeding this threshold are skipped
	// because they are typically bulk operations (vendor updates, mass renames,
	// formatting) that produce noise rather than meaningful coupling signal.
	// Memory impact: N files → N² coupling entries × ~200 bytes.
	// At 200 files: 40K entries ≈ 8 MB. At 1000 files: 1M entries ≈ 200 MB.
	CouplesMaximumMeaningfulContextSize = 200
)
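
The memory figures quoted for CouplesMaximumMeaningfulContextSize follow from simple arithmetic, which the sketch below reproduces together with the skip rule. The helper names couplingContext and estimatedBytes are hypothetical, and the 200-byte per-entry cost is the approximation from the comment above, not a measured value.

```go
package main

import "fmt"

const (
	maxContextSize = 200 // mirrors CouplesMaximumMeaningfulContextSize
	bytesPerEntry  = 200 // approximate per-entry cost quoted in the doc comment
)

// couplingContext returns the files to use for coupling analysis, or nil
// when the commit exceeds the threshold and is skipped as a bulk operation.
func couplingContext(files []string) []string {
	if len(files) > maxContextSize {
		return nil
	}
	return files
}

// estimatedBytes approximates the worst-case memory for n files changed in
// one commit: n*n coupling entries at bytesPerEntry each.
func estimatedBytes(n int64) int64 {
	return n * n * bytesPerEntry
}

func main() {
	fmt.Println(estimatedBytes(200))  // 8000000 (about 8 MB)
	fmt.Println(estimatedBytes(1000)) // 200000000 (about 200 MB)
}
```
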
const CouplingThresholdHigh = 10

CouplingThresholdHigh is the coupling strength threshold.

Variables

var ErrInvalidMatrix = errors.New("invalid couples report: expected []map[int]int64 for PeopleMatrix")

ErrInvalidMatrix indicates the report doesn't contain expected matrix data.

var ErrInvalidNames = errors.New("invalid couples report: expected []string for ReversedPeopleDict")

ErrInvalidNames indicates the report doesn't contain expected names data.

var ErrInvalidReversedPeopleDict = errors.New("expected []string for reversedPeopleDict")

ErrInvalidReversedPeopleDict indicates a type assertion failure for reversedPeopleDict.

var ErrUnexpectedAggregator = errors.New("unexpected aggregator type: expected *couples.Aggregator")

ErrUnexpectedAggregator indicates a type assertion failure for the aggregator.

Functions

func FilterTopDevs

func FilterTopDevs(matrix []map[int]int64, names []string, limit int) (filtered []map[int]int64, filteredNames []string)

FilterTopDevs limits a developer coupling matrix to the top N developers ranked by diagonal value (activity). It returns the original data unchanged if the number of developers is already within the limit.
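
To make the ranking and re-indexing concrete, here is a minimal sketch of one way such a filter could work. It is not the package's implementation: the lowercase filterTopDevs is a hypothetical stand-in, and ordering the output by rank (rather than preserving the original order) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// filterTopDevs keeps the `limit` developers with the largest diagonal value
// and re-indexes rows and columns so the result is a dense limit×limit matrix.
func filterTopDevs(matrix []map[int]int64, names []string, limit int) ([]map[int]int64, []string) {
	if len(matrix) <= limit {
		return matrix, names // already within the limit
	}
	// Rank developer indices by activity (diagonal value), highest first.
	order := make([]int, len(matrix))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return matrix[order[a]][order[a]] > matrix[order[b]][order[b]]
	})
	kept := order[:limit]

	// Map old indices to new, compact indices.
	remap := make(map[int]int, limit)
	for newIdx, oldIdx := range kept {
		remap[oldIdx] = newIdx
	}
	filtered := make([]map[int]int64, limit)
	filteredNames := make([]string, limit)
	for newIdx, oldIdx := range kept {
		if oldIdx < len(names) {
			filteredNames[newIdx] = names[oldIdx]
		}
		row := make(map[int]int64)
		for other, v := range matrix[oldIdx] {
			if n, ok := remap[other]; ok {
				row[n] = v // drop columns for developers that were cut
			}
		}
		filtered[newIdx] = row
	}
	return filtered, filteredNames
}

func main() {
	matrix := []map[int]int64{
		{0: 5, 1: 2, 2: 1},
		{0: 2, 1: 20, 2: 3},
		{0: 1, 1: 3, 2: 10},
	}
	m, n := filterTopDevs(matrix, []string{"alice", "bob", "carol"}, 2)
	fmt.Println(n, m)
}
```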

func GenerateStoreSections

func GenerateStoreSections(reader analyze.ReportReader) ([]plotpage.Section, error)

GenerateStoreSections reads pre-computed coupling data from a ReportReader and builds the same plot sections as GenerateSections, without materializing a full Report or dense O(N²) matrix.

func RegisterPlotSections

func RegisterPlotSections()

RegisterPlotSections registers the couples plot section renderer with the analyze package.

Types

type AggregateData

type AggregateData struct {
	TotalFiles          int     `json:"total_files"           yaml:"total_files"`
	TotalDevelopers     int     `json:"total_developers"      yaml:"total_developers"`
	TotalCoChanges      int64   `json:"total_co_changes"      yaml:"total_co_changes"`
	AvgCouplingStrength float64 `json:"avg_coupling_strength" yaml:"avg_coupling_strength"`
	HighlyCoupledPairs  int     `json:"highly_coupled_pairs"  yaml:"highly_coupled_pairs"`
}

AggregateData contains summary statistics.
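
The struct tags determine the serialized field names. The sketch below uses a local copy of the struct (JSON tags only, so it needs nothing beyond the standard library) to show the resulting shape; the helper encodeAggregate and the sample values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aggregateData is a local copy of AggregateData with its JSON tags.
type aggregateData struct {
	TotalFiles          int     `json:"total_files"`
	TotalDevelopers     int     `json:"total_developers"`
	TotalCoChanges      int64   `json:"total_co_changes"`
	AvgCouplingStrength float64 `json:"avg_coupling_strength"`
	HighlyCoupledPairs  int     `json:"highly_coupled_pairs"`
}

// encodeAggregate serializes the summary to a JSON string.
func encodeAggregate(d aggregateData) (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}

func main() {
	s, err := encodeAggregate(aggregateData{
		TotalFiles:          3,
		TotalDevelopers:     2,
		TotalCoChanges:      7,
		AvgCouplingStrength: 0.25,
		HighlyCoupledPairs:  1,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```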

type AggregateMetric

type AggregateMetric struct {
	metrics.MetricMeta
}

AggregateMetric computes summary statistics.

func NewAggregateMetric

func NewAggregateMetric() *AggregateMetric

NewAggregateMetric creates the aggregate metric.

func (*AggregateMetric) ComputeWithOptions

func (m *AggregateMetric) ComputeWithOptions(input *ReportData, opts MetricOptions) AggregateData

ComputeWithOptions calculates aggregate statistics with configurable thresholds.

type Aggregator

type Aggregator struct {
	// contains filtered or unexported fields
}

Aggregator implements analyze.Aggregator for the couples analyzer. It accumulates the file co-occurrence matrix, per-person file touches, per-person commit counts, and rename tracking from the TC stream.

func (*Aggregator) Add

func (a *Aggregator) Add(tc analyze.TC) error

Add ingests a single per-commit TC into the aggregator.

func (*Aggregator) Close

func (a *Aggregator) Close() error

Close releases all resources. Idempotent.

func (*Aggregator) Collect

func (a *Aggregator) Collect() error

Collect reloads spilled file coupling state back into memory.

func (*Aggregator) DiscardState

func (a *Aggregator) DiscardState()

DiscardState clears all in-memory cumulative state without serialization.

func (*Aggregator) DrainCommitStats

func (a *Aggregator) DrainCommitStats() (stats map[string]any, tickHashes map[int][]gitlib.Hash)

DrainCommitStats implements analyze.CommitStatsDrainer. It extracts and clears per-commit data, returning the same shape as ExtractCommitTimeSeries.

func (*Aggregator) EstimatedStateSize

func (a *Aggregator) EstimatedStateSize() int64

EstimatedStateSize returns the current in-memory footprint in bytes.

func (*Aggregator) FlushAllTicks

func (a *Aggregator) FlushAllTicks() ([]analyze.TICK, error)

FlushAllTicks returns a single TICK containing all accumulated coupling data. Couples accumulates cumulatively across all commits.

func (*Aggregator) FlushTick

func (a *Aggregator) FlushTick(tick int) (analyze.TICK, error)

FlushTick returns the aggregated TICK for the given tick index.

func (*Aggregator) RestoreSpillState

func (a *Aggregator) RestoreSpillState(info analyze.AggregatorSpillInfo)

RestoreSpillState points the aggregator at a previously-saved spill directory.

func (*Aggregator) Spill

func (a *Aggregator) Spill() (int64, error)

Spill writes accumulated file coupling state to disk to free memory.

func (*Aggregator) SpillState

func (a *Aggregator) SpillState() analyze.AggregatorSpillInfo

SpillState returns the current on-disk spill state for checkpoint persistence.

type CommitData

type CommitData struct {
	// CouplingFiles is the list of files forming the coupling context
	// (already filtered by CouplesMaximumMeaningfulContextSize).
	CouplingFiles []string
	// AuthorFiles maps file name to touch count for this commit's author.
	AuthorFiles map[string]int
	// Renames holds rename pairs detected in this commit.
	Renames []RenamePair
	// CommitCounted is true when this commit incremented the author's commit count.
	CommitCounted bool
}

CommitData is the per-commit TC payload emitted by Consume(). It captures the coupling context, author-file touches, renames, and whether the author's commit count was incremented.

type CommitSummary

type CommitSummary struct {
	FilesTouched int `json:"files_touched"`
	AuthorID     int `json:"author_id"`
}

CommitSummary holds per-commit summary data for timeseries output.

type ComputedMetrics

type ComputedMetrics struct {
	FileCoupling      []FileCouplingData      `json:"file_coupling"      yaml:"file_coupling"`
	DeveloperCoupling []DeveloperCouplingData `json:"developer_coupling" yaml:"developer_coupling"`
	FileOwnership     []FileOwnershipData     `json:"file_ownership"     yaml:"file_ownership"`
	Aggregate         AggregateData           `json:"aggregate"          yaml:"aggregate"`
}

ComputedMetrics holds all computed metric results for the couples analyzer.

func ComputeAllMetrics

func ComputeAllMetrics(report analyze.Report) (*ComputedMetrics, error)

ComputeAllMetrics runs all couples metrics and returns the results.

func ComputeAllMetricsWithOptions

func ComputeAllMetricsWithOptions(report analyze.Report, opts MetricOptions) (*ComputedMetrics, error)

ComputeAllMetricsWithOptions runs all couples metrics with configurable thresholds.

func (*ComputedMetrics) AnalyzerName

func (m *ComputedMetrics) AnalyzerName() string

AnalyzerName returns the analyzer identifier.

func (*ComputedMetrics) ToJSON

func (m *ComputedMetrics) ToJSON() any

ToJSON returns the metrics in JSON-serializable format.

func (*ComputedMetrics) ToYAML

func (m *ComputedMetrics) ToYAML() any

ToYAML returns the metrics in YAML-serializable format.

type DeveloperCouplingData

type DeveloperCouplingData struct {
	Developer1      string  `json:"developer1"                 yaml:"developer1"`
	Developer1Email string  `json:"developer1_email,omitempty" yaml:"developer1_email,omitempty"`
	Developer2      string  `json:"developer2"                 yaml:"developer2"`
	Developer2Email string  `json:"developer2_email,omitempty" yaml:"developer2_email,omitempty"`
	SharedFiles     int64   `json:"shared_file_changes"        yaml:"shared_file_changes"`
	Strength        float64 `json:"coupling_strength"          yaml:"coupling_strength"`
}

DeveloperCouplingData contains coupling data for a developer pair.

type DeveloperCouplingMetric

type DeveloperCouplingMetric struct {
	metrics.MetricMeta
}

DeveloperCouplingMetric computes developer collaboration coupling.

func NewDeveloperCouplingMetric

func NewDeveloperCouplingMetric() *DeveloperCouplingMetric

NewDeveloperCouplingMetric creates the developer coupling metric.

func (*DeveloperCouplingMetric) Compute

Compute calculates developer coupling data.

type FileCouplingData

type FileCouplingData struct {
	File1     string  `json:"file1"             yaml:"file1"`
	File2     string  `json:"file2"             yaml:"file2"`
	CoChanges int64   `json:"co_changes"        yaml:"co_changes"`
	Strength  float64 `json:"coupling_strength" yaml:"coupling_strength"`
}

FileCouplingData contains coupling data for a file pair.

type FileCouplingMetric

type FileCouplingMetric struct {
	metrics.MetricMeta
}

FileCouplingMetric computes file co-change coupling.

func NewFileCouplingMetric

func NewFileCouplingMetric() *FileCouplingMetric

NewFileCouplingMetric creates the file coupling metric.

func (*FileCouplingMetric) Compute

func (m *FileCouplingMetric) Compute(input *ReportData) []FileCouplingData

Compute calculates file coupling data.

type FileOwnershipData

type FileOwnershipData struct {
	File           string `json:"file"                      yaml:"file"`
	Lines          int    `json:"lines"                     yaml:"lines"`
	Contributors   int    `json:"contributors"              yaml:"contributors"`
	TopContributor string `json:"top_contributor,omitempty" yaml:"top_contributor,omitempty"`
}

FileOwnershipData contains ownership information for a file.

func SortOwnershipByRisk

func SortOwnershipByRisk(ownership []FileOwnershipData) []FileOwnershipData

SortOwnershipByRisk returns a copy sorted by contributors ascending (highest risk first).
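
The "copy, then sort ascending by contributors" behavior can be sketched as follows. The type fileOwnership and function sortOwnershipByRisk are local, hypothetical stand-ins; using a stable sort to keep the input order among ties is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// fileOwnership is a trimmed local stand-in for FileOwnershipData.
type fileOwnership struct {
	File         string
	Contributors int
}

// sortOwnershipByRisk returns a sorted copy and leaves the input untouched.
// Fewer contributors means higher risk, so those files come first.
func sortOwnershipByRisk(ownership []fileOwnership) []fileOwnership {
	out := make([]fileOwnership, len(ownership))
	copy(out, ownership)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Contributors < out[j].Contributors
	})
	return out
}

func main() {
	in := []fileOwnership{{"api.go", 4}, {"deploy.sh", 1}, {"main.go", 2}}
	fmt.Println(sortOwnershipByRisk(in)) // deploy.sh first
	fmt.Println(in)                      // original order preserved
}
```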

type FileOwnershipMetric

type FileOwnershipMetric struct {
	metrics.MetricMeta
}

FileOwnershipMetric computes file ownership information.

func NewFileOwnershipMetric

func NewFileOwnershipMetric() *FileOwnershipMetric

NewFileOwnershipMetric creates the file ownership metric.

func (*FileOwnershipMetric) Compute

func (m *FileOwnershipMetric) Compute(input *ReportData) []FileOwnershipData

Compute calculates file ownership data using default options.

It uses HyperLogLog sketches per file to estimate contributor cardinality instead of maintaining a map[int]bool per file. This reduces memory from O(F × D) to O(F × 2^p), where p is the HLL precision.
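
For readers unfamiliar with HyperLogLog, the following is a minimal, generic sketch of the technique with 2^p one-byte registers. It is not the package's implementation: the type sketch, the SplitMix64-style mixer used as a hash, and the simplified bias handling (raw estimate plus the standard small-range linear-counting correction) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// sketch is a minimal HyperLogLog with 2^p one-byte registers.
type sketch struct {
	p    uint
	regs []uint8
}

func newSketch(p uint) *sketch {
	return &sketch{p: p, regs: make([]uint8, 1<<p)}
}

// mix64 is the SplitMix64 finalizer, a stand-in hash for integer IDs.
func mix64(x uint64) uint64 {
	x += 0x9e3779b97f4a7c15
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
	x = (x ^ (x >> 27)) * 0x94d049bb133111eb
	return x ^ (x >> 31)
}

// add records one contributor ID. Re-adding the same ID never changes state.
func (s *sketch) add(devID int) {
	h := mix64(uint64(devID))
	idx := h >> (64 - s.p) // top p bits select the register
	rest := h << s.p       // remaining bits determine the rank
	rank := uint8(bits.LeadingZeros64(rest)) + 1
	if limit := uint8(64 - s.p + 1); rank > limit {
		rank = limit
	}
	if rank > s.regs[idx] {
		s.regs[idx] = rank
	}
}

// estimate returns the approximate number of distinct IDs added.
func (s *sketch) estimate() float64 {
	m := float64(len(s.regs))
	sum, zeros := 0.0, 0
	for _, r := range s.regs {
		sum += math.Ldexp(1, -int(r)) // 2^(-r)
		if r == 0 {
			zeros++
		}
	}
	alpha := 0.7213 / (1 + 1.079/m) // standard constant for m >= 128
	e := alpha * m * m / sum
	if e <= 2.5*m && zeros > 0 {
		return m * math.Log(m/float64(zeros)) // linear counting for small sets
	}
	return e
}

func main() {
	s := newSketch(10) // 1024 registers, about 1 KiB per file
	for id := 0; id < 10000; id++ {
		s.add(id)
	}
	fmt.Printf("registers=%d estimate=%.0f\n", len(s.regs), s.estimate())
}
```

The per-file cost is fixed at 2^p bytes regardless of how many developers touch the file, which is the trade the O(F × 2^p) bound describes: bounded memory in exchange for an approximate count.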

func (*FileOwnershipMetric) ComputeWithOptions

func (m *FileOwnershipMetric) ComputeWithOptions(input *ReportData, opts MetricOptions) []FileOwnershipData

ComputeWithOptions calculates file ownership data with configurable HLL precision.

type HistoryAnalyzer

type HistoryAnalyzer struct {
	*analyze.BaseHistoryAnalyzer[*ComputedMetrics]
	common.IdentityMixin

	TreeDiff *plumbing.TreeDiffAnalyzer

	PeopleNumber int

	// TopKPerFile limits the number of file coupling pairs emitted by WriteToStoreFromAggregator.
	TopKPerFile int
	// MinEdgeWeight is the minimum co-change count for an edge to be emitted.
	MinEdgeWeight int64
	// contains filtered or unexported fields
}

HistoryAnalyzer identifies co-change coupling between files and developers.

func NewHistoryAnalyzer

func NewHistoryAnalyzer() *HistoryAnalyzer

NewHistoryAnalyzer creates a new HistoryAnalyzer.

func (*HistoryAnalyzer) ApplySnapshot

func (c *HistoryAnalyzer) ApplySnapshot(snap analyze.PlumbingSnapshot)

ApplySnapshot restores plumbing state from a previously captured snapshot.

func (*HistoryAnalyzer) Boot

func (c *HistoryAnalyzer) Boot() error

Boot restores the analyzer from hibernated state. Re-initializes the merge tracker for the next chunk.

func (*HistoryAnalyzer) CPUHeavy

func (c *HistoryAnalyzer) CPUHeavy() bool

CPUHeavy returns false because coupling analysis is lightweight file-pair bookkeeping.

func (*HistoryAnalyzer) CheckpointSize

func (c *HistoryAnalyzer) CheckpointSize() int64

CheckpointSize returns an estimated size of the checkpoint in bytes.

func (*HistoryAnalyzer) CleanupSpills

func (c *HistoryAnalyzer) CleanupSpills()

CleanupSpills is a no-op. The aggregator owns spill cleanup.

func (*HistoryAnalyzer) Configure

func (c *HistoryAnalyzer) Configure(facts map[string]any) error

Configure sets up the analyzer with the provided facts.

func (*HistoryAnalyzer) Consume

func (c *HistoryAnalyzer) Consume(_ context.Context, ac *analyze.Context) (analyze.TC, error)

Consume processes a single commit and returns a TC with coupling data.

func (*HistoryAnalyzer) CreateReportSection

func (c *HistoryAnalyzer) CreateReportSection(report analyze.Report) analyze.ReportSection

CreateReportSection creates a ReportSection from report data.

func (*HistoryAnalyzer) Description

func (c *HistoryAnalyzer) Description() string

Description returns a human-readable description of the analyzer.

func (*HistoryAnalyzer) Descriptor

func (c *HistoryAnalyzer) Descriptor() analyze.Descriptor

Descriptor returns stable analyzer metadata.

func (*HistoryAnalyzer) ExtractCommitTimeSeries

func (c *HistoryAnalyzer) ExtractCommitTimeSeries(report analyze.Report) map[string]any

ExtractCommitTimeSeries implements analyze.CommitTimeSeriesProvider. It extracts per-commit coupling summary data for the unified timeseries output.

func (*HistoryAnalyzer) Flag

func (c *HistoryAnalyzer) Flag() string

Flag returns the CLI flag for the analyzer.

func (*HistoryAnalyzer) Fork

Fork creates a copy of the analyzer for parallel processing. Each fork gets its own independent copies of mutable state (slices and maps).

func (*HistoryAnalyzer) GenerateSections

func (c *HistoryAnalyzer) GenerateSections(report analyze.Report) (sections []plotpage.Section, err error)

GenerateSections returns the sections for combined reports.

func (*HistoryAnalyzer) Hibernate

func (c *HistoryAnalyzer) Hibernate() error

Hibernate compresses the analyzer's state to reduce memory usage. Clears ephemeral working state that is chunk-scoped.

func (*HistoryAnalyzer) Initialize

func (c *HistoryAnalyzer) Initialize(_ *gitlib.Repository) error

Initialize prepares the analyzer for processing commits.

func (*HistoryAnalyzer) ListConfigurationOptions

func (c *HistoryAnalyzer) ListConfigurationOptions() []pipeline.ConfigurationOption

ListConfigurationOptions returns the configuration options for the analyzer.

func (*HistoryAnalyzer) LoadCheckpoint

func (c *HistoryAnalyzer) LoadCheckpoint(dir string) error

LoadCheckpoint restores the analyzer state from the given directory.

func (*HistoryAnalyzer) MapDependencies

func (c *HistoryAnalyzer) MapDependencies() []string

MapDependencies returns the required plumbing analyzers.

func (*HistoryAnalyzer) Merge

func (c *HistoryAnalyzer) Merge(branches []analyze.HistoryAnalyzer)

Merge combines results from forked analyzer branches.

func (*HistoryAnalyzer) Name

func (c *HistoryAnalyzer) Name() string

Name returns the name of the analyzer.

func (*HistoryAnalyzer) NewAggregator

NewAggregator creates a new aggregator for this analyzer.

func (*HistoryAnalyzer) ReleaseSnapshot

func (c *HistoryAnalyzer) ReleaseSnapshot(_ analyze.PlumbingSnapshot)

ReleaseSnapshot releases any resources owned by the snapshot.

func (*HistoryAnalyzer) SaveCheckpoint

func (c *HistoryAnalyzer) SaveCheckpoint(dir string) error

SaveCheckpoint writes the analyzer state to the given directory.

func (*HistoryAnalyzer) SequentialOnly

func (c *HistoryAnalyzer) SequentialOnly() bool

SequentialOnly returns false because couples analysis can be parallelized.

func (*HistoryAnalyzer) SnapshotPlumbing

func (c *HistoryAnalyzer) SnapshotPlumbing() analyze.PlumbingSnapshot

SnapshotPlumbing captures the current plumbing output state for one commit.

func (*HistoryAnalyzer) WriteToStoreFromAggregator

func (c *HistoryAnalyzer) WriteToStoreFromAggregator(
	ctx context.Context,
	agg analyze.Aggregator,
	w analyze.ReportWriter,
) error

WriteToStoreFromAggregator implements analyze.DirectStoreWriter. It handles Collect() internally with filtered merging: only coupling data for currently existing files is loaded. This avoids materializing the full O(F²) coupling map for historical files (which can reach 50+ GB on Kubernetes). It streams bounded, pre-computed data under four record kinds:

  • "file_coupling": top-K file coupling pairs.
  • "dev_matrix": bounded developer coupling matrix (top-N devs).
  • "ownership": per-file contributor counts.
  • "aggregate": summary statistics.
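
The bounding applied to the "file_coupling" records can be pictured as a per-file top-K selection with a minimum edge weight, in the spirit of TopKPerFile and MinEdgeWeight. The sketch below is an assumption about the general shape of such a filter, not the package's code; the type edge, the function topKEdges, and the tie-breaking rules are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is one directed coupling record: File1 co-changed with File2.
type edge struct {
	File1, File2 string
	CoChanges    int64
}

// topKEdges keeps, for each file, at most k neighbors whose co-change count
// is at least minWeight, preferring the highest counts.
func topKEdges(files map[string]map[string]int64, k int, minWeight int64) []edge {
	var out []edge
	for f1, row := range files {
		var cands []edge
		for f2, n := range row {
			if f1 == f2 || n < minWeight {
				continue // skip the diagonal and weak edges
			}
			cands = append(cands, edge{f1, f2, n})
		}
		sort.Slice(cands, func(i, j int) bool {
			if cands[i].CoChanges != cands[j].CoChanges {
				return cands[i].CoChanges > cands[j].CoChanges
			}
			return cands[i].File2 < cands[j].File2 // deterministic tie-break
		})
		if len(cands) > k {
			cands = cands[:k]
		}
		out = append(out, cands...)
	}
	// Map iteration order is random, so order the result by source file.
	sort.SliceStable(out, func(i, j int) bool { return out[i].File1 < out[j].File1 })
	return out
}

func main() {
	files := map[string]map[string]int64{
		"a": {"a": 5, "b": 4, "c": 1, "d": 3},
		"b": {"a": 4, "b": 4},
	}
	for _, e := range topKEdges(files, 2, 2) {
		fmt.Println(e.File1, e.File2, e.CoChanges)
	}
}
```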

type MetricOptions

type MetricOptions struct {
	CouplingThresholdHigh      int
	OwnershipFewThreshold      int
	OwnershipModerateThreshold int
	BatchCouplingThreshold     int
	HLLPrecision               int
}

MetricOptions holds configurable thresholds for couples metric computation.

func DefaultMetricOptions

func DefaultMetricOptions() MetricOptions

DefaultMetricOptions returns MetricOptions populated with package-level defaults.

type OwnershipBucket

type OwnershipBucket struct {
	Label string `json:"label" yaml:"label"`
	Count int    `json:"count" yaml:"count"`
}

OwnershipBucket categorizes files by their contributor count.

func BucketOwnership

func BucketOwnership(ownership []FileOwnershipData) []OwnershipBucket

BucketOwnership groups file ownership data into contributor count categories.

func BucketOwnershipWithThresholds

func BucketOwnershipWithThresholds(ownership []FileOwnershipData, fewThreshold, moderateThreshold int) []OwnershipBucket

BucketOwnershipWithThresholds groups file ownership data with configurable thresholds.
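
A threshold-based bucketing of contributor counts might look like the sketch below. Everything specific here is an assumption: the bucket labels, the inclusive upper bounds, and the example thresholds (1 and 5) are invented for illustration, and the package's actual labels and defaults are not shown on this page.

```go
package main

import "fmt"

// bucket mirrors the shape of OwnershipBucket.
type bucket struct {
	Label string
	Count int
}

// bucketOwnership groups per-file contributor counts into three categories.
// Assumption: a file with at most `few` contributors is "few", at most
// `moderate` is "moderate", and anything above that is "many".
func bucketOwnership(contributors []int, few, moderate int) []bucket {
	buckets := []bucket{{Label: "few"}, {Label: "moderate"}, {Label: "many"}}
	for _, c := range contributors {
		switch {
		case c <= few:
			buckets[0].Count++
		case c <= moderate:
			buckets[1].Count++
		default:
			buckets[2].Count++
		}
	}
	return buckets
}

func main() {
	// Contributor counts for five hypothetical files.
	fmt.Println(bucketOwnership([]int{1, 1, 3, 5, 9}, 1, 5))
}
```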

type RenamePair

type RenamePair struct {
	FromName string
	ToName   string
}

RenamePair represents a file rename detected in a single commit.

type ReportData

type ReportData struct {
	PeopleMatrix       []map[int]int64
	PeopleFiles        [][]int
	Files              []string
	FilesLines         []int
	FilesMatrix        []map[int]int64
	ReversedPeopleDict []string
}

ReportData is the parsed input data for couples metrics computation.

func ParseReportData

func ParseReportData(report analyze.Report) (*ReportData, error)

ParseReportData extracts ReportData from an analyzer report.

type ReportSection

type ReportSection struct {
	analyze.BaseReportSection
	// contains filtered or unexported fields
}

ReportSection implements analyze.ReportSection for couples analysis.

func NewReportSection

func NewReportSection(report analyze.Report) *ReportSection

NewReportSection creates a ReportSection from a couples report.

func (*ReportSection) AllIssues

func (s *ReportSection) AllIssues() []analyze.Issue

AllIssues returns all coupled file pairs sorted by strength descending.

func (*ReportSection) Distribution

func (s *ReportSection) Distribution() []analyze.DistributionItem

Distribution returns coupling strength distribution categories.

func (*ReportSection) KeyMetrics

func (s *ReportSection) KeyMetrics() []analyze.Metric

KeyMetrics returns the key metrics for the couples section.

func (*ReportSection) TopIssues

func (s *ReportSection) TopIssues(n int) []analyze.Issue

TopIssues returns the top N most coupled file pairs.

type StoreDevMatrix

type StoreDevMatrix struct {
	Names  []string        `json:"names"`
	Matrix []map[int]int64 `json:"matrix"`
}

StoreDevMatrix holds a bounded developer coupling matrix for store serialization.

type TickData

type TickData struct {
	// Files maps file -> otherFile -> co-occurrence count.
	Files map[string]map[string]int
	// People is per-author file touch counts, indexed by author ID.
	People []map[string]int
	// PeopleCommits is per-author commit counts, indexed by author ID.
	PeopleCommits []int
	// Renames accumulated during this tick.
	Renames []RenamePair
	// CommitStats holds per-commit summary data for timeseries output.
	CommitStats map[string]*CommitSummary
}

TickData is the per-tick aggregated payload stored in analyze.TICK.Data.
