sentiment

package
v1.0.0-rc.1
Warning

This package is not in the latest version of its module.

Published: May 13, 2026 | License: Apache-2.0 | Imports: 29 | Imported by: 0

README

Sentiment Analysis

Preface

A development team's mood is often reflected in how it communicates. Code comments can serve as a proxy for team morale and for the team's code-quality culture.

Problem

Burnout, frustration, and toxic environments erode productivity. In a distributed team, these trends are hard to detect without reading every message.

How the analyzer solves it

The sentiment analyzer scans the source code comments introduced in each commit. It classifies the text as Positive, Negative, or Neutral using VADER sentiment analysis enhanced with software engineering domain adjustments.

Historical context

Sentiment analysis (opinion mining) is a subfield of NLP. Applying it to software engineering (SE) data is a growing research area (e.g., "Emotion Mining in Software Engineering"). VADER (Hutto & Gilbert, 2014) is a rule-based model designed for short text; its authors report F1 = 0.96 on social media text. However, it requires domain adjustment for SE contexts, where technical terms like "kill", "abort", and "fatal" are emotionally neutral.

Real world examples

  • Burnout Detection: A trend of increasingly negative comments might indicate team stress.
  • Hotspots: Specific files or modules that trigger negative comments (e.g., "this hack fix again").
  • Technical Debt: Comments containing "workaround", "hack", "kludge" signal accumulated shortcuts.

How the analyzer works

  1. Comment Extraction: Uses UAST to find comment nodes in the added/changed code.
  2. Preprocessing: Cleans up the text (removes code snippets, formatting). Supports Unicode/multilingual comments.
  3. SE-Domain Adjustment: Technical terms that VADER misclassifies are adjusted toward neutral (e.g., "kill process") or toward negative (e.g., "hacky workaround").
  4. Length-Weighted Scoring: Longer comments carry more weight in the final score.
  5. Regression-Based Trend: Fits a linear regression over all ticks instead of comparing only the first and last values, which makes trend detection robust to noise.
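
Step 3 can be illustrated with a toy lexicon. The sketch below is not the package's scorer and does not use VADER: the word lists, the valence values, and the names baseValence, seOverrides, and scoreComment are all hypothetical. It only shows the idea of letting an SE-specific layer take precedence over a general-purpose lexicon.

```go
package main

import (
	"fmt"
	"strings"
)

// baseValence is a toy general-purpose lexicon (hypothetical values in [-1, 1]).
var baseValence = map[string]float64{
	"kill":  -0.75,
	"fatal": -0.75,
	"great": 0.75,
	"hack":  -0.25,
}

// seOverrides is a hypothetical SE-domain layer. It replaces the general
// valence of terms that mean something different in code comments.
var seOverrides = map[string]float64{
	"kill":  0,    // "kill process" is neutral
	"fatal": 0,    // "fatal error" names a severity, not a mood
	"hack":  -0.5, // "hack" signals a shortcut
}

// valence returns the adjusted valence of one lowercase word,
// preferring the SE override when one exists.
func valence(word string) float64 {
	if v, ok := seOverrides[word]; ok {
		return v
	}
	return baseValence[word]
}

// scoreComment averages the adjusted valence of every word found in either
// lexicon. It returns 0 when the comment contains no known word.
func scoreComment(comment string) float64 {
	var sum float64
	var n int
	for _, w := range strings.Fields(strings.ToLower(comment)) {
		w = strings.Trim(w, ".,!?:;\"'()")
		_, inBase := baseValence[w]
		_, inSE := seOverrides[w]
		if !inBase && !inSE {
			continue
		}
		sum += valence(w)
		n++
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

func main() {
	fmt.Printf("%.3f\n", scoreComment("Kill the child process on timeout."))
	fmt.Printf("%.3f\n", scoreComment("Great fix, but this hack must go"))
}
```

Keeping the overrides in a separate map means the general lexicon stays untouched and the domain layer can be audited on its own.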

Key Features

  • Multilingual comment extraction — Unicode-aware regex patterns support CJK, Cyrillic, Arabic, and all Unicode scripts
  • SE-domain lexicon — Technical terms adjusted to avoid false negatives/positives
  • Length-weighted scoring — Longer, more substantive comments carry proportionally more weight
  • Linear regression trend — Robust to outliers and intermediate noise
  • Enhanced visualization — Multi-series plot with threshold bands, trend line, comment count, and pie chart distribution
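
To make the regression-based trend concrete, here is a minimal ordinary least squares sketch. The function name trendSlope and the sample data are assumptions for illustration; the package's actual trend code is not shown in this documentation.

```go
package main

import "fmt"

// trendSlope fits y = a + b*x by ordinary least squares over (tick, score)
// pairs and returns the slope b. It returns 0 when fewer than two points
// are given, when the slices differ in length, or when all ticks are equal.
func trendSlope(ticks []int, scores []float64) float64 {
	n := len(ticks)
	if n < 2 || n != len(scores) {
		return 0
	}
	var sumX, sumY float64
	for i := range ticks {
		sumX += float64(ticks[i])
		sumY += scores[i]
	}
	meanX, meanY := sumX/float64(n), sumY/float64(n)

	var sxy, sxx float64
	for i := range ticks {
		dx := float64(ticks[i]) - meanX
		sxy += dx * (scores[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0
	}
	return sxy / sxx
}

func main() {
	ticks := []int{0, 1, 2, 3, 4}
	// First and last scores are equal, so a first/last comparison sees no
	// change, yet the middle of the series drifts downward.
	scores := []float64{0.5, 0.75, 0.5, 0.25, 0.5}
	fmt.Printf("slope per tick: %.3f\n", trendSlope(ticks, scores))
}
```

In this sample a first/last comparison reports no change, while the fitted slope is negative because every point contributes to the estimate.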

Limitations

  • VADER is English-optimized: Sentiment scoring accuracy degrades for non-English text
  • Sarcasm: "Great job breaking production" might be classified as positive
  • Context: Domain-specific meaning beyond the lexicon is not captured

Documentation

Overview

Package sentiment analyzes the sentiment of source code comments across commit history.

Index

Constants

const (
	// ConfigCommentSentimentMinLength is the configuration key for the minimum comment length.
	ConfigCommentSentimentMinLength = "CommentSentiment.MinLength"
	// ConfigCommentSentimentGap is the configuration key for the sentiment gap threshold.
	ConfigCommentSentimentGap = "CommentSentiment.Gap"
	// ConfigCommentSentimentNeutralizerWeight is the configuration key for the SE domain neutralizer weight.
	ConfigCommentSentimentNeutralizerWeight = "CommentSentiment.NeutralizerWeight"
	// ConfigCommentSentimentMaxWeightRatio is the configuration key for the max comment weight ratio.
	ConfigCommentSentimentMaxWeightRatio = "CommentSentiment.MaxWeightRatio"
	// ConfigCommentSentimentPositiveThreshold is the config key for the positive sentiment classification threshold.
	ConfigCommentSentimentPositiveThreshold = "CommentSentiment.PositiveThreshold"
	// ConfigCommentSentimentNegativeThreshold is the config key for the negative sentiment classification threshold.
	ConfigCommentSentimentNegativeThreshold = "CommentSentiment.NegativeThreshold"
	// ConfigCommentSentimentTrendThreshold is the config key for the trend direction threshold.
	ConfigCommentSentimentTrendThreshold = "CommentSentiment.TrendThreshold"
	// ConfigCommentSentimentLowRiskThreshold is the config key for the low sentiment risk threshold.
	ConfigCommentSentimentLowRiskThreshold = "CommentSentiment.LowSentimentRiskThreshold"

	// DefaultCommentSentimentCommentMinLength is the default minimum comment length for sentiment analysis.
	DefaultCommentSentimentCommentMinLength = 20
	// DefaultCommentSentimentGap is the default gap threshold for sentiment analysis.
	DefaultCommentSentimentGap = float32(0.5)

	// CommentLettersRatio defines the minimum ratio of letters in a comment.
	CommentLettersRatio = 0.6
)
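
As an illustration of how a minimum length and a letters ratio might combine into a comment filter, consider the sketch below. The helper keepComment is hypothetical; counting runes rather than bytes and treating the ratio as an inclusive lower bound are assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

const (
	minLength    = 20  // mirrors DefaultCommentSentimentCommentMinLength
	lettersRatio = 0.6 // mirrors CommentLettersRatio
)

// keepComment reports whether a comment is long enough and contains enough
// letters to be worth scoring. Length is measured in runes (an assumption),
// so multi-byte scripts are not penalized.
func keepComment(text string) bool {
	total := utf8.RuneCountInString(text)
	if total < minLength {
		return false
	}
	letters := 0
	for _, r := range text {
		if unicode.IsLetter(r) {
			letters++
		}
	}
	return float64(letters)/float64(total) >= lettersRatio
}

func main() {
	for _, c := range []string{
		"TODO: fix",
		"This workaround is fragile and needs a rewrite",
		"x = a[i] + b[j] * 3; // 0x1F << 2",
	} {
		fmt.Printf("%-50q keep=%v\n", c, keepComment(c))
	}
}
```

A ratio check like this tends to reject commented-out code, which is mostly punctuation and digits, while keeping prose in any script that unicode.IsLetter recognizes.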
const (
	SentimentPositiveThreshold = 0.6
	SentimentNegativeThreshold = 0.4
)

Sentiment thresholds and constants.
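
One plausible way to apply these two thresholds to a score in [0, 1] is sketched here. The function classify, the lowercase labels, and the inclusive comparisons are assumptions; the documentation does not state how boundary values are handled.

```go
package main

import "fmt"

const (
	positiveThreshold = 0.6 // mirrors SentimentPositiveThreshold
	negativeThreshold = 0.4 // mirrors SentimentNegativeThreshold
)

// classify maps a score in [0, 1] to a label. Scores between the two
// thresholds are treated as neutral.
func classify(score float32) string {
	switch {
	case score >= positiveThreshold:
		return "positive"
	case score <= negativeThreshold:
		return "negative"
	default:
		return "neutral"
	}
}

func main() {
	for _, s := range []float32{0.8, 0.5, 0.2} {
		fmt.Printf("%.1f -> %s\n", s, classify(s))
	}
}
```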

const (
	KindTimeSeries = "time_series"
	KindTrend      = "trend"
	KindAggregate  = "aggregate"
)

Store record kind constants.

const DimSentiment = "sentiment"

DimSentiment is the dimension name for sentiment time series extraction.

const (
	MinCommentLengthThresholdHigh = 10
)

MinCommentLengthThresholdHigh is the minimum character length for a comment to be included in sentiment analysis.

Variables

This section is empty.

Functions

func AggregateCommitsToTicks

func AggregateCommitsToTicks(
	commentsByCommit map[string][]string,
	commitsByTick map[int][]gitlib.Hash,
) (commentsByTick map[int][]string, emotionsByTick map[int]float32)

AggregateCommitsToTicks groups per-commit comment data into per-tick comments and emotions by merging all commits belonging to each tick.
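
The merge described above can be sketched as follows. This is an independent illustration, not the package's implementation: Hash is a local stand-in for gitlib.Hash, the scorer is passed in as a parameter, and skipping ticks that have no comments is an assumption.

```go
package main

import "fmt"

// Hash is a stand-in for gitlib.Hash; only its hex string form is used here.
type Hash string

// String returns the hex form used as the key in commentsByCommit.
func (h Hash) String() string { return string(h) }

// aggregate merges per-commit comments into per-tick comments and scores
// each tick with the supplied scorer.
func aggregate(
	commentsByCommit map[string][]string,
	commitsByTick map[int][]Hash,
	score func([]string) float32,
) (map[int][]string, map[int]float32) {
	commentsByTick := make(map[int][]string, len(commitsByTick))
	emotionsByTick := make(map[int]float32, len(commitsByTick))
	for tick, hashes := range commitsByTick {
		var merged []string
		for _, h := range hashes {
			merged = append(merged, commentsByCommit[h.String()]...)
		}
		if len(merged) == 0 {
			continue // assumption: ticks without comments are skipped
		}
		commentsByTick[tick] = merged
		emotionsByTick[tick] = score(merged)
	}
	return commentsByTick, emotionsByTick
}

func main() {
	comments := map[string][]string{
		"a1": {"nice cleanup of the parser"},
		"b2": {"ugly workaround", "revisit later"},
	}
	commits := map[int][]Hash{0: {"a1", "b2"}, 1: {"c3"}}
	neutral := func([]string) float32 { return 0.5 } // placeholder scorer
	byTick, emotions := aggregate(comments, commits, neutral)
	fmt.Println(len(byTick[0]), emotions[0], len(byTick))
}
```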

func ComputeSentiment

func ComputeSentiment(comments []string) float32

ComputeSentiment returns a score in [0, 1] for the given comments. 0 = negative, 0.5 = neutral, 1 = positive. Uses VADER with SE-domain adjustments for NLP-based analysis. Empty comments yield 0 (no comment implies no sentiment signal). Comments are weighted by length (longer comments carry more signal).
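
The length weighting can be pictured with the sketch below, which takes per-comment scores already in [0, 1] as input. The type scored, the function weightedSentiment, and in particular the cap rule (a weight may not exceed maxRatio times the shortest comment's length) are assumptions made for illustration; the documentation names a MaxWeightRatio option but does not define it.

```go
package main

import "fmt"

// scored pairs a comment with a per-comment score already mapped to [0, 1].
type scored struct {
	text  string
	score float64
}

// weightedSentiment returns the length-weighted mean of the scores.
// Weights are comment lengths in bytes, capped at maxRatio times the
// shortest comment so that one very long comment cannot dominate.
// Empty input returns 0, matching the documented behavior.
func weightedSentiment(items []scored, maxRatio float64) float64 {
	if len(items) == 0 {
		return 0
	}
	minLen := len(items[0].text)
	for _, it := range items[1:] {
		if l := len(it.text); l < minLen {
			minLen = l
		}
	}
	limit := float64(minLen) * maxRatio

	var sum, total float64
	for _, it := range items {
		w := float64(len(it.text))
		if w > limit {
			w = limit
		}
		sum += w * it.score
		total += w
	}
	if total == 0 {
		return 0
	}
	return sum / total
}

func main() {
	items := []scored{
		{text: "ok", score: 1.0},
		{text: "this is a long negative comment", score: 0.0},
	}
	fmt.Printf("%.2f\n", weightedSentiment(items, 4))
}
```

With a cap of 4, the two weights in the sample are 2 and 8, so the long comment still counts for more but no longer outweighs the short one by its full length.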

func ComputeSentimentWithOptions

func ComputeSentimentWithOptions(comments []string, opts ScorerOptions) float32

ComputeSentimentWithOptions returns a sentiment score with configurable parameters.

func GenerateStoreSections

func GenerateStoreSections(reader analyze.ReportReader) ([]plotpage.Section, error)

GenerateStoreSections reads pre-computed sentiment data from a ReportReader and builds the same plot sections as GenerateSections, without materializing a full Report or recomputing metrics.

func RegisterPlotSections

func RegisterPlotSections()

RegisterPlotSections registers the sentiment plot section renderer with the analyze package.

func RegisterStoreTimeSeriesExtractor

func RegisterStoreTimeSeriesExtractor()

RegisterStoreTimeSeriesExtractor registers the sentiment analyzer's store-based time series extractor with the anomaly package for cross-analyzer anomaly detection.

func RenderTerminal

func RenderTerminal(metrics *ComputedMetrics) string

RenderTerminal returns a colored, human-readable terminal representation.

Types

type AggregateData

type AggregateData struct {
	TotalTicks       int     `json:"total_ticks"       yaml:"total_ticks"`
	TotalComments    int     `json:"total_comments"    yaml:"total_comments"`
	TotalCommits     int     `json:"total_commits"     yaml:"total_commits"`
	AverageSentiment float32 `json:"average_sentiment" yaml:"average_sentiment"`
	PositiveTicks    int     `json:"positive_ticks"    yaml:"positive_ticks"`
	NeutralTicks     int     `json:"neutral_ticks"     yaml:"neutral_ticks"`
	NegativeTicks    int     `json:"negative_ticks"    yaml:"negative_ticks"`
}

AggregateData contains summary statistics.

type Analyzer

type Analyzer struct {
	*analyze.BaseHistoryAnalyzer[*ComputedMetrics]
	common.NoStateHibernation

	UAST  *plumbing.UASTChangesAnalyzer
	Ticks *plumbing.TicksSinceStart

	MinCommentLength int
	Gap              float32
	// contains filtered or unexported fields
}

Analyzer tracks comment sentiment across commit history.

func NewAnalyzer

func NewAnalyzer() *Analyzer

NewAnalyzer creates a new sentiment analyzer.

func (*Analyzer) ApplySnapshot

func (s *Analyzer) ApplySnapshot(snap analyze.PlumbingSnapshot)

ApplySnapshot restores plumbing state from a previously captured snapshot.

func (*Analyzer) CPUHeavy

func (s *Analyzer) CPUHeavy() bool

CPUHeavy indicates this analyzer does heavy computation.

func (*Analyzer) Configure

func (s *Analyzer) Configure(facts map[string]any) error

Configure sets up the analyzer with the provided facts.

func (*Analyzer) Consume

func (s *Analyzer) Consume(ctx context.Context, ac *analyze.Context) (analyze.TC, error)

Consume processes a single commit and returns a TC with extracted comments. The analyzer does not retain any per-commit state; all output is in the TC.

func (*Analyzer) ExtractCommitTimeSeries

func (s *Analyzer) ExtractCommitTimeSeries(report analyze.Report) map[string]any

ExtractCommitTimeSeries extracts per-commit sentiment data from a finalized report.

func (*Analyzer) Fork

func (s *Analyzer) Fork(n int) []analyze.HistoryAnalyzer

Fork creates a copy of the analyzer for parallel processing.

func (*Analyzer) GenerateChart

func (s *Analyzer) GenerateChart(report analyze.Report) (components.Charter, error)

GenerateChart implements PlotGenerator interface.

func (*Analyzer) GenerateSections

func (s *Analyzer) GenerateSections(report analyze.Report) ([]plotpage.Section, error)

GenerateSections returns the sections for combined reports.

func (*Analyzer) Initialize

func (s *Analyzer) Initialize(_ *gitlib.Repository) error

Initialize prepares the analyzer for processing commits.

func (*Analyzer) Merge

func (s *Analyzer) Merge(_ []analyze.HistoryAnalyzer)

Merge is a no-op. Per-commit results are emitted as TCs.

func (*Analyzer) NeedsUAST

func (s *Analyzer) NeedsUAST() bool

NeedsUAST returns true to enable the UAST pipeline.

func (*Analyzer) ReleaseSnapshot

func (s *Analyzer) ReleaseSnapshot(snap analyze.PlumbingSnapshot)

ReleaseSnapshot releases UAST trees owned by the snapshot.

func (*Analyzer) SnapshotPlumbing

func (s *Analyzer) SnapshotPlumbing() analyze.PlumbingSnapshot

SnapshotPlumbing captures the current plumbing output state for parallel execution.

func (*Analyzer) WriteToStore

func (s *Analyzer) WriteToStore(ctx context.Context, ticks []analyze.TICK, w analyze.ReportWriter) error

WriteToStore implements analyze.StoreWriter. It extracts comment data from TICKs, computes sentiment scores, and streams pre-computed metrics as individual records:

  • "time_series": per-tick TimeSeriesData records (sorted by tick).
  • "trend": single TrendData record.
  • "aggregate": single AggregateData record.

type CommitResult

type CommitResult struct {
	// Comments contains filtered comment texts from this commit's UAST changes.
	Comments []string
}

CommitResult is the per-commit TC payload for the sentiment analyzer. It holds the filtered comment texts extracted from UAST changes for a single commit.

type ComputedMetrics

type ComputedMetrics struct {
	TimeSeries          []TimeSeriesData         `json:"time_series"           yaml:"time_series"`
	Trend               TrendData                `json:"trend"                 yaml:"trend"`
	LowSentimentPeriods []LowSentimentPeriodData `json:"low_sentiment_periods" yaml:"low_sentiment_periods"`
	Aggregate           AggregateData            `json:"aggregate"             yaml:"aggregate"`
}

ComputedMetrics holds all computed metric results for the sentiment analyzer.

func ComputeAllMetrics

func ComputeAllMetrics(report analyze.Report) (*ComputedMetrics, error)

ComputeAllMetrics runs all sentiment metrics with default options.

func ComputeAllMetricsWithOptions

func ComputeAllMetricsWithOptions(report analyze.Report, opts MetricOptions) (*ComputedMetrics, error)

ComputeAllMetricsWithOptions runs all sentiment metrics with configurable thresholds.

func (*ComputedMetrics) AnalyzerName

func (m *ComputedMetrics) AnalyzerName() string

AnalyzerName returns the name of the analyzer that produced these metrics.

func (*ComputedMetrics) ToJSON

func (m *ComputedMetrics) ToJSON() any

ToJSON returns the metrics in a format suitable for JSON marshaling.

func (*ComputedMetrics) ToYAML

func (m *ComputedMetrics) ToYAML() any

ToYAML returns the metrics in a format suitable for YAML marshaling.

type LowSentimentPeriodData

type LowSentimentPeriodData struct {
	Tick      int      `json:"tick"       yaml:"tick"`
	Sentiment float32  `json:"sentiment"  yaml:"sentiment"`
	Comments  []string `json:"comments"   yaml:"comments"`
	RiskLevel string   `json:"risk_level" yaml:"risk_level"`
}

LowSentimentPeriodData identifies periods with negative sentiment.

type MetricOptions

type MetricOptions struct {
	PositiveThreshold      float64
	NegativeThreshold      float64
	TrendThreshold         float64
	LowSentimentRiskThresh float64
}

MetricOptions holds configurable thresholds for sentiment metrics computation.

func DefaultMetricOptions

func DefaultMetricOptions() MetricOptions

DefaultMetricOptions returns default sentiment metric options.

type ReportData

type ReportData struct {
	EmotionsByTick map[int]float32
	CommentsByTick map[int][]string
	CommitsByTick  map[int][]gitlib.Hash
	TickBounds     map[int]analyze.TickBounds
}

ReportData is the parsed input data for sentiment metrics computation.

func ParseReportData

func ParseReportData(report analyze.Report) (*ReportData, error)

ParseReportData extracts ReportData from an analyzer report. Expects canonical format: comments_by_commit and commits_by_tick.

type ScorerOptions

type ScorerOptions struct {
	NeutralizerWeight float64
	MaxWeightRatio    float64
}

ScorerOptions holds configurable parameters for sentiment scoring.

func DefaultScorerOptions

func DefaultScorerOptions() ScorerOptions

DefaultScorerOptions returns ScorerOptions populated with package-level defaults.

type TickData

type TickData struct {
	// CommentsByCommit maps commit hash (hex) to comment texts.
	CommentsByCommit map[string][]string
}

TickData is the per-tick aggregated payload for the sentiment analyzer. It holds per-commit comments for the canonical report format.

type TimeSeriesData

type TimeSeriesData struct {
	Tick           int     `json:"tick"                 yaml:"tick"`
	StartTime      string  `json:"start_time,omitempty" yaml:"start_time,omitempty"`
	EndTime        string  `json:"end_time,omitempty"   yaml:"end_time,omitempty"`
	Sentiment      float32 `json:"sentiment"            yaml:"sentiment"`
	CommentCount   int     `json:"comment_count"        yaml:"comment_count"`
	CommitCount    int     `json:"commit_count"         yaml:"commit_count"`
	Classification string  `json:"classification"       yaml:"classification"`
}

TimeSeriesData contains sentiment data for a time period.
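
For readers consuming the JSON output, this sketch shows what the struct tags produce. The struct is copied locally with its JSON tags only so that the example compiles on its own; the field values and the "negative" label are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TimeSeriesData is a local copy of the declaration above (JSON tags only).
type TimeSeriesData struct {
	Tick           int     `json:"tick"`
	StartTime      string  `json:"start_time,omitempty"`
	EndTime        string  `json:"end_time,omitempty"`
	Sentiment      float32 `json:"sentiment"`
	CommentCount   int     `json:"comment_count"`
	CommitCount    int     `json:"commit_count"`
	Classification string  `json:"classification"`
}

// encode marshals one record to its JSON text form.
func encode(d TimeSeriesData) (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}

func main() {
	out, err := encode(TimeSeriesData{
		Tick:           3,
		Sentiment:      0.25,
		CommentCount:   12,
		CommitCount:    4,
		Classification: "negative",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints {"tick":3,"sentiment":0.25,"comment_count":12,"commit_count":4,"classification":"negative"}
}
```

Because start_time and end_time carry omitempty, they are absent from the output when the strings are empty, so consumers should treat both keys as optional.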

type TrendData

type TrendData struct {
	StartTick      int     `json:"start_tick"      yaml:"start_tick"`
	EndTick        int     `json:"end_tick"        yaml:"end_tick"`
	StartSentiment float32 `json:"start_sentiment" yaml:"start_sentiment"`
	EndSentiment   float32 `json:"end_sentiment"   yaml:"end_sentiment"`
	TrendDirection string  `json:"trend_direction" yaml:"trend_direction"`
	ChangePercent  float64 `json:"change_percent"  yaml:"change_percent"`
}

TrendData contains trend information.

Directories

Path Synopsis
Package lexicons provides multilingual sentiment dictionaries for code comment analysis.
