sentiment

package

Version: v1.0.0-rc.1 (latest) | Published: May 13, 2026 | License: Apache-2.0 | Imports: 29 | Imported by: 0

Repository

github.com/Sumatoshi-tech/codefang

README ¶

Sentiment Analysis

Preface

The mood of the development team is often reflected in their communication. Comments in code can serve as a proxy for team morale and code quality culture.

Problem

Burnout, frustration, and toxic environments kill productivity. These trends are hard to detect in a distributed team without reading every message.

How the analyzer solves it

The Sentiment analyzer scans source code comments introduced in each commit. It classifies the text as Positive, Negative, or Neutral using VADER sentiment analysis enhanced with software engineering domain adjustments.

Historical context

Sentiment analysis (opinion mining) is a subfield of NLP. Applying it to software engineering (SE) data is a growing research area (e.g., "Emotion Mining in Software Engineering"). VADER (Hutto & Gilbert, 2014) is a rule-based model designed for short text; its authors report F1 = 0.96 on social media text. However, it requires domain adjustment for SE contexts, where technical terms like "kill", "abort", and "fatal" are emotionally neutral.

Real world examples

Burnout Detection: A trend of increasingly negative comments might indicate team stress.
Hotspots: Specific files or modules that trigger negative comments (e.g., "this hack fix again").
Technical Debt: Comments containing "workaround", "hack", "kludge" signal accumulated shortcuts.

How the analyzer works here

Comment Extraction: Uses UAST to find comment nodes in the added/changed code.
Preprocessing: Cleans up the text (removes code snippets, formatting). Supports Unicode/multilingual comments.
SE-Domain Adjustment: Technical terms that VADER misclassifies are adjusted toward neutral (e.g., "kill process") or toward negative (e.g., "hacky workaround").
Length-Weighted Scoring: Longer comments carry more weight in the final score.
Regression-Based Trend: Fits a linear regression over the scores instead of comparing only the first and last values, which makes trend detection more robust.
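
The SE-domain adjustment step can be pictured with a small sketch. This is not the package's implementation: the two mini-lexicons, the `adjust` function, and the `penalty` parameter are hypothetical, and the raw score is assumed to be a VADER-style compound value in [-1, 1]. It only illustrates the idea of pulling technical terms toward neutral and pushing debt-related terms toward negative.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical mini-lexicons; the package's real lexicon is not shown here.
var (
	neutralTerms  = []string{"kill", "abort", "fatal"}
	negativeTerms = []string{"hack", "kludge", "workaround"}
)

// containsAny reports whether text contains any of the given terms.
// Plain substring matching is used, so "skill" would also match "kill";
// a real lexicon would need word-boundary handling.
func containsAny(text string, terms []string) bool {
	for _, t := range terms {
		if strings.Contains(text, t) {
			return true
		}
	}
	return false
}

// adjust applies two illustrative rules to a raw score in [-1, 1]:
// technical terms pull the score toward 0 by neutralizerWeight, and
// debt-related terms subtract a fixed penalty. The result is clamped.
func adjust(comment string, raw, neutralizerWeight, penalty float64) float64 {
	lower := strings.ToLower(comment)
	if containsAny(lower, neutralTerms) {
		raw *= 1 - neutralizerWeight
	}
	if containsAny(lower, negativeTerms) {
		raw -= penalty
	}
	if raw < -1 {
		return -1
	}
	if raw > 1 {
		return 1
	}
	return raw
}

func main() {
	fmt.Println(adjust("kill the child process", -0.6, 0.5, 0.25))
	fmt.Println(adjust("hacky workaround for the parser", 0, 0.5, 0.25))
	fmt.Println(adjust("nice and simple", 0.4, 0.5, 0.25))
}
```

The weight of 0.5 and penalty of 0.25 are arbitrary values chosen for the illustration.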

Key Features

Multilingual comment extraction — Unicode-aware regex patterns support CJK, Cyrillic, Arabic, and all Unicode scripts
SE-domain lexicon — Technical terms adjusted to avoid false negatives/positives
Length-weighted scoring — Longer, more substantive comments carry proportionally more weight
Linear regression trend — Robust to outliers and intermediate noise
Enhanced visualization — Multi-series plot with threshold bands, trend line, comment count, and pie chart distribution
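
To illustrate why a regression-based trend is less sensitive to noise than a first/last comparison, here is a minimal ordinary least-squares sketch. The `slope` function and the sample data are assumptions made for this example; the package's own trend code may differ.

```go
package main

import "fmt"

// slope returns the ordinary least-squares slope of y over x.
// It returns 0 when the inputs are too short, mismatched, or degenerate.
func slope(x, y []float64) float64 {
	if len(x) < 2 || len(x) != len(y) {
		return 0
	}
	n := float64(len(x))
	var sx, sy, sxx, sxy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		sxy += x[i] * y[i]
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0 // all x values are identical
	}
	return (n*sxy - sx*sy) / den
}

func main() {
	ticks := []float64{0, 1, 2, 3, 4}
	// Hypothetical per-tick scores with one noisy dip at tick 1.
	scores := []float64{0.70, 0.20, 0.65, 0.60, 0.55}

	fmt.Println("first/last difference:", scores[len(scores)-1]-scores[0])
	fmt.Println("regression slope:     ", slope(ticks, scores))
}
```

The first/last difference depends only on two points, while the slope uses every tick, so a single unusual tick at either end has less influence on the reported direction.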

Limitations

VADER is English-optimized: Sentiment scoring accuracy degrades for non-English text
Sarcasm: "Great job breaking production" might be classified as positive
Context: Domain-specific meaning beyond the lexicon is not captured

Documentation ¶

Overview ¶

Package sentiment analyzes the sentiment of source code comments across commit history.

Index ¶

Constants
func AggregateCommitsToTicks(commentsByCommit map[string][]string, commitsByTick map[int][]gitlib.Hash) (commentsByTick map[int][]string, emotionsByTick map[int]float32)
func ComputeSentiment(comments []string) float32
func ComputeSentimentWithOptions(comments []string, opts ScorerOptions) float32
func GenerateStoreSections(reader analyze.ReportReader) ([]plotpage.Section, error)
func RegisterPlotSections()
func RegisterStoreTimeSeriesExtractor()
func RenderTerminal(metrics *ComputedMetrics) string
type AggregateData
type Analyzer
- func NewAnalyzer() *Analyzer
- func (s *Analyzer) ApplySnapshot(snap analyze.PlumbingSnapshot)
- func (s *Analyzer) CPUHeavy() bool
- func (s *Analyzer) Configure(facts map[string]any) error
- func (s *Analyzer) Consume(ctx context.Context, ac *analyze.Context) (analyze.TC, error)
- func (s *Analyzer) ExtractCommitTimeSeries(report analyze.Report) map[string]any
- func (s *Analyzer) Fork(n int) []analyze.HistoryAnalyzer
- func (s *Analyzer) GenerateChart(report analyze.Report) (components.Charter, error)
- func (s *Analyzer) GenerateSections(report analyze.Report) ([]plotpage.Section, error)
- func (s *Analyzer) Initialize(_ *gitlib.Repository) error
- func (s *Analyzer) Merge(_ []analyze.HistoryAnalyzer)
- func (s *Analyzer) NeedsUAST() bool
- func (s *Analyzer) ReleaseSnapshot(snap analyze.PlumbingSnapshot)
- func (s *Analyzer) SnapshotPlumbing() analyze.PlumbingSnapshot
- func (s *Analyzer) WriteToStore(ctx context.Context, ticks []analyze.TICK, w analyze.ReportWriter) error
type CommitResult
type ComputedMetrics
- func ComputeAllMetrics(report analyze.Report) (*ComputedMetrics, error)
- func ComputeAllMetricsWithOptions(report analyze.Report, opts MetricOptions) (*ComputedMetrics, error)
- func (m *ComputedMetrics) AnalyzerName() string
- func (m *ComputedMetrics) ToJSON() any
- func (m *ComputedMetrics) ToYAML() any
type LowSentimentPeriodData
type MetricOptions
- func DefaultMetricOptions() MetricOptions
type ReportData
- func ParseReportData(report analyze.Report) (*ReportData, error)
type ScorerOptions
- func DefaultScorerOptions() ScorerOptions
type TickData
type TimeSeriesData
type TrendData

Constants ¶

const (
	// ConfigCommentSentimentMinLength is the configuration key for the minimum comment length.
	ConfigCommentSentimentMinLength = "CommentSentiment.MinLength"
	// ConfigCommentSentimentGap is the configuration key for the sentiment gap threshold.
	ConfigCommentSentimentGap = "CommentSentiment.Gap"
	// ConfigCommentSentimentNeutralizerWeight is the configuration key for the SE domain neutralizer weight.
	ConfigCommentSentimentNeutralizerWeight = "CommentSentiment.NeutralizerWeight"
	// ConfigCommentSentimentMaxWeightRatio is the configuration key for the max comment weight ratio.
	ConfigCommentSentimentMaxWeightRatio = "CommentSentiment.MaxWeightRatio"
	// ConfigCommentSentimentPositiveThreshold is the config key for the positive sentiment classification threshold.
	ConfigCommentSentimentPositiveThreshold = "CommentSentiment.PositiveThreshold"
	// ConfigCommentSentimentNegativeThreshold is the config key for the negative sentiment classification threshold.
	ConfigCommentSentimentNegativeThreshold = "CommentSentiment.NegativeThreshold"
	// ConfigCommentSentimentTrendThreshold is the config key for the trend direction threshold.
	ConfigCommentSentimentTrendThreshold = "CommentSentiment.TrendThreshold"
	// ConfigCommentSentimentLowRiskThreshold is the config key for the low sentiment risk threshold.
	ConfigCommentSentimentLowRiskThreshold = "CommentSentiment.LowSentimentRiskThreshold"

	// DefaultCommentSentimentCommentMinLength is the default minimum comment length for sentiment analysis.
	DefaultCommentSentimentCommentMinLength = 20
	// DefaultCommentSentimentGap is the default gap threshold for sentiment analysis.
	DefaultCommentSentimentGap = float32(0.5)

	// CommentLettersRatio defines the minimum ratio of letters in a comment.
	CommentLettersRatio = 0.6
)
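
As a rough illustration of how a minimum length and a letters ratio could be combined to filter comments before scoring, consider the sketch below. The `keepComment` function is hypothetical, the two constants simply mirror `DefaultCommentSentimentCommentMinLength` and `CommentLettersRatio`, and counting runes rather than bytes is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

const (
	minLength    = 20  // mirrors DefaultCommentSentimentCommentMinLength
	lettersRatio = 0.6 // mirrors CommentLettersRatio
)

// keepComment reports whether a comment is long enough and consists
// mostly of letters. unicode.IsLetter accepts letters from any script,
// so Cyrillic or CJK comments are treated like Latin ones.
func keepComment(text string) bool {
	total := utf8.RuneCountInString(text)
	if total < minLength {
		return false
	}
	letters := 0
	for _, r := range text {
		if unicode.IsLetter(r) {
			letters++
		}
	}
	return float64(letters)/float64(total) >= lettersRatio
}

func main() {
	for _, c := range []string{
		"too short",
		"This workaround is fragile and should be replaced",
		"x := a[i] + b[j] * (c - d) / 2 // 1234",
	} {
		fmt.Printf("%-52q keep=%v\n", c, keepComment(c))
	}
}
```

A ratio check like this tends to drop commented-out code, which is mostly punctuation and digits, while keeping prose.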

const (
	SentimentPositiveThreshold = 0.6
	SentimentNegativeThreshold = 0.4
)

Sentiment thresholds and constants.
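
One plausible reading of these thresholds is a three-way classification of a score in [0, 1]. The sketch below is an assumption, not the package's code: the `classify` function, the label strings, and the choice of inclusive comparisons are all made up for illustration.

```go
package main

import "fmt"

const (
	positiveThreshold = 0.6 // mirrors SentimentPositiveThreshold
	negativeThreshold = 0.4 // mirrors SentimentNegativeThreshold
)

// classify maps a score in [0, 1] to a label. Scores between the two
// thresholds are treated as neutral.
func classify(score float32) string {
	switch {
	case score >= positiveThreshold:
		return "positive"
	case score <= negativeThreshold:
		return "negative"
	default:
		return "neutral"
	}
}

func main() {
	for _, s := range []float32{0.1, 0.5, 0.8} {
		fmt.Println(s, classify(s))
	}
}
```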

const (
	KindTimeSeries = "time_series"
	KindTrend      = "trend"
	KindAggregate  = "aggregate"
)

Store record kind constants.

const DimSentiment = "sentiment"

DimSentiment is the dimension name for sentiment time series extraction.

const (
	MinCommentLengthThresholdHigh = 10
)

MinCommentLengthThresholdHigh is the minimum character length for a comment to be included in sentiment analysis.

Variables ¶

This section is empty.

Functions ¶

func AggregateCommitsToTicks ¶

func AggregateCommitsToTicks(
	commentsByCommit map[string][]string,
	commitsByTick map[int][]gitlib.Hash,
) (commentsByTick map[int][]string, emotionsByTick map[int]float32)

AggregateCommitsToTicks groups per-commit comment data into per-tick comments and emotions by merging all commits belonging to each tick.
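
The grouping described above can be sketched as follows. To stay self-contained, this version uses plain string hashes instead of `gitlib.Hash` and takes the scorer as a parameter; both are assumptions, as is the choice to skip ticks that end up with no comments. The `aggregate` name is hypothetical.

```go
package main

import "fmt"

// aggregate merges the comments of every commit in a tick and scores
// the merged list with the supplied function.
func aggregate(
	commentsByCommit map[string][]string,
	commitsByTick map[int][]string,
	score func([]string) float32,
) (map[int][]string, map[int]float32) {
	commentsByTick := make(map[int][]string)
	emotionsByTick := make(map[int]float32)

	for tick, hashes := range commitsByTick {
		var merged []string
		for _, h := range hashes {
			merged = append(merged, commentsByCommit[h]...)
		}
		if len(merged) == 0 {
			continue // assumption: ticks without comments are skipped
		}
		commentsByTick[tick] = merged
		emotionsByTick[tick] = score(merged)
	}
	return commentsByTick, emotionsByTick
}

func main() {
	commentsByCommit := map[string][]string{
		"a1": {"first comment", "second comment"},
		"b2": {"third comment"},
	}
	commitsByTick := map[int][]string{0: {"a1", "b2"}, 1: {"c3"}}

	// A constant scorer stands in for the real sentiment computation.
	neutral := func([]string) float32 { return 0.5 }

	comments, emotions := aggregate(commentsByCommit, commitsByTick, neutral)
	fmt.Println(len(comments[0]), emotions[0], len(comments))
}
```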

func ComputeSentiment ¶

func ComputeSentiment(comments []string) float32

ComputeSentiment returns a score in [0, 1] for the given comments. 0 = negative, 0.5 = neutral, 1 = positive. Uses VADER with SE-domain adjustments for NLP-based analysis. Empty comments yield 0 (no comment implies no sentiment signal). Comments are weighted by length (longer comments carry more signal).
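
A minimal sketch of the scoring shape described above, under stated assumptions: the per-comment scorer is passed in as a function returning a VADER-style compound value in [-1, 1], that value is mapped linearly onto [0, 1], and each comment is weighted by its rune count. The names `toUnit` and `weightedSentiment` are hypothetical and the real function may weight or cap lengths differently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toUnit maps a compound score from [-1, 1] onto [0, 1].
func toUnit(compound float64) float64 {
	return (compound + 1) / 2
}

// weightedSentiment returns the length-weighted mean of the mapped
// scores. With no comments (or only empty ones) it returns 0.
func weightedSentiment(comments []string, compound func(string) float64) float32 {
	var sum, total float64
	for _, c := range comments {
		w := float64(utf8.RuneCountInString(c))
		sum += w * toUnit(compound(c))
		total += w
	}
	if total == 0 {
		return 0
	}
	return float32(sum / total)
}

func main() {
	// Toy scorer for the illustration: one known-negative comment,
	// everything else fully positive.
	toy := func(s string) float64 {
		if s == "bad" {
			return -1
		}
		return 1
	}
	fmt.Println(weightedSentiment([]string{"bad", "goodgood!"}, toy))
	fmt.Println(weightedSentiment(nil, toy))
}
```

In this sketch the nine-character positive comment outweighs the three-character negative one, so the result lands above the neutral midpoint.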

func ComputeSentimentWithOptions ¶

func ComputeSentimentWithOptions(comments []string, opts ScorerOptions) float32

ComputeSentimentWithOptions returns a sentiment score with configurable parameters.

func GenerateStoreSections ¶

func GenerateStoreSections(reader analyze.ReportReader) ([]plotpage.Section, error)

GenerateStoreSections reads pre-computed sentiment data from a ReportReader and builds the same plot sections as GenerateSections, without materializing a full Report or recomputing metrics.

func RegisterPlotSections ¶

func RegisterPlotSections()

RegisterPlotSections registers the sentiment plot section renderer with the analyze package.

func RegisterStoreTimeSeriesExtractor ¶

func RegisterStoreTimeSeriesExtractor()

RegisterStoreTimeSeriesExtractor registers the sentiment analyzer's store-based time series extractor with the anomaly package for cross-analyzer anomaly detection.

func RenderTerminal ¶

func RenderTerminal(metrics *ComputedMetrics) string

RenderTerminal returns a colored, human-readable terminal representation.

Types ¶

type AggregateData ¶

type AggregateData struct {
	TotalTicks       int     `json:"total_ticks"       yaml:"total_ticks"`
	TotalComments    int     `json:"total_comments"    yaml:"total_comments"`
	TotalCommits     int     `json:"total_commits"     yaml:"total_commits"`
	AverageSentiment float32 `json:"average_sentiment" yaml:"average_sentiment"`
	PositiveTicks    int     `json:"positive_ticks"    yaml:"positive_ticks"`
	NeutralTicks     int     `json:"neutral_ticks"     yaml:"neutral_ticks"`
	NegativeTicks    int     `json:"negative_ticks"    yaml:"negative_ticks"`
}

AggregateData contains summary statistics.

type Analyzer ¶

type Analyzer struct {
	*analyze.BaseHistoryAnalyzer[*ComputedMetrics]
	common.NoStateHibernation

	UAST  *plumbing.UASTChangesAnalyzer
	Ticks *plumbing.TicksSinceStart

	MinCommentLength int
	Gap              float32
	// contains filtered or unexported fields
}

Analyzer tracks comment sentiment across commit history.

func NewAnalyzer ¶

func NewAnalyzer() *Analyzer

NewAnalyzer creates a new sentiment analyzer.

func (*Analyzer) ApplySnapshot ¶

func (s *Analyzer) ApplySnapshot(snap analyze.PlumbingSnapshot)

ApplySnapshot restores plumbing state from a previously captured snapshot.

func (*Analyzer) CPUHeavy ¶

func (s *Analyzer) CPUHeavy() bool

CPUHeavy indicates this analyzer does heavy computation.

func (*Analyzer) Configure ¶

func (s *Analyzer) Configure(facts map[string]any) error

Configure sets up the analyzer with the provided facts.
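
A facts map for Configure might be assembled along these lines. The key strings are copied from the constants documented above; the value types (`int` for the minimum length, `float32` for the gap) are assumptions inferred from the `Analyzer` fields and defaults, and `buildFacts` is a hypothetical helper.

```go
package main

import "fmt"

// Key strings copied from the documented configuration constants.
const (
	keyMinLength = "CommentSentiment.MinLength"
	keyGap       = "CommentSentiment.Gap"
)

// buildFacts assembles a facts map. The value types are assumed, not
// confirmed by the package documentation.
func buildFacts(minLength int, gap float32) map[string]any {
	return map[string]any{
		keyMinLength: minLength,
		keyGap:       gap,
	}
}

func main() {
	facts := buildFacts(20, 0.5)
	fmt.Println(facts[keyMinLength], facts[keyGap])
	// With the real package, this map would be handed to
	// (*Analyzer).Configure(facts) and the returned error checked.
}
```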

func (*Analyzer) Consume ¶

func (s *Analyzer) Consume(ctx context.Context, ac *analyze.Context) (analyze.TC, error)

Consume processes a single commit and returns a TC with extracted comments. The analyzer does not retain any per-commit state; all output is in the TC.

func (*Analyzer) ExtractCommitTimeSeries ¶

func (s *Analyzer) ExtractCommitTimeSeries(report analyze.Report) map[string]any

ExtractCommitTimeSeries extracts per-commit sentiment data from a finalized report.

func (*Analyzer) Fork ¶

func (s *Analyzer) Fork(n int) []analyze.HistoryAnalyzer

Fork creates a copy of the analyzer for parallel processing.

func (*Analyzer) GenerateChart ¶

func (s *Analyzer) GenerateChart(report analyze.Report) (components.Charter, error)

GenerateChart implements the PlotGenerator interface.

func (*Analyzer) GenerateSections ¶

func (s *Analyzer) GenerateSections(report analyze.Report) ([]plotpage.Section, error)

GenerateSections returns the sections for combined reports.

func (*Analyzer) Initialize ¶

func (s *Analyzer) Initialize(_ *gitlib.Repository) error

Initialize prepares the analyzer for processing commits.

func (*Analyzer) Merge ¶

func (s *Analyzer) Merge(_ []analyze.HistoryAnalyzer)

Merge is a no-op. Per-commit results are emitted as TCs.

func (*Analyzer) NeedsUAST ¶

func (s *Analyzer) NeedsUAST() bool

NeedsUAST returns true to enable the UAST pipeline.

func (*Analyzer) ReleaseSnapshot ¶

func (s *Analyzer) ReleaseSnapshot(snap analyze.PlumbingSnapshot)

ReleaseSnapshot releases UAST trees owned by the snapshot.

func (*Analyzer) SnapshotPlumbing ¶

func (s *Analyzer) SnapshotPlumbing() analyze.PlumbingSnapshot

SnapshotPlumbing captures the current plumbing output state for parallel execution.

func (*Analyzer) WriteToStore ¶

func (s *Analyzer) WriteToStore(ctx context.Context, ticks []analyze.TICK, w analyze.ReportWriter) error

WriteToStore implements analyze.StoreWriter. It extracts comment data from TICKs, computes sentiment scores, and streams pre-computed metrics as individual records:

"time_series": per-tick TimeSeriesData records (sorted by tick).
"trend": single TrendData record.
"aggregate": single AggregateData record.

type CommitResult ¶

type CommitResult struct {
	// Comments contains filtered comment texts from this commit's UAST changes.
	Comments []string
}

CommitResult is the per-commit TC payload for the sentiment analyzer. It holds the filtered comment texts extracted from UAST changes for a single commit.

type ComputedMetrics ¶

type ComputedMetrics struct {
	TimeSeries          []TimeSeriesData         `json:"time_series"           yaml:"time_series"`
	Trend               TrendData                `json:"trend"                 yaml:"trend"`
	LowSentimentPeriods []LowSentimentPeriodData `json:"low_sentiment_periods" yaml:"low_sentiment_periods"`
	Aggregate           AggregateData            `json:"aggregate"             yaml:"aggregate"`
}

ComputedMetrics holds all computed metric results for the sentiment analyzer.

func ComputeAllMetrics ¶

func ComputeAllMetrics(report analyze.Report) (*ComputedMetrics, error)

ComputeAllMetrics runs all sentiment metrics with default options.

func ComputeAllMetricsWithOptions ¶

func ComputeAllMetricsWithOptions(report analyze.Report, opts MetricOptions) (*ComputedMetrics, error)

ComputeAllMetricsWithOptions runs all sentiment metrics with configurable thresholds.

func (*ComputedMetrics) AnalyzerName ¶

func (m *ComputedMetrics) AnalyzerName() string

AnalyzerName returns the name of the analyzer that produced these metrics.

func (*ComputedMetrics) ToJSON ¶

func (m *ComputedMetrics) ToJSON() any

ToJSON returns the metrics in a format suitable for JSON marshaling.

func (*ComputedMetrics) ToYAML ¶

func (m *ComputedMetrics) ToYAML() any

ToYAML returns the metrics in a format suitable for YAML marshaling.

type LowSentimentPeriodData ¶

type LowSentimentPeriodData struct {
	Tick      int      `json:"tick"       yaml:"tick"`
	Sentiment float32  `json:"sentiment"  yaml:"sentiment"`
	Comments  []string `json:"comments"   yaml:"comments"`
	RiskLevel string   `json:"risk_level" yaml:"risk_level"`
}

LowSentimentPeriodData identifies periods with negative sentiment.

type MetricOptions ¶

type MetricOptions struct {
	PositiveThreshold      float64
	NegativeThreshold      float64
	TrendThreshold         float64
	LowSentimentRiskThresh float64
}

MetricOptions holds configurable thresholds for sentiment metrics computation.

func DefaultMetricOptions ¶

func DefaultMetricOptions() MetricOptions

DefaultMetricOptions returns default sentiment metric options.

type ReportData ¶

type ReportData struct {
	EmotionsByTick map[int]float32
	CommentsByTick map[int][]string
	CommitsByTick  map[int][]gitlib.Hash
	TickBounds     map[int]analyze.TickBounds
}

ReportData is the parsed input data for sentiment metrics computation.

func ParseReportData ¶

func ParseReportData(report analyze.Report) (*ReportData, error)

ParseReportData extracts ReportData from an analyzer report. Expects canonical format: comments_by_commit and commits_by_tick.

type ScorerOptions ¶

type ScorerOptions struct {
	NeutralizerWeight float64
	MaxWeightRatio    float64
}

ScorerOptions holds configurable parameters for sentiment scoring.

func DefaultScorerOptions ¶

func DefaultScorerOptions() ScorerOptions

DefaultScorerOptions returns ScorerOptions populated with package-level defaults.

type TickData ¶

type TickData struct {
	// CommentsByCommit maps commit hash (hex) to comment texts.
	CommentsByCommit map[string][]string
}

TickData is the per-tick aggregated payload for the sentiment analyzer. It holds per-commit comments for the canonical report format.

type TimeSeriesData ¶

type TimeSeriesData struct {
	Tick           int     `json:"tick"                 yaml:"tick"`
	StartTime      string  `json:"start_time,omitempty" yaml:"start_time,omitempty"`
	EndTime        string  `json:"end_time,omitempty"   yaml:"end_time,omitempty"`
	Sentiment      float32 `json:"sentiment"            yaml:"sentiment"`
	CommentCount   int     `json:"comment_count"        yaml:"comment_count"`
	CommitCount    int     `json:"commit_count"         yaml:"commit_count"`
	Classification string  `json:"classification"       yaml:"classification"`
}

TimeSeriesData contains sentiment data for a time period.
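
To show how the struct tags shape the serialized form, the sketch below redeclares the type locally (JSON tags only) and marshals one record with the standard library. The field values, including the "negative" classification label, are made up for the example, and `encode` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TimeSeriesData is a local copy of the documented struct, keeping only
// the JSON tags so the example is self-contained.
type TimeSeriesData struct {
	Tick           int     `json:"tick"`
	StartTime      string  `json:"start_time,omitempty"`
	EndTime        string  `json:"end_time,omitempty"`
	Sentiment      float32 `json:"sentiment"`
	CommentCount   int     `json:"comment_count"`
	CommitCount    int     `json:"commit_count"`
	Classification string  `json:"classification"`
}

// encode marshals one record to a JSON string.
func encode(d TimeSeriesData) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(TimeSeriesData{
		Tick:           3,
		Sentiment:      0.35,
		CommentCount:   12,
		CommitCount:    4,
		Classification: "negative",
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because `StartTime` and `EndTime` carry `omitempty`, they are left out of the output when empty.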

type TrendData ¶

type TrendData struct {
	StartTick      int     `json:"start_tick"      yaml:"start_tick"`
	EndTick        int     `json:"end_tick"        yaml:"end_tick"`
	StartSentiment float32 `json:"start_sentiment" yaml:"start_sentiment"`
	EndSentiment   float32 `json:"end_sentiment"   yaml:"end_sentiment"`
	TrendDirection string  `json:"trend_direction" yaml:"trend_direction"`
	ChangePercent  float64 `json:"change_percent"  yaml:"change_percent"`
}

TrendData contains trend information.

Directories ¶

Path	Synopsis
lexicons	Package lexicons provides multilingual sentiment dictionaries for code comment analysis.
