Documentation ¶
Overview ¶
Package printer provides output formatting and statistics for duplicate detection.
This package implements multiple output formats and statistical analysis of detected code duplicates.
Output Formats:
- TextPrinter: Human-readable text output
- JSONPrinter: Structured JSON output
- HTMLPrinter: HTML report with syntax highlighting
- PlumbingPrinter: Machine-readable output for scripting
- StatsPrinter: Comprehensive statistics and health analysis
Design:
- Printer interface: Common API for all formats
- Format-specific implementations in separate files
- Statistics split across focused files:
  - stats.go: Core type and interface methods
  - stats_collector.go: SetX methods for data collection
  - stats_health.go: Health score calculation
  - stats_formatter.go: Output formatters (CSV, JSON, Text)
  - stats_visualization.go: Visualization helpers
  - stats_recommendations.go: Recommendation generation
  - stats_styles.go: Style management
  - stats_data.go: Data structures
Configuration:
- SortBy criteria: Size, Occurrence, Hash, TotalTokens
- OutputFormat selection: Text, HTML, JSON, Plumbing, Simple JSON
- Threshold settings: Minimum size for duplicates
- Verbosity: Detailed output for debugging
Performance:
- Streaming output for large projects
- Efficient memory usage for statistics
- Lazy evaluation where possible
Index ¶
- func BuildCloneGroups(duplChan <-chan syntax.Match) map[string][][]*syntax.Node
- func ComputeUniqueCounts(groups map[string][][]*syntax.Node) map[string]int
- func ExtractSortCriteria(sortBy ...config.SortCriteria) config.SortCriteria
- func GetCloneSize(group [][]*syntax.Node) int
- func SortCloneGroupKeys(keys []string, sortBy config.SortCriteria, groups map[string][][]*syntax.Node, ...)
- func SortCloneGroups(groups []CloneGroup, sortBy config.SortCriteria)
- func SortClonesByHash(dups [][]*syntax.Node) [][]*syntax.Node
- func SortClonesByOccurrence(dups [][]*syntax.Node) [][]*syntax.Node
- func SortClonesBySize(dups [][]*syntax.Node) [][]*syntax.Node
- func SortClonesByTotalTokens(dups [][]*syntax.Node) [][]*syntax.Node
- func SortNodesByCriteria(dups [][]*syntax.Node, sortBy config.SortCriteria) [][]*syntax.Node
- func TestCloneSortingWithData(t *testing.T, sortFunc func([][]*syntax.Node) [][]*syntax.Node, ...)
- func WordDiff(base, compared string) string
- type Clone
- type CloneCategory
- type CloneClassification
- type CloneDiff
- type CloneGroup
- type CloneGroupDiff
- type ClonePriority
- type CloneWithContent
- type CloneWithContentMixin
- type DiffLine
- type DiffLineType
- type DiffResult
- type FileInfo
- type FileStatMixin
- type HashSetter
- type Issue
- type Issuer
- type JSONClone
- type JSONOutput
- type JSONPrinter
- func (p *JSONPrinter) OutputJSON(threshold int, sortBy config.SortCriteria, detectionMethod string) error
- func (p *JSONPrinter) OutputSimpleJSON() error
- func (p *JSONPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error
- func (*JSONPrinter) PrintFooter() error
- func (p *JSONPrinter) PrintHeader() error
- func (p *JSONPrinter) SetFilesCount(count int)
- func (p *JSONPrinter) SetHash(hash string)
- type LineRangeMixin
- type Printer
- func NewHTML(w io.Writer, fread ReadFile, threshold ...int) Printer
- func NewHTMLWithOptions(w io.Writer, fread ReadFile, diffMode config.DiffMode, metadata ReportMetadata, ...) Printer
- func NewJSON(w io.Writer, fread ReadFile) Printer
- func NewPlumbing(w io.Writer, fread ReadFile) Printer
- func NewSARIF(w io.Writer, fread ReadFile, threshold int) Printer
- func NewStats(w io.Writer, fread ReadFile, threshold int) Printer
- func NewText(w io.Writer, fread ReadFile) Printer
- type ReadFile
- type ReportMetadata
- type SARIFArtifactLocation
- type SARIFConfiguration
- type SARIFDriver
- type SARIFFingerprints
- type SARIFInvocation
- type SARIFLocation
- type SARIFMessage
- type SARIFOutput
- type SARIFPhysicalLocation
- type SARIFRegion
- type SARIFResult
- type SARIFRule
- type SARIFRun
- type SARIFTextContent
- type SARIFTool
- type SimpleCloneGroup
- type SimpleJSONClone
- type SimpleJSONOutput
- type StatsConfig
- type StatsData
- type StatsPrinter
- type StyleMixin
- type Summary
- type TextPrinter
- func (p *TextPrinter) OutputText(threshold int, sortBy config.SortCriteria) error
- func (p *TextPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error
- func (p *TextPrinter) PrintClonesSorted(dups [][]*syntax.Node, sortBy config.SortCriteria) error
- func (p *TextPrinter) PrintFooter() error
- func (p *TextPrinter) PrintHeader() error
- func (p *TextPrinter) SetFileDuplicate(isDupe bool)
- func (p *TextPrinter) SetHash(hash string)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BuildCloneGroups ¶
BuildCloneGroups builds a map of hash to clone groups from matches.
func ComputeUniqueCounts ¶
ComputeUniqueCounts calculates unique file counts for each clone group.
func ExtractSortCriteria ¶
func ExtractSortCriteria(sortBy ...config.SortCriteria) config.SortCriteria
ExtractSortCriteria extracts the sort criteria from the variadic sortBy parameter. Returns SortBySize as the default if no criteria are provided.
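The variadic-with-default pattern described above can be sketched as follows. This is a minimal illustration, not the package's source: the local `SortCriteria` type and its string values are hypothetical stand-ins for `config.SortCriteria`, and taking the first supplied value when several are passed is an assumption.

```go
package main

import "fmt"

// SortCriteria is a hypothetical stand-in for config.SortCriteria.
type SortCriteria string

// Hypothetical values; the real constants live in the config package.
const (
	SortBySize       SortCriteria = "size"
	SortByOccurrence SortCriteria = "occurrence"
)

// extractSortCriteria returns the first supplied criterion (an assumption),
// or SortBySize when the caller passes none.
func extractSortCriteria(sortBy ...SortCriteria) SortCriteria {
	if len(sortBy) > 0 {
		return sortBy[0]
	}
	return SortBySize
}

func main() {
	fmt.Println(extractSortCriteria())                 // no argument: falls back to the default
	fmt.Println(extractSortCriteria(SortByOccurrence)) // explicit criterion wins
}
```

Callers of a variadic `sortBy ...config.SortCriteria` parameter, such as `PrintClones`, can therefore omit the argument entirely.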
func GetCloneSize ¶
GetCloneSize returns the size (token count) of the first clone in a group. Returns 0 if the group is empty or has no fragments.
func SortCloneGroupKeys ¶
func SortCloneGroupKeys(keys []string, sortBy config.SortCriteria, groups map[string][][]*syntax.Node, uniqueCounts map[string]int)
SortCloneGroupKeys sorts clone group hashes based on specified criteria.
func SortCloneGroups ¶
func SortCloneGroups(groups []CloneGroup, sortBy config.SortCriteria)
SortCloneGroups sorts a slice of CloneGroup values in place by the specified criteria.
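A minimal sketch of sorting a `[]CloneGroup` by a criterion, assuming size sorts largest first and hash sorts alphabetically (the orderings documented for the SortClonesBy* helpers). The local `SortCriteria` type, its values, and the tie-break on hash are assumptions for illustration; the real implementation may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// CloneGroup mirrors only the fields of the documented type that the sort needs.
type CloneGroup struct {
	Hash string
	Size int
}

// SortCriteria and its values are hypothetical stand-ins for config.SortCriteria.
type SortCriteria string

const (
	SortBySize SortCriteria = "size"
	SortByHash SortCriteria = "hash"
)

// sortGroups orders groups in place using a single switch over the criterion.
func sortGroups(groups []CloneGroup, sortBy SortCriteria) {
	sort.SliceStable(groups, func(i, j int) bool {
		switch sortBy {
		case SortByHash:
			return groups[i].Hash < groups[j].Hash // alphabetical, ascending
		default:
			if groups[i].Size != groups[j].Size {
				return groups[i].Size > groups[j].Size // largest first
			}
			return groups[i].Hash < groups[j].Hash // assumed tie-break
		}
	})
}

func main() {
	groups := []CloneGroup{
		{Hash: "b", Size: 20},
		{Hash: "a", Size: 50},
		{Hash: "c", Size: 20},
	}
	sortGroups(groups, SortBySize)
	fmt.Println(groups)
}
```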
func SortClonesByHash ¶
SortClonesByHash sorts clone groups by hash (alphabetical, ascending order).
func SortClonesByOccurrence ¶
SortClonesByOccurrence sorts clone groups by number of files (most files first, descending order).
func SortClonesBySize ¶
SortClonesBySize sorts clone groups by token count (largest first, descending order).
func SortClonesByTotalTokens ¶
SortClonesByTotalTokens sorts clone groups by total token count across all files (largest first).
func SortNodesByCriteria ¶
SortNodesByCriteria applies sorting criteria to node arrays using a unified switch.
Types ¶
type CloneCategory ¶
type CloneCategory string
CloneCategory represents the category of code that was duplicated.
const (
CategoryFunction CloneCategory = "function" // Function declarations
CategoryMethod CloneCategory = "method" // Method/function literals
CategoryTest CloneCategory = "test" // Test code
CategoryStruct CloneCategory = "struct" // Struct types
CategoryInterface CloneCategory = "interface" // Interface types
CategoryHandler CloneCategory = "handler" // HTTP handlers
CategoryLoop CloneCategory = "loop" // Loop constructs
CategoryConditional CloneCategory = "conditional" // Conditionals/switches
CategoryAssignment CloneCategory = "assignment" // Variable assignments
CategoryExpression CloneCategory = "expression" // Generic expressions
CategoryUnknown CloneCategory = "unknown" // Unknown/other
)
Clone categories for classifying duplicate code.
func (CloneCategory) GetCategoryEmoji ¶
func (c CloneCategory) GetCategoryEmoji() string
GetCategoryEmoji returns an emoji indicator for the category.
type CloneClassification ¶
type CloneClassification struct {
Category CloneCategory
IsTest bool
Priority ClonePriority
Tokens int
Lines int
NodeType string
Suggestion string
}
CloneClassification provides metadata about a code clone for actionable reports.
func ClassifyClone ¶
func ClassifyClone(filename string, nodeType int32, tokens, lines int) CloneClassification
ClassifyClone analyzes a clone and returns its classification.
type CloneDiff ¶
type CloneDiff struct {
CloneWithContent
Diff DiffResult
}
CloneDiff represents a clone with its diff against the base.
type CloneGroup ¶
type CloneGroup struct {
Hash string `json:"hash"`
Size int `json:"size"`
Files []JSONClone `json:"files"`
}
CloneGroup represents a group of duplicate code fragments.
type CloneGroupDiff ¶
type CloneGroupDiff struct {
Base *CloneWithContent
Others []CloneDiff
HasAnyDiff bool
// Aggregate statistics across all diffs in this group
TotalAdded int
TotalRemoved int
TotalModified int
}
CloneGroupDiff holds the diffs between all clones in a group: the base clone (the first one) and a diff for every other clone against that base.
func ComputeCloneGroupDiff ¶
func ComputeCloneGroupDiff(clones []clone) CloneGroupDiff
ComputeCloneGroupDiff computes diffs for a group of clones.
type ClonePriority ¶
type ClonePriority string
ClonePriority represents how important it is to address this clone.
const (
PriorityCritical ClonePriority = "critical" // Must fix - production code, large duplication
PriorityHigh ClonePriority = "high" // Should fix - production code, medium duplication
PriorityMedium ClonePriority = "medium" // Consider fixing - test code or small production
PriorityLow ClonePriority = "low" // Optional - test helpers, tiny duplications
)
Clone priority levels for actionability.
func (ClonePriority) GetPriorityColor ¶
func (p ClonePriority) GetPriorityColor() string
GetPriorityColor returns a CSS color variable for the priority.
func (ClonePriority) GetPriorityEmoji ¶
func (p ClonePriority) GetPriorityEmoji() string
GetPriorityEmoji returns an emoji indicator for the priority.
type CloneWithContent ¶
type CloneWithContent struct {
CloneWithContentMixin
Content []byte
}
CloneWithContent represents a clone with its file content.
type CloneWithContentMixin ¶
CloneWithContentMixin provides common location fields for clone structures.
type DiffLine ¶
type DiffLine struct {
Content string
Type DiffLineType
LineNumber int // Line number in the original file
}
DiffLine represents a single line in a diff view.
type DiffLineType ¶
type DiffLineType int
DiffLineType indicates the type of diff line.
const (
DiffLineEqual DiffLineType = iota
DiffLineAdded
DiffLineRemoved
DiffLineModified
)
type DiffResult ¶
DiffResult represents the diff between two code fragments.
func LineDiff ¶
func LineDiff(base, compared []byte) DiffResult
LineDiff performs a line-by-line diff between two code fragments. It uses a simple algorithm optimized for code comparison:
1. Split into lines
2. Find matching lines
3. Mark additions/removals/modifications
Space efficiency: O(n) memory, minimal allocations.
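The three steps can be illustrated with a naive positional comparison, where line i of the base is compared with line i of the other fragment. This is only a sketch under that assumption: it is not the package's matching algorithm, the fields of DiffResult are not shown in this documentation, and the helper name `naiveLineDiff` is hypothetical, so it returns a plain `[]DiffLine`.

```go
package main

import (
	"fmt"
	"strings"
)

type DiffLineType int

const (
	DiffLineEqual DiffLineType = iota
	DiffLineAdded
	DiffLineRemoved
	DiffLineModified
)

type DiffLine struct {
	Content    string
	Type       DiffLineType
	LineNumber int
}

// naiveLineDiff compares base and compared line by line at the same index.
func naiveLineDiff(base, compared []byte) []DiffLine {
	// Step 1: split both fragments into lines.
	a := strings.Split(string(base), "\n")
	b := strings.Split(string(compared), "\n")
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]DiffLine, 0, n)
	for i := 0; i < n; i++ {
		line := DiffLine{LineNumber: i + 1}
		// Steps 2 and 3: match by position, then mark the line.
		switch {
		case i >= len(a): // present only in compared
			line.Content, line.Type = b[i], DiffLineAdded
		case i >= len(b): // present only in base
			line.Content, line.Type = a[i], DiffLineRemoved
		case a[i] == b[i]:
			line.Content, line.Type = a[i], DiffLineEqual
		default:
			line.Content, line.Type = b[i], DiffLineModified
		}
		out = append(out, line)
	}
	return out
}

func main() {
	for _, l := range naiveLineDiff([]byte("a\nb\nc"), []byte("a\nx\nc\nd")) {
		fmt.Printf("line %d type %d %q\n", l.LineNumber, l.Type, l.Content)
	}
}
```

A positional comparison reports every line after an insertion as modified; an implementation that finds matching lines first (for example via a longest-common-subsequence pass) avoids that at the cost of more work.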
type FileInfo ¶
type FileInfo struct {
CloneWithContentMixin
Content []byte
Node *syntax.Node
}
FileInfo represents processed file information.
func ProcessFileContent ¶
ProcessFileContent provides unified file processing for all printers.
type FileStatMixin ¶
type FileStatMixin struct {
// contains filtered or unexported fields
}
FileStatMixin provides common fields for file statistics.
type HashSetter ¶
type HashSetter interface {
SetHash(hash string)
}
HashSetter is an optional interface for printers that support hash metadata.
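An optional interface like this is typically discovered with a type assertion. The sketch below assumes a simplified `Printer` with a single method and two hypothetical printer types; only `HashSetter` itself is taken from the documentation.

```go
package main

import "fmt"

// Printer is simplified here to one method to keep the sketch short.
type Printer interface {
	PrintHeader() error
}

// HashSetter matches the documented optional interface.
type HashSetter interface {
	SetHash(hash string)
}

// hashingPrinter is a hypothetical printer that supports hash metadata.
type hashingPrinter struct{ hash string }

func (p *hashingPrinter) PrintHeader() error  { return nil }
func (p *hashingPrinter) SetHash(hash string) { p.hash = hash }

// plainPrinter is a hypothetical printer without hash support.
type plainPrinter struct{}

func (plainPrinter) PrintHeader() error { return nil }

// setHashIfSupported forwards the hash only when the printer opts in.
func setHashIfSupported(p Printer, hash string) bool {
	if hs, ok := p.(HashSetter); ok {
		hs.SetHash(hash)
		return true
	}
	return false
}

func main() {
	fmt.Println(setHashIfSupported(&hashingPrinter{}, "abc123"))
	fmt.Println(setHashIfSupported(plainPrinter{}, "abc123"))
}
```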
type JSONClone ¶
type JSONClone struct {
Filename string `json:"filename"`
LineStart int `json:"line_start"`
LineEnd int `json:"line_end"`
Fragment string `json:"fragment"`
}
JSONClone represents a single code fragment duplicate for JSON output.
type JSONOutput ¶
type JSONOutput struct {
Version string `json:"version"`
Timestamp time.Time `json:"timestamp"`
Threshold int `json:"threshold"`
FilesAnalyzed int `json:"files_analyzed"`
DetectionMethod string `json:"detection_method,omitempty"`
CloneGroups []CloneGroup `json:"clone_groups"`
Summary Summary `json:"summary"`
}
JSONOutput represents the structured JSON output.
type JSONPrinter ¶
type JSONPrinter struct {
ReadFile
// contains filtered or unexported fields
}
func (*JSONPrinter) OutputJSON ¶
func (p *JSONPrinter) OutputJSON(threshold int, sortBy config.SortCriteria, detectionMethod string) error
OutputJSON generates the complete JSON output.
func (*JSONPrinter) OutputSimpleJSON ¶
func (p *JSONPrinter) OutputSimpleJSON() error
OutputSimpleJSON generates simple JSON output format (from duplicates project). This provides a simpler, more straightforward JSON format for users who prefer it.
func (*JSONPrinter) PrintClones ¶
func (p *JSONPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error
func (*JSONPrinter) PrintFooter ¶
func (*JSONPrinter) PrintFooter() error
func (*JSONPrinter) PrintHeader ¶
func (p *JSONPrinter) PrintHeader() error
func (*JSONPrinter) SetFilesCount ¶
func (p *JSONPrinter) SetFilesCount(count int)
SetFilesCount sets the total number of files analyzed.
func (*JSONPrinter) SetHash ¶
func (p *JSONPrinter) SetHash(hash string)
SetHash sets the current hash for the clone group being processed.
type LineRangeMixin ¶
type LineRangeMixin struct {
StartLine int `json:"startLine"`
EndLine int `json:"endLine,omitempty"`
}
LineRangeMixin provides common line range fields.
type Printer ¶
type Printer interface {
PrintHeader() error
PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error
}
func NewHTML ¶
NewHTML creates a new HTML printer. Supports optional threshold parameter (default: 15).
func NewHTMLWithOptions ¶
func NewHTMLWithOptions(w io.Writer, fread ReadFile, diffMode config.DiffMode, metadata ReportMetadata, threshold ...int) Printer
NewHTMLWithOptions creates a new HTML printer with full options. diffMode enables visual diff highlighting between duplicate occurrences. metadata contains CLI settings to display in the report.
type ReportMetadata ¶
type ReportMetadata struct {
Semantic bool
DetectionMethods []string
SortBy string
FilterGenerated bool
IncludeSQLC bool
IncludeTempl bool
}
ReportMetadata contains CLI settings used for the report.
type SARIFArtifactLocation ¶
type SARIFArtifactLocation struct {
URI string `json:"uri"`
}
SARIFArtifactLocation represents the artifact (file) location.
type SARIFConfiguration ¶
type SARIFConfiguration struct {
Level string `json:"level"`
}
SARIFConfiguration represents rule configuration.
type SARIFDriver ¶
type SARIFDriver struct {
Name string `json:"name"`
Version string `json:"version"`
InformationURI string `json:"informationUri"`
Rules []SARIFRule `json:"rules"`
}
SARIFDriver represents the tool driver information.
type SARIFFingerprints ¶
type SARIFFingerprints struct {
CloneHash string `json:"cloneHash,omitempty"`
}
SARIFFingerprints represents fingerprints for deduplication.
type SARIFInvocation ¶
type SARIFInvocation struct {
ExecutionSuccessful bool `json:"executionSuccessful"`
StartTimeUTC string `json:"startTimeUtc,omitempty"`
EndTimeUTC string `json:"endTimeUtc,omitempty"`
}
SARIFInvocation represents an invocation of the tool.
type SARIFLocation ¶
type SARIFLocation struct {
PhysicalLocation SARIFPhysicalLocation `json:"physicalLocation"`
}
SARIFLocation represents a location in the code.
type SARIFMessage ¶
type SARIFMessage struct {
Text string `json:"text"`
}
SARIFMessage represents a message in a result.
type SARIFOutput ¶
type SARIFOutput struct {
Schema string `json:"$schema"`
Version string `json:"version"`
Runs []SARIFRun `json:"runs"`
}
SARIFOutput represents the SARIF (Static Analysis Results Interchange Format) output. This format is used by security tools like GitHub Advanced Security, CodeQL, etc. Spec: https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html
type SARIFPhysicalLocation ¶
type SARIFPhysicalLocation struct {
ArtifactLocation SARIFArtifactLocation `json:"artifactLocation"`
Region SARIFRegion `json:"region"`
}
SARIFPhysicalLocation represents the physical file location.
type SARIFRegion ¶
type SARIFRegion struct {
LineRangeMixin
}
SARIFRegion represents a region within a file.
type SARIFResult ¶
type SARIFResult struct {
RuleID string `json:"ruleId"`
Level string `json:"level"`
Message SARIFMessage `json:"message"`
Locations []SARIFLocation `json:"locations"`
Fingerprints SARIFFingerprints `json:"fingerprints"`
}
SARIFResult represents a single result (finding).
type SARIFRule ¶
type SARIFRule struct {
ID string `json:"id"`
Name string `json:"name"`
ShortDescription SARIFTextContent `json:"shortDescription"`
FullDescription SARIFTextContent `json:"fullDescription"`
DefaultConfiguration SARIFConfiguration `json:"defaultConfiguration"`
HelpURI string `json:"helpUri,omitempty"`
}
SARIFRule represents a rule/check that was violated.
type SARIFRun ¶
type SARIFRun struct {
Tool SARIFTool `json:"tool"`
Results []SARIFResult `json:"results"`
Invocations []SARIFInvocation `json:"invocations,omitempty"`
}
SARIFRun represents a single analysis run.
type SARIFTextContent ¶
type SARIFTextContent struct {
Text string `json:"text"`
}
SARIFTextContent represents text content in SARIF.
type SARIFTool ¶
type SARIFTool struct {
Driver SARIFDriver `json:"driver"`
}
SARIFTool represents the tool that performed the analysis.
type SimpleCloneGroup ¶
type SimpleCloneGroup struct {
Hash string `json:"hash"`
Score int `json:"score"` // Impact score: tokens × instances
Instances []SimpleJSONClone `json:"instances"`
}
SimpleCloneGroup represents a clone group in simple format (from duplicates project).
type SimpleJSONClone ¶
type SimpleJSONClone struct {
LineRangeMixin
Filename string `json:"filename"`
TokenCount int `json:"token_count"`
}
SimpleJSONClone represents a single code clone instance in simple format (from duplicates project).
type SimpleJSONOutput ¶
type SimpleJSONOutput []SimpleCloneGroup
SimpleJSONOutput represents the simple JSON output format (from duplicates project).
type StatsConfig ¶
type StatsConfig struct {
Format config.OutputFormat
FilesCount int
DetectionMethods string
SemanticDetection bool
Timestamp string
AnalysisDuration time.Duration
TotalEstimatedLines int
FilesFiltered int
FilterBreakdown map[string]int
}
StatsConfig holds configuration for statistics output. Used by StatsPrinter.ApplyStatsConfig to set all stats metadata in one call.
type StatsData ¶
type StatsData struct {
// Count metrics
TotalFilesScanned int `json:"total_files_scanned"`
TotalCloneGroups int `json:"total_clone_groups"`
TotalClones int `json:"total_clones"`
// Size metrics
TotalDuplicateLines int `json:"total_duplicate_lines"`
TotalTokens int `json:"total_tokens"`
TotalEstimatedLines int `json:"total_estimated_lines"` // Estimated total lines for duplication percentage
AverageCloneSize int `json:"average_clone_size"`
// Complexity and impact metrics
ComplexityScore float64 `json:"complexity_score"`
ImpactScore int `json:"impact_score"`
DuplicationRatio float64 `json:"duplication_ratio"` // Percentage of duplicated code
// Quality metrics
HealthScore string `json:"health_score"` // A-F grade based on metrics
// Time metrics
AnalysisDuration string `json:"analysis_duration"` // Time taken for analysis
Timestamp string `json:"timestamp"` // ISO 8601 timestamp
// Aggregation metrics
FileDuplication map[string]int `json:"file_duplication"` // filename -> duplicate line count
SizeDistribution map[string]int `json:"size_distribution"` // size range -> count (lines)
TokenDistribution map[string]int `json:"token_distribution"` // token range -> count
SeverityBreakdown map[string]int `json:"severity_breakdown"` // severity -> count (small/medium/large/huge)
// Filter metrics (NEW)
FilesFiltered int `json:"files_filtered,omitempty"` // Total files filtered out
FilterBreakdown map[string]int `json:"filter_breakdown,omitempty"` // Reason -> count (e.g., "templ" -> 12)
// Detection mode
DetectionMode string `json:"detection_mode,omitempty"` // "semantic" or "structural"
DetectionModeDesc string `json:"detection_mode_description,omitempty"` // Human-readable description
// Metadata
DetectionMethods string `json:"detection_methods"` // Comma-separated detection methods used
SemanticDetection bool `json:"semantic_detection"` // Whether semantic-aware detection was enabled
}
StatsData holds all aggregated statistics about code duplication analysis.
Fields:
- Count metrics: TotalFilesScanned, TotalCloneGroups, TotalClones
- Size metrics: TotalTokens, TotalDuplicateLines, AverageCloneSize
- Complexity metrics: ComplexityScore, ImpactScore
- Quality metrics: DuplicationRatio, HealthScore
- Time metrics: AnalysisDuration, Timestamp
- Aggregation metrics: FileDuplication, SizeDistribution
- Filter metrics: FilesFiltered, FilterBreakdown (NEW)
- Metadata: DetectionMethods
JSON Marshaling:
- All fields are JSON tagged for easy marshaling
- Use printer.JSONPrinter for formatted JSON output.
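As a rough illustration of how DuplicationRatio and HealthScore could relate, the sketch below derives a percentage from TotalDuplicateLines and TotalEstimatedLines and maps it to a letter grade. Both the formula and the grade thresholds are assumptions made up for this example; the actual calculation lives in stats_health.go and is not described here.

```go
package main

import "fmt"

// duplicationRatio returns duplicated lines as a percentage of the
// estimated total. The formula is an assumption for this sketch.
func duplicationRatio(duplicateLines, estimatedLines int) float64 {
	if estimatedLines <= 0 {
		return 0 // avoid dividing by zero when no estimate is available
	}
	return float64(duplicateLines) * 100 / float64(estimatedLines)
}

// healthGrade maps a ratio to an A-F grade.
// The thresholds are hypothetical, chosen only for illustration.
func healthGrade(ratio float64) string {
	switch {
	case ratio < 3:
		return "A"
	case ratio < 5:
		return "B"
	case ratio < 10:
		return "C"
	case ratio < 20:
		return "D"
	default:
		return "F"
	}
}

func main() {
	ratio := duplicationRatio(250, 1000)
	fmt.Printf("ratio=%.1f%% grade=%s\n", ratio, healthGrade(ratio))
}
```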
type StatsPrinter ¶
type StatsPrinter interface {
Printer
ApplyStatsConfig(config StatsConfig)
GetStatsData() *StatsData
}
StatsPrinter extends Printer interface with stats configuration.
type StyleMixin ¶
type StyleMixin struct {
// contains filtered or unexported fields
}
StyleMixin holds common lipgloss style fields.
type Summary ¶
type Summary struct {
TotalCloneGroups int `json:"total_clone_groups"`
TotalClones int `json:"total_clones"`
ComplexityScore float64 `json:"complexity_score"`
// ImpactScore represents total duplicated code volume (tokens × instances)
// This is the simple scoring metric from the duplicates project
ImpactScore int `json:"impact_score,omitempty"`
}
Summary provides analysis summary statistics.
type TextPrinter ¶
type TextPrinter struct {
ReadFile
// contains filtered or unexported fields
}
TextPrinter implements text-based output for duplicate detection.
func (*TextPrinter) OutputText ¶
func (p *TextPrinter) OutputText(threshold int, sortBy config.SortCriteria) error
OutputText generates text output with sorting.
func (*TextPrinter) PrintClones ¶
func (p *TextPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error
func (*TextPrinter) PrintClonesSorted ¶
func (p *TextPrinter) PrintClonesSorted(dups [][]*syntax.Node, sortBy config.SortCriteria) error
PrintClonesSorted prints clones with specified sorting criteria.
func (*TextPrinter) PrintFooter ¶
func (p *TextPrinter) PrintFooter() error
func (*TextPrinter) PrintHeader ¶
func (p *TextPrinter) PrintHeader() error
func (*TextPrinter) SetFileDuplicate ¶
func (p *TextPrinter) SetFileDuplicate(isDupe bool)
SetFileDuplicate marks the current group as a file duplicate.
func (*TextPrinter) SetHash ¶
func (p *TextPrinter) SetHash(hash string)
SetHash sets the hash for the current clone group.
Source Files ¶
- clone_classify.go
- common.go
- diff.go
- file_processor.go
- groups.go
- html.go
- html_diff.go
- html_summary.go
- html_template.go
- issuer.go
- json.go
- plumbing.go
- printer.go
- sarif.go
- sort_unified.go
- sorter.go
- stats.go
- stats_collector.go
- stats_data.go
- stats_formatter.go
- stats_health.go
- stats_recommendations.go
- stats_styles.go
- stats_visualization.go
- test_helper.go
- text.go