printer

package
v0.1.0
Published: May 4, 2026 License: MIT Imports: 20 Imported by: 0

Documentation

Overview

Package printer provides output formatting and statistics for duplicate detection.

This package implements multiple output formats and statistical analysis of detected code duplicates.

Output Formats:

  • TextPrinter: Human-readable text output
  • JSONPrinter: Structured JSON output
  • HTMLPrinter: HTML report with syntax highlighting
  • PlumbingPrinter: Machine-readable output for scripting
  • StatsPrinter: Comprehensive statistics and health analysis

Design:

  • Printer interface: Common API for all formats
  • Format-specific implementations in separate files
  • Statistics split across focused files:
      • stats.go: Core type and interface methods
      • stats_collector.go: SetX methods for data collection
      • stats_health.go: Health score calculation
      • stats_formatter.go: Output formatters (CSV, JSON, Text)
      • stats_visualization.go: Visualization helpers
      • stats_recommendations.go: Recommendation generation
      • stats_styles.go: Style management
      • stats_data.go: Data structures

Configuration:

  • SortBy criteria: Size, Occurrence, Hash, TotalTokens
  • OutputFormat selection: Text, HTML, JSON, Plumbing, Simple JSON
  • Threshold settings: Minimum size for duplicates
  • Verbosity: Detailed output for debugging

Performance:

  • Streaming output for large projects
  • Efficient memory usage for statistics
  • Lazy evaluation where possible

Functions

func BuildCloneGroups

func BuildCloneGroups(duplChan <-chan syntax.Match) map[string][][]*syntax.Node

BuildCloneGroups builds a map of hash to clone groups from matches.

func ComputeUniqueCounts

func ComputeUniqueCounts(groups map[string][][]*syntax.Node) map[string]int

ComputeUniqueCounts calculates unique file counts for each clone group.
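
The idea of a per-group unique file count can be sketched as follows. This is a minimal illustration, not the package's implementation: `node` is a hypothetical stand-in for `syntax.Node` that is assumed to carry a `Filename`, and `uniqueFileCounts` is an illustrative name.

```go
package main

import "fmt"

// node is a hypothetical stand-in for syntax.Node; only the filename matters here.
type node struct {
	Filename string
}

// uniqueFileCounts returns, for every hash, the number of distinct files
// that appear in any fragment of that clone group.
func uniqueFileCounts(groups map[string][][]*node) map[string]int {
	counts := make(map[string]int, len(groups))
	for hash, group := range groups {
		seen := make(map[string]struct{})
		for _, fragment := range group {
			for _, n := range fragment {
				if n != nil {
					seen[n.Filename] = struct{}{}
				}
			}
		}
		counts[hash] = len(seen)
	}
	return counts
}

func main() {
	groups := map[string][][]*node{
		"abc": {
			{&node{Filename: "a.go"}},
			{&node{Filename: "b.go"}},
			{&node{Filename: "a.go"}}, // second clone inside a.go
		},
	}
	fmt.Println(uniqueFileCounts(groups)["abc"])
}
```

A group with three fragments spread over two files counts as two, which is the number an occurrence-based sort would presumably rank by.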

func ExtractSortCriteria

func ExtractSortCriteria(sortBy ...config.SortCriteria) config.SortCriteria

ExtractSortCriteria extracts the sort criteria from the variadic sortBy parameter. Returns SortBySize as the default if no criterion is provided.
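
The variadic-with-default pattern described here can be sketched as below. The type `sortCriteria`, its values, and the function name `firstOrDefault` are hypothetical stand-ins for `config.SortCriteria`; the sketch also assumes that only the first argument is used when several are passed, which the documentation does not state.

```go
package main

import "fmt"

// sortCriteria and its values are hypothetical stand-ins for config.SortCriteria.
type sortCriteria string

const (
	sortBySize sortCriteria = "size"
	sortByHash sortCriteria = "hash"
)

// firstOrDefault returns the first criterion passed, or sortBySize when
// the caller passed none. (Assumption: extra arguments are ignored.)
func firstOrDefault(sortBy ...sortCriteria) sortCriteria {
	if len(sortBy) == 0 {
		return sortBySize
	}
	return sortBy[0]
}

func main() {
	fmt.Println(firstOrDefault())           // no argument: default applies
	fmt.Println(firstOrDefault(sortByHash)) // explicit criterion wins
}
```

This is why the `PrintClones` methods can take `sortBy ...config.SortCriteria`: callers that do not care about ordering simply omit the argument.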

func GetCloneSize

func GetCloneSize(group [][]*syntax.Node) int

GetCloneSize returns the size (token count) of the first clone in a group. Returns 0 if the group is empty or has no fragments.

func SortCloneGroupKeys

func SortCloneGroupKeys(
	keys []string,
	sortBy config.SortCriteria,
	groups map[string][][]*syntax.Node,
	uniqueCounts map[string]int,
)

SortCloneGroupKeys sorts clone group hashes based on specified criteria.

func SortCloneGroups

func SortCloneGroups(groups []CloneGroup, sortBy config.SortCriteria)

SortCloneGroups sorts a slice of CloneGroup values by the specified criteria.

func SortClonesByHash

func SortClonesByHash(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesByHash sorts clone groups by hash (alphabetical, ascending order).

func SortClonesByOccurrence

func SortClonesByOccurrence(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesByOccurrence sorts clone groups by number of files (most files first, descending order).

func SortClonesBySize

func SortClonesBySize(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesBySize sorts clone groups by token count (largest first, descending order).

func SortClonesByTotalTokens

func SortClonesByTotalTokens(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesByTotalTokens sorts clone groups by total token count across all files (largest first).

func SortNodesByCriteria

func SortNodesByCriteria(dups [][]*syntax.Node, sortBy config.SortCriteria) [][]*syntax.Node

SortNodesByCriteria applies sorting criteria to node arrays using a unified switch.
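
A single switch that selects the comparison might look like the sketch below. It is an illustration under stated assumptions: `group` is a simplified stand-in for a clone group, `sortCriteria` and its values are hypothetical, and only two of the documented criteria are shown.

```go
package main

import (
	"fmt"
	"sort"
)

// sortCriteria and its values are hypothetical stand-ins for config.SortCriteria.
type sortCriteria string

const (
	bySize       sortCriteria = "size"
	byOccurrence sortCriteria = "occurrence"
)

// group is a simplified clone group: token size plus the files it occurs in.
type group struct {
	Size  int
	Files []string
}

// sortGroups orders groups in place, largest first, using one switch
// to pick the comparison function.
func sortGroups(groups []group, by sortCriteria) {
	var less func(i, j int) bool
	switch by {
	case byOccurrence:
		less = func(i, j int) bool { return len(groups[i].Files) > len(groups[j].Files) }
	default: // fall back to size, mirroring the documented default
		less = func(i, j int) bool { return groups[i].Size > groups[j].Size }
	}
	sort.SliceStable(groups, less)
}

func main() {
	gs := []group{
		{Size: 10, Files: []string{"a.go", "b.go", "c.go"}},
		{Size: 30, Files: []string{"a.go", "b.go"}},
	}
	sortGroups(gs, bySize)
	fmt.Println(gs[0].Size)
	sortGroups(gs, byOccurrence)
	fmt.Println(len(gs[0].Files))
}
```

Using a stable sort keeps groups that compare equal in their previous order, so repeated runs produce the same report.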

func TestCloneSortingWithData

func TestCloneSortingWithData(
	t *testing.T,
	sortFunc func([][]*syntax.Node) [][]*syntax.Node,
	sortName string,
	clones [][]*syntax.Node,
)

TestCloneSortingWithData is a helper function for testing clone sorting algorithms that takes pre-created clones.

func WordDiff

func WordDiff(base, compared string) string

WordDiff performs a word-level diff between two strings using go-diff. Returns HTML-formatted string with highlighted word changes.

Types

type Clone

type Clone clone

func (Clone) Filename

func (c Clone) Filename() string

func (Clone) Fragment

func (c Clone) Fragment() []byte

func (Clone) LineEnd

func (c Clone) LineEnd() int

func (Clone) LineStart

func (c Clone) LineStart() int

type CloneCategory

type CloneCategory string

CloneCategory represents the category of code that was duplicated.

const (
	CategoryFunction    CloneCategory = "function"    // Function declarations
	CategoryMethod      CloneCategory = "method"      // Method/function literals
	CategoryTest        CloneCategory = "test"        // Test code
	CategoryStruct      CloneCategory = "struct"      // Struct types
	CategoryInterface   CloneCategory = "interface"   // Interface types
	CategoryHandler     CloneCategory = "handler"     // HTTP handlers
	CategoryLoop        CloneCategory = "loop"        // Loop constructs
	CategoryConditional CloneCategory = "conditional" // Conditionals/switches
	CategoryAssignment  CloneCategory = "assignment"  // Variable assignments
	CategoryExpression  CloneCategory = "expression"  // Generic expressions
	CategoryUnknown     CloneCategory = "unknown"     // Unknown/other
)

Clone categories for classifying duplicate code.

func (CloneCategory) GetCategoryEmoji

func (c CloneCategory) GetCategoryEmoji() string

GetCategoryEmoji returns an emoji indicator for the category.

type CloneClassification

type CloneClassification struct {
	Category   CloneCategory
	IsTest     bool
	Priority   ClonePriority
	Tokens     int
	Lines      int
	NodeType   string
	Suggestion string
}

CloneClassification provides metadata about a code clone for actionable reports.

func ClassifyClone

func ClassifyClone(filename string, nodeType int32, tokens, lines int) CloneClassification

ClassifyClone analyzes a clone and returns its classification.

type CloneDiff

type CloneDiff struct {
	CloneWithContent

	Diff DiffResult
}

CloneDiff represents a clone with its diff against the base.

type CloneGroup

type CloneGroup struct {
	Hash  string      `json:"hash"`
	Size  int         `json:"size"`
	Files []JSONClone `json:"files"`
}

CloneGroup represents a group of duplicate code fragments.
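
To show what the struct tags produce, the sketch below copies the `CloneGroup` and `JSONClone` declarations from this page into a standalone program and marshals one group with `encoding/json`. The sample values and the helper name `encodeGroup` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JSONClone and CloneGroup are copied from the declarations on this page.
type JSONClone struct {
	Filename  string `json:"filename"`
	LineStart int    `json:"line_start"`
	LineEnd   int    `json:"line_end"`
	Fragment  string `json:"fragment"`
}

type CloneGroup struct {
	Hash  string      `json:"hash"`
	Size  int         `json:"size"`
	Files []JSONClone `json:"files"`
}

// encodeGroup is a hypothetical helper that returns the compact JSON form of a group.
func encodeGroup(g CloneGroup) (string, error) {
	b, err := json.Marshal(g)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	g := CloneGroup{
		Hash: "abc123",
		Size: 42,
		Files: []JSONClone{
			{Filename: "a.go", LineStart: 10, LineEnd: 20, Fragment: "return nil"},
		},
	}
	out, err := encodeGroup(g)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```

The keys appear in declaration order (`hash`, `size`, `files`), and each file entry uses the snake_case names from the tags.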

type CloneGroupDiff

type CloneGroupDiff struct {
	Base       *CloneWithContent
	Others     []CloneDiff
	HasAnyDiff bool
	// Aggregate statistics across all diffs in this group
	TotalAdded    int
	TotalRemoved  int
	TotalModified int
}

CloneGroupDiff holds the diffs between all clones in a group: the base clone (the first one) and the diffs of all other clones against it.

func ComputeCloneGroupDiff

func ComputeCloneGroupDiff(clones []clone) CloneGroupDiff

ComputeCloneGroupDiff computes diffs for a group of clones.

type ClonePriority

type ClonePriority string

ClonePriority represents how important it is to address this clone.

const (
	PriorityCritical ClonePriority = "critical" // Must fix - production code, large duplication
	PriorityHigh     ClonePriority = "high"     // Should fix - production code, medium duplication
	PriorityMedium   ClonePriority = "medium"   // Consider fixing - test code or small production
	PriorityLow      ClonePriority = "low"      // Optional - test helpers, tiny duplications
)

Clone priority levels for actionability.

func (ClonePriority) GetPriorityColor

func (p ClonePriority) GetPriorityColor() string

GetPriorityColor returns a CSS color variable for the priority.

func (ClonePriority) GetPriorityEmoji

func (p ClonePriority) GetPriorityEmoji() string

GetPriorityEmoji returns an emoji indicator for the priority.

type CloneWithContent

type CloneWithContent struct {
	CloneWithContentMixin

	Content []byte
}

CloneWithContent represents a clone with its file content.

type CloneWithContentMixin

type CloneWithContentMixin struct {
	Filename  string
	LineStart int
	LineEnd   int
}

CloneWithContentMixin provides common location fields for clone structures.

type DiffLine

type DiffLine struct {
	Content    string
	Type       DiffLineType
	LineNumber int // Line number in the original file
}

DiffLine represents a single line in a diff view.

type DiffLineType

type DiffLineType int

DiffLineType indicates the type of diff line.

const (
	DiffLineEqual DiffLineType = iota
	DiffLineAdded
	DiffLineRemoved
	DiffLineModified
)

type DiffResult

type DiffResult struct {
	Base     []DiffLine
	Compared []DiffLine
	HasDiff  bool
}

DiffResult represents the diff between two code fragments.

func LineDiff

func LineDiff(base, compared []byte) DiffResult

LineDiff performs a line-by-line diff between two code fragments. It uses a simple but effective algorithm optimized for code comparison:

  1. Split into lines
  2. Find matching lines
  3. Mark additions/removals/modifications

Space efficiency: O(n) memory, minimal allocations.
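
The three steps can be illustrated with a deliberately simplified positional comparison. This is not the package's algorithm: it assumes that line i of the base is matched only against line i of the compared fragment, and `positionalLineDiff` is a hypothetical name. The types are copied from this page.

```go
package main

import (
	"fmt"
	"strings"
)

type DiffLineType int

const (
	DiffLineEqual DiffLineType = iota
	DiffLineAdded
	DiffLineRemoved
	DiffLineModified
)

type DiffLine struct {
	Content    string
	Type       DiffLineType
	LineNumber int
}

type DiffResult struct {
	Base     []DiffLine
	Compared []DiffLine
	HasDiff  bool
}

// positionalLineDiff compares the two fragments line by line at the same index.
func positionalLineDiff(base, compared []byte) DiffResult {
	b := strings.Split(string(base), "\n") // step 1: split into lines
	c := strings.Split(string(compared), "\n")
	n := len(b)
	if len(c) > n {
		n = len(c)
	}
	var res DiffResult
	for i := 0; i < n; i++ {
		switch {
		case i >= len(b): // line exists only in compared
			res.Compared = append(res.Compared, DiffLine{Content: c[i], Type: DiffLineAdded, LineNumber: i + 1})
			res.HasDiff = true
		case i >= len(c): // line exists only in base
			res.Base = append(res.Base, DiffLine{Content: b[i], Type: DiffLineRemoved, LineNumber: i + 1})
			res.HasDiff = true
		case b[i] == c[i]: // step 2: matching lines
			res.Base = append(res.Base, DiffLine{Content: b[i], Type: DiffLineEqual, LineNumber: i + 1})
			res.Compared = append(res.Compared, DiffLine{Content: c[i], Type: DiffLineEqual, LineNumber: i + 1})
		default: // step 3: same position, different content
			res.Base = append(res.Base, DiffLine{Content: b[i], Type: DiffLineModified, LineNumber: i + 1})
			res.Compared = append(res.Compared, DiffLine{Content: c[i], Type: DiffLineModified, LineNumber: i + 1})
			res.HasDiff = true
		}
	}
	return res
}

func main() {
	res := positionalLineDiff([]byte("a\nb"), []byte("a\nc\nd"))
	fmt.Println(res.HasDiff, len(res.Base), len(res.Compared))
}
```

A positional comparison is a reasonable fit for clones, which are near-identical by construction, but it misreports a single inserted line as a run of modifications; a real implementation may realign lines instead.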

type FileInfo

type FileInfo struct {
	CloneWithContentMixin

	Content []byte
	Node    *syntax.Node
}

FileInfo represents processed file information.

func ProcessFileContent

func ProcessFileContent(fread ReadFile, node *syntax.Node) (*FileInfo, error)

ProcessFileContent provides unified file processing for all printers.

func ProcessNodeRange

func ProcessNodeRange(fread ReadFile, startNode, endNode *syntax.Node) (*FileInfo, error)

ProcessNodeRange processes a range of nodes (start to end).

type FileStatMixin

type FileStatMixin struct {
	// contains filtered or unexported fields
}

FileStatMixin provides common fields for file statistics.

type HashSetter

type HashSetter interface {
	SetHash(hash string)
}

HashSetter is an optional interface for printers that support hash metadata.
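
An optional interface like this is typically detected with a type assertion. The sketch below shows that pattern with a trimmed-down `Printer` (one method only) and two hypothetical printer types; `setHashIfSupported` is an illustrative helper, not part of the package.

```go
package main

import "fmt"

// Printer is trimmed to a single method for this sketch.
type Printer interface {
	PrintHeader() error
}

// HashSetter is copied from the declaration above.
type HashSetter interface {
	SetHash(hash string)
}

// plainPrinter is a hypothetical printer without hash support.
type plainPrinter struct{}

func (plainPrinter) PrintHeader() error { return nil }

// hashPrinter is a hypothetical printer that records the current hash.
type hashPrinter struct{ hash string }

func (p *hashPrinter) PrintHeader() error  { return nil }
func (p *hashPrinter) SetHash(hash string) { p.hash = hash }

// setHashIfSupported forwards the hash only when p implements HashSetter
// and reports whether it did.
func setHashIfSupported(p Printer, hash string) bool {
	hs, ok := p.(HashSetter)
	if ok {
		hs.SetHash(hash)
	}
	return ok
}

func main() {
	fmt.Println(setHashIfSupported(plainPrinter{}, "abc123"))
	hp := &hashPrinter{}
	fmt.Println(setHashIfSupported(hp, "abc123"), hp.hash)
}
```

Callers can therefore treat every format through the `Printer` interface and still pass hash metadata to the formats that accept it.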

type Issue

type Issue struct {
	From, To Clone
}

type Issuer

type Issuer struct {
	ReadFile
}

func NewIssuer

func NewIssuer(fread ReadFile) *Issuer

func (*Issuer) MakeIssues

func (p *Issuer) MakeIssues(dups [][]*syntax.Node) ([]Issue, error)

type JSONClone

type JSONClone struct {
	Filename  string `json:"filename"`
	LineStart int    `json:"line_start"`
	LineEnd   int    `json:"line_end"`
	Fragment  string `json:"fragment"`
}

JSONClone represents a single code fragment duplicate for JSON output.

type JSONOutput

type JSONOutput struct {
	Version         string       `json:"version"`
	Timestamp       time.Time    `json:"timestamp"`
	Threshold       int          `json:"threshold"`
	FilesAnalyzed   int          `json:"files_analyzed"`
	DetectionMethod string       `json:"detection_method,omitempty"`
	CloneGroups     []CloneGroup `json:"clone_groups"`
	Summary         Summary      `json:"summary"`
}

JSONOutput represents the structured JSON output.

type JSONPrinter

type JSONPrinter struct {
	ReadFile
	// contains filtered or unexported fields
}

func (*JSONPrinter) OutputJSON

func (p *JSONPrinter) OutputJSON(
	threshold int,
	sortBy config.SortCriteria,
	detectionMethod string,
) error

OutputJSON generates the complete JSON output.

func (*JSONPrinter) OutputSimpleJSON

func (p *JSONPrinter) OutputSimpleJSON() error

OutputSimpleJSON generates simple JSON output format (from duplicates project). This provides a simpler, more straightforward JSON format for users who prefer it.

func (*JSONPrinter) PrintClones

func (p *JSONPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error

func (*JSONPrinter) PrintFooter

func (*JSONPrinter) PrintFooter() error

func (*JSONPrinter) PrintHeader

func (p *JSONPrinter) PrintHeader() error

func (*JSONPrinter) SetFilesCount

func (p *JSONPrinter) SetFilesCount(count int)

SetFilesCount sets the total number of files analyzed.

func (*JSONPrinter) SetHash

func (p *JSONPrinter) SetHash(hash string)

SetHash sets the current hash for the clone group being processed.

type LineRangeMixin

type LineRangeMixin struct {
	StartLine int `json:"startLine"`
	EndLine   int `json:"endLine,omitempty"`
}

LineRangeMixin provides common line range fields.
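
Because the mixin is embedded without a name or tag, `encoding/json` promotes its fields into the enclosing object, and `omitempty` drops a zero `EndLine`. The sketch below copies `LineRangeMixin` and `SimpleJSONClone` from this page to show the effect; the helper `encodeClone` and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LineRangeMixin and SimpleJSONClone are copied from the declarations on this page.
type LineRangeMixin struct {
	StartLine int `json:"startLine"`
	EndLine   int `json:"endLine,omitempty"`
}

type SimpleJSONClone struct {
	LineRangeMixin

	Filename   string `json:"filename"`
	TokenCount int    `json:"token_count"`
}

// encodeClone is a hypothetical helper that returns the compact JSON form.
func encodeClone(c SimpleJSONClone) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	single := SimpleJSONClone{
		LineRangeMixin: LineRangeMixin{StartLine: 3}, // EndLine left at zero
		Filename:       "a.go",
		TokenCount:     12,
	}
	ranged := single
	ranged.EndLine = 9 // promoted field, settable directly

	fmt.Println(encodeClone(single))
	fmt.Println(encodeClone(ranged))
}
```

The first object has no `endLine` key at all, so consumers of the output should treat that key as optional.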

type Printer

type Printer interface {
	PrintHeader() error
	PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error
	PrintFooter() error
}

func NewHTML

func NewHTML(w io.Writer, fread ReadFile, threshold ...int) Printer

NewHTML creates a new HTML printer. Supports optional threshold parameter (default: 15).

func NewHTMLWithOptions

func NewHTMLWithOptions(
	w io.Writer,
	fread ReadFile,
	diffMode config.DiffMode,
	metadata ReportMetadata,
	threshold ...int,
) Printer

NewHTMLWithOptions creates a new HTML printer with full options. diffMode enables visual diff highlighting between duplicate occurrences. metadata contains CLI settings to display in the report.

func NewJSON

func NewJSON(w io.Writer, fread ReadFile) Printer

func NewPlumbing

func NewPlumbing(w io.Writer, fread ReadFile) Printer

func NewSARIF

func NewSARIF(w io.Writer, fread ReadFile, threshold int) Printer

NewSARIF creates a new SARIF format printer.

func NewStats

func NewStats(w io.Writer, fread ReadFile, threshold int) Printer

NewStats creates a new stats printer.

func NewText

func NewText(w io.Writer, fread ReadFile) Printer

type ReadFile

type ReadFile func(filename string) ([]byte, error)
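
Since `ReadFile` is a plain function type, the standard library's `os.ReadFile` has a matching signature, and tests can substitute an in-memory reader. The sketch below redeclares the type locally; `memoryReader` is a hypothetical helper, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// ReadFile is copied from the declaration above.
type ReadFile func(filename string) ([]byte, error)

// os.ReadFile is assignable to ReadFile because the signatures match.
var _ ReadFile = os.ReadFile

// memoryReader returns a ReadFile backed by a map of filename to source text,
// which avoids touching the disk in tests.
func memoryReader(files map[string]string) ReadFile {
	return func(filename string) ([]byte, error) {
		src, ok := files[filename]
		if !ok {
			return nil, fmt.Errorf("%s: %w", filename, fs.ErrNotExist)
		}
		return []byte(src), nil
	}
}

func main() {
	read := memoryReader(map[string]string{"a.go": "package a"})

	content, err := read("a.go")
	fmt.Println(string(content), err)

	_, err = read("missing.go")
	fmt.Println(errors.Is(err, fs.ErrNotExist))
}
```

Any of the constructors on this page that take `fread ReadFile` could presumably be handed either form.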

type ReportMetadata

type ReportMetadata struct {
	Semantic         bool
	DetectionMethods []string
	SortBy           string
	FilterGenerated  bool
	IncludeSQLC      bool
	IncludeTempl     bool
}

ReportMetadata contains CLI settings used for the report.

type SARIFArtifactLocation

type SARIFArtifactLocation struct {
	URI string `json:"uri"`
}

SARIFArtifactLocation represents the artifact (file) location.

type SARIFConfiguration

type SARIFConfiguration struct {
	Level string `json:"level"`
}

SARIFConfiguration represents rule configuration.

type SARIFDriver

type SARIFDriver struct {
	Name           string      `json:"name"`
	Version        string      `json:"version"`
	InformationURI string      `json:"informationUri"`
	Rules          []SARIFRule `json:"rules"`
}

SARIFDriver represents the tool driver information.

type SARIFFingerprints

type SARIFFingerprints struct {
	CloneHash string `json:"cloneHash,omitempty"`
}

SARIFFingerprints represents fingerprints for deduplication.

type SARIFInvocation

type SARIFInvocation struct {
	ExecutionSuccessful bool   `json:"executionSuccessful"`
	StartTimeUTC        string `json:"startTimeUtc,omitempty"`
	EndTimeUTC          string `json:"endTimeUtc,omitempty"`
}

SARIFInvocation represents an invocation of the tool.

type SARIFLocation

type SARIFLocation struct {
	PhysicalLocation SARIFPhysicalLocation `json:"physicalLocation"`
}

SARIFLocation represents a location in the code.

type SARIFMessage

type SARIFMessage struct {
	Text string `json:"text"`
}

SARIFMessage represents a message in a result.

type SARIFOutput

type SARIFOutput struct {
	Schema  string     `json:"$schema"`
	Version string     `json:"version"`
	Runs    []SARIFRun `json:"runs"`
}

SARIFOutput represents the SARIF (Static Analysis Results Interchange Format) output. This format is used by security tools like GitHub Advanced Security, CodeQL, etc. Spec: https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html

type SARIFPhysicalLocation

type SARIFPhysicalLocation struct {
	ArtifactLocation SARIFArtifactLocation `json:"artifactLocation"`
	Region           SARIFRegion           `json:"region"`
}

SARIFPhysicalLocation represents the physical file location.

type SARIFRegion

type SARIFRegion struct {
	LineRangeMixin
}

SARIFRegion represents a region within a file.

type SARIFResult

type SARIFResult struct {
	RuleID       string            `json:"ruleId"`
	Level        string            `json:"level"`
	Message      SARIFMessage      `json:"message"`
	Locations    []SARIFLocation   `json:"locations"`
	Fingerprints SARIFFingerprints `json:"fingerprints"`
}

SARIFResult represents a single result (finding).

type SARIFRule

type SARIFRule struct {
	ID                   string             `json:"id"`
	Name                 string             `json:"name"`
	ShortDescription     SARIFTextContent   `json:"shortDescription"`
	FullDescription      SARIFTextContent   `json:"fullDescription"`
	DefaultConfiguration SARIFConfiguration `json:"defaultConfiguration"`
	HelpURI              string             `json:"helpUri,omitempty"`
}

SARIFRule represents a rule/check that was violated.

type SARIFRun

type SARIFRun struct {
	Tool        SARIFTool         `json:"tool"`
	Results     []SARIFResult     `json:"results"`
	Invocations []SARIFInvocation `json:"invocations,omitempty"`
}

SARIFRun represents a single analysis run.

type SARIFTextContent

type SARIFTextContent struct {
	Text string `json:"text"`
}

SARIFTextContent represents text content in SARIF.

type SARIFTool

type SARIFTool struct {
	Driver SARIFDriver `json:"driver"`
}

SARIFTool represents the tool that performed the analysis.

type SimpleCloneGroup

type SimpleCloneGroup struct {
	Hash      string            `json:"hash"`
	Score     int               `json:"score"` // Impact score: tokens × instances
	Instances []SimpleJSONClone `json:"instances"`
}

SimpleCloneGroup represents a clone group in simple format (from duplicates project).
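
The `Score` comment ("tokens × instances") can be read as the sketch below. It assumes that every instance in a group has the same token count, so the first instance's `TokenCount` is used; `newSimpleGroup` is a hypothetical constructor, and the types are copied from this page.

```go
package main

import "fmt"

type LineRangeMixin struct {
	StartLine int `json:"startLine"`
	EndLine   int `json:"endLine,omitempty"`
}

type SimpleJSONClone struct {
	LineRangeMixin

	Filename   string `json:"filename"`
	TokenCount int    `json:"token_count"`
}

type SimpleCloneGroup struct {
	Hash      string            `json:"hash"`
	Score     int               `json:"score"`
	Instances []SimpleJSONClone `json:"instances"`
}

// newSimpleGroup fills Score as tokens × instances.
// Assumption: all instances share the token count of the first one.
func newSimpleGroup(hash string, instances []SimpleJSONClone) SimpleCloneGroup {
	score := 0
	if len(instances) > 0 {
		score = instances[0].TokenCount * len(instances)
	}
	return SimpleCloneGroup{Hash: hash, Score: score, Instances: instances}
}

func main() {
	g := newSimpleGroup("abc123", []SimpleJSONClone{
		{Filename: "a.go", TokenCount: 40},
		{Filename: "b.go", TokenCount: 40},
	})
	fmt.Println(g.Score)
}
```

Under this reading a 40-token fragment duplicated in two places scores 80, so the score grows with both the size of the fragment and how often it is repeated.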

type SimpleJSONClone

type SimpleJSONClone struct {
	LineRangeMixin

	Filename   string `json:"filename"`
	TokenCount int    `json:"token_count"`
}

SimpleJSONClone represents a single code clone instance in simple format (from duplicates project).

type SimpleJSONOutput

type SimpleJSONOutput []SimpleCloneGroup

SimpleJSONOutput represents the simple JSON output format (from duplicates project).

type StatsConfig

type StatsConfig struct {
	Format              config.OutputFormat
	FilesCount          int
	DetectionMethods    string
	SemanticDetection   bool
	Timestamp           string
	AnalysisDuration    time.Duration
	TotalEstimatedLines int
	FilesFiltered       int
	FilterBreakdown     map[string]int
}

StatsConfig holds configuration for statistics output. Used by StatsPrinter.ApplyStatsConfig to set all stats metadata in one call.

type StatsData

type StatsData struct {
	// Count metrics
	TotalFilesScanned int `json:"total_files_scanned"`
	TotalCloneGroups  int `json:"total_clone_groups"`
	TotalClones       int `json:"total_clones"`

	// Size metrics
	TotalDuplicateLines int `json:"total_duplicate_lines"`
	TotalTokens         int `json:"total_tokens"`
	TotalEstimatedLines int `json:"total_estimated_lines"` // Estimated total lines for duplication percentage
	AverageCloneSize    int `json:"average_clone_size"`

	// Complexity and impact metrics
	ComplexityScore  float64 `json:"complexity_score"`
	ImpactScore      int     `json:"impact_score"`
	DuplicationRatio float64 `json:"duplication_ratio"` // Percentage of duplicated code

	// Quality metrics
	HealthScore string `json:"health_score"` // A-F grade based on metrics

	// Time metrics
	AnalysisDuration string `json:"analysis_duration"` // Time taken for analysis
	Timestamp        string `json:"timestamp"`         // ISO 8601 timestamp

	// Aggregation metrics
	FileDuplication   map[string]int `json:"file_duplication"`   // filename -> duplicate line count
	SizeDistribution  map[string]int `json:"size_distribution"`  // size range -> count (lines)
	TokenDistribution map[string]int `json:"token_distribution"` // token range -> count
	SeverityBreakdown map[string]int `json:"severity_breakdown"` // severity -> count (small/medium/large/huge)

	// Filter metrics (NEW)
	FilesFiltered   int            `json:"files_filtered,omitempty"`   // Total files filtered out
	FilterBreakdown map[string]int `json:"filter_breakdown,omitempty"` // Reason -> count (e.g., "templ" -> 12)

	// Detection mode
	DetectionMode     string `json:"detection_mode,omitempty"`             // "semantic" or "structural"
	DetectionModeDesc string `json:"detection_mode_description,omitempty"` // Human-readable description

	// Metadata
	DetectionMethods  string `json:"detection_methods"`  // Comma-separated detection methods used
	SemanticDetection bool   `json:"semantic_detection"` // Whether semantic-aware detection was enabled
}

StatsData holds all aggregated statistics about code duplication analysis.

Fields:

  • Count metrics: TotalFilesScanned, TotalCloneGroups, TotalClones
  • Size metrics: TotalTokens, TotalDuplicateLines, AverageCloneSize
  • Complexity metrics: ComplexityScore, ImpactScore
  • Quality metrics: DuplicationRatio, HealthScore
  • Time metrics: AnalysisDuration, Timestamp
  • Aggregation metrics: FileDuplication, SizeDistribution
  • Filter metrics: FilesFiltered, FilterBreakdown (NEW)
  • Metadata: DetectionMethods

JSON Marshaling:

  • All fields are JSON tagged for easy marshaling
  • Use printer.JSONPrinter for formatted JSON output

type StatsPrinter

type StatsPrinter interface {
	Printer
	ApplyStatsConfig(config StatsConfig)
	GetStatsData() *StatsData
}

StatsPrinter extends the Printer interface with stats configuration.

type StyleMixin

type StyleMixin struct {
	// contains filtered or unexported fields
}

StyleMixin holds common lipgloss style fields.

type Summary

type Summary struct {
	TotalCloneGroups int     `json:"total_clone_groups"`
	TotalClones      int     `json:"total_clones"`
	ComplexityScore  float64 `json:"complexity_score"`
	// ImpactScore represents total duplicated code volume (tokens × instances)
	// This is the simple scoring metric from the duplicates project
	ImpactScore int `json:"impact_score,omitempty"`
}

Summary provides analysis summary statistics.

type TextPrinter

type TextPrinter struct {
	ReadFile
	// contains filtered or unexported fields
}

TextPrinter implements text-based output for duplicate detection.

func (*TextPrinter) OutputText

func (p *TextPrinter) OutputText(threshold int, sortBy config.SortCriteria) error

OutputText generates text output with sorting.

func (*TextPrinter) PrintClones

func (p *TextPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...config.SortCriteria) error

func (*TextPrinter) PrintClonesSorted

func (p *TextPrinter) PrintClonesSorted(dups [][]*syntax.Node, sortBy config.SortCriteria) error

PrintClonesSorted prints clones with specified sorting criteria.

func (*TextPrinter) PrintFooter

func (p *TextPrinter) PrintFooter() error

func (*TextPrinter) PrintHeader

func (p *TextPrinter) PrintHeader() error

func (*TextPrinter) SetFileDuplicate

func (p *TextPrinter) SetFileDuplicate(isDupe bool)

SetFileDuplicate marks the current group as a file duplicate.

func (*TextPrinter) SetHash

func (p *TextPrinter) SetHash(hash string)

SetHash sets the hash for the current clone group.
