syntax

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package syntax provides a unified AST representation for code duplication detection.

This package bridges the gap between language-specific AST parsers (golang/ast for Go code) and the language-agnostic suffix tree used by the detection algorithm.

Core Types:
  - Node: Unified syntax tree node representing any language construct
  - Match: A clone match made up of fragments (groups of nodes)
  - Match.Frags: Field of Match; a slice of node sequences (each fragment is a sequence of nodes)

Design:
  - Language-agnostic: Works with any language that provides a parser
  - Type-safe: Uses int32 for types (see golang/constants for mapping)
  - Memory-optimized: Careful field ordering for cache efficiency
  - Position-aware: Tracks byte positions and line numbers for all nodes

Usage Flow:
  1. Parse source files -> language-specific AST (go/ast, etc.)
  2. Transform AST -> unified syntax.Node tree (see syntax/golang/)
  3. Build suffix tree from Node sequence (suffixtree.Update())
  4. Find duplicates using suffix tree (FindDuplOver())
  5. Convert matches to complete syntax units (FindSyntaxUnits())
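Steps 1 and 2 of the flow above can be sketched with the standard library alone. This is a minimal illustration, not the code in syntax/golang/: the Node type is a local mirror of the struct documented on this page, and parseAndTransform, typeOf, countType, and the type codes are hypothetical names invented for the example (the real type mapping lives in golang/constants).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Node is a local mirror of the syntax.Node fields shown on this page.
type Node struct {
	Type     int32
	Pos      int32
	End      int32
	Owns     int32
	Children []*Node
	Filename string
}

// Hypothetical type codes, used only by this sketch.
const (
	typeOther int32 = iota
	typeFile
	typeFuncDecl
)

// typeOf maps a go/ast node to one of the hypothetical type codes.
func typeOf(n ast.Node) int32 {
	switch n.(type) {
	case *ast.File:
		return typeFile
	case *ast.FuncDecl:
		return typeFuncDecl
	default:
		return typeOther
	}
}

// parseAndTransform parses Go source (step 1) and rebuilds it as a
// tree of Node values carrying byte offsets and the filename (step 2).
func parseAndTransform(filename, src string) (*Node, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var root *Node
	var stack []*Node // ancestors of the node currently being visited
	ast.Inspect(file, func(n ast.Node) bool {
		if n == nil { // Inspect signals "done with this subtree"
			stack = stack[:len(stack)-1]
			return true
		}
		node := &Node{
			Type:     typeOf(n),
			Pos:      int32(fset.Position(n.Pos()).Offset),
			End:      int32(fset.Position(n.End()).Offset),
			Filename: filename,
		}
		if len(stack) == 0 {
			root = node
		} else {
			parent := stack[len(stack)-1]
			parent.Children = append(parent.Children, node)
		}
		stack = append(stack, node)
		return true
	})
	return root, nil
}

// countType counts the nodes of type t in the tree rooted at n.
func countType(n *Node, t int32) int {
	count := 0
	if n.Type == t {
		count++
	}
	for _, c := range n.Children {
		count += countType(c, t)
	}
	return count
}

const demoSrc = "package demo\n\nfunc add(a, b int) int {\n\treturn a + b\n}\n"

func main() {
	root, err := parseAndTransform("demo.go", demoSrc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("function declarations:", countType(root, typeFuncDecl))
}
```

The resulting tree would then be flattened and handed to the suffix tree (steps 3 to 5), which this sketch does not attempt to reproduce.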

Key Functions:
  - FindSyntaxUnits(): Converts suffix tree matches to complete syntax units
  - hashSeq(): Creates hash of node sequence for duplicate detection
  - isCyclic/spansMultipleFiles(): Validation helpers

Performance:
  - maxChildrenSerial constant prevents goroutine stack overflow
  - Node struct is 40B (37.5% reduction from 64B) via int32 fields
  - See MEMORY_LAYOUT_OPTIMIZATION_PLAN.md for details

Functions

func CountUniqueFiles

func CountUniqueFiles(group [][]*Node) int

CountUniqueFiles returns the number of unique files in a clone group.
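The behavior described above might be approximated as follows. This is a sketch against a local mirror of Node, not the package's implementation; countUniqueFiles is a hypothetical stand-in, and it assumes every fragment lies within a single file, so the first node's Filename identifies the fragment.

```go
package main

import "fmt"

// Node is a local mirror of the syntax.Node fields shown on this page.
type Node struct {
	Type     int32
	Pos      int32
	End      int32
	Owns     int32
	Children []*Node
	Filename string
}

// countUniqueFiles counts the distinct filenames among the fragments of
// a clone group. Assumption: a fragment never spans files, so only its
// first node is inspected. Empty fragments are skipped.
func countUniqueFiles(group [][]*Node) int {
	seen := make(map[string]struct{})
	for _, frag := range group {
		if len(frag) == 0 {
			continue
		}
		seen[frag[0].Filename] = struct{}{}
	}
	return len(seen)
}

func main() {
	group := [][]*Node{
		{{Filename: "a.go", Pos: 10, End: 40}},
		{{Filename: "a.go", Pos: 90, End: 120}},
		{{Filename: "b.go", Pos: 5, End: 35}},
	}
	fmt.Println(countUniqueFiles(group))
}
```

A count of 1 indicates that all clones in the group sit in the same file, which a caller could use to separate intra-file from cross-file duplication.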

func Unique

func Unique(group [][]*Node) [][]*Node

Unique removes duplicate entries from a group of syntax nodes based on file and range. Two clones are considered duplicates if they have the same filename and the same range (start-end).
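A sketch of this deduplication rule is shown below. It works on a local mirror of Node; unique and fragKey are hypothetical names, and the sketch assumes a fragment's range runs from the Pos of its first node to the End of its last node, which the documentation does not spell out.

```go
package main

import "fmt"

// Node is a local mirror of the syntax.Node fields shown on this page.
type Node struct {
	Type     int32
	Pos      int32
	End      int32
	Owns     int32
	Children []*Node
	Filename string
}

// fragKey identifies a fragment by file and byte range.
type fragKey struct {
	filename string
	pos, end int32
}

// unique keeps the first fragment seen for each (filename, start, end)
// key and drops later fragments with the same key. Assumption: the
// range is first node's Pos to last node's End.
func unique(group [][]*Node) [][]*Node {
	seen := make(map[fragKey]bool)
	out := make([][]*Node, 0, len(group))
	for _, frag := range group {
		if len(frag) == 0 {
			continue
		}
		k := fragKey{frag[0].Filename, frag[0].Pos, frag[len(frag)-1].End}
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, frag)
	}
	return out
}

func main() {
	group := [][]*Node{
		{{Filename: "a.go", Pos: 10, End: 40}},
		{{Filename: "a.go", Pos: 10, End: 40}}, // same file and range
		{{Filename: "b.go", Pos: 10, End: 40}}, // same range, other file
	}
	fmt.Println(len(unique(group)))
}
```

Keeping the first occurrence preserves the input order of the surviving fragments.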

Types

type Match

type Match struct {
	Hash  string
	Frags [][]*Node
}

func FindSyntaxUnits

func FindSyntaxUnits(data []*Node, m suffixtree.Match, threshold int) Match

FindSyntaxUnits finds all complete syntax units in the match group and returns them with the corresponding hash.

type Node

type Node struct {
	Type     int32
	Pos      int32
	End      int32
	Owns     int32
	Children []*Node
	Filename string
}

Node represents a syntax tree node.

Memory layout, optimized with int32 fields:
  - int32 fields grouped for cache efficiency (4B each, 16B total)
  - pointer field (8B)
  - string header at end (16B)

Total: 40B (37.5% reduction from 64B).

func NewNode

func NewNode() *Node

func NewSyntheticFileNode

func NewSyntheticFileNode(filename string, size int) *Node

NewSyntheticFileNode creates a synthetic node representing an entire file. This is used for file-level duplicate detection where we want to match entire files rather than specific code fragments.

func Serialize

func Serialize(n *Node) []*Node
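Serialize is not documented here. One plausible reading, given that the suffix tree consumes a node sequence, is a pre-order flattening of the tree. The sketch below works on a local mirror of Node and additionally assumes that Owns records the number of descendants of each node; that interpretation, and the name serialize, are assumptions rather than confirmed behavior. It uses plain recursion for brevity, whereas the overview mentions a maxChildrenSerial constant guarding against stack overflow.

```go
package main

import "fmt"

// Node is a local mirror of the syntax.Node fields shown on this page.
type Node struct {
	Type     int32
	Pos      int32
	End      int32
	Owns     int32
	Children []*Node
	Filename string
}

// serialize flattens the tree rooted at n in pre-order (parent before
// children). Assumption: Owns is set to the number of descendants, so
// a node at index i in the result is followed by its Owns descendants.
func serialize(n *Node) []*Node {
	var out []*Node
	var walk func(cur *Node) int32
	walk = func(cur *Node) int32 {
		out = append(out, cur)
		var owned int32
		for _, c := range cur.Children {
			owned += 1 + walk(c)
		}
		cur.Owns = owned
		return owned
	}
	if n != nil {
		walk(n)
	}
	return out
}

func main() {
	leaf := &Node{Type: 3}
	left := &Node{Type: 2, Children: []*Node{leaf}}
	right := &Node{Type: 4}
	root := &Node{Type: 1, Children: []*Node{left, right}}

	for _, n := range serialize(root) {
		fmt.Printf("type=%d owns=%d\n", n.Type, n.Owns)
	}
}
```

Under this reading, a contiguous run of the sequence forms a complete syntax unit exactly when it starts at some node and covers that node's Owns descendants, which is the kind of check FindSyntaxUnits would need.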

func (*Node) AddChildren

func (n *Node) AddChildren(children ...*Node)

func (*Node) Val

func (n *Node) Val() suffixtree.TokenValue

Val returns the token value for suffix tree compatibility. Implements the suffixtree.Token interface.

Directories

Path Synopsis
Package templ provides AST parsing for templ files using the official templ parser.