semantic

package
v0.1.3 Latest
Published: May 27, 2026 License: GPL-3.0 Imports: 29 Imported by: 0

Documentation

Overview

Package semantic provides a complete semantic input tracer that analyzes codebases to trace user input flow with full cross-file, inter-procedural analysis.

Constants

This section is empty.

Variables

This section is empty.

Functions

func ToDOT

func ToDOT(r *TraceResult) string

ToDOT converts the trace result to GraphViz DOT format
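As a rough illustration of the kind of output a DOT export produces, the sketch below renders a small list of flow steps as a GraphViz digraph. The `edge` type and `toDOT` function are hypothetical stand-ins written for this example; the node names, graph name, and layout that the package itself emits are not documented here and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// edge is a hypothetical stand-in for one step of a traced input flow.
type edge struct{ From, To string }

// toDOT renders edges as a GraphViz digraph. The layout is illustrative and
// may differ from what the package's ToDOT emits.
func toDOT(edges []edge) string {
	var b strings.Builder
	b.WriteString("digraph flow {\n")
	for _, e := range edges {
		// %q wraps each name in double quotes, which DOT requires for
		// identifiers containing characters such as '$' or '['.
		fmt.Fprintf(&b, "  %q -> %q;\n", e.From, e.To)
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	fmt.Print(toDOT([]edge{
		{"$_GET[id]", "$id"},
		{"$id", "query()"},
	}))
}
```

The resulting text can be piped to the GraphViz `dot` tool to produce an image.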

func ToHTML

func ToHTML(r *TraceResult) string

ToHTML converts the trace result to interactive HTML format

func ToJSON

func ToJSON(r *TraceResult) (string, error)

ToJSON converts the trace result to JSON format

func ToMermaid

func ToMermaid(r *TraceResult) string

ToMermaid converts the trace result to Mermaid diagram format

Types

type Config

type Config struct {
	// Languages to analyze (empty = auto-detect all)
	Languages []string

	// MaxDepth for inter-procedural analysis
	MaxDepth int

	// Workers for parallel analysis
	Workers int

	// FollowImports enables cross-file analysis
	FollowImports bool

	// Verbose enables detailed logging
	Verbose bool

	// IncludePatterns for file filtering (glob patterns)
	IncludePatterns []string

	// ExcludePatterns for file filtering (glob patterns)
	ExcludePatterns []string

	// MaxMemoryMB is the maximum memory usage in MB (0 = use default 100MB)
	// Applied to all modes to prevent OOM on large codebases
	MaxMemoryMB int

	// MaxFileSizeBytes is the maximum file size to parse (0 = unlimited)
	MaxFileSizeBytes int64

	// MaxFiles is the maximum number of files to parse (0 = unlimited)
	MaxFiles int

	// MaxFlowNodes is the maximum number of nodes in the flow graph (0 = default 10000)
	MaxFlowNodes int

	// MaxFlowEdges is the maximum number of edges in the flow graph (0 = default 20000)
	MaxFlowEdges int
}

Config configures the semantic tracer
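The field comments above say that a zero value selects a default for some limits (100MB of memory, 10000 flow nodes, 20000 flow edges) and means "unlimited" for others. The sketch below spells out that rule on a local copy of a few fields. The `limits` type and `effective` helper are hypothetical and exist only for this example; how the package applies these defaults internally is an assumption.

```go
package main

import "fmt"

// limits mirrors a subset of the documented Config fields. It is a local
// stand-in for illustration, not the package's own type.
type limits struct {
	MaxMemoryMB  int
	MaxFiles     int
	MaxFlowNodes int
	MaxFlowEdges int
}

// effective is a hypothetical helper that applies the zero-value rules stated
// in the Config field comments. MaxFiles is left alone: 0 means unlimited.
func effective(l limits) limits {
	if l.MaxMemoryMB == 0 {
		l.MaxMemoryMB = 100 // documented default: 100MB
	}
	if l.MaxFlowNodes == 0 {
		l.MaxFlowNodes = 10000 // documented default
	}
	if l.MaxFlowEdges == 0 {
		l.MaxFlowEdges = 20000 // documented default
	}
	return l
}

func main() {
	// Only MaxFlowNodes is set explicitly; the other limits fall back.
	fmt.Printf("%+v\n", effective(limits{MaxFlowNodes: 500}))
}
```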

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns sensible defaults

type FileInfo

type FileInfo struct {
	Path        string
	Language    string
	SymbolTable *types.SymbolTable
	Sources     []*types.FlowNode
	Assignments []*types.Assignment // Cached assignments for flow tracing (avoids re-parsing)
	Calls       []*types.CallSite   // Cached calls for flow tracing (avoids re-parsing)
	Root        *sitter.Node        // Only populated during parsing, released after
	Content     []byte              // Only populated if NeedsReparse is false
	ParseTime   time.Duration
	Error       error
	// NeedsReparse indicates the file needs re-parsing for deeper analysis
	// (AST was released to save memory)
	NeedsReparse bool
}

FileInfo holds information about a parsed file. It is optimized to avoid retaining the AST and file content in memory after parsing.

type LanguageStats

type LanguageStats struct {
	Files        int
	Sources      int
	Flows        int
	ParseErrors  int
	ParseTime    time.Duration
	AnalysisTime time.Duration
}

LanguageStats holds per-language statistics

type TraceContext

type TraceContext struct {
	// contains filtered or unexported fields
}

TraceContext provides per-trace-invocation isolation for thread safety. Each TraceBackward() call gets its own context with:

  - Own parser instances (not shared, so thread-safe)
  - Cached assignments only (extracted once per file, reused in recursion)
  - No AST caching (ASTs are huge, assignments are tiny)
  - Release on completion (memory-efficient)
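The caching idea described for TraceContext can be sketched as follows: keep only the small per-file assignment lists, extract them once, and drop everything on Close. All names here (`assignment`, `traceContext`, `newTraceContext`, the `extract` callback) are hypothetical; TraceContext's real fields are unexported, so this is an assumption about the general pattern, not its implementation.

```go
package main

import "fmt"

// assignment is a hypothetical, compact record of "target = source".
type assignment struct{ Target, Source string }

// traceContext is a local sketch of a per-invocation context: it caches the
// small assignment lists per file and never stores parsed trees.
type traceContext struct {
	cache   map[string][]assignment
	extract func(path string) []assignment // stands in for parse + extract
	parses  int                            // number of extractions performed
}

func newTraceContext(extract func(string) []assignment) *traceContext {
	return &traceContext{cache: map[string][]assignment{}, extract: extract}
}

// assignments returns the cached list for path, extracting it on first use.
func (c *traceContext) assignments(path string) []assignment {
	if a, ok := c.cache[path]; ok {
		return a
	}
	c.parses++
	a := c.extract(path)
	c.cache[path] = a
	return a
}

// Close drops the cache so its memory can be reclaimed. The context must not
// be used after Close.
func (c *traceContext) Close() { c.cache = nil }

func main() {
	ctx := newTraceContext(func(path string) []assignment {
		return []assignment{{"$id", "$_GET['id']"}}
	})
	defer ctx.Close()

	ctx.assignments("a.php")
	ctx.assignments("a.php") // served from the cache, no second extraction
	fmt.Println("extractions:", ctx.parses)
}
```

Because each context owns its cache, two goroutines that each create their own context share no mutable state in this sketch.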

func (*TraceContext) Close

func (ctx *TraceContext) Close()

Close releases all resources held by the context

type TraceResult

type TraceResult struct {
	// All discovered input sources
	Sources []*types.FlowNode

	// Complete flow map
	FlowMap *types.FlowMap

	// Per-file information
	Files map[string]*FileInfo

	// Global symbol table (merged from all files)
	GlobalSymbolTable *types.SymbolTable

	// Per-file symbol tables (for symbolic execution)
	SymbolTable map[string]*types.SymbolTable

	// Statistics
	Stats *TraceStats
}

TraceResult is the complete result of semantic tracing

func (*TraceResult) GetSourcesByFile

func (r *TraceResult) GetSourcesByFile(filePath string) []*types.FlowNode

GetSourcesByFile returns sources in a specific file

func (*TraceResult) GetSourcesByType

func (r *TraceResult) GetSourcesByType(sourceType types.SourceType) []*types.FlowNode

GetSourcesByType returns sources filtered by type
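Both lookups amount to filtering the list of discovered sources by one attribute. The sketch below shows that pattern on local stand-in types. `flowNode`, `sourceType`, and the field names are hypothetical, since the fields of `types.FlowNode` and the values of `types.SourceType` are not shown on this page.

```go
package main

import "fmt"

// sourceType and flowNode are hypothetical stand-ins for types.SourceType and
// types.FlowNode; the real types' fields are not documented here.
type sourceType string

type flowNode struct {
	Name string
	File string
	Type sourceType
}

// filter returns the nodes for which keep reports true, preserving order.
func filter(nodes []*flowNode, keep func(*flowNode) bool) []*flowNode {
	var out []*flowNode
	for _, n := range nodes {
		if keep(n) {
			out = append(out, n)
		}
	}
	return out
}

// byFile selects the sources found in a given file.
func byFile(nodes []*flowNode, path string) []*flowNode {
	return filter(nodes, func(n *flowNode) bool { return n.File == path })
}

// byType selects the sources of a given kind.
func byType(nodes []*flowNode, t sourceType) []*flowNode {
	return filter(nodes, func(n *flowNode) bool { return n.Type == t })
}

func main() {
	nodes := []*flowNode{
		{Name: "$_GET['id']", File: "a.php", Type: "http_get"},
		{Name: "$_POST['q']", File: "a.php", Type: "http_post"},
		{Name: "$_GET['p']", File: "b.php", Type: "http_get"},
	}
	fmt.Println(len(byFile(nodes, "a.php")), len(byType(nodes, "http_get")))
}
```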

func (*TraceResult) HasInputAtFunction

func (r *TraceResult) HasInputAtFunction(funcName string) bool

HasInputAtFunction reports whether the named function receives user input

func (*TraceResult) ToDOT

func (r *TraceResult) ToDOT() string

ToDOT outputs the result as GraphViz DOT

func (*TraceResult) ToHTML

func (r *TraceResult) ToHTML() string

ToHTML outputs the result as interactive HTML

func (*TraceResult) ToJSON

func (r *TraceResult) ToJSON() (string, error)

ToJSON outputs the result as JSON

func (*TraceResult) ToMermaid

func (r *TraceResult) ToMermaid() string

ToMermaid outputs the result as Mermaid diagram

type TraceStats

type TraceStats struct {
	FilesScanned     int
	FilesParsed      int
	FilesSkipped     int
	ParseErrors      int
	SourcesFound     int
	FlowsTraced      int
	CrossFileFlows   int
	TotalDuration    time.Duration
	ParseDuration    time.Duration
	AnalysisDuration time.Duration
	ByLanguage       map[string]*LanguageStats
}

TraceStats holds tracing statistics

type Tracer

type Tracer struct {
	// contains filtered or unexported fields
}

Tracer is the main semantic input tracer

func New

func New(config *Config) *Tracer

New creates a new semantic tracer

func (*Tracer) Close

func (t *Tracer) Close()

Close releases all resources held by the Tracer

func (*Tracer) ParseOnly

func (t *Tracer) ParseOnly(path string) (*TraceResult, error)

ParseOnly parses files and builds symbol tables without flow analysis (fast mode for symbolic tracing)

func (*Tracer) TraceBackward

func (t *Tracer) TraceBackward(target string, codebasePath string) (*types.BackwardTraceResult, error)

TraceBackward performs backward taint analysis from a target expression. This traces from a target variable/expression back to its input sources.
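Conceptually, a backward trace starts at the target and follows assignments in reverse until it reaches something known to be an input source. The toy sketch below shows that walk over a hand-built dependency map. `traceBack`, `deps`, and `isSource` are hypothetical names for this example; the package's actual algorithm, data structures, and result type (`types.BackwardTraceResult`) are not described on this page.

```go
package main

import (
	"fmt"
	"sort"
)

// traceBack walks from target toward anything marked as an input source.
// deps maps a variable to the variables its value was computed from.
// This is a toy model of backward taint analysis, not the package's code.
func traceBack(target string, deps map[string][]string, isSource map[string]bool) []string {
	seen := map[string]bool{}
	var found []string

	var visit func(v string)
	visit = func(v string) {
		if seen[v] {
			return // already explored; also guards against cycles
		}
		seen[v] = true
		if isSource[v] {
			found = append(found, v)
			return
		}
		for _, d := range deps[v] {
			visit(d)
		}
	}
	visit(target)

	sort.Strings(found)
	return found
}

func main() {
	deps := map[string][]string{
		"$sql":   {"$id", "$table"},
		"$id":    {"$_GET"},
		"$table": {"$cfg"},
	}
	sources := map[string]bool{"$_GET": true}
	fmt.Println(traceBack("$sql", deps, sources))
}
```

In this model `$sql` is tainted through `$id`, while `$table` leads only to `$cfg`, which is not an input source.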

func (*Tracer) TraceBackwardBatch

func (t *Tracer) TraceBackwardBatch(targets []string, codebasePath string) (*types.BatchTraceResult, error)

TraceBackwardBatch performs backward taint analysis for multiple target expressions in a single pass. This matters for performance: instead of reading every file once per target (N × files reads for N targets), it makes one pass through all files and checks every target at once. The TraceContext and assignment cache are shared across all targets.
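The saving from batching can be shown with a simplified model: visit each file once and test every target against it, so the number of file reads depends on the number of files and not on the number of targets. The `batchScan` function below is hypothetical and uses plain substring matching as a stand-in for real analysis.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// batchScan visits each file once and checks every target against it. It
// returns, per target, the sorted names of files that mention it, plus the
// number of file reads performed. Substring matching stands in for analysis.
func batchScan(files map[string]string, targets []string) (map[string][]string, int) {
	hits := make(map[string][]string, len(targets))
	reads := 0
	for name, content := range files {
		reads++ // one read per file, regardless of len(targets)
		for _, t := range targets {
			if strings.Contains(content, t) {
				hits[t] = append(hits[t], name)
			}
		}
	}
	for _, names := range hits {
		sort.Strings(names) // map iteration order is random; sort for stable output
	}
	return hits, reads
}

func main() {
	files := map[string]string{
		"a.php": "$id = $_GET['id'];",
		"b.php": "$name = $id;",
	}
	hits, reads := batchScan(files, []string{"$id", "$name"})
	fmt.Println(hits["$id"], hits["$name"], "reads:", reads)
}
```

Calling a single-target scan once per target would perform `len(targets) × len(files)` reads over the same input; the batch form performs `len(files)`.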

func (*Tracer) TraceDirectory

func (t *Tracer) TraceDirectory(path string) (*TraceResult, error)

TraceDirectory performs semantic tracing on a directory

func (*Tracer) TraceFile

func (t *Tracer) TraceFile(path string) (*TraceResult, error)

TraceFile performs semantic tracing on a single file

Directories

Path Synopsis
Package analyzer defines the interface for language-specific analyzers
base
Package base provides shared helpers for language analyzers.
c
Package c implements the C language analyzer for semantic input tracing
cpp
Package cpp implements the C++ language analyzer for semantic input tracing
csharp
Package csharp implements the C# language analyzer for semantic input tracing
golang
Package golang implements the Go language analyzer for semantic input tracing
java
Package java implements the Java language analyzer for semantic input tracing
javascript
Package javascript implements the JavaScript language analyzer for semantic input tracing
php
Package php implements the PHP language analyzer for semantic input tracing
python
Package python implements the Python language analyzer for semantic input tracing
ruby
Package ruby implements the Ruby language analyzer for semantic input tracing
rust
Package rust implements the Rust language analyzer for semantic input tracing
typescript
Package typescript implements the TypeScript language analyzer for semantic input tracing
Package batch provides batch analysis capabilities for analyzing multiple code snippets
Package callgraph provides sophisticated call graph management with distance computation for input flow analysis.
Package classifier provides snippet classification using carrier maps
Package condition provides key condition extraction for branch analysis.
Package discovery - carrier map builder and serialization
Package extractor provides utilities to extract traceable PHP expressions from code snippets
Package index provides a unified code indexer with signature-based lookup, inspired by ATLANTIS's multi-tier code retrieval approach.
Package pathanalysis provides inter-procedural path expansion and pruning for taint analysis.
Package symbolic provides symbolic execution for deep semantic tracing. This traces object instantiation, constructor execution, method calls, and property population. Works universally across all PHP applications, with no framework-specific hints.
Package tracer provides variable tracing across codebases
Package types defines universal data structures for semantic input tracing across all supported programming languages.