interproc

package
v0.4.9 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 24, 2026 License: MIT Imports: 13 Imported by: 0

README

Interprocedural Analysis Engine

This package provides context-sensitive interprocedural analysis for gorisk, enabling more precise capability tracking and taint detection across function boundaries.

Features

  • Context-Sensitive Call Graph (k-CFA): Distinguishes different calling contexts (k=1: immediate caller)
  • SCC Detection: Handles recursive call cycles using Tarjan's algorithm
  • Worklist Fixpoint: Converges on transitive capabilities through iterative propagation
  • Persistent Caching: Speeds up repeated analysis by caching function summaries
  • Enhanced Taint Analysis: Tracks source→sink flows across function calls with full call stacks

Architecture

Context-Sensitive   →  Summary      →  SCC Detector
Call Graph Builder     Generator       (Tarjan)
        ↓                  ↓               ↓
Worklist Fixpoint   ←  Lattice Ops  ←  Cache Manager
        ↓
Per-Function Taint Analysis

Usage

Running Analysis

The interprocedural engine is enabled by default:

gorisk scan
Verbose Output

Enable detailed logging to see the analysis progress:

gorisk scan --verbose

Or using environment variable (backward compatible):

GORISK_VERBOSE=1 gorisk scan

This shows:

  • Call graph construction (nodes, edges, context cloning)
  • SCC detection results
  • Fixpoint iteration progress
  • Cache hit/miss statistics
  • Taint flow analysis results
Disable Caching

To clear the cache manually:

rm -rf ~/.cache/gorisk/summaries
Configure Analysis Options
import "github.com/1homsi/gorisk/internal/interproc"

opts := interproc.AnalysisOptions{
    ContextSensitivity: 1,    // k for k-CFA (0=insensitive, 1=caller-sensitive)
    MaxIterations:      1000, // Max fixpoint iterations
    EnableCache:        true, // Enable persistent caching
    CacheDir:           "",   // Default: $HOME/.cache/gorisk/summaries
}

csGraph, findings, err := interproc.RunAnalysis(irGraph, opts)

Algorithm Details

Context Sensitivity (k-CFA)

The engine uses k=1 calling context sensitivity by default:

  • Each function is analyzed separately for each unique immediate caller
  • Prevents over-approximation when functions are called from different contexts
  • Balances precision with performance (k=0 too imprecise, k≥2 too expensive)
SCC Detection

Strongly connected components (recursive call cycles) are detected using Tarjan's algorithm:

  • Collapses SCCs into "super-nodes" with unified summaries
  • Limits intra-SCC iterations to 3 to prevent infinite loops
  • Ensures termination even with complex mutual recursion
Fixpoint Computation

Capabilities propagate bottom-up (leaves → roots) using a worklist algorithm:

  1. Initialize worklist with all functions in reverse topological order
  2. For each function, compute summary from callees
  3. If summary changes, re-enqueue all callers
  4. Repeat until convergence (or max iterations)

Confidence Decay: Capabilities lose confidence as they propagate:

  • Hop 0 (direct): 1.00
  • Hop 1: 0.70
  • Hop 2: 0.55
  • Hop 3+: 0.40

Depth Limit: Stops propagating after 3 hops to maintain precision.

Persistent Caching

Function summaries are cached to disk for incremental analysis:

  • Cache Key: Function + context + direct capabilities + callee hashes + code hash
  • Format: JSON (human-readable, debuggable)
  • Location: $HOME/.cache/gorisk/summaries/<package>/<hash>.json
  • Invalidation: On file mtime change, dependency change, or version mismatch

Cache Hit Rate: Typically >90% on unchanged code.

Testing

Run the test suite:

go test ./internal/interproc/... -v

Key test cases:

  • SCC Detection: Simple cycles, multiple SCCs, self-loops, DAGs
  • Fixpoint: Linear chains, cycles, convergence, depth limits
  • Context Sensitivity: Context cloning, merging (k=0 vs k=1)

Performance

Benchmarks on the gorisk codebase (>300 functions):

  • First run: ~500ms (no cache)
  • Second run: ~50ms (90%+ cache hits)
  • Memory: ~10MB for typical projects

Comparison: Old vs New

Feature Old (3-Pass) New (Interprocedural)
Context Insensitive k=1 CFA
Cycles Convergence heuristic Explicit SCC detection
Depth Fixed 3 passes Worklist until convergence
Taint Package-level only Function-level with call stacks
Caching None Persistent JSON cache
Precision Medium High

Future Work

  • k=2 Analysis: Two-level calling context (currently limited to k=1)
  • Field-Sensitive: Track capabilities per struct field
  • Path-Sensitive: Distinguish control-flow paths within functions
  • Sanitizer Detection: Recognize validation/encoding functions
  • Binary Cache Format: Faster serialization (currently JSON)

References

Documentation

Overview

Package interproc provides interprocedural analysis capabilities for context-sensitive call graph analysis and taint tracking.

Index

Constants

This section is empty.

Variables

View Source
var (
	// Logger is the global logger for interprocedural analysis
	Logger *log.Logger

	// Verbose controls whether debug messages are printed
	Verbose bool
)

Global logger configuration

Functions

func BuildCSCallGraph

func BuildCSCallGraph(irGraph ir.IRGraph, k int) *ir.CSCallGraph

BuildCSCallGraph constructs a k-CFA call graph from an IRGraph. k=0: context-insensitive (all calls merge) k=1: caller-sensitive (distinguish by immediate caller) k>1: not yet implemented

func ClassifySummary

func ClassifySummary(summary *ir.FunctionSummary)

ClassifySummary categorizes capabilities into sources, sinks, and sanitizers.

func CollapseSCC

func CollapseSCC(scc *ir.SCC, cg *ir.CSCallGraph) ir.FunctionSummary

CollapseSCC creates a unified summary for an SCC by joining all node summaries. It limits recursion depth to 3 iterations within the SCC to prevent infinite loops.

func ComputeCodeHash added in v0.3.3

func ComputeCodeHash(dir string, files []string) string

ComputeCodeHash hashes the contents of the given files (relative to dir) to produce a stable cache key component. Files are hashed in sorted order so that adding/removing/changing any file invalidates the cache. Returns an empty string if no files can be read.

func ComputeFixpoint

func ComputeFixpoint(cg *ir.CSCallGraph, maxIterations int) error

ComputeFixpoint propagates summaries until convergence using a pending algorithm. It logs a warning if the maximum number of iterations is exceeded but does not return an error — the partial analysis remains a valid over-approximation.

func ComputeFixpointCached

func ComputeFixpointCached(cg *ir.CSCallGraph, cache *Cache, maxIterations int) error

ComputeFixpointCached is a wrapper around ComputeFixpoint that uses caching. Currently, caching is implemented but the LoadOrCompute integration is deferred.

func ComputeSCCSummary

func ComputeSCCSummary(scc *ir.SCC, cg *ir.CSCallGraph) ir.FunctionSummary

ComputeSCCSummary computes a summary for an entire SCC.

func ComputeSummary

func ComputeSummary(cg *ir.CSCallGraph, node ir.ContextNode) ir.FunctionSummary

ComputeSummary builds a summary from direct capabilities and callee summaries.

func ConsolidateIR

func ConsolidateIR(pkgCaps map[string]map[string]ir.FunctionCaps, pkgEdges map[string][]ir.CallEdge) ir.IRGraph

ConsolidateIR converts package-level IR into a unified IRGraph.

func Debugf

func Debugf(format string, args ...interface{})

Debugf prints a debug message if verbose mode is enabled

func DetectSCCs

func DetectSCCs(cg *ir.CSCallGraph)

DetectSCCs finds strongly connected components using Tarjan's algorithm. It populates cg.SCCs and cg.NodeToSCC.

func Errorf

func Errorf(format string, args ...interface{})

Errorf always prints an error message regardless of verbose mode

func GetLeaves

func GetLeaves(cg *ir.CSCallGraph) []ir.ContextNode

GetLeaves returns all leaf nodes (nodes with no callees).

func GetRoots

func GetRoots(cg *ir.CSCallGraph) []ir.ContextNode

GetRoots returns all entry point nodes (nodes with no callers).

func Infof

func Infof(format string, args ...interface{})

Infof prints an info message if verbose mode is enabled

func JoinSummaries

func JoinSummaries(a, b ir.FunctionSummary) ir.FunctionSummary

JoinSummaries merges two summaries (for SCC collapse or merging contexts).

func ReverseTopologicalSort

func ReverseTopologicalSort(cg *ir.CSCallGraph) []ir.ContextNode

ReverseTopologicalSort returns nodes in topological order (roots first). This is useful for forward dataflow analysis.

func RunAnalysis

func RunAnalysis(irGraph ir.IRGraph, opts AnalysisOptions) (*ir.CSCallGraph, []taint.TaintFinding, error)

RunAnalysis performs interprocedural analysis on an IRGraph. It returns a context-sensitive call graph with computed summaries and interprocedural taint findings.

func SetOutput

func SetOutput(w io.Writer)

SetOutput redirects logger output (useful for testing)

func SetVerbose

func SetVerbose(enabled bool)

SetVerbose enables or disables verbose logging at runtime

func SummariesEqual

func SummariesEqual(a, b ir.FunctionSummary) bool

SummariesEqual checks if two summaries are equivalent (for fixpoint convergence).

func TopologicalSort

func TopologicalSort(cg *ir.CSCallGraph) []ir.ContextNode

TopologicalSort returns nodes in reverse topological order (leaves first). This ordering ensures that we process callees before callers in the fixpoint algorithm. Nodes in cycles will be ordered arbitrarily within their SCC.

func Warnf

func Warnf(format string, args ...interface{})

Warnf prints a warning message if verbose mode is enabled

Types

type AnalysisOptions

type AnalysisOptions struct {
	ContextSensitivity int    // k for k-CFA (default: 1)
	MaxIterations      int    // Max fixpoint iterations (default: 5000)
	EnableCache        bool   // Enable persistent caching (default: true)
	CacheDir           string // Cache directory (default: $HOME/.cache/gorisk)
}

AnalysisOptions configures the interprocedural analysis.

func DefaultOptions

func DefaultOptions() AnalysisOptions

DefaultOptions returns the default analysis configuration.

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

Cache manages persistent function summary caching.

func NewCache

func NewCache(dir string) *Cache

NewCache creates a new cache manager.

func NewCacheDisabled

func NewCacheDisabled() *Cache

NewCacheDisabled creates a disabled cache (no-op).

func (*Cache) Load

func (c *Cache) Load(key CacheKey) (ir.FunctionSummary, bool)

Load retrieves a cached summary if available.

func (*Cache) Stats

func (c *Cache) Stats()

Stats logs cache hit/miss statistics.

func (*Cache) Store

func (c *Cache) Store(key CacheKey, summary ir.FunctionSummary)

Store saves a summary to the cache.

type CacheEntry

type CacheEntry struct {
	Key       CacheKey           `json:"key"`
	Summary   ir.FunctionSummary `json:"summary"`
	Timestamp time.Time          `json:"timestamp"`
	Version   string             `json:"version"` // gorisk version
}

CacheEntry stores a serialized function summary.

type CacheKey

type CacheKey struct {
	Function     ir.Symbol
	Context      ir.Context
	DirectCaps   string   // Hash of direct capabilities
	CalleeHashes []string // Hashes of callee summaries
	CodeHash     string   // File mtime or git blob hash
}

CacheKey uniquely identifies a function summary for caching.

func (CacheKey) Hash

func (k CacheKey) Hash() string

Hash returns a stable string hash of the cache key.

type ResultBundle added in v0.3.7

type ResultBundle struct {
	CallGraph         *ir.CSCallGraph
	TaintFindings     []taint.TaintFinding
	ReachabilityHints map[string]bool // package -> has reachable sink/source signal
	Diagnostics       []string
}

ResultBundle is the stable output of interprocedural analysis for command consumers.

func RunBundle added in v0.3.7

func RunBundle(irGraph ir.IRGraph, opts AnalysisOptions) (ResultBundle, error)

RunBundle executes interprocedural analysis and returns a stable result bundle.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL