builder

package
v0.0.0-...-4f1c5a8 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 16, 2025 License: AGPL-3.0 Imports: 13 Imported by: 0

Documentation

Overview

Package builder provides call graph construction orchestration.

This package ties together all components to build a complete call graph:

  • Module registry (registry package)
  • Type inference (resolution package)
  • Import resolution (resolution package)
  • Call site extraction (extraction package)
  • Advanced resolution (resolution package)
  • Pattern detection (patterns package)
  • Taint analysis (analysis/taint package)

Basic Usage

// Build from existing code graph
callGraph, err := builder.BuildCallGraph(codeGraph, moduleRegistry, projectRoot)

Call Resolution Strategy

The builder uses a multi-strategy approach to resolve function calls:

  1. Direct import resolution
  2. Method chaining with type inference
  3. Self-attribute resolution (self.attr.method)
  4. Type inference for variable.method() calls
  5. ORM pattern detection (Django, SQLAlchemy)
  6. Framework detection (known external frameworks)
  7. Standard library resolution via remote CDN

Each strategy is tried in order until one succeeds.
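
The "try each strategy until one succeeds" behaviour can be illustrated with a minimal sketch. The resolver type and both helper functions below are hypothetical and are not part of this package's API; they only show the control flow of an ordered fallback chain.

```go
package main

// resolver is a hypothetical strategy signature: it returns the resolved
// fully qualified name and whether resolution succeeded.
type resolver func(target string) (string, bool)

// resolveInOrder tries each strategy in order and stops at the first success.
// If none succeeds, it returns the original target and false.
func resolveInOrder(target string, strategies []resolver) (string, bool) {
	for _, try := range strategies {
		if fqn, ok := try(target); ok {
			return fqn, true
		}
	}
	return target, false
}

// fromImports builds a strategy that resolves names through a simple
// import table (local name -> fully qualified name).
func fromImports(imports map[string]string) resolver {
	return func(target string) (string, bool) {
		fqn, ok := imports[target]
		return fqn, ok
	}
}
```

Ordering the slice from cheapest to most expensive strategy keeps the common case (a direct import) fast, while later entries act as fallbacks.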

Multi-Pass Architecture

The builder performs multiple passes over the codebase:

Pass 1: Index all function definitions
Pass 2: Extract return types from all functions
Pass 3: Extract variable assignments and type bindings
Pass 4: Extract class attributes
Pass 5: Resolve call sites and build call graph edges
Pass 6: Generate taint summaries for security analysis

This multi-pass approach ensures that all necessary type information is collected before attempting to resolve call sites.
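
To see why the ordering matters, consider a simplified sketch with only two passes. The funcDef type and both functions are assumptions made for illustration; the real builder works on the code graph and a type inference engine rather than on these structures.

```go
package main

// funcDef is a hypothetical, simplified function record.
type funcDef struct {
	FQN        string
	ReturnType string   // e.g. "myapp.models.User"; empty if unknown
	Calls      []string // fully qualified names of callees
}

// collectReturnTypes plays the role of an early pass: it records the
// return type of every function before any call site is examined.
func collectReturnTypes(defs []funcDef) map[string]string {
	types := make(map[string]string)
	for _, d := range defs {
		if d.ReturnType != "" {
			types[d.FQN] = d.ReturnType
		}
	}
	return types
}

// calleeTypes plays the role of a later pass: because all return types
// are already indexed, lookups succeed regardless of definition order.
// Unknown callees yield an empty string.
func calleeTypes(d funcDef, types map[string]string) []string {
	out := make([]string, 0, len(d.Calls))
	for _, callee := range d.Calls {
		out = append(out, types[callee])
	}
	return out
}
```

In a single-pass design, a caller defined before its callee would be processed before the callee's return type had been recorded.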

Caching

The builder uses ImportMapCache, keyed by file path, to avoid re-parsing imports from the same file multiple times.

Thread Safety

All exported functions in this package are thread-safe. The ImportMapCache uses a read-write mutex to allow concurrent reads while ensuring safe writes.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildCallGraph

func BuildCallGraph(codeGraph *graph.CodeGraph, registry *core.ModuleRegistry, projectRoot string) (*core.CallGraph, error)

BuildCallGraph constructs the complete call graph for a Python project. This is Pass 3 of the 3-pass algorithm:

  • Pass 1: BuildModuleRegistry - map files to modules
  • Pass 2: ExtractImports + ExtractCallSites - parse imports and calls
  • Pass 3: BuildCallGraph - resolve calls and build graph

Algorithm:

  1. For each Python file in the project:
     a. Extract imports to build ImportMap
     b. Extract call sites from AST
     c. Extract function definitions from main graph
  2. For each call site:
     a. Resolve target name using ImportMap
     b. Find target function definition in registry
     c. Add edge from caller to callee
     d. Store detailed call site information

Parameters:

  • codeGraph: the existing code graph with parsed AST nodes
  • registry: module registry mapping files to modules
  • projectRoot: absolute path to project root

Returns:

  • CallGraph: complete call graph with edges and call sites
  • error: if any step fails

Example:

Given:
  File: myapp/views.py
    def get_user():
        sanitize(data)  # call to myapp.utils.sanitize

Creates:
  edges: {"myapp.views.get_user": ["myapp.utils.sanitize"]}
  reverseEdges: {"myapp.utils.sanitize": ["myapp.views.get_user"]}
  callSites: {"myapp.views.get_user": [CallSite{Target: "sanitize", ...}]}
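
The forward and reverse maps in this example can be kept in sync with a small helper. The sketch below uses a hypothetical callGraph type that borrows only the edges and reverseEdges names from the example above; it is not the actual core.CallGraph definition.

```go
package main

// callGraph is a hypothetical stand-in that stores only the two edge maps
// shown in the example above.
type callGraph struct {
	edges        map[string][]string // caller FQN -> callee FQNs
	reverseEdges map[string][]string // callee FQN -> caller FQNs
}

// newCallGraph returns an empty graph with both maps initialised.
func newCallGraph() *callGraph {
	return &callGraph{
		edges:        make(map[string][]string),
		reverseEdges: make(map[string][]string),
	}
}

// addEdge records the call in both directions so that "who does X call"
// and "who calls X" are each a single map lookup.
func (g *callGraph) addEdge(caller, callee string) {
	g.edges[caller] = append(g.edges[caller], callee)
	g.reverseEdges[callee] = append(g.reverseEdges[callee], caller)
}
```

Calling addEdge("myapp.views.get_user", "myapp.utils.sanitize") on an empty graph produces the edges and reverseEdges entries listed in the example.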

func BuildCallGraphFromPath

func BuildCallGraphFromPath(codeGraph *graph.CodeGraph, projectPath string) (*core.CallGraph, *core.ModuleRegistry, error)

BuildCallGraphFromPath is a convenience function that builds a call graph from a project directory path.

It performs all three passes:

  1. Build module registry
  2. Parse code graph (uses existing parsed graph)
  3. Build call graph

Parameters:

  • codeGraph: the parsed code graph from graph.Initialize()
  • projectPath: absolute path to project root

Returns:

  • CallGraph: complete call graph with edges and call sites
  • ModuleRegistry: module path mappings
  • error: if any step fails

func DetectPythonVersion

func DetectPythonVersion(projectPath string) string

DetectPythonVersion infers Python version from project files. It checks in order:

  1. .python-version file
  2. pyproject.toml [tool.poetry.dependencies] or [project] requires-python
  3. Defaults to "3.14"

Parameters:

  • projectPath: absolute path to the project root

Returns:

  • Python version string (e.g., "3.14", "3.11", "3.9")
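
A minimal sketch of the first and last steps follows (the pyproject.toml step is omitted). The function names are hypothetical, and trimming a pinned version such as "3.11.4" to "3.11" is an assumption based on the return examples above, not confirmed behaviour.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// defaultVersion mirrors the documented fallback value.
const defaultVersion = "3.14"

// majorMinor trims whitespace and keeps only "major.minor".
// Assumption: patch components such as ".4" are dropped.
func majorMinor(v string) string {
	v = strings.TrimSpace(v)
	parts := strings.Split(v, ".")
	if len(parts) >= 2 {
		return parts[0] + "." + parts[1]
	}
	return v
}

// detectVersion reads <projectPath>/.python-version and falls back to
// defaultVersion if the file is missing, unreadable, or empty.
func detectVersion(projectPath string) string {
	data, err := os.ReadFile(filepath.Join(projectPath, ".python-version"))
	if err == nil {
		if v := majorMinor(string(data)); v != "" {
			return v
		}
	}
	return defaultVersion
}
```

Because the function returns only a string, a caller cannot distinguish a detected version from the fallback.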

func FindContainingFunction

func FindContainingFunction(location core.Location, functions []*graph.Node, modulePath string) string

FindContainingFunction finds the function that contains a given call site location. Uses line numbers to determine which function a call belongs to.

Algorithm:

  1. Iterate through all functions in the file
  2. Find function with the highest line number that's still <= call line
  3. Return the FQN of that function

Parameters:

  • location: source location of the call site
  • functions: all function definitions in the file
  • modulePath: module path of the file

Returns:

  • Fully qualified name of the containing function, or an empty string if not found
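
The line-number heuristic described in the algorithm can be sketched as follows. The fn type is a hypothetical simplification of graph.Node, and joining the module path and function name with a dot is an assumption about how the FQN is formed.

```go
package main

// fn is a hypothetical, reduced view of a function definition node.
type fn struct {
	Name string
	Line uint32 // line on which the definition starts
}

// containingFunction returns the FQN of the function with the highest
// start line that is still <= callLine, or "" if no function qualifies.
func containingFunction(callLine uint32, functions []fn, modulePath string) string {
	best := -1
	for i, f := range functions {
		if f.Line <= callLine && (best == -1 || f.Line > functions[best].Line) {
			best = i
		}
	}
	if best == -1 {
		return ""
	}
	return modulePath + "." + functions[best].Name
}
```

Since only start lines are compared in this sketch, a module-level call placed after the last function would be attributed to that function.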

func FindFunctionAtLine

func FindFunctionAtLine(root *sitter.Node, lineNumber uint32) *sitter.Node

FindFunctionAtLine searches for a function definition at the specified line number. Returns the tree-sitter node for the function, or nil if not found.

This function recursively traverses the AST tree to find a function or method definition node at the given line number.

Parameters:

  • root: the root tree-sitter node to search from
  • lineNumber: the line number to search for (1-indexed)

Returns:

  • tree-sitter node for the function definition, or nil if not found
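
The recursive search can be sketched on a hypothetical node type that stands in for *sitter.Node. The sketch assumes tree-sitter's zero-based row numbering, which is why one is added before comparing with the 1-indexed lineNumber, and it matches only the "function_definition" node kind used by the Python grammar.

```go
package main

// node is a hypothetical stand-in for a tree-sitter node.
type node struct {
	Kind     string
	StartRow uint32 // zero-based row, as in tree-sitter points
	Children []*node
}

// findFunctionAtLine performs a depth-first search for a function
// definition that starts on the given 1-indexed line.
func findFunctionAtLine(n *node, line uint32) *node {
	if n == nil {
		return nil
	}
	if n.Kind == "function_definition" && n.StartRow+1 == line {
		return n
	}
	for _, child := range n.Children {
		if found := findFunctionAtLine(child, line); found != nil {
			return found
		}
	}
	return nil
}
```

Methods are found the same way, because a method is a function definition nested inside a class node.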

func GenerateTaintSummaries

func GenerateTaintSummaries(callGraph *core.CallGraph, codeGraph *graph.CodeGraph, registry *core.ModuleRegistry)

GenerateTaintSummaries analyzes all Python functions for taint flows. This is the taint-summary pass of the call graph building process (listed as Pass 6 in the package overview).

For each function:

  1. Extract statements from AST
  2. Build def-use chains
  3. Analyze intra-procedural taint
  4. Store TaintSummary in callGraph.Summaries

Parameters:

  • callGraph: the call graph being built (will be populated with summaries)
  • codeGraph: the parsed AST nodes (currently unused, reserved for future use)
  • registry: module registry (currently unused, reserved for future use)

func GetFunctionsInFile

func GetFunctionsInFile(codeGraph *graph.CodeGraph, filePath string) []*graph.Node

GetFunctionsInFile returns all function definitions in a specific file.

Parameters:

  • codeGraph: the parsed code graph
  • filePath: absolute path to the file

Returns:

  • List of function/method nodes in the file, sorted by line number

func IndexFunctions

func IndexFunctions(codeGraph *graph.CodeGraph, callGraph *core.CallGraph, registry *core.ModuleRegistry)

IndexFunctions builds the Functions map in the call graph. Extracts all function definitions from the code graph and maps them by FQN.

Parameters:

  • codeGraph: the parsed code graph
  • callGraph: the call graph being built
  • registry: module registry for resolving file paths to modules

func ReadFileBytes

func ReadFileBytes(filePath string) ([]byte, error)

ReadFileBytes reads a file and returns its contents as a byte slice. Helper function for reading source code.

Parameters:

  • filePath: path to the file (can be relative or absolute)

Returns:

  • File contents as byte slice
  • error if file cannot be read

func ResolveCallTarget

func ResolveCallTarget(target string, importMap *core.ImportMap, registry *core.ModuleRegistry, currentModule string, codeGraph *graph.CodeGraph, typeEngine *resolution.TypeInferenceEngine, callerFQN string, callGraph *core.CallGraph) (string, bool, *core.TypeInfo)

ResolveCallTarget resolves a call target name to a fully qualified name. This is the core resolution logic that handles:

  • Direct function calls: sanitize() → myapp.utils.sanitize
  • Method calls: obj.method() → (unresolved, needs type inference)
  • Imported functions: from utils import sanitize; sanitize() → myapp.utils.sanitize
  • Qualified calls: utils.sanitize() → myapp.utils.sanitize

Algorithm:

  1. Check if target is a simple name (no dots)
     a. Look up in import map
     b. If found, return FQN from import
     c. If not found, try to find in same module
  2. If target has dots (qualified name)
     a. Split into base and rest
     b. Resolve base using import map
     c. Append rest to get full FQN
  3. If all else fails, check if it exists in the registry

Parameters:

  • target: the call target name (e.g., "sanitize", "utils.sanitize", "obj.method")
  • importMap: import mappings for the current file
  • registry: module registry for validation
  • currentModule: the module containing this call
  • codeGraph: the parsed code graph for validation
  • typeEngine: type inference engine
  • callerFQN: fully qualified name of the calling function
  • callGraph: the call graph being built

Returns:

  • Fully qualified name of the target
  • Boolean indicating if resolution was successful
  • TypeInfo if resolved via type inference

Examples:

target="sanitize", imports={"sanitize": "myapp.utils.sanitize"}
  → "myapp.utils.sanitize", true, nil

target="utils.sanitize", imports={"utils": "myapp.utils"}
  → "myapp.utils.sanitize", true, nil

target="obj.method", imports={}
  → "obj.method", false, nil  (needs type inference)
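
The three examples above can be reproduced with a reduced sketch of the algorithm. It replaces core.ImportMap with a plain map and the registry with a set of known FQNs (both assumptions) and leaves out type inference entirely, so it returns only the name and the success flag.

```go
package main

import "strings"

// resolveTarget is a hypothetical, simplified resolver.
// imports maps local names to fully qualified names; known is a set of
// FQNs that stands in for the module registry.
func resolveTarget(target string, imports map[string]string, currentModule string, known map[string]bool) (string, bool) {
	base, rest, dotted := strings.Cut(target, ".")
	if !dotted {
		// Step 1: simple name. Try the import map, then the same module.
		if fqn, ok := imports[target]; ok {
			return fqn, true
		}
		if local := currentModule + "." + target; known[local] {
			return local, true
		}
		return target, false
	}
	// Step 2: qualified name. Resolve the base, then append the rest.
	if mod, ok := imports[base]; ok {
		return mod + "." + rest, true
	}
	// Step 3: last resort, accept the target if it is already a known FQN.
	if known[target] {
		return target, true
	}
	return target, false
}
```

A call such as obj.method stays unresolved here because obj is a variable rather than an imported name, which is the case the type inference engine is meant to handle.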

func ValidateFQN

func ValidateFQN(fqn string, registry *core.ModuleRegistry) bool

ValidateFQN checks if a fully qualified name exists in the registry. Handles both module names and function names within modules.

Examples:

"myapp.utils" - checks if module exists
"myapp.utils.sanitize" - checks if module "myapp.utils" exists

Parameters:

  • fqn: fully qualified name to validate
  • registry: module registry

Returns:

  • true if FQN is valid (module or function in existing module)
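
Both cases in the examples reduce to "the FQN is a module, or its parent is a module". A minimal sketch, using a plain set of module names in place of core.ModuleRegistry (an assumption made for illustration):

```go
package main

import "strings"

// validFQN reports whether fqn names a known module, or a member whose
// parent (everything before the last dot) is a known module.
func validFQN(fqn string, modules map[string]bool) bool {
	if modules[fqn] {
		return true
	}
	i := strings.LastIndex(fqn, ".")
	if i < 0 {
		return false
	}
	return modules[fqn[:i]]
}
```

As in the second example, this check confirms only that the parent module exists; it does not verify that the function itself is defined there.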

func ValidateStdlibFQN

func ValidateStdlibFQN(fqn string, remoteLoader *cgregistry.StdlibRegistryRemote) bool

ValidateStdlibFQN checks if a fully qualified name is a stdlib function. Supports module.function, module.submodule.function, and module.Class patterns. Handles platform-specific module aliases (e.g., os.path -> posixpath). Uses lazy loading via remote registry to download modules on-demand.

Examples:

"os.getcwd" - returns true if os.getcwd exists in stdlib
"os.path.join" - returns true if posixpath.join exists in stdlib (alias resolution)
"json.dumps" - returns true if json.dumps exists in stdlib

Parameters:

  • fqn: fully qualified name to check
  • remoteLoader: remote stdlib registry loader

Returns:

  • true if FQN is a stdlib function or class
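
The alias step can be pictured as a prefix rewrite applied before the stdlib lookup. The alias table and function below are hypothetical; only the os.path to posixpath mapping comes from the description above, and the remote loading itself is not shown.

```go
package main

import "strings"

// aliases is a hypothetical alias table with the single mapping
// mentioned in the documentation.
var aliases = map[string]string{"os.path": "posixpath"}

// canonicalFQN rewrites an aliased module prefix to its real module name
// and returns other names unchanged.
func canonicalFQN(fqn string) string {
	for alias, real := range aliases {
		if fqn == alias {
			return real
		}
		if strings.HasPrefix(fqn, alias+".") {
			return real + strings.TrimPrefix(fqn, alias)
		}
	}
	return fqn
}
```

Matching on alias+"." rather than on the bare alias avoids rewriting unrelated names that merely share a textual prefix.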

Types

type ImportMapCache

type ImportMapCache struct {
	// contains filtered or unexported fields
}

ImportMapCache provides thread-safe caching of ImportMap instances. It prevents redundant import extraction by caching results keyed by file path.

Thread-safety:

  • All methods are safe for concurrent use
  • Uses RWMutex for optimized read-heavy workloads
  • GetOrExtract uses the double-checked locking pattern

func NewImportMapCache

func NewImportMapCache() *ImportMapCache

NewImportMapCache creates a new empty import map cache.

func (*ImportMapCache) Get

func (c *ImportMapCache) Get(filePath string) (*core.ImportMap, bool)

Get retrieves an ImportMap from the cache if it exists.

Parameters:

  • filePath: absolute path to the Python file

Returns:

  • ImportMap and true if found in cache, nil and false otherwise

func (*ImportMapCache) GetOrExtract

func (c *ImportMapCache) GetOrExtract(filePath string, sourceCode []byte, registry *core.ModuleRegistry) (*core.ImportMap, error)

GetOrExtract retrieves an ImportMap from cache or extracts it if not cached. This is the main entry point for using the cache.

Parameters:

  • filePath: absolute path to the Python file
  • sourceCode: file contents (only used if extraction needed)
  • registry: module registry for resolving imports

Returns:

  • ImportMap from cache or newly extracted
  • error if extraction fails (cache misses only)

Thread-safety:

  • Multiple goroutines can safely call GetOrExtract concurrently
  • First caller for a file will extract and cache
  • Subsequent callers will get cached result
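
Double-checked locking with a read-write mutex can be sketched as follows. The cache and importMap types are hypothetical, and extraction is passed in as a callback instead of being derived from sourceCode and registry; the actual implementation may differ.

```go
package main

import "sync"

// importMap is a hypothetical stand-in for core.ImportMap.
type importMap map[string]string

// cache is a hypothetical reduction of ImportMapCache.
type cache struct {
	mu    sync.RWMutex
	items map[string]importMap
}

func newCache() *cache {
	return &cache{items: make(map[string]importMap)}
}

// getOrExtract returns the cached entry for path, or runs extract once
// and caches its result. Errors are not cached.
func (c *cache) getOrExtract(path string, extract func() (importMap, error)) (importMap, error) {
	// First check: shared read lock, so concurrent hits do not block.
	c.mu.RLock()
	m, ok := c.items[path]
	c.mu.RUnlock()
	if ok {
		return m, nil
	}

	// Second check: another goroutine may have filled the entry while
	// this one was waiting for the write lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.items[path]; ok {
		return m, nil
	}
	m, err := extract()
	if err != nil {
		return nil, err
	}
	c.items[path] = m
	return m, nil
}
```

In this sketch the extraction runs while the write lock is held, which guarantees a single extraction per file at the cost of serialising cache misses.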

func (*ImportMapCache) Put

func (c *ImportMapCache) Put(filePath string, importMap *core.ImportMap)

Put stores an ImportMap in the cache.

Parameters:

  • filePath: absolute path to the Python file
  • importMap: the extracted ImportMap to cache
