package builder

Version: v0.0.0-...-6bba200 | Published: Nov 16, 2025 | License: AGPL-3.0 | Imports: 13 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/shivasurya/code-pathfinder

Documentation

Overview

Package builder provides call graph construction orchestration.

This package ties together all components to build a complete call graph:

Module registry (registry package)
Type inference (resolution package)
Import resolution (resolution package)
Call site extraction (extraction package)
Advanced resolution (resolution package)
Pattern detection (patterns package)
Taint analysis (analysis/taint package)

Basic Usage

// Build from existing code graph
callGraph, err := builder.BuildCallGraph(codeGraph, moduleRegistry, projectRoot)

Call Resolution Strategy

The builder uses a multi-strategy approach to resolve function calls:

Direct import resolution
Method chaining with type inference
Self-attribute resolution (self.attr.method)
Type inference for variable.method() calls
ORM pattern detection (Django, SQLAlchemy)
Framework detection (known external frameworks)
Standard library resolution via remote CDN

Each strategy is tried in order until one succeeds.
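The ordered fallback can be pictured as a chain of resolver functions. This is a minimal sketch under stated assumptions: the `resolver` type, `resolveWithStrategies`, and the two toy strategies are hypothetical stand-ins for the strategies listed above and are not part of this package's API.

```go
package main

import "fmt"

// resolver is a hypothetical signature for one resolution strategy: it returns
// the resolved FQN and true on success, or false to defer to the next strategy.
type resolver func(target string) (string, bool)

// resolveWithStrategies tries each strategy in order and stops at the first success.
func resolveWithStrategies(target string, strategies []resolver) (string, bool) {
	for _, try := range strategies {
		if fqn, ok := try(target); ok {
			return fqn, true
		}
	}
	return target, false // unresolved: keep the raw name
}

func main() {
	imports := map[string]string{"sanitize": "myapp.utils.sanitize"}
	stdlib := map[string]bool{"os.getcwd": true}

	strategies := []resolver{
		// Toy stand-in for direct import resolution.
		func(t string) (string, bool) {
			fqn, ok := imports[t]
			return fqn, ok
		},
		// Toy stand-in for standard library resolution.
		func(t string) (string, bool) {
			return t, stdlib[t]
		},
	}

	for _, target := range []string{"sanitize", "os.getcwd", "obj.method"} {
		fqn, ok := resolveWithStrategies(target, strategies)
		fmt.Println(target, "->", fqn, ok)
	}
}
```

Keeping every strategy behind one signature makes the order explicit and lets a strategy be added or reordered without touching the others.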

Multi-Pass Architecture

The builder performs multiple passes over the codebase:

Pass 1: Index all function definitions
Pass 2: Extract return types from all functions
Pass 3: Extract variable assignments and type bindings
Pass 4: Extract class attributes
Pass 5: Resolve call sites and build call graph edges
Pass 6: Generate taint summaries for security analysis

This multi-pass approach ensures that all necessary type information is collected before attempting to resolve call sites.
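To see why definitions are indexed before call sites are resolved, consider a function that calls another one defined further down the same file. The sketch below reduces the six passes to two (it omits return types, assignments, and class attributes); `callSite` and `buildEdges` are hypothetical names, not this package's types.

```go
package main

import "fmt"

// callSite is a hypothetical, simplified call record (not core.CallSite).
type callSite struct {
	Caller string // FQN of the calling function
	Target string // raw name as written at the call site
}

// buildEdges runs two passes over one module. Pass 1 indexes every function
// definition; pass 2 resolves calls against the finished index, so a call to
// a function defined later in the file still resolves.
func buildEdges(module string, defs []string, calls []callSite) map[string][]string {
	// Pass 1: index all function definitions by FQN.
	index := make(map[string]bool, len(defs))
	for _, fqn := range defs {
		index[fqn] = true
	}

	// Pass 2: resolve call sites and add edges.
	edges := make(map[string][]string)
	for _, c := range calls {
		if fqn := module + "." + c.Target; index[fqn] {
			edges[c.Caller] = append(edges[c.Caller], fqn)
		}
	}
	return edges
}

func main() {
	// app.main is defined first and calls app.helper, which is defined later.
	defs := []string{"app.main", "app.helper"}
	calls := []callSite{{Caller: "app.main", Target: "helper"}}
	fmt.Println(buildEdges("app", defs, calls))
}
```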

Caching

The builder uses ImportMapCache so that the imports of each file are extracted once and then reused by every later lookup for that file, instead of being re-parsed each time.

Thread Safety

All exported functions in this package are thread-safe. The ImportMapCache uses a read-write mutex to allow concurrent reads while ensuring safe writes.

Index

func BuildCallGraph(codeGraph *graph.CodeGraph, registry *core.ModuleRegistry, projectRoot string) (*core.CallGraph, error)
func BuildCallGraphFromPath(codeGraph *graph.CodeGraph, projectPath string) (*core.CallGraph, *core.ModuleRegistry, error)
func DetectPythonVersion(projectPath string) string
func FindContainingFunction(location core.Location, functions []*graph.Node, modulePath string) string
func FindFunctionAtLine(root *sitter.Node, lineNumber uint32) *sitter.Node
func GenerateTaintSummaries(callGraph *core.CallGraph, codeGraph *graph.CodeGraph, ...)
func GetFunctionsInFile(codeGraph *graph.CodeGraph, filePath string) []*graph.Node
func IndexFunctions(codeGraph *graph.CodeGraph, callGraph *core.CallGraph, ...)
func ReadFileBytes(filePath string) ([]byte, error)
func ResolveCallTarget(target string, importMap *core.ImportMap, registry *core.ModuleRegistry, ...) (string, bool, *core.TypeInfo)
func ValidateFQN(fqn string, registry *core.ModuleRegistry) bool
func ValidateStdlibFQN(fqn string, remoteLoader *cgregistry.StdlibRegistryRemote) bool
type ImportMapCache
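- func NewImportMapCache() *ImportMapCache
- func (c *ImportMapCache) Get(filePath string) (*core.ImportMap, bool)
- func (c *ImportMapCache) GetOrExtract(filePath string, sourceCode []byte, registry *core.ModuleRegistry) (*core.ImportMap, error)
- func (c *ImportMapCache) Put(filePath string, importMap *core.ImportMap)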

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildCallGraph

func BuildCallGraph(codeGraph *graph.CodeGraph, registry *core.ModuleRegistry, projectRoot string) (*core.CallGraph, error)

BuildCallGraph constructs the complete call graph for a Python project. This is Pass 3 of a top-level 3-pass pipeline (a coarser grouping than the passes listed under Multi-Pass Architecture):

Pass 1: BuildModuleRegistry - map files to modules
Pass 2: ExtractImports + ExtractCallSites - parse imports and calls
Pass 3: BuildCallGraph - resolve calls and build graph

Algorithm:

1. For each Python file in the project:
   a. Extract imports to build ImportMap
   b. Extract call sites from AST
   c. Extract function definitions from main graph
2. For each call site:
   a. Resolve target name using ImportMap
   b. Find target function definition in registry
   c. Add edge from caller to callee
   d. Store detailed call site information

Parameters:

codeGraph: the existing code graph with parsed AST nodes
registry: module registry mapping files to modules
projectRoot: absolute path to project root

Returns:

CallGraph: complete call graph with edges and call sites
error: if any step fails

Example:

Given:
  File: myapp/views.py
    from myapp.utils import sanitize

    def get_user(data):
        sanitize(data)  # call to myapp.utils.sanitize

Creates:
  edges: {"myapp.views.get_user": ["myapp.utils.sanitize"]}
  reverseEdges: {"myapp.utils.sanitize": ["myapp.views.get_user"]}
  callSites: {"myapp.views.get_user": [CallSite{Target: "sanitize", ...}]}
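The edges/reverseEdges pair in this example is a forward and a backward adjacency list updated together. A minimal sketch, assuming plain string maps rather than the real core.CallGraph; `callGraph`, `addCall`, and the choice to skip duplicate edges while still recording every call site are assumptions of this sketch.

```go
package main

import "fmt"

// callGraph is a simplified, hypothetical mirror of the fields shown above.
type callGraph struct {
	Edges        map[string][]string // caller FQN -> callee FQNs
	ReverseEdges map[string][]string // callee FQN -> caller FQNs
	CallSites    map[string][]string // caller FQN -> raw call targets
}

func newCallGraph() *callGraph {
	return &callGraph{
		Edges:        map[string][]string{},
		ReverseEdges: map[string][]string{},
		CallSites:    map[string][]string{},
	}
}

// addCall records one resolved call. Every call site is kept, but an edge
// that already exists is not added a second time in either direction.
func (g *callGraph) addCall(caller, callee, rawTarget string) {
	g.CallSites[caller] = append(g.CallSites[caller], rawTarget)
	for _, existing := range g.Edges[caller] {
		if existing == callee {
			return
		}
	}
	g.Edges[caller] = append(g.Edges[caller], callee)
	g.ReverseEdges[callee] = append(g.ReverseEdges[callee], caller)
}

func main() {
	g := newCallGraph()
	g.addCall("myapp.views.get_user", "myapp.utils.sanitize", "sanitize")
	fmt.Println("edges:", g.Edges)
	fmt.Println("reverseEdges:", g.ReverseEdges)
}
```

Updating reverseEdges at insertion time turns "who calls X?" into a single map lookup, which is convenient for backward queries such as tracing from a sink to its callers.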

func BuildCallGraphFromPath

func BuildCallGraphFromPath(codeGraph *graph.CodeGraph, projectPath string) (*core.CallGraph, *core.ModuleRegistry, error)

BuildCallGraphFromPath is a convenience function that builds a call graph from a project directory path.

It performs all three passes:

Build module registry
Parse code graph (uses existing parsed graph)
Build call graph

Parameters:

codeGraph: the parsed code graph from graph.Initialize()
projectPath: absolute path to project root

Returns:

CallGraph: complete call graph with edges and call sites
ModuleRegistry: module path mappings
error: if any step fails

func DetectPythonVersion

func DetectPythonVersion(projectPath string) string

DetectPythonVersion infers the Python version from project files. It checks, in order:

.python-version file
pyproject.toml [tool.poetry.dependencies] or [project] requires-python
Defaults to "3.14"

Parameters:

projectPath: absolute path to the project root

Returns:

Python version string (e.g., "3.14", "3.11", "3.9")

func FindContainingFunction

func FindContainingFunction(location core.Location, functions []*graph.Node, modulePath string) string

FindContainingFunction finds the function that contains a given call site location. Uses line numbers to determine which function a call belongs to.

Algorithm:

Iterate through all functions in the file
Find the function with the highest start line that is still <= the call's line
Return the FQN of that function

Parameters:

location: source location of the call site
functions: all function definitions in the file
modulePath: module path of the file

Returns:

Fully qualified name of the containing function, or empty if not found
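A sketch of the line-based rule, assuming a hypothetical `funcInfo` type with 1-indexed start lines and FQNs formed as modulePath + "." + name (class qualification of methods is not modeled).

```go
package main

import "fmt"

// funcInfo is a hypothetical stand-in for a function node.
type funcInfo struct {
	Name      string
	StartLine int // 1-indexed
}

// findContainingFunction picks the function with the greatest start line
// that is still <= callLine. The input does not need to be sorted.
func findContainingFunction(callLine int, functions []funcInfo, modulePath string) string {
	best := -1
	for i, f := range functions {
		if f.StartLine <= callLine && (best == -1 || f.StartLine > functions[best].StartLine) {
			best = i
		}
	}
	if best == -1 {
		return "" // call precedes every function, e.g. module-level code
	}
	return modulePath + "." + functions[best].Name
}

func main() {
	funcs := []funcInfo{{"get_user", 3}, {"delete_user", 12}}
	fmt.Println(findContainingFunction(5, funcs, "myapp.views"))
}
```

Because only start lines are compared, a module-level call placed after a function body would be attributed to the preceding function; also comparing end lines would avoid that, if the nodes carry them.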

func FindFunctionAtLine

func FindFunctionAtLine(root *sitter.Node, lineNumber uint32) *sitter.Node

FindFunctionAtLine searches for a function definition at the specified line number. Returns the tree-sitter node for the function, or nil if not found.

This function recursively traverses the AST to find a function or method definition node at the given line number.

Parameters:

root: the root tree-sitter node to search from
lineNumber: the line number to search for (1-indexed)

Returns:

tree-sitter node for the function definition, or nil if not found
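The recursive search can be sketched against a minimal node type. The `node` struct is a hypothetical stand-in for *sitter.Node; the real detail it mirrors is that tree-sitter reports 0-indexed rows, so the 1-indexed lineNumber is compared against StartRow+1. "function_definition" is the tree-sitter-python node type used for both functions and methods.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for *sitter.Node.
type node struct {
	Type     string
	StartRow uint32 // 0-indexed, as in tree-sitter
	Children []*node
}

// findFunctionAtLine does a depth-first search for a function definition
// whose first line equals lineNumber (1-indexed).
func findFunctionAtLine(root *node, lineNumber uint32) *node {
	if root == nil || lineNumber == 0 {
		return nil
	}
	if root.Type == "function_definition" && root.StartRow+1 == lineNumber {
		return root
	}
	for _, child := range root.Children {
		if found := findFunctionAtLine(child, lineNumber); found != nil {
			return found
		}
	}
	return nil
}

func main() {
	root := &node{Type: "module", Children: []*node{{Type: "function_definition", StartRow: 2}}}
	fmt.Println(findFunctionAtLine(root, 3) != nil)
}
```

In tree-sitter-python a decorated function is wrapped in a `decorated_definition` node whose inner `function_definition` starts on the `def` line, so this sketch matches the `def` line rather than the decorator line.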

func GenerateTaintSummaries

func GenerateTaintSummaries(callGraph *core.CallGraph, codeGraph *graph.CodeGraph, registry *core.ModuleRegistry)

GenerateTaintSummaries analyzes all Python functions for taint flows. This is the taint-summary pass of call graph construction (listed as Pass 6 in the package overview).

For each function:

Extract statements from AST
Build def-use chains
Analyze intra-procedural taint
Store TaintSummary in callGraph.Summaries

Parameters:

callGraph: the call graph being built (will be populated with summaries)
codeGraph: the parsed AST nodes (currently unused, reserved for future use)
registry: module registry (currently unused, reserved for future use)
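The per-function analysis can be illustrated with a straight-line taint propagator. This is a simplified sketch, not the package's algorithm: it walks statements in order instead of building explicit def-use chains (equivalent only for code without branches or loops), and `stmt`, `taintSummary`, `analyzeFunction`, and the source and sink names are hypothetical.

```go
package main

import "fmt"

// stmt is a hypothetical, flattened statement: Def is the variable it assigns
// ("" if none), Uses the variables it reads, and Call the function it calls.
type stmt struct {
	Def  string
	Uses []string
	Call string
}

// taintSummary is a reduced, hypothetical summary: sink calls reached by tainted data.
type taintSummary struct {
	TaintedSinks []string
}

// analyzeFunction propagates taint forward. A variable becomes tainted when
// assigned from a source call or a tainted variable, and clean again when
// reassigned from clean data.
func analyzeFunction(body []stmt, sources, sinks map[string]bool) taintSummary {
	tainted := map[string]bool{}
	var summary taintSummary
	for _, s := range body {
		usesTaint := false
		for _, u := range s.Uses {
			if tainted[u] {
				usesTaint = true
				break
			}
		}
		if sinks[s.Call] && usesTaint {
			summary.TaintedSinks = append(summary.TaintedSinks, s.Call)
		}
		if s.Def != "" {
			tainted[s.Def] = sources[s.Call] || usesTaint
		}
	}
	return summary
}

func main() {
	// q = request.args.get("q"); cmd = "ls " + q; os.system(cmd)
	body := []stmt{
		{Def: "q", Call: "request.args.get"},
		{Def: "cmd", Uses: []string{"q"}},
		{Uses: []string{"cmd"}, Call: "os.system"},
	}
	sources := map[string]bool{"request.args.get": true}
	sinks := map[string]bool{"os.system": true}
	fmt.Println(analyzeFunction(body, sources, sinks).TaintedSinks)
}
```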

func GetFunctionsInFile

func GetFunctionsInFile(codeGraph *graph.CodeGraph, filePath string) []*graph.Node

GetFunctionsInFile returns all function definitions in a specific file.

Parameters:

codeGraph: the parsed code graph
filePath: absolute path to the file

Returns:

List of function/method nodes in the file, sorted by line number
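A sketch of the filter-and-sort step, assuming nodes are held in a map keyed by ID (as a stand-in for graph.CodeGraph) and that function nodes carry a type tag; `graphNode`, `getFunctionsInFile`, and the "function_definition" tag are hypothetical. Because Go randomizes map iteration order, the sort is what makes the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// graphNode is a hypothetical stand-in for *graph.Node.
type graphNode struct {
	Type string // node kind, e.g. "function_definition"
	Name string
	File string
	Line uint32
}

// getFunctionsInFile keeps the function nodes of one file, ordered by line.
func getFunctionsInFile(nodes map[string]*graphNode, filePath string) []*graphNode {
	var out []*graphNode
	for _, n := range nodes {
		if n.File == filePath && n.Type == "function_definition" {
			out = append(out, n)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Line < out[j].Line })
	return out
}

func main() {
	nodes := map[string]*graphNode{
		"n1": {Type: "function_definition", Name: "delete_user", File: "views.py", Line: 12},
		"n2": {Type: "function_definition", Name: "get_user", File: "views.py", Line: 3},
	}
	for _, f := range getFunctionsInFile(nodes, "views.py") {
		fmt.Println(f.Line, f.Name)
	}
}
```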

func IndexFunctions

func IndexFunctions(codeGraph *graph.CodeGraph, callGraph *core.CallGraph, registry *core.ModuleRegistry)

IndexFunctions builds the Functions map in the call graph: it extracts all function definitions from the code graph and keys them by FQN.

Parameters:

codeGraph: the parsed code graph
callGraph: the call graph being built
registry: module registry for resolving file paths to modules
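A sketch of the indexing step, assuming the registry can be reduced to a file-path-to-module map and that methods are keyed as module.Class.method; `funcNode`, `indexFunctions`, and that FQN scheme are assumptions, not this package's types.

```go
package main

import "fmt"

// funcNode is a hypothetical stand-in for a function definition node.
type funcNode struct {
	Name  string
	Class string // enclosing class name, "" for module-level functions
	File  string
}

// indexFunctions keys each function by its FQN, using fileToModule as a
// simplified stand-in for core.ModuleRegistry.
func indexFunctions(nodes []*funcNode, fileToModule map[string]string) map[string]*funcNode {
	index := make(map[string]*funcNode, len(nodes))
	for _, n := range nodes {
		module, ok := fileToModule[n.File]
		if !ok {
			continue // file unknown to the registry: no FQN can be formed
		}
		fqn := module + "." + n.Name
		if n.Class != "" {
			fqn = module + "." + n.Class + "." + n.Name
		}
		index[fqn] = n
	}
	return index
}

func main() {
	idx := indexFunctions(
		[]*funcNode{{Name: "get_user", File: "/proj/myapp/views.py"}},
		map[string]string{"/proj/myapp/views.py": "myapp.views"},
	)
	fmt.Println(len(idx), idx["myapp.views.get_user"] != nil)
}
```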

func ReadFileBytes

func ReadFileBytes(filePath string) ([]byte, error)

ReadFileBytes reads a file and returns its contents as a byte slice. Helper function for reading source code.

Parameters:

filePath: path to the file (can be relative or absolute)

Returns:

File contents as byte slice
error if file cannot be read

func ResolveCallTarget

func ResolveCallTarget(target string, importMap *core.ImportMap, registry *core.ModuleRegistry, currentModule string, codeGraph *graph.CodeGraph, typeEngine *resolution.TypeInferenceEngine, callerFQN string, callGraph *core.CallGraph) (string, bool, *core.TypeInfo)

ResolveCallTarget resolves a call target name to a fully qualified name. This is the core resolution logic that handles:

Direct function calls: sanitize() → myapp.utils.sanitize
Method calls: obj.method() → (unresolved, needs type inference)
Imported functions: from utils import sanitize; sanitize() → myapp.utils.sanitize
Qualified calls: utils.sanitize() → myapp.utils.sanitize

Algorithm:

1. If target is a simple name (no dots):
   a. Look up in import map
   b. If found, return FQN from import
   c. If not found, try to find in same module
2. If target has dots (qualified name):
   a. Split into base and rest
   b. Resolve base using import map
   c. Append rest to get full FQN
3. If all else fails, check if it exists in the registry

Parameters:

target: the call target name (e.g., "sanitize", "utils.sanitize", "obj.method")
importMap: import mappings for the current file
registry: module registry for validation
currentModule: the module containing this call
codeGraph: the parsed code graph for validation
typeEngine: type inference engine
callerFQN: fully qualified name of the calling function
callGraph: the call graph being built

Returns:

Fully qualified name of the target
Boolean indicating if resolution was successful
TypeInfo if resolved via type inference, otherwise nil

Examples:

target="sanitize", imports={"sanitize": "myapp.utils.sanitize"}
  → "myapp.utils.sanitize", true, nil

target="utils.sanitize", imports={"utils": "myapp.utils"}
  → "myapp.utils.sanitize", true, nil

target="obj.method", imports={}
  → "obj.method", false, nil  (needs type inference)
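The three branches of the algorithm can be sketched as below. `resolveCallTarget` here is a hypothetical simplification: plain maps replace core.ImportMap and the registry and code-graph checks, and type inference is left out, so the third return value is dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveCallTarget resolves one call target. importMap maps local names to
// FQNs; known is a set of FQNs standing in for the registry/code-graph checks.
func resolveCallTarget(target string, importMap map[string]string, currentModule string, known map[string]bool) (string, bool) {
	base, rest, dotted := strings.Cut(target, ".")
	if !dotted {
		// 1a-b. Simple name bound by an import, e.g. `from utils import sanitize`.
		if fqn, ok := importMap[target]; ok {
			return fqn, true
		}
		// 1c. Function defined in the same module.
		if local := currentModule + "." + target; known[local] {
			return local, true
		}
	} else if mod, ok := importMap[base]; ok {
		// 2. Qualified name: resolve the first segment, then append the rest.
		return mod + "." + rest, true
	}
	// 3. Last resort: the target may already be a known FQN.
	if known[target] {
		return target, true
	}
	return target, false // e.g. obj.method(), left for type inference
}

func main() {
	imports := map[string]string{"sanitize": "myapp.utils.sanitize", "utils": "myapp.utils"}
	for _, t := range []string{"sanitize", "utils.sanitize", "obj.method"} {
		fqn, ok := resolveCallTarget(t, imports, "myapp.views", nil)
		fmt.Println(t, "->", fqn, ok)
	}
}
```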

func ValidateFQN

func ValidateFQN(fqn string, registry *core.ModuleRegistry) bool

ValidateFQN checks if a fully qualified name exists in the registry. Handles both module names and function names within modules.

Examples:

"myapp.utils" - checks if module exists
"myapp.utils.sanitize" - checks if module "myapp.utils" exists

Parameters:

fqn: fully qualified name to validate
registry: module registry

Returns:

true if FQN is valid (module or function in existing module)
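A sketch of the documented rule, with a plain set of module names standing in for core.ModuleRegistry (`validateFQN` is a hypothetical name).

```go
package main

import (
	"fmt"
	"strings"
)

// validateFQN accepts a registered module, or a name whose parent (everything
// before the last dot) is a registered module.
func validateFQN(fqn string, modules map[string]bool) bool {
	if modules[fqn] {
		return true // "myapp.utils": the module itself
	}
	if i := strings.LastIndex(fqn, "."); i > 0 {
		return modules[fqn[:i]] // "myapp.utils.sanitize": check "myapp.utils"
	}
	return false
}

func main() {
	modules := map[string]bool{"myapp.utils": true}
	fmt.Println(validateFQN("myapp.utils.sanitize", modules))
}
```

As sketched, a name is accepted whenever its parent module exists, even if that module defines no such function; the examples above describe the same module-level check.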

func ValidateStdlibFQN

func ValidateStdlibFQN(fqn string, remoteLoader *cgregistry.StdlibRegistryRemote) bool

ValidateStdlibFQN checks if a fully qualified name is a stdlib function. Supports module.function, module.submodule.function, and module.Class patterns. Handles platform-specific module aliases (e.g., os.path -> posixpath). Uses lazy loading via remote registry to download modules on-demand.

Examples:

"os.getcwd" - returns true if os.getcwd exists in stdlib
"os.path.join" - returns true if posixpath.join exists in stdlib (alias resolution)
"json.dumps" - returns true if json.dumps exists in stdlib

Parameters:

fqn: fully qualified name to check
remoteLoader: remote stdlib registry loader

Returns:

true if FQN is a stdlib function or class
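The split-and-alias lookup can be sketched with an in-memory index. This is a simplification with assumptions: `stdlibIndex` and `validateStdlibFQN` are hypothetical, the index is preloaded instead of lazily fetched from the remote registry, and only the POSIX alias os.path -> posixpath is modeled (CPython uses ntpath on Windows).

```go
package main

import (
	"fmt"
	"strings"
)

// stdlibIndex maps module name -> exported names (functions and classes).
type stdlibIndex map[string]map[string]bool

// moduleAliases models the POSIX case of os.path only.
var moduleAliases = map[string]string{"os.path": "posixpath"}

// validateStdlibFQN tries every split point from the right, so that
// "os.path.join" checks ("os.path", "join") and then ("os", "path.join").
func validateStdlibFQN(fqn string, idx stdlibIndex) bool {
	for i := strings.LastIndex(fqn, "."); i > 0; i = strings.LastIndex(fqn[:i], ".") {
		module, name := fqn[:i], fqn[i+1:]
		if alias, ok := moduleAliases[module]; ok {
			module = alias
		}
		if idx[module][name] { // a missing module yields a nil map, i.e. false
			return true
		}
	}
	return false
}

func main() {
	idx := stdlibIndex{"posixpath": {"join": true}}
	fmt.Println(validateStdlibFQN("os.path.join", idx))
}
```

Trying split points from the right lets one loop cover module.function, module.submodule.function, and module.Class.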

Types

type ImportMapCache

type ImportMapCache struct {
	// contains filtered or unexported fields
}

ImportMapCache provides thread-safe caching of ImportMap instances. It prevents redundant import extraction by caching results keyed by file path.

Thread-safety:

All methods are safe for concurrent use
Uses RWMutex for optimized read-heavy workloads
GetOrExtract uses the double-checked locking pattern

func NewImportMapCache

func NewImportMapCache() *ImportMapCache

NewImportMapCache creates a new empty import map cache.

func (*ImportMapCache) Get

func (c *ImportMapCache) Get(filePath string) (*core.ImportMap, bool)

Get retrieves an ImportMap from the cache if it exists.

Parameters:

filePath: absolute path to the Python file

Returns:

ImportMap and true if found in cache, nil and false otherwise

func (*ImportMapCache) GetOrExtract

func (c *ImportMapCache) GetOrExtract(filePath string, sourceCode []byte, registry *core.ModuleRegistry) (*core.ImportMap, error)

GetOrExtract retrieves an ImportMap from cache or extracts it if not cached. This is the main entry point for using the cache.

Parameters:

filePath: absolute path to the Python file
sourceCode: file contents (only used if extraction needed)
registry: module registry for resolving imports

Returns:

ImportMap from cache or newly extracted
error if extraction fails (cache misses only)

Thread-safety:

Multiple goroutines can safely call GetOrExtract concurrently
First caller for a file will extract and cache
Subsequent callers will get cached result

func (*ImportMapCache) Put

func (c *ImportMapCache) Put(filePath string, importMap *core.ImportMap)

Put stores an ImportMap in the cache.

Parameters:

filePath: absolute path to the Python file
importMap: the extracted ImportMap to cache
