Documentation
Overview ¶
Package builder orchestrates call graph construction.
This package ties together all components to build a complete call graph:
- Module registry (registry package)
- Type inference (resolution package)
- Import resolution (resolution package)
- Call site extraction (extraction package)
- Advanced resolution (resolution package)
- Pattern detection (patterns package)
- Taint analysis (analysis/taint package)
Basic Usage ¶
// Build from existing code graph
callGraph, err := builder.BuildCallGraph(codeGraph, moduleRegistry, projectRoot)
Call Resolution Strategy ¶
The builder uses a multi-strategy approach to resolve function calls:
- Direct import resolution
- Method chaining with type inference
- Self-attribute resolution (self.attr.method)
- Type inference for variable.method() calls
- ORM pattern detection (Django, SQLAlchemy)
- Framework detection (known external frameworks)
- Standard library resolution via remote CDN
Each strategy is tried in order until one succeeds.
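The "try each strategy in order until one succeeds" rule can be pictured as a chain of resolver functions. The following is only a minimal sketch: the `resolver` type and the `fromImports` and `fromStdlib` helpers are hypothetical names invented for illustration, and only two of the listed strategies are mimicked.

```go
package main

import "fmt"

// resolver is a hypothetical strategy signature: it returns the resolved
// fully qualified name and whether this strategy was able to resolve it.
type resolver func(target string) (string, bool)

// resolveInOrder tries each strategy in turn and stops at the first success.
// If no strategy succeeds, the target is returned unchanged with false.
func resolveInOrder(target string, strategies []resolver) (string, bool) {
	for _, s := range strategies {
		if fqn, ok := s(target); ok {
			return fqn, true
		}
	}
	return target, false
}

// fromImports mimics direct import resolution with a plain map (assumption).
func fromImports(imports map[string]string) resolver {
	return func(target string) (string, bool) {
		fqn, ok := imports[target]
		return fqn, ok
	}
}

// fromStdlib mimics a standard library lookup with a local set (assumption;
// the real package is documented as using a remote registry).
func fromStdlib(known map[string]bool) resolver {
	return func(target string) (string, bool) {
		return target, known[target]
	}
}

func main() {
	strategies := []resolver{
		fromImports(map[string]string{"sanitize": "myapp.utils.sanitize"}),
		fromStdlib(map[string]bool{"os.getcwd": true}),
	}
	for _, t := range []string{"sanitize", "os.getcwd", "obj.method"} {
		fqn, ok := resolveInOrder(t, strategies)
		fmt.Println(t, "->", fqn, ok)
	}
}
```

Because the chain stops at the first hit, the cheaper and more precise strategies belong at the front of the slice.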
Multi-Pass Architecture ¶
The builder performs multiple passes over the codebase:
- Pass 1: Index all function definitions
- Pass 2: Extract return types from all functions
- Pass 3: Extract variable assignments and type bindings
- Pass 4: Extract class attributes
- Pass 5: Resolve call sites and build call graph edges
- Pass 6: Generate taint summaries for security analysis
This multi-pass approach ensures that all necessary type information is collected before attempting to resolve call sites.
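To see why the order matters, consider resolving `conn.execute()` when `conn` was assigned from a function call: the return type must already be known before the call site is visited. The sketch below compresses three of the passes into toy functions; every type and function name in it is hypothetical and far simpler than what the builder presumably does.

```go
package main

import "fmt"

// All types below are hypothetical stand-ins, not the builder's own.
type function struct {
	FQN        string
	ReturnType string // FQN of the returned class, "" if unknown
}

type assignment struct {
	Variable string
	Callee   string // FQN of the function whose result is assigned
}

type methodCall struct {
	Variable string
	Method   string
}

// collectReturnTypes plays the role of the return-type pass.
func collectReturnTypes(funcs []function) map[string]string {
	out := make(map[string]string)
	for _, f := range funcs {
		if f.ReturnType != "" {
			out[f.FQN] = f.ReturnType
		}
	}
	return out
}

// bindVariables plays the role of the assignment pass; it needs the
// return types collected earlier.
func bindVariables(assigns []assignment, returnTypes map[string]string) map[string]string {
	out := make(map[string]string)
	for _, a := range assigns {
		if t, ok := returnTypes[a.Callee]; ok {
			out[a.Variable] = t
		}
	}
	return out
}

// resolveCalls plays the role of the call-site pass; it only succeeds for
// variables whose type was bound by the previous pass.
func resolveCalls(calls []methodCall, varTypes map[string]string) []string {
	var resolved []string
	for _, c := range calls {
		if t, ok := varTypes[c.Variable]; ok {
			resolved = append(resolved, t+"."+c.Method)
		}
	}
	return resolved
}

func main() {
	returnTypes := collectReturnTypes([]function{
		{FQN: "myapp.db.connect", ReturnType: "myapp.db.Connection"},
	})
	varTypes := bindVariables([]assignment{
		{Variable: "conn", Callee: "myapp.db.connect"},
	}, returnTypes)
	fmt.Println(resolveCalls([]methodCall{{Variable: "conn", Method: "execute"}}, varTypes))
}
```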
Caching ¶
The builder uses ImportMapCache so that the imports of a given file are not re-parsed on every lookup, which avoids redundant extraction work.
Thread Safety ¶
All exported functions in this package are thread-safe. The ImportMapCache uses a read-write mutex to allow concurrent reads while ensuring safe writes.
Index ¶
- func BuildCallGraph(codeGraph *graph.CodeGraph, registry *core.ModuleRegistry, projectRoot string) (*core.CallGraph, error)
- func BuildCallGraphFromPath(codeGraph *graph.CodeGraph, projectPath string) (*core.CallGraph, *core.ModuleRegistry, error)
- func DetectPythonVersion(projectPath string) string
- func FindContainingFunction(location core.Location, functions []*graph.Node, modulePath string) string
- func FindFunctionAtLine(root *sitter.Node, lineNumber uint32) *sitter.Node
- func GenerateTaintSummaries(callGraph *core.CallGraph, codeGraph *graph.CodeGraph, ...)
- func GetFunctionsInFile(codeGraph *graph.CodeGraph, filePath string) []*graph.Node
- func IndexFunctions(codeGraph *graph.CodeGraph, callGraph *core.CallGraph, ...)
- func ReadFileBytes(filePath string) ([]byte, error)
- func ResolveCallTarget(target string, importMap *core.ImportMap, registry *core.ModuleRegistry, ...) (string, bool, *core.TypeInfo)
- func ValidateFQN(fqn string, registry *core.ModuleRegistry) bool
- func ValidateStdlibFQN(fqn string, remoteLoader *cgregistry.StdlibRegistryRemote) bool
- type ImportMapCache
Functions ¶
func BuildCallGraph ¶
func BuildCallGraph(codeGraph *graph.CodeGraph, registry *core.ModuleRegistry, projectRoot string) (*core.CallGraph, error)
BuildCallGraph constructs the complete call graph for a Python project. This is Pass 3 of the 3-pass algorithm:
- Pass 1: BuildModuleRegistry - map files to modules
- Pass 2: ExtractImports + ExtractCallSites - parse imports and calls
- Pass 3: BuildCallGraph - resolve calls and build graph
Algorithm:
- For each Python file in the project:
  a. Extract imports to build ImportMap
  b. Extract call sites from AST
  c. Extract function definitions from main graph
- For each call site:
  a. Resolve target name using ImportMap
  b. Find target function definition in registry
  c. Add edge from caller to callee
  d. Store detailed call site information
Parameters:
- codeGraph: the existing code graph with parsed AST nodes
- registry: module registry mapping files to modules
- projectRoot: absolute path to project root
Returns:
- CallGraph: complete call graph with edges and call sites
- error: if any step fails
Example:
Given:
File: myapp/views.py
def get_user():
    sanitize(data)  # call to myapp.utils.sanitize
Creates:
edges: {"myapp.views.get_user": ["myapp.utils.sanitize"]}
reverseEdges: {"myapp.utils.sanitize": ["myapp.views.get_user"]}
callSites: {"myapp.views.get_user": [CallSite{Target: "sanitize", ...}]}
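The edges and reverseEdges maps in the example mirror each other, which can be kept consistent by updating both in one place. The following is a sketch with a hypothetical `callGraph` type; the real `core.CallGraph` may be shaped differently.

```go
package main

import "fmt"

// callGraph is a hypothetical, stripped-down stand-in for core.CallGraph.
type callGraph struct {
	edges        map[string][]string // caller FQN -> callee FQNs
	reverseEdges map[string][]string // callee FQN -> caller FQNs
}

func newCallGraph() *callGraph {
	return &callGraph{
		edges:        make(map[string][]string),
		reverseEdges: make(map[string][]string),
	}
}

// addEdge records the call in both directions so that "who do I call?" and
// "who calls me?" can each be answered with a single map lookup.
func (g *callGraph) addEdge(caller, callee string) {
	g.edges[caller] = append(g.edges[caller], callee)
	g.reverseEdges[callee] = append(g.reverseEdges[callee], caller)
}

func main() {
	g := newCallGraph()
	g.addEdge("myapp.views.get_user", "myapp.utils.sanitize")
	fmt.Println(g.edges)
	fmt.Println(g.reverseEdges)
}
```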
func BuildCallGraphFromPath ¶
func BuildCallGraphFromPath(codeGraph *graph.CodeGraph, projectPath string) (*core.CallGraph, *core.ModuleRegistry, error)
BuildCallGraphFromPath is a convenience function that builds a call graph from a project directory path.
It performs all three passes:
- Build module registry
- Parse code graph (uses existing parsed graph)
- Build call graph
Parameters:
- codeGraph: the parsed code graph from graph.Initialize()
- projectPath: absolute path to project root
Returns:
- CallGraph: complete call graph with edges and call sites
- ModuleRegistry: module path mappings
- error: if any step fails
func DetectPythonVersion ¶
func DetectPythonVersion(projectPath string) string
DetectPythonVersion infers Python version from project files. It checks in order:
- .python-version file
- pyproject.toml [tool.poetry.dependencies] or [project] requires-python
- Defaults to "3.14"
Parameters:
- projectPath: absolute path to the project root
Returns:
- Python version string (e.g., "3.14", "3.11", "3.9")
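The lookup order can be sketched as follows. This is an illustration under simplifying assumptions, not the package's implementation: `pickVersion` and `detectPythonVersion` are hypothetical names, the TOML is scanned line by line without tracking which table a key belongs to, and only the first `major.minor` pair on a matching line is kept.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

const defaultVersion = "3.14" // documented fallback

// versionPattern matches the first major.minor pair, e.g. "3.11" in ">=3.11,<4".
var versionPattern = regexp.MustCompile(`\d+\.\d+`)

// pickVersion applies the documented order to already-loaded file contents.
// An empty string stands for a missing file.
func pickVersion(pythonVersionFile, pyproject string) string {
	// 1. .python-version wins if it contains a version.
	if v := versionPattern.FindString(pythonVersionFile); v != "" {
		return v
	}
	// 2. pyproject.toml: look for a "requires-python" or "python" key.
	//    Simplification: the enclosing TOML table is not checked.
	for _, line := range strings.Split(pyproject, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "requires-python") ||
			strings.HasPrefix(line, "python ") ||
			strings.HasPrefix(line, "python=") {
			if v := versionPattern.FindString(line); v != "" {
				return v
			}
		}
	}
	// 3. Nothing found: fall back to the default.
	return defaultVersion
}

// detectPythonVersion reads the two files (read errors are treated as
// "file absent") and delegates to pickVersion.
func detectPythonVersion(projectPath string) string {
	pv, _ := os.ReadFile(filepath.Join(projectPath, ".python-version"))
	pp, _ := os.ReadFile(filepath.Join(projectPath, "pyproject.toml"))
	return pickVersion(string(pv), string(pp))
}

func main() {
	fmt.Println(detectPythonVersion("."))
}
```

Separating the file reads from the decision logic keeps the ordering rule testable without touching the filesystem.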
func FindContainingFunction ¶
func FindContainingFunction(location core.Location, functions []*graph.Node, modulePath string) string
FindContainingFunction finds the function that contains a given call site location. Uses line numbers to determine which function a call belongs to.
Algorithm:
- Iterate through all functions in the file
- Find the function with the highest starting line number that is still <= the line of the call
- Return the FQN of that function
Parameters:
- location: source location of the call site
- functions: all function definitions in the file
- modulePath: module path of the file
Returns:
- Fully qualified name of the containing function, or empty if not found
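The line-number heuristic described above can be sketched in a few lines. `funcDef` and `containingFunction` are hypothetical names; the real function works on `graph.Node` and `core.Location` values, whose fields are not shown in this documentation.

```go
package main

import "fmt"

// funcDef is a hypothetical stand-in for a function node in one file.
type funcDef struct {
	Name string
	Line uint32 // line where the definition starts
}

// containingFunction returns modulePath + "." + name for the function whose
// start line is the greatest one that is still <= callLine, or "" if the
// call precedes every function in the file.
func containingFunction(callLine uint32, functions []funcDef, modulePath string) string {
	best := -1
	for i, f := range functions {
		if f.Line <= callLine && (best == -1 || f.Line > functions[best].Line) {
			best = i
		}
	}
	if best == -1 {
		return ""
	}
	return modulePath + "." + functions[best].Name
}

func main() {
	funcs := []funcDef{{Name: "get_user", Line: 5}, {Name: "delete_user", Line: 20}}
	fmt.Println(containingFunction(12, funcs, "myapp.views"))
}
```

Since only start lines are compared, a call at module level that appears after a function body would still be attributed to that function under this heuristic.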
func FindFunctionAtLine ¶
func FindFunctionAtLine(root *sitter.Node, lineNumber uint32) *sitter.Node
FindFunctionAtLine searches for a function definition at the specified line number. Returns the tree-sitter node for the function, or nil if not found.
This function recursively traverses the AST tree to find a function or method definition node at the given line number.
Parameters:
- root: the root tree-sitter node to search from
- lineNumber: the line number to search for (1-indexed)
Returns:
- tree-sitter node for the function definition, or nil if not found
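The recursive search can be illustrated without the tree-sitter dependency. In the sketch below, `node` is a hypothetical tree type that stores 1-indexed start lines directly; with real tree-sitter nodes, whose row positions are zero-based, a conversion would be needed before comparing against a 1-indexed line number.

```go
package main

import "fmt"

// node is a hypothetical, simplified syntax tree node.
type node struct {
	Kind      string // e.g. "module", "class_definition", "function_definition"
	StartLine uint32 // 1-indexed line where the node begins
	Children  []*node
}

// findFunctionAtLine walks the tree depth-first and returns the first
// function definition that starts on the given line, or nil if none does.
// Methods are found too, because they are function definitions nested
// inside a class node.
func findFunctionAtLine(n *node, line uint32) *node {
	if n == nil {
		return nil
	}
	if n.Kind == "function_definition" && n.StartLine == line {
		return n
	}
	for _, child := range n.Children {
		if found := findFunctionAtLine(child, line); found != nil {
			return found
		}
	}
	return nil
}

func main() {
	method := &node{Kind: "function_definition", StartLine: 3}
	class := &node{Kind: "class_definition", StartLine: 2, Children: []*node{method}}
	fn := &node{Kind: "function_definition", StartLine: 7}
	root := &node{Kind: "module", StartLine: 1, Children: []*node{class, fn}}

	fmt.Println(findFunctionAtLine(root, 3) == method)
	fmt.Println(findFunctionAtLine(root, 5) == nil)
}
```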
func GenerateTaintSummaries ¶
func GenerateTaintSummaries(callGraph *core.CallGraph, codeGraph *graph.CodeGraph, registry *core.ModuleRegistry)
GenerateTaintSummaries analyzes all Python functions for taint flows. This is the taint-summary pass of the call graph building process, listed as Pass 6 in the package overview.
For each function:
- Extract statements from AST
- Build def-use chains
- Analyze intra-procedural taint
- Store TaintSummary in callGraph.Summaries
Parameters:
- callGraph: the call graph being built (will be populated with summaries)
- codeGraph: the parsed AST nodes (currently unused, reserved for future use)
- registry: module registry (currently unused, reserved for future use)
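As a rough picture of intra-procedural taint analysis, the sketch below walks the statements of one function in order and tracks which variables currently hold tainted data. It is a deliberately simplified assumption: explicit def-use chains are collapsed into a single forward pass over straight-line code, branches and loops are ignored, and the `statement` and `taintSummary` types are hypothetical.

```go
package main

import "fmt"

// statement is a hypothetical, flattened view of one statement.
type statement struct {
	Line uint32
	Def  string   // variable assigned by the statement, "" if none
	Uses []string // variables read by the statement
	Call string   // name of the called function, "" if none
}

// taintSummary is a hypothetical per-function result.
type taintSummary struct {
	TaintedVars map[string]bool // variables tainted at the end of the function
	SinkLines   []uint32        // lines where tainted data reaches a sink
}

// analyze propagates taint forward: a definition becomes tainted if it
// comes from a source call or reads a tainted variable, and becomes clean
// again if it is overwritten with untainted data.
func analyze(stmts []statement, sources, sinks map[string]bool) taintSummary {
	tainted := make(map[string]bool)
	var hits []uint32
	for _, s := range stmts {
		usesTaint := false
		for _, u := range s.Uses {
			if tainted[u] {
				usesTaint = true
				break
			}
		}
		if s.Call != "" && sinks[s.Call] && usesTaint {
			hits = append(hits, s.Line)
		}
		if s.Def != "" {
			if usesTaint || sources[s.Call] {
				tainted[s.Def] = true
			} else {
				delete(tainted, s.Def)
			}
		}
	}
	return taintSummary{TaintedVars: tainted, SinkLines: hits}
}

func main() {
	stmts := []statement{
		{Line: 1, Def: "data", Call: "input"},              // data = input()
		{Line: 2, Def: "query", Uses: []string{"data"}},    // query = "..." + data
		{Line: 3, Uses: []string{"query"}, Call: "execute"}, // execute(query)
	}
	summary := analyze(stmts, map[string]bool{"input": true}, map[string]bool{"execute": true})
	fmt.Println(summary.SinkLines, summary.TaintedVars)
}
```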
func GetFunctionsInFile ¶
func GetFunctionsInFile(codeGraph *graph.CodeGraph, filePath string) []*graph.Node
GetFunctionsInFile returns all function definitions in a specific file.
Parameters:
- codeGraph: the parsed code graph
- filePath: absolute path to the file
Returns:
- List of function/method nodes in the file, sorted by line number
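The documented behavior amounts to a filter followed by a sort. A minimal sketch, assuming a hypothetical `fnNode` type with `File`, `Line`, and `Kind` fields in place of `graph.Node`:

```go
package main

import (
	"fmt"
	"sort"
)

// fnNode is a hypothetical stand-in for a code graph node.
type fnNode struct {
	Name string
	File string
	Line uint32
	Kind string // "function", "method", or another kind such as "class"
}

// functionsInFile keeps only function and method nodes from the given file
// and returns them ordered by ascending line number.
func functionsInFile(nodes []fnNode, filePath string) []fnNode {
	var out []fnNode
	for _, n := range nodes {
		if n.File == filePath && (n.Kind == "function" || n.Kind == "method") {
			out = append(out, n)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Line < out[j].Line })
	return out
}

func main() {
	nodes := []fnNode{
		{Name: "delete_user", File: "/app/views.py", Line: 20, Kind: "function"},
		{Name: "User", File: "/app/views.py", Line: 1, Kind: "class"},
		{Name: "get_user", File: "/app/views.py", Line: 5, Kind: "function"},
		{Name: "sanitize", File: "/app/utils.py", Line: 3, Kind: "function"},
	}
	for _, n := range functionsInFile(nodes, "/app/views.py") {
		fmt.Println(n.Line, n.Name)
	}
}
```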
func IndexFunctions ¶
func IndexFunctions(codeGraph *graph.CodeGraph, callGraph *core.CallGraph, registry *core.ModuleRegistry)
IndexFunctions builds the Functions map in the call graph. Extracts all function definitions from the code graph and maps them by FQN.
Parameters:
- codeGraph: the parsed code graph
- callGraph: the call graph being built
- registry: module registry for resolving file paths to modules
func ReadFileBytes ¶
func ReadFileBytes(filePath string) ([]byte, error)
ReadFileBytes reads a file and returns its contents as a byte slice. Helper function for reading source code.
Parameters:
- filePath: path to the file (can be relative or absolute)
Returns:
- File contents as byte slice
- error if file cannot be read
func ResolveCallTarget ¶
func ResolveCallTarget(target string, importMap *core.ImportMap, registry *core.ModuleRegistry, currentModule string, codeGraph *graph.CodeGraph, typeEngine *resolution.TypeInferenceEngine, callerFQN string, callGraph *core.CallGraph) (string, bool, *core.TypeInfo)
ResolveCallTarget resolves a call target name to a fully qualified name. This is the core resolution logic that handles:
- Direct function calls: sanitize() → myapp.utils.sanitize
- Method calls: obj.method() → (unresolved, needs type inference)
- Imported functions: from utils import sanitize; sanitize() → myapp.utils.sanitize
- Qualified calls: utils.sanitize() → myapp.utils.sanitize
Algorithm:
- Check if target is a simple name (no dots)
  a. Look up in import map
  b. If found, return FQN from import
  c. If not found, try to find in same module
- If target has dots (qualified name)
  a. Split into base and rest
  b. Resolve base using import map
  c. Append rest to get full FQN
- If all else fails, check if it exists in the registry
Parameters:
- target: the call target name (e.g., "sanitize", "utils.sanitize", "obj.method")
- importMap: import mappings for the current file
- registry: module registry for validation
- currentModule: the module containing this call
- codeGraph: the parsed code graph for validation
- typeEngine: type inference engine
- callerFQN: fully qualified name of the calling function
- callGraph: the call graph being built
Returns:
- Fully qualified name of the target
- Boolean indicating if resolution was successful
- TypeInfo if resolved via type inference
Examples:
target="sanitize", imports={"sanitize": "myapp.utils.sanitize"}
→ "myapp.utils.sanitize", true, nil
target="utils.sanitize", imports={"utils": "myapp.utils"}
→ "myapp.utils.sanitize", true, nil
target="obj.method", imports={}
→ "obj.method", false, nil (needs type inference)
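The name-based part of the algorithm (without type inference) can be sketched with plain maps. `resolveTarget` is a hypothetical helper: the import map is modeled as `map[string]string`, the registry as a set of known FQNs, and the `*core.TypeInfo` result is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveTarget follows the documented steps using simplified inputs:
// imports maps local names to FQNs, known is the set of FQNs that exist.
func resolveTarget(target string, imports map[string]string, currentModule string, known map[string]bool) (string, bool) {
	if !strings.Contains(target, ".") {
		// Simple name: import map first, then the current module.
		if fqn, ok := imports[target]; ok {
			return fqn, true
		}
		if candidate := currentModule + "." + target; known[candidate] {
			return candidate, true
		}
		return target, false
	}
	// Qualified name: resolve the base through the import map and
	// append the remainder.
	base, rest, _ := strings.Cut(target, ".")
	if baseFQN, ok := imports[base]; ok {
		return baseFQN + "." + rest, true
	}
	// Last resort: the target may already be a known FQN.
	if known[target] {
		return target, true
	}
	return target, false
}

func main() {
	known := map[string]bool{"myapp.views.helper": true}
	fmt.Println(resolveTarget("sanitize", map[string]string{"sanitize": "myapp.utils.sanitize"}, "myapp.views", known))
	fmt.Println(resolveTarget("utils.sanitize", map[string]string{"utils": "myapp.utils"}, "myapp.views", known))
	fmt.Println(resolveTarget("obj.method", map[string]string{}, "myapp.views", known))
}
```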
func ValidateFQN ¶
func ValidateFQN(fqn string, registry *core.ModuleRegistry) bool
ValidateFQN checks if a fully qualified name exists in the registry. Handles both module names and function names within modules.
Examples:
"myapp.utils" - checks if module exists
"myapp.utils.sanitize" - checks if module "myapp.utils" exists
Parameters:
- fqn: fully qualified name to validate
- registry: module registry
Returns:
- true if FQN is valid (module or function in existing module)
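The two cases in the examples (the FQN is itself a module, or its parent is) reduce to at most two lookups. A sketch, assuming the registry can be modeled as a set of module paths; `validFQN` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// validFQN reports whether fqn names a known module, or a name whose
// parent (everything before the last dot) is a known module.
func validFQN(fqn string, modules map[string]bool) bool {
	if modules[fqn] {
		return true
	}
	if i := strings.LastIndex(fqn, "."); i > 0 {
		return modules[fqn[:i]]
	}
	return false
}

func main() {
	modules := map[string]bool{"myapp.utils": true}
	fmt.Println(validFQN("myapp.utils", modules))
	fmt.Println(validFQN("myapp.utils.sanitize", modules))
	fmt.Println(validFQN("other.pkg.func", modules))
}
```

Under this reading, the function name itself is not checked: any name under an existing module passes.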
func ValidateStdlibFQN ¶
func ValidateStdlibFQN(fqn string, remoteLoader *cgregistry.StdlibRegistryRemote) bool
ValidateStdlibFQN checks if a fully qualified name is a stdlib function. Supports module.function, module.submodule.function, and module.Class patterns. Handles platform-specific module aliases (e.g., os.path -> posixpath). Uses lazy loading via remote registry to download modules on-demand.
Examples:
"os.getcwd" - returns true if os.getcwd exists in stdlib
"os.path.join" - returns true if posixpath.join exists in stdlib (alias resolution)
"json.dumps" - returns true if json.dumps exists in stdlib
Parameters:
- fqn: fully qualified name to check
- remoteLoader: remote stdlib registry loader
Returns:
- true if FQN is a stdlib function or class
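The alias step can be shown with a local table in place of the remote registry. Everything here is an assumption made for illustration: `stdlibIndex`, `moduleAliases`, and `isStdlib` are hypothetical, and no download or lazy loading is modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// stdlibIndex is a hypothetical in-memory index: module -> exported names.
type stdlibIndex map[string]map[string]bool

// moduleAliases maps a module path to the module that actually defines its
// members. Only the alias mentioned in the documentation is listed.
var moduleAliases = map[string]string{"os.path": "posixpath"}

// isStdlib splits fqn at the last dot into module and name, rewrites the
// module through the alias table, and checks the index.
func isStdlib(fqn string, idx stdlibIndex) bool {
	i := strings.LastIndex(fqn, ".")
	if i <= 0 {
		return false // a bare name has no module part
	}
	module, name := fqn[:i], fqn[i+1:]
	if target, ok := moduleAliases[module]; ok {
		module = target
	}
	return idx[module][name] // reading a missing module yields false
}

func main() {
	idx := stdlibIndex{
		"os":        {"getcwd": true},
		"posixpath": {"join": true},
		"json":      {"dumps": true},
	}
	for _, fqn := range []string{"os.getcwd", "os.path.join", "json.dumps", "json.nope"} {
		fmt.Println(fqn, isStdlib(fqn, idx))
	}
}
```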
Types ¶
type ImportMapCache ¶
type ImportMapCache struct {
// contains filtered or unexported fields
}
ImportMapCache provides thread-safe caching of ImportMap instances. It prevents redundant import extraction by caching results keyed by file path.
Thread-safety:
- All methods are safe for concurrent use
- Uses RWMutex for optimized read-heavy workloads
- GetOrExtract uses the double-checked locking pattern
func NewImportMapCache ¶
func NewImportMapCache() *ImportMapCache
NewImportMapCache creates a new empty import map cache.
func (*ImportMapCache) Get ¶
func (c *ImportMapCache) Get(filePath string) (*core.ImportMap, bool)
Get retrieves an ImportMap from the cache if it exists.
Parameters:
- filePath: absolute path to the Python file
Returns:
- ImportMap and true if found in cache, nil and false otherwise
func (*ImportMapCache) GetOrExtract ¶
func (c *ImportMapCache) GetOrExtract(filePath string, sourceCode []byte, registry *core.ModuleRegistry) (*core.ImportMap, error)
GetOrExtract retrieves an ImportMap from cache or extracts it if not cached. This is the main entry point for using the cache.
Parameters:
- filePath: absolute path to the Python file
- sourceCode: file contents (only used if extraction needed)
- registry: module registry for resolving imports
Returns:
- ImportMap from cache or newly extracted
- error if extraction fails (cache misses only)
Thread-safety:
- Multiple goroutines can safely call GetOrExtract concurrently
- First caller for a file will extract and cache
- Subsequent callers will get cached result
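The double-checked locking pattern mentioned above generally looks like the sketch below: a fast path under the read lock, then a second check under the write lock before doing the expensive work. The `cache` type, the `importMap` alias, and the `extract` callback are hypothetical stand-ins; the real ImportMapCache fields are unexported and not documented here.

```go
package main

import (
	"fmt"
	"sync"
)

// importMap is a hypothetical stand-in for core.ImportMap.
type importMap map[string]string

// cache is a hypothetical cache keyed by file path.
type cache struct {
	mu      sync.RWMutex
	entries map[string]importMap
}

func newCache() *cache {
	return &cache{entries: make(map[string]importMap)}
}

// getOrExtract returns the cached value for path, or calls extract once and
// stores its result. Errors are returned without caching anything.
func (c *cache) getOrExtract(path string, extract func() (importMap, error)) (importMap, error) {
	// First check: many readers may hold the read lock at the same time.
	c.mu.RLock()
	m, ok := c.entries[path]
	c.mu.RUnlock()
	if ok {
		return m, nil
	}

	// Second check: another goroutine may have filled the entry between
	// releasing the read lock and acquiring the write lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.entries[path]; ok {
		return m, nil
	}
	m, err := extract()
	if err != nil {
		return nil, err
	}
	c.entries[path] = m
	return m, nil
}

func main() {
	c := newCache()
	extractions := 0 // only modified while the write lock is held
	extract := func() (importMap, error) {
		extractions++
		return importMap{"sanitize": "myapp.utils.sanitize"}, nil
	}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.getOrExtract("/app/views.py", extract)
		}()
	}
	wg.Wait()
	fmt.Println("extractions:", extractions)
}
```

One trade-off of this particular sketch is that the write lock is held during extraction, so a slow extraction for one file blocks lookups for all others; per-key locking would avoid that at the cost of more bookkeeping.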