python

Version: v1.1.17
Published: Nov 26, 2025 | License: MIT | Imports: 8 | Imported by: 0

README

Python Code Analyzer

Code analyzer for extracting symbols, structure, and relationships from Python files. Indexes code for semantic search in Qdrant.

Status: ✅ FULLY IMPLEMENTED


🎯 What This Analyzer Does

The Python analyzer parses .py files and extracts:

  1. Symbols - classes, methods, functions, variables, constants
  2. Relationships - inheritance, dependencies, method calls
  3. Metadata - decorators, type hints, docstrings

The extracted information is converted to CodeChunks, which are then indexed in Qdrant for semantic search.


📊 Data Flow

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   .py Files     │────▶│  Python Analyzer │────▶│   CodeChunks    │
│  (source code)  │     │  (regex parsing) │     │  (structured)   │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │     Qdrant      │
                                                 │  (vector store) │
                                                 └─────────────────┘

🔍 What We Index

1. Classes (type: "class")
@dataclass
class User(BaseModel, LoggingMixin, metaclass=ABCMeta):
    """Represents a user in the system."""
    name: str
    email: str

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "User" | Class name |
| bases | ["BaseModel", "LoggingMixin"] | Parent classes (inheritance) |
| decorators | ["dataclass"] | Applied decorators |
| is_abstract | true | If it's an abstract class (ABC) |
| is_dataclass | true | If decorated with @dataclass |
| is_enum | false | If inherits from Enum |
| is_protocol | false | If it's a Protocol (typing) |
| is_mixin | true | If it is/uses a mixin |
| metaclass | "ABCMeta" | Specified metaclass |
| dependencies | ["BaseModel", "LoggingMixin"] | All class dependencies |
| docstring | "Represents a user..." | Class documentation |
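
To illustrate how a regex-based parser could recover some of the fields in the table above from a class header, here is a minimal Go sketch. It is not the analyzer's actual code: `classHeader`, `classRe`, and `parseClassHeader` are hypothetical names, and the sketch assumes a single-line header with no nested parentheses and no commas inside subscripted bases.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// classHeader is a hypothetical, simplified stand-in for a few ClassInfo fields.
type classHeader struct {
	Name      string
	Bases     []string
	Metaclass string
	IsMixin   bool
}

// classRe matches a single-line header such as "class User(Base, metaclass=ABCMeta):".
var classRe = regexp.MustCompile(`^\s*class\s+(\w+)\s*(?:\(([^)]*)\))?\s*:`)

// parseClassHeader extracts the class name, positional bases, and metaclass from one line.
func parseClassHeader(line string) (classHeader, bool) {
	m := classRe.FindStringSubmatch(line)
	if m == nil {
		return classHeader{}, false
	}
	h := classHeader{Name: m[1], IsMixin: strings.HasSuffix(m[1], "Mixin")}
	for _, arg := range strings.Split(m[2], ",") {
		arg = strings.TrimSpace(arg)
		switch {
		case arg == "":
			// No parentheses at all, or a trailing comma.
		case strings.HasPrefix(arg, "metaclass="):
			h.Metaclass = strings.TrimSpace(strings.TrimPrefix(arg, "metaclass="))
		case strings.Contains(arg, "="):
			// Other keyword arguments are ignored in this sketch.
		default:
			h.Bases = append(h.Bases, arg)
			if strings.HasSuffix(arg, "Mixin") {
				h.IsMixin = true // assumption: a base named *Mixin marks mixin usage
			}
		}
	}
	return h, true
}

func main() {
	h, ok := parseClassHeader("class User(BaseModel, LoggingMixin, metaclass=ABCMeta):")
	fmt.Println(ok, h.Name, h.Bases, h.Metaclass, h.IsMixin)
}
```

Splitting the argument list on commas keeps the sketch short, but it would mis-split a base such as `Mapping[str, int]`; a real parser would need bracket-aware splitting.
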

2. Methods (type: "method")
class UserService:
    async def get_user(self, user_id: int) -> User:
        """Returns a user by ID."""
        self.validate_id(user_id)
        user = await self.repository.find(user_id)
        return user

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "get_user" | Method name |
| signature | "async def get_user(self, user_id: int) -> User" | Complete signature |
| class_name | "UserService" | Parent class |
| parameters | [{name: "user_id", type: "int"}] | Parameters with types |
| return_type | "User" | Return type |
| is_async | true | If it's an async method |
| is_static | false | If it's @staticmethod |
| is_classmethod | false | If it's @classmethod |
| calls | [{name: "validate_id", receiver: "self"}, ...] | Called methods |
| type_deps | ["User"] | Used types (dependencies) |
| docstring | "Returns a user..." | Method documentation |

3. Functions (type: "function")
@lru_cache(maxsize=100)
async def fetch_data(url: str) -> dict:
    """Downloads data from URL."""
    # `yield from` is a SyntaxError inside `async def`; iterate and yield explicitly
    for item in process(url):
        yield item

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "fetch_data" | Function name |
| signature | "async def fetch_data(url: str) -> dict" | Signature |
| is_async | true | If it's async |
| is_generator | true | If it uses yield |
| decorators | ["lru_cache"] | Applied decorators |
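
The flags in the table above can be derived from a line-by-line scan. The following Go sketch shows one way to do that under simplifying assumptions; `funcFlags`, `scanFunction`, and both regular expressions are hypothetical and are not taken from the analyzer's source.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// decoratorRe captures the dotted name after "@", dropping any call arguments.
	decoratorRe = regexp.MustCompile(`^\s*@([\w.]+)`)
	// yieldRe matches "yield" as a whole word.
	yieldRe = regexp.MustCompile(`\byield\b`)
)

// funcFlags is a hypothetical subset of the fields listed in the table.
type funcFlags struct {
	Decorators  []string
	IsAsync     bool
	IsGenerator bool
}

// scanFunction classifies the lines of one definition: decorators, header, body.
func scanFunction(lines []string) funcFlags {
	var f funcFlags
	for _, line := range lines {
		if m := decoratorRe.FindStringSubmatch(line); m != nil {
			f.Decorators = append(f.Decorators, m[1])
			continue
		}
		code := strings.TrimSpace(line)
		if strings.HasPrefix(code, "async def ") {
			f.IsAsync = true
			continue
		}
		// Drop a trailing comment so "# no yield" does not count.
		if i := strings.Index(code, "#"); i >= 0 {
			code = code[:i]
		}
		if yieldRe.MatchString(code) {
			f.IsGenerator = true
		}
	}
	return f
}

func main() {
	f := scanFunction([]string{
		"@lru_cache(maxsize=100)",
		"async def fetch_data(url: str) -> dict:",
		"    yield url",
	})
	fmt.Println(f.Decorators, f.IsAsync, f.IsGenerator)
}
```

A scan like this treats the word `yield` inside a docstring or string literal as a generator marker, which is one of the edge cases a regex approach can get wrong.
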

4. Properties (type: "property")
class User:
    @property
    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"
    
    @full_name.setter
    def full_name(self, value: str):
        self.first_name, self.last_name = value.split()

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "full_name" | Property name |
| type | "str" | Return type |
| has_getter | true | Has getter (@property) |
| has_setter | true | Has setter (@x.setter) |
| has_deleter | false | Has deleter (@x.deleter) |

5. Constants (type: "const")
MAX_CONNECTIONS: int = 100
API_BASE_URL = "https://api.example.com"

Extracted information:

  • Detected by UPPER_CASE convention
  • Type and value are extracted
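
A possible way to apply the UPPER_CASE convention and pull out the type and value is sketched below in Go. The names `assignment`, `assignRe`, `upperRe`, and `parseAssignment` are hypothetical, and the sketch only handles single-line assignments that start in column 0 (that is, at module level).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// assignRe matches "NAME = value" or "NAME: Type = value" starting in column 0.
	assignRe = regexp.MustCompile(`^([A-Za-z_]\w*)\s*(?::\s*([^=]+?))?\s*=\s*(.+)$`)
	// upperRe encodes the UPPER_CASE naming convention.
	upperRe = regexp.MustCompile(`^[A-Z][A-Z0-9_]*$`)
)

// assignment is a hypothetical stand-in for ConstantInfo/VariableInfo.
type assignment struct {
	Name, Type, Value string
	IsConstant        bool
}

// parseAssignment extracts name, optional annotation, and value from one line.
func parseAssignment(line string) (assignment, bool) {
	m := assignRe.FindStringSubmatch(line)
	if m == nil || strings.HasPrefix(m[3], "=") { // reject "x == y" comparisons
		return assignment{}, false
	}
	return assignment{
		Name:       m[1],
		Type:       strings.TrimSpace(m[2]),
		Value:      strings.TrimSpace(m[3]),
		IsConstant: upperRe.MatchString(m[1]),
	}, true
}

func main() {
	for _, line := range []string{
		"MAX_CONNECTIONS: int = 100",
		`API_BASE_URL = "https://api.example.com"`,
		"logger = logging.getLogger(__name__)",
	} {
		a, ok := parseAssignment(line)
		fmt.Println(ok, a.Name, a.Type, a.Value, a.IsConstant)
	}
}
```

The same function covers both chunk types in this sketch: a match with `IsConstant` set would map to `const`, any other match to `var`.
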

6. Variables (type: "var")
logger = logging.getLogger(__name__)
default_config: Config = Config()

🔗 Relationship Detection

Dependency Graph

The analyzer builds a dependency graph between classes:

class OrderService:
    repository: OrderRepository  # → dependency
    
    def create_order(self, user: User) -> Order:  # → dependencies: User, Order
        notification = NotificationService()  # → dependency (from calls)
        return Order(...)

Detected dependencies:

  • OrderRepository - from type hint on variable
  • User - from parameter
  • Order - from return type
  • NotificationService - from method calls
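
Since type hints are kept as strings, dependencies from annotations can be approximated by picking class-like identifiers out of each hint. The Go sketch below shows the idea; `typeDeps`, `identRe`, and the `ignored` list of typing helpers are assumptions made for illustration, not the analyzer's actual rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// identRe finds every identifier inside a type hint such as "Optional[List[User]]".
var identRe = regexp.MustCompile(`[A-Za-z_]\w*`)

// ignored lists capitalized typing helpers that are not project classes (assumed list).
var ignored = map[string]bool{
	"Optional": true, "List": true, "Dict": true, "Set": true, "Tuple": true,
	"Union": true, "Any": true, "None": true, "Callable": true,
}

// typeDeps returns the unique capitalized identifiers found in the given hints,
// in first-seen order.
func typeDeps(hints ...string) []string {
	seen := map[string]bool{}
	var deps []string
	for _, hint := range hints {
		for _, id := range identRe.FindAllString(hint, -1) {
			isUpper := id[0] >= 'A' && id[0] <= 'Z'
			if !isUpper || ignored[id] || seen[id] {
				continue
			}
			seen[id] = true
			deps = append(deps, id)
		}
	}
	return deps
}

func main() {
	// Hints taken from the OrderService example: attribute, parameter, return type.
	fmt.Println(typeDeps("OrderRepository", "User", "Order"))
	fmt.Println(typeDeps("Optional[List[User]]", "int", "Dict[str, Order]"))
}
```

Filtering on an initial capital letter is a heuristic: it skips builtins such as `int` and `str`, but it would also skip a lowercase class name.
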

Method Call Analysis
def process(self, data):
    self.validate(data)           # → self.validate
    result = Helper.compute(data) # → Helper.compute (static call)
    super().process(data)         # → super().process
    save_to_db(result)            # → save_to_db (function call)

Detected calls:

{
  "calls": [
    {"name": "validate", "receiver": "self", "line": 2},
    {"name": "compute", "receiver": "Helper", "class_name": "Helper", "line": 3},
    {"name": "process", "receiver": "super()", "line": 4},
    {"name": "save_to_db", "line": 5}
  ]
}
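
The call list above can be reproduced with a single regular expression applied line by line. The Go sketch below is one hedged way to do it: `call`, `callRe`, and `extractCalls` are hypothetical names, and setting `ClassName` for capitalized single-name receivers is an assumption inferred from the `Helper.compute` entry.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// call mirrors the fields of the JSON objects shown above (hypothetical type).
type call struct {
	Name, Receiver, ClassName string
	Line                      int
}

// callRe matches an optional receiver chain, a name, and an opening parenthesis.
// The chain may start with "super()" or with a plain identifier.
var callRe = regexp.MustCompile(`((?:super\(\)|[A-Za-z_]\w*)(?:\.[A-Za-z_]\w*)*)\s*\(`)

// extractCalls scans a method body and returns calls with 1-based line numbers.
func extractCalls(src string) []call {
	var calls []call
	for i, line := range strings.Split(src, "\n") {
		if j := strings.Index(line, "#"); j >= 0 {
			line = line[:j] // ignore trailing comments
		}
		if strings.HasPrefix(strings.TrimSpace(line), "def ") {
			continue // the definition header is not a call
		}
		for _, m := range callRe.FindAllStringSubmatch(line, -1) {
			c := call{Name: m[1], Line: i + 1}
			if k := strings.LastIndex(m[1], "."); k >= 0 {
				c.Receiver, c.Name = m[1][:k], m[1][k+1:]
				r := c.Receiver
				if r[0] >= 'A' && r[0] <= 'Z' && !strings.Contains(r, ".") {
					c.ClassName = r // capitalized receiver: treated as a class
				}
			}
			calls = append(calls, c)
		}
	}
	return calls
}

func main() {
	src := "def process(self, data):\n" +
		"    self.validate(data)\n" +
		"    result = Helper.compute(data)\n" +
		"    super().process(data)\n" +
		"    save_to_db(result)"
	for _, c := range extractCalls(src) {
		fmt.Printf("%+v\n", c)
	}
}
```

This pattern also matches constructs that merely look like calls, such as `if (x)`, so a fuller version would filter out Python keywords.
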

๐Ÿ—๏ธ File Structure

python/
โ”œโ”€โ”€ types.go           # Types: ModuleInfo, ClassInfo, MethodInfo, MethodCall, etc.
โ”œโ”€โ”€ analyzer.go        # PathAnalyzer implementation (1500+ lines)
โ”œโ”€โ”€ api_analyzer.go    # Legacy APIAnalyzer (build-tagged out)
โ”œโ”€โ”€ analyzer_test.go   # 26 comprehensive tests
โ””โ”€โ”€ README.md          # This documentation

๐Ÿ’ป Usage

Standard Analysis
import (
	"fmt"
	"log"

	// internal/ packages are importable only from within the rag-code-mcp module
	"github.com/doITmagic/rag-code-mcp/internal/ragcode/analyzers/python"
)

// Create analyzer (excludes test files by default)
analyzer := python.NewCodeAnalyzer()

// Analyze directories/files
chunks, err := analyzer.AnalyzePaths([]string{"./myproject"})
if err != nil {
	log.Fatalf("analyzing paths: %v", err)
}

for _, chunk := range chunks {
    fmt.Printf("[%s] %s.%s\n", chunk.Type, chunk.Package, chunk.Name)
    fmt.Printf("  Dependencies: %v\n", chunk.Metadata["dependencies"])
}

With Options
// Include test files
analyzer := python.NewCodeAnalyzerWithOptions(true)

🔌 Integration

Language Manager

The Python analyzer is automatically selected for:

  • python, py - generic Python projects
  • django - Django projects
  • flask - Flask projects
  • fastapi - FastAPI projects

Workspace Detection

Python projects are detected by:

| File | Description |
|------|-------------|
| pyproject.toml | PEP 518 - modern Python |
| setup.py | Setuptools legacy |
| requirements.txt | pip dependencies |
| Pipfile | Pipenv |

📋 CodeChunk Types

| Type | Description | Example |
|------|-------------|---------|
| class | Class definition | class User(BaseModel): |
| method | Class method | def get_user(self): |
| function | Module-level function | def helper(): |
| property | @property | @property def name(self): |
| const | UPPER_CASE constant | MAX_SIZE = 100 |
| var | Module-level variable | logger = getLogger() |

๐Ÿท๏ธ Complete Metadata

Class Metadata
{
  "bases": ["BaseModel", "Mixin"],
  "decorators": ["dataclass"],
  "is_abstract": false,
  "is_dataclass": true,
  "is_enum": false,
  "is_protocol": false,
  "is_mixin": false,
  "metaclass": "",
  "dependencies": ["BaseModel", "Mixin", "User", "Order"]
}

Method Metadata
{
  "class_name": "UserService",
  "is_static": false,
  "is_classmethod": false,
  "is_async": true,
  "is_abstract": false,
  "decorators": ["cache"],
  "calls": [
    {"name": "validate", "receiver": "self", "line": 10},
    {"name": "save", "receiver": "self.repository", "line": 12}
  ],
  "type_deps": ["User", "Order"]
}

Function Metadata
{
  "is_async": true,
  "is_generator": false,
  "decorators": ["lru_cache"]
}

🧪 Testing

# Run all tests (26 tests)
go test ./internal/ragcode/analyzers/python/

# With verbose output
go test -v ./internal/ragcode/analyzers/python/

# Specific test
go test -v -run TestMethodCallExtraction ./internal/ragcode/analyzers/python/

# With coverage
go test -cover ./internal/ragcode/analyzers/python/

🚫 Excluded Paths

The analyzer automatically skips:

  • __pycache__/ - Python cache
  • .venv/, venv/, env/ - virtual environments
  • .git/ - Git
  • .tox/, .pytest_cache/, .mypy_cache/ - caches
  • dist/, build/ - distributions
  • test_*.py, *_test.py - test files (by default)
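
The exclusion rules above amount to a directory-name check plus two filename globs. The Go sketch below shows one way to express them; `skipDirs`, `testPatterns`, and `shouldSkip` are hypothetical names, and the `includeTests` parameter is modelled on the argument of `NewCodeAnalyzerWithOptions`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// skipDirs holds the directory names from the list above.
var skipDirs = map[string]bool{
	"__pycache__": true, ".venv": true, "venv": true, "env": true, ".git": true,
	".tox": true, ".pytest_cache": true, ".mypy_cache": true, "dist": true, "build": true,
}

// testPatterns holds the test-file globs from the list above.
var testPatterns = []string{"test_*.py", "*_test.py"}

// shouldSkip reports whether a file path should be excluded from analysis.
func shouldSkip(path string, includeTests bool) bool {
	parts := strings.Split(filepath.ToSlash(path), "/")
	// Any excluded directory on the way to the file excludes the file.
	for _, dir := range parts[:len(parts)-1] {
		if skipDirs[dir] {
			return true
		}
	}
	if includeTests {
		return false
	}
	base := parts[len(parts)-1]
	for _, pattern := range testPatterns {
		if ok, _ := filepath.Match(pattern, base); ok {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"src/app/user.py", "src/app/test_user.py", "src/__pycache__/user.py"} {
		fmt.Println(p, shouldSkip(p, false))
	}
}
```

When walking a tree with `filepath.WalkDir`, returning `filepath.SkipDir` for an excluded directory would avoid descending into it at all, which is cheaper than checking every file path.
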

โš ๏ธ Limitations

Limitation Description
Regex-based Doesn't use full Python AST - may miss edge cases
No Type Resolution Type hints are extracted as strings, not resolved
Single-file Each file is analyzed independently
No Runtime Info Doesn't execute code, only static analysis

๐Ÿ”ฎ Future Improvements

  • Django: models, views, URLs, forms
  • Flask/FastAPI: route detection, dependency injection
  • Type resolution: cross-file type hint resolution
  • Import graph: complete import graph
  • Nested classes: classes defined inside other classes
  • Comprehensions: list/dict/set comprehensions

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ClassInfo

type ClassInfo struct {
	Name         string         `json:"name"`
	Description  string         `json:"description"` // Class docstring
	Bases        []string       `json:"bases,omitempty"`
	Decorators   []string       `json:"decorators,omitempty"`
	Methods      []MethodInfo   `json:"methods"`
	Properties   []PropertyInfo `json:"properties"`
	ClassVars    []VariableInfo `json:"class_vars,omitempty"`
	IsAbstract   bool           `json:"is_abstract"`
	IsDataclass  bool           `json:"is_dataclass"`
	IsEnum       bool           `json:"is_enum"`                // Inherits from Enum
	IsProtocol   bool           `json:"is_protocol"`            // Inherits from Protocol (typing)
	IsMixin      bool           `json:"is_mixin"`               // Class name ends with Mixin or used as mixin
	Metaclass    string         `json:"metaclass,omitempty"`    // metaclass= argument
	Dependencies []string       `json:"dependencies,omitempty"` // Classes this class depends on (via type hints, imports)
	FilePath     string         `json:"file_path,omitempty"`
	StartLine    int            `json:"start_line,omitempty"`
	EndLine      int            `json:"end_line,omitempty"`
	Code         string         `json:"code,omitempty"`
}

ClassInfo describes a Python class

type CodeAnalyzer

type CodeAnalyzer struct {
	// contains filtered or unexported fields
}

CodeAnalyzer implements PathAnalyzer for Python

func NewCodeAnalyzer

func NewCodeAnalyzer() *CodeAnalyzer

NewCodeAnalyzer creates a new Python code analyzer

func NewCodeAnalyzerWithOptions

func NewCodeAnalyzerWithOptions(includeTests bool) *CodeAnalyzer

NewCodeAnalyzerWithOptions creates a Python code analyzer with options

func (*CodeAnalyzer) AnalyzeFile

func (ca *CodeAnalyzer) AnalyzeFile(filePath string) ([]codetypes.CodeChunk, error)

AnalyzeFile analyzes a single Python file

func (*CodeAnalyzer) AnalyzePaths

func (ca *CodeAnalyzer) AnalyzePaths(paths []string) ([]codetypes.CodeChunk, error)

AnalyzePaths implements the PathAnalyzer interface

func (*CodeAnalyzer) GetModules

func (ca *CodeAnalyzer) GetModules() []*ModuleInfo

GetModules returns the internal module information

type ConstantInfo

type ConstantInfo struct {
	Name        string `json:"name"`
	Type        string `json:"type,omitempty"`
	Value       string `json:"value"`
	Description string `json:"description"`
	FilePath    string `json:"file_path,omitempty"`
	StartLine   int    `json:"start_line,omitempty"`
	EndLine     int    `json:"end_line,omitempty"`
}

ConstantInfo describes a module-level constant (UPPER_CASE)

type DependencyInfo

type DependencyInfo struct {
	Source     string   `json:"source"`     // Source class/module
	Target     string   `json:"target"`     // Target class/module
	Type       string   `json:"type"`       // "inheritance", "composition", "import", "type_hint"
	References []string `json:"references"` // Specific references (method names, etc.)
}

DependencyInfo represents a dependency relationship between classes/modules

type DocstringArg

type DocstringArg struct {
	Name        string `json:"name"`
	Type        string `json:"type,omitempty"`
	Description string `json:"description"`
	Default     string `json:"default,omitempty"`
	Optional    bool   `json:"optional,omitempty"`
}

DocstringArg represents a parameter/attribute in docstring

type DocstringInfo

type DocstringInfo struct {
	Summary     string           `json:"summary"`
	Description string           `json:"description"`
	Args        []DocstringArg   `json:"args,omitempty"`
	Returns     *DocstringReturn `json:"returns,omitempty"`
	Raises      []DocstringRaise `json:"raises,omitempty"`
	Examples    []string         `json:"examples,omitempty"`
	Attributes  []DocstringArg   `json:"attributes,omitempty"`
}

DocstringInfo contains parsed docstring information

type DocstringRaise

type DocstringRaise struct {
	Type        string `json:"type"`
	Description string `json:"description"`
}

DocstringRaise represents an exception that can be raised

type DocstringReturn

type DocstringReturn struct {
	Type        string `json:"type,omitempty"`
	Description string `json:"description"`
}

DocstringReturn represents return value documentation

type FunctionInfo

type FunctionInfo struct {
	Name        string                 `json:"name"`
	Signature   string                 `json:"signature"`
	Description string                 `json:"description"` // Function docstring
	Parameters  []codetypes.ParamInfo  `json:"parameters"`
	ReturnType  string                 `json:"return_type,omitempty"`
	Returns     []codetypes.ReturnInfo `json:"returns,omitempty"`
	Decorators  []string               `json:"decorators,omitempty"`
	IsAsync     bool                   `json:"is_async"`
	IsGenerator bool                   `json:"is_generator"`
	FilePath    string                 `json:"file_path,omitempty"`
	StartLine   int                    `json:"start_line,omitempty"`
	EndLine     int                    `json:"end_line,omitempty"`
	Code        string                 `json:"code,omitempty"`
}

FunctionInfo describes a module-level function

type ImportInfo

type ImportInfo struct {
	Module    string   `json:"module"`          // Module being imported
	Names     []string `json:"names,omitempty"` // Specific names imported (from X import a, b)
	Alias     string   `json:"alias,omitempty"` // Import alias (import X as Y)
	IsFrom    bool     `json:"is_from"`         // True if "from X import Y"
	StartLine int      `json:"start_line,omitempty"`
}

ImportInfo describes an import statement
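
The field comments on ImportInfo suggest how the two import forms map onto the struct. As a hedged illustration, the Go sketch below fills a local copy of the struct from a single source line. `parseImport`, `fromRe`, and `importRe` are hypothetical, and multi-module statements such as `import a, b` are deliberately not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// importInfo repeats the documented ImportInfo fields (JSON tags omitted).
type importInfo struct {
	Module    string
	Names     []string
	Alias     string
	IsFrom    bool
	StartLine int
}

var (
	// fromRe matches "from X import a, b".
	fromRe = regexp.MustCompile(`^from\s+([\w.]+)\s+import\s+(.+)$`)
	// importRe matches "import X" and "import X as Y".
	importRe = regexp.MustCompile(`^import\s+([\w.]+)(?:\s+as\s+(\w+))?\s*$`)
)

// parseImport converts one source line into an importInfo, if it is an import.
func parseImport(line string, lineNo int) (importInfo, bool) {
	line = strings.TrimSpace(line)
	if m := fromRe.FindStringSubmatch(line); m != nil {
		info := importInfo{Module: m[1], IsFrom: true, StartLine: lineNo}
		// Strip optional parentheses, then split the imported names.
		for _, name := range strings.Split(strings.Trim(m[2], "() "), ",") {
			if name = strings.TrimSpace(name); name != "" {
				info.Names = append(info.Names, name)
			}
		}
		return info, true
	}
	if m := importRe.FindStringSubmatch(line); m != nil {
		return importInfo{Module: m[1], Alias: m[2], StartLine: lineNo}, true
	}
	return importInfo{}, false
}

func main() {
	for i, line := range []string{"from typing import List, Optional", "import numpy as np", "x = 1"} {
		info, ok := parseImport(line, i+1)
		fmt.Printf("%v %+v\n", ok, info)
	}
}
```
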

type MethodCall

type MethodCall struct {
	Name      string `json:"name"`                 // Method/function name
	Receiver  string `json:"receiver,omitempty"`   // Object the method is called on (e.g., "self", "cls", variable name)
	ClassName string `json:"class_name,omitempty"` // Class name if known
	Line      int    `json:"line,omitempty"`       // Line number of the call
}

MethodCall represents a call to another method/function
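
The `omitempty` tags explain why a plain function call appears in the metadata with only `name` and `line`. The sketch below copies the struct definition shown above into a standalone program and marshals two calls; `encodeCalls` is a hypothetical helper added for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MethodCall is copied from the type definition above.
type MethodCall struct {
	Name      string `json:"name"`
	Receiver  string `json:"receiver,omitempty"`
	ClassName string `json:"class_name,omitempty"`
	Line      int    `json:"line,omitempty"`
}

// encodeCalls marshals a slice of calls to compact JSON.
func encodeCalls(calls []MethodCall) (string, error) {
	out, err := json.Marshal(calls)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := encodeCalls([]MethodCall{
		{Name: "validate", Receiver: "self", Line: 2},
		{Name: "save_to_db", Line: 5}, // no receiver: the key is omitted
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// prints [{"name":"validate","receiver":"self","line":2},{"name":"save_to_db","line":5}]
}
```

Because `line` also carries `omitempty`, a call recorded with line 0 would lose its `line` key entirely.
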

type MethodInfo

type MethodInfo struct {
	Name          string                 `json:"name"`
	Signature     string                 `json:"signature"`
	Description   string                 `json:"description"` // Method docstring
	Parameters    []codetypes.ParamInfo  `json:"parameters"`
	ReturnType    string                 `json:"return_type,omitempty"`
	Returns       []codetypes.ReturnInfo `json:"returns,omitempty"`
	Decorators    []string               `json:"decorators,omitempty"`
	Calls         []MethodCall           `json:"calls,omitempty"`     // Methods/functions this method calls
	TypeDeps      []string               `json:"type_deps,omitempty"` // Types used in parameters/return
	IsStatic      bool                   `json:"is_static"`
	IsClassMethod bool                   `json:"is_classmethod"`
	IsProperty    bool                   `json:"is_property"`
	IsAbstract    bool                   `json:"is_abstract"`
	IsAsync       bool                   `json:"is_async"`
	ClassName     string                 `json:"class_name,omitempty"`
	FilePath      string                 `json:"file_path,omitempty"`
	StartLine     int                    `json:"start_line,omitempty"`
	EndLine       int                    `json:"end_line,omitempty"`
	Code          string                 `json:"code,omitempty"`
}

MethodInfo describes a class method

type ModuleDependencies

type ModuleDependencies struct {
	ModuleName   string           `json:"module_name"`
	Imports      []ImportInfo     `json:"imports"`
	Dependencies []DependencyInfo `json:"dependencies"`
}

ModuleDependencies contains all dependency information for a module

type ModuleInfo

type ModuleInfo struct {
	Name        string         `json:"name"`        // Module name (e.g., "mypackage.mymodule")
	Path        string         `json:"path"`        // File path
	Description string         `json:"description"` // Module docstring
	Classes     []ClassInfo    `json:"classes"`
	Functions   []FunctionInfo `json:"functions"`
	Constants   []ConstantInfo `json:"constants"`
	Variables   []VariableInfo `json:"variables"`
	Imports     []ImportInfo   `json:"imports"`
}

ModuleInfo contains comprehensive information about a Python module/package

type PropertyInfo

type PropertyInfo struct {
	Name        string `json:"name"`
	Type        string `json:"type,omitempty"` // Type hint if available
	Description string `json:"description"`
	HasGetter   bool   `json:"has_getter"`
	HasSetter   bool   `json:"has_setter"`
	HasDeleter  bool   `json:"has_deleter"`
	FilePath    string `json:"file_path,omitempty"`
	StartLine   int    `json:"start_line,omitempty"`
	EndLine     int    `json:"end_line,omitempty"`
}

PropertyInfo describes a class property (using @property decorator)

type VariableInfo

type VariableInfo struct {
	Name        string `json:"name"`
	Type        string `json:"type,omitempty"` // Type annotation if available
	Value       string `json:"value,omitempty"`
	Description string `json:"description"`
	IsConstant  bool   `json:"is_constant"` // UPPER_CASE naming convention
	FilePath    string `json:"file_path,omitempty"`
	StartLine   int    `json:"start_line,omitempty"`
	EndLine     int    `json:"end_line,omitempty"`
}

VariableInfo describes a module-level or class variable
