DSPy-Go Compatibility Testing Framework
This framework provides side-by-side comparison between the Python DSPy package and the Go dspy-go implementation to verify backwards compatibility from an optimizer's perspective.
Latest Results Dashboard
Test Status: ✅ COMPATIBLE
Last Updated: 2025-07-13 | Dataset Size: 20 | Model: gemini-2.0-flash
Results With Cache (Python DSPy Default)
| Optimizer | Python Score | Go Score | Score Diff | Time (Python) | Time (Go) | Demos (Python) | Demos (Go) | Status |
|---|---|---|---|---|---|---|---|---|
| BootstrapFewShot | 0.60 | 0.60 | 0.00 | 0.04s* | 2.15s | 4 | 4 | ✅ Compatible |
| MIPRO | 0.60 | 0.60 | 0.00 | 0.18s* | 16.19s | 0 | 0 | ✅ Compatible |
| SIMBA | 0.60 | 0.60 | 0.00 | 0.13s* | 38.56s | 0 | 0 | ✅ Compatible |
| COPRO | 0.80 | 0.60 | 0.20 | 0.15s* | 75.04s | 0 | 0 | ⚠️ At Threshold |
*Python times with caching enabled
Results Without Cache (Fair Comparison)
| Optimizer | Python Score | Go Score | Score Diff | Time (Python) | Time (Go) | Time Ratio | Status |
|---|---|---|---|---|---|---|---|
| BootstrapFewShot | 0.60 | 0.60 | 0.00 | 2.87s | 2.15s | Go 1.3x faster | ✅ Compatible |
| MIPRO | 0.60 | 0.60 | 0.00 | 33.74s | 16.25s | Go 2.0x faster | ✅ Compatible |
| SIMBA | 0.60 | 0.60 | 0.00 | 23.30s | 7.61s | Go 3.0x faster | ✅ Compatible |
| COPRO | 0.60 | 0.60 | 0.00 | 22.72s | 8.07s | Go 2.8x faster | ✅ Compatible |
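The Time Ratio column can be recomputed from the two timing columns. The sketch below is not taken from compare_results.py: speedup_label is a hypothetical helper, and truncating to one decimal place is an assumption, chosen because it reproduces the four ratios in the table.

```python
import math


def speedup_label(python_seconds: float, go_seconds: float) -> str:
    """Describe which side was faster and by what factor."""
    if python_seconds <= 0 or go_seconds <= 0:
        raise ValueError("timings must be positive")
    if go_seconds <= python_seconds:
        winner, ratio = "Go", python_seconds / go_seconds
    else:
        winner, ratio = "Python", go_seconds / python_seconds
    # Assumption: truncate (not round) to one decimal place.
    truncated = math.floor(ratio * 10) / 10
    return f"{winner} {truncated:.1f}x faster"


# Uncached timings from the table above: (Python seconds, Go seconds).
uncached = {
    "BootstrapFewShot": (2.87, 2.15),
    "MIPRO": (33.74, 16.25),
    "SIMBA": (23.30, 7.61),
    "COPRO": (22.72, 8.07),
}
for name, (py_s, go_s) in uncached.items():
    print(name, speedup_label(py_s, go_s))
```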
🧬 GEPA (Generative Evolutionary Prompt Adaptation) Results
Current GEPA Parity Harness
The historical dashboard below is retained for reference, but current GEPA parity work should use the deterministic fixture harness instead of the older live-Gemini benchmark path.
Run the fixture harness:
./run_gepa_fixture.sh
This harness:
- uses upstream dspy with DummyLM, not a local DSPy fork
- does not require GEMINI_API_KEY
- compares stable GEPA behaviors instead of benchmark scores
- currently covers component-selection parity (round_robin vs all)
- also covers validation-frontier parity for complementary classifier/generator winners
- also covers ancestor-merge parity for deterministic two-parent merge proposals
- also covers feedback-guided rewrite parity for example-level metric feedback
- also covers format-failure-as-feedback parity for parse-error-driven rewrites
- also covers minibatch accept/reject parity for deterministic proposal gating
- also covers early-stop parity for custom stoppers and metric-budget cutoffs
- also covers checkpoint-resume parity for resumed-vs-fresh deterministic winners
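To make the component-selection item concrete, here is a minimal sketch of what the two strategy names suggest. It is an illustration only, not the selection code of either implementation: select_components and the component names are hypothetical, and the assumption is that round_robin updates one component per iteration while all updates every component.

```python
def select_components(components, iteration, strategy="round_robin"):
    """Pick which program components to rewrite in a given iteration."""
    if not components:
        return []
    if strategy == "all":
        # Assumption: every component is proposed a rewrite each iteration.
        return list(components)
    if strategy == "round_robin":
        # Assumption: cycle through the components, one per iteration.
        return [components[iteration % len(components)]]
    raise ValueError(f"unknown strategy: {strategy!r}")


components = ["classifier", "generator"]
for step in range(4):
    print(step, select_components(components, step, "round_robin"))
print(select_components(components, 0, "all"))
```

A deterministic harness can check parity on this kind of behavior by asserting that both sides pick the same component at each step, with no model scores involved.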
Test Status: ✅ COMPATIBLE
Last Updated: 2025-08-12 | Dataset Size: 10 | Model: gemini-2.0-flash
GEPA Compatibility Results (Cache Disabled)
| Implementation | Score | Compilation Time | Status | Notes |
|---|---|---|---|---|
| Python DSPy (Local Fork) | 66.7% | 99.24s | ✅ Working | Historical run against a local DSPy fork |
| Go dspy-go | 66.7% | 82.35s | ✅ Working | Advanced evolutionary algorithm with multi-objective optimization |
GEPA Key Features
- Multi-Objective Optimization: 7-dimensional fitness evaluation with Pareto-based selection
- LLM-Based Evolution: Natural language critique and semantic crossover/mutation
- Adaptive Selection: Dynamic strategy switching between generations
- Elite Archive: Preserves diverse high-quality solutions
- Real-Time Monitoring: Context-aware performance tracking
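Pareto-based selection keeps every candidate that no other candidate beats on all objectives at once. The sketch below shows the general idea with two objectives for brevity; it is not dspy-go's implementation, and the candidate names and fitness values are made up. The same functions work for fitness tuples of any length, including seven.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate."""
    return [
        name
        for name, fitness in candidates.items()
        if not any(
            dominates(other, fitness)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidates scored on two objectives (higher is better).
candidates = {
    "prompt_a": (0.8, 0.5),
    "prompt_b": (0.6, 0.9),
    "prompt_c": (0.5, 0.4),
}
print(pareto_front(candidates))  # ['prompt_a', 'prompt_b']
```

Here prompt_c is dropped because prompt_a is better on both objectives, while prompt_a and prompt_b both survive because each wins on one objective.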
GEPA Configuration
| Parameter | Python | Go | Notes |
|---|---|---|---|
| Mode/Strategy | auto="light" | adaptive_pareto | Different APIs, equivalent functionality |
| Population Size | Auto-configured | 8 | Go allows explicit control |
| Max Generations | Auto-configured | 3 | Go allows explicit control |
| Mutation Rate | Auto-configured | 0.3 | Go exposes evolutionary parameters |
| Crossover Rate | Auto-configured | 0.7 | Go exposes evolutionary parameters |
GEPA Compatibility Summary
- Score Match: ✅ Perfect (Both achieve 66.7% accuracy)
- Performance: ✅ Comparable (Go compiles in about 17% less time, 82.35s vs 99.24s)
- Algorithm: ✅ Consistent (Same evolutionary approach)
- API Design: ⚠️ Different (Python simplified, Go full-featured)
GEPA Setup Requirements
For Python GEPA:
- Deterministic parity fixtures use upstream dspy and install it via uv
- No local DSPy fork is required for the new GEPA fixture harness
For Go GEPA:
- Fully integrated in dspy-go package
- No additional setup required
Configuration Summary
All optimizers now use matched configurations between Python and Go implementations:
| Optimizer | Configuration |
|---|---|
| BootstrapFewShot | max_bootstrapped_demos=4, 3/4 dataset split |
| MIPRO | num_trials=5, max_bootstrapped_demos=3, 3/4 dataset split |
| SIMBA | batch_size=4, max_steps=6, num_candidates=4, sampling_temperature=0.2, 3/4 dataset split |
| COPRO | breadth=5, depth=2, init_temperature=1.2, 3/4 dataset split |
| GEPA | Python: auto="light", Go: population_size=8, max_generations=3, 3/4 dataset split |
Compatibility Summary
- Overall Status: ✅ COMPATIBLE
- Score Differences: ✅ ACCEPTABLE (all ≤0.20, within the ≤0.2 threshold)
- API Signatures: ✅ MATCH
- Behavior: ✅ CONSISTENT
- Configuration Alignment: ✅ MATCHED
Key Findings
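- Python and Go scores are identical (0.00 difference) for every optimizer in the uncached runs; the only gap is COPRO in the cached run, where Python scores 0.20 higher (0.80 vs 0.60)
- All score differences are within the acceptable compatibility threshold (≤0.2)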
- Performance varies by optimizer but both implementations are functionally equivalent
- Configuration alignment resolved previous discrepancies
Overview
The compatibility testing framework consists of:
- Python DSPy Reference Implementation (dspy_comparison.py)
- Go dspy-go Implementation (go_comparison.go)
- Results Comparison Tool (compare_results.py)
- Automated Experiment Runner (run_experiment.sh)
Key Features
Optimizer Testing
- BootstrapFewShot: Tests few-shot learning with bootstrapped demonstrations
- MIPRO/MIPROv2: Tests multi-stage instruction prompt optimization with Bayesian optimization
- SIMBA: Tests stochastic introspective mini-batch ascent with temperature-controlled sampling
- COPRO: Tests collaborative prompt optimization with multi-agent refinement
- GEPA: Tests generative evolutionary prompt adaptation with LLM-based genetic operators
Compatibility Verification
- API signature compatibility
- Parameter compatibility
- Behavioral consistency
- Performance comparison
- Results accuracy comparison
Prerequisites
Python Environment
- Python 3.8+
- uv package manager
- Gemini API key
Go Environment
- Go 1.19+
- dspy-go dependencies
Required Environment Variables
export GEMINI_API_KEY=your_api_key_here
The deterministic GEPA fixture harness does not require any environment variables.
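A small preflight check can catch a missing key before any model calls are made. This is a sketch, not part of the shipped scripts: missing_env_vars is a hypothetical helper, and the only variable name used is GEMINI_API_KEY from above.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# The live benchmark path needs the Gemini key; the fixture harness needs nothing.
missing = missing_env_vars(["GEMINI_API_KEY"])
if missing:
    print("Set these variables before running the live benchmark:", ", ".join(missing))
```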
Installation
- Install uv (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh
- Clone the repository and navigate to the compatibility test directory:
cd compatibility_test
- Ensure Go dependencies are available:
cd ..
go mod tidy
cd compatibility_test
That's it! The Python scripts use uv's inline script dependencies, so no separate virtual environment or requirements.txt is needed.
Usage
Quick Start
Run the complete compatibility experiment:
./run_experiment.sh
Test Specific Optimizers
Test only SIMBA optimizer:
./run_experiment.sh --optimizer simba
Test with custom dataset size:
./run_experiment.sh --optimizer bootstrap --dataset-size 50
Available optimizer options:
- bootstrap: BootstrapFewShot only
- mipro: MIPRO/MIPROv2 only
- simba: SIMBA only
- copro: COPRO only
- gepa: GEPA only (legacy benchmark path; deterministic parity uses ./run_gepa_fixture.sh)
- all: All optimizers (default)
Manual Execution
1. Run Python DSPy Comparison
# Test all optimizers (uv resolves the script's inline dependencies)
uv run dspy_comparison.py
# Test specific optimizer
uv run dspy_comparison.py --optimizer bootstrap --dataset-size 30
2. Run Go dspy-go Comparison
go build -o go_comparison go_comparison.go
# Test all optimizers
./go_comparison
# Test specific optimizer
./go_comparison --optimizer simba --dataset-size 30
3. Compare Results
uv run compare_results.py
Test Structure
Dataset
- Simple Q&A pairs (20 examples)
- Split: 15 training, 5 validation
- Questions cover basic facts and calculations
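The 15/5 split corresponds to the 3/4 dataset split named in the configuration table. A minimal sketch, assuming an in-order split with the training size rounded down (the actual scripts may shuffle or round differently); split_dataset and the placeholder examples are hypothetical:

```python
def split_dataset(examples, train_fraction=0.75):
    """Split examples in order into (train, validation) lists."""
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]


# Placeholder Q&A pairs standing in for the real dataset.
dataset = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]
train, validation = split_dataset(dataset)
print(len(train), len(validation))  # 15 5
```

Under the same assumption, a dataset of 10 examples splits into 7 training and 3 validation examples.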
Metrics
- Accuracy: Simple substring matching
- Compilation Time: Time to optimize the program
- Demonstrations: Number of generated examples
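The accuracy metric can be sketched as follows. This is an illustration rather than the code in dspy_comparison.py or go_comparison.go: the function names are hypothetical, and the case-insensitive comparison and whitespace trimming are assumptions.

```python
def substring_match(expected: str, predicted: str) -> bool:
    """Return True when the expected answer appears inside the prediction."""
    # Assumption: comparison ignores case and surrounding whitespace.
    return expected.strip().lower() in predicted.strip().lower()


def accuracy(pairs):
    """Fraction of (expected, predicted) pairs that match; 0.0 for no pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(substring_match(expected, predicted) for expected, predicted in pairs)
    return hits / len(pairs)


# Made-up predictions for illustration.
examples = [
    ("Paris", "The capital of France is Paris."),
    ("4", "2 + 2 = 4"),
    ("blue", "The sky is green."),
]
print(accuracy(examples))
```

If scores are computed over the 5 validation examples, each example is worth 0.20, so a 0.20 gap between two runs corresponds to a single answer.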
Optimizers Tested
BootstrapFewShot
- Python: dspy.teleprompt.BootstrapFewShot
- Go: optimizers.BootstrapFewShot
- Parameters:
  - max_bootstrapped_demos: 4
  - max_labeled_demos: 4
MIPRO/MIPROv2
- Python: dspy.teleprompt.MIPROv2
- Go: optimizers.MIPRO
- Parameters:
  - num_trials: 5
  - max_bootstrapped_demos: 3
  - max_labeled_demos: 3
SIMBA
- Python: dspy.teleprompt.SIMBA
- Go: optimizers.SIMBA
- Parameters:
  - batch_size: 4
  - max_steps: 6
  - num_candidates: 4
  - sampling_temperature: 0.2
COPRO
- Python: dspy.teleprompt.COPRO
- Go: optimizers.COPRO
- Parameters:
  - breadth: 5
  - depth: 2
  - init_temperature: 1.2
GEPA
- Python: dspy.teleprompt.GEPA (requires local fork)
- Go: optimizers.GEPA
- Parameters:
  - Python: auto="light"
  - Go: population_size=8, max_generations=3, adaptive_pareto selection
Output Files
dspy_comparison_results.json
Results from Python DSPy implementation:
{
  "dataset_size": 20,
  "model": "gpt-3.5-turbo",
  "bootstrap_fewshot": {
    "optimizer": "BootstrapFewShot",
    "average_score": 0.85,
    "compilation_time": 12.34,
    "demonstrations": [...]
  },
  "mipro_v2": {
    "optimizer": "MIPROv2",
    "average_score": 0.92,
    "compilation_time": 25.67,
    "demonstrations": [...]
  },
  "simba": {
    "optimizer": "SIMBA",
    "average_score": 0.88,
    "compilation_time": 18.45,
    "demonstrations": [...]
  },
  "gepa": {
    "optimizer": "GEPA",
    "average_score": 0.67,
    "compilation_time": 99.24,
    "population_size": 8,
    "max_generations": 3,
    "demonstrations": [...]
  }
}
go_comparison_results.json
Results from Go dspy-go implementation:
{
  "dataset_size": 20,
  "model": "gpt-3.5-turbo",
  "bootstrap_fewshot": {
    "optimizer": "BootstrapFewShot",
    "average_score": 0.83,
    "compilation_time": 11.89,
    "demonstrations": [...]
  },
  "mipro": {
    "optimizer": "MIPRO",
    "average_score": 0.90,
    "compilation_time": 24.12,
    "demonstrations": [...]
  },
  "simba": {
    "optimizer": "SIMBA",
    "average_score": 0.87,
    "compilation_time": 17.89,
    "demonstrations": [...]
  }
}
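Given the two result files, per-optimizer score differences can be computed with a few lines. The sketch below is not the logic of compare_results.py. It assumes the key names shown in the two examples (note that the Python file uses mipro_v2 where the Go file uses mipro), and KEY_PAIRS and score_differences are hypothetical names. The inline dictionaries reuse the example scores so the block runs without the files.

```python
# Assumption: map each Python result key to its Go counterpart.
KEY_PAIRS = {
    "bootstrap_fewshot": "bootstrap_fewshot",
    "mipro_v2": "mipro",
    "simba": "simba",
}


def score_differences(python_results: dict, go_results: dict) -> dict:
    """Absolute average_score gap for each optimizer present on both sides."""
    diffs = {}
    for py_key, go_key in KEY_PAIRS.items():
        if py_key in python_results and go_key in go_results:
            py_score = python_results[py_key]["average_score"]
            go_score = go_results[go_key]["average_score"]
            # Round away binary floating-point noise such as 0.020000000000000018.
            diffs[py_key] = round(abs(py_score - go_score), 4)
    return diffs


# Scores copied from the two example files above.
python_results = {
    "bootstrap_fewshot": {"average_score": 0.85},
    "mipro_v2": {"average_score": 0.92},
    "simba": {"average_score": 0.88},
    "gepa": {"average_score": 0.67},
}
go_results = {
    "bootstrap_fewshot": {"average_score": 0.83},
    "mipro": {"average_score": 0.90},
    "simba": {"average_score": 0.87},
}
print(score_differences(python_results, go_results))
```

In practice the two dictionaries would come from json.load on dspy_comparison_results.json and go_comparison_results.json. Optimizers missing from either side, such as gepa in the Go example, are skipped rather than treated as a mismatch.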
compatibility_report.json
Detailed compatibility analysis:
{
  "compatibility_summary": {
    "bootstrap_fewshot_compatible": true,
    "mipro_compatible": true,
    "simba_compatible": true,
    "score_differences_acceptable": true,
    "api_signatures_match": true,
    "behavior_consistent": true
  },
  "recommendations": {
    "critical_issues": [],
    "improvements": [],
    "validation_needed": []
  }
}
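A script or CI job can consume this report by listing every summary flag that is not true. A minimal sketch, assuming the structure shown above; incompatible_checks is a hypothetical helper, and the sample report sets mipro_compatible to false purely for illustration.

```python
import json


def incompatible_checks(report: dict) -> list:
    """Return the names of compatibility_summary flags that are not true."""
    summary = report.get("compatibility_summary", {})
    return sorted(name for name, ok in summary.items() if ok is not True)


# Illustrative report; a real run would json.load compatibility_report.json.
sample = json.loads("""
{
  "compatibility_summary": {
    "bootstrap_fewshot_compatible": true,
    "mipro_compatible": false,
    "simba_compatible": true
  },
  "recommendations": {
    "critical_issues": [],
    "improvements": [],
    "validation_needed": []
  }
}
""")
print(incompatible_checks(sample))  # ['mipro_compatible']
```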
Compatibility Criteria
✅ Pass Criteria
- Score difference < 0.1 (10%)
- API signatures match
- Same parameter types and defaults
- Consistent behavior patterns
⚠️ Warning Criteria
- Score difference 0.1-0.2 (10-20%)
- Minor parameter differences
- Performance variations
❌ Fail Criteria
- Score difference > 0.2 (20%)
- API incompatibilities
- Behavioral inconsistencies
- Missing features
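The score bands above translate directly into a small classifier. The sketch below covers only the score-difference criterion, and classify_score_difference is a hypothetical name rather than a function from compare_results.py. One detail matters in practice: in binary floating point, 0.80 - 0.60 evaluates to 0.20000000000000007, so the gap is rounded before it is compared against the thresholds.

```python
def classify_score_difference(python_score: float, go_score: float) -> str:
    """Map a score gap onto the pass / warning / fail bands."""
    # Round to two decimals so a gap of exactly 0.20 is not pushed over
    # the threshold by floating-point noise.
    diff = round(abs(python_score - go_score), 2)
    if diff < 0.1:
        return "pass"
    if diff <= 0.2:
        return "warning"
    return "fail"


print(classify_score_difference(0.60, 0.60))  # pass
print(classify_score_difference(0.80, 0.60))  # warning
print(classify_score_difference(0.90, 0.60))  # fail
```

The second call matches the cached COPRO row in the dashboard, which is reported as at the threshold rather than failing.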
Interpreting Results
Compatibility Report Sections
- Compatibility Summary: Overall compatibility status
- BootstrapFewShot Comparison: Detailed comparison of few-shot optimizer
- MIPRO Comparison: Detailed comparison of MIPRO optimizer
- SIMBA Comparison: Detailed comparison of SIMBA optimizer
- Recommendations: Action items for improvement
Common Issues and Solutions
Score Differences
- Cause: Different random seeds, LLM variations, implementation differences
- Solution: Run multiple trials, use fixed seeds, verify algorithm implementation
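Running several seeded trials and comparing means is one way to separate real implementation differences from run-to-run noise. The sketch below shows the bookkeeping only: run_trials is a hypothetical helper, and fake_evaluate is a stand-in for a real optimizer run that accepts a seed.

```python
import random
import statistics


def run_trials(evaluate, seeds):
    """Call evaluate(seed) once per seed and summarize the scores."""
    scores = [evaluate(seed) for seed in seeds]
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"scores": scores, "mean": statistics.mean(scores), "stdev": spread}


def fake_evaluate(seed):
    """Stand-in for a real run: returns a repeatable score for a given seed."""
    rng = random.Random(seed)
    return rng.choice([0.4, 0.6, 0.8])


summary = run_trials(fake_evaluate, seeds=[0, 1, 2, 3, 4])
print(summary["mean"], summary["stdev"])
```

Using the same seed list on the Python and Go sides keeps the comparison repeatable, although a fixed seed cannot remove variation that comes from the LLM itself.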
Time Differences
- Cause: Language performance, concurrency differences, LLM call patterns
- Solution: Optimize critical paths, implement proper concurrency
Demonstration Count Differences
- Cause: Different filtering criteria, validation logic
- Solution: Align validation functions, verify example generation
Extending the Framework
Adding New Optimizers
- Implement optimizer in both Python and Go
- Add test cases in respective comparison files
- Update results comparison logic
- Add new compatibility criteria
Adding New Metrics
- Implement metric in both languages
- Add to comparison functions
- Update report generation
- Add interpretation guidelines
Adding New Datasets
- Create dataset in both implementations
- Ensure consistent format
- Add dataset-specific metrics
- Update compatibility criteria
Troubleshooting
Common Issues
Gemini API Key
export GEMINI_API_KEY=your_api_key_here
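Python Dependencies
The Python scripts declare their dependencies inline, so run them through uv instead of installing from a requirements file:
uv run dspy_comparison.py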
Go Build Issues
go mod tidy
go build -o go_comparison go_comparison.go
Permission Issues
chmod +x run_experiment.sh
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure compatibility tests pass
- Submit a pull request
License
This project is licensed under the same license as the main dspy-go project.