Simili Bot
AI-Powered GitHub Issue Intelligence.
Automatically detect duplicate issues, find similar issues with semantic search, and intelligently route issues across repositories.

Features
- Semantic Duplicate Detection — Find related issues using AI-powered embeddings, not just keyword matching.
- Cross-Repository Search — Search for similar issues across your organization.
- Intelligent Routing — Automatically transfer issues to the correct repository based on content.
- Smart Triage — AI-powered labeling and quality assessment.
- Modular Pipeline — Customize workflows with plug-and-play steps.
- Multi-Repo Support — Central configuration with per-repo overrides.
Architecture
Simili uses a "Lego with Blueprints" architecture:
- Lego Blocks: Independent, reusable pipeline steps (Gatekeeper, Similarity, Triage, etc.).
- Blueprints: Pre-defined workflows for common use cases.
- State Branch: Git-based state management using an orphan branch (no comment scanning).
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Gatekeeper │───▶│ Similarity │───▶│ Triage │───▶│ Action │
│ Check │ │ Search │ │ Analysis │ │ Executor │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
Quick Start
Simili-Bot supports both Single-Repository and Organization-wide setups.
Setup Guides
| Guide |
Description |
| Single Repo Setup |
Instructions for setting up Simili-Bot on a standalone repository. |
| Organization Setup |
Best practices for deploying across an organization using Reusable Workflows. |
AI Provider Configuration
Simili supports both Gemini and OpenAI.
- Set at least one key:
GEMINI_API_KEY or OPENAI_API_KEY
- If both keys are set, Simili uses Gemini by default (Gemini takes precedence)
- If only one key is set, Simili uses that provider
Default models:
- LLM:
gemini-2.0-flash-lite (Gemini), gpt-5.2 (OpenAI)
- Embeddings:
text-embedding-004 (Gemini), text-embedding-3-small (OpenAI)
If you override embedding.model, keep embedding.dimensions aligned with the model:
text-embedding-004 -> 768
gemini-embedding-001 -> 3072
text-embedding-3-small -> 1536
text-embedding-3-large -> 3072
Examples
We provide copy-pasteable examples to get you started quickly:
Available Workflows
You can specify a workflow in your simili.yaml or define custom steps.
| Preset |
Description |
issue-triage |
Full pipeline: similarity search, duplicate check, triage analysis, and action execution. |
similarity-only |
Runs similarity search only. Useful for "Find Similar Issues" features without auto-triage. |
index-only |
Indexes issues to the vector database without providing feedback. |
CLI Commands
Simili provides a powerful CLI for local development, testing, and batch operations.
simili index
Bulk index issues from a GitHub repository into the vector database.
simili index --repo owner/repo --workers 5 --limit 100
Flags:
--repo (required): Target repository (owner/name)
--workers: Number of concurrent workers (default: 5)
--since: Start from issue number or timestamp
--limit: Maximum issues to index
--dry-run: Simulate without writing to database
simili process
Process a single issue through the pipeline.
simili process --issue issue.json --workflow issue-triage --dry-run
Flags:
--issue: Path to issue JSON file
--workflow: Workflow preset to run (default: "issue-triage")
--dry-run: Run without side effects
--repo, --org, --number: Override issue fields
simili batch
Process multiple issues from a JSON file in batch mode. All operations run in dry-run mode to prevent GitHub writes.
simili batch --file issues.json --format csv --out-file results.csv --workers 5
Use Cases:
- Test bot logic on historical data without spamming repositories
- Generate reports showing similarity analysis and duplicate detection
- Analyze issues from repositories where you lack write access
- Bulk identify transfer recommendations and quality scores
Flags:
--file (required): Path to JSON file with array of issues
--out-file: Output file path (stdout if not specified)
--format: Output format: json or csv (default: json)
--workers: Number of concurrent workers (default: 1)
--workflow: Workflow preset (default: "issue-triage")
--collection: Override Qdrant collection name
--threshold: Override similarity threshold
--duplicate-threshold: Override duplicate confidence threshold
--top-k: Override max similar issues to show
Input Format:
Create a JSON file with an array of issues:
[
{
"org": "owner",
"repo": "repo-name",
"number": 123,
"title": "Issue title",
"body": "Issue description...",
"state": "open",
"labels": ["bug", "high-priority"],
"author": "username",
"created_at": "2026-02-10T10:00:00Z"
}
]
Output Formats:
- JSON: Full pipeline results with detailed analysis
- CSV: Flattened summary for spreadsheet analysis
Example Workflow:
# 1. Index repository issues
simili index --repo ballerina-platform/ballerina-library --workers 10
# 2. Prepare test issues in batch.json
# 3. Run batch analysis
simili batch --file batch.json --format csv --out-file analysis.csv --workers 5
# 4. Review results
cat analysis.csv
Configuration
Minimal .github/simili.yaml example:
qdrant:
url: "${QDRANT_URL}"
api_key: "${QDRANT_API_KEY}"
collection: "my-issues"
embedding:
provider: "gemini"
api_key: "${GEMINI_API_KEY}"
model: "gemini-embedding-001"
llm:
provider: "gemini"
api_key: "${GEMINI_API_KEY}"
model: "gemini-2.5-flash"
# temperature: 0.3
defaults:
similarity_threshold: 0.65
max_similar_to_show: 5
Notes:
llm.model defaults to gemini-2.5-flash when omitted.
llm.api_key can be omitted if GEMINI_API_KEY is set.
- You can override the model at runtime with
LLM_MODEL.
Development
# Clone the repository
git clone https://github.com/similigh/simili-bot.git
cd simili-bot
# Build
go build ./...
# Run tests
go test ./...
# Lint
go vet ./...
License
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
Made by the Simili Team