simili-bot

v0.1.7 (latest) | Published: Feb 25, 2026 | License: Apache-2.0

Repository

github.com/similigh/simili-bot

Simili Bot

AI-Powered GitHub Issue Intelligence.

Automatically detect duplicate issues, find similar issues with semantic search, and intelligently route issues across repositories.

Features

Semantic Duplicate Detection — Find related issues using AI-powered embeddings, not just keyword matching.
Cross-Repository Search — Search for similar issues across your organization.
Intelligent Routing — Automatically transfer issues to the correct repository based on content.
Smart Triage — AI-powered labeling and quality assessment.
Modular Pipeline — Customize workflows with plug-and-play steps.
Multi-Repo Support — Central configuration with per-repo overrides.

Architecture

Simili uses a "Lego with Blueprints" architecture:

Lego Blocks: Independent, reusable pipeline steps (Gatekeeper, Similarity, Triage, etc.).
Blueprints: Pre-defined workflows for common use cases.
State Branch: Git-based state management using an orphan branch (no comment scanning).

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Gatekeeper  │───▶│  Similarity │───▶│   Triage    │───▶│   Action    │
│   Check     │    │   Search    │    │  Analysis   │    │  Executor   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

Quick Start

Simili-Bot supports both Single-Repository and Organization-wide setups.

Setup Guides

Guide	Description
Single Repo Setup	Instructions for setting up Simili-Bot on a standalone repository.
Organization Setup	Best practices for deploying across an organization using Reusable Workflows.

AI Provider Configuration

Simili supports both Gemini and OpenAI.

Set at least one key: GEMINI_API_KEY or OPENAI_API_KEY
If both keys are set, Gemini takes precedence.
If only one key is set, Simili uses that provider
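
The selection rules above amount to a short precedence check. The sketch below illustrates them; `selectProvider` is a hypothetical helper, and the provider string `"openai"` is an assumption (only `"gemini"` appears in the configuration example in this README).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// selectProvider mirrors the documented rules: Gemini wins when both keys
// are present, otherwise whichever key is set decides the provider.
// This is an illustration, not Simili-Bot's own code.
func selectProvider(geminiKey, openAIKey string) (string, error) {
	switch {
	case geminiKey != "":
		return "gemini", nil
	case openAIKey != "":
		return "openai", nil // assumed provider name
	default:
		return "", errors.New("set GEMINI_API_KEY or OPENAI_API_KEY")
	}
}

func main() {
	provider, err := selectProvider(os.Getenv("GEMINI_API_KEY"), os.Getenv("OPENAI_API_KEY"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using provider:", provider)
}
```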

Default models:

LLM: gemini-2.0-flash-lite (Gemini), gpt-5.2 (OpenAI)
Embeddings: text-embedding-004 (Gemini), text-embedding-3-small (OpenAI)

If you override embedding.model, keep embedding.dimensions aligned with the model:

text-embedding-004 -> 768
gemini-embedding-001 -> 3072
text-embedding-3-small -> 1536
text-embedding-3-large -> 3072
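
A mismatch between model and dimensions is easy to catch before any vectors are written. The following sketch encodes the table above as a lookup; `knownDimensions` and `checkDimensions` are hypothetical names, and whether Simili performs such a check itself is not stated here.

```go
package main

import "fmt"

// knownDimensions copies the model-to-dimension pairs listed in the README.
var knownDimensions = map[string]int{
	"text-embedding-004":     768,
	"gemini-embedding-001":   3072,
	"text-embedding-3-small": 1536,
	"text-embedding-3-large": 3072,
}

// checkDimensions reports an error when dims does not match the listed value
// for model. Models missing from the table are accepted unchecked.
func checkDimensions(model string, dims int) error {
	want, ok := knownDimensions[model]
	if !ok {
		return nil
	}
	if dims != want {
		return fmt.Errorf("embedding.dimensions is %d, but %s is listed with %d", dims, model, want)
	}
	return nil
}

func main() {
	for _, c := range []struct {
		model string
		dims  int
	}{
		{"gemini-embedding-001", 3072},
		{"text-embedding-004", 3072},
	} {
		if err := checkDimensions(c.model, c.dims); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("ok:", c.model, c.dims)
	}
}
```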

Examples

We provide copy-pasteable examples to get you started quickly:

Multi-Repo Examples: Includes shared workflow, caller workflow, and central config.
Single-Repo Examples: Standard workflow and configuration.

Available Workflows

You can specify a workflow in your simili.yaml or define custom steps.

Preset	Description
`issue-triage`	Full pipeline: similarity search, duplicate check, triage analysis, and action execution.
`similarity-only`	Runs similarity search only. Useful for "Find Similar Issues" features without auto-triage.
`index-only`	Indexes issues into the vector database without posting feedback.

CLI Commands

Simili provides a CLI for local development, testing, and batch operations.

`simili index`

Bulk index issues from a GitHub repository into the vector database.

simili index --repo owner/repo --workers 5 --limit 100

Flags:

--repo (required): Target repository (owner/name)
--workers: Number of concurrent workers (default: 5)
--since: Start from issue number or timestamp
--limit: Maximum issues to index
--dry-run: Simulate without writing to database

`simili process`

Process a single issue through the pipeline.

simili process --issue issue.json --workflow issue-triage --dry-run

Flags:

--issue: Path to issue JSON file
--workflow: Workflow preset to run (default: "issue-triage")
--dry-run: Run without side effects
--repo, --org, --number: Override issue fields

`simili batch`

Process multiple issues from a JSON file in batch mode. All operations run in dry-run mode to prevent GitHub writes.

simili batch --file issues.json --format csv --out-file results.csv --workers 5

Use Cases:

Test bot logic on historical data without spamming repositories
Generate reports showing similarity analysis and duplicate detection
Analyze issues from repositories where you lack write access
Bulk identify transfer recommendations and quality scores

Flags:

--file (required): Path to JSON file with array of issues
--out-file: Output file path (stdout if not specified)
--format: Output format: json or csv (default: json)
--workers: Number of concurrent workers (default: 1)
--workflow: Workflow preset (default: "issue-triage")
--collection: Override Qdrant collection name
--threshold: Override similarity threshold
--duplicate-threshold: Override duplicate confidence threshold
--top-k: Override max similar issues to show
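
One plausible reading of `--threshold` and `--top-k` is a filter followed by a cut-off: drop matches scoring below the threshold, then keep the best few. The sketch below illustrates that reading; `Match` and `topMatches` are hypothetical, and the exact semantics Simili applies are an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical search hit: an issue number and a similarity score.
type Match struct {
	Number int
	Score  float64
}

// topMatches keeps matches with Score >= threshold, sorts them from highest
// to lowest score, and returns at most topK of them.
func topMatches(matches []Match, threshold float64, topK int) []Match {
	kept := make([]Match, 0, len(matches))
	for _, m := range matches {
		if m.Score >= threshold {
			kept = append(kept, m)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if topK >= 0 && len(kept) > topK {
		kept = kept[:topK]
	}
	return kept
}

func main() {
	hits := []Match{{Number: 12, Score: 0.91}, {Number: 40, Score: 0.52}, {Number: 7, Score: 0.70}}
	for _, m := range topMatches(hits, 0.65, 5) {
		fmt.Printf("#%d %.2f\n", m.Number, m.Score)
	}
}
```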

Input Format:

Create a JSON file with an array of issues:

[
  {
    "org": "owner",
    "repo": "repo-name",
    "number": 123,
    "title": "Issue title",
    "body": "Issue description...",
    "state": "open",
    "labels": ["bug", "high-priority"],
    "author": "username",
    "created_at": "2026-02-10T10:00:00Z"
  }
]
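
When generating such a file from Go, a struct with matching JSON tags keeps the field names in sync. The sketch below mirrors the fields shown above; `BatchIssue` and `parseIssues` are hypothetical names, and treating `org`, `repo`, and `number` as required is an assumption rather than documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// BatchIssue mirrors the fields of the sample input file.
type BatchIssue struct {
	Org       string    `json:"org"`
	Repo      string    `json:"repo"`
	Number    int       `json:"number"`
	Title     string    `json:"title"`
	Body      string    `json:"body"`
	State     string    `json:"state"`
	Labels    []string  `json:"labels"`
	Author    string    `json:"author"`
	CreatedAt time.Time `json:"created_at"` // RFC 3339, as in the sample
}

// parseIssues decodes a JSON array of issues and applies a basic sanity
// check (assumed here: org, repo, and a positive number must be present).
func parseIssues(data []byte) ([]BatchIssue, error) {
	var issues []BatchIssue
	if err := json.Unmarshal(data, &issues); err != nil {
		return nil, fmt.Errorf("parse issues: %w", err)
	}
	for i, is := range issues {
		if is.Org == "" || is.Repo == "" || is.Number <= 0 {
			return nil, fmt.Errorf("issue at index %d: org, repo, and number are required", i)
		}
	}
	return issues, nil
}

func main() {
	sample := []byte(`[{
		"org": "owner", "repo": "repo-name", "number": 123,
		"title": "Issue title", "body": "Issue description...",
		"state": "open", "labels": ["bug", "high-priority"],
		"author": "username", "created_at": "2026-02-10T10:00:00Z"
	}]`)
	issues, err := parseIssues(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, is := range issues {
		fmt.Printf("%s/%s#%d %q labels=%v\n", is.Org, is.Repo, is.Number, is.Title, is.Labels)
	}
}
```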

Output Formats:

JSON: Full pipeline results with detailed analysis
CSV: Flattened summary for spreadsheet analysis

Example Workflow:

# 1. Index repository issues
simili index --repo ballerina-platform/ballerina-library --workers 10

# 2. Prepare test issues in batch.json
# 3. Run batch analysis
simili batch --file batch.json --format csv --out-file analysis.csv --workers 5

# 4. Review results
cat analysis.csv

Configuration

Minimal .github/simili.yaml example:

qdrant:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection: "my-issues"

embedding:
  provider: "gemini"
  api_key: "${GEMINI_API_KEY}"
  model: "gemini-embedding-001"

llm:
  provider: "gemini"
  api_key: "${GEMINI_API_KEY}"
  model: "gemini-2.5-flash"
  # temperature: 0.3

defaults:
  similarity_threshold: 0.65
  max_similar_to_show: 5

Notes:

llm.model defaults to gemini-2.5-flash when omitted.
llm.api_key can be omitted if GEMINI_API_KEY is set.
You can override the model at runtime with LLM_MODEL.
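
Taken together, these notes suggest a three-level lookup for the LLM model: the `LLM_MODEL` environment variable, then `llm.model` from the config file, then the default. The sketch below assumes that order; `resolveLLMModel` is a hypothetical helper, and the fallback value is the one given in the notes above.

```go
package main

import (
	"fmt"
	"os"
)

// defaultLLMModel is the default named in the configuration notes.
const defaultLLMModel = "gemini-2.5-flash"

// resolveLLMModel returns the first non-empty value among the runtime
// override, the configured model, and the default. The precedence order is
// an assumption based on the notes, not confirmed behavior.
func resolveLLMModel(envModel, configModel string) string {
	if envModel != "" {
		return envModel
	}
	if configModel != "" {
		return configModel
	}
	return defaultLLMModel
}

func main() {
	configModel := "" // pretend llm.model was omitted from simili.yaml
	fmt.Println("model:", resolveLLMModel(os.Getenv("LLM_MODEL"), configModel))
}
```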

Development

# Clone the repository
git clone https://github.com/similigh/simili-bot.git
cd simili-bot

# Build
go build ./...

# Run tests
go test ./...

# Lint
go vet ./...

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Made by the Simili Team

Directories

Path	Synopsis
cmd/simili (command)	Package main is the entry point for the Simili-Bot CLI.
cmd/simili-web (command)	
cmd/simili/commands	
internal/core/config	Package config handles loading and merging Simili configuration.
internal/core/pipeline	Package pipeline provides the core pipeline engine for Simili-Bot.
internal/core/state	Package state provides a GitHub API-based implementation of GitStateManager.
internal/integrations/gemini	Package gemini provides AI integration for embeddings and LLM.
internal/integrations/github	
internal/integrations/qdrant	Package qdrant provides the vector database integration.
internal/steps	Package steps provides the action executor step.
internal/transfer	Package transfer provides the transfer rules engine for cross-repository issue routing.
internal/utils/text	