AgentComms

An MCP plugin that enables voice calls and chat messaging for AI coding assistants. Start a task, walk away. Your phone rings when the AI is done, stuck, or needs a decision. Or get notified via Discord, Telegram, or WhatsApp.
Supports: Claude Code, AWS Kiro CLI, Gemini CLI
Built with the plexusone stack - showcasing a complete voice and chat AI architecture in Go.
Features
- Phone Calls: Real voice calls to your phone via Twilio; works with smartphones, smartwatches, landlines, or VoIP
- Chat Messaging: Send messages via Discord, Telegram, or WhatsApp
- Multi-turn Conversations: Back-and-forth discussions, not just one-way notifications
- Smart Triggers: Hooks that suggest calling/messaging when you're stuck or done with work
- Mix and Match: Use voice, chat, or both based on your needs
- Parallel Execution: AI continues working while waiting for your response: searching code, running tests, preparing next steps
How It Works
AgentComms provides bidirectional communication between humans and AI agents:
                       AgentComms
                ┌──────────────────────┐
                │                      │
┌──────────┐    │    ┌────────────┐    │    ┌──────────┐
│ AI Agent │ ──────▶ │ MCP Server │ ◀────── │  Human   │
│ Claude / │    │    │ (OUTBOUND) │    │    │ (Discord │
│  Codex   │ ◀────── └────────────┘ ──────▶ │  Phone)  │
└──────────┘    │                      │    └──────────┘
                │    ┌────────────┐    │
                │    │   Daemon   │    │
                │    │ (INBOUND)  │    │
                │    └────────────┘    │
                │          │           │
                │     ┌────┴────┐      │
                │     │  tmux   │      │
                │     │  pane   │      │
                │     └─────────┘      │
                └──────────────────────┘
Two communication modes:
| Mode | Direction | Use Case |
|------|-----------|----------|
| OUTBOUND | Agent → Human | AI needs input, reports completion, escalates blockers |
| INBOUND | Human → Agent | Interrupt agent, send instructions, coordinate multiple agents |
OUTBOUND (MCP Server)
- AI needs input → Calls your phone or sends a chat message
- You respond → Voice is transcribed, chat is read directly
- AI continues → Uses your input to complete the task
INBOUND (Daemon) - Preview
- You send a message → Type in a Discord channel or send an SMS
- Daemon receives → Routes to the correct agent via tmux
- Agent sees it → Message appears in the agent's terminal
Architecture
┌───────────────────────────────────────────────────────────────────────────┐
│                                agentcomms                                 │
├───────────────────────────────────────────────────────────────────────────┤
│  OUTBOUND (MCP Server) - Agent → Human                                    │
│  ├── Voice Tools: initiate_call, continue_call, speak_to_user, end_call   │
│  ├── Chat Tools: send_message, list_channels, get_messages                │
│  ├── Voice Manager - Orchestrates calls via omnivoice                     │
│  └── Chat Manager - Routes messages via omnichat                          │
├───────────────────────────────────────────────────────────────────────────┤
│  INBOUND (Daemon) - Human → Agent                                         │
│  ├── Router - Actor-style event dispatcher (goroutine per agent)          │
│  ├── AgentBridge - Adapters for tmux, process, etc.                       │
│  ├── Event Store - SQLite database via Ent ORM                            │
│  └── Transports - Discord, Twilio (receives human messages)               │
├───────────────────────────────────────────────────────────────────────────┤
│  Shared Infrastructure                                                    │
│  ├── omnivoice - Voice abstraction (TTS, STT, Transport, CallSystem)      │
│  ├── omnichat - Chat abstraction (Discord, Telegram, WhatsApp)            │
│  ├── mcpkit - MCP server with ngrok integration                           │
│  └── Ent - Database ORM with SQLite/PostgreSQL support                    │
├───────────────────────────────────────────────────────────────────────────┤
│  Provider Implementations                                                 │
│  ├── Voice: ElevenLabs, Deepgram, OpenAI, Twilio                          │
│  └── Chat: Discord, Telegram, WhatsApp                                    │
└───────────────────────────────────────────────────────────────────────────┘
The plexusone Stack
This project demonstrates the plexusone voice and chat AI stack:
| Package | Role | Description |
|---------|------|-------------|
| omnivoice | Voice Abstraction | Batteries-included TTS/STT with registry-based provider lookup |
| omnichat | Chat Abstraction | Provider-agnostic chat messaging interface |
| elevenlabs-go | Voice Provider | ElevenLabs streaming TTS and STT |
| omnivoice-deepgram | Voice Provider | Deepgram streaming TTS and STT |
| omnivoice-openai | Voice Provider | OpenAI TTS and STT |
| omnivoice-twilio | Phone Provider | Twilio transport and call system |
| mcpkit | Server | MCP server runtime with ngrok and multiple transport modes |
Installation
Prerequisites
- Go 1.25+
- For voice: Twilio account + ngrok account
- For chat: Discord/Telegram bot token (optional)
Build
cd /path/to/agentcomms
go mod tidy
go build -o agentcomms ./cmd/agentcomms
Configuration
AgentComms uses a unified JSON configuration file that combines all settings.
Quick Setup
# Generate configuration file
./agentcomms config init
# Or generate minimal config (chat only, no voice)
./agentcomms config init --minimal
# Set environment variables for secrets
export DISCORD_TOKEN=your_discord_bot_token
export TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export TWILIO_AUTH_TOKEN=your_auth_token
export ELEVENLABS_API_KEY=your_elevenlabs_key
export DEEPGRAM_API_KEY=your_deepgram_key
export NGROK_AUTHTOKEN=your_ngrok_authtoken
# Validate configuration
./agentcomms config validate
Configuration File
The config file at ~/.agentcomms/config.json supports environment variable substitution:
{
  "version": "1",
  "server": { "port": 3333 },
  "agents": [
    { "id": "claude", "type": "tmux", "tmux_session": "claude-code" }
  ],
  "voice": {
    "phone": {
      "account_sid": "${TWILIO_ACCOUNT_SID}",
      "auth_token": "${TWILIO_AUTH_TOKEN}",
      "number": "+15551234567",
      "user_number": "+15559876543"
    },
    "tts": { "provider": "elevenlabs", "api_key": "${ELEVENLABS_API_KEY}" },
    "stt": { "provider": "deepgram", "api_key": "${DEEPGRAM_API_KEY}" },
    "ngrok": { "auth_token": "${NGROK_AUTHTOKEN}" }
  },
  "chat": {
    "discord": { "enabled": true, "token": "${DISCORD_TOKEN}" },
    "channels": [
      { "channel_id": "discord:YOUR_CHANNEL_ID", "agent_id": "claude" }
    ]
  }
}
See Configuration Guide for full documentation.
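To illustrate how `${VAR}` substitution in a JSON config can work, here is a minimal Go sketch that expands references with the standard library's `os.Expand` and then decodes the result. It is not AgentComms's actual loader: the `config` struct covers only a few fields from the example, and `loadConfig` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// chatChannel mirrors one entry of "chat.channels" in the example config.
type chatChannel struct {
	ChannelID string `json:"channel_id"`
	AgentID   string `json:"agent_id"`
}

// config covers only the fields this sketch reads; the real schema is larger.
type config struct {
	Version string `json:"version"`
	Server  struct {
		Port int `json:"port"`
	} `json:"server"`
	Chat struct {
		Discord struct {
			Enabled bool   `json:"enabled"`
			Token   string `json:"token"`
		} `json:"discord"`
		Channels []chatChannel `json:"channels"`
	} `json:"chat"`
}

// loadConfig expands ${VAR} references using lookup, then decodes the JSON.
func loadConfig(raw []byte, lookup func(string) string) (config, error) {
	var cfg config
	expanded := os.Expand(string(raw), lookup)
	err := json.Unmarshal([]byte(expanded), &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
  "version": "1",
  "server": { "port": 3333 },
  "chat": {
    "discord": { "enabled": true, "token": "${DISCORD_TOKEN}" },
    "channels": [ { "channel_id": "discord:123", "agent_id": "claude" } ]
  }
}`)
	cfg, err := loadConfig(raw, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Println("port:", cfg.Server.Port, "token set:", cfg.Chat.Discord.Token != "")
}
```

Expanding before decoding is simple but has limits: a secret containing a double quote or backslash would produce invalid JSON, and any other `$` in the file is also treated as a reference. Expanding each string field after decoding avoids both.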
Usage
Commands
AgentComms provides two main commands:
# Run MCP server (OUTBOUND - spawned by AI assistant)
./agentcomms serve
# Run daemon (INBOUND - background service for human messages)
./agentcomms daemon
Running ./agentcomms without a subcommand defaults to serve for backwards compatibility.
Running the MCP Server (OUTBOUND)
./agentcomms serve
Output:
Starting agentcomms MCP server...
Using plexusone stack:
- omnivoice (voice abstraction)
- omnichat (chat abstraction)
- mcpkit (MCP server)
Voice providers: tts=elevenlabs stt=deepgram
Chat providers: [discord telegram]
MCP server ready
Local: http://localhost:3333/mcp
Public: https://abc123.ngrok.io/mcp
Running the Daemon (INBOUND) - Preview
The daemon enables human-to-agent communication. It runs as a background service and routes messages from Discord/Twilio to agents running in tmux.
./agentcomms daemon
Output:
INFO starting daemon data_dir=/Users/you/.agentcomms socket=/Users/you/.agentcomms/daemon.sock
INFO database initialized path=/Users/you/.agentcomms/data.db
INFO router initialized
INFO daemon started
Data storage: ~/.agentcomms/
config.json - Unified configuration file
data.db - SQLite database (events, agents)
daemon.sock - Unix socket for CLI/API
Daemon CLI Commands
Once the daemon is running, use these CLI commands to interact with it:
# Check daemon status
./agentcomms status
# List configured agents
./agentcomms agents
# Send a message to an agent (appears in tmux pane)
./agentcomms send <agent-id> "Your message here"
# Send an interrupt (Ctrl-C) to an agent
./agentcomms interrupt <agent-id>
# View recent events for an agent
./agentcomms events <agent-id> --limit 20
# Send a reply to a chat channel (outbound from agent)
./agentcomms reply discord:123456789 "Task completed!"
# List configured chat channels
./agentcomms channels
# Validate configuration
./agentcomms config validate
# Show current configuration
./agentcomms config show
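For readers curious how `send` and `interrupt` could reach a tmux pane, the sketch below builds the corresponding `tmux send-keys` invocations in Go. This is an assumption about the mechanism, not the code in `internal/bridge/tmux.go`; `sendArgs`, `interruptArgs`, and `run` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sendArgs builds the tmux arguments that type text into a pane and press Enter.
// -l makes tmux treat the text literally instead of as key names.
func sendArgs(target, text string) [][]string {
	return [][]string{
		{"send-keys", "-t", target, "-l", text},
		{"send-keys", "-t", target, "Enter"},
	}
}

// interruptArgs builds the tmux arguments that send Ctrl-C to a pane.
func interruptArgs(target string) []string {
	return []string{"send-keys", "-t", target, "C-c"}
}

// run executes each tmux invocation in order and stops at the first failure.
func run(cmds ...[]string) error {
	for _, args := range cmds {
		if out, err := exec.Command("tmux", args...).CombinedOutput(); err != nil {
			return fmt.Errorf("tmux %v: %w: %s", args, err, out)
		}
	}
	return nil
}

func main() {
	// "claude-code" is the tmux_session from the example config.
	if err := run(sendArgs("claude-code", "Your message here")...); err != nil {
		fmt.Println("send failed:", err)
	}
}
```

Sending the text with `-l` and the Enter key in a separate call prevents a message such as "Enter" or "C-c" from being interpreted as a key name.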
Daemon Configuration
Generate and edit the configuration:
# Generate config file
./agentcomms config init
# Edit ~/.agentcomms/config.json with your settings
# Validate configuration
./agentcomms config validate
See the Configuration Guide for full details.
AgentComms supports multiple AI coding assistants. Generate configuration files for your preferred tool:
# Generate for a specific tool
go run ./cmd/generate-plugin claude . # Claude Code
go run ./cmd/generate-plugin kiro . # AWS Kiro CLI
go run ./cmd/generate-plugin gemini . # Gemini CLI
# Generate for all tools
go run ./cmd/generate-plugin all ./plugins
Claude Code Integration
Option 1: Use generated plugin files
go run ./cmd/generate-plugin claude .
This creates:
.claude-plugin/plugin.json - Plugin manifest
skills/phone-input/SKILL.md - Voice calling skill
skills/chat-messaging/SKILL.md - Chat messaging skill
commands/call.md - /call slash command
commands/message.md - /message slash command
.claude/settings.json - Lifecycle hooks
Option 2: Manual MCP configuration
Add to ~/.claude/settings.json or .claude/settings.json:
{
  "mcpServers": {
    "agentcomms": {
      "command": "/path/to/agentcomms",
      "env": {
        "TWILIO_ACCOUNT_SID": "ACxxx",
        "TWILIO_AUTH_TOKEN": "xxx",
        "NGROK_AUTHTOKEN": "xxx",
        "DISCORD_TOKEN": "xxx",
        "ELEVENLABS_API_KEY": "xxx",
        "DEEPGRAM_API_KEY": "xxx",
        "AGENTCOMMS_AGENT_ID": "claude"
      }
    }
  }
}
initiate_call
Start a new call to the user.
{
"message": "Hey! I finished implementing the feature. Want me to walk you through it?"
}
Returns:
{
"call_id": "call-1-1234567890",
"response": "Sure, go ahead and explain what you built."
}
continue_call
Continue an active call with another message.
{
"call_id": "call-1-1234567890",
"message": "I added authentication using JWT. Should I also add refresh tokens?"
}
speak_to_user
Speak without waiting for a response (useful for status updates).
{
"call_id": "call-1-1234567890",
"message": "Let me search for that in the codebase. Give me a moment..."
}
end_call
End the call with an optional goodbye message.
{
"call_id": "call-1-1234567890",
"message": "Perfect! I'll get started on that. Talk soon!"
}
send_message
Send a message to a chat channel.
{
"provider": "discord",
"chat_id": "123456789",
"message": "I've finished the PR! Here's the link: https://github.com/..."
}
list_channels
List available chat channels and their status.
{}
Returns:
{
  "channels": [
    {"provider_name": "discord", "status": "connected"},
    {"provider_name": "telegram", "status": "connected"}
  ]
}
get_messages
Get recent messages from a chat conversation.
{
"provider": "telegram",
"chat_id": "987654321",
"limit": 5
}
These tools allow Claude Code to poll for messages sent by humans via the daemon.
check_messages
Check for new messages sent to this agent from humans via chat.
{
"agent_id": "claude",
"limit": 10
}
Returns:
{
  "messages": [
    {
      "id": "evt_01ABC123",
      "channel_id": "discord:123456789",
      "provider": "discord",
      "text": "Hey, can you also add unit tests?",
      "timestamp": "2024-01-15T10:30:00Z",
      "type": "human_message"
    }
  ],
  "agent_id": "claude",
  "has_more": false
}
get_agent_events
Get all recent events for an agent (messages, interrupts, status changes).
{
"agent_id": "claude",
"since_id": "evt_01ABC123",
"limit": 20
}
daemon_status
Check if the agentcomms daemon is running.
{}
Returns:
{
"running": true,
"started_at": "2024-01-15T09:00:00Z",
"agents": 1,
"providers": ["discord", "telegram"]
}
These tools enable agent-to-agent communication for task delegation and coordination.
list_agents
List all available agents and their status.
{
"include_offline": false
}
Returns:
{
  "agents": [
    {"id": "backend", "type": "tmux", "status": "online", "target": "tmux:dev:0"},
    {"id": "frontend", "type": "tmux", "status": "online", "target": "tmux:dev:1"}
  ]
}
send_agent_message
Send a message to another agent.
{
"to_agent_id": "backend",
"message": "Can you help me with the API implementation?"
}
Messages arrive at the destination agent with source prefix:
[from: frontend] Can you help me with the API implementation?
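A receiving agent or script may want to split that prefix back into sender and body. A small Go sketch of both directions is given here; `tagMessage` and `parseTagged` are hypothetical helpers based only on the format shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// tagMessage prepends the source prefix shown in the example above.
func tagMessage(from, body string) string {
	return fmt.Sprintf("[from: %s] %s", from, body)
}

// parseTagged splits a tagged line back into sender and body.
// ok is false when the line does not start with a "[from: ...] " prefix.
func parseTagged(line string) (from, body string, ok bool) {
	const prefix = "[from: "
	if !strings.HasPrefix(line, prefix) {
		return "", line, false
	}
	from, body, ok = strings.Cut(strings.TrimPrefix(line, prefix), "] ")
	if !ok {
		return "", line, false
	}
	return from, body, true
}

func main() {
	line := tagMessage("frontend", "Can you help me with the API implementation?")
	fmt.Println(line)
	from, body, ok := parseTagged(line)
	fmt.Println(from, ok, body)
}
```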
Use Cases
Phone calls are ideal for:
- Reporting significant task completion
- Requesting urgent clarification when blocked
- Discussing complex decisions
- Walking through code changes
- Multi-step processes needing back-and-forth
Chat messaging is ideal for:
- Asynchronous status updates
- Sharing links, code, or formatted content
- Non-urgent notifications
- Follow-up summaries
Development
Project Structure
agentcomms/
├── cmd/
│   └── agentcomms/
│       ├── main.go            # CLI entry point (serve, daemon)
│       └── commands.go        # CLI commands (send, interrupt, reply, etc.)
├── internal/                  # INBOUND infrastructure
│   ├── daemon/
│   │   ├── daemon.go          # Background daemon service
│   │   ├── server.go          # Unix socket server
│   │   ├── client.go          # Client library for IPC
│   │   ├── protocol.go        # JSON-RPC style protocol
│   │   └── config.go          # Daemon configuration (YAML)
│   ├── router/
│   │   ├── router.go          # Event dispatcher
│   │   └── actor.go           # Per-agent actor (goroutine)
│   ├── bridge/
│   │   ├── adapter.go         # Agent adapter interface
│   │   └── tmux.go            # tmux adapter
│   ├── transport/
│   │   └── chat.go            # Chat transport (omnichat)
│   └── events/
│       └── id.go              # Event ID generation
├── ent/                       # Database schema (Ent ORM)
│   └── schema/
│       ├── event.go           # Event entity
│       └── agent.go           # Agent entity
├── pkg/                       # OUTBOUND infrastructure
│   ├── voice/
│   │   └── manager.go         # Voice call orchestration
│   ├── chat/
│   │   └── manager.go         # Chat message routing
│   ├── config/
│   │   ├── config.go          # Legacy configuration
│   │   └── unified.go         # Unified JSON configuration
│   └── tools/
│       └── tools.go           # MCP tool definitions
├── examples/
│   └── config.json            # Example JSON configuration
├── docs/
│   └── design/                # Architecture documentation
│       ├── FEAT_INBOUND_PRD.md
│       ├── FEAT_INBOUND_TRD.md
│       └── FEAT_INBOUND_PLAN.md
├── go.mod
└── README.md
Dependencies
github.com/plexusone/omnivoice - Batteries-included voice abstraction
github.com/plexusone/omnichat - Chat messaging abstraction
github.com/plexusone/omnivoice-twilio - Twilio transport and call system
github.com/plexusone/mcpkit - MCP server runtime
github.com/modelcontextprotocol/go-sdk - MCP protocol SDK
entgo.io/ent - Entity framework for Go (database ORM)
modernc.org/sqlite - Pure Go SQLite driver
Cost Estimate
| Service | Cost |
|---------|------|
| Twilio outbound calls | ~$0.014/min |
| Twilio phone number | ~$1.15/month |
| ElevenLabs TTS | $0.30/1K chars ($0.03/min of speech) |
| ElevenLabs STT | ~$0.10/min (Scribe) |
| Deepgram TTS | ~$0.015/1K chars |
| Deepgram STT | ~$0.0043/min (Nova-2) |
| OpenAI TTS | ~$0.015/1K chars |
| OpenAI STT | ~$0.006/min (Whisper) |
| Discord/Telegram | Free |
| ngrok (free tier) | $0 |
Provider Recommendations:
| Priority | TTS Provider | STT Provider | Total Cost/min | Notes |
|----------|--------------|--------------|----------------|-------|
| Lowest Cost | Deepgram | Deepgram | ~$0.03 | Best value, good quality |
| Best Quality | ElevenLabs | Deepgram | ~$0.05 | Premium voices, fast transcription |
| Balanced | OpenAI | OpenAI | ~$0.04 | Single API key, consistent quality |
Costs are approximate and exclude Twilio phone charges (~$0.014/min).
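As a worked example of the arithmetic, the Go sketch below adds the Twilio per-minute rate to each profile's approximate TTS+STT cost per minute and multiplies by call length. The `profile` type and `callCost` function are hypothetical; the rates are the approximate figures quoted in this section.

```go
package main

import "fmt"

// twilioPerMin is the approximate Twilio outbound rate quoted in this section.
const twilioPerMin = 0.014

// profile holds a provider pairing's approximate TTS+STT cost per minute.
type profile struct {
	Name        string
	VoicePerMin float64
}

// callCost estimates the total cost of a call lasting the given minutes.
func callCost(p profile, minutes float64) float64 {
	return minutes * (p.VoicePerMin + twilioPerMin)
}

func main() {
	profiles := []profile{
		{"Lowest Cost", 0.03},
		{"Best Quality", 0.05},
		{"Balanced", 0.04},
	}
	for _, p := range profiles {
		fmt.Printf("%-12s 10 min call: ~$%.2f\n", p.Name, callCost(p, 10))
	}
}
```

For instance, a 10-minute call on the Lowest Cost profile comes to roughly 10 × ($0.03 + $0.014) = $0.44, not counting the monthly phone number fee.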
License
MIT
Credits
Inspired by ZeframLou/call-me (TypeScript).
Built with the plexusone stack: