Codehalter
An ACP-compatible AI coding agent that connects Zed to any OpenAI-compatible LLM server. Container-first and local-LLM-first: built around running inside a Debian- or Arch-based devcontainer against llama.cpp (or Ollama / vLLM) on the host, with the project's .git directory bind-mounted read-only so the agent can read history but can never commit, push, or rewrite refs. Targets small-to-medium repositories.
Features
- Devcontainer scaffolding (Debian or Arch): the first session prompts Debian / Arch / Abort and writes .devcontainer/Dockerfile + devcontainer.json for the chosen base. Both images install codehalter and gopls into /usr/local/bin; the Arch variant additionally builds yay from AUR. The seeded devcontainer.json bind-mounts the host's .gitconfig (so the agent's git reads see your identity) and the project's .git directory read-only, so the agent can run git log / git diff / git status but cannot mutate history.
- Commit-from-the-host workflow: when the user asks for a commit or push, the agent does NOT run it. A background updater on llm[2+] keeps .codehalter/.git_commit in sync with git diff HEAD after every turn; on a commit request the agent regenerates it if stale and tells you the exact git commit -F .codehalter/.git_commit && git push command to run on the host. After the next prompt, codehalter detects the clean tree and deletes .git_commit so the next round starts fresh.
- Bring-your-own MCP: .codehalter/mcp.toml declares each [[server]] as a child process to spawn (name, command, args, env); tools it advertises are registered as <name>__<tool>. Projects with a go.mod get a pre-seeded gopls entry (gopls mcp exposes go_symbols, go_references, go_definition, go_hover, …). Reconciled on every prompt: add or remove a [[server]] between turns and the change applies without restarting; failed startups surface as a red tool-call card and are mtime-gated so they don't re-emit until you edit the file.
- Parallel LLM slots for local servers: three [[llm]] entries is the sweet spot. llm[0] owns the foreground KV cache, the per-turn structured summariser pins to llm[1], and the .git_commit drafter prefers llm[2+]. launch_subagent fans out breadth-first across all slots with a per-conn semaphore. Each slot's parallel = N caps concurrent requests to match its server's capacity.
- Plan → Execute → Verify → Document pipeline: every prompt is routed through a planning pass, an execution pass, and a self-verification pass. Failed verifications re-plan with the failure context; up to 10 attempts before giving up, with fuzzy duplicate-failure detection (Jaccard similarity over issue words) that bails early when the model keeps rephrasing the same problem. Turns that wrote files end with a documentation pass that updates (or creates) the README when the change is user-visible.
- Big-task decomposition: when the planner splits a request into multiple subtasks, the user picks once: Interactive (approve each subtask), Automatic (run the whole batch under autopilot), or Cancel. Each subtask gets its own full plan → execute → verify cycle.
- Subagents: launch_subagent spawns parallel sub-tasks for independent work; each runs its own tool loop, up to 2 levels deep. The first subagent in a batch pins to llm[0] and inherits the parent's full history (prefix cache already warm); the rest start fresh on llm[1+] with just their instructions + context.
- Agentic tool loop: the LLM can read, edit, search, and run tasks iteratively with file edits surfaced as diffs.
- Built-in tools: read_file, write_file, edit_file, list_files, search_text (literal or regex), run_task, run_command (in-container only), view_output (paging spilled large outputs), ask_user, web_search, web_read, web_read_raw, launch_subagent, respond. Plus any MCP-server tool, registered as <server-name>__<tool>.
- Synthetic respond terminal tool: the execute and subagent phases expose a respond(message) tool that captures the model's final user-facing reply. The turn ends only when respond is called, so small local models stay inside the tool-calling grammar they handle best instead of having to decide "tool call vs free text" at exit time. Inspired by forge's respond_tool.
- Stuck-tool-loop detection: a model that re-issues the same call gets nudged once, then the next attempt escalates to a thinking-sampler retry instead of spinning indefinitely.
- Stack-aware skills: auto-detected stacks (Go, TypeScript, JavaScript, Java, Bash), runner configs (Makefile → SKILL-makefile.md, justfile → SKILL-justfile.md), and container distro (Alpine/Arch/Debian/Fedora/Ubuntu via /etc/os-release) seed SKILL-*.md files in .codehalter/. SKILL-container.md is always seeded (codehalter assumes containerised execution: .git is bind-mounted read-only, so git push / reset --hard / history rewrites fail at the FS layer). All SKILL files are concatenated into the system prompt on the first turn so they ride along in cached history. Drop your own SKILL-*.md to add conventions for any language.
- Task runner integration: auto-discovers targets from justfile, Makefile, package.json, go.mod, or Cargo.toml and classifies them as build/test/lint/format for the startup report.
- Empty-project bootstrap: fresh directories are flagged; the LLM asks on the first turn which language/runner to scaffold before writing anything.
- Web tools: web_search (DuckDuckGo) returns a list of result titles/URLs/snippets for the LLM to triage; web_read then opens a chosen URL in headless Firefox and returns a summary (or web_read_raw for verbatim text). Restricted to the planning phase so execution stays offline.
- Image support: when the active LLM advertises vision, prompt images are passed through as OpenAI-style content blocks.
- Session persistence: conversations are saved as TOML files under .codehalter/ and can be resumed across editor restarts.
- History compression: older messages are summarised to stay within token budgets. After every turn a background goroutine condenses the user/assistant pair into a six-section structured note (Goal / Constraints / Progress / Decisions / Next Steps / Critical Context) on a free llm[1+] slot; when compaction triggers, the accumulated shadow buffer is installed as the new summary in lieu of a synchronous compress-and-replace pass. The compaction trigger scales to the LLM's discovered n_ctx rather than a hard-coded buffer. Requires ≥2 [[llm]] entries to actually run in parallel; with only one entry configured, the feature self-disables and falls back to the synchronous path.
- Two modes: Interactive (ask before non-trivial actions) and Autopilot (auto-answer prompts, no interruption). Selectable per-session from the Zed mode picker.
- Configurable LLM endpoints: different roles (thinking, execute) can point at different models or servers.
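The respond terminal tool and the bounded tool loop described in the list above can be pictured with a minimal sketch. All names here (Tool, Call, runLoop, scripted, demoTools) are hypothetical and chosen for illustration; this is not codehalter's actual code, only one way a loop could end a turn when a tool flagged as terminal is called, or give up at a turn cap.

```go
package main

import "fmt"

// Tool is a hypothetical tool descriptor; Terminal marks tools that end the turn.
type Tool struct {
	Terminal bool
	Run      func(arg string) string
}

// Call is one tool invocation chosen by the model.
type Call struct {
	Name string
	Arg  string
}

// runLoop asks next for one call per turn and stops when a Terminal tool
// runs or maxTurns is exhausted. ok is false when the cap was hit first.
func runLoop(next func(history []string) Call, tools map[string]Tool, maxTurns int) (reply string, ok bool) {
	var history []string
	for turn := 0; turn < maxTurns; turn++ {
		call := next(history)
		tool, found := tools[call.Name]
		if !found {
			history = append(history, "error: unknown tool "+call.Name)
			continue
		}
		out := tool.Run(call.Arg)
		if tool.Terminal {
			return out, true // the terminal tool's output is the final reply
		}
		history = append(history, call.Name+": "+out)
	}
	return "", false
}

// demoTools returns a stub read_file plus the terminal respond tool.
func demoTools() map[string]Tool {
	return map[string]Tool{
		"read_file": {Run: func(path string) string { return "contents of " + path }},
		"respond":   {Terminal: true, Run: func(msg string) string { return msg }},
	}
}

// scripted replays a fixed list of calls (cycling), standing in for the model.
func scripted(calls ...Call) func([]string) Call {
	i := 0
	return func([]string) Call {
		c := calls[i%len(calls)]
		i++
		return c
	}
}

func main() {
	model := scripted(Call{"read_file", "main.go"}, Call{"respond", "done"})
	reply, ok := runLoop(model, demoTools(), 10)
	fmt.Println(reply, ok) // prints "done true"
}
```

Checking a flag on the tool rather than its name keeps the exit rule uniform if more terminal tools are added later.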
Prerequisites
- Go 1.26+
- An OpenAI-compatible LLM server (e.g. llama.cpp, Ollama, vLLM)
- Firefox, if you want web_search / web_read to work
Build
just build
Or directly:
go build -o codehalter .
Configuration
Codehalter looks for settings in two places, in this order:
- ~/.config/codehalter/settings.toml (global: preferred, used across every project)
- <project>/.codehalter/settings.toml (project-local fallback)
On first run with neither file present, codehalter prompts to write a commented skeleton into <project>/.codehalter/settings.toml. Edit it, then move it to ~/.config/codehalter/ to share across projects (the project-local copy can be deleted once the global one exists).
Example settings
One [[llm]] array. Order matters: llm[0] is the main connection, llm[1+] are extras.
- llm[0]: owns the foreground session's KV cache, so its prefix stays warm across turns. Every plan/execute/verify/document call on the main session lands here. Typically a smaller/faster model.
- llm[1+]: used to fan out launch_subagent tasks in parallel and to host background work routed off llm[0] so its prefix cache isn't evicted. The per-turn structured summariser pins to llm[1]; the .git_commit drafter prefers llm[2+] and falls back to llm[1] only after the summariser finishes.
parallel = N per entry caps how many concurrent LLM calls that entry accepts. The token is held for the duration of one LLM round-trip only, not the lifetime of a subagent — so between LLM calls the slot is free for another caller, and pool size 1 simply serialises everything (no deadlock even when a subagent nests another). For one launch_subagent batch, codehalter fans out using a breadth-first interleave of the configured caps: the first subagent pins to llm[0] (it inherits the parent's full history so the prefix cache stays warm), the next N to llm[1], then llm[2], …; excess tasks queue.
params is forwarded verbatim as the OpenAI request's extra body — put samplers and any model-specific knobs (enable_thinking, reasoning_mode, …) there. Core fields (model, messages, stream, tools) always win over params.
[[llm]]
url = "http://localhost:8080/v1/chat/completions"
model = "qwen3.6-27b"
parallel = 1
params_thinking = { temperature = 1.0, top_p = 0.95, top_k = 20, min_p = 0.0 }
params_execute = { temperature = 0.6, top_p = 0.95, top_k = 20, min_p = 0.0 }
[[llm]]
url = "http://other-host:9001/v1/chat/completions"
model = "qwen3.5-122b"
parallel = 3
params_thinking = { temperature = 1.0, top_p = 0.95, top_k = 20, min_p = 0.0 }
params_execute = { temperature = 0.6, top_p = 0.8, top_k = 20, min_p = 0.0 }
With the caps above (1 + 3) one launch_subagent batch can run up to 4 tasks at once; task 0 pins to llm[0], tasks 1..3 to llm[1], and a 5th task queues on llm[0] until task 0 finishes.
Recommendation: configure at least two [[llm]] entries; three is ideal. Each entry should point at a different physical slot (different llama-server process, different port). With two entries the per-turn structured summariser runs on llm[1] while the user reads the reply; with three the .git_commit drafter on llm[2+] runs in parallel with the summariser instead of queuing behind it. With only one entry every background call would evict llm[0]'s prefix cache, so the background features self-disable and fall back to synchronous paths.
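The parallel = N cap described earlier, where a token is held for one LLM round-trip only, can be sketched with a buffered channel used as a counting semaphore. Slot, NewSlot, Call, and peakConcurrency are hypothetical names for illustration, not codehalter's API.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Slot models one [[llm]] entry; tokens is a counting semaphore sized by parallel = N.
type Slot struct {
	tokens chan struct{}
}

// NewSlot creates a slot that admits at most parallel concurrent round-trips.
func NewSlot(parallel int) *Slot {
	return &Slot{tokens: make(chan struct{}, parallel)}
}

// Call holds a token only while roundTrip runs, then frees it.
func (s *Slot) Call(roundTrip func()) {
	s.tokens <- struct{}{}
	defer func() { <-s.tokens }()
	roundTrip()
}

// peakConcurrency runs callers goroutines against the slot and reports the
// highest number of round-trips that were in flight at the same time.
func peakConcurrency(s *Slot, callers int) int64 {
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Call(func() {
				n := atomic.AddInt64(&inFlight, 1)
				for { // raise peak to n unless it is already at least n
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				atomic.AddInt64(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	s := NewSlot(1)
	// The parent's round-trip finishes (and frees its token) before the
	// nested one starts, so a pool of size 1 serialises without deadlock.
	s.Call(func() { fmt.Println("parent round-trip") })
	s.Call(func() { fmt.Println("nested round-trip") })
	fmt.Println("peak within cap:", peakConcurrency(NewSlot(3), 8) <= 3)
}
```

Because the token is released between round-trips rather than held for a subagent's lifetime, waiting callers can interleave with a long-running subagent instead of starving behind it.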
| Role | Purpose | Suggested temperature |
|---|---|---|
| thinking | Planning, document, history compaction | ~1.0 (diverse hypotheses, edge-case exploration) |
| execute | Tool loop, verify, web-page summarisation | ~0.6 (precise, faithful) |
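The rule that core fields always win over params can be sketched as a two-step map merge. The function buildBody is hypothetical; the field names follow the OpenAI chat-completions request shape mentioned above, and the values are only examples.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildBody copies params into the request body first, then writes the core
// fields so they overwrite any same-named key that params tried to set.
func buildBody(params, core map[string]any) map[string]any {
	body := make(map[string]any, len(params)+len(core))
	for k, v := range params {
		body[k] = v
	}
	for k, v := range core {
		body[k] = v
	}
	return body
}

func main() {
	// params tries to set stream; the core value must win.
	params := map[string]any{"temperature": 0.6, "top_k": 20, "stream": false}
	core := map[string]any{"model": "qwen3.6-27b", "stream": true}
	out, err := json.Marshal(buildBody(params, core))
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```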
Prompt files
On first run, ensureDefaults drops three phase templates into .codehalter/:
| File | Role |
|---|---|
| PLAN.md | Planning-phase instructions (clarity check, info retrieval, subtask JSON schema with per-subtask verify recipe) |
| EXECUTE.md | Execution-phase directives prepended to the user message; instructs the executor to run the verify recipe itself before calling respond |
| DOCUMENT.md | Decides when the change is user-visible enough to update the README, then edits it minimally |
Plus per-stack SKILL-{lang}.md files for any language detected in the project root (go.mod → SKILL-go.md, package.json+tsconfig.json → SKILL-ts.md, plain package.json → SKILL-js.md, pom.xml/build.gradle → SKILL-java.md, *.sh/*.bash files → SKILL-bash.md). Skills are concatenated into the system prompt on the first turn so they ride along in cached history. Designed for smaller local models (Qwen3, Gemma) that need explicit language conventions; larger models can usually have them deleted.
Edit any of them to customize behavior for your project.
Running the LLM server
With llama.cpp:
llama-server -m your-model.gguf --port 8080
With Ollama:
ollama serve
Then adjust the url in your settings to match.
Zed setup
Add to your Zed settings (~/.config/zed/settings.json):
{
  "agent_servers": {
    "Codehalter": {
      "type": "custom",
      "command": "/absolute/path/to/codehalter",
      "args": [],
      "env": {}
    }
  }
}
Open the agent panel (Cmd+? / Ctrl+?), click +, and select "Codehalter". Pick Interactive or Autopilot from the mode selector in the session header.
Sandboxing with a devcontainer
Codehalter edits files and runs build/test commands. Running it inside a devcontainer sandboxes those actions to a throwaway environment, keeps the project's toolchain pinned, and means the same setup works on every machine. Zed launches ACP servers inside the container when you "Reopen in Container," so codehalter, Firefox, and the LLM endpoint must be reachable from there. The startup banner reports Container: devcontainer when this is wired up correctly.
Codehalter scaffolds the devcontainer for you. On the first session in a project without a .devcontainer/ directory, it prompts Debian / Arch / Abort and writes the chosen Dockerfile + devcontainer.json:
- Debian (golang:1-trixie): ca-certificates curl git sudo just make nodejs npm firefox-esr, plus go install of codehalter and gopls into /usr/local/bin.
- Arch (quay.io/archlinux/archlinux:base-devel): pacman-installs git curl just nodejs npm firefox go, same codehalter + gopls install path, plus a makepkg-built yay for AUR packages.
Both seed a devcontainer.json that:
- Bind-mounts ${localWorkspaceFolder}/.git read-only. The agent can git log, git diff, git status, but it cannot git commit, git push, git add, or rewrite refs from inside the container. Commits and pushes are the human's job, run on the host.
- Bind-mounts ${localEnv:HOME}/.gitconfig read-only into the dev user's home, so the in-container git sees your real user.name / user.email / signing key without any per-project setup.
- Bind-mounts ~/.config/codehalter read-only so settings.toml is shared across every project.
- Sets containerEnv.DEVCONTAINER=true so codehalter's startup banner shows Container: devcontainer and the run_command tool is registered.
After the container is built, point Zed at the in-container codehalter binary:
"Codehalter": { "type": "custom", "command": "/usr/local/bin/codehalter", "args": [], "env": {} }
If your LLM server runs on the host, change the url in ~/.config/codehalter/settings.toml to http://host.docker.internal:8080/v1/chat/completions. For an existing language-stack devcontainer (e.g. mcr.microsoft.com/devcontainers/go:1), copy the three mounts and the containerEnv setting above into its devcontainer.json and run go install github.com/tbocek/codehalter@latest in its image; there is no need to start from the Debian/Arch base.
Commits and pushes from inside the container
Because .git is read-only, the agent can never commit or push. When you ask for one, this is what happens:
- A background updater (running on llm[2+] after every turn) has already written a draft message to .codehalter/.git_commit that matches git diff HEAD. The agent regenerates it if missing or visibly stale.
- The agent's respond(...) tells you the exact host-side command, typically: git commit -F .codehalter/.git_commit && git push
- You run it outside the container. On your next prompt, codehalter sees the working tree is clean and deletes .git_commit so the next round starts fresh.
.codehalter/ is gitignored on first bootstrap, so the draft file never accidentally gets staged.
How it works
- Zed spawns codehalter as a subprocess and communicates via JSON-RPC 2.0 over stdio.
- On session start, the agent indexes project files, probes image support on the configured LLM, discovers task runners, and reports which of build/test/lint/format are covered.
- Each prompt runs through the pipeline:
  - Plan: the thinking LLM analyzes the request with read-only tools, gathers external info via web tools if needed, and emits an array of subtasks, each with its own verify recipe. In Interactive mode the user confirms before execution (Execute / Automatic / Cancel).
  - Work: for each subtask the execute LLM runs a single bounded tool-calling loop (≤10 turns) where it edits, runs commands, and then runs the verify recipe itself before calling respond. File edits are shown as diffs and require user approval in Interactive mode.
  - Replan on failure: if a subtask hits the turn cap, the executor returns without calling respond, or a tool exits non-zero, the orchestrator records the failure reason and re-plans (up to 10 replans per prompt). When the same failure recurs (Jaccard over issue words ≥ 0.6), the replan note escalates to "the prior fix didn't work; propose a structurally different approach."
  - Document: fires once at the end of a successful prompt: the thinking LLM checks against DOCUMENT.md whether the change is user-visible (new feature/flag/API/dep, install or config change) and, if so, updates or creates the project README. Routine refactors and bug fixes are skipped.
- Prior subtasks' assistant replies stay in history so later subtasks have context.
- Conversation history is persisted to .codehalter/session_<id>.toml. After every turn a background goroutine on a free llm[1+] slot condenses the user/assistant pair into a six-section structured note that accumulates in a shadow buffer; when message tokens exceed the model's context budget (minus overhead and safety margin), the shadow buffer is installed as the new summary and older messages rotate into an archive file. Falls back to a synchronous summarise pass when only one [[llm]] entry is configured.
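The duplicate-failure check in the replan step (Jaccard over issue words ≥ 0.6) can be illustrated with a minimal sketch. Tokenising by lower-casing and splitting on whitespace is an assumption, as are the names wordSet, jaccard, and sameFailure; only the similarity measure and the 0.6 threshold come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet lower-cases s and returns its distinct whitespace-separated words.
func wordSet(s string) map[string]struct{} {
	set := map[string]struct{}{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the word sets of a and b.
func jaccard(a, b string) float64 {
	setA, setB := wordSet(a), wordSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 1 // two empty descriptions are treated as identical
	}
	inter := 0
	for w := range setA {
		if _, ok := setB[w]; ok {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	return float64(inter) / float64(union)
}

// sameFailure applies the 0.6 threshold to two failure descriptions.
func sameFailure(prev, cur string) bool {
	return jaccard(prev, cur) >= 0.6
}

func main() {
	prev := "test TestParse fails with nil pointer in parser"
	cur := "TestParse fails with nil pointer inside parser"
	// 6 shared words out of 9 distinct words overall.
	fmt.Printf("%.2f %v\n", jaccard(prev, cur), sameFailure(prev, cur)) // prints 0.67 true
}
```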
Compared to forge and pi/coding-agent
Codehalter borrows two building blocks: the synthetic respond terminal tool from forge, and the explicit per-tool Terminal flag from pi/coding-agent (uniform exit contract instead of name-based exit checks). Beyond those, codehalter adds:
- Container-first sandboxing posture: auto-scaffolded Debian or Arch .devcontainer/, host .gitconfig and project .git bind-mounted read-only, plus a .codehalter/.git_commit drafter so commits/pushes are always the human's job (forge and pi don't ship a devcontainer story or a git-write firewall).
- Native MCP client: .codehalter/mcp.toml spawns each [[server]] as a long-lived child, registers its tools under <name>__<tool>, and reconciles on every prompt; gopls is pre-seeded for Go projects.
- Multi-slot parallel LLM architecture for local servers: [[llm]] array with per-conn parallel = N semaphores, breadth-first subagent pinning, and background work (per-turn summariser on llm[1], git-commit drafter on llm[2+]) routed off the foreground KV-cache slot so the main session's prefix cache stays warm.
- Plan → per-subtask self-verifying loop → Document pipeline built in, not delegated to an extension: the planner decomposes into subtasks each carrying its own verify recipe; the executor runs the recipe itself before declaring done; failed subtasks trigger a replan (≤10 per prompt) with fuzzy duplicate-failure detection (Jaccard over issue words) escalating the note when the same problem recurs; the document pass updates the README only when the change is user-visible.
- Big-task decomposition with a once-per-plan Interactive / Automatic / Cancel choice; the same gate re-fires on each replan so the user sees the new approach before it runs.
- Web tools restricted to the planning phase (web_search via DuckDuckGo + web_read[_raw] via headless Firefox) so execution stays offline.
- ACP / Zed-native integration with file-edit diffs, permission prompts, and per-session mode switching from Zed's UI.
- Empty-project bootstrap, stuck-tool-loop detection, dynamic n_ctx-aware compaction trigger, bench harness with isolated devcontainer per test.
What codehalter does NOT have that those projects emphasise: forge's published 26-scenario eval suite with ablation studies (codehalter has a bench harness but no published eval set), pi's JSONL session trees with in-place branching, pi's 30+ provider adapters and TypeScript-based extensions, pi's @-mention fuzzy file references.
Credits
- forge by Antoine Zambelli: the synthetic respond terminal tool used in execute/subagent is borrowed from forge's playbook for keeping small local models inside structured tool-calling grammar.
- pi/coding-agent by earendil-works: the explicit per-tool Terminal flag (uniform tool-loop exit contract instead of name-based exit checks) is inspired by pi's terminate: true tool-result signal.
- Clemens, for exchanging ideas on the code harness.