wasm

command

Version: v1.58.0. Published: May 11, 2026. License: Apache-2.0.

Repository

github.com/docker/docker-agent

README

docker-agent in the browser (js/wasm)

cmd/wasm is a GOOS=js GOARCH=wasm entry point that exposes a thin slice of docker-agent — config parsing and a single-round streaming chat — to a JavaScript host (a browser tab or Node).

It is not a port of the full agent. It is a proof-of-concept for the "realistic plan" outlined when we surveyed which parts of docker-agent could even be cross-compiled to wasm. See the Limits section below.

Build

# Compile.
GOOS=js GOARCH=wasm go build -o cmd/wasm/web/docker-agent.wasm ./cmd/wasm

# Copy the matching wasm_exec.js shim from the Go toolchain (its API is
# tied to the compiler version and must match the binary).
cp "$(go env GOROOT)/lib/wasm/wasm_exec.js" cmd/wasm/web/wasm_exec.js

The output is a ~75 MB .wasm file. It includes the entire YAML parser and three LLM provider clients (OpenAI, Anthropic, Google) with their dependencies. Stripping symbols with -ldflags="-s -w" and post-processing with wasm-opt, or building with TinyGo, might roughly halve it, but that figure is an estimate: we have not optimised the size.

Run (Node smoke test)

node cmd/wasm/smoke_test.js

Prints the parsed config of a small two-agent YAML and exits 0. This proves that the Go runtime starts under wasm, that globalThis.dockerAgent gets registered, and that parseConfig returns the expected shape.
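
The "expected shape" check can be sketched in a few lines of JavaScript. This is a hypothetical helper, not the contents of smoke_test.js; the field names follow the parseConfig example in the JavaScript API section, and hasExpectedShape is a name invented for illustration.

```javascript
// Hypothetical shape check; hasExpectedShape is not part of docker-agent.
function hasExpectedShape(cfg) {
  return (
    cfg !== null &&
    typeof cfg === "object" &&
    typeof cfg.version === "string" &&
    Array.isArray(cfg.agents) &&
    cfg.agents.every((a) => typeof a.name === "string" && typeof a.model === "string") &&
    cfg.models !== null &&
    typeof cfg.models === "object" &&
    !Array.isArray(cfg.models)
  );
}

// Sample object copied from the documented parseConfig output.
const sampleConfig = {
  version: "2",
  agents: [{ name: "root", model: "openai/gpt-4o-mini", instruction: "hi" }],
  models: { "openai/gpt-4o-mini": { provider: "openai", model: "gpt-4o-mini" } },
};

console.log(hasExpectedShape(sampleConfig)); // true
console.log(hasExpectedShape({ version: "2" })); // false (no agents, no models)
```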

Run (browser demo)

The demo page implements OpenRouter's PKCE OAuth flow:

1. Serve cmd/wasm/web/ over HTTP (WebAssembly.instantiateStreaming needs the application/wasm MIME type, so file:// won't work):
   ```
   cd cmd/wasm/web && python3 -m http.server 8765
   ```
2. Open http://localhost:8765/.
3. Click Sign in with OpenRouter. You are redirected to openrouter.ai, where you log in and approve the app, and are then sent back to the page. The page exchanges the ?code= for a user-controlled API key (PKCE / S256) and stores it in localStorage.
4. Pick a free model from the dropdown (the YAML textarea updates automatically) and click Run. Streaming completion deltas appear in the output box.
5. To revoke the key, click Sign out on the page (this clears the local copy), visit https://openrouter.ai/settings/keys to revoke it server-side, or do both.

Why this works in a browser

Most LLM providers block direct browser fetches because anyone could read the API key out of the Network tab. OpenRouter solves this by issuing per-app, per-user keys via PKCE — the user owns the key, can revoke it, and the app never sees a master credential.

We verified the relevant CORS posture before shipping:

$ curl -is -X OPTIONS https://openrouter.ai/api/v1/auth/keys \
    -H "Origin: http://localhost:8765" \
    -H "Access-Control-Request-Method: POST"
HTTP/2 204
access-control-allow-origin: *
access-control-allow-headers: Authorization,...,Content-Type,...

Both /api/v1/auth/keys (token exchange) and /api/v1/chat/completions (inference) return access-control-allow-origin: * with Authorization in the allowed headers. No proxy needed.
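
The same check can be expressed as a small predicate over the preflight response headers. This is a sketch with a hypothetical helper name; the header values in the usage lines are made-up samples, not the full list OpenRouter returns.

```javascript
// Hypothetical helper: does a preflight response allow a browser request
// from `origin` that carries an Authorization header?
// `headers` is a plain object with lowercase header names.
function preflightAllowsAuth(headers, origin) {
  const allowOrigin = headers["access-control-allow-origin"] || "";
  const allowHeaders = (headers["access-control-allow-headers"] || "")
    .split(",")
    .map((h) => h.trim().toLowerCase());
  const originOk = allowOrigin === "*" || allowOrigin === origin;
  // Authorization must be listed explicitly; a "*" wildcard in
  // Access-Control-Allow-Headers does not cover it.
  return originOk && allowHeaders.includes("authorization");
}

const sampleHeaders = {
  "access-control-allow-origin": "*",
  "access-control-allow-headers": "Authorization, Content-Type",
};
console.log(preflightAllowsAuth(sampleHeaders, "http://localhost:8765")); // true
console.log(preflightAllowsAuth({ "access-control-allow-origin": "*" }, "http://localhost:8765")); // false
```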

What the YAML looks like

OpenRouter exposes an OpenAI-compatible API, so it slots in as a custom provider:

providers:
  openrouter:
    provider: openai
    base_url: https://openrouter.ai/api/v1
    token_key: OPENROUTER_API_KEY
agents:
  root:
    model: openrouter/meta-llama/llama-3.3-70b-instruct:free
    instruction: |
      You are a helpful assistant ...

When the user clicks Run, the page passes the stored key as env.OPENROUTER_API_KEY to dockerAgent.chat(...), the existing pkg/model/provider/openai reads it via env.Get(ctx, cfg.TokenKey), and the Go HTTP transport (mapped to fetch under js/wasm) sends the request. Same code path the CLI uses — no special browser-only branch.

Bring-your-own-key fallback

The <details> block on the page lets advanced users paste OpenAI / Anthropic / Gemini keys directly. Anthropic accepts direct browser calls when the request carries the anthropic-dangerous-direct-browser-access header, so it works from a tab. OpenAI and Gemini still block CORS for browser origins, so those fields are mostly useful against a self-hosted proxy.

JavaScript API

Once the wasm boots, two functions are exported on globalThis.dockerAgent:

`parseConfig(yamlString) -> object`

Synchronous. Loads the YAML through pkg/config.Load (so all the version upgraders run) and returns a small JS-shaped projection:

dockerAgent.parseConfig(`
version: "2"
agents:
  root:
    model: openai/gpt-4o-mini
    instruction: hi
`);
// =>
// {
//   version: "2",
//   agents: [{ name: "root", model: "openai/gpt-4o-mini", instruction: "hi", ... }],
//   models: { "openai/gpt-4o-mini": { provider: "openai", model: "gpt-4o-mini" } }
// }

Throws a JS Error on invalid YAML / unsupported version / failed validation.
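
Because parseConfig throws, a caller that validates YAML on every keystroke may prefer a result object over an exception. A minimal sketch, assuming nothing beyond the documented behaviour; tryParseConfig and the stub agent are hypothetical and exist only so the example runs without the wasm binary.

```javascript
// Hypothetical wrapper: converts the thrown Error into a result object.
function tryParseConfig(agent, yaml) {
  try {
    return { ok: true, config: agent.parseConfig(yaml) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Stand-in for globalThis.dockerAgent; the real one is registered by the wasm.
const stubAgent = {
  parseConfig(yaml) {
    if (!yaml.includes("agents:")) throw new Error("no agents defined");
    return { version: "2", agents: [], models: {} };
  },
};

console.log(tryParseConfig(stubAgent, "agents:\n  root: {}").ok); // true
console.log(tryParseConfig(stubAgent, "version: 2").error); // no agents defined
```

In the page, the first argument would be globalThis.dockerAgent once the wasm has booted.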

`chat({yaml, agentName?, env?, messages}, onEvent) -> Promise`

Asynchronous. Loads the config, picks the agent (or the only one), builds its model provider, and opens one streaming chat completion.

yaml — the YAML document, same as for parseConfig.
agentName — required if the config defines more than one agent.
env — { OPENAI_API_KEY: "...", ANTHROPIC_API_KEY: "...", GEMINI_API_KEY: "..." }. Whatever your model needs.
messages — array of {role, content} objects, OpenAI-style. The agent's instruction is automatically prepended as a system message if you haven't already supplied one.
onEvent(ev) — called from Go for every stream event:
- {type: "delta", content?: string, reasoning?: string}
- {type: "finish", reason: string}

Resolves to {message: {role, content, reasoning, finish}} once the stream ends. Rejects with an Error on any failure.
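
A caller typically accumulates the deltas as they arrive. The sketch below handles the two event types listed above; createCollector is a hypothetical helper, and the "stop" finish reason in the simulated stream is an assumed value, not one the source documents.

```javascript
// Hypothetical accumulator for "delta" and "finish" events.
function createCollector() {
  const state = { content: "", reasoning: "", finish: null };
  function onEvent(ev) {
    if (ev.type === "delta") {
      if (ev.content) state.content += ev.content;
      if (ev.reasoning) state.reasoning += ev.reasoning;
    } else if (ev.type === "finish") {
      state.finish = ev.reason;
    }
  }
  return { state, onEvent };
}

// Simulated stream standing in for the events Go would emit.
const collector = createCollector();
[
  { type: "delta", reasoning: "thinking " },
  { type: "delta", content: "Hel" },
  { type: "delta", content: "lo" },
  { type: "finish", reason: "stop" },
].forEach(collector.onEvent);

console.log(collector.state.content); // Hello
console.log(collector.state.finish); // stop
```

With the real API, collector.onEvent would be passed as the second argument to dockerAgent.chat(...).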

Limits

These are not bugs to fix; they are direct consequences of GOOS=js:

No tools, no MCP, no hooks, no sub-agent handoffs. Anything that needs os/exec or local file I/O is excluded from the build.
No sessions. pkg/session and pkg/memory/database/sqlite pull in modernc.org/libc which does not have a js port. The browser caller is responsible for keeping the message history.
No fallbacks, no rule-based routing. The rule-based router uses bleve, which depends on mmap / file-locking primitives that don't exist on js/wasm. A js-only factory_js.go swaps the full provider factory for a slim variant with only openai / anthropic / google.
No Docker Model Runner, no Bedrock, no Vertex AI. Same reason — dmr shells out, bedrock and vertexai pull in cloud SDKs that don't cross-compile to wasm.
No Docker Desktop integration. pkg/desktop has stubs for js that return empty paths and refuse to dial.
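CORS. As noted above, OpenAI and Gemini block browser origins, so a real deployment against those providers needs a proxy. OpenRouter, and Anthropic with its opt-in header, work directly from a tab.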
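
Since there are no sessions, the caller has to carry the conversation between calls. A minimal sketch of such a history holder follows; ChatHistory is a hypothetical class, and the request and result shapes are the ones documented for chat in the JavaScript API section.

```javascript
// Hypothetical caller-side history; docker-agent's wasm build keeps no state.
class ChatHistory {
  constructor() {
    this.messages = [];
  }
  addUser(content) {
    this.messages.push({ role: "user", content });
    return this;
  }
  // `result` is the value the chat promise resolves to: {message: {role, content, ...}}.
  addAssistant(result) {
    const { role, content } = result.message;
    this.messages.push({ role, content });
    return this;
  }
  // Builds the first argument for dockerAgent.chat(...).
  toRequest(yaml, env) {
    return { yaml, env, messages: this.messages.slice() };
  }
}

const history = new ChatHistory();
history.addUser("hello");
history.addAssistant({ message: { role: "assistant", content: "hi there", reasoning: "", finish: "stop" } });
history.addUser("and again");
console.log(history.toRequest("agents: {}", {}).messages.length); // 3
```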

Where the cross-compilation work lives

The shims that make the existing tree compile under GOOS=js GOARCH=wasm are intentionally tiny:

| File | Purpose |
| --- | --- |
| `pkg/cache/lock_js.go` | No-op file-lock stubs (single-threaded js). |
| `pkg/desktop/sockets_js.go` | Returns empty Docker Desktop paths. |
| `pkg/desktop/connection_js.go` | Refuses Unix-socket / named-pipe dials. |
| `pkg/desktop/connection_other.go` | Build tag updated to `!windows && !js`. |
| `pkg/model/provider/factory.go` | Build tag added: `!js`. |
| `pkg/model/provider/factory_js.go` | js-only provider dispatch (openai / anthropic / google). |

Everything else compiles unchanged because docker-agent already had the os/exec, sandbox, sound, audio, server, browser, keyring code well-isolated behind their own packages — the wasm entry just doesn't import them.

Sanity check

# Native build still happy.
go build ./...

# Wasm build still happy.
GOOS=js GOARCH=wasm go build -o /tmp/cagent.wasm ./cmd/wasm

# Existing tests of the touched packages still pass.
go test ./pkg/cache/... ./pkg/desktop/... ./pkg/model/provider/...

# End-to-end runtime smoke test.
node cmd/wasm/smoke_test.js

Documentation

Rendered for js/wasm

Overview

Package main is a js/wasm entry point that exposes docker-agent's agentic loop to the browser. It provides:

Config parsing (any version → latest via pkg/config).
Agent enumeration.
A full agentic loop: streaming chat with tool calling, multi-agent handoffs, fallback models, hooks, and remote MCP tool support.
Request cancellation via abort().

The binary is built with:

GOOS=js GOARCH=wasm go build -o web/docker-agent.wasm ./cmd/wasm

and loaded by web/index.html via wasm_exec.js.

What this entry point does NOT do:

Run shell commands (no os/exec under js/wasm).
Open files or listen on sockets.
Persist sessions to sqlite.
Render a TUI.

What it DOES do:

Full tool-calling loop (model requests tools → execute → return results).
Multi-agent handoffs (transfer_task, handoff).
Remote MCP tool support (HTTP/SSE transport, NOT stdio).
Hooks (session_start, turn_start, pre_tool_use, post_tool_use).
Fallback models with retry + backoff.
Request cancellation via abort().
Token usage reporting.

The JS API (registered on `globalThis.dockerAgent`):

dockerAgent.parseConfig(yamlString) -> {version, agents, models}
dockerAgent.listAgents(yamlString) -> [{name, model, description, instruction}]
dockerAgent.chat({yaml, agentName?, env?, messages}, onEvent) -> Promise<{message, usage?}>
dockerAgent.abort() -> void   // cancels any in-flight chat request
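
One use for abort() is a client-side timeout. The sketch below is hypothetical: chatWithTimeout is an invented helper, and it assumes that calling abort() makes the pending chat promise settle (here, reject), which the source does not state explicitly. The fake agent stands in for globalThis.dockerAgent.

```javascript
// Hypothetical helper: cancels the in-flight chat if it runs longer than `ms`.
function chatWithTimeout(agent, request, onEvent, ms) {
  const timer = setTimeout(() => agent.abort(), ms);
  // Clear the timer whether the chat resolves or rejects.
  return agent.chat(request, onEvent).finally(() => clearTimeout(timer));
}

// Fake agent whose chat never finishes on its own.
// ASSUMPTION: abort() causes the pending promise to reject.
const fakeAgent = {
  chat() {
    return new Promise((_, reject) => {
      this.rejectPending = reject;
    });
  },
  abort() {
    this.rejectPending(new Error("aborted"));
  },
};

chatWithTimeout(fakeAgent, { yaml: "", messages: [] }, () => {}, 10)
  .catch((err) => console.log("chat ended:", err.message)); // chat ended: aborted
```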

`onEvent` is called with one of:

{type: "delta",  content?: string, reasoning?: string}
{type: "tool_call", id: string, name: string, args: string}
{type: "tool_call_delta", id: string, name: string, arguments: string}
{type: "tool_result", id: string, name: string, output: string}
{type: "tool_blocked", id: string, name: string, reason: string}
{type: "handoff", from: string, to: string}
{type: "fallback", from: string, to: string, attempt: number, reason: string}
{type: "finish", reason: string}
{type: "usage",  input_tokens: number, output_tokens: number}
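
A handler for these events can be a single switch. The sketch below turns each event into a log line; describeEvent is a hypothetical helper, the field names are taken from the list above, and the output format is arbitrary.

```javascript
// Hypothetical formatter covering every event type listed above.
function describeEvent(ev) {
  switch (ev.type) {
    case "delta":
      return ev.content || "";
    case "tool_call":
      return `[tool ${ev.name} called with ${ev.args}]`;
    case "tool_call_delta":
      return ""; // partial arguments; a UI might stream these instead
    case "tool_result":
      return `[tool ${ev.name} returned ${ev.output}]`;
    case "tool_blocked":
      return `[tool ${ev.name} blocked: ${ev.reason}]`;
    case "handoff":
      return `[handoff ${ev.from} -> ${ev.to}]`;
    case "fallback":
      return `[fallback ${ev.from} -> ${ev.to}, attempt ${ev.attempt}: ${ev.reason}]`;
    case "finish":
      return `[finished: ${ev.reason}]`;
    case "usage":
      return `[tokens in=${ev.input_tokens} out=${ev.output_tokens}]`;
    default:
      return ""; // ignore event types this sketch does not know
  }
}

console.log(describeEvent({ type: "handoff", from: "root", to: "helper" })); // [handoff root -> helper]
console.log(describeEvent({ type: "usage", input_tokens: 12, output_tokens: 34 })); // [tokens in=12 out=34]
```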
