MCP Protocol Surface

Overview

The daemon currently exposes itself over MCP as a thin RPC layer: a handful of tools (create_workload, submit_invocation, get_invocation, cancel_invocation, close_workload) plus agent and skill descriptors as resources. This is a CLI-shaped MCP server — every interaction is a request/response tool call, and the host has to poll to observe long-running work.

That's a fraction of what the protocol gives us. MCP is a session in which the server is a peer that can hold state, push updates, prompt the user directly, and call back into the host's model. For a system whose core primitive is remote skill execution in a sandbox, those affordances aren't decorative — they map onto things we already need.

This document is a roadmap for the protocol surface we want, ranked by leverage. Concrete shape, not implementation order; sequencing is the last section.

Where we are today

Current MCP exposure (see services/daemon/src/mcp/):

Tools: create_workload, submit_invocation, get_invocation, list_invocations, cancel_invocation, close_workload, list_workloads. Sync request/response over the wire.
Resources: each agent (agent://reviewer, agent://ghostwriter) and skill (skill://review-repo, etc.) is exposed read-only as a markdown resource. Discovery works; no subscriptions, no parameterization.
Prompts, sampling, elicitation, progress, roots, completion: unused.

The polling loop is the most visible cost. A skill that takes 90 seconds becomes 10–30 get_invocation tool calls from the host, each consuming context budget.

The richer surface

Tier 1 — architectural shifts

Sampling — skills borrow the host's brain

Today, a skill that needs an LLM mid-execution either ships its own model credentials or doesn't use one. Sampling inverts this: the skill, mid-run, asks the daemon to call back into the host's LLM via sampling/createMessage. The host's model, the host's quota, the daemon orchestrating.

Why it matters:

review-repo, address-pr-comments, ghostwriter all bake in a model today. They shouldn't have to.
Skills become model-agnostic and portable across hosts (Claude Code, Cursor, Zed, custom).
Eliminates a class of secret-management problem (no per-skill API keys).
Cost attribution moves to the user invoking the skill, which is the right place for it.

Caveat: not every host implements sampling. Claude Desktop does; Code support is uneven. Plan for fallback to the skill's configured runtime when sampling isn't available.

Elicitation — structured human-in-the-loop

address-pr-comments is already a gated loop: for each comment the user picks apply / wontfix / defer / discuss. Right now this is implemented by round-tripping through tool results and prompt-engineered text. With elicitation, the daemon directly issues elicitation/create to the host with a JSON schema, and the host renders a typed form. The pause-and-ask becomes a first-class protocol move, not emergent behavior.

Skills that benefit:

address-pr-comments — the per-comment decision
create-draft — inline edits/approvals
Any new "review and confirm" pattern (deploys, migrations, destructive ops)

Sandbox-as-URI-namespace via subscribable resources

Every observable inside a sandboxed workload becomes a resource:

workload://{id}                          # high-level state
workload://{id}/status                   # subscribable
workload://{id}/logs                     # subscribable, tail in real time
workload://{id}/invocations              # list
workload://{id}/invocations/{iid}        # invocation state, subscribable
workload://{id}/invocations/{iid}/result # produced artifact
workload://{id}/fs/{path}                # sandbox filesystem, lazy read
workload://{id}/artifacts/{name}         # named outputs

Clients call resources/subscribe; the daemon emits notifications/resources/updated as state changes. The host stops polling, the agent stops burning tokens watching, the user gets live state. This is the model the spec wants us in.

Resource templates make the URI shape discoverable:

{
  uriTemplate: "workload://{workload_id}/invocations/{invocation_id}",
  name: "Invocation state",
  mimeType: "application/json"
}

Tier 2 — clear wins, smaller surface

Prompts as the user-facing skill surface

Tools are for things the agent should decide to invoke. Prompts are for things the user should be able to invoke. Skills are mostly the latter.

Today: submit_invocation(skill="review-repo", payload="...") — the agent has to know to call this tool and how to shape the payload.

With prompts: each skill exposes itself via prompts/list with typed arguments:

{
  name: "review-repo",
  description: "Review a repository...",
  arguments: [
    { name: "url",   description: "Repo URL or local path", required: true },
    { name: "focus", description: "Optional focus area",     required: false }
  ]
}

The host surfaces /review-repo {url} {focus?} as a slash command. Same execution path underneath; vastly better discoverability. Tools stay for things like cancel_invocation that the agent should reach for autonomously.

Progress notifications

Long-running skills emit notifications/progress with progressToken against the originating request. step 3/7: cloning repo… flows to the host UI natively. Free UX, no protocol gymnastics.

Cancellation propagation

When the host issues notifications/cancelled for an in-flight request, the daemon must SIGTERM the sandbox process, mark the workload's invocation cancelled, and clean up. Today, hitting Esc in the host probably leaves the runtime running until TTL or natural completion.

Resource links in tool results

When a skill finishes, return a resource_link to workload://{id}/invocations/{iid}/result instead of inlining the artifact. Hosts that support it render inline; ones that don't follow the link. Cheaper on the wire, and the result stays addressable for re-fetching later (e.g. by another skill in the same workload).

Tier 3 — speculative but fun

MCP UI for a workload dashboard

A ui://workloads resource (HTML in the mcp-app profile) renders a live dashboard inside the host's iframe: active workloads, queue depth, recent results, log tails. Far more useful than text when 5 things are running. Already in the wild — Slack and Excalidraw both ship ui:// resources we can study.

Roots → sandbox mounts

Hosts declare workspace roots via roots/list. The daemon uses those to decide what the sandbox is allowed to see at mount time. Roots become the bridge between "what the user is working on in their editor" and "what the sandbox filesystem boundary exposes."

Argument completion

completion/complete for skill prompt arguments — repo URLs, ticket IDs, PR numbers. Pulled from the daemon's KB or recent workload history. Makes prompts feel like real CLI commands with tab-completion.

Mapping to daemon concepts

Cross-referencing workload-architecture.md:

Daemon concept	Current MCP shape	Richer MCP shape
Workload	Created by `create_workload` tool, observed by `list_workloads`	`workload://{id}` resource (subscribable). Created by tool, observed via subscription.
Invocation	Submitted by `submit_invocation`, polled via `get_invocation`	Submitted by tool or prompt invocation. Observed via `workload://{id}/invocations/{iid}` subscription + progress notifications.
Skill	`skill://{name}` markdown resource	Same resource + prompt entry with typed args + (optional) sampling client during execution.
Agent	`agent://{name}` markdown resource	Same. Agents are persona configs; they don't need their own protocol surface.
Result	Returned in `get_invocation` payload	`resource_link` to `workload://{id}/invocations/{iid}/result`. Re-fetchable, embeddable.
User decision point in skill	Prompt-engineered text in tool result	`elicitation/create` with JSON schema.
Skill log line	Not exposed	Resource update on `workload://{id}/invocations/{iid}/logs`.
Cancellation	`cancel_invocation` tool	Same tool plus `notifications/cancelled` honored.

Sequencing

Build order, by dependency and leverage:

Subscribable workload/invocation resources + progress notifications. Smallest change, biggest UX delta. Foundation for everything else (logs, artifacts, sandbox-as-namespace). No host-support concerns — resources and progress are widely supported.
Resource links in invocation results + log resources. Falls out of (1). Makes results addressable.
Cancellation propagation. Plumbing fix, not a new surface, but it's a correctness gap and the resources from (1) make verification easy.
Prompts for skills. User-facing win. Doesn't disturb existing tool callers (the tool API remains for agentic invocation).
Elicitation. Refactor address-pr-comments first as the proving ground. Other gated skills follow.
Sampling. Highest ceiling but largest unknowns — host coverage, fallback behavior, cost attribution UX. Pilot on one skill (ghostwriter is a good candidate: small, LLM-only, no sandbox integration).
MCP UI dashboard, roots, completion. Polish layer. Tackle once the core surface is stable.

Non-goals

We are not redesigning the underlying workload/invocation/skill model. The MCP surface is a projection of those concepts; the internal model in services/daemon stays as designed in workload-architecture.md.
We are not committing to support every MCP host equally. Sampling and elicitation degrade gracefully where unsupported; the tool-based fallback path stays functional.
We are not exposing raw sandbox shell access as a resource. workload://{id}/fs/{path} is read-only and scoped; mutations go through skills.