Ask Apple Intelligence something it can’t handle and watch what happens. It delegates to ChatGPT. You get a brief “Working with ChatGPT…” indicator, then results come back. Claude Code delegates to subagents. Linear lets you @mention agents that run entire sessions with their own lifecycle. Stagehand wraps an internal LLM loop behind MCP tools for browser automation. OpenClaw’s ACPX delegates to external coding harnesses like Claude Code and Codex.

Everyone keeps building the same thing. Nobody has standardized it. I think MCP should, not as a new protocol, but as an extension to the one that already has distribution.

“Just ship your tools”

I work on an investment platform. Think Google Workspace for investors: backtesting, portfolio analysis, scenario modeling, risk assessment, trade execution. Over 150 commands in the CLI. The obvious MCP integration is to expose each command as a tool and let the client orchestrate.

The pushback I get from my own team is some version of this: just ship the CLI as the MCP server, or ship agent skills that teach the client how to use our tools. I tried both.

A user wants to “find the top AI news stories, run scenario analysis on each, and backtest the results.” With tools exposed directly, the client has to figure out the right sequence: search news, extract themes, create scenarios for each story, run backtests against benchmarks, compare risk-adjusted returns. That’s dozens of tool calls across a domain the client doesn’t deeply understand.

With delegation, the user sends one message. Our agent (running in a sandbox with the CLI, Python, pandas, our APIs) handles the sequencing internally. It creates nine tilts (3 scenarios × 3 stories), backtests each against SPY, compares Sharpe ratios and max drawdowns, and recommends the best rotation candidate with reasoning. One delegation, one result.
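To make the contrast concrete, here is roughly what that single delegation looks like on the wire. This is a sketch of my own server's shape; the tool name `delegate` and the field names are assumptions, not a published standard:

```python
# Hypothetical delegation payload: the client sends intent, not a call
# sequence. Field names are illustrative.
delegation = {
    "tool": "delegate",
    "arguments": {
        "message": (
            "Find the top AI news stories, run scenario analysis on each, "
            "and backtest the results against SPY."
        ),
        "session_id": None,  # None starts a fresh session
    },
}

# The server answers with handles, not data; results stream back as the
# subagent works through the backtests internally.
handles = {"run_id": "run_01", "session_id": "sess_01", "status": "pending"}
```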

Cloudflare arrived at a similar insight. Their API has thousands of endpoints. Instead of exposing each as an MCP tool, they built Code Mode: two tools (search and execute) that let the model write JavaScript against their API inside a sandboxed Worker. They collapsed the entire API surface into a sandbox execution environment. Different shape, same instinct: the client shouldn’t see the full tool surface.

When the client orchestrates via tools (even with great skills), the server sees a sequence of independent function calls. There’s no trace of which sequences worked. If the user says “no, remove the tech stocks,” that correction goes to the client, not to us. Their thumbs-up goes to Anthropic.

When we own the execution loop, we capture the full decision trace at two levels. The outer trace records the user’s intent and our agent’s final output. The inner trace records every tool call, every intermediate result, every reasoning step inside the sandbox. We can tune our prompts based on real usage. We can improve the skills that users actually invoke. That feedback stays in our system.
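A minimal sketch of the two trace levels, with illustrative field names (nothing here is from a spec, just the shape of what we record):

```python
from dataclasses import dataclass, field


@dataclass
class InnerStep:
    """Inner trace: one event inside the sandbox."""
    kind: str      # "tool_call", "result", or "reasoning"
    payload: str


@dataclass
class RunTrace:
    """Outer trace: user intent in, final output out."""
    user_intent: str
    final_output: str = ""
    steps: list[InnerStep] = field(default_factory=list)  # inner trace


trace = RunTrace(user_intent="Backtest AI-news scenario tilts against SPY")
trace.steps.append(InnerStep("tool_call", "cli backtest --tilt tech-up --benchmark SPY"))
trace.steps.append(InnerStep("reasoning", "tech-up tilt looks best risk-adjusted; verify drawdown"))
trace.final_output = "Recommend the tech-up rotation (best risk-adjusted return)"
```

A user correction ("no, remove the tech stocks") lands as another delegation in the same session, so it shows up in the outer trace right next to the run it corrects.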

Skills shipped to users are skills you lose visibility into. You can’t improve what you can’t observe. This isn’t just a business case: vendors who see their agent’s traces make better agents, and better subagents make the whole ecosystem better.

The same contract, reinvented everywhere

Linear’s Agent SDK lets anyone build agents that participate as workspace members. When you @mention an agent, Linear creates an agent session with a defined lifecycle: pending → active → awaitingInput → complete / error / stale. The agent emits typed activities (thought, action, elicitation, response, error) that Linear displays inline. It’s well-designed. It’s also completely proprietary to Linear.

Stagehand is an MCP server for browser automation. When you tell it to “fill out the contact form,” you’re not orchestrating selectors. Stagehand’s internal AI figures out the page structure, adapts to layout changes, and executes. OpenClaw’s ACPX takes it further: it delegates to external coding harnesses like Claude Code, Codex, and Cursor through the Agent Client Protocol, with session resume and permission handling. But ACP is scoped to code editors.

I built my own version for the investment platform. Five MCP tools: delegate, get_session, get_run, cancel_run, close_session. A session/run lifecycle with progress events streamed via SSE. Claude Code picks it up naturally and treats the delegation exactly like one of its own subagents, except this subagent runs on our infrastructure.
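For reference, the five tools in MCP's tool-definition shape (`name` / `description` / `inputSchema`). The schemas are simplified from what I actually run; treat them as illustrative:

```python
# Simplified MCP tool definitions; session and run IDs are opaque strings.
STRING = {"type": "string"}

TOOLS = [
    {
        "name": "delegate",
        "description": "Send natural-language intent to the subagent.",
        "inputSchema": {
            "type": "object",
            "properties": {"message": STRING, "session_id": STRING},
            "required": ["message"],
        },
    },
    {
        "name": "get_session",
        "description": "Fetch a session and its run history.",
        "inputSchema": {
            "type": "object",
            "properties": {"session_id": STRING},
            "required": ["session_id"],
        },
    },
    {
        "name": "get_run",
        "description": "Fetch a run's status and output so far.",
        "inputSchema": {
            "type": "object",
            "properties": {"run_id": STRING},
            "required": ["run_id"],
        },
    },
    {
        "name": "cancel_run",
        "description": "Stop a running subagent cleanly.",
        "inputSchema": {
            "type": "object",
            "properties": {"run_id": STRING},
            "required": ["run_id"],
        },
    },
    {
        "name": "close_session",
        "description": "End a session and release its context.",
        "inputSchema": {
            "type": "object",
            "properties": {"session_id": STRING},
            "required": ["session_id"],
        },
    },
]
```

A client that doesn't know the delegation pattern still sees five ordinary, well-described tools, which is what makes the extension backwards-compatible.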

Every one of these implements the same contract: delegate intent, stream progress, return results, cancel, manage sessions. Nobody coordinated. Everyone arrived here.

Extend MCP, don’t replace it

The obvious counterargument: isn’t this just A2A? The Agent-to-Agent protocol (originally Google, now a Linux Foundation project) was designed for exactly this use case.

I think A2A has the right intuition and the wrong strategy. It’s a new protocol. Nobody uses it. It doesn’t pave cow paths.

MCP has millions of daily active clients. Claude Code, Cursor, Windsurf, Claude Desktop, OpenAI’s desktop app. The extension mechanism already exists, with official extensions for OAuth and Apps (inline UI). This is how HTTP became REST’s transport: it was already everywhere.

ACP explicitly reuses MCP’s JSON representations, but the subagent pattern isn’t code-editor-specific. It applies to any domain where a specialist agent can do better work than the client orchestrating tools. Scoping it to editors is a missed opportunity.

A subagents extension to MCP would be backwards-compatible. Clients that don’t support it see regular tools. Clients that do get session management, progress streaming, and cancellation. I have this running in production today.

MCP Subagents Protocol

What the extension standardizes

The extension would standardize four things:

Delegation. A schema for sending intent to a subagent. Input: a message (natural language), an optional session ID for continuation, optional context references. Output: a run ID and session ID. The message is the whole point. It’s intent, not a function call.

Session lifecycle. Create, resume, close. Sessions maintain continuity across multiple delegations. The subagent remembers prior context, so follow-up questions work without re-explaining the whole task.

Progress streaming. Status changes and intermediate output streamed to the client. Linear’s activity types are a good reference: thought, action, response, error. Clients can display these to users or use them for their own reasoning.

Cancellation. Stop a running subagent cleanly.
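Put together, the four pieces fit in a few types. Everything below is a proposal sketch: the state names borrow Linear's lifecycle vocabulary and the activity names follow the reference set above, but none of it is an existing spec.

```python
from enum import Enum


class RunStatus(Enum):
    # Session/run lifecycle states (borrowing Linear's vocabulary).
    PENDING = "pending"
    ACTIVE = "active"
    AWAITING_INPUT = "awaitingInput"
    COMPLETE = "complete"
    ERROR = "error"
    STALE = "stale"


class ActivityType(Enum):
    # Progress-stream event types.
    THOUGHT = "thought"
    ACTION = "action"
    RESPONSE = "response"
    ERROR = "error"


# One progress event as it might stream to the client mid-run.
event = {
    "type": ActivityType.ACTION.value,
    "run_id": "run_01",
    "text": "Running backtest 4 of 9 against SPY",
    "run_status": RunStatus.ACTIVE.value,
}
```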

What it does NOT standardize: the agent runtime, the model, the tools, the billing. The subagent is opaque. The server picks its own model, runs its own tools, manages its own token costs. The client doesn’t care, the same way you don’t care which model Apple Intelligence delegates to when it hands off to ChatGPT. Authentication and billing happen at the MCP server level, same as today.

One thing that surprised me: shared connectors might be a non-issue. The outer agent has access to email, calendars, files. The subagent doesn’t need direct access to those. The client mediates, extracting what’s relevant and passing it as input to the delegation. This is actually better for privacy.

Composition happens naturally, too. Our subagent has an askClarification skill that just asks a question in natural language when it needs more input. Claude Code receives this, recognizes it as a question for the user, and maps it to its own AskUserQuestion tool. No mechanical wiring. The models just do the right thing.
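A sketch of that clarification round-trip. `askClarification` is the skill named above; the event shape and field names are assumptions:

```python
def ask_clarification(question: str) -> dict:
    # The subagent pauses its run and emits an elicitation-style progress
    # event instead of a result.
    return {
        "type": "elicitation",
        "question": question,
        "run_status": "awaitingInput",
    }


event = ask_clarification("Which benchmark: SPY or an equal-weight index?")
# The client sees a natural-language question, routes it to the user (Claude
# Code maps it onto its own AskUserQuestion tool), then continues by sending
# the answer as a new delegate call with the same session_id.
```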

If this extension existed, Linear could say “we support MCP subagents” and instantly work with every MCP server that implements the protocol, instead of maintaining a proprietary agent SDK. Stagehand wouldn’t need to hand-roll session management. OpenClaw wouldn’t need ACPX. Every team building bespoke agent delegation protocols could implement against a single spec. Not tools-as-functions, but tools-as-agents.

If you’re on the MCP working group, I’d be interested in working on a formal extension proposal. What am I missing?


Process note: Drafted from internal demos and architecture docs. Used AI for transcription and structural editing. All links and implementations reference production systems I built or reviewed. Mistakes are mine.