Runtime Wrapper
Standardised metric collection across pi, claude, and codex runtimes.
Each runtime emits structured JSONL when invoked in non-interactive mode, but with different event schemas. The runtime wrapper normalises these into a common SessionResult type (defined in packages/core/src/runtime.ts) so the daemon, skill-evolve, and any other consumer can work uniformly.
Interfaces
See packages/core/src/runtime.ts for the full TypeScript types.
The key types:
- SessionResult: the main output (usage, cost, tool calls, file operations, turns, status)
- RuntimeAdapter: the interface each runtime implements, parse(lines, meta) → SessionResult
- TokenUsage / CostBreakdown: normalised token and cost accounting
- ToolCall: name, args, result, status, duration
- Turn: one model round-trip with a per-turn usage breakdown
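For orientation, here is a minimal sketch of how these types could fit together. It is not a copy of runtime.ts: every field name below is an assumption inferred from the Field Mapping Summary in this document, and RunMeta, turnDetails, and emptyUsage are hypothetical additions for illustration.

```typescript
// Illustrative sketch only: field names are assumptions inferred from the
// field mapping in this document; packages/core/src/runtime.ts is authoritative.
interface TokenUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  reasoning: number; // only codex reports reasoning tokens separately
}

interface CostBreakdown {
  total: number | null; // USD; null when no cost could be read or computed
}

interface ToolCall {
  name: string;
  args: unknown;
  result?: unknown;
  status: "ok" | "error";
  durationMs?: number;
}

interface Turn {
  usage: TokenUsage; // usage for this single model round-trip
  toolCalls: ToolCall[];
}

interface SessionResult {
  sessionId: string | null;
  model: string | null;
  status: "success" | "error";
  exitCode: number | null;
  usage: TokenUsage;
  cost: CostBreakdown;
  turns: number;
  turnDetails?: Turn[]; // hypothetical: only when per-turn usage exists (pi, codex)
  toolCalls: ToolCall[];
  fileChanges: string[];
  text: string;
}

// Hypothetical metadata the wrapper knows even when the events do not say it.
interface RunMeta {
  exitCode: number | null;
  model?: string; // codex never echoes the model, so the caller supplies it
}

interface RuntimeAdapter {
  parse(lines: string[], meta: RunMeta): SessionResult;
}

// Zeroed usage, usable as the starting accumulator in every adapter.
function emptyUsage(): TokenUsage {
  return { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, reasoning: 0 };
}
```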
Invocation Modes
| Runtime | Command | Output mode flag | Event format |
|---|---|---|---|
| pi | pi -p | --mode json | Newline-delimited JSON events |
| claude | claude -p | --output-format json (summary) or --output-format stream-json (streaming) | Single JSON object or newline-delimited JSON events |
| codex | codex exec | --experimental-json | Newline-delimited JSON events |
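A sketch of assembling the argv for each row of the table. The flags come from the table; placing the prompt as the final positional argument, the buildArgv name, and the claudeStream switch are assumptions. One extra flag is included: Claude Code has refused --output-format stream-json in print mode unless --verbose is also passed, so check whether your installed version still needs it.

```typescript
// Sketch: argv for each runtime's non-interactive JSON mode, using the flags
// from the table above. Passing the prompt as a trailing positional argument
// is an assumption; confirm against each CLI's --help.
type Runtime = "pi" | "claude" | "codex";

function buildArgv(runtime: Runtime, prompt: string, claudeStream = true): string[] {
  switch (runtime) {
    case "pi":
      return ["pi", "-p", "--mode", "json", prompt];
    case "claude":
      return claudeStream
        ? // Claude Code has rejected stream-json in print mode without --verbose;
          // drop the flag if your installed version does not need it.
          ["claude", "-p", "--output-format", "stream-json", "--verbose", prompt]
        : ["claude", "-p", "--output-format", "json", prompt];
    case "codex":
      return ["codex", "exec", "--experimental-json", prompt];
    default: {
      // Exhaustiveness check: adding a runtime without a case is a type error.
      const unreachable: never = runtime;
      throw new Error(`unknown runtime: ${String(unreachable)}`);
    }
  }
}
```

Keeping argv as an array (rather than a shell string) means the prompt is never re-parsed by a shell, and the array can be passed directly to Bun.spawn(argv, { stdout: "pipe" }).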
Event Schemas
Pi (--mode json)
Pi emits a rich event stream with lifecycle events and per-message usage:
{type: "session", version, id, timestamp, cwd}
{type: "agent_start"}
{type: "turn_start"}
{type: "message_start", message: {role: "user", content: [...]}}
{type: "message_end", message: {role: "user", content: [...]}}
{type: "message_start", message: {role: "assistant", content: [...], model, usage, stopReason}}
... message_update events (text deltas) ...
{type: "message_end", message: {role: "assistant", ..., usage: {input, output, cacheRead, cacheWrite, totalTokens, cost: {input, output, cacheRead, cacheWrite, total}}}}
{type: "tool_execution_start", toolCallId, toolName, args}
{type: "tool_execution_end", toolCallId, toolName, result, isError}
{type: "message_start", message: {role: "toolResult", content: [...]}}
{type: "message_end", message: {role: "toolResult", ...}}
{type: "turn_end", message: {role: "assistant", ..., usage}, toolResults: [...]}
{type: "agent_end", messages: [...]}

| Metric | Location |
|---|---|
| Session ID | session.id |
| Model | message.model (on assistant messages) |
| Input tokens | message.usage.input |
| Output tokens | message.usage.output |
| Cache read tokens | message.usage.cacheRead |
| Cache write tokens | message.usage.cacheWrite |
| Total tokens | message.usage.totalTokens |
| Cost (total) | message.usage.cost.total |
| Cost (breakdown) | message.usage.cost.{input,output,cacheRead,cacheWrite} |
| Stop reason | message.stopReason |
| Tool name | tool_execution_start.toolName |
| Tool args | tool_execution_start.args |
| Tool result | tool_execution_end.result |
| Tool error | tool_execution_end.isError |
| Turns | Count turn_start / turn_end pairs |
| Text output | Concatenate text content blocks from assistant messages |
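The table above can be turned into a single pass over the pi event stream. This sketch assumes the event shapes listed earlier in this section and that text lives in { type: "text", text } content blocks; parsePi and PiSummary are hypothetical names. Two choices are worth calling out: usage is summed only at assistant message_end events, because turn_end carries the same assistant message again, and cost is summed per message on the assumption that it is per-message like the token counts (the Field Mapping Summary instead takes the last value, which is only equivalent if pi's cost is cumulative).

```typescript
// Sketch of a pi --mode json parser covering the metrics in the table above.
// The content-block shape ({ type: "text", text }) is an assumption.
interface PiUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
  cost?: { total?: number };
}

interface PiMessage {
  role: string;
  model?: string;
  usage?: PiUsage;
  content?: { type: string; text?: string }[];
}

interface PiEvent {
  type: string;
  id?: string;
  message?: PiMessage;
  toolCallId?: string;
  toolName?: string;
  args?: unknown;
  result?: unknown;
  isError?: boolean;
}

interface PiSummary {
  sessionId: string | null;
  model: string | null;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  costTotal: number;
  turns: number;
  toolCalls: { name: string; args: unknown; result: unknown; isError: boolean }[];
  text: string;
}

function parsePi(lines: string[]): PiSummary {
  const s: PiSummary = {
    sessionId: null, model: null, input: 0, output: 0, cacheRead: 0,
    cacheWrite: 0, costTotal: 0, turns: 0, toolCalls: [], text: "",
  };
  const started = new Map<string, { name: string; args: unknown }>(); // keyed by toolCallId

  for (const line of lines) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line) as PiEvent;
    const m = ev.message;
    if (ev.type === "session") {
      s.sessionId = ev.id ?? null;
    } else if (ev.type === "message_end" && m !== undefined && m.role === "assistant") {
      // Sum at message_end only: turn_end repeats the assistant message,
      // so counting it as well would double every token.
      if (s.model === null) s.model = m.model ?? null;
      s.input += m.usage?.input ?? 0;
      s.output += m.usage?.output ?? 0;
      s.cacheRead += m.usage?.cacheRead ?? 0;
      s.cacheWrite += m.usage?.cacheWrite ?? 0;
      // Assumption: cost is per message, like the token counts, so it is summed.
      s.costTotal += m.usage?.cost?.total ?? 0;
      for (const block of m.content ?? []) {
        if (block.type === "text" && block.text) s.text += block.text;
      }
    } else if (ev.type === "turn_end") {
      s.turns += 1;
    } else if (ev.type === "tool_execution_start" && ev.toolCallId) {
      started.set(ev.toolCallId, { name: ev.toolName ?? "", args: ev.args });
    } else if (ev.type === "tool_execution_end") {
      const start = ev.toolCallId ? started.get(ev.toolCallId) : undefined;
      s.toolCalls.push({
        name: start?.name ?? ev.toolName ?? "",
        args: start?.args,
        result: ev.result,
        isError: ev.isError === true,
      });
    }
  }
  return s;
}
```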
Claude (--output-format json)
Claude's JSON mode returns a single summary object after completion:
{
"type": "result",
"subtype": "success",
"is_error": false,
"duration_ms": 2956,
"duration_api_ms": 2890,
"num_turns": 1,
"result": "Hello!",
"stop_reason": "end_turn",
"session_id": "...",
"total_cost_usd": 0.07,
"usage": {
"input_tokens": 5,
"output_tokens": 8,
"cache_creation_input_tokens": 9984,
"cache_read_input_tokens": 14857,
"server_tool_use": {"web_search_requests": 0}
},
"modelUsage": {
"claude-opus-4-7[1m]": {
"inputTokens": 5,
"outputTokens": 8,
"cacheReadInputTokens": 14857,
"cacheCreationInputTokens": 9984,
"costUSD": 0.07
}
}
}

| Metric | Location |
|---|---|
| Session ID | session_id |
| Model | First key in modelUsage |
| Input tokens | usage.input_tokens |
| Output tokens | usage.output_tokens |
| Cache read tokens | usage.cache_read_input_tokens |
| Cache write tokens | usage.cache_creation_input_tokens |
| Cost (total) | total_cost_usd |
| Cost (per-model) | modelUsage[model].costUSD |
| Duration | duration_ms |
| API duration | duration_api_ms |
| Turns | num_turns |
| Stop reason | stop_reason |
| Text output | result |
| Tool calls | Not in JSON mode — need stream-json for per-tool detail |
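A sketch of mapping the summary object onto normalised fields, using the locations in the table. parseClaudeJson and ClaudeJsonMetrics are hypothetical names, and treating any subtype other than "success" as an error is an assumption.

```typescript
// Sketch: map the single claude --output-format json summary object onto
// normalised fields. Field names come from the example above.
interface ClaudeJsonResult {
  subtype?: string;
  is_error?: boolean;
  duration_ms?: number;
  num_turns?: number;
  result?: string;
  stop_reason?: string;
  session_id?: string;
  total_cost_usd?: number;
  usage?: {
    input_tokens?: number;
    output_tokens?: number;
    cache_creation_input_tokens?: number;
    cache_read_input_tokens?: number;
  };
  modelUsage?: Record<string, { costUSD?: number }>;
}

interface ClaudeJsonMetrics {
  sessionId: string | null;
  model: string | null;
  status: "success" | "error";
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  costUsd: number | null;
  durationMs: number | null;
  turns: number;
  stopReason: string | null;
  text: string;
}

function parseClaudeJson(stdout: string): ClaudeJsonMetrics {
  const r = JSON.parse(stdout) as ClaudeJsonResult;
  // Non-numeric keys keep their source order, so models[0] is the first key
  // as printed. modelUsage may list several models, in which case "first key"
  // is only a heuristic for the main one.
  const models = Object.keys(r.modelUsage ?? {});
  return {
    sessionId: r.session_id ?? null,
    model: models.length > 0 ? models[0] : null,
    status: r.subtype === "success" && r.is_error !== true ? "success" : "error",
    input: r.usage?.input_tokens ?? 0,
    output: r.usage?.output_tokens ?? 0,
    cacheRead: r.usage?.cache_read_input_tokens ?? 0,
    cacheWrite: r.usage?.cache_creation_input_tokens ?? 0,
    costUsd: r.total_cost_usd ?? null,
    durationMs: r.duration_ms ?? null,
    turns: r.num_turns ?? 0,
    stopReason: r.stop_reason ?? null,
    text: r.result ?? "",
  };
}
```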
Note: Claude's --output-format json gives aggregated metrics only. For per-tool-call detail (what meta-harness uses), --output-format stream-json emits per-event JSONL with assistant and user (tool_result) events, similar to the Anthropic Messages API format.
Claude (--output-format stream-json)
Streaming mode emits events matching the Anthropic Messages API shape:
{type: "system", ...}
{type: "assistant", message: {content: [{type: "text", text: "..."} | {type: "tool_use", id, name, input}], usage: {input_tokens, output_tokens, ...}}}
{type: "user", message: {content: [{type: "tool_result", tool_use_id, content, is_error}]}}
{type: "result", session_id, total_cost_usd, usage: {...}}

This is richer than JSON mode: each assistant message has individual content blocks that can be text or tool_use, and the following user message carries the matching tool_result. The final result event carries the summary.
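The tool_use/tool_result pairing described above can be sketched as follows, matching tool_use.id against tool_result.tool_use_id. The names are hypothetical, and the Array.isArray guard is a defensive assumption: it skips user events whose content is a plain string rather than a list of blocks.

```typescript
// Sketch: pair tool_use blocks from assistant events with the tool_result
// blocks in the following user events. Other block types are skipped.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: unknown; is_error?: boolean };

interface StreamEvent {
  type: string;
  session_id?: string;
  total_cost_usd?: number;
  message?: { content?: Block[] | string };
}

interface PairedToolCall {
  id: string;
  name: string;
  input: unknown;
  result?: unknown;
  isError: boolean;
}

function parseClaudeStream(lines: string[]) {
  const calls = new Map<string, PairedToolCall>(); // insertion order = call order
  let text = "";
  let sessionId: string | null = null;
  let costUsd: number | null = null;

  for (const line of lines) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line) as StreamEvent;
    const content = ev.message?.content;
    const blocks: Block[] = Array.isArray(content) ? content : [];
    if (ev.type === "assistant") {
      for (const b of blocks) {
        if (b.type === "text") text += b.text;
        else if (b.type === "tool_use") {
          calls.set(b.id, { id: b.id, name: b.name, input: b.input, isError: false });
        }
      }
    } else if (ev.type === "user") {
      for (const b of blocks) {
        if (b.type !== "tool_result") continue;
        const call = calls.get(b.tool_use_id);
        if (call) {
          call.result = b.content;
          call.isError = b.is_error === true;
        }
      }
    } else if (ev.type === "result") {
      // The final result event carries the run-level summary.
      sessionId = ev.session_id ?? null;
      costUsd = ev.total_cost_usd ?? null;
    }
  }
  return { sessionId, costUsd, text, toolCalls: Array.from(calls.values()) };
}
```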
Codex (exec --experimental-json)
Codex emits a thread/item event model (the event and item types are defined in its TypeScript SDK):
{type: "thread.started", thread_id: "..."}
{type: "turn.started"}
{type: "item.started", item: {id, type: "agent_message" | "command_execution" | "file_change" | "mcp_tool_call" | ...}}
{type: "item.updated", item: {...}}
{type: "item.completed", item: {...}}
{type: "turn.completed", usage: {input_tokens, cached_input_tokens, output_tokens, reasoning_output_tokens}}

Item types (from the SDK's items.ts):
| Item type | Description | Key fields |
|---|---|---|
| agent_message | Text response | text |
| reasoning | Chain-of-thought | text |
| command_execution | Shell command | command, aggregated_output, exit_code, status |
| file_change | File patch | changes: [{path, kind}], status |
| mcp_tool_call | MCP tool invocation | server, tool, arguments, result, error, status |
| web_search | Web search | query |
| todo_list | Agent's task list | items: [{text, completed}] |
| error | Non-fatal error | message |

| Metric | Location |
|---|---|
| Session ID | thread.started.thread_id |
| Model | Not in events (passed as config) |
| Input tokens | turn.completed.usage.input_tokens |
| Output tokens | turn.completed.usage.output_tokens |
| Cache tokens | turn.completed.usage.cached_input_tokens |
| Reasoning tokens | turn.completed.usage.reasoning_output_tokens |
| Cost | Not in events (must compute from token counts + pricing) |
| Turns | Count turn.started / turn.completed pairs |
| Stop reason | Infer from last event type |
| Tool calls | item.completed where item.type === "command_execution" or "mcp_tool_call" |
| File changes | item.completed where item.type === "file_change" |
| Text output | Concatenate item.completed where item.type === "agent_message" |
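The tool-call, file-change, and text rows above all come from item.completed events, so one pass can sort them into buckets. This is a sketch over the fields in the item table; the bucket shape and the "shell" and "server/tool" names given to tool calls are hypothetical conventions, not values codex emits.

```typescript
// Sketch: bucket codex item.completed events per the item table.
interface CodexItem {
  id: string;
  type: string;
  text?: string;
  command?: string;
  exit_code?: number | null;
  status?: string;
  changes?: { path: string; kind: string }[];
  server?: string;
  tool?: string;
  arguments?: unknown;
}

interface CodexBuckets {
  toolCalls: { name: string; status: string; args: unknown }[];
  fileChanges: { path: string; kind: string }[];
  messages: string[]; // agent_message texts, in order
}

function bucketCodexItems(lines: string[]): CodexBuckets {
  const out: CodexBuckets = { toolCalls: [], fileChanges: [], messages: [] };
  for (const line of lines) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line) as { type: string; item?: CodexItem };
    // item.started / item.updated are progress snapshots; only completed items are final.
    if (ev.type !== "item.completed" || !ev.item) continue;
    const item = ev.item;
    switch (item.type) {
      case "agent_message":
        out.messages.push(item.text ?? "");
        break;
      case "command_execution":
        out.toolCalls.push({
          name: "shell", // hypothetical label for shell commands
          status: item.status ?? "unknown",
          args: { command: item.command, exitCode: item.exit_code },
        });
        break;
      case "mcp_tool_call":
        out.toolCalls.push({
          name: `${item.server ?? "?"}/${item.tool ?? "?"}`, // hypothetical naming
          status: item.status ?? "unknown",
          args: item.arguments,
        });
        break;
      case "file_change":
        for (const c of item.changes ?? []) out.fileChanges.push(c);
        break;
      default:
        break; // reasoning, web_search, todo_list, error: not collected here
    }
  }
  return out;
}
```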
- No cost data in events: cost must be computed externally from token counts and model pricing
- No explicit model field in events: the model is passed as config and not echoed back
- reasoning_output_tokens is a distinct field (OpenAI counts reasoning tokens separately)
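Because cost and model are missing from the stream, the wrapper has to total the turn.completed usage itself and price it against a table it maintains. In this sketch, PRICES contains a hypothetical model name and placeholder prices, not real rates. The cost formula also assumes the OpenAI API convention that input_tokens already includes cached_input_tokens and output_tokens already includes reasoning_output_tokens; if codex reports them as disjoint counts, the arithmetic must change.

```typescript
// Sketch: sum codex turn.completed usage and estimate cost.
interface CodexUsage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
  reasoning_output_tokens: number;
}

interface Price {
  input: number; // USD per million uncached input tokens
  cachedInput: number; // USD per million cached input tokens
  output: number; // USD per million output tokens
}

// Hypothetical model name and placeholder prices; fill from real pricing.
const PRICES: Record<string, Price> = {
  "example-model": { input: 1.0, cachedInput: 0.1, output: 8.0 },
};

function sumCodexUsage(lines: string[]): CodexUsage & { turns: number } {
  const total = {
    input_tokens: 0, cached_input_tokens: 0, output_tokens: 0, reasoning_output_tokens: 0, turns: 0,
  };
  for (const line of lines) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line) as { type: string; usage?: Partial<CodexUsage> };
    if (ev.type !== "turn.completed") continue;
    total.turns += 1;
    total.input_tokens += ev.usage?.input_tokens ?? 0;
    total.cached_input_tokens += ev.usage?.cached_input_tokens ?? 0;
    total.output_tokens += ev.usage?.output_tokens ?? 0;
    total.reasoning_output_tokens += ev.usage?.reasoning_output_tokens ?? 0;
  }
  return total;
}

function estimateCostUsd(u: CodexUsage, model: string): number | null {
  const p = PRICES[model];
  if (p === undefined) return null; // unknown model: leave cost unset rather than guess
  // Assumption: cached tokens are a subset of input_tokens and reasoning tokens
  // a subset of output_tokens, so each token is billed exactly once.
  const uncached = Math.max(0, u.input_tokens - u.cached_input_tokens);
  return (uncached * p.input + u.cached_input_tokens * p.cachedInput + u.output_tokens * p.output) / 1e6;
}
```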
Field Mapping Summary
How each runtime field maps to the common SessionResult:
| SessionResult field | Pi | Claude (json) | Codex |
|---|---|---|---|
| sessionId | session.id | session_id | thread.started.thread_id |
| model | message.model | modelUsage key | config input |
| status | infer from agent_end | subtype | infer from exit |
| exitCode | process exit | process exit | process exit |
| usage.input | Σ message.usage.input | usage.input_tokens | Σ turn.completed.usage.input_tokens |
| usage.output | Σ message.usage.output | usage.output_tokens | Σ turn.completed.usage.output_tokens |
| usage.cacheRead | Σ message.usage.cacheRead | usage.cache_read_input_tokens | Σ turn.completed.usage.cached_input_tokens |
| usage.cacheWrite | Σ message.usage.cacheWrite | usage.cache_creation_input_tokens | — |
| usage.reasoning | — | — | Σ turn.completed.usage.reasoning_output_tokens |
| cost.total | last message.usage.cost.total | total_cost_usd | computed |
| durationMs | compute from timestamps | duration_ms | compute from timestamps |
| turns | count turn_end events | num_turns | count turn.completed events |
| toolCalls | tool_execution_{start,end} | stream-json tool_use blocks | command_execution + mcp_tool_call items |
| fileChanges | infer from tool args | infer from tool args | file_change items (native) |
| text | assistant text blocks | result | agent_message items |
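The status row is the least mechanical mapping. One possible reading, sketched here: claude's subtype wins when present, otherwise the exit code decides, and pi additionally requires an agent_end event. The Status values, the StatusSignals shape, and the pi rule are assumptions; "success", "error_max_turns", and "error_during_execution" are subtypes claude's result event has been observed to emit, and any other subtype falls through to "error".

```typescript
// Sketch of the status row. Status set and the pi rule are assumptions.
type Runtime = "pi" | "claude" | "codex";
type Status = "success" | "error" | "max_turns" | "unknown"; // hypothetical set

interface StatusSignals {
  claudeSubtype?: string; // result.subtype from claude json / stream-json
  sawAgentEnd?: boolean; // pi emitted agent_end before exiting
}

function inferStatus(runtime: Runtime, exitCode: number | null, signals: StatusSignals): Status {
  // claude reports the outcome directly, so prefer it over the exit code.
  if (runtime === "claude" && signals.claudeSubtype !== undefined) {
    if (signals.claudeSubtype === "success") return "success";
    if (signals.claudeSubtype === "error_max_turns") return "max_turns";
    return "error";
  }
  if (exitCode === null) return "unknown"; // e.g. killed by a signal
  if (exitCode !== 0) return "error";
  // Assumption: a clean exit with no agent_end means pi's stream was cut short.
  if (runtime === "pi" && signals.sawAgentEnd !== true) return "error";
  return "success";
}
```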
Implementation Plan
Each runtime gets an adapter that implements RuntimeAdapter:
packages/core/src/
runtime.ts ← interfaces (done)
adapters/
pi.ts ← parses pi --mode json output
claude.ts ← parses claude --output-format json (or stream-json)
codex.ts ← parses codex exec --experimental-json output

The daemon's executePipeline currently does a raw Bun.spawn with stdout written to stdout.log. It would instead:

- Spawn the runtime with the appropriate JSON output flag
- Collect the JSONL lines
- Pass them through the runtime's adapter to produce a SessionResult
- Store the SessionResult for querying, logging, and skill-evolve
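The "collect the JSONL lines" step needs care because stdout arrives in arbitrary chunks that can split a line, or even a multi-byte UTF-8 character, in two. A minimal line buffer, sketched under the assumption that the daemon reads the child's stdout as a byte stream; JsonlLineBuffer and the wiring in the trailing comment are hypothetical, not existing daemon code.

```typescript
// Sketch: turn raw stdout chunks into complete JSONL lines for an adapter.
// The unfinished tail is kept until the next chunk or flush().
class JsonlLineBuffer {
  private partial = "";
  private readonly decoder = new TextDecoder();

  push(chunk: Uint8Array | string): string[] {
    // stream: true holds back an incomplete multi-byte character.
    this.partial += typeof chunk === "string" ? chunk : this.decoder.decode(chunk, { stream: true });
    const parts = this.partial.split("\n");
    this.partial = parts.pop() ?? ""; // last piece has no newline yet
    return parts.filter((line) => line.trim() !== "");
  }

  flush(): string[] {
    const tail = this.partial + this.decoder.decode(); // drain any buffered bytes
    this.partial = "";
    return tail.trim() !== "" ? [tail] : [];
  }
}

// Assumed wiring in the daemon on Bun (argv, adapter and meta are hypothetical):
//   const proc = Bun.spawn(argv, { stdout: "pipe" });
//   const buf = new JsonlLineBuffer();
//   const lines: string[] = [];
//   for await (const chunk of proc.stdout) lines.push(...buf.push(chunk));
//   lines.push(...buf.flush());
//   const result = adapter.parse(lines, { exitCode: await proc.exited });
```

Buffering everything and parsing once keeps each adapter a pure function of (lines, meta), so it can be tested against captured output; a streaming adapter would only pay off if the daemon needs live progress.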
References
- Pi source: --mode json event format
- Claude Code docs: --output-format options
- Codex SDK: events.ts and items.ts type definitions
- Meta-Harness claude_wrapper.py: reference for stream-json parsing
- packages/core/src/runtime.ts: the interfaces
- docs/skill-evolve.md: primary consumer of session metrics

