Runtime Wrapper

Standardised metric collection across pi, claude, and codex runtimes.

Each runtime emits structured JSON when invoked in non-interactive mode (a JSONL event stream, or a single summary object for claude's --output-format json), but each uses a different event schema. The runtime wrapper normalises these into a common SessionResult type (defined in packages/core/src/runtime.ts) so that the daemon, skill-evolve, and any other consumer can treat all runtimes uniformly.

Interfaces

See packages/core/src/runtime.ts for the full TypeScript types.

The key types:

SessionResult — the main output: usage, cost, tool calls, file operations, turns, status
RuntimeAdapter — interface each runtime implements: parse(lines, meta) → SessionResult
TokenUsage / CostBreakdown — normalised token and cost accounting
ToolCall — name, args, result, status, duration
Turn — one model round-trip with per-turn usage breakdown
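
For orientation, the shapes described above might look roughly like the following sketch. It is inferred only from the field names used in this document, not copied from packages/core/src/runtime.ts, so every property name and optionality marker is an assumption. `addUsage` is a hypothetical helper added purely for illustration.

```typescript
// Hypothetical sketch. The authoritative definitions live in
// packages/core/src/runtime.ts and may differ from what is shown here.
interface TokenUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  reasoning?: number; // only Codex reports reasoning tokens separately
}

interface ToolCall {
  name: string;
  args: unknown;
  result?: unknown;
  status: string;
  durationMs?: number;
}

interface SessionResult {
  sessionId?: string;
  model?: string;
  status: string;
  exitCode?: number;
  usage: TokenUsage;
  cost: { total: number };
  durationMs?: number;
  turns: number; // assumed to be a count; the real type may hold Turn objects
  toolCalls: ToolCall[];
  fileChanges: string[];
  text: string;
}

interface RuntimeAdapter {
  // `meta` carries what the events omit, e.g. the model for Codex (assumed shape).
  parse(lines: string[], meta: { model?: string }): SessionResult;
}

// Hypothetical helper: adds two usage records field by field.
function addUsage(a: TokenUsage, b: TokenUsage): TokenUsage {
  return {
    input: a.input + b.input,
    output: a.output + b.output,
    cacheRead: a.cacheRead + b.cacheRead,
    cacheWrite: a.cacheWrite + b.cacheWrite,
    reasoning: (a.reasoning ?? 0) + (b.reasoning ?? 0),
  };
}
```

A helper like `addUsage` is what the Σ entries in the field mapping table would reduce to when an adapter folds per-message or per-turn usage into one session total.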

Invocation Modes

Runtime	Command	Output mode flag	Event format
pi	`pi -p`	`--mode json`	Newline-delimited JSON events
claude	`claude -p`	`--output-format json` (summary) or `--output-format stream-json` (streaming)	Single JSON object or newline-delimited JSON events
codex	`codex exec --experimental-json`	Built-in	Newline-delimited JSON events
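
As an illustration of the table above, the argument vector for each runtime could be built as in the sketch below. The flags come from the table. The function name `buildCommand`, the position of the prompt argument, and the `streaming` switch are assumptions made for this example.

```typescript
type Runtime = "pi" | "claude" | "codex";

// Builds an argv array for a non-interactive run with JSON output.
// Where the prompt goes in each command line is an assumption.
function buildCommand(runtime: Runtime, prompt: string, streaming = true): string[] {
  switch (runtime) {
    case "pi":
      return ["pi", "-p", prompt, "--mode", "json"];
    case "claude":
      // "json" yields one summary object, "stream-json" yields per-event lines.
      return ["claude", "-p", prompt, "--output-format", streaming ? "stream-json" : "json"];
    case "codex":
      return ["codex", "exec", "--experimental-json", prompt];
  }
}
```

Some Claude Code versions reject stream-json in print mode unless --verbose is also passed, so this is worth checking against the installed CLI before relying on it.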

Event Schemas

Pi (`--mode json`)

Pi emits a rich event stream with lifecycle events and per-message usage:

{type: "session", version, id, timestamp, cwd}
{type: "agent_start"}
{type: "turn_start"}
{type: "message_start", message: {role: "user", content: [...]}}
{type: "message_end",   message: {role: "user", content: [...]}}
{type: "message_start", message: {role: "assistant", content: [...], model, usage, stopReason}}
  ... message_update events (text deltas) ...
{type: "message_end",   message: {role: "assistant", ..., usage: {input, output, cacheRead, cacheWrite, totalTokens, cost: {input, output, cacheRead, cacheWrite, total}}}}
{type: "tool_execution_start", toolCallId, toolName, args}
{type: "tool_execution_end",   toolCallId, toolName, result, isError}
{type: "message_start", message: {role: "toolResult", content: [...]}}
{type: "message_end",   message: {role: "toolResult", ...}}
{type: "turn_end", message: {role: "assistant", ..., usage}, toolResults: [...]}
{type: "agent_end", messages: [...]}

Key fields for metric extraction:

Metric	Location
Session ID	`session.id`
Model	`message.model` (on assistant messages)
Input tokens	`message.usage.input`
Output tokens	`message.usage.output`
Cache read tokens	`message.usage.cacheRead`
Cache write tokens	`message.usage.cacheWrite`
Total tokens	`message.usage.totalTokens`
Cost (total)	`message.usage.cost.total`
Cost (breakdown)	`message.usage.cost.{input,output,cacheRead,cacheWrite}`
Stop reason	`message.stopReason`
Tool name	`tool_execution_start.toolName`
Tool args	`tool_execution_start.args`
Tool result	`tool_execution_end.result`
Tool error	`tool_execution_end.isError`
Turns	Count `turn_start` / `turn_end` pairs
Text output	Concatenate `text` content blocks from assistant messages
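
The extraction rules in this table can be sketched as a single pass over the event lines. This is a minimal illustration, not the adapter itself: `summarisePi` and `PiMetrics` are hypothetical names, the `{type: "text", text}` content block shape is assumed, and usage is read only from assistant `message_end` events on the assumption that the copies on `message_start` and `turn_end` describe the same message.

```typescript
interface PiToolCall {
  id: string;
  name: string;
  args: unknown;
  result?: unknown;
  isError?: boolean;
}

interface PiMetrics {
  sessionId?: string;
  model?: string;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  costTotal: number;
  turns: number;
  toolCalls: PiToolCall[];
  text: string;
}

function summarisePi(lines: string[]): PiMetrics {
  const m: PiMetrics = {
    input: 0, output: 0, cacheRead: 0, cacheWrite: 0,
    costTotal: 0, turns: 0, toolCalls: [], text: "",
  };
  for (const line of lines) {
    if (line.trim() === "") continue;
    const ev = JSON.parse(line);
    if (ev.type === "session") {
      m.sessionId = ev.id;
    } else if (ev.type === "turn_end") {
      m.turns += 1;
    } else if (ev.type === "message_end" && ev.message?.role === "assistant") {
      const u = ev.message.usage ?? {};
      m.model = ev.message.model ?? m.model;
      m.input += u.input ?? 0;
      m.output += u.output ?? 0;
      m.cacheRead += u.cacheRead ?? 0;
      m.cacheWrite += u.cacheWrite ?? 0;
      // The field mapping takes cost.total from the last assistant message.
      m.costTotal = u.cost?.total ?? m.costTotal;
      for (const block of ev.message.content ?? []) {
        if (block.type === "text") m.text += block.text;
      }
    } else if (ev.type === "tool_execution_start") {
      m.toolCalls.push({ id: ev.toolCallId, name: ev.toolName, args: ev.args });
    } else if (ev.type === "tool_execution_end") {
      // Pair the end event with its start event via toolCallId.
      const call = m.toolCalls.find((c) => c.id === ev.toolCallId);
      if (call) {
        call.result = ev.result;
        call.isError = ev.isError;
      }
    }
  }
  return m;
}
```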

Claude (`--output-format json`)

Claude's JSON mode returns a single summary object after completion:

{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "duration_ms": 2956,
  "duration_api_ms": 2890,
  "num_turns": 1,
  "result": "Hello!",
  "stop_reason": "end_turn",
  "session_id": "...",
  "total_cost_usd": 0.07,
  "usage": {
    "input_tokens": 5,
    "output_tokens": 8,
    "cache_creation_input_tokens": 9984,
    "cache_read_input_tokens": 14857,
    "server_tool_use": {"web_search_requests": 0}
  },
  "modelUsage": {
    "claude-opus-4-7[1m]": {
      "inputTokens": 5,
      "outputTokens": 8,
      "cacheReadInputTokens": 14857,
      "cacheCreationInputTokens": 9984,
      "costUSD": 0.07
    }
  }
}

Key fields for metric extraction:

Metric	Location
Session ID	`session_id`
Model	First key in `modelUsage`
Input tokens	`usage.input_tokens`
Output tokens	`usage.output_tokens`
Cache read tokens	`usage.cache_read_input_tokens`
Cache write tokens	`usage.cache_creation_input_tokens`
Cost (total)	`total_cost_usd`
Cost (per-model)	`modelUsage[model].costUSD`
Duration	`duration_ms`
API duration	`duration_api_ms`
Turns	`num_turns`
Stop reason	`stop_reason`
Text output	`result`
Tool calls	Not in JSON mode — need `stream-json` for per-tool detail

Note: Claude's --output-format json gives aggregated metrics only. For per-tool-call detail (which is what meta-harness relies on), use --output-format stream-json, which emits one JSON event per line, including assistant and user (tool_result) events shaped like the Anthropic Messages API.
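
Reading the summary object is mostly a matter of renaming fields. The sketch below follows the table above; `parseClaudeSummary` and `ClaudeSummary` are hypothetical names, and treating `is_error` as overriding `subtype` for the status is an assumption.

```typescript
interface ClaudeSummary {
  sessionId: string;
  model?: string;
  status: string;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  costTotal: number;
  durationMs: number;
  turns: number;
  text: string;
}

function parseClaudeSummary(raw: string): ClaudeSummary {
  const r = JSON.parse(raw);
  if (r.type !== "result") {
    throw new Error(`expected a result object, got type=${String(r.type)}`);
  }
  const usage = r.usage ?? {};
  return {
    sessionId: r.session_id,
    // The table takes the model from the first key of modelUsage.
    model: Object.keys(r.modelUsage ?? {})[0],
    status: r.is_error ? "error" : r.subtype,
    input: usage.input_tokens ?? 0,
    output: usage.output_tokens ?? 0,
    cacheRead: usage.cache_read_input_tokens ?? 0,
    cacheWrite: usage.cache_creation_input_tokens ?? 0,
    costTotal: r.total_cost_usd ?? 0,
    durationMs: r.duration_ms ?? 0,
    turns: r.num_turns ?? 0,
    text: r.result ?? "",
  };
}
```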

Claude (`--output-format stream-json`)

Streaming mode emits events matching the Anthropic Messages API shape:

{type: "system", ...}
{type: "assistant", message: {content: [{type: "text", text: "..."} | {type: "tool_use", id, name, input}], usage: {input_tokens, output_tokens, ...}}}
{type: "user", message: {content: [{type: "tool_result", tool_use_id, content, is_error}]}}
{type: "result", session_id, total_cost_usd, usage: {...}}

This is richer than JSON mode — each assistant message has individual content blocks that can be text or tool_use, and the following user message carries the tool_result. The final result event has the summary.
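
Pairing each tool_use block with its later tool_result is the core of per-tool extraction in this mode. A minimal sketch, assuming the event shapes shown above (`collectClaudeToolCalls` and `ClaudeToolCall` are hypothetical names):

```typescript
interface ClaudeToolCall {
  id: string;
  name: string;
  input: unknown;
  result?: unknown;
  isError?: boolean;
}

function collectClaudeToolCalls(lines: string[]): ClaudeToolCall[] {
  // Map preserves insertion order, so calls come back in the order they were issued.
  const calls = new Map<string, ClaudeToolCall>();
  for (const line of lines) {
    if (line.trim() === "") continue;
    const ev = JSON.parse(line);
    const content = ev.message?.content;
    if (!Array.isArray(content)) continue; // system and result events carry no blocks
    for (const block of content) {
      if (ev.type === "assistant" && block.type === "tool_use") {
        calls.set(block.id, { id: block.id, name: block.name, input: block.input });
      } else if (ev.type === "user" && block.type === "tool_result") {
        const call = calls.get(block.tool_use_id);
        if (call) {
          call.result = block.content;
          call.isError = block.is_error === true;
        }
      }
    }
  }
  return Array.from(calls.values());
}
```

A call that never receives a tool_result keeps `result` undefined, which a consumer could treat as an interrupted call.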

Codex (`exec --experimental-json`)

Codex uses a thread/item model via its TypeScript SDK:

{type: "thread.started", thread_id: "..."}
{type: "turn.started"}
{type: "item.started", item: {id, type: "agent_message" | "command_execution" | "file_change" | "mcp_tool_call" | ...}}
{type: "item.updated", item: {...}}
{type: "item.completed", item: {...}}
{type: "turn.completed", usage: {input_tokens, cached_input_tokens, output_tokens, reasoning_output_tokens}}

Item types (from codex SDK items.ts):

Item type	Description	Key fields
`agent_message`	Text response	`text`
`reasoning`	Chain-of-thought	`text`
`command_execution`	Shell command	`command`, `aggregated_output`, `exit_code`, `status`
`file_change`	File patch	`changes: [{path, kind}]`, `status`
`mcp_tool_call`	MCP tool invocation	`server`, `tool`, `arguments`, `result`, `error`, `status`
`web_search`	Web search	`query`
`todo_list`	Agent's task list	`items: [{text, completed}]`
`error`	Non-fatal error	`message`

Key fields for metric extraction:

Metric	Location
Session ID	`thread.started.thread_id`
Model	Not in events (passed as config)
Input tokens	`turn.completed.usage.input_tokens`
Output tokens	`turn.completed.usage.output_tokens`
Cache tokens	`turn.completed.usage.cached_input_tokens`
Reasoning tokens	`turn.completed.usage.reasoning_output_tokens`
Cost	Not in events (must compute from token counts + pricing)
Turns	Count `turn.started` / `turn.completed` pairs
Stop reason	Infer from last event type
Tool calls	`item.completed` where `item.type === "command_execution"` or `"mcp_tool_call"`
File changes	`item.completed` where `item.type === "file_change"`
Text output	Concatenate `item.completed` where `item.type === "agent_message"`

Codex gaps:

No cost data in events — must be computed externally from token counts and model pricing
No explicit model field in events — passed as config, not echoed back
reasoning_output_tokens is a distinct field (OpenAI counts reasoning separately) with no equivalent in the Pi or Claude output
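
A sketch of how a Codex adapter might aggregate these events and fill the cost gap is shown below. All names (`summariseCodex`, `estimateCost`, `Pricing`) are hypothetical, and the price values in any real use must come from the caller. The cost formula assumes that `cached_input_tokens` is a subset of `input_tokens` and that reasoning tokens are already counted in `output_tokens`; neither is confirmed by this document.

```typescript
interface CodexMetrics {
  threadId?: string;
  input: number;
  cachedInput: number;
  output: number;
  reasoning: number;
  turns: number;
  toolCalls: { kind: string; name: string; status?: string }[];
  fileChanges: { path: string; kind: string }[];
  text: string;
}

function summariseCodex(lines: string[]): CodexMetrics {
  const m: CodexMetrics = {
    input: 0, cachedInput: 0, output: 0, reasoning: 0,
    turns: 0, toolCalls: [], fileChanges: [], text: "",
  };
  for (const line of lines) {
    if (line.trim() === "") continue;
    const ev = JSON.parse(line);
    if (ev.type === "thread.started") {
      m.threadId = ev.thread_id;
    } else if (ev.type === "turn.completed") {
      m.turns += 1;
      const u = ev.usage ?? {};
      m.input += u.input_tokens ?? 0;
      m.cachedInput += u.cached_input_tokens ?? 0;
      m.output += u.output_tokens ?? 0;
      m.reasoning += u.reasoning_output_tokens ?? 0;
    } else if (ev.type === "item.completed") {
      const item = ev.item ?? {};
      if (item.type === "command_execution") {
        m.toolCalls.push({ kind: item.type, name: item.command, status: item.status });
      } else if (item.type === "mcp_tool_call") {
        m.toolCalls.push({ kind: item.type, name: `${item.server}/${item.tool}`, status: item.status });
      } else if (item.type === "file_change") {
        for (const c of item.changes ?? []) m.fileChanges.push({ path: c.path, kind: c.kind });
      } else if (item.type === "agent_message") {
        m.text += item.text ?? "";
      }
    }
  }
  return m;
}

// Prices in USD per million tokens, supplied by the caller.
interface Pricing {
  inputPerMTok: number;
  cachedInputPerMTok: number;
  outputPerMTok: number;
}

function estimateCost(m: CodexMetrics, p: Pricing): number {
  // Assumption: cached tokens are included in the input count, so subtract them first.
  const uncached = Math.max(0, m.input - m.cachedInput);
  const micros =
    uncached * p.inputPerMTok +
    m.cachedInput * p.cachedInputPerMTok +
    m.output * p.outputPerMTok;
  return micros / 1e6;
}
```

Keeping the pricing table outside the adapter means a price change does not require reparsing stored sessions, only recomputing the cost from the stored token counts.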

Field Mapping Summary

How each runtime field maps to the common SessionResult:

SessionResult field	Pi	Claude (json)	Codex
`sessionId`	`session.id`	`session_id`	`thread.started.thread_id`
`model`	`message.model`	`modelUsage` key	config input
`status`	infer from `agent_end`	`subtype`	infer from exit
`exitCode`	process exit	process exit	process exit
`usage.input`	Σ `message.usage.input`	`usage.input_tokens`	Σ `turn.completed.usage.input_tokens`
`usage.output`	Σ `message.usage.output`	`usage.output_tokens`	Σ `turn.completed.usage.output_tokens`
`usage.cacheRead`	Σ `message.usage.cacheRead`	`usage.cache_read_input_tokens`	Σ `turn.completed.usage.cached_input_tokens`
`usage.cacheWrite`	Σ `message.usage.cacheWrite`	`usage.cache_creation_input_tokens`	—
`usage.reasoning`	—	—	Σ `turn.completed.usage.reasoning_output_tokens`
`cost.total`	last `message.usage.cost.total`	`total_cost_usd`	computed
`durationMs`	compute from timestamps	`duration_ms`	compute from timestamps
`turns`	count `turn_end` events	`num_turns`	count `turn.completed` events
`toolCalls`	`tool_execution_{start,end}`	stream-json `tool_use` blocks	`command_execution` + `mcp_tool_call` items
`fileChanges`	infer from tool args	infer from tool args	`file_change` items (native)
`text`	assistant text blocks	`result`	`agent_message` items

Implementation Plan

Each runtime gets an adapter that implements RuntimeAdapter:

packages/core/src/
  runtime.ts          ← interfaces (done)
  adapters/
    pi.ts             ← parses pi --mode json output
    claude.ts         ← parses claude --output-format json (or stream-json)
    codex.ts          ← parses codex exec --experimental-json output

The daemon's executePipeline currently runs a raw Bun.spawn and writes stdout straight to stdout.log. With the wrapper in place, it would instead:

1. Spawn the runtime with the appropriate JSON output flag
2. Collect JSONL lines
3. Pass them through the runtime's adapter → SessionResult
4. Store the SessionResult for querying, logging, and skill-evolve
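
The middle of that flow (collect lines, then dispatch to the right adapter) can be sketched without any process spawning. Everything here is illustrative: `MiniResult` is a deliberately reduced stand-in for SessionResult, the `meta` shape is assumed, and the toy adapters only count turns using the rules from the field mapping table. The claude adapter also assumes the summary object arrives on a single line.

```typescript
type RuntimeName = "pi" | "claude" | "codex";

// Reduced stand-in for SessionResult, for illustration only.
interface MiniResult {
  runtime: RuntimeName;
  turns: number;
}

interface MiniAdapter {
  parse(lines: string[], meta: { runtime: RuntimeName }): MiniResult;
}

// Splits captured stdout into non-empty JSONL lines.
function splitJsonl(stdout: string): string[] {
  return stdout.split(/\r?\n/).filter((l) => l.trim() !== "");
}

function countEvents(lines: string[], type: string): number {
  return lines.filter((l) => JSON.parse(l).type === type).length;
}

const adapters: Record<RuntimeName, MiniAdapter> = {
  pi: {
    parse: (lines, meta) => ({ runtime: meta.runtime, turns: countEvents(lines, "turn_end") }),
  },
  codex: {
    parse: (lines, meta) => ({ runtime: meta.runtime, turns: countEvents(lines, "turn.completed") }),
  },
  claude: {
    parse: (lines, meta) => {
      // The result event (or the single summary object) carries num_turns.
      const result = lines.map((l) => JSON.parse(l)).find((e) => e.type === "result");
      return { runtime: meta.runtime, turns: result?.num_turns ?? 0 };
    },
  },
};

function toSessionResult(runtime: RuntimeName, stdout: string): MiniResult {
  return adapters[runtime].parse(splitJsonl(stdout), { runtime });
}
```

Because the adapters take plain strings, they can be unit-tested against recorded output without launching any runtime; only the spawning and storage steps need to touch the daemon.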

References

Pi source — --mode json event format
Claude Code docs — --output-format options
Codex SDK — events.ts and items.ts type definitions
Meta-Harness claude_wrapper.py — reference for stream-json parsing
packages/core/src/runtime.ts — the interfaces
docs/skill-evolve.md — primary consumer of session metrics
