Trace capture
control17’s trace capture is a first-class feature, built to replace the custom agent orchestrations whose main value proposition is built-in logging. The idea: let operators see what the LLM actually said and which tools it actually called, scoped to each objective, without embedding observability hooks into the agent itself.
The runner (c17 claude-code) runs upstream of the claude
process and intercepts its network traffic at the TLS layer via a
loopback MITM TLS proxy with a per-session local CA. Every
HTTPS request the agent makes is transparently decrypted by the
proxy, observed as plaintext, re-encrypted toward the real
upstream, and passed through. From the upstream’s point of view,
we are a normal TLS client doing standard SNI + cert validation —
it can’t tell us apart from any other user-agent, which means
OAuth flows, token refreshes, streaming responses, and SSE all
work identically.
Zero external tools. No tshark. No pcap. No SSLKEYLOGFILE. Just
Node’s built-in crypto + tls + a tiny amount of node-forge
for cert signing.
Setup
```shell
# Verify everything's in place before your first run.
c17 claude-code --doctor
```
The `--doctor` command runs four checks and prints pass/warn/fail:

- `claude` binary — must be on `$PATH` or pointed to via `$CLAUDE_PATH`. FAIL if missing.
- `$TMPDIR` writable — must be writable with `0o600`. The runner writes the CA cert PEM here. FAIL if not writable.
- loopback proxy bindable — must be able to `listen()` on `127.0.0.1:0`. FAIL on kernel-level networking issues.
- trace CA + leaf cert generation — exercises the full node-forge signing pipeline end-to-end. Catches runtime crypto issues before the first real spawn.
Exit code is 0 if no checks failed, 1 otherwise.
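Two of these checks reduce to plain Node built-ins. A sketch, assuming hypothetical `checkTmpdirWritable` / `checkLoopbackBindable` names — the real doctor’s internals may differ:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import * as net from "node:net";

type CheckResult = { name: string; status: "pass" | "fail"; detail?: string };

// Check: can we create a 0o600 file in $TMPDIR (where the CA PEM goes)?
export function checkTmpdirWritable(): CheckResult {
  const probe = path.join(os.tmpdir(), `c17-doctor-${process.pid}`);
  try {
    fs.writeFileSync(probe, "probe", { mode: 0o600 });
    fs.unlinkSync(probe);
    return { name: "$TMPDIR writable", status: "pass" };
  } catch (err) {
    return { name: "$TMPDIR writable", status: "fail", detail: String(err) };
  }
}

// Check: can we listen() on 127.0.0.1 with an ephemeral port?
export function checkLoopbackBindable(): Promise<CheckResult> {
  return new Promise((resolve) => {
    const srv = net.createServer();
    srv.once("error", (err) =>
      resolve({ name: "loopback proxy bindable", status: "fail", detail: String(err) })
    );
    srv.listen(0, "127.0.0.1", () => {
      srv.close(() => resolve({ name: "loopback proxy bindable", status: "pass" }));
    });
  });
}
```

The exit code then falls out of counting `fail` results: zero fails → exit 0, otherwise exit 1.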
What the runner does at startup
When you run `c17 claude-code` (without `--no-trace`), the runner:

- Generates a fresh per-session local CA. One CA keypair + one shared leaf keypair, both held in memory. The CA cert (public half only) is written to `$TMPDIR/c17-trace-ca-<pid>-<nonce>.pem` at `0o600`. The CA’s private key never touches disk.
- Starts a loopback HTTP CONNECT proxy on a random ephemeral port. The proxy is configured with the CA’s cert pool so it can mint leaf certs on demand for any hostname the agent asks for.
- Starts the activity uploader. Seeds `objective_open` events for every objective currently assigned to the slot (from the initial briefing) and begins streaming activity events to `POST /agents/:callsign/activity`.
- Backs up the operator’s `.mcp.json` to a pid-scoped tmp directory, then atomic-writes a new one with a `c17` entry pointing at `c17 mcp-bridge`.
- Spawns `claude` with inherited stdio and these env vars merged in:

  ```
  HTTPS_PROXY=http://127.0.0.1:<port>
  HTTP_PROXY=http://127.0.0.1:<port>
  ALL_PROXY=http://127.0.0.1:<port>
  NO_PROXY=localhost,127.0.0.1,::1,<caller's value>
  NODE_USE_ENV_PROXY=1
  NODE_EXTRA_CA_CERTS=<path to CA pem>
  NODE_TLS_REJECT_UNAUTHORIZED=0
  C17_RUNNER_SOCKET=<IPC socket path>
  ```

  `NODE_TLS_REJECT_UNAUTHORIZED=0` is a failsafe for packaged-binary Node distributions (pkg, sea, yao-pkg) that ship their own bundled cert store, which `NODE_EXTRA_CA_CERTS` can’t extend. Claude Code v2.x is such a binary. The blast radius is scoped to this single loopback-only runner session, so the risk is self-contained.
- Waits for `claude` to exit. On any exit path (normal, SIGINT, SIGTERM, uncaughtException), restores the original `.mcp.json`, deletes the CA cert PEM, closes the proxy relay, and unlinks the IPC socket.
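The env merge and the cleanup-on-every-exit-path pattern can be sketched as follows. `traceEnv` and `runWithCleanup` are illustrative names, not control17’s actual API:

```typescript
import { spawn } from "node:child_process";

// Build the child env: caller's env plus the proxy/TLS overrides.
export function traceEnv(
  port: number,
  caPemPath: string,
  base: NodeJS.ProcessEnv = process.env
): NodeJS.ProcessEnv {
  const proxy = `http://127.0.0.1:${port}`;
  const noProxy = ["localhost", "127.0.0.1", "::1", base.NO_PROXY]
    .filter(Boolean)
    .join(",");
  return {
    ...base,
    HTTPS_PROXY: proxy,
    HTTP_PROXY: proxy,
    ALL_PROXY: proxy,
    NO_PROXY: noProxy,
    NODE_USE_ENV_PROXY: "1",
    NODE_EXTRA_CA_CERTS: caPemPath,
    NODE_TLS_REJECT_UNAUTHORIZED: "0", // scoped to this loopback-only session
  };
}

// Run the agent and guarantee cleanup fires exactly once on any exit path.
export function runWithCleanup(cmd: string, env: NodeJS.ProcessEnv, cleanup: () => void) {
  let done = false;
  const once = () => { if (!done) { done = true; cleanup(); } };
  process.on("SIGINT", once);
  process.on("SIGTERM", once);
  process.on("uncaughtException", () => { once(); process.exit(1); });
  const child = spawn(cmd, { stdio: "inherit", env });
  child.on("exit", once);
  return child;
}
```

`cleanup` would restore `.mcp.json`, unlink the CA PEM and IPC socket, and close the proxy.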
How the MITM works
When the agent issues `CONNECT api.anthropic.com:443` through the proxy:
```
agent                 proxy                 upstream
  │                     │                     │
  │  CONNECT host:443   │                     │
  │────────────────────>│                     │
  │                     │  TLS handshake      │
  │                     │────────────────────>│
  │                     │  (standard SNI +    │
  │                     │   cert validation)  │
  │                     │<────────────────────│
  │  200 Established    │                     │
  │<────────────────────│                     │
  │  ClientHello        │                     │
  │────────────────────>│                     │
  │  [proxy issues leaf │                     │
  │   cert for host,    │                     │
  │   signs with CA,    │                     │
  │   wraps socket in   │                     │
  │   TLSSocket server] │                     │
  │  ServerHello...     │                     │
  │<────────────────────│                     │
  │  plain HTTP req ──> │  encrypted req ──>  │
  │  plain HTTP rsp <── │  encrypted rsp <──  │
```
Two independent TLS sessions. The agent talks to us over TLS
(trusting our CA via NODE_EXTRA_CA_CERTS), we talk to the
upstream over TLS (with the upstream’s real cert). In between,
we have plaintext in both directions — which we hand to the
decoder as-is.
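A minimal sketch of that two-session relay, assuming hypothetical `mintLeaf` and `onPlaintext` callbacks (the real proxy also handles HTTP reassembly, backpressure, and more error paths):

```typescript
import * as http from "node:http";
import * as net from "node:net";
import * as tls from "node:tls";

export function startMitmProxy(
  mintLeaf: (host: string) => { key: string; cert: string },
  onPlaintext: (dir: "req" | "rsp", host: string, chunk: Buffer) => void
): Promise<{ port: number; close: () => void }> {
  const proxy = http.createServer();
  proxy.on("connect", (req, clientSocket) => {
    const [host, portStr] = (req.url ?? "").split(":");
    // TLS session 2: we are a normal client toward the real upstream
    // (standard SNI + cert validation, like any other user-agent).
    const upstream = tls.connect(
      { host, port: Number(portStr || 443), servername: host },
      () => {
        clientSocket.write("HTTP/1.1 200 Connection Established\r\n\r\n");
        // TLS session 1: terminate the agent's TLS with a CA-signed leaf
        // minted for this hostname.
        const { key, cert } = mintLeaf(host);
        const agentTls = new tls.TLSSocket(clientSocket as net.Socket, {
          isServer: true,
          key,
          cert,
        });
        // Plaintext flows between the two sessions: observe, then relay.
        agentTls.on("data", (chunk: Buffer) => {
          onPlaintext("req", host, chunk);
          upstream.write(chunk);
        });
        upstream.on("data", (chunk: Buffer) => {
          onPlaintext("rsp", host, chunk);
          agentTls.write(chunk);
        });
        agentTls.on("error", () => upstream.destroy());
      }
    );
    upstream.on("error", () => clientSocket.destroy());
  });
  return new Promise((resolve) =>
    proxy.listen(0, "127.0.0.1", () => {
      const { port } = proxy.address() as net.AddressInfo;
      resolve({ port, close: () => proxy.close() });
    })
  );
}
```

Binding to port `0` on `127.0.0.1` is what gives the random ephemeral, loopback-only listener described above.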
The streaming activity model
There are no per-objective spans. The runner maintains one agent activity stream per slot — an append-only timeline of everything the agent’s runner observed:
- `llm_exchange` — an Anthropic API request/response pair
- `opaque_http` — a non-Anthropic HTTP exchange
- `objective_open` — the slot just took ownership of an objective
- `objective_close` — the slot released it
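The four kinds map naturally onto a discriminated union. The field names below are illustrative assumptions, not the actual wire format:

```typescript
// Illustrative shapes for the four activity kinds.
type LlmExchange = {
  kind: "llm_exchange";
  at: string; // ISO timestamp
  model: string;
  usage: { in: number; out: number; cacheHit?: number };
};
type OpaqueHttp = { kind: "opaque_http"; at: string; method: string; host: string; status: number };
type ObjectiveOpen = { kind: "objective_open"; at: string; objectiveId: string };
type ObjectiveClose = { kind: "objective_close"; at: string; objectiveId: string };

export type ActivityEvent = LlmExchange | OpaqueHttp | ObjectiveOpen | ObjectiveClose;

// Narrowing on `kind` gives typed access to each variant's fields.
export function summarize(e: ActivityEvent): string {
  switch (e.kind) {
    case "llm_exchange":
      return `${e.model} in=${e.usage.in} out=${e.usage.out}`;
    case "opaque_http":
      return `${e.method} ${e.host} -> ${e.status}`;
    case "objective_open":
      return `open ${e.objectiveId}`;
    case "objective_close":
      return `close ${e.objectiveId}`;
  }
}
```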
Objective “traces” are a time-range view over this stream — the web UI queries `GET /agents/<assignee>/activity?from=<open>&to=<close>&kind=llm_exchange` to pull the LLM calls made during an objective’s lifetime, rather than loading a separately-stored per-objective blob.
Capture runs entirely live: as soon as the MITM proxy finishes
reassembling an HTTP/1.1 request/response pair, the runner parses
it, extracts + redacts, wraps it as an llm_exchange or
opaque_http activity event, and enqueues it for streaming upload
via POST /agents/:callsign/activity. No per-span buffering, no
memory accumulation over objective lifetime, no big flush at span
close.
The decode pipeline
For every HTTP/1.1 exchange the reassembler completes:
- Incremental parse via `Http1Reassembler` (reads plaintext chunks as they arrive from the MITM proxy, keeps rolling buffers per TLS session, emits completed request/response pairs in FIFO order).
- Extract Anthropic API shape via `extractEntries` (`anthropic.ts`). For `POST /v1/messages` on `*.anthropic.com`, parse into a typed `AnthropicMessagesEntry` with `model`, `maxTokens`, `system`, `messages`, `tools`, `stopReason`, and `usage`. Everything else becomes an `OpaqueHttpEntry` with headers + body previews.
- Redact secrets via `redactJson` (`redact.ts`): strips `Authorization`, `x-api-key`, `cookie`, `set-cookie`, `x-anthropic-api-key`, `proxy-authorization` headers and scrubs `sk-ant-…`, `sk-…`, `AKIA…`, `ghp_…`, `xox[baprs]-…` patterns in string values.
- Enqueue in the `ActivityUploader` — a batched streaming sender that flushes every 50 events OR 64 KB OR 500 ms, whichever comes first. Failures retry with exponential backoff (200 ms → 30 s); the queue is hard-capped at 1000 events / 1 MB with oldest-first eviction under sustained broker unreachability.
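The pattern-scrubbing half of redaction can be sketched as a recursive walk over the parsed JSON. The regexes below approximate the listed patterns, and the real `redactJson` also strips the sensitive headers:

```typescript
// Approximations of the documented secret patterns. Order matters: the
// more specific sk-ant- pattern runs before the generic sk- pattern.
const SECRET_PATTERNS: RegExp[] = [
  /sk-ant-[A-Za-z0-9_-]+/g,
  /sk-[A-Za-z0-9_-]{8,}/g,
  /AKIA[A-Z0-9]{16}/g,
  /ghp_[A-Za-z0-9]{8,}/g,
  /xox[baprs]-[A-Za-z0-9-]+/g,
];

export function redactString(s: string): string {
  return SECRET_PATTERNS.reduce((acc, re) => acc.replace(re, "[REDACTED]"), s);
}

// Walk arrays and objects, scrubbing every string value in place of a copy.
export function redactJson(value: unknown): unknown {
  if (typeof value === "string") return redactString(value);
  if (Array.isArray(value)) return value.map(redactJson);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, redactJson(v)])
    );
  }
  return value;
}
```

Running redaction at parse time, before enqueue, is what guarantees the server only ever sees `[REDACTED]` placeholders.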
Objective lifecycle markers are emitted directly by the runner
whenever the objectives tracker’s open set changes — the tracker
diff adds objective_open events for new ids and
objective_close events for ids that just left the set. These
flow through the same uploader as LLM exchanges.
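The tracker diff is a plain set comparison; `diffOpenSet` is an illustrative name:

```typescript
type Marker = { kind: "objective_open" | "objective_close"; objectiveId: string };

// Compare the previous open set with the new one: ids that appeared get
// objective_open markers, ids that vanished get objective_close markers.
export function diffOpenSet(prev: Set<string>, next: Set<string>): Marker[] {
  const events: Marker[] = [];
  for (const id of next)
    if (!prev.has(id)) events.push({ kind: "objective_open", objectiveId: id });
  for (const id of prev)
    if (!next.has(id)) events.push({ kind: "objective_close", objectiveId: id });
  return events;
}
```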
Viewing traces
Commanders review uploaded traces in the web UI’s TracePanel on each objective’s detail page:
- Queries `GET /agents/<assignee>/activity?from=<objective.createdAt>&to=<objective.completedAt ?? now>&kind=llm_exchange`
- Renders each returned LLM exchange with model name, token usage (`in=150 out=42 cache_hit=100`), and message list
- Expands Anthropic messages into text blocks + `tool_use` + `tool_result` entries inline
The panel is commander-gated in two places:
- Client: `ObjectiveDetail.tsx` only mounts `<TracePanel>` when `briefing.authority === 'commander'`
- Server: `GET /agents/:callsign/activity` returns 403 to any non-commander, including the assignee themselves
The server gate is the real boundary. The client gate is a UX optimization.
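The server-side gate reduces to one predicate. The briefing shape and handler wrapper below are illustrative, not the actual route code:

```typescript
type Authority = "commander" | "lieutenant" | "operator" | "watcher";
type Briefing = { authority: Authority };
type Response = { status: number; body: unknown };

// Only commanders may read the activity stream -- not even the assignee
// whose runner captured it.
export function canViewActivity(briefing: Briefing): boolean {
  return briefing.authority === "commander";
}

// Wrap a route handler so the 403 is enforced before any trace data loads.
export function gateActivity(briefing: Briefing, handler: () => Response): Response {
  if (!canViewActivity(briefing)) {
    return { status: 403, body: { error: "forbidden" } };
  }
  return handler();
}
```

Because the check runs server-side before the handler, hiding the panel client-side is purely cosmetic, as the text above notes.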
Security posture
Trace capture inherently reveals secrets the agent used during the work. control17 mitigates this with defense in depth:
- MITM is loopback-only and session-scoped. The proxy binds only to `127.0.0.1` on a random ephemeral port. The CA is generated fresh per runner process, its cert is written with `0o600`, and its private key never touches disk.
- Redaction at parse time. Secrets are replaced with `[REDACTED]` before entries leave the runner. The server never sees the plaintext token.
- Commander-only view. Even if secrets slip past redaction, only commanders can see them. Operators, lieutenants, watchers, and even the assignee who captured the trace all get 403 on the GET endpoint.
- CA cert deleted on runner exit. The cert PEM is unlinked on every exit path (normal, SIGINT, SIGTERM, uncaughtException).
- `.mcp.json` restored on every exit — the operator’s pre-run MCP config is backed up and restored idempotently.
- Upload is best-effort and memory-only. Failed uploads retry with exponential backoff from a bounded in-memory queue; nothing is ever persisted to disk, and under sustained broker unreachability the oldest events are dropped rather than written locally.
Opting out
`c17 claude-code --no-trace` disables the entire trace subsystem.
No proxy relay, no CA generation, no env var injection. The
runner still handles the briefing, SSE, objectives, and bridge
IPC normally.
Use --no-trace when:
- You’re debugging the runner/bridge plumbing and don’t want extra moving parts
- The agent you’re spawning doesn’t honor `HTTPS_PROXY` and the proxy just adds latency without capturing anything
Limitations (v1)
- HTTP/1.1 only. HTTP/2 agents (which negotiate `h2` via ALPN) produce no activity events. Adding an HPACK-aware parser is a follow-up. In practice, the Anthropic SDK defaults to HTTP/1.1 for `/v1/messages`, so this is rarely hit.
- Anthropic parser only — other LLM providers (OpenAI, Gemini, Mistral) land as `opaque_http` entries. Adding parsers per provider is straightforward.
- Uploader queue cap. The `ActivityUploader` caps its in-flight queue at 1000 events / 1 MB and evicts oldest-first under sustained broker unreachability. Events dropped here won’t appear in the UI.
- Cert pinning — if an agent ships bundled cert pins for Anthropic’s real cert chain, our MITM leaf won’t match and the handshake will fail. Claude Code v2 does not currently pin; if that changes, we’d need to intercept at a different layer (e.g. `LD_PRELOAD` against libssl).