LLM Observability
LLM Observability is the ability to monitor, trace, and debug the performance of Large Language Models in production.
Key Metrics
- Token Usage: Cost and efficiency tracking.
- Latency: Time to first token and total response time.
- Accuracy (Evals): Using LLMs or heuristic checks to grade the quality of the output.
- Context Integrity: Measuring how much of the provided context was actually utilized or ignored (Context Window management).
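As a rough illustration of the first two metrics, the sketch below derives time to first token, total response time, output throughput, and per-request cost from timestamps and token counts captured around a single streaming call. The RequestTrace type, the summarize helper, and the prices in PRICE_PER_MTOK are hypothetical placeholders, not any provider's actual rates or SDK.

```python
from dataclasses import dataclass

# Hypothetical USD prices per million tokens; replace with your provider's published rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

@dataclass
class RequestTrace:
    """Timestamps (in seconds) and token counts captured around one streaming LLM call."""
    start: float          # request sent
    first_token: float    # first streamed token received
    end: float            # final token received
    input_tokens: int
    output_tokens: int

def summarize(trace: RequestTrace) -> dict:
    """Derive latency, throughput, and cost metrics for a single request."""
    ttft = trace.first_token - trace.start
    total = trace.end - trace.start
    generation = trace.end - trace.first_token
    cost = (trace.input_tokens * PRICE_PER_MTOK["input"]
            + trace.output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
    return {
        "ttft_s": ttft,
        "total_s": total,
        # Output tokens per second after the first token arrives.
        "output_tok_per_s": trace.output_tokens / generation if generation > 0 else 0.0,
        "cost_usd": cost,
    }

print(summarize(RequestTrace(start=0.0, first_token=0.5, end=2.5,
                             input_tokens=1_000, output_tokens=500)))
```

In practice you would record start, first_token, and end with time.perf_counter() around the streaming loop and read token counts from the usage data returned with the API response.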
SubAgent Cost Observability
When using Claude SubAgents, token observability becomes multi-dimensional: you must track both the parent orchestrator’s context and each subagent’s isolated context. Key metrics to monitor:
- Parent Context Saturation: Use /context to track when the orchestrator is nearing the delegation threshold (roughly 70–80% of the context window).
- Per-Subagent Token Spend: Each subagent spawn has a fixed startup cost (system prompt + task prompt). Measure whether the isolation benefit outweighs this spawn overhead.
- Summary Fidelity: Assess whether subagent summaries preserve enough precision for the parent to synthesize correctly.
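A minimal sketch of the first two subagent metrics, assuming you can read the parent's used-token count and window size (for example, from what /context reports) and log token counts for each subagent run. The 0.75 threshold, the SubagentRun fields, and the rule "isolation pays off when the tokens kept out of the parent exceed the spawn overhead" are illustrative assumptions, not a documented formula. Summary fidelity usually needs a separate eval and is not modeled here.

```python
from dataclasses import dataclass

# Assumed delegation threshold, picked from the ~70–80% range given in the text.
DELEGATION_THRESHOLD = 0.75

def parent_saturation(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the orchestrator's context window already in use."""
    return used_tokens / window_tokens

def should_delegate(used_tokens: int, window_tokens: int,
                    threshold: float = DELEGATION_THRESHOLD) -> bool:
    """True once the parent context is saturated enough to favor spawning a subagent."""
    return parent_saturation(used_tokens, window_tokens) >= threshold

@dataclass
class SubagentRun:
    """Token accounting for one subagent spawn (field names are hypothetical)."""
    name: str
    system_prompt_tokens: int  # fixed startup cost
    task_prompt_tokens: int    # fixed startup cost
    work_tokens: int           # tokens consumed inside the isolated context
    summary_tokens: int        # tokens returned to the parent

    @property
    def spawn_overhead(self) -> int:
        return self.system_prompt_tokens + self.task_prompt_tokens

    @property
    def parent_tokens_saved(self) -> int:
        # Work that stayed out of the parent context, minus the summary that came back.
        return self.work_tokens - self.summary_tokens

    @property
    def isolation_pays_off(self) -> bool:
        return self.parent_tokens_saved > self.spawn_overhead

runs = [
    SubagentRun("explorer", 3_000, 500, 40_000, 1_200),
    SubagentRun("quick_lookup", 3_000, 500, 2_000, 800),
]
print(should_delegate(150_000, 200_000))
for run in runs:
    print(run.name, run.spawn_overhead, run.parent_tokens_saved, run.isolation_pays_off)
```

The second run illustrates the trade-off: a short lookup returns almost as much as it consumed, so the fixed startup cost is never recovered.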
Model-Tier Cost Routing
Custom subagents introduce a new observability dimension: model-tier tracking. When routing tasks to Haiku vs. Sonnet vs. Opus, track each agent's model tier to verify that cost-routing decisions are correct:
- High-frequency exploration tasks running on Opus signal a misconfigured agent "model:" field
- Complex reasoning tasks running on Haiku signal under-provisioning
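One way to audit routing is to log (agent, task category, model tier) for each subagent invocation and compare each record against an expected tier range. The task categories, the ranges in EXPECTED, and the helper names below are hypothetical; tune them to your own agents.

```python
from enum import Enum
from typing import List, Optional, Tuple

class Tier(Enum):
    """Model tiers ordered by cost and capability."""
    HAIKU = 1
    SONNET = 2
    OPUS = 3

# Hypothetical task categories mapped to the (lowest, highest) tier each is assumed to need.
EXPECTED = {
    "exploration": (Tier.HAIKU, Tier.SONNET),
    "implementation": (Tier.SONNET, Tier.OPUS),
    "complex_reasoning": (Tier.SONNET, Tier.OPUS),
}

def audit_routing(agent: str, category: str, tier: Tier) -> Optional[str]:
    """Return a warning if the tier falls outside the expected range, else None."""
    low, high = EXPECTED[category]
    if tier.value > high.value:
        return f"{agent}: over-provisioned ({tier.name} for {category}); check its model: field"
    if tier.value < low.value:
        return f"{agent}: under-provisioned ({tier.name} for {category})"
    return None

def audit_log(records: List[Tuple[str, str, Tier]]) -> List[str]:
    """Collect warnings for every logged (agent, category, tier) record."""
    warnings = []
    for agent, category, tier in records:
        msg = audit_routing(agent, category, tier)
        if msg is not None:
            warnings.append(msg)
    return warnings

log = [
    ("explorer", "exploration", Tier.OPUS),
    ("planner", "complex_reasoning", Tier.HAIKU),
    ("coder", "implementation", Tier.SONNET),
]
for warning in audit_log(log):
    print(warning)
```

Aggregating these warnings per agent over time shows whether a misroute is a one-off or a persistent configuration problem.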
MCP Context Bloat as Observability Concern
Each connected MCP server loads its tool descriptions into the context window at session start. This creates a new observability dimension: context pollution monitoring. Track the number of active MCP servers and estimate their context token contribution. Signs of MCP context bloat include:
- Increased session startup token usage without corresponding tool usage
- Model responses that ignore available tools (descriptions consumed context but tools went unused)
- Degraded response quality correlated with the number of connected servers
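To put numbers on context pollution, the sketch below estimates each server's startup token contribution from the text of its tool descriptions and flags servers whose tools were loaded but never called. It assumes you can capture tool descriptions and per-tool call counts from your own session logs; the McpServer type is hypothetical, and the 4-characters-per-token ratio is a crude heuristic, so use a real tokenizer when accuracy matters.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Crude heuristic (~4 characters per token for English text); use a real tokenizer for accuracy.
CHARS_PER_TOKEN = 4

@dataclass
class McpServer:
    """Hypothetical record of one MCP server's startup footprint and usage in a session."""
    name: str
    tool_descriptions: Dict[str, str]                     # tool name -> description text loaded at startup
    calls: Dict[str, int] = field(default_factory=dict)  # tool name -> invocations this session

def estimated_tokens(server: McpServer) -> int:
    """Approximate context tokens consumed by the server's tool names and descriptions."""
    chars = sum(len(name) + len(desc) for name, desc in server.tool_descriptions.items())
    return chars // CHARS_PER_TOKEN

def bloat_report(servers: List[McpServer]) -> List[dict]:
    """Rank servers by estimated context cost and flag those whose tools were never called."""
    report = []
    for s in servers:
        used = sum(1 for tool in s.tool_descriptions if s.calls.get(tool, 0) > 0)
        report.append({
            "server": s.name,
            "est_tokens": estimated_tokens(s),
            "tools_loaded": len(s.tool_descriptions),
            "tools_used": used,
            "unused": used == 0,
        })
    return sorted(report, key=lambda r: r["est_tokens"], reverse=True)

servers = [
    McpServer("db", {"query": "z" * 95}),
    McpServer("github", {"create_issue": "x" * 400, "list_prs": "y" * 392}, {"list_prs": 2}),
]
for row in bloat_report(servers):
    print(row)
print("total est. startup tokens:", sum(estimated_tokens(s) for s in servers))
```

Servers flagged "unused" with a large est_tokens value are the first candidates for removal or for scoping to a single subagent.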
Mitigation: Use /mcp to audit active servers. Remove inactive ones. Scope MCP servers to specific subagents rather than the parent session. See Claude + MCP Explained.
Related: Context Engineering, Claude SubAgents