LLM Observability

LLM Observability is the ability to monitor, trace, and debug the performance of Large Language Models in production.

Key Metrics

Token Usage: Cost and efficiency tracking.
Latency: Time to first token and total response time.
Accuracy (Evals): Using LLMs or heuristic checks to grade the quality of the output.
Context Integrity: Measuring how much of the provided context was actually utilized or ignored (Context Window management).
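
As a rough illustration of the first two metrics, the sketch below records token counts and latency for a single streamed call. The names CallMetrics and measure_stream are hypothetical, the prices passed to cost_usd are placeholders rather than real rates, and the token counts are assumed to come from whatever usage data your provider returns.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class CallMetrics:
    """Per-request metrics. Field names are illustrative, not a standard schema."""
    input_tokens: int = 0
    output_tokens: int = 0
    time_to_first_token_s: Optional[float] = None
    total_time_s: float = 0.0

    def cost_usd(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        # Prices are supplied by the caller (USD per million tokens).
        return (self.input_tokens * input_price_per_mtok
                + self.output_tokens * output_price_per_mtok) / 1_000_000


def measure_stream(
    chunks: Iterable[str],
    metrics: CallMetrics,
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[str]:
    """Wrap any iterable of text chunks and record latency as it is consumed."""
    start = clock()
    try:
        for chunk in chunks:
            if metrics.time_to_first_token_s is None:
                # First chunk arrived: this is the time to first token.
                metrics.time_to_first_token_s = clock() - start
            yield chunk
    finally:
        # Runs when the stream is exhausted or the consumer stops early.
        metrics.total_time_s = clock() - start
```

Wrapping the stream rather than the client keeps the sketch independent of any particular SDK: anything that yields text chunks can be measured the same way.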

SubAgent Cost Observability

When using Claude SubAgents, token observability becomes multi-dimensional: you must track both the parent orchestrator’s context and each subagent’s isolated context. Key metrics to monitor:

Parent Context Saturation: Use /context to track when the orchestrator is nearing the delegation threshold (roughly 70–80% of the context window).
Per-Subagent Token Spend: Each subagent spawn has a fixed startup cost (system prompt + task prompt). Measure whether the isolation benefit outweighs the spawn overhead.
Summary Fidelity: Assess whether subagent summaries preserve enough precision for the parent to synthesize correctly.
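
The first two metrics above can be sketched as simple arithmetic over token counts. This is a minimal sketch under stated assumptions: SubagentRun and its fields are hypothetical, the counts are assumed to be collected by your own logging, and treating "tokens kept out of the parent context" as the isolation benefit is one possible heuristic, not an established definition.

```python
from dataclasses import dataclass


def saturation(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the parent's context window currently in use."""
    return used_tokens / window_tokens


def should_delegate(used_tokens: int, window_tokens: int, threshold: float = 0.70) -> bool:
    # Default threshold is the lower end of the ~70-80% range mentioned above.
    return saturation(used_tokens, window_tokens) >= threshold


@dataclass
class SubagentRun:
    name: str
    startup_tokens: int   # fixed spawn cost: system prompt + task prompt
    work_tokens: int      # tokens consumed inside the isolated context
    summary_tokens: int   # tokens returned to the parent as a summary

    @property
    def total_spend(self) -> int:
        return self.startup_tokens + self.work_tokens

    @property
    def parent_tokens_saved(self) -> int:
        # Assumed heuristic: work that stayed out of the parent's context,
        # minus the summary that came back into it.
        return self.work_tokens - self.summary_tokens

    def isolation_pays_off(self) -> bool:
        return self.parent_tokens_saved > self.startup_tokens
```

For example, a run with 2,000 startup tokens, 30,000 work tokens, and a 1,500-token summary keeps 28,500 tokens out of the parent, which comfortably exceeds the spawn cost. A run that does only a few hundred tokens of work would not.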

Model-Tier Cost Routing

Custom subagents introduce a new observability dimension: model-tier tracking. When routing tasks to Haiku vs. Sonnet vs. Opus, track per-agent model tier to measure whether cost routing decisions are correct:

High-frequency exploration tasks running on Opus signal a misconfigured model: field in the agent's definition.
Complex reasoning tasks running on Haiku signal under-provisioning.
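
The two signals above could be checked with a small audit over logged runs. This is a sketch only: AgentRun, the task_kind labels "exploration" and "reasoning", and the min_exploration_runs cutoff for "high-frequency" are all assumptions introduced for illustration, and classifying a task's kind is left to your own logging.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AgentRun:
    agent: str
    model_tier: str   # "haiku", "sonnet", or "opus"
    task_kind: str    # hypothetical labels: "exploration" or "reasoning"


def routing_flags(runs: Iterable[AgentRun], min_exploration_runs: int = 10) -> List[str]:
    """Return human-readable warnings for suspicious tier/task pairings."""
    counts = Counter((r.agent, r.model_tier.lower(), r.task_kind) for r in runs)
    flags = []
    for (agent, tier, kind), n in counts.items():
        if tier == "opus" and kind == "exploration" and n >= min_exploration_runs:
            flags.append(f"{agent}: {n} exploration runs on opus (check the model: field)")
        elif tier == "haiku" and kind == "reasoning":
            flags.append(f"{agent}: {n} reasoning runs on haiku (possible under-provisioning)")
    return flags
```

The frequency cutoff applies only to the Opus case because an occasional exploration run on a large model is cheap to tolerate, whereas even a single complex reasoning task on the smallest tier may be worth reviewing.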

MCP Context Bloat as an Observability Concern

Each connected MCP server loads its tool descriptions into the context window at session start. This creates a new observability dimension: context pollution monitoring. Track the number of active MCP servers and estimate their context token contribution. Signs of MCP context bloat include:

Increased session startup token usage without corresponding tool usage
Model responses that ignore available tools (descriptions consumed context but tools went unused)
Degraded response quality correlated with the number of connected servers

Mitigation: Use /mcp to audit active servers. Remove inactive ones. Scope MCP servers to specific subagents rather than the parent session. See Claude + MCP Explained.
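
To estimate each server's context contribution alongside the /mcp audit, a rough calculation like the one below may help. It is a sketch under loud assumptions: McpServerStats is a hypothetical record you would populate yourself, and the 4-characters-per-token ratio is a crude rule of thumb, not a real tokenizer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class McpServerStats:
    name: str
    tool_descriptions: Dict[str, str]   # tool name -> description text
    tool_calls: int = 0                 # calls observed during the session

    def estimated_context_tokens(self, chars_per_token: float = 4.0) -> int:
        # Crude estimate: character count divided by an assumed ratio.
        total_chars = sum(len(name) + len(desc)
                          for name, desc in self.tool_descriptions.items())
        return round(total_chars / chars_per_token)


def total_mcp_tokens(servers: Iterable[McpServerStats]) -> int:
    return sum(s.estimated_context_tokens() for s in servers)


def unused_servers(servers: Iterable[McpServerStats]) -> List[Tuple[str, int]]:
    """Servers that cost context but were never called, largest cost first."""
    idle = [(s.name, s.estimated_context_tokens()) for s in servers if s.tool_calls == 0]
    return sorted(idle, key=lambda item: item[1], reverse=True)
```

Servers returned by unused_servers are candidates for removal or for scoping to a specific subagent, as described in the mitigation above.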
