Engineering teams are adopting Claude Code fast. The productivity gains are real — but so are the bills. Without visibility into who is spending what, a single heavy user can blow past a monthly budget before anyone notices.

This post walks through how token-based billing works, what cost signals matter most, and how to build a monitoring system that keeps engineering leads in control.

How Claude Code billing works

Claude Code usage is billed by the token: a chunk of text that averages roughly 4 characters. Every message you send and every response Claude generates consumes tokens, and the price per token varies by model:

claude-sonnet-4-6 — the workhorse model most teams run day-to-day
claude-opus-4-8 — more capable, roughly 5× the cost per token
claude-haiku-4-5 — cheapest, best for lighter tasks

A developer doing deep refactoring or chasing a hard-to-reproduce bug will use far more tokens per session than someone asking quick questions. This variance is what makes per-developer tracking essential.

What to track

Token counts alone aren't enough. Useful cost monitoring captures:

Input vs. output tokens separately. Input tokens (your prompts + context) and output tokens (Claude's responses) have different per-unit costs. A team that pastes large files into context will skew heavily input-side.
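As a rough sketch of why the split matters, the snippet below prices a single usage record with separate input and output rates. The rates in `PRICES_PER_MTOK` are placeholders rather than published prices, and `UsageRecord` is a hypothetical shape for whatever your telemetry emits; substitute the current rates for the models you actually run.

```python
from dataclasses import dataclass

# Placeholder USD rates per million tokens: (input, output).
# Illustrative assumptions only, not published prices.
PRICES_PER_MTOK = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-8": (15.00, 75.00),
    "claude-haiku-4-5": (1.00, 5.00),
}


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical record of one request's token usage."""
    developer: str
    model: str
    input_tokens: int
    output_tokens: int


def record_cost(rec: UsageRecord) -> float:
    """Return the USD cost of one record, pricing input and output separately."""
    input_rate, output_rate = PRICES_PER_MTOK[rec.model]
    return (rec.input_tokens * input_rate + rec.output_tokens * output_rate) / 1_000_000


# A context-heavy request: a lot of pasted input, a short answer.
heavy_input = UsageRecord("dev-a", "claude-sonnet-4-6", 200_000, 2_000)
print(f"${record_cost(heavy_input):.2f}")
```

With these placeholder rates, nearly all of the cost of that request comes from the input side, which is the pattern to look for on teams that paste large files into context.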

Model mix. If your team has access to Opus, you need to know which developers are using it and how often. At roughly 5× the per-token price, one engineer defaulting to Opus costs about as much as five engineers on Sonnet with the same token volume.
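One way to surface model mix is to compute each developer's share of tokens per model. This is a minimal sketch over made-up rows; the `(developer, model, total_tokens)` tuple layout is an assumption, not a format any particular tool exports.

```python
from collections import defaultdict

# Hypothetical usage rows: (developer, model, total_tokens).
rows = [
    ("dev-a", "claude-sonnet-4-6", 900_000),
    ("dev-a", "claude-opus-4-8", 100_000),
    ("dev-b", "claude-opus-4-8", 750_000),
    ("dev-b", "claude-sonnet-4-6", 250_000),
]


def model_mix(rows):
    """Return {developer: {model: share of that developer's tokens}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for developer, model, tokens in rows:
        totals[developer][model] += tokens

    mix = {}
    for developer, by_model in totals.items():
        dev_total = sum(by_model.values())
        if dev_total == 0:
            continue  # no usage recorded, nothing to apportion
        mix[developer] = {m: t / dev_total for m, t in by_model.items()}
    return mix


mix = model_mix(rows)
for developer, shares in mix.items():
    opus_share = shares.get("claude-opus-4-8", 0.0)
    print(f"{developer}: {opus_share:.0%} of tokens on Opus")
```

Shares are more useful than raw counts here because they separate "uses Opus occasionally" from "defaults to Opus", regardless of how much each developer uses overall.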

Session cadence. A developer running 50 short sessions is different from one running 5 long sessions — even at the same total token count. Long sessions indicate complex, multi-step tasks; short sessions often indicate quick lookups.
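To make the cadence point concrete, here is a small sketch that summarizes two developers with identical totals but very different session shapes. The per-session token lists are invented for illustration.

```python
from statistics import median

# Hypothetical per-session token totals, keyed by developer.
sessions = {
    "dev-a": [4_000] * 50,   # 50 short sessions
    "dev-b": [40_000] * 5,   # 5 long sessions
}


def cadence(session_tokens):
    """Summarize one developer's sessions: count, total, and median size."""
    return {
        "sessions": len(session_tokens),
        "total_tokens": sum(session_tokens),
        "median_tokens": median(session_tokens),
    }


summary = {dev: cadence(tokens) for dev, tokens in sessions.items()}
for dev, stats in summary.items():
    print(dev, stats)
```

Both developers land on the same total, so only the session count and median size reveal that they are working in different ways.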

Project or repository attribution. Knowing that the payments service costs 3× more in AI than the frontend tells you something about code complexity — and helps you allocate budget fairly across teams.

The hidden cost: context window bloat

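The most common source of unexpectedly high bills is context window bloat. Claude Code carries the conversation history forward for the length of a session, and every new message re-sends that history as input. If a developer keeps one long session open and pastes large files repeatedly, each turn costs more than the last, so cumulative input tokens grow much faster than the number of messages (roughly quadratically when nothing is cached or trimmed).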

The practical fix: encourage short, focused sessions. Task-based workflows (one Claude session per discrete task) keep context windows lean and costs predictable.
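A back-of-the-envelope simulation shows the effect. It assumes every turn adds a fixed number of tokens to the history and that the full history is re-sent each time; it deliberately ignores prompt caching and any automatic trimming, so treat the numbers as an upper-bound illustration rather than a bill.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens over a session in which every turn re-sends
    the whole history (simplified: no caching, no compaction)."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows by one turn
        total += context            # the whole history is sent again
    return total


# Same 40 turns of work, split two ways.
one_long = cumulative_input_tokens(turns=40, tokens_per_turn=2_000)
four_short = 4 * cumulative_input_tokens(turns=10, tokens_per_turn=2_000)

print(one_long)    # 1640000
print(four_short)  # 440000
```

Under these assumptions, splitting the same 40 turns into four focused sessions cuts input tokens by a bit under 4×, because no single context ever grows past 10 turns.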

Setting budget alerts

The right alert thresholds depend on team size and usage patterns. A rough baseline:

Team size	Monthly alert threshold
1–5 devs	$50 per developer
6–19 devs	$30 per developer
20+ devs	$20 per developer (with outlier alerts)

The outlier alert matters more than the aggregate. A team of 20 averaging $17/dev might have two developers at $80 and 18 at $10. The aggregate sits under the $20 threshold and looks fine; the outliers need attention.
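The check can be sketched in a few lines using the spend distribution from the example (two developers at $80, 18 at $10). The rule used here, flagging anyone above 3× the team median, is one reasonable choice among many and is an assumption of this sketch, as are the function and parameter names.

```python
from statistics import median


def spend_report(spend_by_dev, per_dev_threshold, outlier_multiple=3.0):
    """Compare the team average to a threshold and flag individual outliers.

    A developer counts as an outlier when their spend exceeds
    `outlier_multiple` times the team median (an illustrative rule).
    """
    team_avg = sum(spend_by_dev.values()) / len(spend_by_dev)
    typical = median(spend_by_dev.values())
    outliers = sorted(
        dev for dev, usd in spend_by_dev.items() if usd > outlier_multiple * typical
    )
    return {
        "team_avg": team_avg,
        "aggregate_alert": team_avg > per_dev_threshold,
        "outliers": outliers,
    }


# 18 developers at $10 and two at $80.
spend = {f"dev-{i:02d}": 10.0 for i in range(1, 19)}
spend.update({"dev-19": 80.0, "dev-20": 80.0})

report = spend_report(spend, per_dev_threshold=20.0)
print(report)
```

The median is used as the baseline because, unlike the mean, it is not pulled upward by the very outliers the check is trying to find.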

What most teams get wrong

Tracking too late. Checking spend monthly means you're reacting, not managing. Weekly or daily visibility catches runaway usage before it compounds.
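A daily check can be as simple as projecting month-end spend from the month-to-date total. The sketch below uses plain linear extrapolation, which is an assumption: it will overshoot or undershoot if usage is uneven across the month. The dollar figure and date are made up.

```python
import calendar
from datetime import date


def projected_month_end_spend(spend_to_date: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend (including today)
    to the end of the current month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


# $240 spent by June 10 projects to $720 for the 30-day month.
projection = projected_month_end_spend(240.0, date(2025, 6, 10))
print(f"${projection:.2f}")
```

Comparing this projection against the monthly budget each day gives weeks of warning instead of a surprise on the invoice.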

No per-developer breakdown. Team-level totals hide the distribution. You need to know who the heavy users are, not to punish them, but to understand whether they're doing high-value work or burning tokens inefficiently.

Ignoring the model dimension. If you're not tracking which model each session used, you're missing what is likely your biggest cost lever: model choice alone can change the per-token price by roughly 5×.

What Tazmin does

Tazmin collects Claude Code telemetry via OpenTelemetry and surfaces it as a real-time dashboard. You get per-developer token usage, model-level cost breakdowns, session trends, and configurable budget alerts via Slack or email — without touching any prompt content or source code.

If your team is running Claude Code today without visibility into spend, join the waitlist and we'll get you set up.