A year ago most engineering teams used one AI coding tool. Today many teams use two or three, often without realizing it. One developer prefers Claude Code for complex refactors, another runs GitHub Copilot in their IDE, and the platform team has been experimenting with Google Gemini Code Assist for infrastructure work.

The problem: costs are scattered across multiple billing dashboards, and no one has a combined picture of what the team is actually spending on AI.

The multi-tool reality

Each major AI coding tool has a different pricing model and a different billing surface:

Claude Code (Anthropic) bills by token. Input and output tokens are priced separately, and costs scale sharply if developers use context-heavy sessions or the Opus model. Usage is exposed via OpenTelemetry, which makes it one of the more transparent telemetry surfaces among coding tools.

GitHub Copilot charges per seat (Individual: $10/month, Business: $19/month, Enterprise: $39/month). The seat model is predictable, but it obscures actual usage — you pay whether developers use it heavily or barely at all. GitHub does provide activity metrics in their Enterprise billing portal, but no cross-tool comparison.

OpenAI Codex CLI is OpenAI's open-source terminal-based coding agent, built on top of the standard OpenAI API. It bills against your organization's API usage, making it easy to correlate with your existing OpenAI spend — but hard to separate "Codex sessions" from other API calls unless you instrument explicitly.

Google Gemini Code Assist is seat-priced, similar to Copilot, with individual and enterprise tiers. It plugs into standard Google Cloud billing, which means it shows up in your GCP cost reports alongside compute and storage. That is helpful for teams already using GCP and confusing for everyone else.
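To make the difference between the two pricing models concrete, here is a minimal Python sketch. The $19 seat price is the Copilot Business figure quoted above; the per-million-token rates, the seat count, and the token volumes are hypothetical placeholders, not any vendor's actual numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatPlan:
    """Flat-rate pricing: cost depends only on how many seats you buy."""
    price_per_seat: float  # USD per seat per month
    seats: int

    def monthly_cost(self) -> float:
        return self.price_per_seat * self.seats


@dataclass(frozen=True)
class TokenPlan:
    """Usage-based pricing: input and output tokens are priced separately."""
    input_price_per_mtok: float   # USD per million input tokens (hypothetical)
    output_price_per_mtok: float  # USD per million output tokens (hypothetical)

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (
            input_tokens / 1_000_000 * self.input_price_per_mtok
            + output_tokens / 1_000_000 * self.output_price_per_mtok
        )


# 25 seats at the Business price: the bill is the same every month.
seat_total = SeatPlan(price_per_seat=19.0, seats=25).monthly_cost()

# Same token plan, two very different months (volumes are made up).
token_plan = TokenPlan(input_price_per_mtok=3.0, output_price_per_mtok=15.0)
light_month = token_plan.cost(input_tokens=20_000_000, output_tokens=2_000_000)
heavy_month = token_plan.cost(input_tokens=400_000_000, output_tokens=30_000_000)

print(f"seat-based: ${seat_total:.2f}")
print(f"token-based, light month: ${light_month:.2f}")
print(f"token-based, heavy month: ${heavy_month:.2f}")
```

The seat bill never moves, while the token bill swings with usage. That is why the two models need different tracking.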

Why this matters for engineering managers

The hidden cost of the multi-tool reality is not just money — it's visibility. When AI spend is scattered across four dashboards, no single person has a complete picture. Finance sees top-line numbers; engineering leads see individual tool metrics; developers see nothing.

This creates three failure modes:

Budget attribution is impossible. You know you spent $8,000 on AI last month, but you can't say how much came from which tool, team, or project. That makes it impossible to have meaningful conversations about ROI or cost allocation.

Waste goes undetected. A seat-based tool like Copilot is wasted spend if developers aren't actually using it. A token-based tool like Claude Code can generate surprise bills from a handful of heavy sessions. Neither problem is visible without per-developer, per-tool tracking.

Optimization decisions are made blind. Should you switch some Claude Code users to Gemini? Is the Copilot enterprise tier worth the upgrade cost for your team size? You can't answer these questions without comparative data.

What unified tracking looks like

Good multi-tool cost tracking surfaces:

Total AI spend by tool — side-by-side, so you can see the relative weight of each
Per-developer breakdown — across all tools, not just one
Seat utilization for flat-rate tools — are you paying for seats that aren't being used?
Token efficiency for usage-based tools — are heavy users getting proportionally more value?
Trend over time — is total AI spend growing faster or slower than headcount?
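One way to picture the first two views in the list above is a single normalized record per tool, developer, and month, rolled up along whichever dimension you need. This is only a sketch: the `SpendRecord` shape, the tool labels, and the sample figures are all hypothetical, and real data would come from each tool's billing export or telemetry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendRecord:
    """One normalized row of spend (hypothetical schema)."""
    tool: str
    developer: str
    month: str       # "YYYY-MM"
    cost_usd: float


def total_by(records, key):
    """Sum cost_usd grouped by the given SpendRecord attribute name."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(totals)


# Illustrative data only.
records = [
    SpendRecord("claude-code", "alice", "2025-01", 120.0),
    SpendRecord("copilot", "alice", "2025-01", 19.0),
    SpendRecord("copilot", "bob", "2025-01", 19.0),
    SpendRecord("codex-cli", "bob", "2025-01", 42.5),
]

by_tool = total_by(records, "tool")            # side-by-side spend per tool
by_developer = total_by(records, "developer")  # per-developer, across all tools

print(by_tool)
print(by_developer)
```

Once every tool's data lands in the same shape, the same `total_by` call can be reused for teams, projects, or months.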

The integration approach varies by tool. Claude Code emits telemetry natively via OpenTelemetry: enable it with a handful of environment variables and data starts flowing to your collector. Copilot and Gemini Code Assist expose seat and activity data through their respective admin APIs. Codex CLI, being built on the OpenAI API, is trackable via OpenAI's usage API with session tagging.

Thinking about ROI across tools

Not all AI coding spend is equal. The right question isn't "how much are we spending?" but "what are we getting per dollar?"

Seat-based tools (Copilot, Gemini) have a different ROI calculation than token-based tools (Claude Code, Codex). For seat-based tools, the marginal cost of an additional hour of usage is zero — so the question is whether the per-seat price is justified by usage rate and developer productivity. For token-based tools, the marginal cost is real — so you want to understand whether expensive sessions are high-value tasks or inefficient usage patterns.

A good cost management dashboard should help you see both dimensions: utilization for flat-rate tools, and cost-per-outcome for token-based tools.
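The two dimensions can be reduced to two small calculations. The sketch below assumes you can count active users for a seat-based tool and can pick some outcome to count for token-based sessions (merged pull requests are used here purely as an example). The function names and sample numbers are hypothetical.

```python
def cost_per_active_user(seat_price, paid_seats, active_users):
    """Effective monthly price per developer who actually uses the tool.

    Returns None when nobody is active, since the ratio is undefined.
    """
    if active_users == 0:
        return None
    return seat_price * paid_seats / active_users


def cost_per_outcome(session_costs, outcomes):
    """Average token spend per delivered outcome (e.g. a merged PR).

    Returns None when there are no outcomes to divide by.
    """
    if outcomes == 0:
        return None
    return sum(session_costs) / outcomes


# 40 paid seats at $19, but only 25 developers active this month.
effective_seat_price = cost_per_active_user(19.0, paid_seats=40, active_users=25)

# Four sessions (USD, made-up figures) that together produced 8 merged PRs.
per_pr = cost_per_outcome([12.0, 3.5, 40.0, 0.5], outcomes=8)

print(f"effective seat price: ${effective_seat_price:.2f} per active user")
print(f"token spend per merged PR: ${per_pr:.2f}")
```

Neither number says anything about quality on its own, but tracked over time they show whether a flat-rate tool is earning its seats and whether expensive sessions are producing results.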

Getting started

If your team is using multiple AI coding tools today without a unified view:

Audit what you're actually paying for. Pull up billing dashboards for each tool and find the total monthly cost. Add them up — most engineering managers are surprised by the total.
Find the heaviest users. Most tools have at least basic per-user or per-seat metrics. Identify your top 20% of users across all tools. On usage-based tools in particular, that group often accounts for a disproportionate share of costs.
Calculate seat utilization for flat-rate tools. Divide active users by paid seats. As a rule of thumb, if it's below roughly 70%, you're likely paying for seats nobody uses.
Instrument token-based tools properly. For Claude Code and Codex, make sure you're capturing per-session costs with user attribution. Raw API spend numbers aren't enough.
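The utilization and heavy-user checks from the steps above can be sketched in a few lines of Python. The function names, the sample figures, and the definition of "active" are assumptions for illustration; use whatever activity signal your tools actually report.

```python
import math


def seat_utilization(active_users, paid_seats):
    """Fraction of paid seats with at least one active user."""
    if paid_seats <= 0:
        raise ValueError("paid_seats must be positive")
    return active_users / paid_seats


def top_share(costs_by_user, fraction=0.2):
    """Share of total cost attributable to the top `fraction` of users."""
    costs = sorted(costs_by_user.values(), reverse=True)
    total = sum(costs)
    if not costs or total == 0:
        return 0.0
    top_n = max(1, math.ceil(len(costs) * fraction))
    return sum(costs[:top_n]) / total


# 28 active developers on 40 paid seats (made-up numbers).
utilization = seat_utilization(active_users=28, paid_seats=40)

# Monthly token spend per developer in USD (made-up numbers).
token_costs = {"alice": 500.0, "bob": 200.0, "carol": 100.0,
               "dave": 100.0, "erin": 100.0}
share = top_share(token_costs, fraction=0.2)

print(f"seat utilization: {utilization:.0%}")
print(f"top 20% of users drive {share:.0%} of token spend")
```

Running the second check on your own data tells you how concentrated your spend really is, rather than assuming a fixed split.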

Tazmin is building exactly this unified view — connecting to your AI coding tools' telemetry and APIs to surface one dashboard with spend, utilization, and team-level breakdowns. Join the waitlist to get early access.