The Default Is Broken, and Everyone Knows It

Most organizations that have deployed AI across multiple teams are running on the same basic infrastructure: one Anthropic API key, one OpenAI API key, possibly one Azure OpenAI resource. Every team, every application, every experiment points to the same credential. The monthly invoice arrives as a single number. Someone in finance asks where it came from. Nobody has a good answer.

This is not a niche configuration. It is the default. And it persists not because engineering leaders prefer it but because the migration cost of changing it has historically felt higher than the cost of living with the opacity. That calculus is changing as AI spend becomes a material line item — but the organizational habits have not caught up.

The conversation about shared API keys usually starts with the dollar amount. That's the wrong place to start. The dollar amount is a symptom. The actual problem is structural, and it operates at three distinct levels.

The Three Real Costs of Shared Keys

1. No Chargeback Capability

Finance organizations are increasingly asking engineering leaders to allocate AI costs by business unit, product line, or team. This is a reasonable request — AI cost is no longer a rounding error, and cost center accountability is a standard management practice.

With a shared API key, you cannot do this. You have one aggregate spend figure that you can, at best, split by headcount or estimate by proxy. Neither is accurate. A team of three running a document processing pipeline that calls GPT-4 on every upload can easily consume more than a team of twenty using a lightweight classifier in a low-volume workflow. Headcount ratios tell you nothing useful.
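To make the headcount problem concrete, here is a minimal sketch with made-up numbers. The team names, the invoice total, and the "measured_cost" figures are all assumptions for illustration; the measured figure is precisely what a shared key cannot give you.

```python
# Hypothetical figures for illustration only. "measured_cost" is the
# per-team number that a shared key does not let you observe.
MONTHLY_INVOICE = 10_000.00  # dollars billed to the shared key

TEAMS = {
    "doc-pipeline": {"headcount": 3, "measured_cost": 8_200.00},
    "classifier": {"headcount": 20, "measured_cost": 1_800.00},
}


def allocate_by_headcount(invoice, teams):
    """Split one invoice across teams in proportion to headcount."""
    total_heads = sum(t["headcount"] for t in teams.values())
    return {
        name: round(invoice * t["headcount"] / total_heads, 2)
        for name, t in teams.items()
    }


headcount_split = allocate_by_headcount(MONTHLY_INVOICE, TEAMS)
for name, team in TEAMS.items():
    gap = headcount_split[name] - team["measured_cost"]
    print(
        f"{name}: charged {headcount_split[name]:.2f}, "
        f"actual {team['measured_cost']:.2f}, gap {gap:+.2f}"
    )
```

Under these invented numbers, the three-person team is charged about 1,304 dollars against 8,200 dollars of actual usage, and the twenty-person team absorbs the difference.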

The consequence is that cost pressure lands on the engineering organization as a whole. No team has a number they're responsible for. No team has an incentive to optimize because optimization savings don't flow back to them. The cost management conversation stalls because accountability is diffuse.

2. No Optimization Signal

Optimization requires measurement. If you want to know whether switching a workflow from GPT-4o to GPT-4o-mini saves money without degrading quality, you need to know the current cost of that workflow. Not the aggregate organizational spend — the specific cost of that specific workflow.
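The comparison is simple arithmetic once per-workflow volumes are known. The sketch below assumes placeholder model names and placeholder per-million-token prices (they are not real provider rates), plus a hypothetical workflow volume; substitute your own measured figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # dollars per million input tokens
    output_per_mtok: float  # dollars per million output tokens


# Placeholder prices, NOT real provider rates.
PRICES = {
    "large-model": Price(input_per_mtok=2.50, output_per_mtok=10.00),
    "small-model": Price(input_per_mtok=0.15, output_per_mtok=0.60),
}


def workflow_cost(model, calls, in_tokens_per_call, out_tokens_per_call):
    """Monthly cost of one workflow on one model, in dollars."""
    price = PRICES[model]
    input_mtok = calls * in_tokens_per_call / 1_000_000
    output_mtok = calls * out_tokens_per_call / 1_000_000
    return input_mtok * price.input_per_mtok + output_mtok * price.output_per_mtok


# Hypothetical workflow: 100k calls per month, 2,000 tokens in, 500 out.
volume = dict(calls=100_000, in_tokens_per_call=2_000, out_tokens_per_call=500)
current = workflow_cost("large-model", **volume)
candidate = workflow_cost("small-model", **volume)
print(f"current {current:.2f}, candidate {candidate:.2f}, saving {current - candidate:.2f}")
```

None of the inputs to `workflow_cost` are available from an aggregate invoice, which is the point: the calculation is trivial, the measurement is the hard part. The quality side of the trade-off still needs its own evaluation.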

A shared API key gives you none of this. You know your total monthly spend. You do not know what any individual application, feature, or workflow is contributing to that total. Optimization decisions become guesswork dressed as engineering judgment. Teams that are actually wasteful — running expensive models on tasks that don't require them, generating verbose output with no length constraints, not leveraging caching — have no signal pointing them toward improvement.

The perverse outcome: your most technically sophisticated teams are likely already optimizing on intuition. Your least sophisticated teams, who have the most room to improve, have no data telling them to. The gap widens over time.

3. No Governance Posture

A shared API key means a shared rate limit, a shared spend cap, and a shared blast radius. If one team's feature launches and drives an unexpected spike in usage, every other team hitting the same key hits the same rate limit. Incident attribution is a post-hoc archaeology exercise. Identifying which application caused a spend spike requires log correlation across multiple systems, assuming those logs exist and are structured consistently.

Less visibly, shared keys make security reviews uncomfortable. Who has access to the key? How is it rotated? Where is it stored? In practice, shared keys end up in environment variables, in CI/CD systems, in local developer configs, in Slack messages sent three years ago. The key-per-team model does not automatically solve this, but it scopes the blast radius. A compromised shared key exposes your entire AI spend to abuse. A compromised team-specific key exposes one team's budget.

The Three Alternatives

There is no single correct answer here. The right architecture depends on your organization's size, technical maturity, and cost management priorities. What follows is an honest assessment of each option.

Option 1: Key-Per-Team

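The conceptually simplest alternative: each team gets its own API key. Spend is attributable by key. Rate limits are scoped, provided each key sits in its own provider project or workspace, since providers typically apply limits at the account or project level rather than per key. Governance is cleaner.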

The problems are operational. Key management becomes an administrative burden at scale. Key rotation requires coordination. Developer access management requires a process. Budget caps require manual configuration on each key. And if a team is running three different applications, you still don't have application-level attribution — you have team-level attribution, which is progress but not the end state.

Key-per-team is the right starting point for organizations with a small number of teams and a pressing need to do something before a more sophisticated solution is in place. It is not a long-term architecture for organizations with more than a handful of AI-consuming teams.

Option 2: Proxy Layer

A proxy layer sits between your applications and the AI provider APIs. All traffic routes through the proxy, which injects attribution metadata, enforces policies, and logs structured records of every call. Teams still use credentials, but those credentials are issued by the proxy, not directly by the provider.

The advantages are significant: attribution at whatever granularity you instrument for (team, application, feature, user, request type), policy enforcement (rate limits per team, spend caps, model allowlists), and a centralized audit log. The proxy can also add caching, fallback routing, and load balancing across provider accounts.
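As a rough illustration of the policy side, the sketch below models only the decision logic a proxy might run before and after forwarding a request. It is not a working proxy: there is no HTTP handling, and `PolicyGate`, `TeamPolicy`, the token format, and the model names are all hypothetical.

```python
from dataclasses import dataclass


class PolicyError(Exception):
    """Raised when a request violates the caller's policy."""


@dataclass
class TeamPolicy:
    team: str
    allowed_models: frozenset
    monthly_cap_usd: float
    spent_usd: float = 0.0


class PolicyGate:
    """In-memory stand-in for the checks a proxy performs per request."""

    def __init__(self):
        self._policies = {}  # proxy-issued token -> TeamPolicy
        self.audit_log = []  # one structured record per completed call

    def issue_credential(self, token, policy):
        self._policies[token] = policy

    def authorize(self, token, model, estimated_cost_usd):
        """Check allowlist and spend cap before the call is forwarded."""
        policy = self._policies.get(token)
        if policy is None:
            raise PolicyError("unknown credential")
        if model not in policy.allowed_models:
            raise PolicyError(f"{policy.team} may not call {model}")
        if policy.spent_usd + estimated_cost_usd > policy.monthly_cap_usd:
            raise PolicyError(f"{policy.team} would exceed its monthly cap")
        return policy

    def record(self, token, model, feature, cost_usd):
        """Update spend and append an audit record after the call returns."""
        policy = self._policies[token]
        policy.spent_usd += cost_usd
        self.audit_log.append(
            {"team": policy.team, "model": model, "feature": feature, "cost_usd": cost_usd}
        )


gate = PolicyGate()
gate.issue_credential(
    "tok-search", TeamPolicy("search", frozenset({"small-model"}), monthly_cap_usd=100.0)
)
gate.authorize("tok-search", "small-model", estimated_cost_usd=0.5)
gate.record("tok-search", "small-model", feature="query-rewrite", cost_usd=0.5)
```

A production proxy would also need persistent state, concurrency control around the spend counter, and a reset at the start of each billing period; those are omitted here.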

The disadvantages are engineering cost and operational burden. You are now running infrastructure that is in the critical path of every AI call. Availability matters. Latency matters. You need to maintain and scale it. For teams without dedicated platform engineering capacity, a self-hosted proxy is an expensive commitment relative to the problem it solves.

Option 3: Instrumentation Layer

An instrumentation layer is lighter than a proxy: rather than routing traffic, it wraps your AI client libraries to capture metadata and emit structured telemetry. Every AI call gets tagged with team, application, feature, and any other context your code passes through. The telemetry flows to a cost management system that aggregates, attributes, and surfaces dashboards.
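A minimal sketch of the wrapping idea follows. The `instrument` helper, the event fields, and the response shape (a dict with a `usage` entry) are assumptions made for illustration; a real wrapper would adapt to whatever your provider SDK actually returns, and `fake_call_model` stands in for that SDK.

```python
import time


def instrument(call_model, emit, *, team, application, environment):
    """Wrap a client call so each request emits one tagged telemetry event."""

    def wrapped(*, feature, model, **request):
        started = time.monotonic()
        response = call_model(model=model, **request)
        event = {
            "team": team,
            "application": application,
            "environment": environment,
            "feature": feature,
            "model": model,
            "input_tokens": response["usage"]["input_tokens"],
            "output_tokens": response["usage"]["output_tokens"],
            "latency_s": time.monotonic() - started,
        }
        try:
            emit(event)
        except Exception:
            # A telemetry failure should never break the AI call itself.
            pass
        return response

    return wrapped


def fake_call_model(*, model, prompt):
    """Stand-in for a real provider client; returns a made-up response."""
    return {
        "text": "ok",
        "usage": {"input_tokens": len(prompt.split()), "output_tokens": 1},
    }


events = []  # in practice, emit would send to your telemetry pipeline
call = instrument(
    fake_call_model, events.append,
    team="support", application="helpdesk", environment="staging",
)
reply = call(feature="ticket-summary", model="small-model", prompt="summarize this ticket")
```

Team, application, and environment are bound once when the wrapper is created, while feature is passed per call, so the calling code has to name the workflow it is serving every time.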

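The advantage over a proxy is that it adds no separate service or network hop to the critical path: the wrapper runs in-process, and telemetry can be emitted without blocking the AI call. The advantage over key-per-team is that attribution is fine-grained and programmable: you can attribute at feature level, not just team level.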

The limitation is that instrumentation requires consistent implementation across all teams and applications. It works well in organizations with platform engineering functions that own shared libraries. It is harder to enforce in organizations where teams have full autonomy over their stack and aren't required to use centralized AI client wrappers.

Why Most Organizations Shouldn't Start With Key-Per-Team

The instinct to solve attribution with key-per-team is understandable. It's the path of least architectural resistance. It uses existing provider tooling without requiring new infrastructure.

The problem is that it solves for one level of attribution — team — and makes the next level of attribution — application, feature, workflow — harder to achieve, not easier. Once teams own their own keys, there's no centralized point to add instrumentation. Each team becomes responsible for its own cost visibility, which replicates the original problem at a different level.

Key-per-team also gives you only a thin governance posture. Budget caps still have to be configured by hand on each key, and policy enforcement (model allowlists, PII filters, rate limit management) has to be implemented by each team independently. The fragmentation cost is real.

The more durable starting point is to build toward an instrumentation or proxy architecture from the beginning, even at small scale. The incremental effort is lower than it looks when you frame it as a platform capability rather than a per-team migration exercise.

What Good Attribution Actually Requires

Regardless of which technical path you choose, attribution requires a data model. You need to decide what dimensions matter for your organization and instrument for them consistently. The standard dimensions are:

  • Team or cost center — the organizational unit responsible for the workload
  • Application or service — the product or system making the call
  • Feature or workflow — the specific capability within the application
  • Model — which model was called
  • Call type — completion, embedding, classification, etc.
  • Environment — production, staging, development

These dimensions feed dashboards that answer the questions finance and engineering leadership actually need: where is spend growing fastest, which teams are over or under their allocation, which features are driving disproportionate cost relative to their usage, and where are the highest-ROI optimization opportunities.
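One way to picture the data model is a single record per call, carrying the dimensions listed above, with dashboards reduced to group-by queries over those records. The `CallRecord` type, the `spend_by` helper, and the sample values below are hypothetical; the cost field is assumed to be computed upstream from token counts and prices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    team: str
    application: str
    feature: str
    model: str
    call_type: str
    environment: str
    cost_usd: float


def spend_by(records, *dimensions):
    """Total cost grouped by any combination of record dimensions."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(getattr(record, d) for d in dimensions)
        totals[key] += record.cost_usd
    return dict(totals)


# Made-up sample data.
records = [
    CallRecord("search", "web", "query-rewrite", "small-model", "completion", "production", 12.5),
    CallRecord("search", "web", "autocomplete", "small-model", "completion", "production", 3.25),
    CallRecord("support", "helpdesk", "ticket-summary", "large-model", "completion", "production", 40.0),
    CallRecord("support", "helpdesk", "ticket-summary", "large-model", "completion", "staging", 0.75),
]

by_team = spend_by(records, "team")
by_team_and_env = spend_by(records, "team", "environment")
by_feature = spend_by(records, "application", "feature")
```

The same records answer the finance question (`by_team`) and the engineering question (`by_feature`) without any change to how calls are tagged, which is why the dimensions need to be settled before the dashboards are.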

Without this data model in place, you can't answer those questions. And without answers to those questions, AI cost management is theater — conversations about the size of the problem with no ability to act on it.

The Transition Plan Nobody Wants to Write

The hardest part of moving off shared keys is not technical. It's organizational. It requires someone to own the migration, coordinate across teams, and enforce adoption of whatever new pattern you've chosen. In most engineering organizations, that means either platform engineering ownership or a CTO-level mandate that treats AI infrastructure as a first-class concern.

The migration path that works most reliably: implement centralized instrumentation in a shared AI client library, require teams to use that library for new AI integrations, and migrate existing integrations opportunistically during their next major version upgrade. This is a six-to-twelve-month journey, not a weekend project. But it compounds. Each team that migrates reduces the shared-key surface area and adds to the organizational attribution coverage.

Oberhahn is built for organizations working through exactly this transition — providing the attribution infrastructure, cost dashboards, and policy enforcement layer that makes the shared-key problem tractable without requiring teams to build and maintain it themselves.

The moment you can see where your AI spend is going, the spend problem becomes manageable. The shared-key problem is not primarily a cost problem. It's a visibility problem. Solve the visibility problem first.