The Default State: One Bill, No Answers
Most organizations that have been building with the OpenAI API for more than a few months end up in the same place: a single API key, a monthly invoice showing total tokens consumed, and no meaningful breakdown of which team, product, or workflow drove what share of that cost.
This is not a failure of planning. It is the natural result of how teams adopt AI capabilities. An engineer integrates the API to test a feature. It works. The feature ships. Other engineers follow the same path using the same key. Six months later you have a non-trivial monthly bill and a finance team asking questions no one can answer.
The question of how to attribute OpenAI API costs by team is one of the most common operational problems organizations face as AI usage matures. The good news: it is solvable without a full re-architecture of your key structure. The bad news: the most obvious solution — one key per team — is also the one that creates the most problems downstream.
Why Key-Per-Team Is a Trap
The instinct to assign each team its own API key is understandable. It mirrors how many organizations manage cloud credentials and service accounts. It feels like a clean separation of concerns.
In practice, it creates a different set of problems than the ones it solves.
Operational Overhead
Every additional key is a credential that needs to be provisioned, rotated, audited, and revoked. In an organization with eight product teams, that is eight secrets in each of development, staging, and production: 24 credentials in total. When an engineer leaves or a key needs to rotate, the rotation work multiplies with the number of keys that person could access. Key management at scale is not a trivial problem.
Cross-Team Workflows Break the Model
The key-per-team model assumes that AI calls map cleanly onto team boundaries. In practice, a single user-facing feature often touches multiple teams' code. A product team's recommendation engine might call a shared embeddings service maintained by the data platform team. Which key does that call use? Whose cost is it? The clean organizational lines you drew on the whiteboard dissolve against real application architecture.
Rate Limits Become Fragmented
OpenAI enforces rate limits at the organization and project level, not per individual key. Keys that share a project draw from the same pool; the fragmentation appears once each team gets its own project with its own rate limit. A team capped that way has a smaller token-per-minute budget than the organization's aggregate allocation, so teams with spiky usage patterns hit limits more often. The result is either failed requests or pressure to raise limits, which means limit-increase requests that pooled capacity could have avoided.
Cost Attribution Gets No More Granular
Even with per-team keys, you still have no visibility below the team level. You cannot see which feature drove cost, which user segment, which workflow, or which model version. You have solved one attribution problem and created three more.
The Three Serious Approaches
There are three approaches to team-level cost attribution that actually work. They are not mutually exclusive — the right architecture often combines elements of all three.
Approach 1: Request Tagging via the user Field
The OpenAI API accepts an optional user field in the body of completions requests; it is a request parameter rather than an HTTP header. While this field was designed for end-user tracking and abuse prevention, it can be repurposed to encode organizational metadata: team identifier, feature name, environment, or any other dimension you want to report on.
The limitation is that the user field is a single string. You can encode structured information, such as team:growth|feature:onboarding|env:prod, but parsing and aggregating that information requires downstream processing. OpenAI's dashboard does not parse it for you.
Request tagging is a lightweight approach that does not require changing your key structure. It works best when you have a small number of dimensions you care about and a data pipeline that can join the tag to token usage. Confirm that the usage data OpenAI exposes via its API carries the field through; if it does not, log the tag on your side together with the usage block returned in each response.
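As a rough illustration of the downstream processing involved, the sketch below encodes and parses the team:growth|feature:onboarding|env:prod format mentioned above. The function names encode_tags and parse_tags are hypothetical, and the separator rules are an assumption; adapt them to whatever convention your pipeline already uses.

```python
def encode_tags(tags: dict) -> str:
    """Serialize tags as 'key:value|key:value', preserving insertion order."""
    for key, value in tags.items():
        # ':' and '|' are reserved by this (assumed) format.
        if any(sep in key or sep in value for sep in (":", "|")):
            raise ValueError(f"tag {key!r} contains a reserved separator")
    return "|".join(f"{key}:{value}" for key, value in tags.items())


def parse_tags(user_field: str) -> dict:
    """Parse 'key:value|key:value' into a dict, skipping malformed parts."""
    tags = {}
    for part in user_field.split("|"):
        key, sep, value = part.partition(":")
        if sep and key:  # ignore parts with no ':' or an empty key
            tags[key] = value
    return tags
```

The encoded string would be passed as the user argument on each request; the parser runs wherever usage records are aggregated. Rejecting reserved characters at encode time keeps the parser simple, at the cost of restricting tag values.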
Approach 2: Proxy-Based Instrumentation Layer
A more robust solution is to route all OpenAI traffic through an internal proxy. Every team makes their API calls to api.internal.yourcompany.com/openai rather than api.openai.com directly. The proxy passes the request through to OpenAI but intercepts the traffic to capture metadata, inject tags, log usage, and enforce policies.
| Capability | Direct API | Proxy Layer |
|---|---|---|
| Per-team cost attribution | No | Yes |
| Per-feature attribution | No | Yes |
| Policy enforcement (budget caps) | No | Yes |
| Multi-vendor routing | No | Yes |
| Latency impact | None | Low (5-20ms) |
| Infrastructure to maintain | None | Moderate |
The proxy approach gives you complete control over the attribution data model. You define the metadata schema, you own the logs, and you can build any reporting layer you want on top. The cost is operational: you are now responsible for keeping a critical piece of infrastructure available at the latency and reliability levels your AI features require.
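The attribution step inside such a proxy might look something like the sketch below. It assumes callers send hypothetical X-Team, X-Feature, and X-Env headers (names invented for this example), and that the upstream response is a non-streaming JSON body with the model field and usage object (prompt_tokens, completion_tokens) that Chat Completions responses include. Forwarding, authentication, and streaming are deliberately left out.

```python
import json
import time

# Hypothetical internal header names; not part of the OpenAI API.
REQUIRED_HEADERS = ("x-team", "x-feature")


def build_usage_event(request_headers: dict, response_body: bytes, now=None) -> dict:
    """Combine caller-supplied tags with token usage from the upstream response."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    missing = [name for name in REQUIRED_HEADERS if name not in headers]
    if missing:
        # Rejecting untagged traffic is one possible policy; logging it is another.
        raise ValueError(f"missing attribution headers: {missing}")

    payload = json.loads(response_body)
    usage = payload.get("usage") or {}
    return {
        "ts": now if now is not None else time.time(),
        "team": headers["x-team"],
        "feature": headers["x-feature"],
        "env": headers.get("x-env", "unknown"),
        "model": payload.get("model"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }
```

The proxy would call this after each upstream response and write the resulting event to its log or queue. Keeping the function free of network code makes the attribution logic easy to test in isolation.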
Approach 3: SDK-Level Instrumentation
Rather than intercepting calls at the network layer, you can instrument the calls themselves — wrapping the OpenAI SDK in a thin layer that automatically tags requests with team, feature, and environment metadata derived from application context.
This approach works best in organizations where AI calls are already centralized through an internal library rather than made directly from feature code. If every team imports your @company/ai-client package rather than the raw OpenAI SDK, you have a single instrumentation point that can be updated centrally.
The advantage over a proxy is that context is available in the application layer — you know which user triggered the call, which feature, which experiment variant — in ways that a network proxy cannot see without extensive header passing. The disadvantage is that instrumentation is only as good as the metadata your engineers consistently provide, and consistency requires discipline or enforcement.
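One way to get both the application context and the enforcement is to carry tags in a context variable and refuse untagged calls. The sketch below is an assumption about how an internal client library could be structured: attribution and TaggedClient are hypothetical names, and create_fn stands in for whatever function actually sends the request.

```python
import contextvars
from contextlib import contextmanager

# Tags for the current request or task; never mutated in place.
_tags = contextvars.ContextVar("ai_tags", default={})


@contextmanager
def attribution(**tags):
    """Attach attribution tags to every AI call made inside the block."""
    token = _tags.set({**_tags.get(), **tags})
    try:
        yield
    finally:
        _tags.reset(token)


class TaggedClient:
    """Thin wrapper that injects the current tags into the 'user' argument."""

    def __init__(self, create_fn, required=("team", "feature")):
        self._create = create_fn
        self._required = required

    def create(self, **kwargs):
        tags = _tags.get()
        missing = [key for key in self._required if key not in tags]
        if missing:
            # Enforcement: a call without tags fails instead of going unattributed.
            raise RuntimeError(f"missing attribution tags: {missing}")
        kwargs.setdefault("user", "|".join(f"{k}:{v}" for k, v in tags.items()))
        return self._create(**kwargs)
```

With the official Python SDK, create_fn could be client.chat.completions.create, and feature code would wrap its calls in with attribution(team="growth", feature="onboarding"). Failing loudly on missing tags trades some developer friction for consistent data.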
What Good Attribution Data Actually Looks Like
Before choosing an approach, it is worth being precise about what you are trying to produce. Attribution data that is actually useful for organizational decision-making has specific properties.
Dimensions You Need
- Team or cost center: Who owns this spend? This is the minimum viable attribution dimension.
- Feature or workflow: What product capability is this supporting? A team-level rollup masks the variation that matters — often 20% of features drive 80% of cost.
- Model: Which model version was called? GPT-4o and GPT-3.5-turbo have very different cost profiles. Knowing the model breakdown tells you where optimization opportunities exist.
- Environment: Production vs. development vs. testing. Development token consumption is often a larger fraction of the total than organizations expect.
- Prompt vs. completion tokens: The split between input and output tokens affects cost optimization strategy. Output-heavy workloads point toward capping response length or asking for terser output; input-heavy workloads point toward trimming context or caching repeated prompt prefixes.
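The dimensions above could be captured in a record along these lines. This is a minimal sketch rather than a recommended schema; the class name UsageEvent and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One API call, tagged with the minimum attribution dimensions."""

    team: str
    feature: str
    model: str
    environment: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def output_to_input_ratio(self) -> float:
        """Completion tokens per prompt token for this call."""
        if self.prompt_tokens == 0:
            return float("inf") if self.completion_tokens else 0.0
        return self.completion_tokens / self.prompt_tokens
```

Making the record immutable means it can be logged and aggregated without worrying about later mutation.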
Dimensions That Are Often Missed
- Latency vs. cost tradeoffs: Some teams make expensive calls because they have not benchmarked whether a cheaper model would meet their accuracy threshold.
- Caching opportunity: Repeated prompts that could be served from cache but are not. This is invisible without request-level logging.
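- Error rate and waste: Requests rejected with an error are generally not billed, but calls that complete and are then discarded are: output truncated at the token limit, responses that fail validation, and retries after a client-side timeout may all be charged in full. A high rate of unusable responses is a cost problem, not just a reliability problem.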
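To make the caching opportunity visible, request-level logs can be scanned for exact repeats. The sketch below assumes each log entry is a (model, prompt, prompt_tokens) tuple, which is a hypothetical shape, and it only catches verbatim repeats after whitespace normalization, not near-duplicates.

```python
import hashlib


def prompt_fingerprint(model: str, prompt: str) -> str:
    """Stable hash of model plus whitespace-normalized prompt text."""
    normalized = " ".join(prompt.split())
    return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()


def repeated_prompt_tokens(events) -> int:
    """Sum prompt tokens spent on prompts already seen earlier in the log."""
    seen = set()
    repeated = 0
    for model, prompt, prompt_tokens in events:
        fingerprint = prompt_fingerprint(model, prompt)
        if fingerprint in seen:
            repeated += prompt_tokens
        else:
            seen.add(fingerprint)
    return repeated
```

Storing the fingerprint instead of the raw prompt also keeps prompt text out of the analytics store, which may matter if prompts contain user data.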
Building the Reporting Layer
Capturing attribution metadata is only half the problem. The other half is making it usable. Raw API logs are not reports. Building the reporting layer typically involves:
Storage: Usage events need to land somewhere queryable. Common choices are a data warehouse (BigQuery, Snowflake, Redshift) or a time-series database. The choice affects the latency of reporting — real-time dashboards require a different architecture than monthly cost reviews.
Aggregation: Token counts need to be translated into costs, which requires keeping the model pricing schedule current. OpenAI changes pricing periodically, and historical data needs to be costed at the rates that were in effect at the time of the call.
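Costing historical calls at the rate in effect at the time can be handled with an effective-dated price table. In the sketch below, the model name, dates, and rates are placeholders invented for illustration, not real OpenAI prices; the table structure and the cost_usd function are likewise assumptions.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# PLACEHOLDER data: (effective_from, input USD per 1M tokens, output USD per 1M tokens).
# Entries must be sorted by effective date.
PRICE_HISTORY = {
    "example-model": [
        (date(2024, 1, 1), Decimal("10.00"), Decimal("30.00")),
        (date(2024, 7, 1), Decimal("5.00"), Decimal("15.00")),
    ],
}


def cost_usd(model: str, call_date: date, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost of one call at the rates in effect on call_date."""
    schedule = PRICE_HISTORY[model]
    starts = [effective_from for effective_from, _, _ in schedule]
    index = bisect_right(starts, call_date) - 1  # latest entry starting on or before call_date
    if index < 0:
        raise ValueError(f"no price for {model} on {call_date}")
    _, input_rate, output_rate = schedule[index]
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / Decimal(1_000_000)
```

Using Decimal avoids floating-point drift when many small per-call costs are summed into a monthly total.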
Visualization: Finance teams need cost reports. Engineering teams need usage patterns. Platform teams need anomaly detection. These are different views of the same data, which usually means different dashboards rather than a single universal view.
Alerting: Attribution data is only operationally useful if someone is watching it. Budget thresholds, unusual spend spikes, and new high-cost features should trigger notifications rather than waiting for the monthly bill.
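A budget-threshold check can be very small once spend is aggregated by team. The sketch below assumes month-to-date spend and monthly budgets are already available as dictionaries keyed by team; the function name, the alert labels, and the 80% warning level are illustrative choices.

```python
def budget_alerts(spend_by_team: dict, budgets: dict, warn_at: float = 0.8) -> list:
    """Return (team, status) pairs for teams that need attention."""
    alerts = []
    for team, spend in sorted(spend_by_team.items()):
        budget = budgets.get(team)
        if budget is None:
            # Spend from a team with no budget often signals a new, untracked feature.
            alerts.append((team, "no_budget"))
        elif spend >= budget:
            alerts.append((team, "over_budget"))
        elif spend >= warn_at * budget:
            alerts.append((team, "warning"))
    return alerts
```

Run on a schedule, the output can feed whatever notification channel the organization already uses. Spike detection would need a time series rather than a single month-to-date figure.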
The Organizational Problem Under the Technical One
The technical solutions described above are tractable. The harder problem is organizational: who owns the instrumentation, who maintains it, and who is accountable for the costs it reveals?
Most AI platform teams can build a proxy or instrument the SDK. The challenge is that doing so well — with the reliability, latency, and data quality that makes the attribution actually useful — requires sustained investment. It competes with feature work. It requires cooperation from teams who are not always motivated to have their spend made visible.
Establishing clear ownership of AI cost attribution as a platform function, rather than an ad hoc project, is a prerequisite for the technical solutions to be worth building. Without ownership, attribution systems get built and then rot.
Where Platform Tooling Fits In
For organizations that want team-level attribution without building and maintaining the underlying infrastructure, platforms like Oberhahn provide the proxy layer, metadata schema, and reporting stack as a managed solution. The tradeoff is customizability versus time-to-value: a purpose-built platform ships the reporting layer immediately but constrains your data model to what the platform supports.
For most engineering organizations, the build-vs-buy decision comes down to whether AI spend management is a core competency you want to develop in-house or a solved problem you want to stop rebuilding. Neither answer is universally correct.
Starting From Where You Are
If you are starting from a single shared key with no attribution, the pragmatic first step is not to build a proxy or re-architect your key structure. It is to add the user field to your OpenAI calls with a team identifier, capture that tag alongside token usage (from OpenAI's usage export if it carries the field, otherwise from the usage block returned in each response), and load that data into whatever analytics tool your organization already uses.
That gets you to team-level attribution in days, not quarters. It will not give you feature-level granularity or real-time alerting. But it answers the question finance is asking right now, and it gives you the baseline data you need to decide what to invest in next.
Perfect attribution is an iterative project. The goal is not to design the final state before you start — it is to get to the next level of visibility quickly enough that the data informs your decisions before the next renewal conversation.