The Wrong Diagnosis Is Costing You More Than the Right One Would
The AI cost conversation in most engineering organizations follows a predictable arc. Spend grows. Someone notices. A review is called. Questions are asked: which teams are spending the most? Which features are expensive? What's driving the month-over-month increase? The answers are vague or absent. A cost reduction target is set. Engineering leaders guess at where to cut. The cuts are made imprecisely. Some teams that were already optimizing get squeezed. Some teams that were genuinely wasteful are untouched because nobody could prove it. Spend stabilizes temporarily, then grows again.
This cycle repeats because the underlying diagnosis is wrong. Every organization going through it thinks they have a cost problem. They don't. They have an attribution problem. And the distinction matters more than most leaders realize.
What Attribution Actually Means
Attribution is the ability to connect every dollar of AI spend to the team, application, feature, and workflow that incurred it and, ideally, to the business outcome it served. It sounds like a reporting requirement. It's actually a management enabler.
Without attribution, AI cost management is politics. You set aggregate budgets, argue about who's responsible for overages, and make optimization decisions based on intuition. With attribution, AI cost management becomes engineering. You see where spend is concentrated, you measure cost per outcome, you identify where spend is growing faster than value, and you act on data rather than assumption.
The moment you have attribution, a large fraction of your cost problem disappears. Spend does not go down immediately, but its distribution becomes visible, and visible problems get fixed. Engineers optimize things they can measure. Finance can charge back what it can attribute. Product managers start asking about cost per feature completion once they can see the number. The organizational incentives realign once the data exists.
Why Attribution Problems Are Misdiagnosed as Cost Problems
There's a structural reason this misdiagnosis is so common: the symptom of an attribution failure looks exactly like a cost problem. Your AI bill is large and growing. The natural response is to treat it as a spending issue — cut budgets, impose model restrictions, require approval workflows for new AI features. All of these are cost controls applied to a problem that is actually a visibility gap.
The tell that you're dealing with an attribution failure is the inability to answer basic questions about your own spend:
- Which three features are responsible for 60% of your AI cost this month?
- What is the cost per successful document processed in your extraction pipeline?
- Which team increased their AI spend most in the last 30 days, and why?
- What percentage of your AI spend is in production versus development and testing?
If you cannot answer these questions, you do not have a cost problem. You have an attribution problem that is presenting as a cost problem. Treating it as a cost problem — applying constraints before you have visibility — risks cutting the wrong things while leaving the actual inefficiencies intact.
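Once per-call records carry tags, the first and last questions above reduce to small aggregations. The sketch below assumes a hypothetical record shape with `feature`, `environment`, and `cost_usd` fields; those names are illustrative, not a standard schema.

```python
from collections import defaultdict


def top_feature_share(records: list, n: int = 3) -> tuple:
    """Return the n costliest features and their combined share of total spend."""
    by_feature = defaultdict(float)
    for r in records:
        by_feature[r["feature"]] += r["cost_usd"]
    total = sum(by_feature.values())
    top = sorted(by_feature.items(), key=lambda kv: kv[1], reverse=True)[:n]
    share = sum(cost for _, cost in top) / total if total else 0.0
    return [name for name, _ in top], share


def production_share(records: list) -> float:
    """Fraction of spend incurred in the (assumed) 'production' environment tag."""
    total = sum(r["cost_usd"] for r in records)
    prod = sum(r["cost_usd"] for r in records if r["environment"] == "production")
    return prod / total if total else 0.0
```

If neither function can be fed with real data today, that gap is the attribution problem itself.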
The Attribution Stack: What You Need Technically
Building attribution for AI spend requires instrumentation at four levels. Most organizations have none of them. Some have one or two. The full stack is what gives you actionable data.
Level 1: Request-Level Tagging
Every API call to an AI provider should carry structured metadata: which team, which application, which feature, which environment (production/staging/development), and which model. This metadata is attached at the call site — in whatever library or service wrapper your application uses to make AI calls — and logged with the response metadata (tokens consumed, latency, model called, success/failure).
This is the foundation. Without it, nothing else is possible. The implementation effort is modest if you control your AI call pathway through a shared library or service. It is harder if teams are calling provider APIs directly with heterogeneous client implementations.
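As a rough illustration of call-site tagging, the sketch below wraps a provider call and emits one structured log line per request. Everything named here is an assumption for illustration: `CallTags`, `tagged_call`, and the `input_tokens`/`output_tokens` response fields are hypothetical, and `call_provider` stands in for whatever client your shared library already uses.

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass(frozen=True)
class CallTags:
    """Hypothetical tag set; field names mirror the metadata listed above."""
    team: str
    application: str
    feature: str
    environment: str  # "production", "staging", or "development"
    model: str


def tagged_call(tags: CallTags,
                call_provider: Callable[[], dict],
                log: Callable[[str], None] = print) -> dict:
    """Run one provider call and emit a structured usage record for it."""
    start = time.monotonic()
    record = asdict(tags)
    try:
        response = call_provider()
        record["success"] = True
        # Assumed response fields; real providers name these differently.
        record["input_tokens"] = response.get("input_tokens", 0)
        record["output_tokens"] = response.get("output_tokens", 0)
        return response
    except Exception:
        record["success"] = False
        raise
    finally:
        # Logged on both success and failure so failed calls stay visible.
        record["latency_ms"] = round((time.monotonic() - start) * 1000, 1)
        log(json.dumps(record))
```

Putting the wrapper in a shared library is what keeps the per-team effort modest: application code supplies tags, and the logging stays uniform.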
Level 2: Workflow-Level Aggregation
Individual request data is necessary but not sufficient. A single user action in your product might trigger a chain of three to five AI calls: a retrieval step, a reranking step, a generation step, a validation step. From a cost management perspective, the relevant unit is the workflow, not the individual call. Level 2 attribution groups related calls by workflow and reports at workflow granularity.
This is where most attribution efforts stall. It requires a concept of workflow identity that spans multiple API calls — a trace ID or session ID that links the calls together. If you're already using distributed tracing (OpenTelemetry, Datadog APM, or similar), AI call attribution can hook into the same trace infrastructure. If you're not, you need to introduce a lightweight context propagation mechanism.
Level 3: Cost-Per-Outcome Calculation
Cost-per-outcome is the metric that changes how product and engineering conversations happen. It answers: what does it cost to complete one of the primary value-generating actions in your product? One document summarized. One code review generated. One customer support query resolved. One legal clause extracted.
Calculating cost-per-outcome requires joining your AI cost data with your product event data. You need to know how many outcomes were produced in a given period and which AI calls contributed to each outcome. This is a data engineering problem, not just an instrumentation problem. It requires schema design, pipeline work, and a definition of what counts as a successful outcome — which is a product question, not a technology question.
The investment is worthwhile because cost-per-outcome is the one metric that product managers, finance leaders, and engineering leaders can all have a coherent conversation about. Cost per token means nothing to a product manager. Cost per successful customer query resolution means something to everyone.
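The join described above can be sketched in a few lines, assuming AI call records and product outcome events share a `workflow_id`. The field names and the choice to keep failed attempts in the numerator are assumptions for illustration; what counts as a successful outcome remains a product decision.

```python
from collections import defaultdict


def cost_per_outcome(ai_calls: list, outcomes: list) -> dict:
    """Join call costs to product outcomes on workflow_id, grouped by outcome type."""
    cost_by_wf = defaultdict(float)
    for call in ai_calls:
        cost_by_wf[call["workflow_id"]] += call["cost_usd"]

    spend = defaultdict(float)
    successes = defaultdict(int)
    for o in outcomes:
        # Failed attempts still cost money, so their spend stays in the numerator.
        spend[o["type"]] += cost_by_wf.get(o["workflow_id"], 0.0)
        if o["success"]:
            successes[o["type"]] += 1

    # None marks outcome types with spend but no successes in the period.
    return {t: (spend[t] / successes[t] if successes[t] else None) for t in spend}
```

Dividing by successful outcomes rather than attempts means a feature that fails half the time shows roughly double the cost per outcome, which is the number a product conversation needs.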
Level 4: Real-Time Attribution and Alerting
The first three levels are primarily retrospective: they help you understand what happened. Level 4 makes attribution actionable in real time, with alerts when a team's spend accelerates beyond its expected trajectory, budget tracking that updates continuously rather than at month end, and anomaly detection that flags unusual spend patterns before they compound into large overages.
Real-time attribution is particularly valuable for catching two categories of problem: runaway agentic workflows (a bug in an agent loop that causes it to make far more AI calls than intended) and unexpected traffic spikes (a feature launch driving more usage than estimated). Both of these can generate significant cost in hours, not weeks. Without real-time visibility, you discover the problem when the invoice arrives.
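One simple way to approximate "spend beyond expected trajectory" is to compare each team's latest hour against its own recent baseline. The sketch below is a deliberately basic heuristic, not a recommended detector: the function name, the three-sigma threshold, the six-hour minimum history, and the 10% floor on spread are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def spend_alerts(hourly_spend: dict, min_history: int = 6, threshold: float = 3.0) -> list:
    """Flag teams whose latest hour of spend is far above their own recent baseline."""
    alerts = []
    for team, series in hourly_spend.items():
        if len(series) <= min_history:
            continue  # not enough history to define "expected"
        history, latest = series[:-1], series[-1]
        baseline = mean(history)
        spread = pstdev(history)
        # Floor the spread so a perfectly flat history doesn't alert on tiny changes.
        limit = baseline + threshold * max(spread, 0.1 * baseline)
        if latest > limit:
            alerts.append((team, latest, round(limit, 2)))
    return alerts
```

A runaway agent loop tends to show up as a latest value several multiples above baseline, so even a crude check like this can surface it within the hour rather than at invoice time.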
The Organizational Side of Attribution
The technical stack is the easier half of the attribution problem. The harder half is organizational: who owns it, who benefits from it, and how it changes existing workflows.
Ownership
Attribution infrastructure needs an owner. In organizations with platform engineering teams, this is natural territory for the AI platform function — the same team that manages shared model infrastructure, prompt engineering standards, and observability tooling. In organizations without a centralized platform function, attribution tends to fall between finance (who cares about the cost) and engineering (who controls the instrumentation), and ends up owned by neither.
The practical pattern that works: a small tiger team — two to three engineers — builds the attribution stack as a platform capability, then works with application teams to adopt it. The adoption work is mostly about integrating the shared client library and ensuring the tagging convention is followed. It's not a large engineering effort per team, but it requires coordination and some degree of mandate from engineering leadership.
Chargebacks and Incentives
Attribution data is most useful when it creates incentives. If team-level AI spend is attributed but doesn't affect anything — no budget accountability, no optimization targets, no engineering review — the data is interesting but not motivating. The organizational leverage comes from connecting attribution to accountability: teams see their cost, own their budget, and are expected to explain anomalies.
This requires buy-in from engineering leadership to treat AI cost as a team-level responsibility, not an organizational overhead. That is a cultural shift in most organizations, and it happens gradually rather than all at once. Start with transparency — make the data visible to teams without punitive consequences. Let teams get familiar with their own cost profile. Then introduce accountability structures once the data is trusted and the norms are established.
What Attribution Unlocks That Cost Controls Cannot
A cost control approach — model restrictions, spending caps, mandatory approval for new AI features — can reduce spend. Attribution does something different: it makes the spend that remains more efficient.
With attribution, engineering teams can identify the workflows in their applications where cost-per-outcome is highest relative to the value generated and prioritize optimization there. They can A/B test model choices, running a cheaper model on a subset of traffic and measuring whether quality degrades, using actual data rather than intuition. They can identify development and testing traffic that is running on production models when it doesn't need to be. They can find caching opportunities (stable inputs with redundant calls) that are invisible without per-call visibility.
None of this is possible if you're looking at an aggregate monthly invoice. All of it becomes routine once you have attribution at the right granularity.
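As one illustration of the caching point, per-call records make redundant calls easy to surface. The sketch below assumes each record carries `feature`, `model`, `prompt`, and `cost_usd` fields (hypothetical names) and treats byte-identical model and prompt pairs as cacheable, which is a simplification.

```python
import hashlib
from collections import defaultdict


def redundant_call_report(calls: list) -> list:
    """Group calls by identical (model, prompt) and estimate spend on repeats."""
    groups = defaultdict(list)
    for c in calls:
        key = hashlib.sha256(f"{c['model']}\x00{c['prompt']}".encode()).hexdigest()
        groups[key].append(c)

    report = []
    for group in groups.values():
        if len(group) < 2:
            continue
        total = sum(c["cost_usd"] for c in group)
        # Everything after the first call is what a perfect cache would have saved.
        repeat_cost = total - group[0]["cost_usd"]
        report.append({
            "feature": group[0]["feature"],
            "calls": len(group),
            "repeat_cost_usd": round(repeat_cost, 6),
        })
    return sorted(report, key=lambda r: r["repeat_cost_usd"], reverse=True)
```

The output is an upper bound on savings, since real caches expire and some identical prompts legitimately need fresh answers.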
The Starting Point
If your organization has no attribution today, the starting point is not a comprehensive platform build. It's a minimal instrumentation pass: identify the five to ten highest-cost AI workloads in your organization, add request-level tagging to each, and build a simple dashboard that shows weekly cost by workload. This takes one to two weeks for a competent platform engineer and immediately changes the quality of your cost conversations.
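The data behind such a dashboard can be very small. The sketch below assumes tagged records with an ISO-format `date`, a `workload` name, and a `cost_usd` value (all hypothetical field names) and sums cost per ISO week and workload.

```python
from collections import defaultdict
from datetime import date


def weekly_cost_by_workload(records: list) -> dict:
    """Sum cost per (ISO week, workload) from tagged call records."""
    totals = defaultdict(float)
    for r in records:
        iso = date.fromisoformat(r["date"]).isocalendar()
        week = f"{iso[0]}-W{iso[1]:02d}"  # ISO year and ISO week number
        totals[(week, r["workload"])] += r["cost_usd"]
    return dict(totals)
```

The result can be fed into whatever charting or spreadsheet tool is already in use; the point of the first pass is the grouping, not the presentation.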
From that starting point, you expand coverage, add workflow-level aggregation, and work toward cost-per-outcome tracking. The full attribution stack is a multi-quarter journey. The first useful increment is a week or two away.
Oberhahn provides the attribution infrastructure — the tagging framework, the aggregation layer, the dashboards, and the alerting — that makes this journey shorter and less custom-built for every organization that goes through it. The pattern is the same for most AI-native and AI-enabled companies. The tooling doesn't need to be built from scratch each time.
Your AI cost problem is probably not what it looks like. Look at your attribution posture first. The cost problem usually takes care of itself once you can see it clearly.