Deep Dive May 18, 2026 · 15 min read · Updated June 17, 2026

I cost Anthropic at least $1,700 a month. I pay $200. What happens when this subsidy ends in 2028, and how to hedge today.

I pay $200 a month for Claude. In API-equivalent value I pull roughly 43× that out of it (measured, not estimated). By 2028 the subsidy is gone. Here's the math, and a plan to keep your AI workflows from turning into a debt trap.

The AI industry is burning capital at a scale with few historical parallels. At the same time, token prices have dropped 280× in roughly 18 months[1]. Both are true. Sort them out, and you understand why the next three years stay historically cheap for AI users — and why caution comes after. Let's start with your own bill.

Key points

→Seven cents of revenue per dollar invested[△]: Big-4 are spending roughly $350B on AI-relevant CapEx in 2025 (datacenters, GPUs, networking)[6], while global AI-native revenue is just $25B.[5]
→Measured: for $200/month I pull ~$8,600 of API-equivalent value, factor ~43×[△]. Even best-case (top-tier margin assumed), it costs Anthropic at least ~$1,700/month to serve me, 8 to 9× my plan.
→Token prices have dropped 280× for GPT-3.5-class performance (Nov 2022 → Oct 2024)[1]. Since then the headline price has stayed flat — but 10 cents per million tokens buys you GPT-4-class output today, not GPT-3.5.
→Subsidy phase ends 2027–2030: Anthropic projects break-even 2027/28, OpenAI 2029/30[9].
→What to do: use the subsidy fully, build provider-flexible, keep self-hosting on the table, only call a model when it earns its keep — classical software stays cheaper and more stable.

01 · Your bill

What you pay, what you consume, what it would cost at API prices

Your real bill · measured

You pay $200 a month. In real API-equivalent value you pull ~$8,600 out of it. That's factor ~43×[△].

Not an estimate: my own numbers from 30 days of Claude Code (as of June 17, 2026), aggregated from the local session transcripts. "API-equivalent value" means what the same workload would have cost at Anthropic's official API list prices. I pay a flat rate, so this is the value I extract, not Anthropic's cost.

I pay Anthropic (flat rate) $200.00 / month

Consumption: new tokens, in + out (30 days) ~59M

the same tokens at Anthropic's API list price ~$8,600

Factor: API-equivalent ÷ what I pay ~43×

Source: my own token-usage dashboard, which parses ~/.claude/projects/** and prices it against API list rates. 82,504 API responses across May 8 to June 17, 2026. See the dashboard · methodology in the making-of.

And what does it cost Anthropic? · best case

Nobody knows exactly. But even in the best case it costs Anthropic at least ~$1,700[△] a month to serve me. That's 8 to 9× my $200.

Anthropic's real cost isn't public, and can't be measured from the outside, precisely BECAUSE I'm not charged API prices. So, best case: the API list price ($8,600) is revenue, not cost. How much of it is cost depends on gross margin. Assume the margin of the best software firms in the world (~80%), and ~$1,700 of real cost still sits inside. More realistically (50 to 70% margin), it's ~$2,600 to $4,300.

Best case: top margin ~80% assumed ~$1,700

More realistic: 50 to 70% margin ~$2,600–4,300

What I pay $200

If Anthropic charged me API prices, they'd have good margins. They don't: the flat rate IS the subsidy, their margin on me is negative. Reference gross margins: top software/SaaS ~80–90%, optimistic inference estimate (DeepSeek 2025) ~85%, Anthropic's estimated blended margin ~50%[12]. Math in the making-of.
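The margin arithmetic above can be checked in a few lines. This is a minimal sketch, assuming the API list price is treated as revenue and cost is simply revenue × (1 − gross margin). The function name is mine, and the margins are the reference values quoted above, not figures Anthropic has disclosed.

```python
def serving_cost(api_equivalent_usd: float, gross_margin: float) -> float:
    """Cost implied if the API list price were revenue at the given gross margin."""
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return api_equivalent_usd * (1.0 - gross_margin)


API_EQUIVALENT = 8_600.0  # measured API-equivalent value per month (from the bill above)
PLAN_PRICE = 200.0        # flat rate per month

factor = API_EQUIVALENT / PLAN_PRICE                 # value extracted vs. price paid
best_case = serving_cost(API_EQUIVALENT, 0.80)       # top-tier software margin assumed
realistic_low = serving_cost(API_EQUIVALENT, 0.70)   # upper end of the realistic margin range
realistic_high = serving_cost(API_EQUIVALENT, 0.50)  # estimated blended margin

print(f"factor: ~{factor:.0f}x")
print(f"best case: ~${best_case:,.0f} ({best_case / PLAN_PRICE:.1f}x the plan)")
print(f"realistic: ~${realistic_low:,.0f} to ~${realistic_high:,.0f}")
```

The rounded figures in the text ($1,700, $2,600, $4,300) are these results rounded to the nearest hundred.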

This subsidy is finite. Anthropic internally projects cash-flow positive for 2027/2028 (per investor materials, not audited[9]). OpenAI for 2029/2030. So: two to four years of subsidy are reasonably safe — after that, providers have to start pricing for real.

A paradox at first glance: why does the $20 base tier stay stable when the subsidy ends? The answer is tier segmentation. Per-token inference cost drops about 10× per year, per Sequoia[5]. In two years, the real cost of GPT-3.5-class operations is a fraction of today's. The $20 list price can hold steady — with tighter token caps — and still turn profitable. Providers need the budget tier as top-of-funnel.

The heavy-use tier — where you land as a serious AI user — gets more expensive. Claude Pro has already tightened limits. ChatGPT Plus throttles o3 access. The "Max" and "Ultra" tiers at $100–$200 are the new normal. What costs $100 today either costs $100 in 2028 with dramatically fewer tokens included, or $200–$300 for the same depth of use.

If you want to know why the subsidy exists and why it has to end, look at the macro numbers next.

02 · Scale

Big 4 are burning $700B, against $50B in industry revenue

A single NVIDIA GPU runs $25,000 to $40,000[14]. The big hyperscalers buy them in the hundreds of thousands. A modern AI datacenter with liquid cooling costs $20–30M per MW of installed power[15], with hardware on top. The ratio of investment to revenue is absurd. Here are the three hardest contrasts:

Big-4 AI CapEx 2025

$350B[6]

Microsoft, Alphabet, Meta, Amazon. AI-relevant share of CapEx (datacenters, GPUs, networking).

↔

AI-native inference revenue 2025

$25B[5]

Direct API and subscription revenue from OpenAI, Anthropic, etc. (Sequoia/Goldman definition[3], excluding hyperscaler cloud-AI services).

That's seven cents of revenue per dollar invested[△]. Even a doubling of revenue in 2026 doesn't change the ratio — because CapEx doubles too.

You pay per month

$200

Claude Max 20x, the heavy-use tier.

↔

Costs Anthropic (best case)

~$1,700+[△]

Even at top-tier software margin. Realistically more.

Even best-case it costs Anthropic about 8× your plan to serve you. At API list prices, measured, you pull ~43×[△]. You're in the golden phase of acquisition subsidy.

Alphabet Q1 2026 book gain

$28.7B[4]

From revaluing the Anthropic stake.

↔

Cash actually received

$0

Pure mark-to-market, no real cash flow.

Nearly half of Alphabet's quarterly profit is a circular deal: Big Tech invests in Anthropic → valuation goes up → their own stakes get marked up.

03 · Token price collapse

What you pay keeps going down

Here's the good news, and it's surprisingly good. Stanford HAI's AI Index 2025[1] documents a 280× price collapse for GPT-3.5-class performance in roughly 18 months (November 2022 to October 2024). Put differently: what cost you $20 per million tokens at the end of 2022 costs 7 cents today. For the same performance.

One thing matters for the chart below: every value is in the same unit — US dollars per million input tokens, measured against the same performance bar (GPT-3.5 level on the benchmark). Otherwise you're comparing apples to oranges.

Chart 1 · Token price collapse

USD per million input tokens (Nov 2022 → May 2026)

GPT-3 Davinci (November 2022, frontier at the time): $20.00 · baseline
GPT-3.5 Turbo (March 2023): $2.00 · 10× cheaper
Gemini 1.5 Flash (June 2024): $0.75 · 27× cheaper
Gemini 1.5 Flash 8B (October 2024, GPT-3.5 class): $0.07 · 280× cheaper
Gemini 2.0 Flash / GPT-4.1 Nano (May 2026, well above GPT-3.5): $0.10 · ≈ GPT-4 class

Sources: Stanford HAI AI Index 2025[1] (data points through Oct 2024), current provider price lists via pricepertoken.com and costgoat.com [13] (May 2026). Bar lengths log-scaled for readability (otherwise $0.07 would be invisible). MMLU benchmark as the performance baseline.
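To check the "× cheaper" labels yourself, divide the November 2022 baseline by each later price. A small sketch using only the chart's own data points (helper names are mine):

```python
BASELINE_USD = 20.00  # GPT-3 Davinci, Nov 2022, USD per million input tokens

# Later data points from the chart, same unit and same performance bar
prices = {
    "GPT-3.5 Turbo (Mar 2023)": 2.00,
    "Gemini 1.5 Flash (Jun 2024)": 0.75,
    "Gemini 1.5 Flash 8B (Oct 2024)": 0.07,
}


def times_cheaper(price_usd: float) -> float:
    """How many times cheaper a price is than the baseline."""
    return BASELINE_USD / price_usd


def percent_drop(price_usd: float) -> float:
    """Price decline relative to the baseline, in percent."""
    return (1.0 - price_usd / BASELINE_USD) * 100.0


factors = {name: times_cheaper(p) for name, p in prices.items()}

for name, f in factors.items():
    print(f"{name}: {f:.1f}x cheaper, {percent_drop(prices[name]):.2f}% drop")
```

The last entry comes out slightly above 280; the headline figure is the rounded-down version of it.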

What's interesting is what's happened since the end of 2024: the price for raw GPT-3.5-class performance has hit a floor of roughly 5 to 10 cents per million tokens, with not much room left to fall. What keeps shifting dramatically is what you get for that price. Models like Gemini 2.0 Flash or GPT-4.1 Nano run around 10 cents per million input tokens today[13] (DeepSeek V3.2 sits higher, at about 44 cents[13]) and deliver well above GPT-3.5-class output, roughly on par with the original GPT-4 (which still cost $30 in May 2023). Performance per dollar has multiplied again, even though the headline token price stays flat.

What does performance cost today? Top-tier models like Claude Opus 4 or GPT-5 still run $3 to $15 per million input tokens[13]. But these models do reasoning, multi-hour coding sessions, and tool use — none of which was possible in 2023. The market is segmenting: mass usage turns commodity, frontier stays valuable.

04 · Bulls vs. bears

Is AI changing the world, or is it the next bubble?

I've collected the strongest arguments on both sides — only with numbers from primary sources. No speculative "might be".

↑ Bull case

AI is changing the world

Inference prices are collapsing — the end user wins directly.

280× price drop in ~18 months · Stanford HAI 2025[1]

Hardware and energy efficiency are improving faster than Moore's Law.

GPU cost per FLOP −30%/year, energy efficiency +40%/year · Stanford AI Index[1]

Power isn't the problem — hardware depreciation is, and it's shrinking.

Pure power cost < 5% of inference cost · Epoch AI 2025[7]

Hyperscalers fund this from profitable core business — no debt bubble.

$350B in AI CapEx from current cashflow · earnings calls 2025[6]

NVIDIA promises 10× cheaper token cost from H2 2026.

Jensen Huang, CES January 2026 (for MoE models, not yet shipping)[8]

↓ Bear case

AI is the next bubble

Seven cents of revenue per CapEx dollar — Goldman sees no GDP impact.

Goldman Sachs[3] / Sequoia (David Cahn) 2024/2025[5]

OpenAI reportedly spends roughly $1.70 for every $1 of inference revenue, per leaked documents.

Investor materials, not official · Fortune / wheresyoured.at[9]

GPUs age out in 3–6 years: no "dark compute" like the old dark fiber.

Amazon shortened GPU depreciation from 6 to 5 years in 2025[11]

Big Tech's book gains from Anthropic stakes are a circular deal.

Alphabet: $28.7B Anthropic book gain Q1 2026 · SEC filing[4]

Efficiency gets eaten by demand: Jevons paradox.

IEA: datacenter power 415 (2024) → 945 TWh (2030 projected)[2]

05 · Investment paradox

Seven cents of revenue per dollar invested

Goldman Sachs put it well[3]: in 2025, roughly $350B flows into AI infrastructure (the Big 4 alone)[6], and AI-native inference revenue (OpenAI, Anthropic, etc., by the Sequoia/Goldman definition) sits at roughly $25B[5]. That's seven cents of revenue per dollar invested[△]. Profit is nowhere in sight — the providers are paying out of pocket. (The total AI market including hyperscaler cloud-AI services is much larger, but the ratio of CapEx to direct AI revenue stays structurally lopsided.)

Chart 2 · CapEx vs. Revenue

Investment grows much faster than AI revenue (Big 4 vs. industry revenue, 2023–2026)

Year · AI-relevant CapEx Big 4 (Microsoft, Alphabet, Meta, Amazon) · Global AI revenue (industry-wide)

2023 · $150B · $5B
2024 · $226B · $13B
2025 · $350B · $25B
2026 (P) · $700B · $50B

Sources: earnings calls Q1–Q4 2025 (Microsoft, Alphabet, Meta, Amazon)[6], Goldman Sachs AI Infrastructure Report[3], Sequoia Capital (David Cahn "AI's $600B Question")[5]. 2026 consensus forecasts. Scale normalized to max $700B = 100%. Note: values show the AI-relevant share of Big-4 CapEx (datacenters, GPUs, networking) — total CapEx in 2025 is around $400B.

Even if AI revenue doubles in 2026 (to $50B) and CapEx grows "only" to $700B, the ratio doesn't improve: $700B ÷ $50B is the same 14:1 as $350B ÷ $25B in 2025. From an investor's view, that's a 1:14 ratio of revenue to infrastructure investment. From an AI user's view it means: you're using infrastructure worth a multiple of what all users combined are paying for it.
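The revenue-per-CapEx-dollar figure can be recomputed for every year in Chart 2. A short sketch using only the chart values (USD billions); the 2026 row is the consensus projection, not an actual.

```python
# (AI-relevant Big-4 CapEx, global AI revenue) in USD billions, from Chart 2
capex_vs_revenue = {
    "2023": (150, 5),
    "2024": (226, 13),
    "2025": (350, 25),
    "2026 (P)": (700, 50),
}


def cents_per_capex_dollar(capex_bn: float, revenue_bn: float) -> float:
    """Revenue earned per dollar of CapEx, expressed in cents."""
    return revenue_bn / capex_bn * 100.0


ratios = {
    year: cents_per_capex_dollar(capex, revenue)
    for year, (capex, revenue) in capex_vs_revenue.items()
}

for year, cents in ratios.items():
    capex, revenue = capex_vs_revenue[year]
    print(f"{year}: {cents:.1f} cents per CapEx dollar (1:{capex / revenue:.0f})")
```

With these inputs, 2025 and the 2026 projection both land at about seven cents per CapEx dollar, because revenue and CapEx double together.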

ROT [rət], noun: Return on Tokens. The business value actually created per token spent. Prediction: within a year, this is on every company's KPI board. (Max Fraunhofer, mfraunhofer.de)

What matters to the user isn't the price per token, it's the value each token actually creates. I call it Return on Tokens, the metric almost nobody tracks yet.

06 · Dotcom or Telecom?

Why the fiber bust is the more honest analogy

Most people compare this with Amazon or remember the dotcom bubble of the late 1990s. Amazon doesn't fit, the order of magnitude is wrong: Amazon's cumulative losses up to its first real profitability in 2003 were around $3B. OpenAI burns that today in about four months.

The dotcom bubble gets us closer: hundreds of overvalued software companies (Pets.com, Webvan, Boo.com), almost all gone. But the structurally more accurate parallel is its lesser-known twin crisis that collapsed at the same time: the telecom bubble. While the dotcom bubble wiped out the software layer, telecom wiped out the infrastructure, and infrastructure is what's at stake with AI today. If you don't remember the telecom boom: WorldCom, Global Crossing, and Nortel put over $500B into fiber infrastructure and bonds because everyone believed internet data volume would "double every 100 days"[16]. It didn't. Result: a roughly 90% price collapse for bandwidth, massive bankruptcies, and WorldCom's July 2002 insolvency, which followed the largest accounting fraud in US history at the time ($11B+ in inflated assets)[16].

Parallel story

Two infrastructure investment booms — three decades apart

Then · Telecom 1996–2001

Fiber overbuild for an internet that wasn't there yet

01 Over $500B invested.[16] WorldCom, Global Crossing, AT&T, Lucent — all financed with debt, based on growth forecasts that later proved too optimistic.
02 Bandwidth prices dropped ~90%. Overcapacity led to the price collapse. Only a fraction of the laid fiber was actually used.[16]
03 Bankruptcies and crash. WorldCom insolvency July 2002 — the largest accounting fraud in US history at the time[16]. Investors lost hundreds of billions.
04 Who won: the users and the app layer. Google bought dark fiber for pennies. YouTube was only possible because of cheap bandwidth. Streaming, cloud, social media — all built on the wreckage.

Now · AI infrastructure 2023–?

GPU overbuild for AI demand that isn't here yet?

01 $350B in AI CapEx in 2025 alone.[6] Big 4 finance from cashflow (unlike back then!), but the order of magnitude is a one-to-one match — built on the assumption of exponentially growing AI demand.
02 Token prices fell 99.7%. 280× price drop in ~18 months[1]. Same pattern — overcapacity meeting slower-growing real demand.
03 Risk: individual providers fail. OpenAI investor documents put losses at ~$9B per year[9]; Anthropic has accumulated low double-digit billions in losses over 5 years. Both figures come from pitch decks, not audited statements. When the subsidy ends, not everyone survives.
04 Who wins: probably the users and the app layer, again. Anyone building robust AI workflows today wins in every scenario — no matter who ends up buying the GPU wrecks.

Key difference: fiber lasts 20–40 years. GPUs age out in 3–6 years. If the AI bubble pops, there's no "dark compute" for Google to buy on the cheap — the assets lose technical residual value. That makes a potential crash sharper. But for you as a user it doesn't change the picture: the subsidy phase ends, then pricing power kicks in.

07 · Jevons paradox

Efficiency is near the limit, consumption explodes anyway

Every new GPU generation is more energy efficient, and every new datacenter has a better PUE (Power Usage Effectiveness, the ratio of total energy to compute energy). Lower is better: PUE 1.0 means no overhead loss, PUE 2.0 means half the energy goes to cooling and networking. Best-in-class hyperscalers (Google, Meta) sit at ~1.1 today[10], which is where most AI workloads run. And yet power consumption more than doubles by 2030, from 415 TWh in 2024 to a projected 945 TWh[2].

Chart 3 · Jevons paradox

Efficiency maxing out — total consumption exploding

Global datacenter power (TWh / year)

Consumption more than doubles from 2024 to 2030 (+372% vs. 2020)

2020 · 200
2022 · 260
2024 · 415
2025 (E) · 485
2030 (P) · 945

Efficiency: headroom nearly gone

PUE sits at ~1.1 today (best-in-class hyperscalers) — theoretical optimum is 1.0

only ~7% left

PUE 2.5 in 2007 (150% overhead) → PUE 1.1 in 2025 (10% overhead, hyperscaler-class)[10]: the datacenter industry has captured ~93% of the possible efficiency gain. Industry average sits at 1.56, well above that — but AI frontier workloads run mostly on the most modern sites.

~93% captured

~7%

Start 2007 (PUE 2.5) Today (PUE ~1.1) Optimum (PUE 1.0)

Meaning: there's not much left on the datacenter efficiency side. Future savings have to come from better chips (NVIDIA Rubin) and algorithmic efficiency — not datacenter layout.

Sources: IEA "Energy and AI" 2025 (base-case 2030 projection)[2], Google / Microsoft / AWS own sustainability reports 2024 (fleet-wide PUE ~1.08–1.15). Industry-average PUE 1.56 per Uptime Institute Datacenter Survey 2024[10] — the 1.1 value refers to best-in-class hyperscalers, where AI frontier workloads actually run. PUE = Power Usage Effectiveness; 1.0 would be theoretically perfect (compute = 100% of total energy).
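The "~93% captured" figure follows directly from the PUE definition. A minimal sketch, assuming the efficiency gain is measured linearly between the 2007 starting point and the theoretical optimum of 1.0 (function names are mine):

```python
def overhead_vs_compute(pue: float) -> float:
    """Overhead energy as a fraction of compute energy (PUE 2.5 -> 1.5, i.e. 150%)."""
    return pue - 1.0


def overhead_share_of_total(pue: float) -> float:
    """Fraction of total energy that does not reach compute (PUE 2.0 -> 0.5)."""
    return (pue - 1.0) / pue


def captured_gain(start_pue: float, current_pue: float, optimum: float = 1.0) -> float:
    """Share of the possible PUE improvement that has already been realized."""
    return (start_pue - current_pue) / (start_pue - optimum)


captured = captured_gain(start_pue=2.5, current_pue=1.1)
print(f"captured: {captured:.1%}, remaining: {1 - captured:.1%}")
print(f"overhead at PUE 2.5: {overhead_vs_compute(2.5):.0%} of compute energy")
print(f"overhead at PUE 1.1: {overhead_vs_compute(1.1):.0%} of compute energy")
```

Running the same function with the industry average of 1.56 instead of 1.1 shows how much headroom sits outside the hyperscaler fleet.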

Per-token efficiency doubles, and total consumption still more than doubles by 2030. That's Jevons paradox in real time: when something gets cheaper, we use so much more of it that total consumption rises anyway.

08 · Two truths

What we know for sure, and what's estimate

Confirmed (from primary sources and SEC filings): Big-4 CapEx is real and comes from cashflow, not debt[6]. Inference prices are collapsing[1]. Energy efficiency is maxed out[10]. Total consumption doubles by 2030[2]. Alphabet's Q1 2026 included a $28.7B book gain from the Anthropic stake[4] — almost half of the quarterly profit with not a single dollar of cash inflow. And measured very concretely for myself: 30 days of Claude Code is ~59M new tokens, which at Anthropic's API list prices would have cost ~$8,600[△]. That's not a model, those are my transcript aggregates.

Estimate / not officially confirmed: OpenAI's internal cost-to-revenue ratios (roughly $1.70 spent per $1 of inference revenue, $9B loss against $13B revenue) come from leaked investor documents cited by Fortune, The Information, and wheresyoured.at[9], not from audited statements. Anthropic's blended gross margin of ~50% and revenue per user of ~$211/month are third-party analyses (SaaStr subscription-mix analysis[12], Sequoia token economics[5]). What it costs Anthropic to serve me can't be quantified from the outside, because I'm not charged API prices; their margin on me is negative. As a best case I assume the top margin of the best software firms (~80%): even then, at least ~$1,700 of cost[△] sits inside $8,600 of API-equivalent value, and realistically (50 to 70% margin) it's ~$2,600 to $4,300. Profitability forecasts (Anthropic 2027/28, OpenAI 2029/30) are internal models, not audited statements[9]. Read these numbers in the article as order-of-magnitude reference points, not as verified balance-sheet items.

The most likely resolution: we're living through two truths at the same time. On one side, an investor bubble at the big cloud providers pouring hundreds of billions into infrastructure that may never pay off. On the other side, a historic bargain for anyone using AI: out of a $200 plan I pull a measured ~$8,600 of API-equivalent value (factor ~43×), and even in the best case it costs Anthropic a multiple of what I pay to serve me. Both are real. Both can be true simultaneously.

The telecom boom proved this: investors lost, the equipment makers won (NVIDIA is the new Cisco), users won. The internet, YouTube, Spotify — all possible because of the cheap bandwidth that was left behind. With AI it'll be similar. Only the subsidy phase is finite — and in this phase you should build your workflows so you're not stuck when pricing power kicks in.

09 · If you build something

Not everything needs AI, and that becomes important soon

Here's where the article gets practical. If you build your own tools, write software, or want to put a business model on top of AI, the end of the subsidy phase dramatically changes the rules after 2028. Anyone building a SaaS today that makes an LLM call per user click has a scaling problem in two years. Classical deterministic software scales cheap: a written-once if-then block costs a few cents per million calls. An LLM call costs hundreds to thousands of dollars per million.

The rule of thumb: use AI where it actually earns its keep — creative, contextual, linguistic, generative. Not where deterministic software does the work cheaper and more reliably. A simple calculation, a database query, a validator check, a routing decision — all of that is classical programming. Replacing it with an LLM makes it 1,000× more expensive and less reliable. It works today because the LLM call is subsidized. In three years it doesn't.

Anyone shipping software should ask the following before every AI call:

→Can you solve this with classical logic? Then do that.
→Do you actually need frontier intelligence, or is a small, cheap model enough?
→Can the result be cached, so the same call doesn't run twelve times?
→How much would your business model lose if the token price tripled tomorrow?
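The first three questions translate into code fairly directly. This is a rough sketch, not a production pattern: `call_model` is a hypothetical stand-in for whatever provider SDK you use, the model names are placeholders, and the email regex is deliberately simplistic.

```python
import re
from functools import lru_cache

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

calls_made = []  # records which model each real call went to, for inspection


def is_valid_email(value: str) -> bool:
    """Question 1: classical logic. Deterministic and effectively free per call."""
    return EMAIL_RE.match(value) is not None


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real provider call (assumption, not a real API)."""
    calls_made.append(model)
    return f"[{model}] answer to: {prompt}"


@lru_cache(maxsize=1024)
def cached_model_call(model: str, prompt: str) -> str:
    """Question 3: identical (model, prompt) pairs hit the cache, not the provider."""
    return call_model(model, prompt)


def answer(prompt: str, needs_reasoning: bool = False) -> str:
    """Question 2: default to the small model, escalate only when asked to."""
    model = "frontier-model" if needs_reasoning else "small-model"
    return cached_model_call(model, prompt)
```

Calling `answer` twelve times with the same prompt results in one entry in `calls_made`. An in-process `lru_cache` only helps within a single process; a shared cache would be the next step for a real service.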

For solo developers: build your personal tools so you can swap them out without breaking your business model. If your workflow says "absolutely needs Claude Opus at every step", you're exposed.

For companies: cost of inference has to be a hard metric on every AI project, not just a footnote. Ask on every project: what does it cost us per user, per month, at 10× scale? Which components are non-negotiably AI, and which are better built classically? Any business model built on 100,000 users generating millions of LLM calls per month has an existential problem when pricing power kicks in. Software that uses AI intelligently and sparingly wins.
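A back-of-the-envelope version of that question fits in one function. The 100,000 users come from the paragraph above; calls per user, tokens per call, and the $3 per million tokens price are illustrative assumptions, not measurements.

```python
def monthly_llm_cost(
    users: int,
    calls_per_user: int,
    tokens_per_call: int,
    usd_per_million_tokens: float,
) -> float:
    """Total monthly inference spend for a simple per-call token budget."""
    total_tokens = users * calls_per_user * tokens_per_call
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Illustrative assumptions: 30 calls per user per month, 2,000 tokens per call
USERS, CALLS, TOKENS, PRICE = 100_000, 30, 2_000, 3.0

base = monthly_llm_cost(USERS, CALLS, TOKENS, PRICE)
at_10x_scale = monthly_llm_cost(USERS * 10, CALLS, TOKENS, PRICE)
price_tripled = monthly_llm_cost(USERS, CALLS, TOKENS, PRICE * 3)

print(f"today:          ${base:,.0f}/month (${base / USERS:.2f} per user)")
print(f"at 10x scale:   ${at_10x_scale:,.0f}/month")
print(f"price tripled:  ${price_tripled:,.0f}/month")
```

The per-user figure is the one to compare against what each user actually pays you; if it only works at today's token price, the model is exposed.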

10 · How to hedge

What you should do concretely now

The honest recommendation isn't "don't worry" and it isn't "stop paying". It's: use the subsidy phase fully, but build your workflows so you're not exposed when pricing power arrives. Concretely, six things:

Hedge plan · what to do now

Use the subsidy. Avoid the lock-in.

01 Use it fully now. The price-to-performance ratio is historically rare. Build workflows, automate processes, get as much as you can. Wait and you leave the subsidy on the table.
02 Build classical where possible. A calculation, a database lookup, a validator, a routing decision — a few cents per million calls. Same job via LLM: hundreds to thousands of dollars. Use AI where language, context, or generation are needed — not as a universal hammer. The hammer approach works today because the LLM call is subsidized. In three years it doesn't.
03 Build provider-flexible. Write your tools so you can flip from Anthropic to OpenAI or Google with one switch. Provider-abstraction tools make this practically trivial. For companies: parallel contracts with two providers, never bet 100% on one. How I did it myself: setting up cloud with Antigravity.
04 Keep your data local. Prompts, memory, workflows, custom knowledge bases — whatever lives inside a provider system isn't yours. Regular exports. For sensitive company data, add storage and clear contractual terms on data egress.
05 Keep self-hosting on the table. Capable open-weight models run on a Mac Studio today (the hardware is surprisingly good for local LLMs). For companies: a dedicated GPU server pays off above a certain volume, and makes you independent of provider pricing. Build plan B before you need it.
06 Cheap models for bulk work. Frontier model only where it needs actual reasoning. For classification, extraction, simple answers, a 10–20× cheaper model is enough. That saves over 80% and decouples you from price shocks on the top tier. If you work with Claude Code: a token statusbar shows live which workflow eats the most tokens. The 30-day aggregates behind the bill at the top come from the same mechanism, here is my actual dashboard.
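How much point 06 saves depends on how much of your spend is actually bulk work. A small sketch, assuming token volume stays the same and only the price per token changes for the bulk share (both function names are mine):

```python
def blended_savings(bulk_share: float, cheaper_factor: float) -> float:
    """Fraction of total spend saved when `bulk_share` of it moves to a model
    that is `cheaper_factor` times cheaper; the rest stays on the frontier model."""
    if not 0.0 <= bulk_share <= 1.0:
        raise ValueError("bulk_share must be between 0 and 1")
    if cheaper_factor < 1.0:
        raise ValueError("cheaper_factor must be at least 1")
    return bulk_share * (1.0 - 1.0 / cheaper_factor)


def required_bulk_share(target_savings: float, cheaper_factor: float) -> float:
    """Bulk share needed to reach a savings target at a given price gap."""
    return target_savings / (1.0 - 1.0 / cheaper_factor)


for factor in (10, 20):
    print(
        f"{factor}x cheaper: 90% bulk saves {blended_savings(0.9, factor):.1%}, "
        f"80% savings needs {required_bulk_share(0.8, factor):.1%} bulk"
    )
```

Under these assumptions the 80% mark is reached once roughly 85 to 90% of the spend is routine work, which is why measuring per-workflow token use comes first.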

Rule of thumb: anyone who builds routines that don't work without AI during the subsidy phase will be exposed in the pricing-power phase. Anyone who builds routines that migrate to a different model in minutes wins today and keeps options tomorrow.
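What "migrates in minutes" can look like in practice: business logic talks to a small interface, and the provider is chosen by a single config value. A minimal sketch with stub providers; in a real setup each stub would wrap one vendor SDK, and every name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class ChatProvider(Protocol):
    """The only surface the rest of the code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubProvider:
    """Placeholder for a real adapter (assumption: one adapter class per vendor)."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"{self.name}: {prompt}"


# The "one switch": a registry keyed by a config value
PROVIDERS: Dict[str, Callable[[], ChatProvider]] = {
    "provider_a": lambda: StubProvider("provider_a"),
    "provider_b": lambda: StubProvider("provider_b"),
}


def get_provider(name: str) -> ChatProvider:
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    return PROVIDERS[name]()


def summarize(text: str, provider: ChatProvider) -> str:
    """Business logic: knows nothing about which vendor sits behind the interface."""
    return provider.complete(f"Summarize: {text}")


print(summarize("quarterly report", get_provider("provider_a")))
print(summarize("quarterly report", get_provider("provider_b")))
```

The design choice is that prompts and workflows live in your code, not in a vendor's system, so switching is a change to one registry entry rather than a rewrite.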

Bottom line

AI infrastructure investment is the most expensive bet in history — and a meaningful share of it is rational. Token prices are dropping for real. Efficiency is at its limit. Demand grows faster than both combined.

The risk sits with investors and hyperscalers, not users — at least for now. The historical precedent (the telecom boom) shows: the infrastructure gets built anyway, the prices collapse anyway, and the biggest winners are the ones who use the cheap infrastructure most effectively — not the ones who held WorldCom stock.

What you need today isn't skepticism, and it isn't naive enthusiasm. You need a plan to use the subsidy phase without being exposed in the next one. Take the plan above literally — or as a starting point for your own.

Behind the scenes

How this article was made

This text didn't come out of a single sitting at the desk. It came out of a pipeline of AI agents running in parallel, with me as the editor and decision-maker in the loop. The whole flow is stored as a reusable skill (research-article) in my Claude Code setup and runs in 9 phases.

The 9 phases — orchestrated in one chat

01 · PLAN · Topic brief: What question should the article answer? Required sources, audience, hook.

02 · RESEARCH · 4 agents in parallel: Sonnet agents each research one angle: datacenter costs, unit economics, efficiency trends, historical parallels.

03 · FACT-CHECK · 35 claims: Dedicated agent verifies each number against primary sources. Tags ✅ / ⚠️ / ❌.

04 · SYNTHESIS · Opus distills: Pro/con arguments, personal calculation, 3 chart ideas from the 4 research files.

05 · BUILD · HTML from brand template: Contrast cards, CSS charts, bull/bear box, 2-column story, all in the mfraunhofer.de layout.

06 · FACT-CHECK 2 · Math & consistency: Second agent checks the finished article; it found 3 math errors and 2 mis-copied numbers.

07 · UI/SEO/GEO · Review agent: Layout consistency with other site pages, mobile test, Schema.org, optimization for AI crawlers.

08 · COPY-EDIT · Language pass: 22 language findings, 7 must-fix (comma placement, idiom, native-vs-translated phrasing).

09 · DEPLOY · Verify & ship: Playwright screenshots, side-by-side against existing pages, then Netlify deploy via script.

Between phases I sit in the chat: reframe the hook, rework charts, swap tables, add anti-patterns. Whatever should run differently next time goes straight into the skill — the workflow gets better with every run. Total roughly three hours for this article, much of it parallel agent work in the background. Next research article: realistically 60 to 90 minutes.

Methodology · what's behind the △ anchor classes. A few numbers in the article aren't direct source quotes but my own plausibility calculations from publicly known inputs. To check them yourself:

"~$8,600 API-equivalent value, factor ~43×" — measured, not estimated: my own token-usage dashboard aggregates all Claude Code session transcripts (~/.claude/projects/**), May 8 to June 17, 2026, 82,504 API responses. Last 30 days: ~59M new tokens (in+out) plus ~9.4B cache reads (valued at 0.1×). At Anthropic's official API list prices that comes to ~$8,600. Divided by the $200 plan = factor ~43. "API-equivalent" = what the same workload would have cost on the API, i.e. the value I extract from the flat rate, NOT Anthropic's cost of production. Dashboard.
"Best-case cost to Anthropic ~$1,700/month" — different number, different side, and deliberately the cheapest case for Anthropic. The API list price ($8,600) is revenue, not cost. How much of it is cost depends on gross margin. Top software/SaaS run ~80–90%, the most optimistic inference estimate (DeepSeek disclosure 2025) ~85%. At 80% margin: $8,600 × 0.20 ≈ $1,700 cost. More realistically 50 to 70% (Anthropic's estimated blended margin is ~50%, SaaStr/Sequoia[12][5]): ~$2,600 to $4,300. Cleanly bounded below, the exact figure only Anthropic knows, because they don't charge flat-rate customers API prices, their margin on such customers is negative. That's the cost side, not the API-equivalent above.
"Seven cents of revenue per dollar invested" — AI-native inference revenue 2025 (~$25B[5]) ÷ Big-4 AI CapEx 2025 (~$350B[6]) ≈ 0.071 · matches the ratio Sequoia and Goldman discuss in their reports.
"280× price drop" — Stanford HAI data point[1]: $20/M tokens (Nov 2022) ÷ $0.07/M tokens (Oct 2024, Gemini 1.5 Flash 8B) ≈ 286 · rounded to "280×" as a clean round number.

Sources for the underlying values: Stanford HAI[1], Epoch AI[7], Sequoia/Cahn[5], Big-4 earnings[6].
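For readers who want to reproduce the "API-equivalent" number on their own transcripts, the pricing step might look roughly like this. It is a sketch of the idea, not the dashboard's code: the rates below are made-up placeholders (look up the current list prices for the models you use), and only the 0.1× valuation of cache reads is taken from the methodology above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens. Placeholder values, NOT real list prices."""

    input_usd: float
    output_usd: float
    cache_read_multiplier: float = 0.1  # cache reads valued at 0.1x the input rate


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int


def api_equivalent_usd(usage: Usage, rates: Rates) -> float:
    """What the same workload would have cost at per-token API rates."""
    per_million = 1_000_000
    return (
        usage.input_tokens / per_million * rates.input_usd
        + usage.output_tokens / per_million * rates.output_usd
        + usage.cache_read_tokens / per_million
        * rates.input_usd
        * rates.cache_read_multiplier
    )


# Hypothetical usage and rates, chosen only to make the arithmetic easy to follow
example_rates = Rates(input_usd=10.0, output_usd=50.0)
example_usage = Usage(
    input_tokens=1_000_000, output_tokens=1_000_000, cache_read_tokens=10_000_000
)

value = api_equivalent_usd(example_usage, example_rates)
print(f"API-equivalent value: ${value:,.2f}, factor vs. a $200 plan: {value / 200:.2f}x")
```

A real run would sum this over every response in the transcripts and use per-model rates, since different models are priced differently.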

Sources

Convention in the article: hard numbers, quotes, and external model outputs are marked inline with a superscript source number (e.g. [1]). Click jumps directly into this list. Three confidence levels, visually distinct:

[N] hard primary source (earnings, SEC filing, official report)
[N°] analyst estimate, industry estimate, or model output
[△] own plausibility calculation, methodology in the making-of

Where a number appears without a footnote, it's a plausibility synthesis across multiple sources listed here, or it links to a bundle entry (e.g. the OpenAI investor materials[9] as a bundled source for the cost-to-revenue numbers). Goal: every checkable claim should be traceable without the reader having to do their own research.

01 Stanford HAI: AI Index Report 2025 · 280× token price drop for GPT-3.5-class performance: $20 → $0.07 per million input tokens, Nov 2022 → Oct 2024 (Gemini 1.5 Flash 8B) · MMLU benchmark as comparison basis · core-numbers summary: 10-charts overview
02 IEA: "Energy and AI" Executive Summary (April 2025) · base case: 415 TWh (2024, ~1.5% of global power consumption) → 945 TWh by 2030 · doubling as middle-path scenario
03 Goldman Sachs: "Gen AI: Too Much Spend, Too Little Benefit?" (June 2024) · core skeptic report on AI CapEx vs revenue · plus April 2026 update: Goldman April 2026, datacenter power +220% by 2030 (via Benzinga, original report subscription-only)
04 Alphabet: Q1 2026 Earnings Release (PDF, 04/30/2026) · $28.7B net effect from equity-securities gain (Anthropic revaluation as the dominant block) · ~48% of quarterly net profit from mark-to-market, no cash inflow · secondary: Fortune analysis
05 Sequoia Capital / David Cahn: "AI's $600B Question" (2024, update 2025) · revenue-gap analysis · Cahn's main thesis: per-token inference cost drops ~10× per year, but the CapEx-revenue gap widens
06 Microsoft / Alphabet / Meta / Amazon earnings calls Q4 2024 + Q1 + Q4 2025 · cumulative ~$350B AI-relevant CapEx consensus as a mid-2025 assumption; actuals reached closer to $400B+ by year-end · financed from operating cashflow, not debt · overview: Visual Capitalist Big Tech CapEx 2022–2025
07 Epoch AI: "How persistent is the inference cost burden?" (2025) · inference cost structure: GPU hardware depreciation dominates, power secondary · 5–10× cost reduction per year from hardware + algorithmic improvements
08 NVIDIA: Rubin Investor Press Release (CES, 01/05/2026) · "up to 10× reduction in inference token cost" vs. Blackwell · benchmark specific to MoE models (Kimi-K2-Thinking), dense-transformer gain 2–3× · GTC 2026 confirmation: CNBC GTC keynote coverage
09 Fortune: leaked OpenAI financial documents (11/12/2025) · $13B revenue, $22B spending = $9B net loss 2025 · "~$1.69 spent per $1 of inference revenue" · profitability forecast 2030 · plus Ed Zitron / wheresyoured.at: cross-check via Microsoft disclosures
10 Uptime Institute: Global Data Center Survey 2024 · industry-average PUE stable at 1.56 since 2020 · historical series (Uptime blog): 2.5 (2007) → 1.98 (2011) → 1.65 (2014) → 1.58 (2018) → 1.56 (2024) · hyperscaler best-in-class at ~1.1 (Google/Meta sustainability reports)
11 Amazon: Q4 2024 Earnings Release (SEC EDGAR) · useful-life study leads to GPU depreciation shortened from 6 to 5 years effective 01/01/2025 · ~$0.7B operating-income impact 2025 · signal: faster AI hardware value decay (reversal of the 2024 extension from 5 to 6 years)
12 SaaStr: Anthropic ARR analysis (February 2026) · Anthropic monetizes at roughly $211 per monthly user (vs. OpenAI ~$25 per weekly user) · SaaStr calculation without openly disclosed methodology · $14B ARR as of Feb 2026, later grew to $30B ARR by April 2026 (VentureBeat confirmation)
13 pricepertoken.com and costgoat.com · aggregated LLM API price lists May 2026 · Gemini 2.0 Flash + GPT-4.1 Nano confirmed at $0.10 per million input tokens · DeepSeek V3.2 now at $0.435 per million (above the "~$0.10" bucket)
14 IntuitionLabs: NVIDIA AI GPU Pricing Guide (2024–2026) · H100 80GB PCIe stable at $25,000–$30,000, SXM variant at $35,000–$40,000 · NVIDIA doesn't publish an official list; values from OEM/integrator channels · secondary data: Hashrate Index secondary market tracker
15 JLL: 2026 Data Center Outlook · AI-optimized datacenters with liquid cooling: up to $30M per MW (tenant fit-out up to $25M per MW) · standard hyperscale build significantly lower ($10.7–$11.3M per MW) · secondary report: Datacenter Dynamics on the $3T investment supercycle thesis
16 WorldCom scandal (Wikipedia + SEC reference) · $11B+ inflated assets, largest accounting fraud in US history (2002) · plus The Bubble Bubble: telecom-bubble analysis · $500B+ in telecom bonds 1996–2001 in the US · plus Richmond Fed Economic Quarterly (Fall 2003) for the economic post-mortem of the telecom boom