
I can't read your hyperbolically titled, paywalled Medium post, so idk if it has data I'm not aware of or is just rehashing the same stats about OpenAI & co currently losing money (mostly due to training and free users), but here's a non-paywalled blog post that I personally found convincing: https://www.snellman.net/blog/archive/2025-06-02-llms-are-ch...




The above article is not convincing at all.

Nothing on infra costs, hardware throughput and capacity (accounting for hidden tokens), or depreciation; just blind faith that provider pricing "covers all costs and more". A naive estimate of 1000 tokens per search, based on simplistic queries, which is exactly the kind of usage you don't need or want an LLM for; LLMs excel at complex queries with long, complex output. And it doesn't account at all for chain-of-thought tokens, which providers bill as output tokens even though they never appear in the output (surprise).
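
To put numbers on the hidden-token point, here's a back-of-the-envelope sketch in Python (the price and the token counts are made up for illustration, not any provider's real figures):

    # Illustrative only: hypothetical price and token counts.
    price_per_m_output = 10.00    # $ per 1M output tokens (hypothetical)
    visible_tokens = 800          # tokens the user actually sees
    hidden_cot_tokens = 4000      # chain-of-thought tokens, billed but never shown

    billed = visible_tokens + hidden_cot_tokens
    cost = billed / 1_000_000 * price_per_m_output
    naive = visible_tokens / 1_000_000 * price_per_m_output
    print(f"billed: ${cost:.4f}/query, naive visible-only estimate: ${naive:.4f}/query")

Any per-query cost estimate that only counts visible output is off by whatever multiple the hidden reasoning adds.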

Completely skips the fact that the vast majority of paying LLM users are on fixed subscription pricing, precisely because pay-per-use API pricing would cost multiples more and therefore wouldn't be economical.

Moving on.


> Nothing on infra costs, hardware throughput and capacity (accounting for hidden tokens), or depreciation

That's because it comes at things from the other end: since we can't be sure exactly what companies are doing internally, it looks at the market incentives and the pricing actually on offer and works backwards from there. And to be fair, it also cites, for instance, DeepSeek's paper, where they talk about what their profit margins are on inference.

> just blind faith that provider pricing "covers all costs and more"

It's not blind faith. I think they make a really good argument for why provider pricing almost certainly does cover all the costs and more, again including citations of white papers by some of those providers.

> A naive estimate of 1000 tokens per search, based on simplistic queries, which is exactly the kind of usage you don't need or want an LLM for

Those token estimates were for comparing LLM pricing to search pricing, to establish whether LLMs were expensive relative to other things on the market, so obviously they wanted a domain similar to search. That comparison wasn't for determining whether inference is profitable in itself, and it has no bearing on that question.
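
For what it's worth, that comparison is trivial to redo yourself; here's a sketch with placeholder prices (not the article's actual figures):

    # Illustrative per-query cost comparison; both prices are placeholders.
    llm_tokens_per_query = 1000   # the article's rough per-search estimate
    llm_price_per_m = 1.00        # $ per 1M tokens (hypothetical blended rate)
    search_price_per_1k = 5.00    # $ per 1,000 queries (hypothetical search API rate)

    llm_cost = llm_tokens_per_query / 1_000_000 * llm_price_per_m
    search_cost = search_price_per_1k / 1000
    print(f"LLM: ${llm_cost:.4f}/query, search API: ${search_cost:.4f}/query")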

> And it doesn't account at all for chain-of-thought tokens, which providers bill as output tokens even though they never appear in the output (surprise).

Most open-source providers do include thinking tokens in the output, just separated by delimiter tokens so that UI and agent software can split them out if they want to. I believe the number of thinking tokens that Claude and GPT-5 use can be known as well: https://www.augmentcode.com/blog/developers-are-choosing-old... And chain-of-thought tokens are typically factored into API pricing, in the sense that they're part of what you're charged for. So I have no idea what this point is supposed to mean.
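
For example, DeepSeek-R1-style output wraps the reasoning in <think> tags, so any client can split it out; a minimal sketch (the exact delimiters vary by model):

    import re

    # Minimal sketch, assuming DeepSeek-R1-style <think>...</think> delimiters.
    raw = "<think>user asked X, so consider...</think>The answer is 42."

    m = re.match(r"<think>(.*?)</think>(.*)", raw, flags=re.DOTALL)
    reasoning, answer = (m.group(1), m.group(2)) if m else ("", raw)
    print("reasoning:", reasoning)
    print("answer:", answer)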

> Completely skips the fact that the vast majority of paying LLM users are on fixed subscription pricing, precisely because pay-per-use API pricing would cost multiples more and therefore wouldn't be economical.

That doesn't mean that selling inference by subscription isn't profitable! This is a common misunderstanding of how subscriptions work. With these AI inference subscriptions, your usage is capped, which limits how much money the company can lose on any one user. The goal is that subscribers will, on average, use less inference than they paid for, with light users subsidizing heavy ones so it evens out. And that's all assuming that serving a user at the usage cap actually costs more than the subscription price, which is a pretty big assumption.
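
Here's a toy model of that pooling, with usage drawn from a capped exponential distribution (every number is invented for illustration):

    import random

    # Toy subscription-pooling model; all numbers are invented.
    random.seed(0)
    price = 20.00        # $ monthly subscription
    cap_cost = 35.00     # max inference cost a capped user can incur per month
    n = 100_000

    revenue = n * price
    cost = sum(min(random.expovariate(1 / 8.0), cap_cost) for _ in range(n))
    print(f"revenue ${revenue:,.0f}, cost ${cost:,.0f}, margin {1 - cost / revenue:.0%}")

With mean per-user cost well below the price, the pool is profitable even though some individual users cost more than they pay.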

If you want something that factors in subscriptions and also does that sort of first-principles analysis, this is a good article:

https://martinalderson.com/posts/are-openai-and-anthropic-re...

And in my opinion it seems pretty clear that basically everyone who does any kind of analysis on this, whether black-box or first-principles, comes to the conclusion that you can very easily make money on inference. The only people coming to any other conclusion are those who just look at the finances of U.S. AI companies and draw conclusions from that without any more detailed breakdown. That's exactly what the article you linked does (I've now finally been able to read it, thanks to someone posting an archive link): it doesn't make any case about the subscription or unit economics of token inference whatsoever, but instead rests on OpenAI's massive overinvestment in gigantic hyperscale data centers specifically, which is unrelated to the unit economics of inference itself.



