The best AI code is the code you delete. Models are eating abstractions faster than teams can adapt. Each model introduces a new paradigm shift. If you miss a paradigm shift, you’re dead.
After 30 years of clicking, scrolling, and optimizing pixels, websites are becoming obsolete. LLM agents will read and act for us, ending search engines, blue links, and traditional websites.
HAHAHA. OK, let's call it "transformation." As I wrote: "The next decade of AI search will belong to systems that read and reason end-to-end. Retrieval isn’t dead—it’s just been demoted."
Why call it an ad? It’s not even on the company site. I only mentioned my company upfront so people get context (why we had to build a complex RAG pipeline, what kinds of documents we’re working with, and why the examples come from real production use cases).
It stands out because the flow and tone were clearly AI-generated. It’s fluff, and I don’t trust that it was written by a human who wasn’t hallucinating the non-company-related talking points.
But don’t you think LLM pricing is heading toward zero? It seems to halve every six months. And on privacy, you can hope model providers won’t train on your data, but there’s no guarantee.
I don't see how it can trend to zero when none of the vendors are profitable. Uber and DoorDash et al. increased in price over time. The era of "free" LLM usage can't be permanent.
Oh, it's going to be "free" alright, in the same way that most web services are today. I.e., you will pay for it with your data and attention.
The only difference is that the advertising will be much more insidious and manipulative, the data collection far easier since people are already willingly giving it up, and the business much more profitable.
Why does grep in a loop fall apart? It’s expensive, sure, but LLM costs are trending toward zero. With Sonnet 4.5, we’ve seen models get better at parallelization and memory management (compacting conversations and highlighting findings).
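For anyone who hasn't seen the pattern, "grep in a loop" just means the agent keeps issuing keyword searches and compacting what it finds until it can answer. A rough sketch of the loop, where run_grep and ask_llm are hypothetical stand-ins rather than any particular vendor's API:

```python
import subprocess

def run_grep(pattern: str, repo_dir: str) -> str:
    """Hypothetical helper: plain keyword search over a local checkout."""
    result = subprocess.run(
        ["grep", "-rni", pattern, repo_dir],
        capture_output=True, text=True,
    )
    return result.stdout[:20_000]  # truncate so one search can't flood the context

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever chat-completion API you use."""
    raise NotImplementedError

def grep_in_a_loop(question: str, repo_dir: str, max_steps: int = 5) -> str:
    notes = ""  # the "memory": compacted findings carried between steps
    for _ in range(max_steps):
        step = ask_llm(
            f"Question: {question}\nFindings so far:\n{notes}\n"
            "Reply with SEARCH:<pattern> to grep again or ANSWER:<final answer>."
        )
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        hits = run_grep(step[len("SEARCH:"):].strip(), repo_dir)
        # Compact instead of appending raw grep output, so the context stays small.
        notes = ask_llm(
            f"Keep only findings relevant to '{question}':\n{hits}\n"
            f"Previous notes:\n{notes}"
        )
    return notes  # best effort if the step budget runs out
```

Every iteration here is a full model call, which is where the cost argument in this thread comes in.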
"LLM costs are trending toward zero". They will never be zero for the cutting edge. One could argue that costs are zero now via local models but enterprises will always want the cutting edge which is likely to come with a cost
LLMs > rerankers. Yes! I don't like rerankers. They're slow, their context window is small (4096 tokens), and they're expensive... It's better when the LLM reads the whole file versus some top_chunks.
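Concretely, the "read the whole file" approach is just: skip chunking and reranking, and stuff entire candidate files into the prompt under a size budget. A minimal sketch, where ask_llm is a hypothetical placeholder for your chat-completion call:

```python
from pathlib import Path

def answer_from_whole_files(question: str, candidate_paths: list[str],
                            ask_llm, max_chars: int = 200_000) -> str:
    """Feed whole files to the LLM instead of reranked top_chunks."""
    context, used = [], 0
    for path in candidate_paths:
        text = Path(path).read_text(errors="ignore")
        if used + len(text) > max_chars:  # crude stand-in for a token budget
            break
        context.append(f"=== {path} ===\n{text}")
        used += len(text)
    prompt = "\n\n".join(context) + f"\n\nQuestion: {question}"
    return ask_llm(prompt)
```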
Rerankers are orders of magnitude faster and cheaper than LLMs. Typical out-of-the-box latency on a decent-sized cross-encoder (~4B) will be under 50 ms on cheap GPUs like an A10G. You won’t be able to run a fancy LLM on that hardware, and without tuning you’re looking at hundreds of milliseconds minimum.
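For reference, the out-of-the-box path really is short. A sketch with sentence-transformers' CrossEncoder, using a small public checkpoint as a stand-in for whatever model you'd actually deploy (the query and documents are made up):

```python
from sentence_transformers import CrossEncoder

# Small public checkpoint as a placeholder; swap in the model you actually serve.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I rotate API keys?"
candidates = [
    "Rotating credentials is covered in the security runbook...",
    "The marketing site embeds a static key at build time...",
    "API keys can be rotated from the admin console under Settings...",
]

# Score every (query, candidate) pair in one batch, then keep the best ones.
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(scores, candidates), reverse=True)
for score, doc in ranked[:2]:
    print(f"{score:.3f}  {doc[:60]}")
```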
More importantly, it’s a lot easier to fine-tune a reranker on behavior data than an LLM that makes dozens of irrelevant queries.
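A hedged sketch of what that fine-tune can look like with plain transformers and click/no-click labels; the checkpoint, example data, and hyperparameters are all placeholders:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any cross-encoder-style checkpoint with a single score head works; this is an example.
name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

# Behavior data: (query, document, label), e.g. clicked = 1.0, skipped = 0.0.
examples = [
    ("reset 2fa", "How to reset two-factor authentication...", 1.0),
    ("reset 2fa", "Quarterly revenue report, FY24...", 0.0),
]

def collate(batch):
    queries, docs, labels = zip(*batch)
    enc = tok(list(queries), list(docs), padding=True, truncation=True,
              return_tensors="pt")
    enc["labels"] = torch.tensor(labels).unsqueeze(1)
    return enc

loader = DataLoader(examples, batch_size=2, collate_fn=collate)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.BCEWithLogitsLoss()

model.train()
for batch in loader:
    labels = batch.pop("labels")
    logits = model(**batch).logits  # one relevance logit per (query, doc) pair
    loss = loss_fn(logits, labels)
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Steering an LLM away from dozens of irrelevant queries, by contrast, usually means prompt changes or much heavier tuning.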
This is worth emphasizing. At scale, and when you have the resources to really screw around with them to tune your pipeline, rerankers aren't bad; they're just much worse/harder to use out of the box. LLMs buy you easy robustness, baseline quality, and capabilities in exchange for cost and latency, which is a good tradeoff until you have strong PMF and you're trying to increase margins.
More than that, adding longer context isn’t free in either time or money. So filling an LLM context with k=100 documents of mixed relevance may be slower than reranking and filling it with k=10 of high relevance.
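A back-of-the-envelope version of that tradeoff, with made-up but plausible numbers (every constant below is an assumption, not a measurement):

```python
# Hypothetical constants: adjust for your own stack.
TOKENS_PER_DOC = 500           # average document length in tokens
PREFILL_TOK_PER_SEC = 5_000    # how fast the LLM ingests prompt tokens
PRICE_PER_MTOK = 3.00          # dollars per million input tokens
RERANK_LATENCY_SEC = 0.05      # one batched cross-encoder pass

def latency_and_cost(k: int, rerank: bool = False):
    tokens = k * TOKENS_PER_DOC
    latency = tokens / PREFILL_TOK_PER_SEC + (RERANK_LATENCY_SEC if rerank else 0)
    cost = tokens / 1e6 * PRICE_PER_MTOK
    return latency, cost

print(latency_and_cost(100))              # stuff 100 mixed-relevance docs: 10.0 s, $0.15
print(latency_and_cost(10, rerank=True))  # rerank, keep the top 10: 1.05 s, $0.015
```

Under those assumptions the reranked path is roughly 10x faster and 10x cheaper per query; change the constants and the conclusion can change too.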
Of course, the devil is in the details, and there are five dozen reasons why you might choose one approach over the other. But it is not clear that using a reranker is always slower.