The best AI code is the code you delete. Models are eating abstractions faster than teams can adapt. Each model introduces a new paradigm shift. If you miss a paradigm shift, you’re dead.
After 30 years of clicking, scrolling, and optimizing pixels, websites are becoming obsolete. LLM agents will read and act for us, ending search engines, blue links, and traditional websites.
HAHAHA. OK, let's call it "transformation." As I wrote: "The next decade of AI search will belong to systems that read and reason end-to-end. Retrieval isn’t dead—it’s just been demoted."
Why call it an ad? It’s not even on the company site. I only mentioned my company upfront so people get context (why we had to build a complex RAG pipeline, what kinds of documents we’re working with, and why the examples come from real production use cases).
It stands out because the flow and tone were clearly AI-generated. It’s fluff, and I don’t trust that it was written by a human who wasn’t hallucinating the non-company-related talking points.
But don’t you think LLM pricing is heading toward zero? It seems to halve every six months. And on privacy, you can hope model providers won’t train on your data, but there’s no guarantee.
I don't see how it can trend to zero when none of the vendors are profitable. Uber and DoorDash et al. increased in price over time. The era of "free" LLM usage can't be permanent.
Oh, it's going to be "free" alright, in the same way that most web services are today. I.e., you will pay for it with your data and attention.
The only difference is that the advertising will be much more insidious and manipulative, the data collection far easier since people are already willingly giving it up, and the business much more profitable.
Why does grep in a loop fall apart? It’s expensive, sure, but LLM costs are trending toward zero. With Sonnet 4.5, we’ve seen models get better at parallelization and memory management (compacting conversations and highlighting findings).
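For anyone who hasn't seen the pattern, "grep in a loop" just means the agent keeps issuing keyword searches and compacting what it finds until it can answer. A rough sketch of the loop, where run_grep and ask_llm are hypothetical stand-ins rather than any particular vendor's API:

```python
import subprocess

def run_grep(pattern: str, repo_dir: str) -> str:
    """Hypothetical helper: plain keyword search over a local checkout."""
    result = subprocess.run(
        ["grep", "-rni", pattern, repo_dir],
        capture_output=True, text=True,
    )
    return result.stdout[:20_000]  # truncate so one search can't flood the context

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever chat-completion API you use."""
    raise NotImplementedError

def grep_in_a_loop(question: str, repo_dir: str, max_steps: int = 5) -> str:
    notes = ""  # the "memory": compacted findings carried between steps
    for _ in range(max_steps):
        step = ask_llm(
            f"Question: {question}\nFindings so far:\n{notes}\n"
            "Reply with SEARCH:<pattern> to grep again or ANSWER:<final answer>."
        )
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        hits = run_grep(step[len("SEARCH:"):].strip(), repo_dir)
        # Compact instead of appending raw grep output, so the context stays small.
        notes = ask_llm(
            f"Keep only findings relevant to '{question}':\n{hits}\n"
            f"Previous notes:\n{notes}"
        )
    return notes  # best effort if the step budget runs out
```

Every iteration here is a full model call, which is where the cost argument in this thread comes in.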
"LLM costs are trending toward zero". They will never be zero for the cutting edge. One could argue that costs are zero now via local models but enterprises will always want the cutting edge which is likely to come with a cost
LLMs > rerankers. Yes! I don't like rerankers. They're slow, their context window is small (4096 tokens), and they're expensive... It's better when the LLM reads the whole file versus some top_chunks.
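Concretely, the "read the whole file" approach is just: skip chunking and reranking, and stuff entire candidate files into the prompt under a size budget. A minimal sketch, where ask_llm is a hypothetical placeholder for your chat-completion call:

```python
from pathlib import Path

def answer_from_whole_files(question: str, candidate_paths: list[str],
                            ask_llm, max_chars: int = 200_000) -> str:
    """Feed whole files to the LLM instead of reranked top_chunks."""
    context, used = [], 0
    for path in candidate_paths:
        text = Path(path).read_text(errors="ignore")
        if used + len(text) > max_chars:  # crude stand-in for a token budget
            break
        context.append(f"=== {path} ===\n{text}")
        used += len(text)
    prompt = "\n\n".join(context) + f"\n\nQuestion: {question}"
    return ask_llm(prompt)
```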
Rerankers are orders of magnitude faster and cheaper than LLMs. Typical out-of-the-box latency on a decent-sized cross-encoder (~4B) will be under 50 ms on cheap GPUs like an A10G. You won’t be able to run a fancy LLM on that hardware, and without tuning you’re looking at hundreds of milliseconds minimum.
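For reference, the out-of-the-box path really is short. A sketch with sentence-transformers' CrossEncoder, using a small public checkpoint as a stand-in for whatever model you'd actually deploy (the query and documents are made up):

```python
from sentence_transformers import CrossEncoder

# Small public checkpoint as a placeholder; swap in the model you actually serve.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I rotate API keys?"
candidates = [
    "Rotating credentials is covered in the security runbook...",
    "The marketing site embeds a static key at build time...",
    "API keys can be rotated from the admin console under Settings...",
]

# Score every (query, candidate) pair in one batch, then keep the best ones.
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(scores, candidates), reverse=True)
for score, doc in ranked[:2]:
    print(f"{score:.3f}  {doc[:60]}")
```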
More importantly, it’s a lot easier to fine-tune a reranker on behavior data than an LLM that makes dozens of irrelevant queries.
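A hedged sketch of what that fine-tune can look like with plain transformers and click/no-click labels; the checkpoint, example data, and hyperparameters are all placeholders:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any cross-encoder-style checkpoint with a single score head works; this is an example.
name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

# Behavior data: (query, document, label), e.g. clicked = 1.0, skipped = 0.0.
examples = [
    ("reset 2fa", "How to reset two-factor authentication...", 1.0),
    ("reset 2fa", "Quarterly revenue report, FY24...", 0.0),
]

def collate(batch):
    queries, docs, labels = zip(*batch)
    enc = tok(list(queries), list(docs), padding=True, truncation=True,
              return_tensors="pt")
    enc["labels"] = torch.tensor(labels).unsqueeze(1)
    return enc

loader = DataLoader(examples, batch_size=2, collate_fn=collate)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.BCEWithLogitsLoss()

model.train()
for batch in loader:
    labels = batch.pop("labels")
    logits = model(**batch).logits  # one relevance logit per (query, doc) pair
    loss = loss_fn(logits, labels)
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Steering an LLM away from dozens of irrelevant queries, by contrast, usually means prompt changes or much heavier tuning.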
This is worth emphasizing. At scale, and when you have the resources to really screw around with them to tune your pipeline, rerankers aren't bad; they're just much worse/harder to use out of the box. LLMs buy you easy robustness, baseline quality, and capabilities in exchange for cost and latency, which is a good tradeoff until you have strong PMF and you're trying to increase margins.
More than that, adding longer context isn’t free in either time or money. So filling an LLM context with k=100 documents of mixed relevance may be slower than reranking and filling it with k=10 of high relevance.
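A back-of-the-envelope version of that tradeoff, with made-up but plausible numbers (every constant below is an assumption, not a measurement):

```python
# Hypothetical constants: adjust for your own stack.
TOKENS_PER_DOC = 500           # average document length in tokens
PREFILL_TOK_PER_SEC = 5_000    # how fast the LLM ingests prompt tokens
PRICE_PER_MTOK = 3.00          # dollars per million input tokens
RERANK_LATENCY_SEC = 0.05      # one batched cross-encoder pass

def latency_and_cost(k: int, rerank: bool = False):
    tokens = k * TOKENS_PER_DOC
    latency = tokens / PREFILL_TOK_PER_SEC + (RERANK_LATENCY_SEC if rerank else 0)
    cost = tokens / 1e6 * PRICE_PER_MTOK
    return latency, cost

print(latency_and_cost(100))              # stuff 100 mixed-relevance docs: 10.0 s, $0.15
print(latency_and_cost(10, rerank=True))  # rerank, keep the top 10: 1.05 s, $0.015
```

Under those assumptions the reranked path is roughly 10x faster and 10x cheaper per query; change the constants and the conclusion can change too.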
Of course, the devil is in the details, and there are five dozen reasons why you might choose one approach over the other. But it is not clear that using a reranker is always slower.