lossyalgo's comments | Hacker News

> A coworker of mine told me that GPT’s LLM can use Excel files. No, it can’t. But the tools they plugged into it can.

And there's a 50/50 chance they'll use the right tool for the job. I tried the math question above multiple times on GPT-5 and it gets it right about 50% of the time. If I ask it to "try again", it usually gets it on the 2nd or 3rd try. Most times that it's wrong, it's not far off, but it looks deceptively accurate at first glance.


The Android version is also the best I've found without ads; it has a decent UI and plenty of advanced features.

AI sycophancy[0] is a real problem, for multiple reasons, but your example is one that makes me disable AI on all search engines entirely.

[0] https://techcrunch.com/2025/08/25/ai-sycophancy-isnt-just-a-...


Are you perhaps getting AI-generated trash that is just SEO optimized? I've noticed a TON more of these results in DDG and Google lately. You can now block those websites completely from DDG as of very recently (or at least I only noticed it very recently, and it's a true godsend to filter out all this AI-generated trash).

How do you block them?

But yeah, it's a combination of SEO trash and, seemingly, stuff not being indexed at all.

I'd search "Ask Clojure" or "Clojure Agents" and nothing from Clojure.org would show up.

Or "MDN SVG circle" and the MDN page just wouldn't show up.

Of course, today I'm trying it and it's all working, haha.


Weird, DDG supposedly uses Bing, which should be indexing everything. Then again, this is Microsoft, who can't even get local search working - Win11 lately can't even find Add/Remove Programs on my PC; I have to go through Settings and click 18 times before I find it.

Re blocking: after every search, in the upper-right corner of each result there are three dots which open a menu with the option "block this site from all results".


That's weird to hear. I've been using DDG daily for years and it's gotten progressively better, though lately every search engine's top results are often AI-generated trash. To combat this, DDG recently added an option to every link, in the upper-right corner, to "block this site from all results", which is something I've been waiting for since SEO-optimized trash became a thing.

AI is killing websites[0]. Why visit a website if the AI summary is good? But soon, if everyone is only using AI results, then there will be no reason to create new websites, unless you don't care about anyone visiting your site except for AI crawlers.

[0]: I won't bother linking any articles since there are too many articles on the subject and whatever I link is probably not the site you want (or is maybe paywalled).


I hope it was intentional humor that you summarized a view and did not link. Your own small contribution to killing websites?

There are many serious ethical and practical problems posed by the rise of LLMs, and I agree that this is one.

My hope is that AI helps to fine-tune inquiries and helps users discover websites that would otherwise not have been uncovered by traditional index-based search.

Unfortunately it’s in the interests of search and AI companies to keep you inside their portals, so they may be less than willing to link to the outside even when it would improve the experience.


Hard agree. I was recently at a talk from Jaron Lanier[0], who proposed that AI should, after every query, present on the right side of the page a list of all clickable sources where the AI gathered its data from, so that we could verify accuracy, as well as allowing us to continue giving traffic to websites.

[0] https://www.jaronlanier.com

edit: grammar


> AI should, after every query, present on the right-side of the page a list of all clickable sources

The default internet device these days is the phone; so many people don’t even use desktop any more. Space limitations on small screens mean that this is unlikely to be shown by default. Moreover, phone interfaces discourage most users from opening multiple new tabs forking off any webpage. You might show desktop users this and get some uptake, but that’s not enough to save the open web.


When I do use LLMs, I explicitly ask for all claims with a footnote and the source used for citation.

I almost always get the claim(1) and a footnote with a URL, book, or DOI.


These will mostly match, but they aren't necessarily the sources, just some links that are plausibly the sources.

I haven't used LLMs much, but Perplexity always gives me tons of links, which I really appreciate vs. ChatGPT.

> Unfortunately it’s in the interests of search and AI companies to keep you inside their portals, so they may be less than willing to link to the outside even when it would improve the experience.

This is true, but aren't "AI" summaries directly opposed to this interest? The user will usually get the answer they need much more quickly than if they had to scroll down the page, hunt for the right result, and get exposed to ads. So "AI" summaries are actually the better user experience.

In time I'm sure that we'll see ads embedded in these as well, but in the current stage of the "AI" hype cycle, users actually benefit from this feature.


Might as well hope that websites optimize for sending people to bookstores.

> AI is killing websites

I think that's hyperbole.

Yes, users can rely on "AI" summaries if they want a quick answer, but they've been able to do that for years via page snippets underneath each result, which usually highlight the relevant part of the page. The same argument was made when search engines began showing page snippets, yet we found a balance, and websites are still alive.

On the contrary, there's an argument to be made that search engines providing answers is the better user experience. I don't want to be forced to visit a website, which will likely have filler, popups, and be SEO'd to hell, when I can get the information I want in a fraction of the time and effort, within a consistent interface. If I do need additional information, then I can go to the source.

I do agree with the idea you mention below of search engines providing source links, but even without it, "AI" summaries can hardly be blamed for hurting website traffic. Websites are doing that on their own with user hostile design, SEO spam, scams, etc.

There is a long list of issues we can criticize search engines for, and the use of "AI" even more so, but machine-generated summaries on SERPs is not one of them IMO.


I guess you didn't take up my offer to search for how AI is killing traffic. There are numerous studies that repeatedly prove this to be true; this relatively recent article links to a big pile of them[0]. Why would anyone visit a website if the AI summary is seemingly good enough?

My issue with AI summaries is that they are not even remotely accurate, trustworthy or deterministic. Someone else posted this wonderful evidence[1] in the comments. LLMs are sycophantic and agree with you all the time, even if it means making shit up. Maybe things will improve, but for the last 2 years I have not seen much progress regarding hallucinations or deterministic, i.e. reliable/trustworthy, responses. They are still stochastic token guessers with some magic tricks sprinkled on top to make results slightly better than last month's LLMs.

And what happens when people stop creating new websites because they aren't getting any visitors (and by extension ad-revenue)? New info will stop being disseminated. Where will AI summarize data, if there is no new data to summarize? I guess they can just keep rehashing the new AI-generated websites, and it will be one big pile of endlessly recycled AI shit :)

p.s. I don't disagree with you regarding SEO spam, hostile design, cookie popups, etc. There is even a hilariously sad website[2] which points out how annoying websites have become. But using non-deterministic sycophantic AI to "summarize" websites is not the answer, at least not in the current form.

[0] https://www.theregister.com/2025/07/22/google_ai_overviews_s...

[1] https://imgur.com/a/why-llm-based-search-is-scam-lAd3UHn

[2] https://how-i-experience-web-today.com/

edit: grammar


> My issue with AI summaries is that they are not even remotely accurate, trustworthy or deterministic.

Who cares if it's deterministic? Google changes their algorithms all the time, you don't know what its devs will come up with next, when they release it, when they deploy it, when the previous cache gets cleared. It doesn't matter.


Haha, I suppose the problem is that LLM outputs are unreliable yet presented as authoritative (disclaimers do little to counteract the boffo confidence with which LLMs bullshit) — not that they are unreliable in unpredictable ways.

Presented as authoritative by its users. I mean, there are very obvious disclaimers and people just ignore them.

I'm well aware of the studies that "prove" that "AI" summaries are "killing" traffic to websites. I suppose you didn't consider my point that the same was said about snippets on SERPs before "AI"[1].

> My issue with AI summaries is that they are not even remotely accurate, trustworthy or deterministic.

I am firmly on the "AI" skeptic side of this discussion. And yet, if there's anything this technology is actually useful for, it's summarizing content and extracting key points from it. Search engines contain massive amounts of data. Training a statistical model on it that can provide instant results to arbitrary queries is a far more efficient way of making the data useful for users than showing them a sorted list of results which may or may not be useful.

Yes, it might not be 100% accurate, but based on my own experience, it is reliable for the vast majority of use cases. Certainly beats hunting for what I need in an arbitrarily ordered list and visiting hostile web sites.

> LLMs are sycophantic and agree with you all the time, even if it means making shit up.

Those are issues that plague conversational UIs and long context windows. "AI" summaries answer a single query and the context is volatile.

> And what happens when people stop creating new websites because they aren't getting any visitors (and by extension ad-revenue)? New info will stop being disseminated.

That's baseless fearmongering and speculation. Websites might be impacted by this feature, but they will cope, and we'll find ways to avoid the doomsday scenario you're envisioning.

Some search engines like Kagi already provide references under their "AI" summaries. If Google is pressured to do so, they will likely do the same as well.

So the web will survive this specific feature. Website authors should be more preoccupied with providing better content than with search engines stealing their traffic. I do think that "AI" is a net negative for the world in general, but that's a separate discussion.

[1]: https://ahrefs.com/blog/featured-snippets-study/


Sorry, I didn't mean to discount your argument. I don't think SERPs are a valid comparison; SERPs vs. AI is for me an apples vs. oranges comparison, or rather rocks vs. turtles :)

BTW, your linked article/study doesn't support your argument - SERPs are definitely stealing clicks (just not nearly as many as AI):

> In other words, it looks like the featured snippet is stealing clicks from the #1 ranking result.

I should maybe clarify: I have been using LLMs since the day they arrived on the scene and I have a love/hate relationship with them. I do use summaries sometimes, but I generally still prefer to at least skim TFA unless it's something where I don't care about perfect accuracy. BTW, did you click on that Imgur link? It's pretty damning - the AI summary you get depends entirely on how you phrase your query!

> Yes, it might not be 100% accurate, but based on my own experience, it is reliable for the vast majority of use cases. Certainly beats hunting for what I need in an arbitrarily ordered list and visiting hostile web sites.

What does "vast majority" mean? 9 out of 10? Did/do you double-check the accuracy regularly? Or did you stop verifying after reaching the consensus that X/Y were accurate enough? I can imagine as a tech-savvy individual, that you still verify from time to time and remain skeptical but think of 99% of the users who don't care/won't bother - who just assume AI summaries are fact. That's where the crux of my issue lies: they are selling AI output as fact, when in fact, it's query-dependent, which is just insane. This will (or surely has) cost plenty of people dearly. Sure, reading a summary of the daily news is probably not gonna hurt anyone, but I can imagine people have/will get into trouble believing a summary for some queries e.g. renter rights - which I did recently (combination summaries + paid LLMs), and almost believed it until I double-checked with a friend who works in this area who then pointed out a few minor but critical mistakes, which then saved my ass from signing some bad paperwork. I'm pretty sure AI summaries are still just inaccurate, non-deterministic LLMs with some special sauce to make them slightly less sketchy.

> Those are issues that plague conversational UIs, and long context windows. "AI" summaries answer a single query and the context is volatile.

Just open that imgur link. Or try it for yourself. Or maybe you are just good at prompting/querying and get better results.

> So the web will survive this specific feature. Website authors should be more preoccupied with providing better content than with search engines stealing their traffic.

I agree the web will survive in some form or other, but as my Register link shows (with MANY linked studies), it already IS killing web traffic to a great degree because 99% of users believe the summaries. I really hope you are right, and the web is able to weather this onslaught.


Just to add fuel to the fire: AI output is non-deterministic even with the same prompt, so users searching for the same thing may get different results. The output is not just query-dependent.
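
As a quick illustration (my own sketch, not something from the thread; the openai package usage and the model name are assumptions), re-running an identical prompt a few times at the default temperature is enough to see the answers drift between runs:

    # Hedged sketch: same prompt, multiple runs, non-deterministic output.
    # Assumes the official openai Python package and an OPENAI_API_KEY in the
    # environment; "gpt-4o-mini" is only an illustrative model name.
    from openai import OpenAI

    client = OpenAI()
    prompt = "Summarize in two sentences: why is web traffic to publishers declining?"

    for i in range(3):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            # default temperature (> 0) means sampled tokens, so runs differ
        )
        print(f"--- run {i + 1} ---")
        print(resp.choices[0].message.content)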

> What does "vast majority" mean? 9 out of 10? Did/do you double-check the accuracy regularly? Or did you stop verifying after reaching the consensus that X/Y were accurate enough?

I don't verify the accuracy regularly, no. And I do concede that I may be misled by the results.

But then again, this was also possible before "AI". You can find arguments on the web supporting literally any viewpoint you can imagine. The responsibility of discerning fact from fiction remains with the user, as it always has.

> Just open that imgur link. Or try it for yourself. Or maybe you are just good at prompting/querying and get better results.

I'm not any better at it than any proficient search engine user.

The issue I see with that Imgur link is that those are not search queries. They are presented as claims, and the "AI" will pull from sources that back up those claims. You would see the same claims made by web sites listed in the results. In fact, I see that there's a link next to each paragraph which will likely lead you to the source website. (The source website might also be "AI" slop, but that's a separate matter...) So Google is already doing what you mentioned as a good idea above.

All the "AI" is doing there is summarizing content you would find without it as well. That's not proof of hallucinations, sycophancy, or anything else you mentioned. What it does is simplify the user experience, like I said. These tools still suffer from these and other issues, but this particular use case is not proof of it.

So instead of phrasing a query as a claim ("NFL viewership is up"), I would phrase it using keywords ("NFL viewership statistics 2025"). Then I would see the summarized statistics presented by "AI", drill down and go to the source, and make up my mind on which source to trust. What I wouldn't do is blindly trust results from my biased claim, whether they're presented by "AI" or any website.

> it already IS killing web traffic to a great degree because 99% of users believe the summaries. I really hope you are right, and the web is able to weather this onslaught.

I don't disagree that this feature can impact website traffic. But I'm saying that "killing" is hyperbole. The web is already a cesspool of disinformation, spam, and scams. "AI" will make this even worse by enabling website authors to generate even more of it. But I'm not concerned at all about a feature that right now makes extracting data from the web a little bit more usable and safer. I'm sure that this feature will eventually also be enshittified by ads, but right now, I'd say users gain more from it than what they lose.

E.g. if my grandma can get the information she needs from Google instead of visiting a site that will infect her computer with spyware and expose her to scams, then that's a good thing, even if that information is generated by a tool that can be wrong. I can explain this to her, but can't easily protect her from disinformation, nor from any other active threat on the modern web.


A summary is supposed to give you a taste of what the link destination talks about. If most of the page's information can fit in one paragraph of summarization, the problem is with the webpage, and visiting it would have been a waste of the user's time.

> Why visit a website if the AI summary is good?

That is a big if.

A summary cannot be better than what it summarizes in any way but brevity. It can be much worse.


Let me remind you of recipe websites as an example of how summaries can be better: they ignore all of the useless crap that has nothing to do with making the dish.

Agreed, but only assuming the summary is accurate and not hallucinating, which the current state of LLMs sadly cannot guarantee. Maybe next year?

Noise reduction has tons of value in many fields.

I find that if I describe an esoteric bug to a high powered LLM, I often get to my answer more quickly than if I trawl through endless search results. The synthesis itself is a valuable addition.

Frequently I cannot even find source documents which match my exact circumstances; I’m uncertain whether they actually exist.


Agreed; the other half is that most websites now are just AI-generated slop that makes you wonder why you even bothered to look at the actual website instead of the LLM.

Tell that to German speakers, where the verb comes last and the order of things in sentences is nothing like in English, therefore requiring you to think of the entire sentence before you just spit it out. Even the numbers are backwards (twenty-two is two-and-twenty), which requires thinking.

Furthermore, when you ask an LLM to count how many r's are in the word strawberry, it will give you a random answer, "think" about it, and give you another random answer. And I guarantee you that out of 3 attempts, including reasoning, it will flip-flop between right and wrong, but unlike a human, it will be random. A human, when asked "how many r's are in the word strawberry", will be able to tell you the correct answer every. fucking. time.

edit: formatting


It seems models are pre-planning though:

> How does Claude write rhyming poetry? Consider this ditty:

> He saw a carrot and had to grab it,

> His hunger was like a starving rabbit

> To write the second line, the model had to satisfy two constraints at the same time: the need to rhyme (with "grab it"), and the need to make sense (why did he grab the carrot?). Our guess was that Claude was writing word-by-word without much forethought until the end of the line, where it would make sure to pick a word that rhymes. We therefore expected to see a circuit with parallel paths, one for ensuring the final word made sense, and one for ensuring it rhymes.

> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

[https://www.anthropic.com/research/tracing-thoughts-language...]


The part about strawberry is just not right. That problem was solved. And I do think it's a stretch to say German speakers think of the entire sentence before speaking it.

LLMs were not designed to count letters[0] since they work with tokens, so whatever trick they are now doing behind the scenes to handle this case can probably only handle this particular case. I wonder if it's now included in the system prompt. I asked ChatGPT and it said it's now using len(str) and some other Python scripts to do the counting, but who knows what's actually happening behind the scenes.

[0] https://arxiv.org/pdf/2502.16705
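
For anyone curious, here's a rough sketch of the tokenization issue (my own illustration, not from the paper above); it assumes the tiktoken package is installed, and the encoding name is just an assumption about what recent OpenAI models use. The model sees "strawberry" as a couple of multi-character chunks rather than as letters, while the trivial Python count a tool call would run is right every time:

    # Hedged sketch, assuming tiktoken is installed; "o200k_base" is an
    # assumption about the encoding recent OpenAI models use.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")
    word = "strawberry"

    token_ids = enc.encode(word)
    chunks = [enc.decode([t]) for t in token_ids]
    print(chunks)           # a few multi-letter chunks, not individual letters

    # The trivial count a tool call ("len(str) and some python scripts")
    # would fall back to, which is correct every time:
    print(word.count("r"))  # 3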


Just ask it how many r's are in strawberry and you will realize there isn't a lot of reasoning going on here; it's just trickery on top of token generators.

This is akin to "Show a human an optical illusion that exploits their physiology".

LLMs be like "The dumb humans can't even see the dots"[1]

[1]https://compote.slate.com/images/bdbaa19e-2c8f-435e-95ca-a93...


Haha, that's a great analogy!

How about non-determinism (i.e. hallucinations)? Ask a human ANY question 3 times and they will give you the same answer, every time, unless you prod them or rephrase the question. Sure, the answer might be wrong 3 times, but at least you have consistency. Then again, maybe that's a disadvantage for humans!


Ask an illiterate person the same thing and they will fail badly too. Is it impossible to have intelligence without literacy? (Bonus: if so, how was writing invented?)

Yes, but an illiterate person can be taught to read. Also, LLMs generally fail (non-deterministically) at math, but humans can be taught math.

Furthermore, regarding reasoning: just ask any LLM how many "r letters are in strawberry" - repeat maybe 3 times just to get a feeling for how much variance in answers you can get. This "quirk" of being unable to get the right answer is, after 2 years of people making fun of LLMs on various forums, still an issue. The models aren't getting smarter, and definitely aren't thinking; they are still token generators with a few tricks on top to make them seem more intelligent than their predecessors.

Sorry to say, Claude Sonnet 4.5 (with and without extended thinking) has this one solved now. So, not “any” LLM, but maybe some still need work.

I presume they finally got around to training the internal reasoning loop to predict when it needs to work out the answer character by character.

--- Transcripts from Claude Sonnet 4.5, initially with extended thinking:

> How many a’s in the phrase “artichokes and aardvarks”? Be sure to check your answer.

> Let me count the letter ‘a’ in “artichokes and aardvarks”: • artichokes: 1 ‘a’ (at the beginning) • and: 1 ‘a’ (at the beginning) • aardvarks: 3 ‘a’s (two at the beginning: “aa”, and one in the middle: “ardvarks”) Total: 1 + 1 + 3 = 5 ‘a’s Let me verify by marking them: artichokes and aardvarks Yes, there are 5 letter ‘a’s in the phrase “artichokes and aardvarks”.

> Are you sure that’s correct?

> Let me recount more carefully: “artichokes and aardvarks” Going through each word letter by letter: 1. artichokes - has 1 ‘a’ (first letter) 2. and - has 1 ‘a’ (first letter) 3. aardvarks - has 3 ‘a’s (first letter, second letter, and one more in the middle) Total: 1 + 1 + 3 = 5 ‘a’s Yes, I’m confident the answer is 5. Is there something specific you’d like me to recheck?

> How many r’s in strawberry?

> Let me count the letter ‘r’ in “strawberry”: s-t-r-a-w-b-e-r-r-y Going through each letter: • s: not an r • t: not an r • r: 1st r • a: not an r • w: not an r • b: not an r • e: not an r • r: 2nd r • r: 3rd r • y: not an r There are 3 r’s in “strawberry”.

<Without extended thinking> > How many u’s in universalulis?

> Looking at “universalulis” letter by letter: u-n-i-v-e-r-s-a-l-u-l-i-s There are 2 u’s in “universalulis”.


For what it's worth, when I ask ChatGPT 5, it gets the correct answer every time. The response varies, but the answer is always three.

Now try a different language. My take is that it's hard RL tuning to fix these "gotchas", since the underlying model can't do it on its own.

OpenAI is working on ChatGPT the application and ecosystem. They have transitioned from model building to software engineering with RL tuning and integration of various services to solve the problems the model can't do on its own. Make it feel smart rather than be smart.

This means that as soon as you find a problem where you step out of the guided experience you get the raw model again which fails when encountering these "gotchas".

Edit: Here's an example where we see a heavily RL-tuned experience in English, where a whole load of context is added on how to solve the problem, while the Swedish prompt for the same word fails.

https://imgur.com/a/SlD84Ih


You can tell it "be careful about the tokenizer issues" in Swedish and see how that changes the behavior.

The only thing that this stupid test demonstrates is that LLM metacognitive skills are still lacking. Which shouldn't be a surprise to anyone. The only surprising thing is that they have metacognitive skills, despite the base model training doing very little to encourage their development.


LLMs were not designed to count letters[0] since they work with tokens, so whatever trick they are now doing behind the scenes to handle this case can probably only handle this particular case. I wonder if it's now included in the system prompt. I asked ChatGPT and it said it's now using len(str) and some other Python scripts to do the counting, but who knows what's actually happening behind the scenes.

[0] https://arxiv.org/pdf/2502.16705


There's no "trick behind the scenes" there. You can actually see the entire trick being performed right in front of you. You're just not paying attention.

That trick? The LLM has succeeded by spelling the entire word out letter by letter first.

It's much easier for an LLM to perform "tokenized word -> letters -> letter counts" than it is to perform "tokenized word -> letter counts" in one pass. But it doesn't know that! It copies human behavior from human text, and humans never had to deal with tokenizer issues in text!

You can either teach the LLM that explicitly, or just do RLVR on diverse tasks and hope it learns the tricks like this by itself.
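
You can see the difference yourself with a quick experiment (my own hedged sketch, not something the parent comment ran; it assumes the openai package and an API key, and the model name is illustrative only): compare a direct question with one that forces the spell-out step first.

    # Hedged sketch: direct question vs. forcing the "spell out, then count"
    # decomposition described above. Assumes the openai package and an
    # OPENAI_API_KEY; "gpt-4o-mini" is only an illustrative model name.
    from openai import OpenAI

    client = OpenAI()

    direct = "How many r's are in 'strawberry'? Answer with just the number."
    decomposed = ("Spell 'strawberry' one letter per line, then count how many "
                  "of those lines are 'r'. Answer with just the number.")

    for prompt in (direct, decomposed):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        print(prompt)
        print(resp.choices[0].message.content)
        print()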


Re #2: Do people actually call it thinking, or is it just clever marketing from AI companies that, whenever you ask a question, repeatedly print out "...thinking..." and offer various modes with the word "thinking" in the name?

The AI companies obviously want the masses to just assume these are intelligent beings who think like humans and so we can just trust their output as being truthful.

I have an intelligent IT colleague who doesn't follow AI news at all and who has zero knowledge of LLMs, other than that our company recently allowed us limited Copilot usage (with guidelines as to what data we are allowed to share). I noticed a couple of weeks ago that he was asking it various mathematical questions, and I warned him to be wary of the output. He asked why, so I asked him to ask Copilot/ChatGPT "how many r letters are in the word strawberry". Copilot initially said 2, then after thinking about it said that actually it was definitely 3, then thought about it some more and said it can't say with reasonable certainty, but it would assume it must be 2. We repeated the experiment with completely different results, but the answer was still wrong. On the 3rd attempt, it got it right, though the "thinking" stages were most definitely bogus. Considering how often this question comes up in various online forums, I would have assumed LLMs would finally get this right, but alas, here we are. I really hope the lesson instilled some level of skepticism about trusting the output of AI without first double-checking.

