
>As a Canadian

Didn't take long to discover that your "as a Canadian" is actually "as a hyper-partisan Alberta separatist that thinks oil is all that matters".

Kind of gave up the game when you said this nonsense: "we've been putting up tariffs on trading partners' goods".

"It's funny, we've been trying to forge closer ties with the EU and China but again"

You really, really have no idea what you're talking about.


>Doesn’t this action from Canada explain Trump rhetoric though?

Trump already tariffed cars and car parts from Canada under the guise of national security, in complete and utter defiance of USMCA (you know, the best trade agreement ever as described by the guy who signed it: Donald Trump). Canada should return the favour. Canada had tried to play nice with the "American" automakers, but if they screw Canada to pander to the rapist, they lose that benefit.


They show it going through a machine that seems to at best maybe vacuum off dust and maybe get a part of the ball with UV? Am I missing something? Seems more like the illusion of cleaning. The description of Chuck E. Cheese running them through an industrial dishwasher seems more like actual cleaning.

In the description it calls it a "deep clean", which seems suspect.


>They show it going through a machine that seems to at best maybe vacuum off dust and maybe get a part of the ball with UV? Am I missing something?

It gets washed with soap and water inside the machine; you can tell because in the video you can see soap suds shortly after it shows the UV step.


They're foamy in the middle after ingestion and before being sucked into the tube, it seems.


Isn't it more likely the submitter chose the title? HN doesn't even auto-recommend a title for submitted content, and instead it's up to the submitter. In rare cases after the fact a mod like dang changes the title to remove editorialization.

So I'm unsure what this whole thread of people complaining about HN supposedly mangling titles is about.


HN does remove some prefixes from titles automatically


I actually came across this video independently a couple of days ago, and having never come across this gentleman before, it was enough to convince me that his analysis is of negligible value, of the "beg the conclusion" variety.

To wit, in his review he-

-dismisses environmental concerns with Li

-dismisses safety concerns with Li

-dismisses geopolitical concerns with Li availability. Something something "environmentalists!" (shakes fist at clouds), like with the environmental concerns.

-dismisses economic advances of Na

And then the overwhelming focus of his review is that if you deep freeze the battery, it charges slowly. This becomes the foundation of his criticism. Only, firstly, it's a self-solving issue -- the battery warms as it charges -- and in most situations the battery will be in a heated (or self-heating) environment and at an ideal temperature anyway.

I'm no Na booster, and it seems like an incremental improvement in various dimensions for certain scenarios, but that video adds extraordinarily little value to the space.


I disagree with your conclusion about the video even as I think Na is an incremental improvement. I think the video hits solidly on why the Bluetti Na Pro product is not yet a good overall product. I still think it's promising (and I think the reviewer does too). I can see why you think he's dismissing environmental/safety/geopolitical concerns. But I don't think so; I think he's simply taking the perspective of what's the best product for someone who needs to live with this as a primary power pack that they use. Obviously, someone could weigh the concerns that you mention higher than the functionality of the power pack. But reviewing it in the context of performance doesn't equate to a dismissal of those, IMO.


Complex documents are where OCR struggles mightily. If you have a simple document with paragraphs of text, sure, OCR is pretty much solved. If you have a complex layout with figures and graphs and supporting images and asides and captions and so on (basically any paper, or even trade documents), it absolutely falls apart.

And GP LLMs are heinous at OCR. If you are having success with FL, your documents must be incredibly simple.

There have been enormous advances in OCR over the past six months, so the SotA is a moving, rapidly advancing target.


>And during most of that time

Most of what time? The patent just expired a few months ago. Generics are now ramping up because the patent expired, not because of some hypothetical eight-year clinical trials. And for that matter, generics of existing ingredients and dosages do not have to repeat clinical trials anyway, which is one of the big reasons generics are a lot less expensive.

This hypothesis seems retconned and completely at odds with the actual facts. Not least that there has been absolutely nothing exceptional about the pricing of their product in Canada, which has always been among the cheapest worldwide.


This article is saying that the patent lapsed in 2018 because they didn’t pay the $250 to renew the patent. Instead they relied on “data exclusivity” which means their trial data is exclusively theirs and anyone who wants to sell in Canada must first run safety trials of their own at a huge expense. It’s just as good as a patent but has a shorter window of exclusivity.


> Instead they relied on “data exclusivity” which means their trial data is exclusively theirs and anyone who wants to sell in Canada must first run safety trials of their own at a huge expense.

Sounds like potentially a good thing?


Yann LeCun's "Hoisted by their own GPTards" is fantastic.


While Yann is clearly brilliant, and has a deeper understanding of the roots of the field than many of us mortals, I think he's been on a Debbie Downer trend lately, and more importantly, some of his public stances have been proven wrong mere months or years after he made them.

I remember a public talk, where he was on the stage with some young researcher from MS. (I think it was one of the authors of the "sparks of brilliance in gpt4" paper, but not sure).

Anyway, throughout that talk he kept talking over the guy and didn't seem to listen, even though he obviously hadn't tried the "raw", "unaligned" model that the folks at MS were talking about.

And he made 2 big claims:

1) LLMs can't do math. He went on to "argue" that LLMs trick you with poetry that sounds good, but is highly subjective, and when tested on hard verifiable problems like math, they fail.

2) LLMs can't plan.

Well, merely one year later, here we are. AIME is saturated (with tool use), gold at IMO, and current agentic systems clearly can plan (and follow up with the plan, rewrite parts, finish tasks, etc. etc.).

So, yeah, I'd take everything any one singular person says with a huge grain of salt. No matter how brilliant said individual is.

Edit: oh, and I forgot another important argument that Yann made at that time:

3) Because of the nature of LLMs, errors compound. So the longer you go in a session, the more errors accumulate, until they devolve into nonsense.

Again, mere months later the o series of models came out, and basically proved this point moot. Turns out RL + long context mitigate this fairly well. And a year later, we have all SotA models being able to "solve" problems 100k+ tokens deep.


> LLMs can't do math. He went on to "argue" that LLMs trick you with poetry that sounds good, but is highly subjective, and when tested on hard verifiable problems like math, they fail.

They really can’t. Token prediction based on context does not reason. You can scramble to submit PRs to ChatGPT to keep up with the “how many Rs in blueberry” kind of problems but it’s clear they can’t even keep up with shitposters on reddit.

And your 2nd and third point about planning and compounding errors remain challenges.. probably unsolvable with LLM approaches.


> They really can’t. Token prediction based on context does not reason.

Debating about "reasoning" or not is not fruitful, IMO. It's an endless debate that can go anywhere and nowhere in particular. I try to look at results:

https://arxiv.org/pdf/2508.15260

Abstract:

> Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking.
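
For what it's worth, the core trick is simple enough to sketch. This is just my rough reading of the abstract, not the paper's actual algorithm: score each sampled reasoning trace by the model's own average token log-probability, drop the low-confidence traces, and majority-vote over the answers that survive. Here sample_trace is a hypothetical stand-in for whatever serving stack you use:

    from collections import Counter

    def filtered_majority_vote(sample_trace, prompt, n=64, keep_frac=0.5):
        # sample_trace(prompt) is a hypothetical stand-in: it should return
        # (answer, mean_token_logprob) for one independently sampled trace.
        traces = [sample_trace(prompt) for _ in range(n)]
        # Keep only the most "confident" traces (highest mean token log-prob).
        traces.sort(key=lambda t: t[1], reverse=True)
        kept = traces[: max(1, int(n * keep_frac))]
        # Majority vote over the surviving answers.
        votes = Counter(answer for answer, _ in kept)
        return votes.most_common(1)[0][0]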


> Debating about "reasoning" or not is not fruitful, IMO.

That's kind of the whole need, isn't it? Humans can automate simple tasks very effectively and cheaply already. If I ask the pro version of an LLM what the Unicode value of a seahorse is, and it shows a picture of a horse and gives me the Unicode value for a third, completely unrelated animal, then it's pretty clear it can't reason itself out of a wet paper bag.


Sorry perhaps I worded that poorly. I meant debating about if context stuffing is or isn't "reasoning". At the end of the day, whatever RL + long context does to LLMs seems to provide good results. Reasoning or not :)


Well, that's my point, and what I think the engineers are screaming at the top of their lungs these days: that it's net negative. It makes a really good demo but hasn't won anything except maybe translation and simple graphics generation.


> You can scramble to submit PRs to ChatGPT to keep up with the “how many Rs in blueberry” kind of problems but it’s clear they can’t even keep up with shitposters on reddit.

Nobody does that. You can't "submit PRs" to an LLM. Although if you pick up new pretraining data you do get people discussing all newly discovered problems, which is a bit of a neat circularity.

> And your 2nd and third point about planning and compounding errors remain challenges.. probably unsolvable with LLM approaches.

Unsolvable in the first place. "Planning" is GOFAI metaphor-based development where they decided humans must do "planning" on no evidence and therefore if they coded something and called it "planning" it would give them intelligence.

Humans don't do or need to do "planning". Much like they don't have or need to have "world models", the other GOFAI obsession.


> LLMs can't do math.

Ignoring conversations about 'reasoning', at a fundamental level LLMs do not 'do math' in the way that a calculator or a human does math. Sure, we can train bigger and bigger models that give you the impression of this, but there are proofs out there that with increased task complexity (in this case multi-digit multiplication) the probability of an incorrect prediction eventually converges to 1 (https://arxiv.org/abs/2305.18654).

> And your 2nd and third point about planning and compounding errors remain challenges.. probably unsolvable with LLM approaches.

The same issue applies here, really with any complex multi-step problem.

> Again, mere months later the o series of models came out, and basically proved this point moot. Turns out RL + long context mitigate this fairly well. And a year later, we have all SotA models being able to "solve" problems 100k+ tokens deep.

If you go hands-on in any decent-size codebase with an agent, session length and context size become noticeable issues. Again, mathematically, error propagation eventually leads to a 100% chance of error. Yann isn't wrong here; we've just kicked the can a little further down the road. What happens at 200k+ tokens? 500k+ tokens? 1M tokens? The underlying issue of a stochastic system isn't addressed.
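
To put rough numbers on that compounding argument (a back-of-envelope simplification that assumes each step succeeds independently with probability p, so a chain of n steps is fully correct with probability p^n):

    # Toy illustration: an n-step chain is fully correct with probability p**n
    # if each step independently succeeds with probability p.
    for p in (0.999, 0.99, 0.95):
        for n in (100, 1_000, 10_000):
            print(f"p={p}, n={n:>6}: {p ** n:.2e}")
    # Even at 99.9% per-step accuracy, a 10,000-step chain is fully
    # correct only about 0.005% of the time.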

>While Yann is clearly brilliant, and has a deeper understanding of the roots of the field than many of us mortals, I think he's been on a Debbie Downer trend lately

As he should be. Nothing he said was wrong at a fundamental level. The transformer architecture we have now cannot scale with task complexity. Which is fine; by nature it was not designed for such tasks. The problem is that people see these models work on a subset of small-scope complex projects and make claims that go against the underlying architecture. If a model is 'solving' complex or planning tasks but then fails to do similar tasks at a higher complexity, it's a sign that there is no underlying deterministic process. What is more likely: that the model is genuinely 'planning' or 'solving' complex tasks, or that the model has been trained with enough planning and task-related examples that it can make a high-probability guess?

> So, yeah, I'd take everything any one singular person says with a huge grain of salt. No matter how brilliant said individual is.

If anything, a guy like Yann, with a role such as his at a Mag7 company, being realistic (bearish if you are an LLM evangelist) about what the transformer architecture can do is a relief. I'm more inclined to listen to him than to a guy like Altman, who touts LLMs as the future of humanity while his path to profitability is AI TikTok, sex chatbots, and a third-party way to purchase things from Walmart during a recession.


> AIME is saturated (with tool use) [...]

But isn't tool use kinda the crux here?

Correct me if I'm mistaken, but wasn't the argument back then about whether LLMs could solve maths problems without e.g. writing Python to solve them? Because when "Sparks of AGI" came out in March, prompting gpt-3.5-turbo to code solutions to assist in solving maths problems rather than just solving them directly was already established and seemed like the path forward. Heck, it is still the way to go, despite major advancements.

Given that, was he truly mistaken on his assertions regarding LLMs solving maths? Same for "planning".


AIME was saturated with tool use (i.e. 99%) for SotA models, but pure-NL, no-tool runs still perform "unreasonably well" on the task. Not 100%, but still around 90%. And with lots of compute they can reach 99% as well, apparently [1] (@512 rollouts, but still).

[1] - https://arxiv.org/pdf/2508.15260


Pretty sure you can fill a room with serious researchers who will, at the very least, doubt that 2) is solved with LLMs, especially when talking about formal planning with pure LLMs and without a planning framework.

PS: Just so we're clear: formal planning in AI ≠ making a coding plan in Cursor.


> with pure LLMs and without a planning framework.

Sure, but isn't that moving the goalposts? Why shouldn't we use LLMs + tools if it works? If anything it shows that the early detractors weren't even considering this could work. Yann in particular was skeptical that long-context things can happen in LLMs at all. We now have "agents" that can work a problem for hours, with self context trimming, planning to md files, editing those plans and so on. All of this just works, today. We used to dream about it a year ago.


> Why shouldn't we use

So weird that you immediately move the goalposts after accusing somebody of moving the goalposts. Nobody on the planet told you not to use "LLMs + tools if they work." You've moved on to an entirely different discussion with a made-up person.

> All of this just works, today.

Also, it definitely doesn't "just work." It slops around, screws up, reinserts bugs, randomly removes features, ignores instructions, lies, and sometimes you get a lucky result or something close enough that you can fix up. Nothing that should be in production.

Not that they're not very cool and very helpful in a lot of ways. But I've found them more helpful in showing me how they would do something, and getting me so angry that they nerd-snipe me into doing it correctly. I have to admit, however, that 1) sometimes I'm not sure that I'd have gotten there if I hadn't seen it not getting there, and 2) sometimes "doing it correctly" involves dumping the context and telling it almost exactly how I want something implemented.


> Sure, but isn't that moving the goalposts?

It can be considered that, sure, but any time I see LeCun talking about this, he does recognize that you can patch your way around LLMs; the point is that you are going to hit limits eventually anyway. Specific planning benchmarks like Blocksworld and the like show that LLMs (with frameworks) hit limits when they're exposed to out-of-distribution problems, and that's a BIG problem.

> We now have "agents" that can work a problem for hours, with self context trimming, planning to md files, editing those plans and so on. All of this just works, today. We used to dream about it a year ago.

I use them every day, but I still wouldn't really let them work for hours on greenfield projects. And we're seeing big vibe coders like Karpathy say the same.


> Sure, but isn't that moving the goalposts? Why shouldn't we use LLMs + tools if it works?

Personally I do not see it like that at all, as one is referring to LLMs specifically while the other is referring to LLMs plus a bunch of other stuff around them.

It is like person A claiming that GIF files can be used to play Doom deathmatches, person B responding that, no, a GIF file cannot start a Doom deathmatch, it is fundamentally impossible to do so and person A retorting that since the GIF format has a provision for advancing a frame on user input, a GIF viewer can interpret that input as the user wanting to launch Doom in deathmatch mode - ergo, GIF files can be used to play Doom deathmatches.


At the end of the day LLM + tools is asking the LLM to create a story with very specific points where "tool calls" are parts of the story, and "tool results" are like characters that provide context. The fact that they can output stories like that, with enough accuracy to make it worthwhile is, IMO, proof that they can "do" whatever we say they can do. They can "do" math by creating a story where a character takes NL and invokes a calculator, and another character provides the actual computation. Cool. It's still the LLM driving the interaction. It's still the LLM creating the story.


I think you have that last part backwards: it is not the LLM driving the interaction, it is the program that uses the LLM to generate the instructions that does the actual driving - that is the bit that makes the LLM start doing things. Though that is just splitting hairs.

The original point was about the capabilities of LLMs themselves, since the context was the technology itself, not what you can do by making them part of a larger system that combines LLMs (perhaps more than one) with other tools.

Depending on the use case and context this distinction may or may not matter, e.g. if you are trying to sell the entire system, it probably is not any more important how the individual parts of the system work than what libraries you used to make the software.

However it can be important in other contexts, like evaluating the abilities of LLMs themselves.

For example, I have written a script on my PC that my window manager calls to grab whatever text I have selected in whatever application I'm running, and passes it to a program I've written with llama.cpp that loads Mistral Small with a prompt that makes it check for spelling and grammar mistakes, which in turn produces some script-readable output that another script displays in a window.

This, in a way, is an entire system. This system helps me find grammar and spelling mistakes in the text I have selected when I'm writing documents where I care about finding such mistakes. However, it is not Mistral Small that has the functionality of finding grammar and spelling mistakes in my selected text; it only provides the part that does the text checking, and the rest is done by other, external, non-LLM pieces. An LLM cannot intercept keystrokes on my computer, it cannot grab my selected text, nor can it create a window on my desktop; it doesn't even understand these concepts. In a way this can be thought of as a limitation from the perspective of the end result I want, but I work around it with the other software I have attached to it.
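
For the curious, the glue layer is only a few lines. This is a hypothetical sketch rather than my actual scripts: it assumes an X11 desktop with xclip for the selection, the stock llama-cli binary from llama.cpp for inference (my real setup is a custom llama.cpp program), zenity for the popup window, and an illustrative model filename:

    import subprocess

    PROMPT = ("Check the following text for spelling and grammar "
              "mistakes and list each one on its own line:\n\n")

    def grab_selection() -> str:
        # Read the current X11 primary selection (whatever text is highlighted).
        out = subprocess.run(["xclip", "-o", "-selection", "primary"],
                             capture_output=True, text=True, check=True)
        return out.stdout

    def check_text(text: str) -> str:
        # Run a local model via llama.cpp's CLI; the model path is illustrative.
        out = subprocess.run(
            ["llama-cli", "-m", "mistral-small.gguf", "--temp", "0",
             "-n", "512", "-p", PROMPT + text],
            capture_output=True, text=True, check=True)
        return out.stdout

    def show_window(report: str) -> None:
        # Pop up the result; a window manager keybinding invokes this script.
        subprocess.run(["zenity", "--info", "--title", "Grammar check",
                        "--text", report], check=True)

    if __name__ == "__main__":
        show_window(check_text(grab_selection()))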


I might be missing context here, but I'm surprised to see Yann using language that plays on 'retard.'

That seems out of character for him - more like something I'd expect from Elon Musk. What's the context I'm missing?


I don't think it's wordplay on the r-word, but rather a reference to the famous Shakespeare quote: "Hoist with his own petard". It's become an English idiom. (A petard is a smallish bomb.)


From péter, to fart.

Possibly entered the language as a saying due to Shakespeare being scurrilous.


It's a play on the word petard


I found this background useful as a non-native speaker: https://en.wikipedia.org/wiki/Hoist_with_his_own_petard


Hoist (thrown in the air) by your own petard (bomb) is a common phrase.


You have been Hoisted with your own retard


>I don't think I've ever seen anyone say they're not useful.

https://news.ycombinator.com/item?id=45577203

There are thousands and thousands of comments just like this on this site. I would dare say tens of thousands. They regularly appear in any AI-related discussion.

I've been involved in many threads on here where devs with Very Important Work announce that none of the AI tools are useful for them or for anyone with Real Problems, and at best they work for copy/paste junior devs who don't know what they're doing and are doing trivial work. This is right after they declare that anyone who isn't building a giant monolithic PHP app just like them is a trend-chaser who is "cargo culting, like some tribe or something".

>I also think they're over-hyped and that the current frenzy will end badly (global economically speaking)

In a world where Tesla is a trillion-dollar company based upon vapourware, and the president of the largest economy (for now) is launching shitcoins and taking bribes through crypto, and every Western country saw a massive real-estate ramp-up from unmetered mass migration, and Bitcoin is a $2T "currency" that has literally zero real-world use beyond betting on itself, and sites like Polymarket exist for insiders to scam foolish rube outsiders out of their money, and... Dude, the AI bubble doesn't even remotely measure up.


The cargo cult metaphor is weak. If an article written in the year of our FSM 2025 describes Melanesian cargo cults to make a point, its author is probably just copying a trope from other articles. Cargo culting, if you will, much like the Melanesian cargo cults that would wear bamboo earpieces and...

Is it a gold rush? Absolutely. There is massive FOMO and everyone is rushing to claim some land, while the biggest profiteers of all are the ones selling the shovels and pickaxes. It's all going to wash out, and in the end a very small number of players will be making money while everyone else goes bust.

While many people think the broadly described AI is overhyped, I think people are grossly underestimating how much this changes almost everything. Very few industries will be untouched.


The author is an anthropologist, I think she knows the original meaning of "cargo cult".

The 'cult' behaviour described in the article is that of building big data centres without knowing how they will make money for the real business of the tech companies doing it. They have all bought AI startups but that doesn't mean that the management of the wider company understands it.


>The author is an anthropologist, I think she knows the original meaning of "cargo cult".

I am perplexed how you thought this refuted or offered any value to what I said. Or are you under the delusion that her being an anthropologist also makes her an expert on AI and the tech industry, ergo ipso facto her metaphor isn't incredibly dumb and ill-suited?

I never questioned if they knew the "original meaning". Yes, we've all read the meaning countless, countless times, in a million trope-filled blog entries. And indeed, the whole basis of her tosser "article" is some random blog entry that, as millions before have, decided to make everything about Cargo Cults.

Protip: If you are busy writing a blog entry and you decide to describe some island tribe (it does not actually matter where said tribe is) that had bamboo headsets, delete the entire thing and go do something actually useful.

It is a profoundly boring story at this point. And in this case, like with many, the metaphor is incredibly stupid and ill-suited. If these businesses were building "data centres" out of mud and drawings of GPUs it would be pertinent, but instead it's describing a gold rush where a lot of players are doing precisely the right thing to try to land-grab an obviously massive and important space (and, in a very useful-to-them sense, see enormous capitalization gains in doing so), and then trying to ham-fist some cliche "story" in.

Like, if that tribe built functional runways with ATC towers, and then a fleet of cargo planes -- being well funded in the process by outsiders who see how lucrative the cargo business is -- but then it turned out that the cargo business is a bit saturated so it's going to be tough for them to make it profitable on their EBITDA statements, boy, fire up the typewriter, you've got a winner!

>The 'cult' behaviour described in the article is that of building big data centres without knowing how they will make money

Utterly nonsensical.


Melanesian, not Micronesian (or Polynesian as you originally said). I know all those Pacific islands look the same, but it's not the same thing at all.


Oh, damn, Melanesian? Well this changes everything! I do remember when Melanesia built those computation centres and it turns out that neighbouring Polynesia went with the newer generation of fabric and upended their business. Truly a great metaphor for so many things!

Firing up notepad and going to author the next paper that does numbers among the Shakes Fists At Clouds crowd that spend their day tilting at windmills.


Yeah, if cargo cult were applied aptly, it would be more for the folks who are all-in on using LLMs yet not really getting any net productivity boost. They're basically just LARPing a dream world, with no tangible benefit compared to the Old Ways.


Yeah, not seeing the connection to cargo cult unless AGI already appeared, offered us an incredible bounty of benefits, and then left, so we all created a religion in order to summon AGI back.

