Checking the validity of a given proof is deterministic, but filling in the proof in the first place is hard.
It's like chess: checking who wins for a given board state is easy, but coming up with the next move is hard.
Of course, one can try all possible moves and see what happens. Similar to chess AI based on search methods (e.g. minimax), there are proof search methods. See the related work section of the paper.
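For the chess side, a minimal sketch of that exhaustive search (the `state` interface here is hypothetical, just to show the shape of it):

```python
# Minimal minimax sketch: exhaustively try every move and pick the best.
# The `state` interface (is_terminal, score, legal_moves, apply) is hypothetical;
# proof search works analogously, with "moves" being applicable inference rules.

def minimax(state, maximizing=True):
    """Return (best_score, best_move) by searching the whole game tree."""
    if state.is_terminal():
        return state.score(), None          # e.g. +1 win, -1 loss, 0 draw
    best = None
    for move in state.legal_moves():
        score, _ = minimax(state.apply(move), not maximizing)
        if best is None or (maximizing and score > best[0]) or (not maximizing and score < best[0]):
            best = (score, move)
    return best
```

The catch, for chess and for proofs alike, is that the tree blows up exponentially, which is why both fields lean on heuristics to prune it.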
> imagine a folder full of skills that covers tasks like the following:
> Where to get US census data from and how to understand its structure
Reminds me of my first time using Wolfram Alpha and getting blown away by its ability to use actual structured tools to solve the problem, compared to a normal search engine.
tbh wolfram alpha was the craziest thing ever. haven't done much research on how this was implemented back in the day but to achieve what they did for such complex mathematical problems without AI was kind of nuts
I doubt that anyone outside the industry or enthusiast circles would notice if the underlying parts changed. How many people know what kind of engine is in their car? I stomp on the floor of my Corolla and away we go! Others might know that their Dodge Challenger has a Hemi. What even is that? Thankfully we have the Internet these days, and someone who's interested can just select the word and right-click to Google for the Wikipedia article on it. AI is just such an entirely undefined term colloquially that any attempt to define it will be wrong.
I think the difference now is that traditional software ultimately comes down to a long series of if/then statements (as do the old AIs like Wolfram), whereas the new AI (mainly LLMs) takes a fundamentally different approach.
Look into something like Prolog (~50 years old) to see how systems can be built from rules rather than if/else statements. It wasn't all imperative programming before LLMs.
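Not Prolog itself, but a toy forward-chaining rule engine in Python (the facts and rules below are made up purely for illustration) gives the flavour of "rules, not if/else":

```python
# Toy forward-chaining rule engine: apply rules to derive new facts until a
# fixed point is reached. The facts and the grandparent rule are made-up
# examples, just to show the declarative style that Prolog pioneered.

facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

def grandparent_rule(facts):
    """If X is a parent of Y and Y is a parent of Z, then X is a grandparent of Z."""
    parents = {f for f in facts if f[0] == "parent"}
    derived = set()
    for (_, x, y1) in parents:
        for (_, y2, z) in parents:
            if y1 == y2:
                derived.add(("grandparent", x, z))
    return derived

rules = [grandparent_rule]

changed = True
while changed:                      # keep applying rules until nothing new appears
    changed = False
    for rule in rules:
        new = rule(facts) - facts
        if new:
            facts |= new
            changed = True

print(facts)  # now includes ("grandparent", "alice", "carol")
```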
If you mean that it all breaks down to if/else at some level then, yeah, but that goes for LLMs too. LLMs aren't the quantum leap people seem to think they are.
Yeah, the result is pretty cool. It's probably how it felt to eat pizza for the first time. People had been grinding grass seeds into flour, mixing with water and putting it on hot stones for millennia. Meanwhile others had been boiling fruits into pulp and figuring out how to make milk curdle in just the right way. Bring all of that together and, boom, you have the most popular food in the world.
We're still at the stage of eating pizza for the first time. It'll take a little while to remember that you can do other things with bread and wheat, or even other foods entirely.
Would really like something self-hosted that does the basic Wolfram Alpha math things.
Doesn't need the craziest math capability but standard symbolic math stuff like expression reduction, differentiation and integration of common equations, plotting, unit wrangling.
All with an easy-to-use text interface that doesn't require much learning.
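Most of that wish list is covered by SymPy; a rough sketch of the core pieces (it won't give you Wolfram Alpha's natural-language input, so some syntax learning is unavoidable):

```python
# SymPy handles most of the symbolic core: simplification, differentiation,
# integration, equation solving, plotting (via matplotlib) and unit conversion.
import sympy as sp
from sympy.physics import units as u

x = sp.symbols("x")

print(sp.simplify((x**2 - 1) / (x - 1)))        # x + 1
print(sp.diff(sp.sin(x) * sp.exp(x), x))        # exp(x)*sin(x) + exp(x)*cos(x)
print(sp.integrate(sp.cos(x)**2, x))            # x/2 + sin(x)*cos(x)/2
print(sp.solve(x**2 - 2, x))                    # [-sqrt(2), sqrt(2)]

# Unit wrangling: convert 90 km/h to m/s
print(u.convert_to(90 * u.kilometer / u.hour, u.meter / u.second))  # 25*meter/second

# sp.plot(sp.sin(x) / x, (x, -10, 10))  # uncomment for a quick plot
```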
TI-89 has surprisingly good symbolics tools and solvers for something that runs all year on a single set of AAA batteries. Feels like magic alien tech.
I used it a lot for calc since it would show you how it got the answer, if I remember right. I also liked that it understands symbols, which is obvious in hindsight, but it was cool to paste an integral sign in there.
> Some Chinese language source claims that it's a reaction to the Pakistan-US rare earth deal.
Maybe they approached India for a deal that was too lopsided in favour of the US for the former to accept, so the US did the show-and-tell cozying up to Pakistan to get a better deal while publicly shitting on India? Just follow the money?
Xi is getting China ready to attack Taiwan in 2026 or 2027, and the now mutual unwinding of economic relations between the US and China is underway. Still frenemies at this point, but Trump is aiming for more enemy status sooner because it causes media drama and draws attention to him. The US will be screwed because domestic production takes years to happen and it has lost most of its machine tool suppliers, knowledge, and workers. Manufacturing productivity is essential for any sort of war, as evidenced by the history of the American Civil War and WW II.
If the US doesn't impeach and remove Trump and Vance, and get a real wartime leader who isn't a celebrity reality star ASAP, it will be doomed: China will rapidly seize Taiwan, disrupt Western chip production and plunge the West into an economic armageddon, and the conflict would likely widen into a war with Japan, which would definitely intervene militarily to defend the economic and technological resources in Taiwan. No more incompetent, self-destructive, corrupt, ideologue chaos can be tolerated.
From the title I thought they solved math! Turns out to be a framework for using SMT solvers as decision procedures in proofs. For additional types, you still need to write the bridging part yourself. Interesting nonetheless.
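For anyone who hasn't touched one: the underlying idea (this is a generic Z3 example, not the paper's framework) is to hand the solver the negation of your claim and ask for a counterexample:

```python
# Generic SMT example using Z3's Python bindings: to "prove" a claim, assert
# its negation and check satisfiability; unsat means no counterexample exists.
from z3 import Ints, Solver, Not, Implies, unsat

x, y = Ints("x y")
claim = Implies(x > 0, x + y > y)   # claim: if x > 0 then x + y > y

s = Solver()
s.add(Not(claim))                   # look for a counterexample
if s.check() == unsat:
    print("proved: no counterexample exists")
else:
    print("counterexample:", s.model())
```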
Nice. When using OpenAI Codex CLI, I find the /compact command very useful for large tasks. In a way it's similar to the context editing tool. Maybe I can ask it to use a dedicated directory to simulate the memory tool.
Claude Code has had this /compact command for a long time; you can even specify your preferences for compaction after the slash command. But it's quite limited, and to get the best results out of your agent you need more than relying on how the tool decides to prune your context. I ask it explicitly to write down the important parts of our conversation into an md file, then I review and iterate on the doc until I'm happy with it. Then I /clear the context and give it instructions to continue based on the md doc.
Duolingo is useful, but not efficient. When people say they want to learn a language, they often mean they want to learn it efficiently, e.g. to be able to write an essay, as the post describes, after a realistic period of time.
I personally don't believe its pedagogical deficiency is mere incompetence. The whole business model is to keep you on the platform as long as possible, so why would they make you learn faster rather than just enough to keep you there?
As a long-time former user, I observed a lot of changes to its mechanics that bear this out.
> 10x productivity means ten times the outcomes, not ten times the lines of code. This means what you used to ship in a quarter you now ship in a week and a half.
This assumes the acceleration applies to all tasks. Amdahl's law says the overall speedup is limited by the fraction of the work that isn't accelerated. Probably it's just unclear whether "engineer" or "productivity" refers to the programming part or to the overall process.
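A quick back-of-the-envelope with Amdahl's law (the 40% split below is a made-up illustration, not a measurement):

```python
# Amdahl's law: overall speedup = 1 / ((1 - p) + p / s), where p is the
# fraction of the work that is accelerated and s is the speedup on that part.
# The 40% figure is a made-up illustration, not a measured number.
def overall_speedup(p, s):
    return 1 / ((1 - p) + p / s)

# If coding is 40% of shipping a feature and becomes 10x faster,
# the whole process only gets ~1.56x faster, not 10x.
print(round(overall_speedup(0.4, 10), 2))   # 1.56
```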