Xcelerate's comments | Hacker News

As a non-mathematician, I’m confused why so many people think the conjecture (whether true or false) is provable within PA. To me, it seems like something that would sit just outside the boundary of PA’s capability, sort of like how proving that all Goodstein sequences terminate requires transfinite induction up to ε_0. Add to that the fact that the Collatz Conjecture seems to fall into the same “category” of problem as the Turing machines whose non-halting behavior the Busy Beaver project is having a hard time proving, and the heuristic arguments all seem to point the same way: Collatz is independent of PA.

But I’m interested in hearing the counterarguments that Collatz likely is provable within PA and why this would be the case.


The Collatz conjecture is "obviously" (i.e. probabilistically) true, so it's frustrating to not be able to turn that into a proof. PA doesn't matter; there's no known approach to proving it with stronger theories either. Of course, many other propositions, like the twin prime conjecture, are in the same situation.

Goodstein's theorem, by contrast, is obviously provable in slightly stronger theories than PA, and it involves a fast-growing sequence, which suggests it's out of weaker theories' reach. In fact it encodes ordinals up to ε_0 in a natural way, so its equivalence to Con(PA) is unsurprising. The Collatz conjecture is nothing like that. It's beguilingly simple by comparison.


> Eliminate redundant matrix operations (like two transposes next to each other)

In 2016, I was trying to construct orthogonal irreducible matrix representations of various groups (“irreps”). The problem was that most of the papers describing how to construct these matrices used a recursive approach that depended on having already constructed the matrix elements of a lower dimensional irrep. Thus the irrep dimension n became quite an annoying parameter, and function calls were very slow because you had to construct the irrep for each new group element from the ground up on every single call.

I ended up using Julia’s @generated functions to dynamically create new versions of the matrix construction code for each distinct value of n and each type of group. So essentially it would generate “unrolled” code on the fly and then use LLVM to compile it a single time, after which all subsequent calls for a specific group and irrep dimension were extremely fast. Was really quite cool. The only downside was that you couldn’t generate very high-dimensional irreps because LLVM would begin to struggle with the sheer volume of code it needed to compile, but for my project at the time that wasn’t much of a concern.
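
For anyone unfamiliar with the trick, here is a minimal sketch of the pattern, not the original code: `build_matrix` and `entry` are hypothetical stand-ins for the real recursive irrep-element formulas, and the only point being illustrated is how the dimension gets lifted into the type domain so the loop over entries is unrolled at compile time.

    # Sketch: the dimension N is carried in the type (Val{N}), so Julia
    # generates and compiles a fully unrolled method body once per N.
    # `entry` is a hypothetical placeholder for the actual matrix-element formula.
    entry(i, j, θ) = cos(i * θ) * sin(j * θ)

    @generated function build_matrix(::Val{N}, θ::Float64) where {N}
        # This body runs once per N and returns an expression; later calls
        # with the same N reuse the compiled specialization.
        assignments = [:(A[$i, $j] = entry($i, $j, θ)) for i in 1:N for j in 1:N]
        quote
            A = Matrix{Float64}(undef, $N, $N)
            $(assignments...)
            return A
        end
    end

    build_matrix(Val(4), 0.3)   # first call compiles the N = 4 version; later calls are fast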


Hmm... I agree with parts of this and disagree with other parts. In my experience, "cross-functional collaboration" splits into two distinct components: leadership and information. This is anecdotal, but when leadership is split across too many people at the same level, each in charge of a domain that requires heavy interaction with the other domains, nothing gets accomplished—analysis paralysis and politics take over. You absolutely need one specific person as the final decision maker. They should carefully consider all input from various sources and then make a final decision in a timely fashion. If it turns out to be the wrong path, that's fine; just reverse course quickly as well.

On the other hand, information silos are absolutely horrible. The most effective companies I've worked at have always had tons of information freely available to all employees. Unless there are privacy, cybersecurity, antitrust, or similar risks involved, every employee should have access to all information across all teams. It should be easily searchable as well. There are certainly exceptions—Apple seems to function well despite all the secrecy. But most companies aren't Apple, and I don't think it's generally a good strategy.


Haha, I like to joke that we were on track for the singularity in 2024, but it stalled because the research time gap between "profitable" and "recursive self-improvement" was just a bit too long, so now we're stranded on the transformer model for the next two decades until every last cent has been extracted from it.


There's a massive build-out of hardware and energy infrastructure going on. None of that is specialized to run only transformers at this point, so wouldn't that create a huge incentive to find newer and better architectures to get the most out of all this hardware and energy infra?


> None of that is specialized to run only transformers at this point

Isn't this what Etched (https://www.etched.com/) is doing?


Only being able to run transformers is a silly concept, because attention consists of two matrix multiplications, which are already the standard operation in feed-forward and convolutional layers. Basically, you get transformers for free.
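
To make that concrete, here's a bare-bones single-head attention sketch in Julia, assuming n × d inputs and no masking or batching; everything reduces to two dense matmuls plus a row-wise softmax:

    # Single-head attention as plain dense linear algebra.
    # Shapes (assumed): Q, K, V are n × d.
    function attention(Q::Matrix{Float64}, K::Matrix{Float64}, V::Matrix{Float64})
        d = size(Q, 2)
        S = (Q * K') ./ sqrt(d)               # matmul #1: similarity scores
        P = exp.(S .- maximum(S, dims=2))     # numerically stable softmax...
        P ./= sum(P, dims=2)                  # ...normalized per row
        return P * V                          # matmul #2: weighted sum of values
    end

    n, d = 8, 16
    attention(randn(n, d), randn(n, d), randn(n, d))

Both heavy steps are dense matrix products, i.e. exactly the operation hardware built for MLPs and convolutions already accelerates.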


The devil is in the details.


How do you know we're not already at recursive self-improvement, just at a rate slower than human-mediated improvement?


From the abstract:

> This aligns with number theory conjectures suggesting that at higher orders of magnitude we should see diminishing noise in prime number distributions, with averages (density, AP equidistribution) coming to dominate, while local randomness regularises after scaling by log x. Taken together, these findings point toward an interesting possibility: that machine learning can serve as a new experimental instrument for number theory.

n*log(n) spacing with "local randomness" seems like such a common occurrence that perhaps it should be abstracted into its own term (or maybe it already is?). I believe the description lengths of the minimal programs computing BB(n) (via a Turing machine encoding) follow this pattern as well.
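
For the prime side of this, the n*log(n) claim is essentially the prime number theorem: the n-th prime is roughly n*log(n), with noisy gaps on top. A quick self-contained sanity check (plain sieve, no packages; the BB(n) half of the analogy is not something this illustrates):

    # Compare the n-th prime to n*log(n): the ratio creeps toward 1 slowly,
    # while individual gaps stay noisy -- smooth growth plus local randomness.
    function primes_up_to(limit::Int)
        sieve = trues(limit)
        sieve[1] = false
        for p in 2:isqrt(limit)
            sieve[p] || continue
            sieve[p*p:p:limit] .= false
        end
        return findall(sieve)
    end

    ps = primes_up_to(2_000_000)
    for n in (1_000, 10_000, 100_000)
        println("p_", n, " = ", ps[n], "   n*log(n) ≈ ", round(n * log(n), digits=1))
    end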


> a computation is a process that maps symbols (or strings of symbols) to other symbols, obeying certain simple rules[1]

There are quite a number of people who believe this is what the universe is: namely, the manifestation of all rule sets on all inputs at all points in time. How you extract quantum mechanics out of that... I'm not so sure.


Would love to read an HN-tailored blog post on your work or an overview of the binary lambda calculus if you ever have the time, btw.


A walkthrough would be nice, but he's got a lot of understandable material linked on that page. For example, here's an overview of the binary lambda calculus: https://tromp.github.io/cl/Binary_lambda_calculus.html

And here's a readable and fascinating post on "the largest number that's representable in 64 bits": https://tromp.github.io/blog/2023/11/24/largest-number.

If you go through these and find some interesting things, it'd be worth posting to HN.


https://tromp.github.io/cl/cl.html has many links to BLC materials, like my LispNYC video talk.


There are a few concepts at play here. First you have to consider what can be proven given a particular theory of mathematics (presumably a consistent, recursively axiomatizable one). For any such theory, there is some finite N for which that theory cannot prove the exact value of BB(N). So with "infinite time", one could (in principle) enumerate all proofs and confirm successive Busy Beaver values only up to the point where the theory runs out of power. This is the Busy Beaver version of Gödel/Chaitin incompleteness. For BB(5), Peano Arithmetic suffices and RCA₀ likely does as well. Where do more powerful theories come from? That's a bit of a mystery, although there's certainly plenty of research on that (see Feferman's and Friedman's work).

Second, you have to consider what's feasible in finite time. You can enumerate machines and also enumerate proofs, but any concrete strategy has limits. In the case of BB(5), the authors did not use naive brute force. They exhaustively enumerated the 5-state machines (after symmetry reductions), applied a collection of certified deciders to prove halting/non-halting behavior for almost all of them, and then provided manual proofs (also formalized) for some holdout machines.


Homemade cake mixes rarely win blind taste tests against box mixes. I baked two cakes with and without glycerol monostearate—it really does make a difference.


I'm not sure a blanket statement like that can be made, just because this is so subjective and anecdotal. From a brief Google, it looks pretty split. The AI summary falls on the homemade-mix side, but reading through some of the blog posts from both camps, I'd hesitate to view any of them as reliable, since bakers are going to be biased towards their preferred method, both psychologically and skill-wise. From my experience in baking competitions, my family and I have had better luck with homemade mixes.

I'm not disagreeing, since taste is subjective, and in blind taste tests I could definitely see most people choosing what they're used to, which would probably be box mixes. I'd be interested to see whether it differs culturally, since I think box mixes are generally more popular in the USA.


Yes, for your health.


Everyone likes to debate the philosophy of whether the reals are “real”, but for me there is a much more practical question at hand: does the existence of something within a mathematical theory (i.e., derivability of a “∃ [...]” sentence) reflect back on our ability to predict the result of symbolic manipulations of arbitrary finite strings according to an arbitrary finite rule set over an arbitrary finite period of time?

For AC and CH, the answer is provably “no” as these axioms have been shown to say nothing about the behavior of halting problems, which any question about the manipulation of symbols can be phrased in terms of (well, any specific question—more general cases move up the arithmetical hierarchy).

If it’s not reflective in this precise sense, then the derivation of, e.g., a set-theoretic ∃ in some instances has no effect on any prediction of known physics (i.e., we are aware of no method of falsification).

