
Earth's oceans contain approximately 1.35 billion cubic kilometers of water. To raise this entire volume from an average temperature of 3.5 C to boiling (100 C), we'd need roughly: 1.35 x 10^21 kg x 4,184 J/(kg C) x 96.5 C, which is approximately 5.45 x 10^26 joules. That's 545 million exajoules, or nearly a million times humanity's annual energy consumption.
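A back-of-the-envelope check of that arithmetic in Python (the ~600 EJ/year figure for global energy consumption is my own assumption, not part of the original):

    mass_kg = 1.35e9 * 1e9 * 1000      # 1.35e9 km^3 -> m^3 -> kg (1,000 kg per m^3)
    specific_heat = 4184               # J/(kg*C) for water
    delta_t = 100 - 3.5                # degrees C to raise

    energy_j = mass_kg * specific_heat * delta_t
    print(f"{energy_j:.3g} J")                       # ~5.45e+26 J
    print(f"{energy_j / 1e18:.3g} EJ")               # ~5.45e+08 EJ, i.e. 545 million exajoules
    print(f"{energy_j / 6e20:.3g}x annual human energy use (~600 EJ/yr assumed)")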

If you tried to brute-force AES-256 with conventional computers, you'd need to check 2^256 possible keys. Even with a billion billion (10^18) attempts per second: 2^256 operations / 10^18 operations/second is approximately 10^59 seconds. You'd need about 2.7 x 10^41 universe lifetimes to crack AES-256.
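The same kind of sanity check for the AES-256 numbers (the 13.8-billion-year age of the universe is assumed):

    keys = 2**256                                   # possible AES-256 keys
    rate = 1e18                                     # a billion billion guesses per second
    seconds = keys / rate                           # ~1.2e59 s
    universe_age_s = 13.8e9 * 365.25 * 24 * 3600    # ~4.35e17 s
    print(f"{seconds:.2g} s = {seconds / universe_age_s:.2g} universe lifetimes")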

At about 10 watts for a machine making those 10^18 attempts per second, this would require approximately 10^60 joules, or roughly 2 x 10^33 times the energy needed to boil the oceans. You could boil the oceans, refill them, and repeat the process about two billion trillion trillion times.
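Putting the two together, assuming the whole 10^18-guesses-per-second machine is what draws the 10 watts:

    seconds = 2**256 / 1e18            # ~1.2e59 s of guessing
    power_w = 10                       # assumed power draw of that machine
    energy_j = seconds * power_w       # ~1.2e60 J
    ocean_boil_j = 5.45e26             # from the earlier estimate
    print(f"{energy_j:.2g} J = {energy_j / ocean_boil_j:.2g} ocean boils")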

For RSA-2048, the best classical algorithms would need about 2^112 operations. This would still require around 10^27 joules, or about twice what's needed to boil the oceans.

ECC with a 256-bit key would need roughly 2^128 operations to crack, requiring approximately 10^31 joules. That's enough to boil the oceans nearly 20,000 times over.
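And dividing the quoted RSA and ECC energies by the ocean-boiling estimate from above:

    ocean_boil_j = 5.45e26
    print(f"RSA-2048: {1e27 / ocean_boil_j:.1f} ocean boils")    # ~1.8
    print(f"ECC-256:  {1e31 / ocean_boil_j:,.0f} ocean boils")   # ~18,000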

Quantum computers could theoretically use Shor's algorithm to break RSA and ECC much faster. But to break RSA-2048, we'd need a fault-tolerant quantum computer with millions of qubits. Current quantum computers have fewer than 1,000 stable qubits. Even with quantum computing, the energy requirements would still be astronomical. Perhaps enough to boil all the oceans once or twice, rather than thousands of times.


A brief synopsis of eukaryotic life from my book, showing where complex life fits into our timeline:

https://impacts.to/downloads/lowres/impacts.pdf#page=12


Boyd has published a related book which is a bit more elementary but still great: https://web.stanford.edu/~boyd/vmls/

Whoever has the best story rules the world. Truth of the story is typically of little consequence.

Try and find an exception to this simple rule.


Animals' shared ancestor with fungi is thought to be among the choanoflagellates (https://en.m.wikipedia.org/wiki/Choanoflagellate), which kinda look like sperm.

My guess is that they were predominantly haploid (like sperm/eggs), and just had a diploid (like most other human cells) phase for reproduction. Somewhere along the way this switched (which is thought to have happened in plants also, mosses are primarily haploid while flowering plants are primarily diploid).

...which is totally bonkers. It's like if sperm/eggs went around making all of the important decisions and only bothered to spin up a human briefly for reproductive purposes. But then the sex got a bit out of control and got a job and an apartment and now it's the sperm who look like they're just for reproduction.

Disclaimer: I took Bio 101 two semesters back, so I'm pretty much an expert.


None of the incentives line up in a remotely rational way in academia. It's a large-scale boondoggle.

Gullible young adults take on tens or hundreds of thousands of dollars of non-dischargeable loans, ostensibly in pursuit of either knowledge or certification. Most achieve neither, leaving severely indebted with (maybe) a few solid drinking buddies.

The liberal arts professors, themselves the small subset who did pursue knowledge but have now discovered the market for esoterica is vanishingly small, attempt to eke out a living by doubling down on their wasted time. They produce documents only a tiny minority of their peers (or the next round of dupes/students) will ever read, desperately hoping thereby to secure one of a tiny number of available tenured positions, succeeding in the equivalent of a Ponzi scheme measured in hours, not dollars.

Professors in science or technology fields, having failed to find another model to subsidize basic research, endure the tedium of teaching hungover classes of mediocre minds, genuinely convinced that a breakthrough discovery might be just over the horizon, finally legitimizing their single-minded focus over decades.

Both sets of professors prop up the "peer-reviewed journal" scam, which has somehow convinced people that such publications indicate intellectual or academic merit, despite failing every reliability test anyone has attempted.

And then, for good measure, there are also...administrators and athletic teams, for some reason. I can't even figure out what the incentive is for administrators, nor can I understand why a university athletic system makes more financial sense than a minor league system, except maybe as a publicity stunt for the whole apparatus?

In the end, I can't actually think of anyone in the system who does seem to be getting what they want from the arrangement.

Let it crumble.


“Any community that gets its laughs by pretending to be idiots will eventually be flooded by actual idiots who mistakenly believe that they're in good company.” - Ernest Hemingway

Okay, here's my attempt!

First, we take a sequence of words and represent it as a grid of numbers: each column of the grid is a separate word, and each row of the grid is a measurement of some property of that word. Words with similar meanings are likely to have similar numerical values on a row-by-row basis.

(During the training process, we create a dictionary of all possible words, with a column of numbers for each of those words. More on this later!)

This grid is called the "context". Typical systems will have a context that spans several thousand columns and several thousand rows. Right now, context length (column count) is rapidly expanding (1k to 2k to 8k to 32k to 100k+!!) while the dimensionality of each word in the dictionary (row count) is pretty static at around 4k to 8k...
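A toy sketch of what that grid looks like, with a made-up five-word dictionary and 8 rows instead of several thousand (all values random, purely illustrative):

    import numpy as np

    vocab = ["the", "cat", "sat", "on", "mat"]       # toy dictionary
    dim = 8                                          # rows per word (real models: ~4k-8k)
    rng = np.random.default_rng(0)
    embeddings = {w: rng.normal(size=dim) for w in vocab}   # the learned column for each word

    sentence = ["the", "cat", "sat"]
    context = np.stack([embeddings[w] for w in sentence], axis=1)  # one column per word
    print(context.shape)    # (8, 3): dim rows x context-length columns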

Anyhow, the Transformer architecture takes that grid and passes it through a multi-layer transformation algorithm. The functionality of each layer is identical: receive the grid of numbers as input, then perform a mathematical transformation on the grid of numbers, and pass it along to the next layer.

Most systems these days have around 64 or 96 layers.

After the grid of numbers has passed through all the layers, we can use it to generate a new column of numbers that predicts the properties of some word that would maximize the coherence of the sequence if we add it to the end of the grid. We take that new column of numbers and comb through our dictionary to find the actual word that most-closely matches the properties we're looking for.

That word is the winner! We add it to the sequence as a new column, remove the first column, and run the whole process again! That's how we generate long text-completions one word at a time :D
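As a rough sketch, that loop looks something like this (model() and the dictionary lookup are stand-ins for illustration, not any real library's API):

    import numpy as np

    def generate(prompt_words, n_new, model, embeddings, window=4):
        """model(grid) returns a predicted column of numbers; both arguments are stand-ins."""
        words = list(prompt_words)
        for _ in range(n_new):
            grid = np.stack([embeddings[w] for w in words[-window:]], axis=1)  # only the last few words fit the context
            predicted = model(grid)                       # column describing the ideal next word
            # comb the dictionary for the closest match (dot-product similarity)
            best = max(embeddings, key=lambda w: float(embeddings[w] @ predicted))
            words.append(best)                            # the winner joins the sequence
        return words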

So the interesting bits are located within that stack of layers. This is why it's called "deep learning".

The mathematical transformation in each layer is called "self-attention", and it involves a lot of matrix multiplications and dot-product calculations with a learned set of "Query, Key and Value" matrixes.
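A minimal single-head version of that calculation, with random Query/Key/Value matrixes standing in for the learned ones (columns are words, as above):

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """One attention head over a (dim x length) context grid X."""
        Q, K, V = Wq @ X, Wk @ X, Wv @ X               # project every word-column three ways
        scores = Q.T @ K / np.sqrt(K.shape[0])         # dot products: word-to-word relevance
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)  # softmax across the context
        return V @ weights.T                           # blend the Value columns by those weights

    dim, length = 8, 3
    rng = np.random.default_rng(0)
    X = rng.normal(size=(dim, length))                 # a context grid like the one above
    Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)         # (8, 3): same grid shape, transformed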

It can be hard to understand what these layers are doing linguistically, but we can use image-processing and computer-vision as a good metaphor, since images are also grids of numbers, and we've all seen how photo-filters can transform that entire grid in lots of useful ways...

You can think of each layer in the transformer as being like a "mask" or "filter" that selects various interesting features from the grid, and then tweaks the image with respect to those masks and filters.

In image processing, you might apply a color-channel mask (chroma key) to select all the green pixels in the background, so that you can erase the background and replace it with other footage. Or you might apply a "gaussian blur" that mixes each pixel with its nearest neighbors, to create a blurring effect. Or you might do the inverse of a gaussian blur, to create a "sharpening" operation that helps you find edges...

But the basic idea is that you have a library of operations that you can apply to a grid of pixels, in order to transform the image (or part of the image) for a desired effect. And you can stack these transforms to create arbitrarily-complex effects.

The same thing is true in a linguistic transformer, where a text sequence is modeled as a matrix.

The language-model has a library of "Query, Key and Value" matrixes (which were learned during training) that are roughly analogous to the "Masks and Filters" we use on images.

Each layer in the Transformer architecture attempts to identify some features of the incoming linguistic data, and then, having identified those features, it can subtract those features from the matrix, so that the next layer sees only the transformation, rather than the original.

We don't know exactly what each of these layers is doing in a linguistic model, but we can imagine it's probably doing things like: performing part-of-speech identification (in this context, is the word "ring" a noun or a verb?), reference resolution (who does the word "he" refer to in this sentence?), etc, etc.

And the "dot-product" calculations in each attention layer are there to make each word "entangled" with its neighbors, so that we can discover all the ways that each word is connected to all the other words in its context.

So... that's how we generate word-predictions (aka "inference") at runtime!

But why does it work?

To understand why it's so effective, you have to understand a bit about the training process.

The flow of data during inference always flows in the same direction. It's called a "feed-forward" network.

But during training, there's another step called "back-propagation".

For each document in our training corpus, we go through all the steps I described above, passing each word into our feed-forward neural network and making word-predictions. We start out with a completely randomized set of QKV matrixes, so the results are often really bad!

During training, when we make a prediction, we KNOW what word is supposed to come next. And we have a numerical representation of each word (4096 numbers in a column!) so we can measure the error between our predictions and the actual next word. Those "error" measurements are also represented as columns of 4096 numbers (because we measure the error in every dimension).

So we take that error vector and pass it backward through the whole system! Each layer needs to take the back-propagated error matrix and perform tiny adjustments to its Query, Key, and Value matrixes. Having compensated for those errors, it reverses its calculations based on the new QKV, and passes the resultant matrix backward to the previous layer. So we make tiny corrections on all 96 layers, and eventually to the word-vectors in the dictionary itself!
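A toy version of that measure-the-error-and-nudge loop, with a single linear layer standing in for the whole QKV stack (plain gradient descent on a squared error, just to show the shape of the idea, not the real transformer update):

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 16                                   # stand-in for the ~4096-number word vectors
    W = rng.normal(size=(dim, dim)) * 0.01     # start from near-random weights

    x = rng.normal(size=dim)                   # vector of the current word
    target = rng.normal(size=dim)              # vector of the word we KNOW comes next
    lr = 0.01                                  # size of each tiny adjustment

    for step in range(200):
        pred = W @ x                           # the model's guess
        error = pred - target                  # one error number per dimension
        grad = np.outer(error, x)              # how each weight contributed to that error
        W -= lr * grad                         # nudge the weights to shrink the error
    print(np.abs(W @ x - target).max())        # the error has shrunk to ~0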

Like I said earlier, we don't know exactly what those layers are doing. But we know that they're performing a hierarchical decomposition of concepts.

Hope that helps!


Heavy exercising, especially aerobic exercising. Something light like a half-hour walk isn't enough, I have to properly exhaust myself to calm the fuck down. Some different workouts that are usually enough: one hour of heavy barbell training at the gym, 45 minutes of running, or two hours of brisk walking.

Also, interacting with people in a non-bullshit way works, but is often more difficult to do. Keeping up appearances and roles has the opposite effect, but if I tell people what I actually think and let myself be more emotional and less reserved around them, I actually feel more connected, which alleviates anxiety.

I think that at the heart of it, anxiety is born out of insecurity. Being connected (in a real, non-pretentious/bullshit way) with people raises my security. However, this is often easier said than done. I guess exercising works as a sort of patch by reducing my energy levels so much that I don't have any left for my anxiety.


I used to work as a compiler engineer in the US for several years, before deciding to try starting over at the age of 30, in pure mathematics. I moved from the US to Paris in pursuit of an affordable mathematics education, and spent two years in a Masters program. I did have a considerable amount of savings, but it was very risky nevertheless: if it didn't work out, I'd be out-of-touch with compilers, and it would be hard to interview again, with a considerable career gap in my résumé.

For various reasons, mathematics didn't work out, and I was forced to interview again. Fortunately, I did manage to find a job as a compiler engineer again, and will be moving to London soon.

Now, the price of my adventure was quite steep. I uprooted my life when I moved from the US to Paris (especially because I didn't know French at the time), and the upcoming move to London will once again be difficult. I nearly halved my savings, by studying mathematics at my own expense, and will be back to earning the equivalent of my starting salary in the US.

However, I'm an adventurous person, and view my experience in a positive light. I'd been wanting to study Jacob Lurie's books for the longest time, and I finally did it. I worked on a mathematical manuscript, which is now up on arXiv [1], and on a type theory project which has been submitted to LICS '23 [2]. I've had a good life in Paris, and my French is decent.

There's the larger philosophical question of "What is a life well-lived?", and for me, the answer is to pursue those things that you're truly passionate about, even if it doesn't work out.

[1]: https://arxiv.org/abs/2211.09652

[2]: https://artagnon.com/logic/νType.pdf


I had the opposite experience. I have IBD and recently started on some potent probiotics. Suddenly my energy and focus went up 1000%. So this is how regular healthy people live, wow. I used to be lethargic all the time for 30+ years.

> Rarely is an awesome individual magically called upon to become a manager, particularly by poor managers who are already messing stuff up

There's a passage in Plato's Republic which is illuminating about this particular circumstance.

And I quote from [1].

""" And for this reason, I said, money and honour have no attraction for them; good men do not wish to be openly demanding payment for governing and so to get the name of hirelings, nor by secretly helping themselves out of the public revenues to get the name of thieves. And not being ambitious they do not care about honour. Wherefore necessity must be laid upon them, and they must be induced to serve from the fear of punishment.

And this, as I imagine, is the reason why the forwardness to take office, instead of waiting to be compelled, has been deemed dishonourable.

Now the worst part of the punishment is that he who refuses to rule is liable to be ruled by one who is worse than himself.

And the fear of this, as I conceive, induces the good to take office, not because they would, but because they cannot help --not under the idea that they are going to have any benefit or enjoyment themselves, but as a necessity, and because they are not able to commit the task of ruling to any one who is better than themselves, or indeed as good. """

Stuff that was true two millennia ago still continues to be true today.

[1] - http://classics.mit.edu/Plato/republic.mb.txt


It has not. And "a lack of accountability" is a band-aid on the real problem: bad gatekeeping. People getting into science, not for the search for truth, but in search of respectability, green card, money, or whatever else. Trying to whip them into real scientists through transparency and accountability is like trying to achieve security in your home by flinging the gates and doors wide open but slapping cameras and motion detectors everywhere. Either they win, or you get fatigued.
