More

ForOldHack · 2025-12-11T21:00:19 1765486819

Is there a mirror for this? my library has FortNight blocking it. ( bad certificate, leads them to believe its a spam site...).

libroot · 2025-12-11T22:07:09 1765490829

Weird, the certificate should be fine. Here's a link for Web Archive: https://web.archive.org/web/20251211220119/https://libroot.o...

We also have Tor onion site: http://librootfuuucybrkpvarmpswsxnbsakf2oqqzxncvsqrvc2j73kuu...

And I2P: http://xvqmnhevx32br7m4e7g3yoxfirizo4m3uktym3wnuntbgbr5bvna....

lucb1e · 2025-12-11T21:02:32 1765486952

What's FortNight? I tried looking it up but got fortnite as the top result, and forcing a literal search with quotes just brings up the dictionary definition. Sadly I don't know of a way to do a case-sensitive web search

pseudalopex · 2025-12-11T21:33:06 1765488786

They meant Fortinet possibly.

lucb1e · 2025-12-11T22:32:04 1765492324

Oh, that garbage. Should have known it would be a corporate firewall

ForOldHack · 2025-12-11T20:13:59 1765484039

i.e. The movie "The lives of others." :|

radicaldreamer · 2025-12-11T21:38:03 1765489083

If they remade that movie with a modern spin, it would be an AI model deciding who is loyal and who isn't.

ForOldHack · 2025-12-10T04:46:43 1765342003

Technical feedback: Every single announcement, like compression needs the addition of the lower limits of machine requirements. if a 64Gb model is compressed 224x times, should that not be able to be run on a 292mb video card?

anima-core · 2025-12-10T09:47:37 1765360057

No, the compression result doesn't mean the original 64 GB model can run on a 292 MB card. The teacher model isn’t the thing thats compressed. It still needs to be loaded during training.

What gets small is the student. The tiny head trained on the teacher’s first layer fields. That head ends up a few MB because it's not a transformer at all. It's basically a lightweight function approximator that reproduces the teacher’s behavior on the specific task it was trained for.

So training still requires the usual multi-GB footprint. (Which can be done offline) After training, inference with the student requires only the head. That's why inference is cheap but you can't load the full teacher into 292 MB of VRAM.

hirako2000 · 2025-12-10T06:16:13 1765347373

That's exactly what I was trying to infer from the abstract which sadly doesn't explicitly calls out memory requirements. I assume it increases inference time by getting rid of transformers. What's the memory requirements then ?

Edit: they claim these somewhere in the doc:

> Memory Teacher model: multi-GB (entire model must be loaded) AN1 head: a few MB (only head needed after training)

I find the claims surreal, can't wait for someone to validate this or I will do it myself. It would have been handy to upload such "few MB" weight file distilled off llama 70B so that we can see for ourself the 220x inference and in memory model size compression is true.

anima-core · 2025-12-10T09:51:08 1765360268

The memory story is actually much simpler than it looks.

The teacher still has to be loaded at training time, so the footprint is whatever the original model uses. Again, the compression doesn't shrink the teacher. It produces a small student head. After training, the teacher is no longer needed and the student runs by itself. That's why the inference footprint drops to a few MB.

It doesn't increase inference time at all. It removes transformers entirely from the inference path. The student computes directly on the layer-1 field, which is why it's so small and so fast.

On the request for a distilled “few MB” head for Llama 70B,that part is already reproducible right from the repo. The head is always task specific, not a general LLM, so uploading a single checkpoint wouldn't tell the whole story. The better path is to run the extraction script and train the head for any task you want. The pipeline is fully open, end to end. I'm looking for people to validate it independently.

If you need anything else cleared up, just let me know.

ForOldHack · 2025-12-09T20:51:54 1765313514

"The guy is also a complete tool." - Who says Hackers news is not filled with humor?

ForOldHack · 2025-12-09T20:49:28 1765313368

"I'm sorry, Google, I can't do that." - Gemini Cloud.

ForOldHack · 2025-12-09T20:48:36 1765313316

You are a sick, sick man, but you have taste.

ForOldHack · 2025-12-09T19:57:07 1765310227

Make no mistake, this article is of extraordinary historical significance, even the list of constantans being hardwired....

ForOldHack · 2025-12-09T19:56:11 1765310171

This is cool, but the renormalization and (Programmable and bidirectional) barrel shifter are of much more interest.

I had a 10Mhz XT, and ran a 8087-8 at a bit higher clock rate. I used it both for Lotus 1-2-3 and Turbo Pascal-87. It made Turbo Pascal significantly faster.

kens · 2025-12-09T20:20:47 1765311647

You're in luck, I wrote about the 8087's shifter back in 2020 :-) https://www.righto.com/2020/05/die-analysis-of-8087-math-cop...

ForOldHack · 2025-12-07T00:09:26 1765066166

It would be more useful to have some standards on what one could expect in terms of hardware requirements and expected performance.

ForOldHack · 2025-12-07T00:08:39 1765066119

Explain lobotomizing a Image Generator? Modern problems require modern terms.