Synthetic data doesn't have to come from an LLM. And that paper only showed that if you train on a random sample from an LLM, the resulting second LLM is a worse model of the first LLM's training distribution than the first LLM itself. When people construct synthetic data with LLMs, they typically don't just sample at random; they carefully shape the generation process so that it matches the target task better than the original training distribution does.
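As a rough sketch of what "shaping" can look like in practice: targeted prompts plus a task-specific verifier that rejects bad samples, so the kept set isn't a random draw from the model's distribution. Everything here (sample_from_llm, is_valid, the arithmetic toy task) is made up for illustration, not any particular pipeline.

    import random

    # Hypothetical stand-in for an LLM call: it fabricates grade-school addition
    # problems, sometimes with a wrong answer, so the filter has work to do.
    def sample_from_llm(prompt: str) -> dict:
        a, b = random.randint(1, 99), random.randint(1, 99)
        answer = a + b if random.random() < 0.7 else a + b + random.randint(1, 5)
        return {"question": f"What is {a} + {b}?", "answer": answer}

    # Task-specific verifier: keep only samples whose stated answer checks out.
    def is_valid(example: dict) -> bool:
        nums = [int(t.strip("?")) for t in example["question"].split() if t.strip("?").isdigit()]
        return sum(nums) == example["answer"]

    # The "shaping": prompts target the task and the verifier rejects bad samples.
    def build_dataset(prompts: list[str], per_prompt: int = 100) -> list[dict]:
        dataset = []
        for prompt in prompts:
            candidates = (sample_from_llm(prompt) for _ in range(per_prompt))
            dataset.extend(s for s in candidates if is_valid(s))
        return dataset

    data = build_dataset(["Generate an addition problem with its answer."])
    print(f"kept {len(data)} verified samples out of 100 candidates")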
Even when you do get a sync conflict, Syncthing will rename one of the copies and then you can have KeePassXC merge the two files back into one. So that's still pretty much hassle-free.
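If you want to script the cleanup, here's a rough Python sketch, assuming the database lives at ~/Sync/Passwords.kdbx and that keepassxc-cli is installed (flag names may differ slightly between versions); the same merge can be done from the KeePassXC GUI's Database menu.

    import glob
    import subprocess
    from pathlib import Path

    VAULT = Path.home() / "Sync" / "Passwords.kdbx"  # assumed location of the synced database

    def merge_conflicts(vault: Path) -> None:
        # Syncthing names the losing copy like
        # "Passwords.sync-conflict-<date>-<time>-<device>.kdbx".
        pattern = str(vault.with_name(f"{vault.stem}.sync-conflict-*{vault.suffix}"))
        for conflict in glob.glob(pattern):
            # keepassxc-cli merges the second database into the first and prompts
            # for the password; --same-credentials reuses the main database's password.
            subprocess.run(
                ["keepassxc-cli", "merge", "--same-credentials", str(vault), conflict],
                check=True,
            )
            # Drop the conflict copy once its entries have been merged in.
            Path(conflict).unlink()

    merge_conflicts(VAULT)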
It's true that to train more information into the model you need more trainable parameters, but when people ask for small models, they usually mean models that run at acceptable speeds on their hardware. Techniques like mixture-of-experts increase the number of trainable parameters without a matching increase in FLOPs per token, so the resulting models are large in one sense but small in another.
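A toy numpy illustration of that trade-off (arbitrary dimensions, top-2 routing, plain ReLU feed-forwards as the "experts"; not any particular library's implementation): the parameter count grows with the number of experts, but each token only runs through top_k of them.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff, num_experts, top_k = 64, 256, 8, 2

    # Each expert is just a two-matrix feed-forward block here.
    experts = [
        (rng.standard_normal((d_model, d_ff)) * 0.02,
         rng.standard_normal((d_ff, d_model)) * 0.02)
        for _ in range(num_experts)
    ]
    router = rng.standard_normal((d_model, num_experts)) * 0.02

    def moe_forward(x):
        # Route each token to its top_k experts; only those experts run,
        # so per-token compute stays roughly constant as num_experts grows.
        logits = x @ router
        chosen = np.argsort(logits, axis=-1)[:, -top_k:]
        weights = np.take_along_axis(logits, chosen, axis=-1)
        weights = np.exp(weights) / np.exp(weights).sum(-1, keepdims=True)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for slot in range(top_k):
                w1, w2 = experts[chosen[t, slot]]
                h = np.maximum(x[t] @ w1, 0.0)  # ReLU feed-forward
                out[t] += weights[t, slot] * (h @ w2)
        return out

    tokens = rng.standard_normal((4, d_model))
    print(moe_forward(tokens).shape)

    total_params = num_experts * 2 * d_model * d_ff
    active_params = top_k * 2 * d_model * d_ff
    print(f"total expert params: {total_params}, active per token: {active_params}")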
And you don't necessarily need to train all information into the model, you can also use tool calls to inject it into the context. A small model that can make lots of tool calls and process the resulting large context could obtain the same answer that a larger model would pull directly out of its weights.
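A toy sketch of that loop, with both the "model" and the tool mocked out (small_model, lookup, and the TOOL_CALL/TOOL_RESULT convention are all made up for illustration): the point is only that tool results land in the context instead of having to live in the weights.

    # Stand-in retrieval tool; a real one would hit a search index, a database, etc.
    def lookup(query: str) -> str:
        facts = {"capital of australia": "Canberra"}
        return facts.get(query.lower(), "no result")

    # Placeholder "model": if it hasn't seen a tool result yet, it asks for one.
    def small_model(context: list[str]) -> str:
        if not any(line.startswith("TOOL_RESULT:") for line in context):
            return "TOOL_CALL: capital of australia"
        result = next(line for line in context if line.startswith("TOOL_RESULT:"))
        return "ANSWER: " + result.removeprefix("TOOL_RESULT: ")

    def answer(question: str) -> str:
        context = [f"USER: {question}"]
        while True:
            reply = small_model(context)
            if reply.startswith("TOOL_CALL: "):
                # Tool output gets appended to the context for the next model call.
                context.append(f"TOOL_RESULT: {lookup(reply.removeprefix('TOOL_CALL: '))}")
            else:
                return reply.removeprefix("ANSWER: ")

    print(answer("What is the capital of Australia?"))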
The US lost the gambling case because its restrictions on foreign websites were stricter than those on domestic ones. The GATS doesn't prohibit countries from regulating trade; it only requires them to do so in a non-discriminatory manner. Spain isn't blocking foreign websites for copyright infringement that would be legal domestically, so it's in compliance with its obligations.
The "attention is all you need" paper did not invent attention mechanisms. It showed that existing models that were already using attention could have their non-attention parts removed and still worked. So those other parts were unnecessary and only attention was needed.