Early LLMs used to have this problem often. I think that's where the "repetition penalty" parameter comes from. I suspect output quality could be improved with better sampling parameters.
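For the curious, a repetition penalty is usually applied to the logits right before sampling. A minimal sketch of the common CTRL-style rule (function name and the 1.2 default are illustrative, not any particular library's API):

    def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
        # CTRL-style rule: divide positive logits by the penalty and
        # multiply negative ones, so already-seen tokens become less likely.
        # penalty > 1.0 discourages repetition; penalty == 1.0 is a no-op.
        out = list(logits)
        for tok in set(generated_ids):
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
        return out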
I wonder why Gmail and other email providers don't just run an LLM/ML pipeline to detect phishing emails. It seems that matching an email's content with the sender's domain (and possibly analyzing the content behind links) would be enough to show, with high certainty, a warning like "Beware: this looks like a phishing email." Is it too expensive? Too many false positives?
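Even a naive non-LLM version of that domain check is easy to sketch (all names here are illustrative; a real filter would have to be far more careful about subdomains, redirects, and lookalike domains):

    import re
    from email import message_from_string
    from email.utils import parseaddr
    from urllib.parse import urlparse

    def flag_phishy_links(raw_email: str) -> set[str]:
        # Flag link domains that don't match the From: domain.
        # Naive: "evil-paypal.com" ends with "paypal.com" and slips
        # through, which is one reason heuristics like this fail.
        msg = message_from_string(raw_email)
        sender = parseaddr(msg.get("From", ""))[1]
        sender_domain = sender.rsplit("@", 1)[-1].lower()
        urls = re.findall(r"https?://[^\s\"'<>]+", raw_email)
        return {
            host for host in (urlparse(u).hostname or "" for u in urls)
            if host and sender_domain and not host.endswith(sender_domain)
        }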
I think you're about 20 years behind the times if you think they don't.
There are a whole lot of problems with it once you start pressing on the finer details like the ones you list. For example, just look at the legitimate emails banks send out. They will tell you not to click links claiming to be from your bank, then include links (claiming to be from your bank) for more information.
Simply put, the rules block too much corporate email, because people who write corporate email do lots of dumb things with the email system.
It's true that a lot of established ML techniques were first popularized to fight spam (e.g., Bayesian filtering), but it might also be the case that they're not applying the full might of, say, Gemini-3-Pro to every email received. I suspect Gemini-3-Pro would do an effectively perfect job of determining whether something is phishing, with negligible mass in the false-positive and false-negative cells of the confusion matrix, but it's probably too expensive to use that way. Which is why things like this can still slip through.
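For reference, the classic Bayesian filter is only a few lines. A minimal naive-Bayes sketch with add-one smoothing (toy whitespace tokenization, nothing like a production filter):

    from collections import Counter
    import math

    def train(spam_texts, ham_texts):
        # Count word frequencies per class.
        spam, ham = Counter(), Counter()
        for t in spam_texts:
            spam.update(t.lower().split())
        for t in ham_texts:
            ham.update(t.lower().split())
        return spam, ham

    def spam_probability(text, spam, ham):
        # Naive Bayes in log space with add-one (Laplace) smoothing.
        n_spam, n_ham = sum(spam.values()), sum(ham.values())
        log_odds = 0.0
        for w in set(text.lower().split()):
            p_s = (spam[w] + 1) / (n_spam + 2)
            p_h = (ham[w] + 1) / (n_ham + 2)
            log_odds += math.log(p_s / p_h)
        log_odds = max(min(log_odds, 60.0), -60.0)  # avoid exp() overflow
        return 1 / (1 + math.exp(-log_odds))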
The most essential check is SPF and DKIM, which authenticate whether the message came from an authorized server. The problem is that most mail services are too lenient with mismatched sender identification. On the one hand, people would be quite vocal if their mail provider sent too much legitimate (but slightly misconfigured) mail to the spam folder. On the other, the leniency allows situations like this to happen, where the FROM header, the "From:" address, and the return path are all different.
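That mismatch is exactly what DMARC alignment is supposed to catch. A simplified sketch of the idea (real implementations compare organizational domains rather than exact strings, and use the verified SPF/DKIM results instead of trusting raw headers):

    from email import message_from_string
    from email.utils import parseaddr

    def dmarc_aligned(raw_email: str) -> bool:
        # Simplified DMARC-style alignment: the visible From: domain must
        # match either the SPF-validated domain (Return-Path here) or the
        # DKIM signing domain (the d= tag of the signature).
        msg = message_from_string(raw_email)
        from_dom = parseaddr(msg.get("From", ""))[1].rsplit("@", 1)[-1].lower()
        spf_dom = parseaddr(msg.get("Return-Path", ""))[1].rsplit("@", 1)[-1].lower()
        dkim_dom = ""
        for tag in msg.get("DKIM-Signature", "").split(";"):
            key, _, value = tag.strip().partition("=")
            if key == "d":
                dkim_dom = value.strip().lower()
        return from_dom != "" and from_dom in (spf_dom, dkim_dom)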
Most mail systems have several stages of filters, and the first ones (checking authentication) are quite basic. After that, attachments, links, and contents are checked against known malware. Machine learning might kick in after this, if certain criteria are met. Mail security is very complicated, and it works well except for the times it falls flat on its face, like this.
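In outline, the staging looks something like this (a toy sketch; the stage names and verdicts are made up):

    def check_authentication(msg):  # SPF/DKIM/DMARC; cheapest, runs first
        return "pass"

    def scan_content(msg):          # known-malware hashes, link reputation
        return "pass"

    def ml_classifier(msg):         # expensive; only reached if earlier stages pass
        return "pass"

    def filter_mail(msg):
        # Run stages in cost order; stop at the first non-pass verdict,
        # so the expensive models never see most mail.
        for stage in (check_authentication, scan_content, ml_classifier):
            verdict = stage(msg)
            if verdict != "pass":
                return verdict
        return "deliver"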
I think they mean that they trained the tool-calling capabilities to skip personal information in tool call arguments (for RAG), or something like that. You need to intentionally train it to skip certain data.
>every time the whole conversation history gets reprocessed
Unless they're talking about the memory feature, which is some kind of RAG that remembers information between conversations.
I tried generating code with ChatGPT 5.2, but the results weren't that great:
1) It often overcomplicates things. After I refactor its code, it's usually half the size and much more readable. It often adds 'just in case' checks or mini-features that I don't need.
2) At the same time, almost every function it produces has at least one bug or ignores at least one instruction. However, if I ask it to review its own code several times, it eventually finds the bugs.
I still find it very useful, just not as a standalone programming agent. My workflow is that ChatGPT gives me a rough blueprint and I iterate on it myself; I find this faster and less error-prone. It's most useful in areas where I'm not an expert, such as when I don't remember exact APIs. In areas where I can immediately picture the entire implementation in my head, it's usually faster and more reliable to write the code myself.
Well, like I pointed out somewhere else, VS Code gives it a set of prompts and tools that makes it very effective for me. I see that a lot of people are still copy/pasting stuff instead of having the “integrated” experience, and it makes a real difference.
>It completely breaks the censor's standard playbook of IP enumeration. You can't just block a specific subnet without risking blocking future legitimate allocations
At least in Russia, they don't really care about collateral damage. Currently, without a VPN, I can't open something like 30-50% of the links on Hacker News (mostly collateral damage after they banned large ranges of IP addresses).
Interesting: zer- seems somewhat similar to the Slavic raz-.
In Russian: davit' - to press; razdavit' - to crush.
Siedlung corresponds to Russian selenie, "settlement". Zersiedlung appears to correspond (morphologically) to rasselenie, which means something more like settlement as dispersion: movement outward from a single point in different directions.
So I suspect zer- doesn't mean destruction per se; it's just that destruction often involves parts moving outward from an original center, which would explain the frequent association of zer- with destruction.
My impression is that German prefixes don't have meanings well-defined enough that a new coinage automatically has an unambiguous meaning relative to the base word. There seem to be parallels with raz-, but I'm not sure whether they share a common root.
>From the home page, I figured it was some text-based game or experiment and closed the page.
Same, my first thought was that it's some pentesting game where you're given a VM and your task is to somehow break it. The line "the disk persists. you have sudo" sounds like game rules.
>A world is consistent and has a set of rules that must be followed.
Large language models are mostly consistent, but they still make mistakes from time to time, even in grammar. Such a mistake is usually called a "hallucination". Can't we say that physics errors in a world model are a kind of hallucination too? I guess the question is what hallucination rate we're willing to tolerate.
It's not about making no mistakes; it's about the category of mistake.
Let's consider language itself as a world, in some abstract sense. Lies may (or may not) be consistent within it; do they still make sense linguistically? But then think about the category of errors where the model starts mixing languages and sounds entirely nonsensical. That's rare with current LLMs in standard usage, but you can still get them to have full-on meltdowns.
This is the class of mistakes these models are making, not the "failing to recite the truth" class of mistakes.
(Not a perfect translation, but I hope the explanation helps.)
>It is not that different from German in this matter.
Russian inflection shifts the stress; in German it's fixed. Inflectional forms are much more varied in Russian. Colloquial German is much more analytic (the past tense is almost always "ich habe" + participle). And German is down to basically three cases at this point, with the genitive dying out, compared to Russian's six. But conceptually they're very similar indeed.
If you just want to be understood, Russian is not very hard; I think that's true of any language. To master it, however...