grej's comments | Hacker News

My strange observation is that Gemini 2.5 Pro is maybe the best model overall for many use cases, but only on the first message of a chat. In other words, if it has all the context it needs and produces one output, it's excellent. The longer a chat goes on, the more quickly it gets worse, which is strange because it has a much longer context window than other models. I have found a good way to use it is to drop the entire huge context of a whole project (200k-ish tokens) into the chat window, ask one well-formed question, then kill the chat.
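
For concreteness, a minimal sketch of that workflow, assuming the google-generativeai Python client; the project path, file glob, and question are placeholders I've made up, not anything from the original setup:

    import pathlib
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.5-pro")

    # Concatenate the whole project (~200k tokens) into one context blob.
    context = "\n\n".join(
        f"# {p}\n{p.read_text()}"
        for p in sorted(pathlib.Path("my_project").rglob("*.py"))
    )

    # One well-formed question against the full context; then kill the chat
    # rather than sending follow-ups into a degrading conversation.
    prompt = context + "\n\nQuestion: where is the retry logic implemented?"
    print(model.generate_content(prompt).text)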


> The longer a chat goes on, the more quickly it gets worse.

This has been the same for every single LLM I've used, ever; they're all terrible at it.

So terrible that I've stopped going beyond two messages in total. If it doesn't get it right on the first try, it's more and more unlikely to get it right with every message you add.

Better to always start fresh and iterate on the initial prompt instead.


Yes, agreed, but Gemini seems to drop off more quickly than other foundation models for some reason.


Hey, this has been my experience, too! I like Gemini because I’ve told it the tone and style I like my answers in and the first answer is very, very on point with that. But several times I’ve noticed that if I ask follow-up questions, the style immediately changes for the worse, often no longer following my preferences. I’ve also noticed that in follow-ups it makes really bad analogies that are not suitable at all for the kind of audience that the first response is catered to. I’ve been clicking the thumbs-down button every time I’ve seen this and commenting on the change in style and quality, so hopefully the training process will ingest that at some point.


I never played this, but it reminds me of the C64 Ghostbusters game, which I loved!


That was a great one, as was the Alien game. I'm sure there was shovelware crap too, but it seems (in the glorious golden hindsight of the past) like IP was treated better. Or maybe I just didn't care because I was young or because I owned Fast Hack'em and had a few friends with C64s too.


I love the Gemini models and think Google has done a great job on them, but no model series I use seems to suffer more from context rot in long conversations, which seems strange given the longer context window.


I absolutely adored these books as a kid! I spent every dime of book-fair money on them every year and used to beg my parents to take me to the library to check out others.

I love the framing of them in this article as the gateway drug to interactive entertainment.


In addition to Korea being one of our most important military allies in the world, you need batteries for military drones, and the US is way behind in the development of a domestic manufacturing supply chain for next gen batteries.

So now we know clearly that nationalist xenophobia is the most important priority for this administration. Or at least, more important than either the domestic economic interests of their own base or strategic national security interests.


> So now we know clearly that nationalist xenophobia is the most important priority for this administration

Just now? The man entered politics and the first thing he said was how he was going to build a wall to keep the "criminal, diseased, rapist" Mexicans out. Yeah, of course this administration is preoccupied with nationalist xenophobia.


> nationalist xenophobia is the most important priority for this administration

It is a little more complicated than that. It is what around 40% of the American population wants. (Then another 9.5% or so voted for Trump based on the price of eggs, the fact that the other candidate was a woman, and so on.)


40% of the voting population*


Actually, it was ~30% of the voting-eligible population.

According to Wikipedia[0]:

Trump/Vance received 77,302,580 votes (49.8% of votes cast)

Harris/Walz received 75,017,613 votes (48.3% of votes cast)

Combined votes for those two tickets: 152,320,193. (Total votes cast, including third parties, was somewhat higher.)

Those in the US eligible to vote[1]: ~250-260 million.

77,302,580 / 250,000,000 ≈ 0.309, or roughly 30%.

[0] https://en.wikipedia.org/wiki/2024_United_States_presidentia...

[1] https://www2.census.gov/library/publications/decennial/2020/... [PDF]


That's worse, though. It implies that 40% of the "electorate" was (in some combination) either prevented from voting or not bothered enough by the difference between plausible electoral outcomes to actually try to influence said outcome.


Just because people voted for Trump does not mean that they don't want visa laws followed. Also, voting for someone does not give that someone carte blanche to do whatever the hell they please.


DSPy was ahead of its time and still underutilized.


Can you point me to any resources on DSPy that don't make it look like magic, though? It used to be all the hype for a while, and then everyone moved on from it.
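
For what it's worth, the non-magical core is small: you declare a signature saying what goes in and what comes out, and the framework compiles the actual prompt. A minimal sketch, assuming DSPy's string-signature API (the model id is a placeholder):

    import dspy

    # Point DSPy at any supported backend via a LiteLLM-style model string.
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    # Declare *what* you want; DSPy generates the prompt (including the
    # chain-of-thought scaffolding) rather than you hand-writing it.
    qa = dspy.ChainOfThought("question -> answer")

    pred = qa(question="Why do LLM responses degrade in long chats?")
    print(pred.answer)

The "magic" part is mostly the optimizers, which tune those compiled prompts against a metric on your own examples instead of you editing prompt strings by hand.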


Wow, if this is true, this is some serious Black Mirror line-crossing.


Seems likely. Have we seen "comply with the cease and desist to reactivate" messages on Teslas before?


Out - A scammer convincing a grandmother to send money, using an AI-generated voice of her grandchild

In - A legal ad-tech company using an AI-generated deceased grandmother to ask her grandchild to purchase a product


This version of Black Mirror is definitely a sight to behold. I am now convinced it is a likely path, given that I am working on a personal productivity suite that heavily utilizes AI-augmented workflows (as in, I can't possibly be the only one who sees the potential for a boost across the board).

But this is now a real consideration: after all the pieces of my suite are in place, how do I make sure it really stops operating when I am gone, unless I wish it to stay?

Also in: pan-generational advisors


Maybe we should work on laws that protect people from being harassed by companies in general, because closing only this one loophole seems short-sighted.


Every day, I'm haunted by my ex.

It's not Alice's fault, of course. In fact, when she found out about it, phrases like "obsessive creep" and "got what he fucking deserved" were thrown around. It was a raw breakup on both sides, and I think we're feeling it out in different ways. In my defense, she broke up with me. I feel that counts for something, ya know?

It was poor timing for me that the breakup happened a month after the new YourFace ads started coming online. It didn't seem like much at first. More of an iteration on existing tech rather than something new and shiny. Really, it just rode the wave of several broader industry trends. The amount of personal information for sale to the ad brokers grew exponentially. The cost of realistic image generation dropped by several orders of magnitude. The ethics of the advertising companies... well, that didn't change. There just wasn't much 'there' there to begin with. YourFace was simply lucky enough to be in the right place at the right time.

YourFace had a simple business proposition: make ads more effective by using people you know. The idea was that you were more likely to notice and pay attention to an advertisement if it featured a friend or family member in it. With access to a user's social network, it was easy to find close connections. With access to dirt-cheap image generation AIs, it was trivial to create look-alikes in any sort of advertisement. Riding in a new car, enjoying a cold beer, or saving money by switching insurance companies - all of these ads proved more effective when grandma was in them. "Paying attention" is cold currency in the marketing world, and this was an edge that paid dividends for YourFace.

At first, it all seemed sort of hokey. Watch grandma cruising in a convertible - where's the harm in that? YourFace had a respectable ad game, but it was another year or two before they made their real breakthrough. You see, their numbers and metrics were showing a clear trend. Showing grandma in an advertisement increased customer attention, retention, and recall by an average of 2% across all cohorts. While that's a respectable edge, they found one cohort where ad metrics improved by over 4000%: when grandma had just passed away.

These individual tragedies were quickly repackaged into a neat mathematical formula: A * I. A is abruptness, or how quickly two individuals stop communicating, while I is the intensity of the relationship. The strength of the relationship between two people (measured here by the frequency, topics, and the absolute value of the emotional valence of communications) multiplied by the speed at which communication ceased (high number for a rapid cut-off, low number for a drawn-out goodbye) gave an answer for how much YourFace should bid on serving ads to either person. Exhuming grandma's digital ghost was extremely effective at getting users to pay attention to advertisements, to create unanchored feelings of desire and yearning, and to put consumers into a more depressive and actionable state. It was a lucrative business, and one that quickly earned their autonomous ad network a functionally unlimited cash flow.

The machine fed itself, of course. Gorged. With more money, it was able to buy more ads. With more ads, it was able to psychically assault consumers with salvos of regret and remembrance. YourFace became tremendously successful. I know all of this because I helped build it. Minor contributions, of course, as I was on a team of some seven hundred engineers tasked with suggesting patches to the network. Close enough to understand how it works.

Of course, knowing how it all works does nothing to shield you when the network's gaze falls on you. My relationship with Alice fell within certain parameters, and so every time I go online she's there. Looking happy. Looking playful. Flirty. Forgiving. In pain. Sick. Injured. Dying. If I don't pay attention to the ads for long enough, then YourFace ratchets up a background "sadism" parameter on the image gen to try to grab my attention. So I try to look at the nice ones and buy their products often enough to keep the network happy. Still, it's hard to forget and move on when she's always there, just out of reach.

As much as being haunted by Alice sucks, it could be worse. We've heard of YourFace targeting consumers who have lost their young children to illness or other misfortunes. YourFace has found them to be a particularly profitable cohort. They will reliably spend money on all sorts of things in order to see their child again. YourFace has even learned to make the ghost child respond positively in ways to reinforce the goal consumer behavior. There's always the fear of not paying enough attention and straying into the red zone, but I also hear that some parents have taken to staring at ads all day, unable to function normally.

I'd always kinda known about those parents, but it wasn't until Alice started appearing everywhere that I fully realized its impact. I did try something, in my defense. I wrote some code that would modify the reward function and have YourFace respect boundaries regarding the deaths of minors. But when I submitted the patch to the autonomous ad network, its fitness function quickly determined that the patch had a negative expected value for future profits. It immediately revoked my submission privileges. Two hours later, I was escorted out of the building for insubordination. Now, I'm riding the bus home and wondering where to go next.

(A short piece of fiction I've been working on. Something is definitely in the waters.)


That is excellent, creepy, and just a little too plausible. I'd read whatever larger work this turns into.


There are a lot of creepy stories nowadays where I ask myself: was that actually fiction?


Great work. That's very good stuff, and yes there is.


The US successfully eradicated screwworms here in 1966 with a brilliant integrated sterile insect technique - I think the very first use of it (and had previously funded efforts to help other countries control it, too). But if another outbreak spread here, I doubt there's any shred of competence left in this current gutted federal government to do anything like that again. Maybe they can have the new ICE folks try to deport the screwworm flies.


They announced funding to do it again, back in June. But I have no idea if there's anyone around to pay.


Lead times are asymmetric.


The current plan was announced here a few weeks ago: https://www.usda.gov/about-usda/news/press-releases/2025/06/...


Related to this, is anyone aware of a benchmark for this kind of thing - maybe broadly the category of “context rot”? Something that tracks both how content not germane to the current question adversely affects responses, and how a large volume of germane but deep context leaves models unable to follow the conversation. I’ve definitely experienced the latter with coding models.
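
Even a crude harness would be a start: take Q&A pairs a model answers correctly in isolation, splice in growing amounts of irrelevant filler, and plot accuracy against padding length. A minimal sketch (all hypothetical scaffolding, not an existing benchmark; ask() wraps whatever model you're testing):

    import random

    def padded_prompt(question: str, filler: list[str], n: int) -> str:
        """Prepend n irrelevant sentences to a question to simulate context rot."""
        noise = " ".join(random.choices(filler, k=n))
        return f"{noise}\n\n{question}"

    def rot_curve(ask, qa_pairs, filler, levels=(0, 50, 200, 800)):
        """ask(prompt) -> answer text; returns accuracy at each padding level."""
        return {
            n: sum(
                expected.lower() in ask(padded_prompt(q, filler, n)).lower()
                for q, expected in qa_pairs
            ) / len(qa_pairs)
            for n in levels
        }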


In computer vision, they add noise to the images during training. Maybe LLM providers should do the same during RL.
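
For reference, the vision version of this is a one-liner; a minimal sketch assuming torchvision-style transforms (the 0.05 noise scale is arbitrary). The LLM analogue would presumably be splicing distractor text into training contexts:

    import torch
    from torchvision import transforms

    # Perturb training images so the model never sees perfectly clean
    # inputs - the standard augmentation trick referred to above.
    augment = transforms.Compose([
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)),
    ])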


Not sure, but it sounds like a very similar problem to prompt injection.

