Gemini 2.5 Flash Image

fariszr · 2025-08-26T15:15:30 1756221330

This is the gpt 4 moment for image editing models. Nano banana aka gemini 2.5 flash is insanely good. It made a 171 elo point jump in lmarena!

Just search nano banana on Twitter to see the crazy results. An example. https://x.com/D_studioproject/status/1958019251178267111
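For a sense of scale on the 171-point figure, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the leaderboard score behaves like a classic Elo rating on the usual 400-point logistic scale; the arena's actual scoring method may differ, and `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the classic Elo logistic curve.

    Assumption: the leaderboard uses the standard 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


gap = 171  # the jump quoted in the comment above
print(f"Expected win rate at a {gap}-point gap: {expected_win_rate(gap):.3f}")
```

Under that assumption, a 171-point gap means the higher-rated model would be preferred in roughly 73% of blind pairwise votes.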

qingcharles · 2025-08-26T16:53:15 1756227195

I've been testing it for several weeks. It can produce results that are truly epic, but it's still a case of rerolling the prompt a dozen times to get an image you can use. It's not God. It's definitely an enormous step though, and totally SOTA.

spaceman_2020 · 2025-08-26T17:09:10 1756228150

If you compare to the amount of effort required in Photoshop to achieve the same results, still a vast improvement

qingcharles · 2025-08-26T17:19:50 1756228790

I work in Photoshop all day, and I 100% agree. Also, I just retried a task that wouldn't work last night on nano-banana and it worked first time on the released model, so I'm wondering if there were some changes to the released version?

spaceman_2020 · 2025-08-26T21:05:10 1756242310

We had an exhibition some time back where I used AI to generate the posters for our product. This is a side project and not something we do seriously, but the results were outstanding - better than what the majority of much bigger exhibitors had.

It took me a LOT of time to get things right, but if I was to get an actual studio to make those images, it would have cost me thousands of dollars.

Bombthecat · 2025-08-27T11:32:10 1756294330

Yeah, played around with it, it created an amazing poster for starfinder ttrpg (something like DND) with species who looked really good! Usually stuff like this fails hard, since there isn't much training data of unique fantasy creatures.

But flash 2.5? Worked! It did it, crazy stuff

Bombthecat · 2025-08-27T11:26:57 1756294017

How many times did you try? I uploaded a black and white photo and let it colourize, something like 20 percent were still black and white.

echelon · 2025-08-26T18:54:07 1756234447

Vibe coding might not be real, but vibe graphics design certainly is.

https://imgur.com/a/internet-DWzJ26B

Anyone can make images and video now.

cwmoore · 2025-08-27T04:11:20 1756267880

Are those oil derricks, or wind turbines? Who cares! Graphic design is easy now!

viraptor · 2025-08-27T10:24:11 1756290251

They're Australian farm windmills https://media.istockphoto.com/id/959193466/photo/australian-...

(But yeah, some got a generator attached...)

lebimas · 2025-08-26T20:37:34 1756240654

What tools did you use to make those videos from the PG image?

echelon · 2025-08-26T20:55:40 1756241740

I used a bunch of models in conjunction:

- Midjourney (background)

- Qwen Image (restyle PG)

- Gemini 2.5 Flash (editing in PG)

- Gemini 2.5 Flash (adding YC logo)

- Kling Pro (animation)

I didn't spend too much time correcting mistakes.

I used a desktop model aggregation and canvas tool that I wrote [1] to iterate and structure the work. I'll be open sourcing it soon.

[1] https://getartcraft.com

kstenerud · 2025-08-26T23:47:22 1756252042

The app looks interesting, but I think it needs some documentation. I think I generated something? Maybe? I saw a spinny thing for awhile, but then nothing.

I couldn't get the 3d thing to do much. I had assets in the scene but I couldn't for the life of me figure out how to use the move, rotate or scale tools. And the people just had their arms pointing outward. Are you supposed to pose them somehow? Maybe I'm supposed to ask the AI to pose them?

Inpainting I couldn't figure out either... It's for drawing things into an existing image (I think?) but it doesn't seem to do anything other than show a spinny thing for awhile...

I didn't test the video tool because I don't have a midjourney account.

unixhero · 2025-08-27T07:40:41 1756280441

What is PG?

sethaurus · 2025-08-27T07:46:44 1756280804

In this context, it's Paul Graham, the head Y Combinator guy whose cartoon likeness appears in the generated video: https://news.ycombinator.com/user?id=pg

fhd2 · 2025-08-27T07:46:12 1756280772

Paul Graham, Y Combinator founder.

spaceman_2020 · 2025-08-26T21:09:06 1756242546

Midjourney with style references is just about the easiest way right now for an absolute noob to get good aesthetics

bacchusracine · 2025-08-28T12:44:11 1756385051

This post may or may not violate our community standards so we aren't going to display it.

throwaway638637 · 2025-08-27T01:14:36 1756257276

What is up with that T rex's arms?

benreesman · 2025-08-27T14:02:16 1756303336

I think much like coding, the top of the game is all the old stuff and a bunch of new stuff that is impossible to master without some real math or at least outlier mathematical intuition.

The old top of the game is available to more people (though mid level people trying to level up now face a headwind in a further decoupling of easily read signals and true taste, making the old way of developing good taste harder).

This stuff makes people who were already "master rate" who are also nontrivially sophisticated machine learning hobbyists minimum and drives their peak and frontier out, drives break even collaboration overhead down.

It's always been possible to DIY code or graphic design, it's always been possible to tell the efforts of dabblers and pros apart, and unlike many commodities? There is rarely a "good enough". In software this is because compute is finite and getting more out of it pays huge, uneven returns, in graphic design it's because extreme quality work is both aesthetically pleasing as well as a mark of quality (imperfect but a statement someone will commit resources).

And it's just hard to see it being different in any field. Lawyers? Opposing counsel has the best AI, your lawyer better have it too. Doctors? No amount of health is "enough" (in general).

I really think HN in particular but to some extent all CNBC-adjacent news (CEO OnlyFans stuff of all categories) completely misses the forest (the gap between intermediate and advanced just skyrocketed) for the trees (space-filling commodity knowledge work just plummeted in price).

But "commodity knowledge work" was always kind of an oxymoron, David Graeber called such work "bullshit jobs". You kinda need it to run a massive deficit in an over-the-hill neoliberal society, it's part of the " shift from production to consumption" shell game. But it's a very recent, very brief thing that's already looking more than wobbly. Outside of that? Apprentices, journeymen, masters is the model that built the world.

AI enables a new even more extreme form of mastery, blurs the line between journeyman and dabbler, and makes taking on apprentices a much longer-term investment (one of many reasons the PRC seems poised to enjoy a brief hegemony before demographics do in the Middle Kingdom for good, in China, all the GPUs run Opus, none run GPT-5 or LLaMA Behemoth).

The thing I really don't get is why CEOs are so excited about this and I really begin to suspect they haven't as a group thought it through (Zuckerberg maybe has, he's offering Tulloch a billion): the kind of CEO that manages a big pile of "bullshit jobs"?

AI can do most of their job today. Claude Opus 4.1? It sounds like a mid-range CEO who was exhaustively researched and gaffe-immune. Ditto career machine politicians. AI non practitioner prognosticators. That crowd.

But the top graphic communications people and CUDA kernel authors? Now they have to master ComfyUI or whatever and the color theory to get anything from it that stands out.

This is not a democratizing thing. And I cannot see it accruing to the Zuckerberg side of the labor/capital divvy up without a truly durable police state. Zuck offering my old chums nation state salaries is an extreme and likely transitory thing, but we know exactly how software professional economics work when it buckets as "sorcery" and "don't bother": that's 1950 to whenever we mark the start of the nepohacker Altman Era, call it 2015. In that world good hackers can do whatever they want, whenever they want, and the money guys grit their teeth. The non-sorcery bucket has paper mache hack-magnet hackathon projects in it at a fraction of the old price. So disruption, wow.

Whether that's good or bad is a value judgement I'll save for another blog post (thank you for attending my TED Talk).

captnFwiffo · 2025-08-26T20:17:33 1756239453

Sure, now the client wants 130 edits without losing coherency with the original. What does a vibe designer do? Just keep re-prompting and re-generating until it works? Sounds hard to me.

Filligree · 2025-08-27T11:45:24 1756295124

They use Kontext, Qwen-Edit or Gemini.

petralithic · 2025-08-27T11:24:48 1756293888

Why would you compare it to Photoshop? If you compare it to other tools in the same category, of image generation, you will find models like Flux and Qwen do much better.

vitorgrs · 2025-08-27T05:14:18 1756271658

The model seems good, but it seems to have a huge issue with producing garbage most of the time lol.

Still needs more RLHF tuning I guess? As the previous version was even worse.

druskacik · 2025-08-26T17:12:40 1756228360

Is it because the model is not good enough at following the prompt, or because the prompt is unclear?

Something similar has been the case with text models. People write vague instructions and are dissatisfied when the model does not correctly guess their intentions. With image models it's even harder for the model to guess it right without enough details.

toddmorey · 2025-08-26T18:51:51 1756234311

Remember in image editing, the source image itself is a huge part of the prompt, and that's often the source of the ambiguity. The model may clearly understand your prompt to change the color of a shirt, but struggle to understand the boundaries of the shirt. I was just struggling to use AI to edit an image where the model really wanted the hat in the image to be the hair of the person wearing it. My guess for that bias is that it had just been trained on more faces without hats than with them on.

qingcharles · 2025-08-26T17:22:03 1756228923

No, my prompts are very, very clear. It just won't follow them sometimes. Also this model seems to prefer shorter prompts, in my experience.

ericlang · 2025-08-26T18:43:56 1756233836

How did you get early access? Thanks.

Thorrez · 2025-08-26T21:20:20 1756243220

I believe lmarena.

hapticmonkey · 2025-08-26T21:29:17 1756243757

Before AI, people complained that Google was taking world class engineering talent and using it for little more than selling people ads.

But look at that example. With this new frontier of AI, that world class engineering talent can finally be put to use…for product placement. We’ve come so far.

vineyardmike · 2025-08-27T06:44:04 1756277044

> finally be put to use…for product placement.

Did you think that Google would just casually allow their business to be disrupted without using the technology to improve the business and also protecting their revenue?

Both Meta and Google have indicated that they see Generative AI as a way to vertically integrate within the ad space, disrupting marketing teams, copywriters, and other jobs who monitor or improve ad performance.

Also FWIW, I would suspect that the majority of Google engineers don't work on an ad system, and probably don't even work on a profitable product line.

johnfn · 2025-08-27T11:55:17 1756295717

Oh come on - you have this incredible technology at your disposal and all you can think to use it for is product placement?

torginus · 2025-08-27T18:25:14 1756319114

I am pretty sure a lot of said engineering talent isn't actually contributing to AI but doing other stuff

torginus · 2025-08-26T19:44:57 1756237497

Another nitpick - the pink puffer jacket that got edited into the picture is not the same as the one in the reference image - it's very similar but if I were to use this model for product placement, or cared about these sort of details, I'd definitely have issues with this.

drmath · 2025-08-27T02:01:08 1756260068

Even in the just-photoshop-not-ai days product photos had become pretty unreliable as a means of understanding what you're buying. Of course it's much worse now.

ethbr1 · 2025-08-27T02:19:30 1756261170

Note: Please understand that monitor may color different. If image does not match product received then kindly your monitor calibration. Seller not responsible. /ebay&amazon

wiz21c · 2025-08-27T06:09:22 1756274962

look at the bottom of the sleeves, they don't match. the bottom of the jacket doesn't match either.

I didn't see it at first sight but it certainly is not the same jacket. If you use that as an advertisement, people can sue you for lying about the product.

dcre · 2025-08-26T15:58:44 1756223924

Alarming hands on the third one: it can't decide which way they're facing. But Gemini didn't introduce that, it's there in the base image.

725686 · 2025-08-26T19:19:03 1756235943

Yes, the base image's hands are creepy.

meatmanek · 2025-08-26T22:31:13 1756247473

I noticed the AI pattern on the sunglasses first. I guess all of the source images are AI-generated? In a sense, that makes the result slightly less impressive -- is it going to be as faithful to the original image when the input isn't already a highly likely output for an AI model? Were the input images generated with the same model that's being used to manipulate them?

dcre · 2025-08-27T15:32:48 1756308768

It doesn't seem to matter: people have posted tons of examples on social media of non-AI base images that it was equally able to hold steady while making edits.

ceroxylon · 2025-08-26T15:37:53 1756222673

It seems like every combination of "nano banana" is registered as a domain with their own unique UI for image generation... are these all middle actors playing credit arbitrage using a popular model name?

bonoboTP · 2025-08-26T15:55:03 1756223703

I'd assume they are just fake, take your money and use a different model under the hood. Because they already existed before the public release. I doubt that their backend rolled the dice on LMArena until nano-banana popped up. And that was the only way to use it until today.

ceroxylon · 2025-08-26T16:07:58 1756224478

Agreed, I didn't mean to imply that they were even attempting to run the actual nano banana, even through LMarena.

There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.

vunderba · 2025-08-26T17:02:05 1756227725

They're almost all scams. Nano banana AI image generator sites were showing up when this model was still only available in LM Arena.

93po · 2025-08-26T17:52:56 1756230776

Completely agree - I make logos for my github projects for fun, and the last time I tried SOTA image generation for logos, it was consistently ignoring instructions and not doing anything close to what i was asking for. Google's new release today did it near flawlessly, exactly how I wanted it, in a single prompt. A couple more prompts for tweaking (centering it, rotating it slightly) got it perfect. This is awesome.

ivape · 2025-08-26T20:35:00 1756240500

Regardless, it seems Google is on the frontier of every type of model and robotics (cars). It’s nutty how we forget what an intellectual juggernaut they are.

fariszr · 2025-08-26T21:00:19 1756242019

Tool use and sycophancy are still big issues in gemini 2.5 models.

summerlight · 2025-08-26T18:41:08 1756233668

I wonder what the creative workflow looks like when these kinds of models are natively integrated into digital image tools. Imagine fine-grained controls on each layer and their composition with the semantic understanding of the full picture.

koakuma-chan · 2025-08-26T15:46:01 1756223161

Why is it called nano banana?

ehsankia · 2025-08-26T16:44:18 1756226658

Before a model is announced, they use codenames on the arenas. If you look online, you can see people posting about new secret models and people trying to guess whose model it is.

mvdtnz · 2025-08-26T18:37:03 1756233423

What are "the arenas"?

patates · 2025-08-26T18:48:15 1756234095

Blind rating battlegrounds, one is https://lmarena.ai/ (first google result)

kstenerud · 2025-08-27T00:09:02 1756253342

I don't quite get what this is? I asked the AI on the site "What is imarena.ai?" and it just gave some hallucinated answer that made no sense.

adventured · 2025-08-27T01:14:12 1756257252

People vote on the performance of AI, generating ranking boards.

kstenerud · 2025-08-27T05:19:47 1756271987

Ah, that was the missing piece of information! Thanks!

Jensson · 2025-08-26T15:50:06 1756223406

Engineers often have silly project names internally, then some marketing team rewrites the name for public release.

ZephyrBlu · 2025-08-26T16:17:26 1756225046

I'm pretty sure it's because an image of a banana under a microscope generated by the model went super viral

polynomial · 2025-08-26T19:44:22 1756237462

Or was that just marketing?

rplnt · 2025-08-26T16:48:43 1756226923

Oh no, even more mis-scaled product images.

torginus · 2025-08-26T21:02:40 1756242160

No, it's not really that much of an improvement. Once you start coming up with specific tasks, it fails just like the others.

littlestymaar · 2025-08-27T11:59:00 1756295940

> An example. https://x.com/D_studioproject/status/1958019251178267111

“Nano banana” is probably good, given its score on the leaderboard, but the examples you show don't seem particularly impressive, it looks like what Flux Kontext or Qwen Image do well already.

polishdude20 · 2025-08-27T00:52:21 1756255941

The fingernails on one of them. Ohhh nooo

ethbr1 · 2025-08-27T02:21:43 1756261303

Image genai made me realize just how inattentive to detail a lot of people are.

goosejuice · 2025-08-27T02:28:22 1756261702

Yet it's failed spectacularly at almost everything I've given it.

r33b33 · 2025-08-27T08:40:06 1756284006

nano banana is good, but not insanely good

2025-08-27T06:20:32 1756275632

[dead]

mooncakes_ooohh · 2025-08-27T06:32:39 1756276359

Be gone scammer

fHr · 2025-08-26T20:47:51 1756241271

echelon · 2025-08-26T15:59:17 1756223957

> This is the gpt 4 moment for image editing models.

No it's not.

We've had rich editing capabilities since gpt-image-1, this is just faster and looks better than the (endearingly? called) "piss filter".

Flux Kontext, SeedEdit, and Qwen Edit are all also image editing models that are robustly capable. Qwen Edit especially.

Flux Kontext and Qwen are also possible to fine tune and run locally.

Qwen (and its video gen sister Wan) are also Apache licensed. It's hard not to cheer Alibaba on given how open they are compared to their competitors.

We've left the days of Dall-E, Stable Diffusion, and Midjourney of "prompt-only" text to image generation.

It's also looking like tools like ComfyUI are less and less necessary as those capabilities are moving into the model layer itself.

raincole · 2025-08-26T16:03:34 1756224214

In other words, this is the gpt 4 moment for image editing models.

Gpt4 isn't "fundamentally different" from gpt3.5. It's just better. That's the exact point the parent commenter was trying to make.

jug · 2025-08-26T20:18:02 1756239482

I'd say it's more like comparing Sonnet 3.5 to Sonnet 4. GPT-4 was a rather fundamental improvement. It jumped to professional applications compared to the only casual use you could put ChatGPT 3.5 to.

retinaros · 2025-08-26T16:06:11 1756224371

did you see the generated pic demis posted on X? it looks like slop from 2 years ago. https://x.com/demishassabis/status/1960355658059891018

raincole · 2025-08-26T16:11:29 1756224689

I've tested it on Google AI Studio since it's available to me (which is just a few hours so take it with a grain of salt). The prompt comprehension is uncannily good.

My test is going to https://unsplash.com/s/photos/random, picking two random images, and sending them both with "integrate the subject from the second image into the first image" as the prompt. I think Gemini 2.5 is doing far better than ChatGPT (admittedly ChatGPT was the trailblazer on this path). FluxKontext seems unable to do that at all. Not sure if I was using it wrong, but it always only considers one image at a time for me.

Edit: Honestly it might not be the "gpt4 moment." It's better at combining multiple images, but now I don't think it's better at understanding elaborate text prompts than ChatGPT.
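The two-image test described above can be sketched as a request body. This is only an illustration: it builds a JSON payload in the shape I recall from the public Gemini REST examples (`contents` / `parts` / `inline_data`), which should be checked against the current documentation before use. The function name, the JPEG assumption, and the part ordering (prompt first, then the two images) are all assumptions, and nothing is sent over the network.

```python
import base64
import json


def build_two_image_request(prompt: str, first_jpeg: bytes, second_jpeg: bytes) -> dict:
    """Build a request body carrying one text prompt and two inline JPEG images.

    Assumption: field names follow the Gemini REST generateContent examples.
    """

    def image_part(data: bytes) -> dict:
        # Inline images are carried as base64 text alongside their MIME type.
        return {
            "inline_data": {
                "mime_type": "image/jpeg",
                "data": base64.b64encode(data).decode("ascii"),
            }
        }

    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    image_part(first_jpeg),   # the scene to edit
                    image_part(second_jpeg),  # the subject to integrate
                ]
            }
        ]
    }


# Placeholder bytes stand in for two downloaded photos.
body = build_two_image_request(
    "integrate the subject from the second image into the first image",
    b"first-image-bytes",
    b"second-image-bytes",
)
print(json.dumps(body)[:80])
```

Keeping the images as separate parts, rather than stitching them together, is what lets the prompt refer to "the first image" and "the second image".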

echelon · 2025-08-27T00:51:13 1756255873

> FluxKontext

Flux Kontext is an editing model, but the set of things it can do is incredibly limited. The style of prompting is very bare bones. Qwen (Alibaba) and SeedEdit (ByteDance) are a little better, but they themselves are nowhere near as smart as Gemini 2.5 Flash or gpt-image-1.

Gemini 2.5 Flash and gpt-image-1 are in a class of their own. Very powerful instructive image editing with the ability to understand multiple reference images.

> Edit: Honestly it might not be the "gpt4 moment." It's better at combining multiple images, but now I don't think it's better at understanding elaborated text prompt than ChatGPT.

Both gpt-image-1 and Gemini 2.5 Flash feel like "Comfy UI in a prompt", but they're still nascent capabilities that get a lot wrong.

When we get a gpt-image-1 with Midjourney aesthetics, better adherence and latency, then we'll have our "GPT 4" moment. It's coming, but we're not there yet.

They need to learn more image editing tricks.

krackers · 2025-08-26T18:31:25 1756233085

I'm confused as well, I thought gpt-image could already do most of these things, but I guess the key difference is that gpt-image is not good for single point edits. In terms of "wow" factor it doesn't feel as big as gpt 3->4 though, since it sure _felt_ like models could already do this.

echelon · 2025-08-26T18:56:25 1756234585

People really slept on gpt-image-1 and were too busy making Miyazaki/Ghibli images.

I feel like most of the people on HN are paying attention to LLMs and missing out on all the crazy stuff happening with images and videos.

LLMs might be a bubble, but images and video are not. We're going to have entire world simulation in a few years.

fariszr · 2025-08-26T20:27:18 1756240038

I'm sorry I absolutely don't agree. This model is on a whole other level.

It's not even close. https://twitter.com/fareszr/status/1960436757822103721

bsenftner · 2025-08-27T12:54:08 1756299248

I'm totally with you. Dismayed by all these fanbois.

vunderba · 2025-08-26T16:49:39 1756226979

I've updated the GenAI Image comparison site (which focuses heavily on strict text-to-image prompt adherence) to reflect the new Google Gemini 2.5 Flash model (aka nano-banana).

https://genai-showdown.specr.net

This model gets 8 of the 12 prompts correct and easily comes within striking distance of the best-in-class models Imagen and gpt-image-1 and is a significant upgrade over the old Gemini Flash 2.0 model. The reigning champ, gpt-image-1, only manages to edge out Flash 2.5 on the maze and 9-pointed star.

What's honestly most astonishing to me is how long gpt-image-1 has remained at the top of the class - closing in on half a year which is basically a lifetime in this field. Though fair warning, gpt-image-1 is borderline useless as an "editor" since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.

Comparison of gpt-image-1, flash, and imagen.

https://genai-showdown.specr.net?models=OPENAI_4O%2CIMAGEN_4...

bla3 · 2025-08-26T19:17:03 1756235823

Why do Hunyuan, OpenAI 4o and Qwen get a pass for the octopus test? They don't cover "each tentacle", just some. And midjourney covers 9 of 8 arms with sock puppets.

vunderba · 2025-08-26T19:22:47 1756236167

Good point. I probably need to adjust the success pass ratios to be a bit stricter, especially as the models get better.

> midjourney covers 9 of 8 arms with sock puppets.

Midjourney is shown as a fail so I'm not sure what your point is. And those don't even look remotely close to sock puppets, they resemble stockings at best.

bn-l · 2025-08-26T18:11:57 1756231917

You need a separate benchmark for editing of course

cubefox · 2025-08-26T23:36:46 1756251406

What's interesting is that Imagen 4 and Gemini 2.5 Flash Image look suspiciously similar in several of these test cases. Maybe Gemini 2.5 Flash first calls Imagen in the background to get a detailed baseline image (diffusion models are good at this) and then Gemini edits the resulting image for better prompt adherence.

pkach · 2025-08-27T11:32:16 1756294336

Yes, I saw a Reddit post where an employee confirmed this is the case (at least in the Gemini app): the request for an image from scratch is routed to Imagen and the follow-up edits are done using Gemini.

MrOrelliOReilly · 2025-08-27T08:31:07 1756283467

This is incredibly useful! I was manually generating my own model comparisons last night, so great to see this :)

I will note that, personally, while adherence is a useful measure, it does miss some of the qualitative differences between models. For your "spheron" test for example, you note that "4o absolutely dominated this test," but the image exhibits all the hallmarks of a ChatGPT-generated image that I personally dislike (yellow, with veiny, almost impasto brush strokes). I have stopped using ChatGPT for image generation altogether because I find the style so awful. I wonder what objective measures one could track for "style"?

It reminds me a bit of ChatGPT vs Claude for software development... Regardless of how each scores on benchmarks, Claude has been a clear winner in terms of actual results.

vunderba · 2025-08-27T18:30:07 1756319407

Yeah - unfortunately the ubiquitous "piss filter" strikes again. You pretty much have to pass GPT-image-1 through a tone map, LUT, etc. in something like Krita or Photoshop to try to mitigate this. I'm honestly a bit surprised that they haven't built this in already given how obvious the color shift is.
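As a rough illustration of the kind of color correction meant here, the sketch below applies a gray-world white balance: each channel is scaled so the three channel means match. This is a stand-in technique chosen for brevity, not the tone map or LUT workflow the comment refers to, and the pixel values are made up. Real images would need an imaging library; plain lists of RGB tuples keep the example self-contained.

```python
def gray_world_balance(pixels):
    """Scale each RGB channel so all three channel means equal their average.

    A warm (yellow) cast shows up as a low blue mean, so blue gets a gain
    above 1 while red and green are pulled down.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    target = sum(means) / 3
    gains = [target / m if m > 0 else 1.0 for m in means]
    # Clamp to the 8-bit range after applying the per-channel gain.
    return [
        tuple(min(255, round(p[c] * gains[c])) for c in range(3))
        for p in pixels
    ]


# Made-up pixels with a yellow cast (blue well below red and green).
warm = [(200, 180, 120), (220, 200, 140), (180, 160, 100)]
balanced = gray_world_balance(warm)
print(balanced)
```

Gray-world assumes the scene should average out to neutral gray, so it can overcorrect images that are legitimately warm; a hand-tuned LUT avoids that at the cost of manual work.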

gundmc · 2025-08-26T18:11:54 1756231914

> Though fair warning, gpt-image-1 is borderline useless as an "editor" since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.

Came into this thread looking for this post. It's a great way to compare prompt adherence across models. Have you considered adding editing capabilities in a similar way given the recent trend of inpainting-style prompting?

vunderba · 2025-08-26T19:36:35 1756236995

Adding a separate section for image editing capabilities is a great idea.

I've done some experimentation with Qwen and Kontext and been pretty impressed, but it would be nice to see some side by sides now that we have essentially three models that are capable of highly localized in-painting without affecting the rest of the image.

https://mordenstar.com/blog/edits-with-kontext

dostick · 2025-08-28T09:49:25 1756374565

For editing prompts testing it is best to start with “only change …” to prevent model from changing everything. Even Nano banana does that.

jay_kyburz · 2025-08-26T22:06:19 1756245979

I really like your site.

Do you know of any similar sites that compare how well the various models can adhere to a style guide? Perhaps you could add this?

I.e. provide the model with a collection of drawings in a single style, then follow prompts and generate images in the same style?

For example if you wanted to illustrate a book, and have all the illustrations look like they were from the same artist.

vunderba · 2025-08-27T18:33:51 1756319631

Hi Jay, unfortunately I haven't seen a site like that, but being able to rank models in terms of "style adherence" would be a nice feature.

It's basically a necessity if you're working on something like a game or comic where you need consistency around characters, sprites, etc.

mrcwinn · 2025-08-27T15:47:22 1756309642

I really enjoyed reviewing this! Good work.

carlosbaraza · 2025-08-26T21:32:00 1756243920

Unfortunately, it suffers from the same safetyism as many other releases. Half of the prompts get rejected. How can you have character consistency if the model is forbidden from editing any human? And most of my photo editing involves humans, so basically this is just a useless product. I get that Google doesn't want to be responsible for deep fake advances, but that seems inevitable, so this is just slightly delaying progress. Eventually we will have to face it and allow for society to adapt.

This trend of tools that point a finger at you and set guardrails is quite frustrating. We might need a new OSS movement to regain our freedom.

Workaccount2 · 2025-08-26T22:08:00 1756246080

I have an old photo of my girlfriend with her cousin when they were young, wearing Christmas dresses in front of the tree, not long before they were separated to other sides of the world for decades now. The photo itself is low quality, on top of the print being physically beat up.

So far no model is willing to clean it up :/

gaudystead · 2025-08-26T23:39:31 1756251571

There are reddit communities (I admittedly don't remember which, but could probably be found from a simple search) where people will offer their photo editing skills to touch up the photo, often for free. Could be worth trying a real human if the robots are going full HAL 9000 and telling you they can't do it.

boulos · 2025-08-27T08:05:53 1756281953

https://www.reddit.com/r/PhotoshopRequest/

People sometimes do it for free ("my son died, and this is the only photo I have") or for an agreed upon tip.

boppo1 · 2025-08-27T15:40:07 1756309207

If you are not personally offended by looking at CRAZY pornography, you could start digging into the comfyui ecosystem. It's not all porn, there are lots of pro photo-manipulators doing sfw stuff, but the community overlap with NSFW is basically borderless, so you'll probably bump into it.

However, the results the comfyui people get are lightyears ahead of any oneshot-prompt model. Either you can find someone to do cleanup for you (should be trivial, I wouldn't pay more than $10-15) or if you have good specs for inference you could learn to do it yourself.

AuryGlenz · 2025-08-27T04:41:52 1756269712

If you have a decent GPU Qwen Edit can probably do it and certainly won’t refuse.

Keep in mind no editing model is magic and if the pixels just aren’t there for their faces it’s essentially going to be making stuff up.

yfontana · 2025-08-27T07:57:39 1756281459

Open source models like Flux Kontext or Qwen image edit wouldn't refuse, but you need to either have a sufficiently strong GPU or get one in the cloud (not difficult nor expensive with services like runpod), then set up your own processing pipeline (again, not too difficult if you use ComfyUI). Results won't be SOTA, but they shouldn't be too far off.

danpalmer · 2025-08-27T03:57:29 1756267049

I've done ~20 prompts so far and haven't had one rejected. What sort of things are you asking it to do? I've tried things like changing clothing and accessories on people.

carlosbaraza · 2025-08-27T07:21:08 1756279268

Basic things like: "{uploaded image of a man} can you remove the glasses?" or "make everyone in the picture smile" or "open the eyes of everyone in the photo". Nothing that a human would consider "unsafe". I am based in EU and using Google AI Studio with all safety toggles set to "Off".

danpalmer · 2025-08-28T02:38:28 1756348708

Strange. I wouldn't have thought the safety rules would differ by region, at least not for things like that. I uploaded a photo and asked to change the glasses and change the shirt and it did both with no problem.

I just went back to the chat and asked it to remove the glasses and it worked. Asking it to remove the shirt also succeeded, although a) this is a head and shoulders photo so nothing NSFW, and b) it didn't do a great job of guessing what my shoulders look like.

technofiend · 2025-08-27T18:23:39 1756319019

For a joke between friends I had it take my selfie and make me a bald Catholic priest and then add hair to a friend who is bald. No refusals, although those are pretty tame. In contrast to the quality images nano-banana produced, Copilot removed my glasses and made my eyes brown.

simedw · 2025-08-27T09:10:08 1756285808

I noticed that I get far fewer refusals when I set my VPN to the USA.

mudkipdev · 2025-08-26T21:51:12 1756245072

I was using Veo two days ago when video generations were free. I removed all words that sounded even remotely bad, but it still refused. Eventually gave up but now I'm thinking it's because I tried to generate myself

minimaxir · 2025-08-26T22:11:40 1756246300

There is one thing Gemini 2.5 Flash Image can do that no other edit model can do: incorporate multiple images simultaneously without shenanigans, thanks to its multimodality. For example, with Flux Kontext, if you want to "put the person in the first image into the second image", you have to concatenate them pre-VAE, which can be unwieldy; this model doesn't have that issue. You can even incorporate more than two images, but that may cause too much chaos.

In quick testing, prompt adherence does appear to be much better for massive prompts, and the syntactic sugar does appear to be more effective. There are other tricks not covered which I suspect may allow more control, but I'm still testing.

Given that generations are at the same price as its competitors, this model will shake things up.

blinding-streak · 2025-08-26T22:31:53 1756247513

I very much enjoy this feature. My next door neighbor is on vacation, and I'm feeding his fish for him. I took a picture of the fish tank and asked Gemini to put the fish tank at various local tourist attractions in my city, as if we're going on day trips.

I send him one photo a day and he's been loving it. Just a fun little thing to put a smile on his face (and mine).

AuryGlenz · 2025-08-27T04:40:05 1756269605

Fun fact - I trained a lora on our almost-toddler at the time on SDXL and generated images of her doing dangerous things to send to my wife the first day she had a trip away from us.

It was all fun and games until the little shit crawled out of our doggy door for the first and only time when I was going to the bathroom. As I was looking for her I got a notification we were in a tornado warning.

Luckily the dog knew where she had gone and led me to her. She had crawled down our (3-step) deck and across our yard, and was standing looking up at the angry clouds.

ojr · 2025-08-27T02:34:20 1756262060

It can't put two images of people together in one photo; this model still has that issue. Also, I have seen cases where Flux Kontext works better at things like removing objects.

dsrtslnd23 · 2025-08-27T06:17:26 1756275446

gpt-image-1 works with multiple input images. I even had good success with >4 images.

notsylver · 2025-08-26T15:15:39 1756221339

I digitised our family photos but a lot of them were damaged (shifted colours, spills, fingerprints on film, spots) that are difficult to correct for so many images. I've been waiting for image gen to catch up enough to be able to repair them all in bulk without changing details, especially faces. This looks very good at restoring images without altering details or adding them where they are missing, so it might finally be time.

Almondsetat · 2025-08-26T15:41:42 1756222902

All of the defects you have listed can be fixed automatically by using a film scanner with ICE and software that automatically performs the scan and the restoration, like Vuescan. Feeding hundreds (thousands?) of photos to an experimental proprietary cloud AI that will give you back subpar compressed pictures with who knows how many strange artifacts seems unnecessary.

notsylver · 2025-08-26T16:02:44 1756224164

I scanned everything into 48-bit RAW and treat those as the originals, including the IR scan for ICE and a lower quality scan of the metadata. The problem is sharing them. Important images I manually repair and export as JPEG, which is time consuming (15-30 minutes per image, and there are about 14000 in total), so if it's "generic family gathering picture #8228" I would rather let AI repair it, assuming it doesn't butcher faces and other important details. Until then I made a script that exports the raws with basic cropping and colour correction, but it can't fix the colours, which is the biggest issue.

exe34 · 2025-08-26T17:13:12 1756228392

this reminds me of a joke we used to tell as kids when there was a new Photoshop version coming out - "this one will remove the cow from the picture and we'll finally see what great-grandpa looked like!"

wingworks · 2025-08-26T20:09:46 1756238986

How did you get the 48-bit and ICE data separately? Did you double scan everything?

I'm scanning my parents photos at the moment.

wingworks · 2025-08-26T20:17:15 1756239435

Vuescan is terrible. SilverFast has better defaults. But nothing beats the original Nikon Scan software when using ICE. It does a great job of removing dust, fingerprints, etc., even when you zoom in. Compare that with what iSRD does in SilverFast: if you zoom in and compare the two, iSRD kinda smooshes/blurs the infrared defects, whereas Nikon Scan clones the surrounding parts, which usually looks very good when zooming in.

Both the SilverFast and Nikon Scan methods look great when zoomed out. I never tried Vuescan's infrared option. I just felt the positive colors it produced look wrong/"dead".

reaperducer · 2025-08-26T16:16:55 1756225015

I've been waiting for image gen to catch up enough to be able to repair them all in bulk without changing details, especially faces.

I've been waiting for that, too. But I'm also not interested in feeding my entire extended family's visual history into Google for it to monetize. It's wrong for me to violate their privacy that way, and also creepy to me.

Am I correct to worry that any pictures I send into this system will be used for "training?" Is my concern overblown, or should I keep waiting for AI on local hardware to get better?

Zopieux · 2025-08-26T21:47:03 1756244823

You're looking for Flux Kontext, a model you can run yourself offline on a high end consumer GPU. Performance and accuracy are okay, not groundbreaking, but probably enough for many needs.

bjackman · 2025-08-26T17:41:02 1756230062

I don't really understand the point of this usecase. Like, can't you also imagine what the photos might look like without the damage? Same with AI upscaling in phone cameras... if I want a hypothetical idea of what something in the distance might look like, I can just... imagine it?

I think we will eventually have AI-based tools that just do what a skilled human user would do in Photoshop, via tool use. This would make sense to me. But just having AI generate a new image with imagined details seems like a waste of time.

bibabaloo · 2025-08-26T23:21:10 1756250470

Why take photos at all if you can just imagine them?

bjackman · 2025-08-27T06:41:12 1756276872

Well, that goes to the heart of my point. I take pictures because I value how literal they are. I enjoy the fact that they directly capture the arrangement of light in the moment I took them. That

So yeah, if I'm gonna then upscale them or "repair" them using generative AI, then it's a bit pointless to take them in the first place.

gretch · 2025-08-27T16:22:48 1756311768

If you want 2 people to look at the same photo and share the same experience, you have to fix the photo.

If you leave it to imagination, it's likely they each imagine something different.

w4yai · 2025-08-26T19:02:45 1756234965

Not everyone has a great imagination.

Filligree · 2025-08-26T19:20:32 1756236032

Read up on aphantasia.

zwog · 2025-08-26T15:41:31 1756222891

Do you happen to know of some software to repair/improve video files? I'm in the process of digitizing a couple of Video 2000 and VHS cassettes of childhood memories for my mom, who has started suffering from dementia. I have a pretty streamlined setup for digitizing the videos but I'd like to improve the quality a bit.

nycdatasci · 2025-08-26T16:35:46 1756226146

I've used products from topazlabs.com for the same problem and have generally been happy with them.

qingcharles · 2025-08-26T16:55:22 1756227322

Topaz is probably the SOTA in video restoration, but it can definitely fuck shit up. Use carefully and sparingly and check all the output for weird AI glitches.

actionfromafar · 2025-08-26T16:27:09 1756225629

VHSdecode if you want a rabbit hole.

notsylver · 2025-08-26T16:07:24 1756224444

I didn't do any videos, just pictures, but considering how little I found for pictures I doubt you'll find much

Barbing · 2025-08-26T15:42:06 1756222926

Hope it works well for you!

In my eyes, one specific example they show (“Prompt: Restore photo”) deeply AI-ifies the woman’s face. Sure, it’ll improve over time, of course.

notsylver · 2025-08-26T16:23:06 1756225386

I tried a dozen or so images. For some it definitely failed (altering details, leaving damage behind, needing a second attempt to get a better result) but on others it did great. With a human in the loop approving the AI version or marking it for manual correction I think it would save a lot of time.

This is the first image I tried:

https://i.imgur.com/MXgthty.jpeg (before)

https://i.imgur.com/Y5lGcnx.png (after)

Sure, I could manually correct that quite easily and would do a better job, but that image is not important to us, it would just be nicer to have it than not.

I'll probably wait for the next version of this model before committing to doing it, but it's exciting that we're almost there.
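The human-in-the-loop idea above (approve the AI version, retry, or mark it for manual correction) could be sketched roughly like this. It is only an illustration: `Photo`, `Decision`, `triage`, and the `restore`/`review` callbacks are hypothetical names, and `restore` stands in for whatever model call you would actually use.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"  # keep the AI-restored version
    RETRY = "retry"      # ask the model for another attempt
    MANUAL = "manual"    # give up on AI and repair by hand


@dataclass
class Photo:
    name: str
    attempts: int = 0


def triage(photos, restore, review, max_attempts=2):
    """Sort photos into AI-approved results and a manual-repair queue.

    restore(photo) -> candidate  : hypothetical call to a restoration model
    review(photo, candidate)     : a human reviewer returning a Decision
    """
    approved, manual = [], []
    for photo in photos:
        while True:
            photo.attempts += 1
            candidate = restore(photo)
            decision = review(photo, candidate)
            if decision is Decision.APPROVE:
                approved.append((photo.name, candidate))
                break
            # Either the reviewer asked for manual work, or retries ran out.
            if decision is Decision.MANUAL or photo.attempts >= max_attempts:
                manual.append(photo.name)
                break
    return approved, manual
```

The originals are never touched here; the only output is a list of accepted candidates and a list of names that still need hand repair.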

qingcharles · 2025-08-26T16:58:43 1756227523

Being pragmatic, the after is a good restoration. There is nothing really lost (except some sharpness that could be put back). The main failing of AI is on faces because our brains are so hardwired to see any changes or weirdness. This is the sort of image that is perfect for AI because the subject's face is already occluded.

indigodaddy · 2025-08-26T15:56:00 1756223760

Another question/concern for me: if I restore an old picture of my Gramma, will my Gramma (or a Gramma that looks strikingly similar) ever pop up on other people's "give me a random Gramma" prompts?

Barbing · 2025-08-27T13:53:59 1756302839

It might show her for prompts of “show me the world’s best grandma” :)

On the free tier, I’d essentially believe that to be the default behavior. In reality they might simply use your feedback and your text prompts instead. I certainly know that free Google/OpenAI LLM usage entails prompts being used for research.

Edit: decent chance it would NOT directly integrate grandma into its training, but I would try hard to use an offline model if you have any privacy concerns.

danielbln · 2025-08-26T15:42:22 1756222942

That time had arrived a few months ago already with Flux Kontext (https://bfl.ai/models/flux-kontext).

atleastoptimal · 2025-08-27T01:34:58 1756258498

I can imagine an automated blackmail bot that scrapes image, video, voice samples from anyone with the most meagre online presence, which then creates high resolution videos of that person doing the most horrid acts, then threatening to share those videos with that person's family, friends and business contacts unless they are paid $5000 in a cryptocurrency to an anonymous address.

And further, I can imagine some person actually having such footage of themselves being threatened to be released, then using the former narrative as a cover story were it to be released. Is there anything preventing AI-generated images, video, etc. from always being detectable by software that can intuit if something is AI? What if random noise is added: would the "is AI" signal persist just as much as the indication to a human that the footage seems real?

shibeprime · 2025-08-27T02:32:12 1756261932

I’m more bullish on cryptographic receipts than on AI detectors. Capture signing (C2PA) plus an identity bind could give verifiable origin. The hard parts, in my view, are adoption and platform plumbing.

If we have a trustworthy way to verify proof-of-human-made content, then anything missing those creds would be a red flag.

https://iptc.org/news/googles-pixel-10-phone-supports-c2pa-u...
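To make the "cryptographic receipt" idea concrete, here is a toy sketch. It is NOT C2PA: real capture signing uses certificate-based asymmetric signatures over a manifest, whereas this uses an HMAC with a shared secret purely because it is in the Python standard library. `sign_capture`, `verify_capture`, and `device_key` are hypothetical names.

```python
import hashlib
import hmac


def sign_capture(image_bytes: bytes, device_key: bytes) -> str:
    """Return a hex receipt binding the exact image bytes to a device key.

    Stand-in for a real signature scheme; HMAC-SHA256 is used only for brevity.
    """
    return hmac.new(device_key, image_bytes, hashlib.sha256).hexdigest()


def verify_capture(image_bytes: bytes, receipt: str, device_key: bytes) -> bool:
    """Check that the bytes are unchanged since the receipt was issued."""
    expected = sign_capture(image_bytes, device_key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, receipt)


if __name__ == "__main__":
    key = b"hypothetical-device-secret"
    original = b"raw sensor data"
    receipt = sign_capture(original, key)
    print(verify_capture(original, receipt, key))           # unmodified bytes
    print(verify_capture(original + b"edit", receipt, key))  # any edit breaks it
```

A receipt like this only shows that the bytes have not changed since signing. It says nothing about whether the scene in front of the sensor was real.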

arsome · 2025-08-27T15:19:43 1756307983

This seems absolutely silly. It's not hard to take a photo of a photo, and there are both analog (building a lightbox) and digital (modifying the sensor input) means which would make this entirely trivial to spoof.

goosejuice · 2025-08-27T01:57:32 1756259852

SynthID claims to be designed to persist through several methods of modification. I suspect such attacks you mention will happen, but by those with deep pockets. Like a nation-state actor with access to models that don't produce watermarks.

UltraSane · 2025-08-27T04:32:14 1756269134

But these new amazing AI image generators let you just say "It wasn't me, it is an AI fake". Long term they will seriously devalue blackmail material.

I read a scifi novel where they invented a wormhole that only light could pass through but it could be used as a camera that could go anywhere and eventually anytime and there was absolutely no way to block it. So some people adapted to this fact by not wearing clothes anymore.

SirFredman · 2025-08-27T07:25:35 1756279535

The Light of Other Days, by Arthur C. Clarke and Stephen Baxter. Really cool book.

Revisional_Sin · 2025-08-27T05:58:25 1756274305

> So some people adapted to this fact by not wearing clothes anymore.

Erm... What?

UltraSane · 2025-08-27T12:33:33 1756298013

Because anyone could use the wormhole camera to see anyone naked. It made modesty effectively impossible.

geysersam · 2025-08-28T02:18:37 1756347517

Don't know why you're being downvoted. That is the logical conclusion.

Although, there's also a chance that those "blackmail gangs" never materialize. After all, you could already ten years ago pay cheap labor to create reasonably good fake images using Photoshop.

atonse · 2025-08-29T04:47:57 1756442877

I’m also wondering if the opposite will be true. That people might claim something is AI generated to discredit it?

m3kw9 · 2025-08-27T15:29:05 1756308545

If you are willing to pay once, you will be re-targeted. Just like Facebook ads

crustaceansoup · 2025-08-26T16:23:39 1756225419

I tried to reproduce the fork/spaghetti example and the fashion bubble example, and neither looks anything like what they present. The outputs are very consistent, too. I am copying/pasting the images out of the advertisement page so they may be lower resolution than the original inputs, but otherwise I'm using the same prompts and getting a wildly different result.

It does look like I'm using the new model, though. I'm getting image editing results that are well beyond what the old stuff was capable of.

mortenjorck · 2025-08-26T16:57:02 1756227422

The output consistency is interesting. I just went through half a dozen generations of my standard image model challenge, (to date I have yet to see a model that can render piano keyboard octaves correctly, and Gemini 2.5 Flash Image is no different in that regard), and as best I can tell, there are no changes at all between successive attempts: https://g.co/gemini/share/a0e1e264b5e9

This is in stark contrast to ChatGPT, where an edit prompt typically yields both requested and unrequested changes to the image; here it seems to be neither.

BoorishBears · 2025-08-26T19:15:55 1756235755

Flash 2.0 Image had the same issue: it does better than gpt-image for maintaining consistency in edits, but that also introduces a gap where sometimes it gets "locked in" on a particular reference image and will struggle to make changes to it.

In some cases you'll pass in multiple images + a prompt and get back something that's almost visually indistinguishable from just one of the images and nothing from the prompt.

crustaceansoup · 2025-08-26T19:11:31 1756235491

Wildly different and subjectively less "presentable", to be clear. The fashion bubble just generates a vague bubble shape with the subject inside it instead of the "subject flying through the sky inside a bubble" presented on the site. The other case just adds the fork to the bowl of spaghetti. Both are reproducible.

Arguably they follow the prompt better than what Google is showing off, but at the same time look less impressive.

skybrian · 2025-08-26T16:55:13 1756227313

Like most image generators, it didn’t pass the piano keyboard test. (Black keys are wrong.)

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

joombaga · 2025-08-26T16:59:56 1756227596

What is the piano keyboard test? Your link requires granting AI Studio access to Google Drive, which I do not want to do.

raincole · 2025-08-26T17:08:39 1756228119

Just ask it to generate a correct piano keyboard. It's something the current gen of image generator AIs fail at.

ZiiS · 2025-08-26T19:26:56 1756236416

Do most humans pass?

raincole · 2025-08-26T20:13:59 1756239239

Most humans fail at 4-digit multiplication, or drawing a cube in perspective.

phainopepla2 · 2025-08-26T19:56:02 1756238162

Presumably most humans with a camera do

adzm · 2025-08-26T19:38:44 1756237124

2-2-1-2-2-2-1

polynomial · 2025-08-26T19:58:43 1756238323

I still feel like most humans would fail, haha.

twodave · 2025-08-27T02:29:44 1756261784

Maybe, but anyone who knows what a chromatic scale is should be able to reason it out. E# == F, B# == C, so no black keys between those.
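To spell that reasoning out, here is a small sketch that derives the key layout of one octave from the 2-2-1-2-2-2-1 step pattern quoted above. The function names are made up for illustration; the only assumption is that the octave starts on C.

```python
# Whole/half-step pattern of a major scale (in semitones), starting on C.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]


def white_key_offsets(steps=MAJOR_STEPS):
    """Semitone offsets of the white keys within one octave starting at C."""
    offsets = [0]
    for step in steps[:-1]:  # the last step lands on the next octave's C
        offsets.append(offsets[-1] + step)
    return offsets


def octave_layout(steps=MAJOR_STEPS):
    """One character per semitone: 'W' for a white key, 'B' for a black key."""
    whites = set(white_key_offsets(steps))
    return "".join("W" if i in whites else "B" for i in range(sum(steps)))


def black_key_groups(layout):
    """Sizes of the black-key clusters, split wherever two white keys touch."""
    groups, count = [], 0
    # Pair each key with its right-hand neighbour, wrapping B back around to C.
    for key, neighbour in zip(layout, layout[1:] + layout[0]):
        if key == "B":
            count += 1
        if key == "W" and neighbour == "W":  # E-F or B-C: no black key between
            groups.append(count)
            count = 0
    return groups


if __name__ == "__main__":
    layout = octave_layout()
    print(layout)                    # WBWBWWBWBWBW
    print(black_key_groups(layout))  # [2, 3]
```

So a correct keyboard repeats a group of two black keys followed by a group of three, which is exactly the detail image models tend to get wrong.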

Workaccount2 · 2025-08-26T17:16:00 1756228560

The selling point of this model really seems to be its consistency between generations rather than its raw generating ability.

for instance:

https://aistudio.google.com/app/prompts/1gTG-D92MyzSKaKUeBu2...

skybrian · 2025-08-26T19:28:43 1756236523

I can’t see it. You probably need to set permissions to “anyone with the link can access.”

pbhjpbhj · 2025-08-26T17:52:27 1756230747

Are there models that have a vector space that includes ideas: not just words/media, but not entirely corporeal aspects?

So when generating a video of someone playing a keyboard the model would incorporate the idea of repeating groups of 8 tones, which is a fixed ideational aspect which might not be strongly represented in words adjacent to "piano".

It seems like models need help with knowing what should be static, or homomorphic, across or within images associated with the same word vectors and that words alone don't provide a strong enough basis [*1] for this.

*1 - it's so hard to find non-conflicting words, obviously I don't mean basis as in basis vectors, though there is some weak analogy.

heyjamesknight · 2025-08-26T19:21:49 1756236109

How would you encode those ideas?

pbhjpbhj · 2025-08-27T13:31:26 1756301486

I don't know, in part that's why I asked ... I wonder if there's a way to provide a loosely-defined space.

Perhaps it's a second word-vector space that allows context defined associations? Maybe it just needs tighter association of piano_keyboard with 8-step_repetition??

mikepurvis · 2025-08-26T16:59:44 1756227584

Interesting! I feel like that's maybe similar to the business of being able to correctly generate images of text— it looks like the idea of a keyboard to a non-musician, but is immediately wrong to someone who is actually familiar with it at all.

I wonder if the bot is forced to generate something new— certainly for a prompt like that it would be acceptable to just pick the first result off a google image search and be like "there, there's your picture of a piano keyboard".

vunderba · 2025-08-26T17:05:20 1756227920

Anything that is heavily periodic can definitely trip up image gen - that being said, I just used Flux Kontext T2I and got pretty close (disregard the hammers though, since that's a right mess). Only towards the upper register did it start to make mistakes.

https://imgur.com/a/fyX42my

psbp · 2025-08-26T17:08:29 1756228109

Doesn't pass the analog clock test either.

cubefox · 2025-08-26T17:41:39 1756230099

Like most image models, except GPT-4o, it also didn't pass the wooden Penrose triangle test. (It creates normal triangles.)

carimura · 2025-08-26T17:37:23 1756229843

or my "hands with palms facing down" test.... no matter how hard I try it just can't get open hands, palms down.

vunderba · 2025-08-26T18:00:00 1756231200

It's probably just a matter of rerolling a few times. I was able to get it around 25% of the time.

https://imgur.com/a/H9gH3Zy
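For a rough sense of what "rerolling a few times" means at a ~25% hit rate, here is a back-of-the-envelope sketch. It assumes each attempt is independent with the same success probability, which may not hold for a real model; the function names are made up.

```python
def p_at_least_one(p, attempts):
    """Probability of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p, target):
    """Smallest number of tries so that P(at least one success) >= target."""
    if not 0.0 < p <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    n = 1
    while p_at_least_one(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    p = 0.25  # roughly the hit rate mentioned above
    print(1 / p)                     # expected tries until the first success: 4.0
    print(p_at_least_one(p, 4))      # 0.68359375
    print(attempts_needed(p, 0.90))  # 9
```

In other words, four rerolls give about a two-in-three chance of a usable result, and it takes nine to get above 90%.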

carimura · 2025-08-27T03:17:54 1756264674

that's pretty good. I was using a cartoon girl as an example of a dance move for kids.

https://g.co/gemini/share/0e0de0d42029

pbhjpbhj · 2025-08-26T17:56:42 1756231002

I guess the vast majority of images have the palms the other way, and that this biases the output. It's like how we misinterpret images to generate optical illusions, because we're expecting valid 3D structures (Escher's staircases, say).

vunderba · 2025-08-26T18:01:23 1756231283

Yes - it's the same reason generating a 5-leaf clover fails - massive amounts of training data that predispose the model against it.

conception · 2025-08-26T21:45:37 1756244737

Failed my horizontal text test as well.

torginus · 2025-08-26T20:43:33 1756241013

Somewhat mixed opinions - I tried colorizing manga pages with it, and the results were perfect.

Interestingly, it can change pages with tons of text on them without any problem, but cannot seem to do translation: if I ask it to translate a French comic page, the text ends up garbled (even though it can perfectly read and translate the text by itself).

I tried with another page, and it copypasted the same character (in different poses!) all over the panels. Absolutely uncanny!

However when I asked to remake a Western comic book in a manga style (provided a very similar manga page to the comic one), it totally failed.

Also, about 50% of the time it just tells me it'll generate the image but doesn't actually do it. Not sure what's going on; a retry fixes it, but it's annoying.

anyg · 2025-08-27T10:34:59 1756290899

I had a similar experience.

It did not change the text on a hat (ended up changing 1 of 3 words).

On one occasion it regenerated the same image again, ignoring my instructions to edit.

I get the feeling that this model is optimised more for images with people in them than for objects, drawings, etc.

matsemann · 2025-08-26T15:45:48 1756223148

Half the time I ask Gemini to generate some image it claims it doesn't have the capability. And in general I've felt it's so hard to actually use the features Google announce? Like, a third of them are in one product, some in another which I can't use, and I have no idea what or where I should pay to get access. So confusing.

IanCal · 2025-08-27T07:01:38 1756278098

Google have been terrible at every single rollout I’ve ever seen them do.

I see an announcement and it’s a waitlist. It says I can use it right now and I get a 404, or a waitlist, or it doesn’t work in my country. With the AI stuff more often it takes me to a place where I can do something but not what they say, and have zero information about whether I’m using the new thing or not.

Like this is flash image preview, but I have flash which is also a thing so is it the new one or not? The ui hasn’t changed but now it can do pictures so has my flash model moved from a GA model to a preview one? Probably! Or maybe it gets routed? Who knows!

Al-Khwarizmi · 2025-08-26T15:54:09 1756223649

Yeah, in fact the website says "Try it in Gemini" and I'm not sure if I'm already trying it or not - if I choose Gemini 2.5 Flash in the regular Gemini UI, I'm using this?

throwup238 · 2025-08-26T16:04:45 1756224285

It’s going to be a messy rollout as usual. The web app (gemini.google.com) shows “Images with Imagen” for me under tools for 2.5 flash but I just tried a few image edits and remixes in the iOS app and it looks like it’s been updated to this model.

oliwary · 2025-08-26T16:19:37 1756225177

Also very confused at this... It told me "I'm unable to create images of specific individuals in different settings." I wish it would at least say somewhere which model we are using at the moment.

sega_sai · 2025-08-26T16:00:12 1756224012

I think not. Because at least in the aistudio there is a dedicated gemini-2.5-flash-image-preview model. So I am assuming it is not available in the standard gemini chat window.

jeffbee · 2025-08-26T18:07:56 1756231676

It's not in the Gemini app or site at all. You have to use AI Studio or another means. Yes, this is all very confusing on Google's part.

IanCal · 2025-08-27T07:02:43 1756278163

Hmm could the old models generate images before? Had they hooked up imagen or something? I can make images on the Gemini site.

bonoboTP · 2025-08-27T11:00:39 1756292439

Yes, there was gemini-2.0-flash-preview-image-generation before, which could generate and edit also. But weaker than the new one.

IanCal · 2025-08-27T12:08:07 1756296487

Thanks, I'd not realised that, which means I have no idea if the things I've done outside of the API are this new one or not. That does feel classic google.

bonoboTP · 2025-08-27T14:54:29 1756306469

Yes, there's a conflict between wanting to just provide the good stuff by default under a unified Gemini brand where you don't have to worry about model names, it just works, versus building hype for a specific model and then being unclear about whether you're using that one or not. The nano-banana name is unique and fun, and got some recognition on social media already, they should just make a page with that heading and a chatbox. But again, that would focus on the new image editor thing only, and they probably want to lure people into their whole ecosystem, to switch to Gemini in general, from competitors like ChatGPT.

__rito__ · 2025-08-26T17:02:35 1756227755

I am glad that I never decided to become a Photoshop pro. I always contemplated it, and it seemed attractive for a while, but I'm glad that I decided against it. RIP r/photoshopbattles.

It was in the endless list of new shiny 'skills' that feel good to have. Now I can use nano-banana instead. Other models will soon follow, I am sure.

esafak · 2025-08-26T17:20:30 1756228830

Retouching is an art. To the pro, this is just another tool to increase efficiency. You pay them not just for knowing how to use Photoshop, but for exercising good judgement. That said, I imagine this will shrink the field, since fewer retouchers will be able to do the same work, unless the amount of work goes up commensurately. Will people get more retouching done if the price goes down? Not sure.

neom · 2025-08-26T19:43:50 1756237430

Especially colouring. In college I worked for a dude who would re-colour old B&Ws for people, and 60% of the work (the work he enjoyed) was trying to research enough to know reasonably well what colour something actually ought to be, not just what we thought looked good.

polynomial · 2025-08-26T19:59:49 1756238389

"Realism is overrated." /s

__rito__ · 2025-08-27T16:17:14 1756311434

I said Photoshop, not Lightroom, and mentioned that subreddit for a reason.

ctippett · 2025-08-26T17:24:52 1756229092

Interesting take. I'm a programmer, but learned Photoshop in the early 2000s and had a blast making and editing images for fun. Sure, the generative models today can do a far better job than anything I could come up with, but that doesn't detract from the experience and skills I picked up over the years.

If anything, knowing Photoshop (I use Affinity Designer/Photo these days) is actually incredibly useful to finesse the output produced by AI. No regrets.

__rito__ · 2025-08-27T17:30:59 1756315859

> learned Photoshop in the early 2000s and had a blast making and editing images for fun

> "had a blast"

One can have blasts in many things nowadays. Like playing Factorio, writing functional code for recreational problem solving, playing Chess, making SBC/Microprocessor projects for fun, doing Math for fun, and so on...

Photoshop just couldn’t compete with the existing blasts in my life, and I felt a little bad for not learning it. But that teeny, tiny bad feeling has been wiped away by nano-banana.

polynomial · 2025-08-26T20:00:41 1756238441

Photoshop was hella fun, turned out that programming paid more. And now AI pays much more.

SoKamil · 2025-08-26T17:21:07 1756228867

If you had commented this a decade ago, I would have said that at least you own the program and the skills in case Google decides to turn off the lights or ask a prohibitive price. Now you need to pay a subscription for PS, and maybe some decent open-weight model will be released.

stefs · 2025-08-26T18:26:53 1756232813

qwen3 is open weights and offers passable image generation

echelon · 2025-08-26T17:06:33 1756227993

Programming and everything else will eventually fall to automation, too. It's just a matter of time.

Engineering probably takes a while (5 years? 10 years?) because errors multiply and technical debt stacks up.

In images, that's not so much of a big deal. You can re-roll. The context and consequences are small. In programs, bad code leads to an unmaintainable mess and you're stuck with it.

But eventually this will catch up with us too.

quantumHazer · 2025-08-26T17:07:50 1756228070

Both of you are wrong, and this is not a good level of discussion for HN.