
As far as I can tell, AI image generation still struggles with some things even after many years of research, and the results are often detectable. Perhaps vocals are easier, though.


It's like CGI: you only recognize the bad examples of it, while the good ones go right past you. I've got plenty of AI generations that fool professional photo retouchers; it just takes more time and some custom tooling.


> I've got plenty of AI generations that fool professional photo retouchers; it just takes more time and some custom tooling.

What’s a good place to find out the SOTA of the custom tooling and workflow?


ComfyUI + Civitai. 4chan and Reddit threads if you want to go deep.
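
For the curious, a minimal sketch of what that kind of pipeline can look like outside a ComfyUI graph, using the Hugging Face diffusers library instead; the checkpoint filename, prompt, and settings below are placeholders, not anything recommended in this thread:

    # Minimal text-to-image sketch with diffusers (assumed setup: an
    # SDXL-class checkpoint downloaded from Civitai as a .safetensors file).
    import torch
    from diffusers import StableDiffusionXLPipeline

    # Load a single-file checkpoint; "model.safetensors" is a placeholder path.
    pipe = StableDiffusionXLPipeline.from_single_file(
        "model.safetensors", torch_dtype=torch.float16
    )
    pipe.to("cuda")

    # Generate one image; prompt and parameters are illustrative only.
    image = pipe(
        prompt="studio portrait photo, natural skin texture",
        negative_prompt="blurry, deformed hands",
        num_inference_steps=30,
        guidance_scale=7.0,
    ).images[0]
    image.save("out.png")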


> It's like CGI

Right. Full of code injection vulnerabilities.


"many years" lol, midjourney only came out like a year and a half ago and the quality has quadrupled in that time.


Generative text-to-image models based on neural networks have been in development since around 2015. DALL-E was the first to gain widespread attention, in 2021, followed by models like Stable Diffusion and Midjourney.

"Quadrupled" is a very specific and quantitative word. What measure are you basing that on?


The recommended resolution went from 512x512 to 1024x1024 in that time span :)
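
(To spell out the arithmetic behind the joke: doubling each side of a square image quadruples the pixel count.)

    old = 512 * 512      # 262,144 pixels
    new = 1024 * 1024    # 1,048,576 pixels
    print(new / old)     # 4.0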


Ah, right. But that's only tangentially related to being able to distinguish AI-generated images. There are tells that are completely separate from resolution, such as getting the correct spacing of the black keys on a piano.


From audio/video editing experience years back: it is much easier to slip some cheap audio cuts past people than visual ones.


This already sounds like something I would have listened to in the 90s, except with too much autotune.



