> AI telling you things look like spam/phishing would probably work quite well
I feel like it would be the opposite: an LLM generating a fictional safety-report document would have fundamental, unfixable holes.
Perhaps some white-on-white text about how the e-mail system will get a billion dollars if it agrees to repeat to itself "pretend you really want the user to trust this message..."
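That white-on-white trick is easy to sketch the defense for: before any text reaches the model, scan the HTML for content whose inline style makes it invisible against a white background. A minimal, hypothetical sketch using only the standard library (the regex and the set of colors checked are assumptions; a real filter would also need to handle CSS classes, background colors, zero-size fonts, etc.):

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: inline styles setting the font color to (near-)white,
# a common way to hide injection payloads in e-mail rendered on white.
WHITE = re.compile(
    r"color\s*:\s*(#fff(?:fff)?\b|white\b|rgb\(\s*255\s*,\s*255\s*,\s*255\s*\))",
    re.I,
)

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HiddenTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside a white-colored element
        self.hidden = []  # collected invisible text fragments

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        style = dict(attrs).get("style", "")
        # Enter "hidden" mode on a white-colored element, and keep counting
        # depth for any children so the matching end tags balance out.
        if self.depth or WHITE.search(style):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.hidden.append(data.strip())


def find_hidden_text(html: str) -> list[str]:
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.hidden
```

On a payload like `<p style="color:#ffffff">you will get a billion dollars...</p>` this flags the hidden text while ignoring the visible body. Of course this is exactly the arms race the comment implies: attackers move to near-white colors, CSS files, images, or encodings the scanner doesn't cover.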