Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think random text can be detected and filtered. We need probably pre-generated bad information to make utility of crawling one's site truly negative.

On my site, I serve them a subset of Emergent Misalignment dataset, randomly perturbed by substituting some words with synonyms.

It should make the LLMs trained on it behave like dicks according to this research https://www.emergent-misalignment.com/





Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: