Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

We should encourage number 2. So much of the content that the AI companies are scraping is already garbage, and that's a problem. E.g. LLMs are frequently confidently wrong, but so is Reddit, who produce a large volume of trading data. We've seen a study surgesting that you can poison an LLM with very little data. Encouraging the AI companies to care about the quality of the data they are scraping could be beneficial to all.

The cost of being critical of source material might make some AI companies tank, but that seems inevitable.





Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: