
It won't be long before generalized bots stop requesting URLs that aren't visually rendered as links on the page.
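For what it's worth, that check is already cheap to do. Here's a minimal sketch, assuming Playwright (my choice of tool, not anything the bots are known to use): render the page headlessly and keep only anchors that occupy actual space in the layout, which filters out the classic display:none honeypot links.

    # Sketch: keep only links that are visually rendered on the page.
    # Assumes Playwright: pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    def visible_links(url: str) -> list[str]:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)
            # A zero-size bounding box means the anchor was never painted
            # (display:none, zero-height container, etc.). This is a rough
            # heuristic, not a complete visibility check.
            hrefs = page.eval_on_selector_all(
                "a[href]",
                """els => els
                      .filter(e => {
                          const r = e.getBoundingClientRect();
                          return r.width > 0 && r.height > 0;
                      })
                      .map(e => e.href)""",
            )
            browser.close()
            return hrefs

Off-screen positioning and visibility:hidden would slip past a pure bounding-box check, but the point stands: the filtering is the easy part.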

If bots get good enough to know what links they're scraping, chances are they'll also avoid scraping links they don't need to! The problem solves itself!

Maybe you're joking, but assuming you're not: this problem doesn't solve itself at all. If bots get good enough to know which links have garbage behind them, they'll stop scraping those links and go back to scraping your actual content, which is the thing we don't want.

That's sort of the point: almost nobody runs a site as large as Reddit. The average website has a small handful of pages. Even a very active blog has few enough pages that it could be fully scraped in a few minutes. Where scrapers get hung up is processing links that add things like query parameters, or navigating something like a git repository and clicking through every file in every commit. If a scraper has enough intelligence to look at what a link is, it _surely_ has enough intelligence to understand what it does and does not need to scrape.
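And the trap-avoidance half barely needs intelligence. A rough sketch of the kind of URL hygiene that sidesteps both failure modes above (the depth cap of 8 is an arbitrary number picked for illustration):

    # Sketch: refuse crawler-trap URLs before fetching them.
    from urllib.parse import urlparse, urlunparse

    MAX_PATH_DEPTH = 8  # arbitrary; file-per-commit trees blow past this fast

    def normalize(url: str) -> str:
        # Collapse ?sort=new&page=9371 style variants onto one canonical URL
        # by dropping the query string and fragment.
        u = urlparse(url)
        return urlunparse((u.scheme, u.netloc, u.path.rstrip("/"), "", "", ""))

    def should_fetch(url: str, seen: set[str]) -> bool:
        path_depth = len([seg for seg in urlparse(url).path.split("/") if seg])
        if path_depth > MAX_PATH_DEPTH:
            return False  # suspiciously deep, e.g. every file in every commit
        key = normalize(url)
        if key in seen:
            return False  # already fetched a query-parameter variant of this
        seen.add(key)
        return True

Dropping query strings wholesale would break sites that genuinely route through them, which is exactly where the "smart enough to read the link" part comes in.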
