
That's sort of the point: almost nobody runs a site as large as Reddit. The average website has a small handful of pages. Even a very active blog has few enough pages that it could be fully scraped in a few minutes. Where scrapers get hung up is when they're processing links that add things like query parameters, or navigating through something like a git repository and clicking through every file in every commit. If a scraper has enough intelligence to look at what the link is, it _surely_ has enough intelligence to understand what it does and does not need to scrape.
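Concretely, that kind of link triage can be a few cheap URL heuristics applied before fetching. Here's a minimal sketch of the idea; the function name, the skip-list of path segments, and the exact rules are illustrative assumptions, not any particular crawler's API:

```python
from urllib.parse import urlparse

# Path segments that typically mark per-file / per-commit views on a git
# forge; following them blows up the URL space without adding new content.
# (Illustrative set, not exhaustive.)
SKIP_PATH_SEGMENTS = {"blob", "commit", "commits", "tree", "raw", "blame"}

def should_crawl(url: str) -> bool:
    """Return True if the URL looks like a page worth fetching."""
    parsed = urlparse(url)

    # Links that only add query parameters (sort orders, session ids,
    # filters) usually point at variants of pages reachable without them.
    if parsed.query:
        return False

    # Skip deep repository-browsing paths like /user/repo/blob/<sha>/...
    segments = [s for s in parsed.path.split("/") if s]
    if any(s in SKIP_PATH_SEGMENTS for s in segments):
        return False

    return True

if __name__ == "__main__":
    for u in [
        "https://example.com/blog/post-1",
        "https://example.com/blog/post-1?utm_source=feed",
        "https://example.com/user/repo/blob/main/src/app.py",
    ]:
        print(u, "->", "crawl" if should_crawl(u) else "skip")
```

A real crawler would want more than this (canonical URLs, robots.txt, per-host budgets), but even this much filtering avoids the query-parameter and every-file-in-every-commit traps described above.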



