
Thanks. I thought these were prioritized, so while the garbage links might fill up the queue, they'd do so only after all real links are visited, and the server load would be the same. But of course, many bots, perhaps most, might not be configured this way.

> If a link is posted somewhere, the bots will know it exists,





How would the links be prioritized? If the bots' goal is to crawl all content, would they have prioritization built in?

How would they prioritize things they haven't crawled yet?

It's not clear that they are doing that. Web logs I've seen in other writing on this topic show them re-crawling the same pages at high rates, in addition to crawling new pages.

Actually, I've been informed otherwise. According to this person, they crawl known links first:

> Unfortunately, based on what I'm seeing in my logs, I do need the bot detection. The crawlers that visit me, have a list of URLs to crawl, they do not immediately visit newly discovered URLs, so it would take a very, very long time to fill their queue. I don't want to give them that much time.

https://lobste.rs/c/1pwq2g
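
To make the behavior described in that quote concrete, here is a minimal sketch of a two-tier crawl queue in Python. It is only a guess at the shape of such a crawler, not how any real bot is implemented: the names `Frontier`, `add_discovered`, `next_url`, and `crawl_order`, and the example paths, are all hypothetical. The crawler drains its pre-existing URL list first, and newly discovered links (including garbage ones) wait in a second queue.

```python
from collections import deque


class Frontier:
    """Hypothetical two-tier crawl queue: known URLs drain before discovered ones."""

    def __init__(self, seeds):
        self.known = deque(seeds)    # URLs the crawler already had on its list
        self.discovered = deque()    # links found while crawling, visited later
        self.seen = set(seeds)       # prevents queueing the same URL twice

    def add_discovered(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.discovered.append(url)

    def next_url(self):
        # Known URLs always take priority over newly discovered ones.
        if self.known:
            return self.known.popleft()
        if self.discovered:
            return self.discovered.popleft()
        return None


def crawl_order(seeds, links):
    """Return the visit order, given a dict mapping each URL to its outgoing links."""
    frontier = Frontier(seeds)
    order = []
    while (url := frontier.next_url()) is not None:
        order.append(url)
        for link in links.get(url, []):
            frontier.add_discovered(link)
    return order


if __name__ == "__main__":
    # Assumed toy site: "/a" links to a generated trap page, which links to another.
    site = {"/a": ["/trap/1", "/b"], "/trap/1": ["/trap/2"]}
    print(crawl_order(["/a", "/b"], site))
```

Under this assumption, the trap pages are reached only after every known URL has been visited, which is why filling such a crawler's queue with garbage links would take a long time.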



