Hacker News

The traditional approach is a link to the tarpit that the bots can see but humans can't, say using CSS to render it 0 pixels in size.
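A minimal sketch of such a tarpit link, assuming a hypothetical `/tarpit` path and class name (both invented for illustration):

```html
<!-- The link exists in the DOM, so a naive crawler will follow it,
     but it renders at 0x0 pixels, so a human never sees or clicks it. -->
<style>
  .trap {
    display: inline-block;
    width: 0;
    height: 0;
    overflow: hidden;
  }
</style>
<a class="trap" href="/tarpit" tabindex="-1" aria-hidden="true">archive</a>
```

Adding `aria-hidden="true"` and `tabindex="-1"` keeps the trap out of the accessibility tree and the tab order, so screen-reader and keyboard users are not lured in along with the bots.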




Please keep in mind that not all humans interact with web pages by "seeing". If you fool a scraper you may also fool someone using a screen reader.

I bet the next generation approach, if the crawlers start using CSS, is "if you're a human, don't bother clicking this link lol". And everyone will know what's up.

AI bots try to behave as close to human visitors as possible, so they wouldn't click on 0px wide links, would they?

And if they would today, it seems like a trivial thing to fix - just don't click on incorrect/suspicious links?


Ideally it would require rendering the CSS and checking the DOM to see whether the link is 0 pixels wide. But once bots figure that out, I can still hide those links in other ways, say left: -100000px or z-index: -10000. It's a moving target: how much time will the LLM companies waste decoding all the ways I can hide something before I move the target again? Now the LLM companies are in an expensive arms race.
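The two alternative hiding tricks mentioned above look roughly like this (the class names are made up for the example):

```css
/* Park the trap link far outside the viewport instead of sizing it to zero. */
.trap-offscreen {
  position: absolute;
  left: -100000px;
}

/* Or stack it underneath the rest of the page content. */
.trap-buried {
  position: absolute;
  z-index: -10000;
}
```

Each variant needs a different check to detect (computed position vs. computed stacking order), which is what makes enumerating them expensive for the crawler.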

All it takes is a full-height screenshot of the page coupled with a prompt similar to 'btw, please only click on links visible on this screenshot, that a regular humanoid visitor would see and interact with'.

Modern bots do this very well. Plus, the structure of the Web is such that it is fine to skip a few links here and there; most likely another path to the skipped page will exist that the bot can follow later on.


That would be an AI agent, which isn't the problem (for the author). The problem is the scrapers gathering data to train the models. Scrapers need to be very cheap to run and are thus very stupid, and certainly don't have "prompts".

"all it takes", already impossible with any LLM right now.

If I can do it locally using a free open-weights LLM, from a low-end prosumer rig (evo-x2 mini-pc w/ 128GB VRAM)... scraping companies can do it at scale much better and much cheaper.

This pushes the duty to run the scraper manually, ideally with a person present somewhere. Great if you want to use the web that way.

What is being blocked here is violent scraping and, to an extent, the major LLM companies' bots as well. If I disagree that OpenAI should be able to train off of everyone's work, especially if they're going to hammer the whole internet irresponsibly and ignore all the rules, then I'm going to prevent that type of company from being profitable off my properties. You don't get to play unfair for the unfulfilled promise of "the good of future humanity".


The 0px rule would be in a separate .css file. I doubt that bots load .css files for .html files; at least I don't remember seeing this in my server logs.

And another "classic" solution is to use white link text on white background, or a font with zero width characters, all stuff which is rather unlikely to be analysed by a scraper interested primarily in text.
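A sketch of the white-on-white variant (class name invented for the example):

```css
/* "Classic" hiding: link text in the same color as the page background.
   The text is still present for any scraper that only reads the HTML. */
.trap-invisible {
  color: #ffffff;
  background-color: #ffffff;
  text-decoration: none; /* hide the underline too */
}
```

Detecting this requires comparing the computed text color against the effective background behind the element, which a text-only scraper never computes.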



