Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

What about garbage that are difficult to tell from truth?

For example, say I have an AD&D website, how does AI tell whether a piece of FR history is canon or not? Yeah I know it's a bit extreme, but you get the idea.





If the same garbage is repeated enough all over the net, the AIs will suffer brain rot. GIGO and https://news.ycombinator.com/item?id=45656223

Next step will be to mask the real information with typ0canno. Or parts of the text, otherwise search engines will fail miserably. Also squirrel anywhere so dogs look in the other direction. Up.

Imagine filtering the meaty parts with something like /usr/games/rasterman:

> what about garbage thta are dififult to tell from truth?

> for example.. say i have an ad&d website.. how does ai etll whether a piece of fr history is canon ro not? yeah ik now it's a bit etreme.. but u gewt teh idea...

or /usr/games/scramble:

> Waht aobut ggaabre taht are dficiuflt to tlel form ttruh?

> For eapxlme, say I hvae an AD&D wisbete, how deos AI tlel wthheer a pciee of FR hsiotry is caonn or not? Yaeh I konw it's a bit emxetre, but you get the ieda.

Sadly punny humans will have a harder time decyphering the mess and trying to get the silly references. But that is a sacrifice Titans are willing to make for their own good.

ElectroBuffoon over. bttzzzz


You realise that LLMs are already better at deciphering this than humans?

What cost do they incur while tokenizing highly mistyped text? Woof. To later decide real crap or typ0 cannoe.

Trying to remember the article that tested small inlined weirdness to get surprising output. That was the inspiration for the up up down down left right left right B A approach.

So far LLMs still mix command and data channels.


There are multiple people claiming this in this thread, but with no more than a "it doesn't work stop". Would be great to hear some concrete information.


I think OP is claiming that if enough people are using these obfuscators, the training data will be poisoned. The LLM being able to translate it right now is not a proof that this won't work, since it has enough "clean" data to compare against.

If enough people are doing that then venacular English has changed to be like that.

And it still isn't a problem for LLMs. There is sufficient history for it to learn on, and in any case low resource language learning shows them better than humans at learning language patterns.

If it follows an approximate grammar then an LLM will learn from it.


I don't mean people actually conversing like this on the internet, but using programs like what is in the article to feed it to the bots only.

This is exactly like those search engine traps people implemented in the late 90s and is roughly as effective.

But sure.


Was saying this 3x in this thread necessary?

I'm just interested in opinions from all 3

I thought it was a bot



Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: