I’ve been trying to design a puzzle for a game this year that humans can solve b...

ekimekim · 2025-10-26T09:57:05 1761472625

There was one in a previous AoC that I think stumped a lot of AI at the time because it involved something that was similar to poker with the same terminology but different rules. The AI couldn't help but fall into a "this is poker" trap and make a solution that follows the standard rules.

sunrunner · 2025-10-26T10:30:49 1761474649

Was that 2023's Day 7 'Camel Cards' [1]?

[1] https://adventofcode.com/2023/day/7

Cpoll · 2025-10-26T14:43:12 1761489792

Isn't that easily solved by changing the terminology before giving it to the LLM?

petesergeant · 2025-10-26T10:20:49 1761474049

Interesting! Maybe that’s the general way to approach these things

gf000 · 2025-10-26T10:56:48 1761476208

I mean, wasn't pretty much the second half of all AoC exercises beyond LLM capabilities?

I remember there being multiple accounts trying to one-shot AoC and all ended on day 10 or so.

fud101 · 2025-10-26T11:22:05 1761477725

[flagged]

CaptainOfCoit · 2025-10-26T13:08:40 1761484120

We all have our writing quirks, like how some people use shorthand for words where there is only a marginal difference (like "people" => "ppl"), or even people who capitalize the start of sentences, but not the start of their whole text.

Some thoughts maybe should remain internal :)

dinkelberg · 2025-10-26T11:54:15 1761479655

A huge pet peeve of mine is people getting annoyed by phrases like "I mean." :)

toast0 · 2025-10-26T17:28:12 1761499692

There's plenty of prior work to go on. I mean, you could use a font ligature or one of the browser extensions (although I don't know if Chrome still lets you have a browser extension touch all text).

Change ChatGPT to 'my drunk uncle' while you're at it.

marjipan200 · 2025-10-26T17:56:21 1761501381

here you go, helping with exposure therapy

https://gist.github.com/clairefro/1cf81f5d7125e124975f4aba22...

snovv_crash · 2025-10-26T15:28:28 1761492508

It affects a certain disposition for the writer; the information it contains isn't in the actual data they are expressing, but rather the state of mind that they express it from, which can be important context. Oftentimes it can indicate exasperation, which is an important social queue to be able to pick up on.

A little excerpt from Arlo Guthrie

"I mean, I mean, I mean that just, I'm sittin' here on the bench, I mean I'm sittin here on the Group W bench, because you want to know if I'm moral enough to join the army, burn women, kids, houses and villages after being a litterbug."

Imagine that without the "I mean"s in it, and the importance of how they convey his stance on the situation.

TeMPOraL · 2025-10-26T12:16:11 1761480971

Since a "sentence", much like everything else in practice, is almost but not quite what the formal definition says, just use an LLM for this task.

fainpul · 2025-10-26T11:41:14 1761478874

I mean, you could just vibe code that.

GCUMstlyHarmls · 2025-10-26T13:31:02 1761485462

I mean is there a difference between asking an LLM via a prompt or asking an LLM via comment box?

Gigachad · 2025-10-26T11:22:35 1761477755

Have a look at https://arcprize.org/

They have hundreds of challenges that humans can solve in under a minute which LLMs can not. Seems the general trend is figuring out the rules or patterns of the challenge when there are few examples and no instructions.

mewpmewp2 · 2025-10-26T17:06:59 1761498419

Perhaps coding exercises that require 2d or 3d thinking, or similar. This is where I have seen LLMs struggle a lot. There are probably other areas too.

petesergeant · 2025-10-26T11:35:14 1761478514

Ah, it also needs to be challenging for humans. It's a prize to win something. I just didn't want people to throw the question into Claude Code.

LPisGood · 2025-10-26T20:48:19 1761511699

For more examples of such problems check Jane street puzzles of the month

petesergeant · 2025-10-27T06:09:10 1761545350

Those will almost certainly be too hard for the target audience

loeg · 2025-10-26T15:59:47 1761494387

Just have to incorporate good judgement in some way.

huflungdung · 2025-10-26T10:33:50 1761474830

How many <$letter>s are in the word <word with $letters>

Crespyl · 2025-10-26T15:18:56 1761491936

The bigger LLMs have generally figured out this specific problem.