From watching them work, they read the spec, write the code, run it on the examples, refine the code until it passes, and so on.
But we can’t tell whether the puzzle solutions are in the training data.
I’m looking forward to seeing how well current agents perform on 2025’s puzzles.
From watching them work, they read the spec, write the code, run it on the examples, refine the code until it passes, and so on.
But we can’t tell whether the puzzle solutions are in the training data.
I’m looking forward to seeing how well current agents perform on 2025’s puzzles.