The paper says that it enhances existing methods such as prompt engineering (e.g. chain of thought) and LLM debate; the agent approach is orthogonal to these, so it can be combined with them.
In optimization problems, injected randomness can often get you out of local minima/maxima, so aggregating a bunch of random search paths (e.g. keeping the best of several restarts) tends to give better worst-case results. Something similar might be happening here: the training set will be biased in ways that can create spurious local optima, and this process could smooth over those weird kinks.
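To make the analogy concrete, here is a toy sketch (my own illustration, not anything from the paper) of random-restart hill climbing: a single greedy run can stall at whichever local maximum is nearest its start, while keeping the best endpoint across many random starts washes out that bad luck.

```python
import math
import random

def f(x):
    # Multimodal toy objective with many local maxima.
    return math.sin(5 * x) - 0.1 * x * x

def hill_climb(x0, steps=200, step_size=0.05):
    # Greedy local search: accept a random perturbation only if it improves f,
    # so a single run can stall at whichever local maximum is nearest x0.
    x = x0
    for _ in range(steps):
        cand = x + random.uniform(-step_size, step_size)
        if f(cand) > f(x):
            x = cand
    return x

random.seed(0)

# Restart from many random points and keep the best endpoint; this
# averages out the bad luck of any single starting position.
endpoints = [hill_climb(random.uniform(-3.0, 3.0)) for _ in range(20)]
best = max(endpoints, key=f)

# Best-of-N is, by construction, at least as good as any one run in the batch.
print(f(best) >= f(endpoints[0]))
```

The rough parallel: each sampled reasoning path is one "run", the training set's biases play the role of the bumpy landscape, and aggregating over many paths is the best-of-N step.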