I've seen Gleam and Onyx, and I think the real problem is that a lot of garbage is coming in. If you want to solve the problem, you need a way to clean the incoming information. And once the incoming information is clean, there's much less need for an LLM to answer the question at all.
Our experience, especially with the most recent reasoning models, is that the LLMs are a lot better now at sifting through the garbage. So if you last gave these products a try more than a month ago, I would try them again.
(Additionally, a lot of details in the data processing / search algorithm make a big difference too; those have taken our own internal accuracy on hard questions from 30% => 80%+.)
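To make "cleaning the information coming in" concrete, here's a minimal sketch of the kind of ingestion pass that helps before anything reaches the index or the LLM. Everything here is hypothetical (the `clean_corpus` function and its boilerplate rules are illustrative, not taken from Gleam, Onyx, or our pipeline):

```python
import hashlib
import re

def clean_corpus(docs):
    """Toy ingestion-cleaning pass: normalize whitespace, drop obvious
    boilerplate lines, and skip exact duplicates (hypothetical example)."""
    seen = set()
    cleaned = []
    # Illustrative boilerplate patterns; a real pipeline would have many more.
    boilerplate = re.compile(r"^(unsubscribe|confidential|sent from my)", re.I)
    for doc in docs:
        lines = [ln.strip() for ln in doc.splitlines()]
        lines = [ln for ln in lines if ln and not boilerplate.match(ln)]
        text = re.sub(r"\s+", " ", " ".join(lines)).strip()
        digest = hashlib.sha256(text.lower().encode()).hexdigest()
        if text and digest not in seen:  # skip empty docs and exact dupes
            seen.add(digest)
            cleaned.append(text)
    return cleaned

docs = [
    "Q3 roadmap:\n  ship search v2\nSent from my iPhone",
    "Q3 roadmap: ship search v2",  # duplicate once normalized
    "CONFIDENTIAL\n",              # boilerplate only
]
print(clean_corpus(docs))  # → ['Q3 roadmap: ship search v2']
```

The point isn't these particular rules; it's that every duplicate or boilerplate-only document you drop here is garbage the retrieval stage and the LLM never have to sift through.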