I go back and forth on this. A year ago, I was optimistic and I have had 1 case ...

daemonologist · 2025-10-19T21:10:26 1760908226

Labels are so essential - even if you're not training anything, being able to quickly and objectively test your system is hugely beneficial - but it's a constant struggle to get them. In the unlikely event you can get budget and priority for an SME to do the work, communicating your requirements to them (the need to apply very consistent rules and make few errors) is difficult and the resulting labels tend to be messy.

More than once I've just done labeling "on my own time" - I don't know the subject as well but I have some idea what makes the neurons happy, and it saves a lot of waiting around.

I've found tuning large models to be consistently difficult to justify. The last few years it seems like you're better off waiting six months for a better foundation model. However, we have a lot of cases where big models are just too expensive and there it can definitely be worthwhile to purpose-train something small.

hommes-r · 2025-10-20T06:59:30 1760943570

My personal opinion is that true engineering, which revolves around turning complex theory into working practice, has seen a decline in grace. Why spend a lot of time trying to master the art of engineering if you can ride the wave of engineering services and get away with it?

In true hacker spirit, I don't think trying to train a model on a wonky GPU is something that needs an ROI for the individual engineer. It's something they do because they yearn to acquire knowledge.

sdenton4 · 2025-10-19T19:01:43 1760900503

Eventually someone will make a killing on doing actual outcome measurements instead of just trusting the LLMs, Michael Lewis will write a popular book about it, and the cycle will begin anew...

XenophileJKO · 2025-10-19T19:35:49 1760902549

I'm also seeing teams who expected big gains from fine tuning get incremental or moderate gains. Then they put it in production and regret the action as SOTA marches quickly.

I have avoided fine tuning because the models are currently improving at a rate that exceeds big corporate product development velocity.

deepsquirrelnet · 2025-10-19T19:48:45 1760903325

Absolutely the first thing you should try is a prompt optimizer. The GEPA optimizer (implemented in DSPy) often outperforms GRPO training[1]. But I think people are usually building with frameworks that aren't machine learning frameworks.

[1] https://arxiv.org/abs/2507.19457