I think after the big pre-training run they do a smaller fine-tuning pass to adjust some details. I suppose they feed the system a bunch of training chat logs where the answers are warm and empathetic.
Or maybe they ask it a ton of questions, run a "mood analysis" on the vocabulary of each response, and penalize the answers that aren't warm and empathetic.
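The second idea could be sketched as a toy reward function: score each answer with a crude lexicon-based "mood analysis" and penalize non-warm answers. The word lists, function names, and scoring rule below are all made up for illustration; real systems use learned reward models rather than word counting.

```python
# Hypothetical word lists for a crude lexicon-based "mood analysis".
WARM_WORDS = {"glad", "happy", "sorry", "understand", "help", "welcome"}
COLD_WORDS = {"no", "cannot", "refuse", "wrong", "invalid"}

def mood_score(answer: str) -> float:
    """Crude warmth score: +1 per warm word, -1 per cold word."""
    words = (w.strip(".,!?'") for w in answer.lower().split())
    warm = cold = 0
    for w in words:
        if w in WARM_WORDS:
            warm += 1
        elif w in COLD_WORDS:
            cold += 1
    return float(warm - cold)

def reward(answer: str, penalty: float = 2.0) -> float:
    """Fine-tuning reward: extra penalty for non-warm answers."""
    score = mood_score(answer)
    return score if score > 0 else score - penalty

warm_answer = "I'm glad to help, and I understand how you feel."
cold_answer = "No. Your request is invalid."
print(reward(warm_answer) > reward(cold_answer))
```

During fine-tuning, answers with higher reward would be reinforced, nudging the model's vocabulary toward the warm end of this (very simplistic) scale.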