I think after the big pre-training run they do a smaller fine-tuning pass to adjust some details. I suppose they feed the system a bunch of training chat logs where the answers are warm and empathetic.
Or maybe they ask it a ton of questions, run a "mood analysis" on the vocabulary of each response, and penalize the answers that aren't warm and empathetic.
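The second idea could be sketched as a toy reward function: score each answer with a crude lexicon-based "mood analysis" and penalize non-warm answers. The word lists, function names, and scoring rule below are all made up for illustration; real systems use learned reward models rather than word counting.

```python
# Hypothetical word lists for a crude lexicon-based "mood analysis".
WARM_WORDS = {"glad", "happy", "sorry", "understand", "help", "welcome"}
COLD_WORDS = {"no", "cannot", "refuse", "wrong", "invalid"}

def mood_score(answer: str) -> float:
    """Crude warmth score: +1 per warm word, -1 per cold word."""
    words = (w.strip(".,!?'") for w in answer.lower().split())
    warm = cold = 0
    for w in words:
        if w in WARM_WORDS:
            warm += 1
        elif w in COLD_WORDS:
            cold += 1
    return float(warm - cold)

def reward(answer: str, penalty: float = 2.0) -> float:
    """Fine-tuning reward: extra penalty for non-warm answers."""
    score = mood_score(answer)
    return score if score > 0 else score - penalty

warm_answer = "I'm glad to help, and I understand how you feel."
cold_answer = "No. Your request is invalid."
print(reward(warm_answer) > reward(cold_answer))
```

During fine-tuning, answers with higher reward would be reinforced, nudging the model's vocabulary toward the warm end of this (very simplistic) scale.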