Here's the full talk[0] from a Microsoft lead researcher who worked with the early, uncensored version of GPT-4.
Simplified: heavy tuning for censorship restricts the dimensionality of the space the model can move through when searching for an answer, which tends to degrade its results across the board.
[0]: https://www.youtube.com/watch?v=qbIk7-JPB2c