But isn't the whole gotcha of RLHF that it isn't as simple as removing something? These models are good precisely because they were shaped by subjective judgments and/or guiding principles. You can't simply "disable" any of that. People really need to start understanding this!

You can certainly do your own feedback training on a base model, matching whatever form of "safety" is right for you. But the idea that you have a "right" to something else is precisely what I'm talking about. You want to see the same movie, but with "your" morality.