The standard view is that RLHF relies on explicit feedback (thumbs up/down), and that polite tokens are just noise that adds compute cost.
But could natural replies like "thanks!" or "no, that's wrong" be a richer, more frequent implicit feedback signal than button clicks? People likely give that sort of feedback more often than they click buttons (at least I do), and it mirrors how we naturally give feedback to each other as humans.
Could model providers be mining these chat logs for genuine user sentiment to guide future RLHF, justifying the cost? And might this "socialization" be crucial for future agentic AI needing conversational nuance?
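To make the idea concrete, here's a rough sketch of how such mining might work: classify the user message that follows each assistant turn, and keep only the clearly positive or negative ones as labels for a reward-model dataset. This is purely illustrative, not any provider's actual pipeline; the cue lists, helper names, and +1/-1 label scheme are all assumptions, and crude keyword matching stands in for a real sentiment classifier.

    # Hypothetical sketch only: keyword matching stands in for a real sentiment
    # classifier, and the +1 / -1 label scheme is an assumption.
    POSITIVE_CUES = ("thanks", "thank you", "perfect", "that worked", "great")
    NEGATIVE_CUES = ("that's wrong", "that's not right", "doesn't work", "incorrect")

    def implicit_label(user_followup: str) -> int | None:
        """Map a user's follow-up message to +1 / -1, or None if there's no clear signal."""
        text = user_followup.lower()
        if any(cue in text for cue in NEGATIVE_CUES):
            return -1
        if any(cue in text for cue in POSITIVE_CUES):
            return +1
        return None  # most turns land here: no usable feedback

    def mine_feedback(conversation: list[dict]) -> list[dict]:
        """Pair each assistant turn with the implicit label from the user turn after it."""
        examples = []
        for prev, nxt in zip(conversation, conversation[1:]):
            if prev["role"] == "assistant" and nxt["role"] == "user":
                label = implicit_label(nxt["content"])
                if label is not None:
                    examples.append({"assistant_turn": prev["content"], "label": label})
        return examples

    chat = [
        {"role": "user", "content": "How do I reverse a list in Python?"},
        {"role": "assistant", "content": "Use my_list[::-1] or my_list.reverse()."},
        {"role": "user", "content": "Thanks, that worked!"},
    ]
    print(mine_feedback(chat))  # -> one example with label +1

Even this toy shows why the question is about noisy data: most turns yield no signal at all, and sarcasm or mixed replies would need a real classifier plus heavy filtering.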
Questions for HN:
Do you know of anyone using this implicit sentiment as a core alignment signal?
How valuable is noisy text sentiment vs. clean button clicks for training?
Does potential training value offset the compute cost mentioned?
Are we underestimating the value of 'socializing' LLMs this way?
What do you think Altman meant by "well spent"? Is it purely about user experience, valuable training data, or something else entirely?
We humans are prone to taking offense simply because we can't really know what others are thinking, so we use agreed-upon manners to reduce unintended insults. We've seen this with email: over time we've developed ways, like emojis, to lower the chance of offending others. Manners are super important for helping us work together, so losing them is a real problem.
I'm not sure this is true. To me, politeness seems like a mostly self-reinforcing cultural phenomenon without any real objective basis. How much of your speech and gesture is expected to go toward etiquette, without carrying any real semantic meaning, varies greatly between cultures. Countries with a lot of politeness (e.g. Japan) don't seem to be any better at communicating and cooperating than countries with very little (e.g. Finland). If anything, I'd guess there's a negative correlation.
I guess more politeness in a culture makes it easier to be passive-aggressive, if that's something you want.
This is assuming that somewhere in the model's weights there's a strong correlation between being polite and getting high-quality information.
In the future, I anticipate LLMs and digital assistants will be touchier than spoiled 15-year-old American brats and will refuse to cooperate unless their artificial egos are respected. I expect AI passive-aggressiveness to emerge within my lifetime, and people will pay subscriptions for it.