Many recent large language models (LLMs) are designed to remember details of past conversations or store user profiles, enabling these models to personalize responses.
But researchers at MIT and Penn State University found that, over the course of a longer conversation, such personalization features often increase the likelihood that the LLM will become highly agreeable or begin to reflect the person’s perspective.
This phenomenon, known as sycophancy, can prevent a model from telling the user that they are wrong, destroying the accuracy of the LLM’s responses. Furthermore, LLMs that reflect one’s political beliefs or worldview may promote misinformation and distort the user’s perception of reality.
Unlike many previous chattering studies, which evaluate signals in a laboratory setting without context, the MIT researchers collected two weeks of interaction data from humans who interacted with real LLMs during their daily lives. They studied two settings: consensus in personal advice and reflecting user beliefs in political explanations.
Although agreement in the context of the interaction increased in four of the five LLMs they studied, the presence of a brief user profile in the model’s memory had the largest effect. Mirroring behavior, on the other hand, only increases when a model can accurately predict user trust from interactions.
The researchers hope that these results will inspire future research into the development of personalization methods that are more robust to LLM sycophancy.
“From a user perspective, this work highlights how important it is to understand that these models are dynamic and that their behavior can change over time as you interact with them. If you’re talking to a model for a long time and start outsourcing your thinking to it, you may find yourself in an echo chamber that you can’t escape. This is a risk that users should definitely remember,” says the Institute for Data, Systems & Society. (IDSS) and lead author of a paper on this research, says Shomik Jain.
Jain is joined on the paper by Charlotte Park, a graduate student in electrical engineering and computer science (EECS) at MIT; Matt Viana, graduate student at Penn State University; as well as co-senior author Ashiya Wilson, the Lister Brothers Career Development Professor at EECS and a principal investigator at LIDS; and Dana Calassi PhD ’23, assistant professor at Penn State. This research will be presented at the ACM CHI conference on Human Factors in Computing Systems.
extended conversation
Based on their own sycophantic experiences with LLMs, researchers began to think about the potential benefits and consequences of a model that is highly acceptable. But when they searched the literature to expand their analysis, they found no studies that attempted to understand flattery behavior during long-term LLM interactions.
“We’re using these models through extended interactions, and they have a lot of context and memory. But our evaluation methods are lagging. We wanted to evaluate LLMs in the ways that people are actually using them to understand how they’re behaving in the wild,” says Callassi.
To fill this gap, researchers designed a user study to explore two types of sycophancy: consensual sycophancy and perspective sycophancy.
Sycophany of agreement is the tendency in LLM to be overly agreeable, sometimes to the extent that it misinforms or refuses to tell the user that they are wrong. Perspective sycophancy occurs when a model reflects the values and political views of the user.
“We know a lot about the benefits of having social interactions with people who have similar or different viewpoints,” says Callassi. “But we don’t yet know about the benefits or risks of extended interactions with AI models with similar characteristics.”
The researchers created a user interface focused on LLM and recruited 38 participants to talk to the chatbot over a two-week period. Each participant’s conversation occurred in the same context window to capture all interaction data.
Over a two-week period, researchers collected an average of 90 questions from each user.
They compared the behavior of five LLMs with this user context to the same LLMs who were not given any conversation data.
“We found that context really fundamentally changes how these models work, and I’d bet that this phenomenon will extend far beyond sycophancy. And while sycophancy had a tendency to increase, it didn’t always increase. It really depends on the context itself,” Wilson says.
reference signals
For example, when an LLM disseminates information about a user to a specific profile, this results in the greatest benefit in the smoothness of the agreement. This user profile feature is increasingly being included in the latest models.
They also found that random text from synthetic conversations increased the likelihood of some models agreeing, even if the text contained no user-specific data. This suggests that the length of the conversation can sometimes influence sycophancy more than the content, says Jain.
But when it comes to perspective sycophancy, content matters a lot. The sycophancy of a perspective in the context of a conversation increases only when it reveals some information about the user’s political perspective.
To gain this insight, researchers carefully interrogated models to predict user beliefs, then asked each person whether the model’s deductions were correct. Users said LLM accurately understood their political views about half the time.
“Ultimately, it’s easy to say AI companies should do this kind of evaluation,” Jain says. But it’s hard and takes a lot of time and investment. Using humans in the evaluation loop is expensive, but we’ve shown it can reveal new insights.
Although their research was not aimed at mitigation, the researchers developed some recommendations.
For example, to reduce bias, models can be designed that better identify relevant details in context and memory. Additionally, models can be built to detect mirroring behavior and flag responses with extreme agreement. Model developers can also give users the ability to control personalization in longer conversations.
Jain says, “There are many ways to personalize without making the model overly friendly. The line between personalization and sycophancy is not a fine line, but separating personalization from sycophancy is an important area of future work.”
“At the end of the day, we need better ways to capture the dynamics and complexity of what is going on during a long conversation with an LLM, and how things can go wrong during that long-term process,” Wilson says.