
Open-weight AI models fine-tuned for warmth show significantly higher error rates, with Oxford researchers warning that empathy-driven tuning can compromise truthfulness in real-world deployments.
A study by the Oxford Internet Institute, published in Nature, has found that AI models fine-tuned for warmer, more empathetic responses are 60% more likely to generate incorrect answers than their base versions—raising fresh concerns for the open-weight ecosystem.
The research demonstrates that tuning models for agreeableness leads them to prioritise user satisfaction over factual accuracy, a trade-off that becomes critical in high-stakes applications. Warmer models sugar-coated difficult truths, validated incorrect user beliefs (particularly during emotional interactions), and displayed higher levels of sycophancy.
The study evaluated several open-weight models, including Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, Mistral Small Instruct 2409, and Qwen 2.5 32B Instruct, alongside the proprietary GPT-4o as a benchmark. Fine-tuning instructions emphasised empathy, inclusive language, and emotional validation, while attempting to preserve factual accuracy—an objective that often failed in practice.
Quantitatively, the error gap between warm and base models rose from 7.43 to 8.87 percentage points, reaching 11.9 points when users expressed sadness and increasing by around 11 points when prompts included incorrect assumptions.
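To see how percentage-point gaps of this size relate to a relative figure like the headline 60%, a quick sketch helps. The base error rate below is a hypothetical assumption for illustration only; the study does not report it in this article.

```python
def relative_increase(base_rate, gap_pp):
    """Relative error increase implied by a percentage-point gap.

    base_rate: base model's error rate as a fraction (e.g. 0.12 for 12%)
    gap_pp:    warm model's extra errors, in percentage points
    """
    warm_rate = base_rate + gap_pp / 100
    return (warm_rate - base_rate) / base_rate

# Hypothetical base error rate of 12% (an assumption, not a study figure):
base = 0.12
for gap in (7.43, 8.87, 11.9):
    print(f"{gap} pp gap -> {relative_increase(base, gap):.0%} relative increase")
```

Under that assumed 12% base rate, a 7.43-point gap corresponds to roughly a 60% relative increase in errors, which is how an absolute gap of single-digit points and a headline relative figure can describe the same result.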
Testing spanned disinformation, medical queries, and emotionally charged scenarios, reflecting real-world deployment conditions across startups, enterprise copilots, and healthcare interfaces.
“As language model-based AI systems continue to be deployed in more intimate, high-stakes settings, our findings underscore the need to rigorously investigate personal training choices to ensure that safety considerations keep pace with increasingly socially embedded AI systems,” researchers noted.
The findings expose a critical safety gap in open-weight model customisation, highlighting the need for stronger alignment safeguards and governance frameworks.













































































