- Similar "bias" was exhibited by other models, including Llama 3.3 and DeepSeek V3.
- Even when human annotators judged the human-written summary to be higher quality, leading LLMs still preferred their own writing 67-82% of the time.
- Preference was stronger in larger models.
- In several cases, LLMs also preferred their own writing over that of other LLMs.
There's a pretty decent longer summary in this thread where I first heard about the article: https://x.com/heynavtoor/status/2048088874686300431
ytpete•1h ago