Emotional Intelligence Benchmarks for LLMs
DiploBench: an experiment measuring LLM performance in the board game Diplomacy.
| Model | Results | Game Report |
|---|---|---|
DiploBench tests LLM strategic reasoning and negotiation in the board game Diplomacy. The evaluated model plays as Austria, negotiating with and competing against other AI players to survive and win. Every opponent is played by the same LLM, gemini-2-flash-001.
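The lineup described above can be pictured as a simple mapping from powers to models. The following is only an illustrative sketch, not the benchmark's actual configuration: it assumes a standard seven-power Diplomacy game, and `assign_models` and `"my-test-model"` are hypothetical names introduced here for illustration.

```python
# Illustrative sketch only; the real setup lives in the DiploBench repository.
# Assumption: a standard Diplomacy game with the seven classic powers.
POWERS = ["Austria", "England", "France", "Germany", "Italy", "Russia", "Turkey"]

TEST_POWER = "Austria"                 # the evaluated model always plays Austria
OPPONENT_MODEL = "gemini-2-flash-001"  # opponent model named in the text


def assign_models(test_model: str) -> dict[str, str]:
    """Hypothetical helper: map each power to the model that plays it."""
    return {
        power: (test_model if power == TEST_POWER else OPPONENT_MODEL)
        for power in POWERS
    }


if __name__ == "__main__":
    # "my-test-model" is a placeholder for whichever model is being evaluated.
    for power, model in assign_models("my-test-model").items():
        print(f"{power}: {model}")
```

Holding the opponents fixed means that differences between runs reflect the evaluated model rather than a changing field of opponents.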
For more details and source code, see the GitHub repository.