Emotional Intelligence Benchmarks for LLMs
An experiment measuring LLM performance in the board game Diplomacy.
Model | Results | Game Report
---|---|---
DiploBench tests LLM strategic reasoning and negotiation in the board game Diplomacy. The evaluated model plays as Austria, negotiating with and competing against the other AI players to survive and win. Every opponent is played by gemini-2-flash-001.
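To make the setup concrete, here is a minimal sketch of how the player lineup described above could be represented. It is not taken from the DiploBench source: `assign_models`, `POWERS`, `OPPONENT_MODEL`, and the placeholder name `model-under-test` are all hypothetical, and the sketch assumes the standard seven-power Diplomacy map with every non-Austria seat filled by the opponent model.

```python
# Illustrative only: these names are hypothetical and not from the DiploBench code.

# The seven great powers of standard Diplomacy (assumed to all be in play).
POWERS = ["Austria", "England", "France", "Germany", "Italy", "Russia", "Turkey"]

# Opponent model identifier as given in the description above.
OPPONENT_MODEL = "gemini-2-flash-001"


def assign_models(test_model: str, test_power: str = "Austria") -> dict[str, str]:
    """Map each power to the model that plays it.

    The evaluated model takes `test_power`; every other power is
    played by OPPONENT_MODEL.
    """
    if test_power not in POWERS:
        raise ValueError(f"unknown power: {test_power}")
    return {
        power: (test_model if power == test_power else OPPONENT_MODEL)
        for power in POWERS
    }


# "model-under-test" is a placeholder for whichever model is being evaluated.
lineup = assign_models("model-under-test")
for power, model in lineup.items():
    print(f"{power:8s} -> {model}")
```

Keeping the opponents fixed to a single model means that differences between runs can be attributed to the evaluated model rather than to a changing field of opponents.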
For more details and source code, see the GitHub repository.