DiploBench

Emotional Intelligence Benchmarks for LLMs

An experiment measuring LLM performance in the board game Diplomacy.


About DiploBench

DiploBench tests LLM strategic reasoning and negotiation in the board game Diplomacy. The evaluated model plays as Austria, negotiating with and competing against other AI players to survive and win. All opponent LLMs are gemini-2-flash-001.

For more details and source code, see the GitHub repository.

Note: Despite the name, this is not a benchmark (yet). Each result comes from a single game run, and outcomes vary widely between runs.