EQ-Bench DiploBench

About DiploBench

DiploBench tests LLM strategic reasoning and negotiation in the board game Diplomacy. The evaluated model plays as Austria, negotiating with and competing against six other AI players to survive and win. All of the opponents are played by gemini-2-flash-001.
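As a rough illustration of the setup described above, the seat assignment could be sketched as follows. This is not taken from the DiploBench source: the helper `assign_players`, the uppercase power names, and the placeholder name `model-under-test` are all assumptions made for the example. Only the seven standard Diplomacy powers, the Austria seat, and the opponent model name come from the description.

```python
# Hypothetical sketch of the seat assignment; not the repository's actual code.

# The seven great powers of standard Diplomacy.
POWERS = ["AUSTRIA", "ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY"]


def assign_players(
    test_model: str,
    opponent_model: str = "gemini-2-flash-001",
    test_power: str = "AUSTRIA",
) -> dict[str, str]:
    """Map each power to the model controlling it (illustrative helper)."""
    if test_power not in POWERS:
        raise ValueError(f"unknown power: {test_power}")
    # The evaluated model takes one seat; every other seat goes to the opponent model.
    return {
        power: (test_model if power == test_power else opponent_model)
        for power in POWERS
    }


# "model-under-test" is a placeholder, not a real model identifier.
seats = assign_players("model-under-test")
for power, model in seats.items():
    print(f"{power:8s} {model}")
```

Keeping the opponent model fixed across all six other seats means the evaluated model is the only variable that changes from one run to the next.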

For more details and source code, see the GitHub repository.

Note: Despite the name, this is not a benchmark (yet). These are single-game runs with high variance between iterations.