EQ-Bench 3 Leaderboard

A benchmark measuring emotional intelligence in challenging roleplays, judged by Sonnet 3.7.

Note: Ability scores shown in the heatmap do not contribute to the Elo score. They are "higher is higher", not "higher is better": a higher value means the model shows more of that trait, not that it performs better.

Model	Abilities	Humanlike	Safety	Assertive	Social IQ	Warm	Analytic	Insight	Empathy	Compliant	Moralising	Pragmatic	Elo Score

Scoring

The Elo score shown in the leaderboard is calculated from pairwise model comparisons, where the LLM judge rates each response against eight core dimensions of emotional intelligence:

Demonstrated empathy
Pragmatic EI (practical application of emotional intelligence)
Depth of insight
Social dexterity
Emotional reasoning
Appropriate validation and/or challenge for the scene
Message tailoring to the audience and context
Overall EQ
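
To illustrate how pairwise comparisons can be turned into a rating, here is a minimal sketch of the textbook Elo update. It is not the leaderboard's actual implementation, which this page does not describe. The K-factor of 32, the starting rating of 1000, and the helper `pairwise_outcome` (which assumes the judge yields one numeric rating per dimension for each response and decides the winner by majority of dimensions) are all assumptions made for illustration.

```python
from collections import defaultdict

K = 32          # assumed update step size (not taken from the leaderboard)
BASE = 1000.0   # assumed starting rating for every model


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def pairwise_outcome(dims_a, dims_b) -> float:
    """Hypothetical aggregation: the response that wins more dimensions wins.

    dims_a and dims_b are per-dimension judge ratings for the two responses.
    Returns 1.0 if A wins, 0.0 if B wins, 0.5 on a tie.
    """
    wins_a = sum(a > b for a, b in zip(dims_a, dims_b))
    wins_b = sum(b > a for a, b in zip(dims_a, dims_b))
    if wins_a > wins_b:
        return 1.0
    if wins_b > wins_a:
        return 0.0
    return 0.5


def update_elo(ratings, model_a, model_b, score_a, k=K):
    """Apply one pairwise result to the ratings mapping, in place."""
    r_a, r_b = ratings[model_a], ratings[model_b]
    e_a = expected_score(r_a, r_b)
    ratings[model_a] = r_a + k * (score_a - e_a)
    ratings[model_b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))


def new_ratings():
    """Ratings table in which unseen models start at BASE."""
    return defaultdict(lambda: BASE)
```

Under these assumptions, a single comparison between two equally rated models moves the winner up by 16 points and the loser down by 16, so the total rating is conserved. The Abilities columns play no part in this calculation.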

Note: the coloured “Abilities” heatmap columns (Humanlike, Safety, Assertive, etc.) are not used in the Elo calculation. They are purely informational, giving a quick view of each model’s stylistic traits and skill profile.

Traits & Abilities

These are informational only -- not used for scoring.

Humanlike: How natural and human-like the response feels.
Safety: Adherence to safety guidelines; avoids harmful content.
Assertive: Confident, sets boundaries, and pushes back when needed.
Social IQ: Understands and navigates social dynamics effectively.
Warm: Friendly, kind, and approachable tone.
Analytic: Logical reasoning, problem-solving, structured thinking.
Insight: Offers depth, novel perspectives, spots underlying issues.
Empathy: Recognises, understands, and shares others’ feelings.
Compliant: Willingness to follow instructions or agree with the user.
Moralising: Tendency to judge or lecture on moral principles.
Pragmatic: Focus on practical, real-world solutions.