Creative Writing v3

Emotional Intelligence Benchmarks for LLMs


An LLM-judged creative writing benchmark (v3).

Leaderboard columns: Model, Length, Slop, Repetition, Abilities, Style, Rubric Score, Elo Score.


For more details about the benchmark, see the About section.

Repetition Metric

The Repetition column measures how often a model repeats words and phrases in the outputs it generates for this benchmark. It is computed by summing the frequencies of the most common words, bigrams and trigrams in the text. Higher values indicate more repetitive output.
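As a rough illustration of the idea, the sketch below sums the frequencies of the most common 1-, 2- and 3-grams in a text. It is not the benchmark's implementation: the lowercase regex tokenizer, the hypothetical `top_k` cutoff, and the choice to use relative frequencies (count divided by the number of n-grams of that order) are all assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphabetic word tokens (apostrophes kept).
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # All contiguous n-token windows.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repetition_score(text: str, top_k: int = 10) -> float:
    """Hypothetical repetition score: for n = 1, 2, 3, add up the relative
    frequencies of the top_k most common n-grams of that order."""
    tokens = tokenize(text)
    score = 0.0
    for n in (1, 2, 3):
        grams = ngrams(tokens, n)
        if not grams:
            continue  # text too short for this n-gram order
        counts = Counter(grams)
        top_total = sum(c for _, c in counts.most_common(top_k))
        score += top_total / len(grams)
    return score


print(repetition_score("the cat sat on the mat", top_k=1))
```

With `top_k=1`, "the" accounts for 2 of 6 words, while every bigram and trigram occurs once, so the score is 2/6 + 1/5 + 1/4. A text that keeps reusing the same words pushes each term toward 1.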

Slop Score

The Slop column measures the frequency of words and phrases typically overused by LLMs ("GPT-isms"). The value is calculated by matching the text against a master slop list derived from words and phrases that are over-represented in the outputs of many models.
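The following sketch shows one plausible way to turn slop-list matches into a score. The entries in `EXAMPLE_SLOP_LIST` are illustrative placeholders, not the benchmark's master list, and the whole-token phrase matching and hypothetical "hits per 1,000 words" normalization are assumptions made for illustration.

```python
import re

# Illustrative entries only; NOT the benchmark's actual master slop list.
EXAMPLE_SLOP_LIST = ["tapestry", "testament to", "shivers down her spine"]


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphabetic word tokens (apostrophes kept).
    return re.findall(r"[a-z']+", text.lower())


def count_phrase(tokens: list[str], phrase_tokens: list[str]) -> int:
    # Count whole-token occurrences of a (possibly multi-word) phrase.
    n = len(phrase_tokens)
    if n == 0:
        return 0  # ignore empty slop entries
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens)


def slop_score(text: str, slop_list=EXAMPLE_SLOP_LIST, per_words: int = 1000) -> float:
    """Hypothetical slop score: slop-list matches per `per_words` words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(count_phrase(tokens, tokenize(p)) for p in slop_list)
    return hits * per_words / len(tokens)


print(slop_score("The tapestry was a testament to her skill."))  # 250.0
```

The example sentence has 8 words and 2 matches ("tapestry" and "testament to"), giving 2 × 1000 / 8 = 250.0. Normalizing by length keeps long and short outputs comparable.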