EQ-Bench Creative Writing v3 Leaderboard

For more details about the benchmark, see the About section.

Repetition Metric

The Repetition column measures a model's tendency to repeat words and phrases in the outputs it generates for this benchmark. It sums the frequencies of the most common words, bigrams, and trigrams (two- and three-word sequences) that appear in the text. Higher values indicate more repetitive output.
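
As a rough illustration of the idea, the calculation might look something like the sketch below. This is not the benchmark's actual code: the function names, the tokenizer, the `top_k` cutoff, and the normalization by n-gram count are all assumptions made for the example.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repetition_score(text, top_k=5):
    """Sum the relative frequencies of the top_k most common
    words, bigrams, and trigrams.

    Assumptions (not taken from the benchmark): lowercase regex
    tokenization, top_k=5, and dividing each count by the total
    number of n-grams of that size.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for n in (1, 2, 3):
        grams = ngrams(tokens, n)
        if not grams:
            continue  # text too short for this n-gram size
        top = Counter(grams).most_common(top_k)
        score += sum(count for _, count in top) / len(grams)
    return score
```

With `top_k=1`, a looping text such as "the cat the cat the cat" scores higher than "one two three four five six", which matches the intent that higher values mean more repetition. On very short texts a larger `top_k` covers nearly every n-gram, so the sketch is only meaningful on longer outputs.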

Slop Score

The Slop column measures the frequency of words and phrases typically overused by LLMs ("GPT-isms"). The value is calculated by matching the text against a master slop list derived from words and phrases that are over-represented in the outputs of many models.
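
The matching step could be sketched roughly as follows. The three-entry `SLOP_LIST` is a hypothetical stand-in for the real master list, and the whole-word matching and per-1,000-words scaling are assumptions for illustration rather than the benchmark's actual method.

```python
import re

# Hypothetical stand-in for the master slop list (not the real one).
SLOP_LIST = ["tapestry", "testament to", "shivers down"]


def slop_score(text, slop_list=SLOP_LIST):
    """Count slop-list matches per 1,000 words of text.

    Assumptions (not taken from the benchmark): case-insensitive
    whole-word matching and per-1,000-words normalization.
    """
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    if n_words == 0:
        return 0.0
    hits = 0
    for phrase in slop_list:
        # \b anchors keep "tapestry" from matching inside longer words.
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        hits += len(re.findall(pattern, lowered))
    return 1000 * hits / n_words
```

Normalizing by length keeps long and short outputs comparable; a raw hit count would penalize a model simply for writing more.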