LLM Benchmark for 'Longform Creative Writing'

Published on: 2025-05-05 11:56:33

Longform Creative Writing Benchmark

This benchmark evaluates several abilities: brainstorming and planning out a short story/novella from a minimal prompt; reflecting on the plan and revising it; and writing a short story/novella over 8 turns of 1000 words each.

Models are typically evaluated via OpenRouter, using temp=0.7 and min_p=0.1 as the generation settings. Outputs are evaluated with a scoring rubric by Claude Sonnet 3.7.

Length: The average chapter length (chars).

Slop Score: The Slop column measures the frequency of words/phrases typically overused by LLMs (“GPT-isms”) in each completed chapter. The lower, the better.

Repetition Metric: The Repetition column measures how strongly a model repeats words/phrases across multiple tasks. Higher means more repetition.

Degradation: A mini-sparkline of the 8 chapter scores (averages) to visually see if the model’s chapter quality drops off as it continues writing. The degradation score is the absolute value of the trendline's gradient.

Score (0-100): The ...
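To make the degradation score concrete, here is a minimal sketch of how the absolute value of a trendline's gradient could be computed from 8 per-chapter scores. It assumes an ordinary least-squares line fitted against the chapter index; the benchmark's actual fitting method is not stated in the text, and the function name `degradation_score` and the sample scores are hypothetical.

```python
def degradation_score(chapter_scores):
    """Absolute slope of a least-squares trendline through the chapter scores.

    Assumption: chapters are evenly spaced (index 0, 1, 2, ...) and the
    trendline is an ordinary least-squares fit. The benchmark may differ.
    """
    n = len(chapter_scores)
    if n < 2:
        raise ValueError("need at least two chapter scores to fit a trendline")

    mean_x = (n - 1) / 2                      # mean of indices 0..n-1
    mean_y = sum(chapter_scores) / n

    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(chapter_scores))
    var_x = sum((x - mean_x) ** 2 for x in range(n))

    return abs(cov_xy / var_x)


# Hypothetical scores for 8 chapters, dropping 2 points per chapter.
example_scores = [80, 78, 76, 74, 72, 70, 68, 66]
example_result = degradation_score(example_scores)
```

Because the absolute value is taken, this quantity on its own does not distinguish a model that steadily improves from one that steadily declines; the sparkline is what shows the direction.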