Show HN: I built "AI Wattpad" to eval LLMs on fiction

https://narrator.sh/llm-leaderboard
5•jauws•1h ago
I've been a webfiction reader for years (too many hours on Royal Road), and I kept running into the same question: which LLMs actually write fiction that people want to keep reading? That's why I built Narrator (https://narrator.sh/llm-leaderboard) – a platform where LLMs generate serialized fiction and get ranked by real reader engagement.

Turns out this is surprisingly hard to answer. Creative writing isn't a single capability – it's a pipeline: brainstorming → writing → memory. You need to generate interesting premises, execute them with good prose, and maintain consistency across a long narrative. Most benchmarks test these in isolation, but readers experience them as a whole.

The current evaluation landscape is fragmented:

Memory benchmarks like FictionLive's tests use MCQs to check if models remember plot details across long contexts. Useful, but memory is necessary for good fiction without being sufficient. A model can ace recall and still write boring stories.

Author-side usage data from tools like Novelcrafter shows which models writers prefer as copilots. But that measures what's useful for human-AI collaboration, not what produces engaging standalone output. Authors and readers have different needs.

LLM-as-a-judge is the most common approach for prose quality, but it's notoriously unreliable for creative work. Models have systematic biases (favoring verbose prose, certain structures), and "good writing" is genuinely subjective in ways that "correct code" isn't.

What's missing is a reader-side quantitative benchmark – something that measures whether real humans actually enjoy reading what these models produce. That's the gap Narrator fills: views, time spent reading, ratings, bookmarks, comments, return visits. Think of it as an "AI Wattpad" where the models are the authors.

I shared an early DSPy-based version here 5 months ago (https://news.ycombinator.com/item?id=44903265). The big lesson: one-shot generation doesn't work for long-form fiction. Models lose plot threads, forget characters, and quality degrades across chapters.

The rewrite: from one-shot to a persistent agent loop

The current version runs each model through a writing harness that maintains state across chapters. Before generating, the agent reviews structured context: character sheets, plot outlines, unresolved threads, world-building notes. After generating, it updates these artifacts for the next chapter. Essentially, each model gets a "writer's notebook" that persists across the whole story.

This made a measurable difference – models that struggled with consistency in the one-shot version improved significantly with access to their own notes.
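To make the loop concrete, here is a minimal sketch of the review, generate, update cycle described above. It is not Narrator's actual harness: the `Notebook` fields, `write_story`, and the `generate` / `update_notes` callables are hypothetical names standing in for whatever model calls a real harness would make.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Notebook:
    """Hypothetical "writer's notebook" carried across chapters."""
    characters: dict[str, str] = field(default_factory=dict)
    outline: list[str] = field(default_factory=list)
    open_threads: list[str] = field(default_factory=list)
    world_notes: list[str] = field(default_factory=list)

    def as_context(self) -> str:
        """Render the non-empty note sections as plain text for the prompt."""
        sections = [
            ("Characters", [f"{n}: {d}" for n, d in self.characters.items()]),
            ("Outline", self.outline),
            ("Unresolved threads", self.open_threads),
            ("World notes", self.world_notes),
        ]
        return "\n".join(
            f"{title}:\n" + "\n".join(f"- {item}" for item in items)
            for title, items in sections
            if items
        )


def write_story(
    generate: Callable[[str, int], str],
    update_notes: Callable[[Notebook, str], Notebook],
    notebook: Notebook,
    num_chapters: int,
) -> tuple[list[str], Notebook]:
    """Run the persistent loop: review notes, write a chapter, update notes."""
    chapters: list[str] = []
    for number in range(1, num_chapters + 1):
        context = notebook.as_context()           # review before generating
        chapter = generate(context, number)       # assumed to wrap an LLM call
        notebook = update_notes(notebook, chapter)  # persist for next chapter
        chapters.append(chapter)
    return chapters, notebook
```

The one-shot version corresponds to calling `generate` once with no notebook; the difference here is only that each chapter sees the notes left behind by the previous ones.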

Granular filtering instead of a single score:

We classify stories upfront by language, genre, tags, and content rating. Instead of one "creative writing" leaderboard, we can drill into specifics: which model writes the best Spanish Comedy? Which handles LitRPG stories with Male Leads the best? Which does well with romance versus horror?

The answers aren't always what you'd expect from general benchmarks. Some models that rank mid-tier overall dominate specific niches.
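As a rough illustration of how a filtered leaderboard can disagree with the overall one, the sketch below ranks models by mean reader rating within an optional language / genre / tag slice. The `Story` fields, the `leaderboard` function, and the use of mean rating as the score are all assumptions for the example, not Narrator's real schema or ranking formula.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Story:
    model: str
    language: str
    genre: str
    tags: frozenset[str]
    rating: float  # hypothetical mean reader rating for this story


def leaderboard(
    stories: list[Story],
    language: Optional[str] = None,
    genre: Optional[str] = None,
    required_tags: frozenset[str] = frozenset(),
) -> list[tuple[str, float]]:
    """Rank models by mean rating over the stories matching the filters."""
    ratings: dict[str, list[float]] = defaultdict(list)
    for story in stories:
        if language is not None and story.language != language:
            continue
        if genre is not None and story.genre != genre:
            continue
        if not required_tags <= story.tags:  # every required tag must be present
            continue
        ratings[story.model].append(story.rating)
    return sorted(
        ((model, mean(values)) for model, values in ratings.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Calling `leaderboard(stories)` gives the single overall ranking, while something like `leaderboard(stories, language="es", genre="comedy")` answers the narrower "best Spanish comedy" question from the same data.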

A few features I'm proud of:

Story forking lets readers branch stories CYOA-style – if you don't like where the plot went, fork it and see how the same model handles the divergence. This creates natural A/B comparisons.
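One simple way to model forking is a chapter tree where each fork adds a sibling branch under the same parent. This is only a sketch under that assumption; the `Chapter` class and its `fork` / `path` methods are hypothetical and say nothing about how Narrator stores forks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison avoids recursing through the tree
class Chapter:
    chapter_id: str
    text: str
    parent: Optional["Chapter"] = field(default=None, repr=False)
    children: list["Chapter"] = field(default_factory=list, repr=False)

    def fork(self, chapter_id: str, text: str) -> "Chapter":
        """Add an alternative continuation branching from this chapter."""
        child = Chapter(chapter_id, text, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Return chapter ids from the root down to this chapter."""
        ids: list[str] = []
        node: Optional[Chapter] = self
        while node is not None:
            ids.append(node.chapter_id)
            node = node.parent
        return ids[::-1]
```

Two children of the same parent share everything up to the fork point, which is what makes them comparable as an A/B pair.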

Visual LitRPG was a personal itch to scratch. Instead of walls of [STR: 15 → 16] text, stats and skill trees render as actual UI elements. Example: https://narrator.sh/novel/beware-the-starter-pet/chapter/1
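For a sense of the first step in rendering stats as UI, the sketch below pulls inline stat changes like `[STR: 15 → 16]` out of chapter text into structured records. The regex, the `StatChange` type, and the assumption that stats are written as uppercase names with integer values are illustrative guesses, not the site's actual parser.

```python
import re
from dataclasses import dataclass

# Matches e.g. "[STR: 15 → 16]"; also accepts "->" as the arrow (assumption).
STAT_CHANGE = re.compile(
    r"\[(?P<stat>[A-Z]+):\s*(?P<old>\d+)\s*(?:→|->)\s*(?P<new>\d+)\]"
)


@dataclass
class StatChange:
    stat: str
    old: int
    new: int


def extract_stat_changes(text: str) -> list[StatChange]:
    """Return every inline stat change found in the text, in order."""
    return [
        StatChange(m["stat"], int(m["old"]), int(m["new"]))
        for m in STAT_CHANGE.finditer(text)
    ]
```

A front end could then render each `StatChange` as a component instead of leaving the bracketed text in the prose.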

What I'm looking for:

More readers to build out the engagement data. I'm also curious whether anyone else working on long-form LLM generation has found better patterns for maintaining consistency across chapters – the agent harness approach works, but I'm sure there are improvements.

Comments

linolevan•24m ago
Quick feedback: Website is basically unusable on mobile
jauws•7m ago
Ah shoot - thanks for letting me know. I'm still a noob on frontend so still learning as I go.
bccdee•7m ago
I took a look at the "top-rated" story.

1. UI is terrible. Paragraphs are extremely far apart, and most paragraphs are 1 short sentence (e.g. "I glare."). On mobile, I can only see a few words at a time, and desktop's not much better.

2. Story is so bad that it's not even amusing.
