Show HN: Evaluating LLMs on creative writing via reader usage, not benchmarks

https://www.narrator.sh/
1•Jetwu•2h ago
Hey HN! I'd love to get some people to mess around with a little side project I built to teach myself DSPy! I've been a big fan of reading fiction + webnovels for a while now, and have always been curious about two things: how can LLMs iteratively learn to write better based on reader feedback, and which LLMs are actually best at creative writing (research benchmarks are cool, but don't necessarily translate to real-world usage).

That's exactly why I built narrator.sh! The platform takes a user's idea for a novel, then generates serialized fiction chapter by chapter, using DSPy to optimize the writing based on real reader feedback. I'm using chain-of-thought (CoT) and parallel modules to break down the writing task, refine modules + LLM-as-a-judge for reward functions, and the SIMBA optimizer to recompile the program on user ratings from previous chapters so that subsequent ones improve.
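To make the "refine + reward function" idea concrete, here is a minimal plain-Python sketch of a best-of-N loop: generate a draft, score it, keep the best one, and stop early once a threshold is met. It deliberately does not use DSPy and is not narrator.sh's actual code; the names `refine`, `generate`, and `reward` are hypothetical, and the toy word-count reward only stands in for an LLM judge.

```python
from typing import Callable, Tuple


def refine(
    generate: Callable[[int], str],
    reward: Callable[[str], float],
    n: int = 3,
    threshold: float = 0.8,
) -> Tuple[str, float]:
    """Best-of-N: try up to n drafts, keep the highest-scoring one,
    and stop early once a draft reaches the threshold."""
    best_text, best_score = "", float("-inf")
    for attempt in range(n):
        text = generate(attempt)   # hypothetical: would call an LLM
        score = reward(text)       # hypothetical: would call an LLM judge
        if score > best_score:
            best_text, best_score = text, score
        if best_score >= threshold:
            break
    return best_text, best_score


# Toy stand-ins so the sketch runs without any model access.
drafts = ["short", "a longer draft with dialogue", "never reached"]
calls = []


def toy_generate(attempt: int) -> str:
    calls.append(attempt)
    return drafts[attempt]


def toy_reward(text: str) -> float:
    # Assumption: longer drafts score higher, capped at 1.0.
    return min(len(text.split()) / 5, 1.0)


best, score = refine(toy_generate, toy_reward)
```

In a real pipeline the reward would presumably blend judge scores with reader ratings; how that blend is done is not specified in the post.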

Instead of synthetic benchmarks, I track real reader metrics: time spent reading, ratings, bookmarks, comments, and return visits. This creates a leaderboard of which models actually write engaging fiction that people want to finish.
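As a rough illustration of how those reader metrics could roll up into a leaderboard, here is a small Python sketch. The field names, the weights, and the 10-minute reading-time cap are all assumptions made for the example, not the scoring narrator.sh actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReaderStats:
    model: str
    reads: int               # chapter reads attributed to this model
    avg_seconds_read: float  # mean time spent per read
    avg_rating: float        # mean rating on an assumed 1-5 scale
    bookmarks: int
    comments: int
    return_visits: int


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def engagement_score(s: ReaderStats) -> float:
    """Weighted blend of normalized reader signals, in [0, 1].
    Weights are illustrative assumptions."""
    if s.reads == 0:
        return 0.0
    rating = _clamp01((s.avg_rating - 1) / 4)
    time_spent = _clamp01(s.avg_seconds_read / 600)  # cap at 10 minutes
    bookmark_rate = _clamp01(s.bookmarks / s.reads)
    comment_rate = _clamp01(s.comments / s.reads)
    return_rate = _clamp01(s.return_visits / s.reads)
    return (
        0.40 * rating
        + 0.20 * time_spent
        + 0.15 * bookmark_rate
        + 0.10 * comment_rate
        + 0.15 * return_rate
    )


def leaderboard(stats: List[ReaderStats]) -> List[ReaderStats]:
    """Models sorted from most to least engaging."""
    return sorted(stats, key=engagement_score, reverse=True)


# Made-up numbers, purely for demonstration.
a = ReaderStats("model-a", 100, 300.0, 4.5, 20, 5, 40)
b = ReaderStats("model-b", 100, 120.0, 3.0, 5, 1, 10)
ranked = [s.model for s in leaderboard([b, a])]
```

Normalizing bookmarks, comments, and return visits by read count keeps a model with more traffic from winning on volume alone.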

Right now the closest evals for creative writing LLMs come from the author perspective (OpenRouter's usage data for tools like Novelcrafter). But ultimately readers decide what's good, not authors.

You can try it at https://narrator.sh. Here's the current leaderboard: https://narrator.sh/llm-leaderboard (it's a bit bare right now because there aren't many users yet, haha)

(Fair warning: there's some adult content since I posted on Reddit for beta testers and people got creative with prompts. I'm working on diversifying the content!)

Picks, Shovels, Superstars, and Poachers: Explaining the AI Investment Boom

https://www.gojiberries.io/picks-shovels-superstars-and-poachers-explaining-the-ai-investment-boom/
1•neehao•34s ago•0 comments

DINOV3: Self-supervised learning for vision at unprecedented scale

https://ai.meta.com/dinov3/?_fb_noscript=1
1•isusmelj•3m ago•0 comments

Controllable Gliders in a Nanomagnetic Metamaterial

https://www.nature.com/articles/s41467-025-62515-1
1•bookofjoe•3m ago•0 comments

How much traffic can a pre-rendered Next.js site handle?

https://martijnhols.nl/blog/how-much-traffic-can-a-pre-rendered-nextjs-site-handl
1•lwhsiao•4m ago•0 comments

Have you put in your hours?

https://www.wysr.xyz/p/have-you-put-in-your-hours
1•martialg•4m ago•0 comments

Babbage – The Language of the Future

http://www.tlc-systems.com/babbage.htm
1•xk3•4m ago•0 comments

"accused" and "complainant" are equally susceptible to misinformation

https://www.nature.com/articles/s41598-025-13587-y
1•PaulHoule•5m ago•0 comments

A Common Virus Causes Cancer, but Most Americans Are Clueless About It

https://gizmodo.com/a-common-virus-causes-cancer-but-most-americans-are-clueless-about-it-2000643003
1•ulrischa•5m ago•0 comments

Wozniak's Tale of His 'Counterfeit' $2 Bill Secret Service Adventure

https://web.archive.org/web/20111122202554/https://archive.woz.org/letters/general/78.html
2•mhb•6m ago•0 comments

Ppl Stuck Storm

2•kwie•9m ago•2 comments

Producer Price Index News Release Summary

https://www.bls.gov/news.release/ppi.nr0.htm
1•awnird•9m ago•1 comments

PyPI now serves project status markers in API responses

https://blog.pypi.org/posts/2025-08-14-project-status-markers/
2•miketheman•12m ago•0 comments

'Absolutely immense': the companies on the hook for the $3T AI building boom

https://www.ft.com/content/efe1e350-62c6-4aa0-a833-f6da01265473
2•speckx•13m ago•0 comments

Codeberg – Free Git Hosting

https://codeberg.org
2•boombapoom•14m ago•1 comments

Microsoft CVP thinks we'll ditch keyboard and mouse for voice commands in 2030

https://www.xda-developers.com/microsoft-cvp-keyboard-and-mice-voice-commands-2030/
1•WarOnPrivacy•15m ago•1 comments

Bluesky rolls out revamp to policies and Community Guidelines

https://techcrunch.com/2025/08/14/bluesky-rolls-out-massive-revamp-to-policies-and-community-guidelines/
2•ulrischa•16m ago•0 comments

The Next FedEx? 100% Zero Emissions Logistics-as-a-Service

https://maphappenings.com/2025/08/14/postx/
2•jkillick•16m ago•1 comments

Syncing as fast as Shopify will let you with TCP-inspired flow control

https://gadget.dev/blog/saturating-shopify-gadgets-shopify-sync-strategy
1•hbrundage•17m ago•0 comments

DINOv3

https://ai.meta.com/research/publications/dinov3/?_fb_noscript=1
3•lairv•18m ago•0 comments

Primordial Soup by Darren Aronofsky

https://www.primordialsoup.ai
1•handfuloflight•20m ago•0 comments

Show HN: I Built a Clay Alternative but 10x cheaper

https://www.enrichspot.com/
1•xnoyzi•21m ago•0 comments

The Making of Gemini Plays Pokémon

https://blog.jcz.dev/the-making-of-gemini-plays-pokemon
1•jxmorris12•21m ago•1 comments

Molly White knows how to follow the memecoin

https://www.niemanlab.org/2025/08/independent-journalist-molly-white-knows-how-to-follow-the-memecoin/
4•benwerd•21m ago•0 comments

Grid-scale energy storage could cut energy bills in Central U.S. by $7B

https://pv-magazine-usa.com/2025/08/13/grid-scale-energy-storage-could-cut-energy-bills-in-central-u-s-by-7-billion/
2•doener•22m ago•0 comments

Nano-banana is better than Flux Context

https://nano-banana.pro/
1•ri-vai•23m ago•0 comments

Why Cars Still Don't Have Airless Tires, Yet

https://www.jalopnik.com/1922000/why-cars-dont-have-airless-tires/
1•m463•23m ago•0 comments

Problems with Meshtastic security exposed at DEFCON

https://partyon.xyz/@nullagent/115005493598214309
1•laurex•23m ago•0 comments

Test on a fleet of physical devices with Android Device Streaming

https://android-developers.googleblog.com/2025/08/test-with-android-device-streaming-now-with-android-partner-device-labs.html
1•amadeuspagel•24m ago•0 comments

Rabbits with 'horns' in Colorado are being called 'Frankenstein bunnies.'

https://abc7.com/post/rabbits-horns-colorado-are-being-called-frankenstein-bunnies-heres/17536316/
1•speckx•27m ago•0 comments

Inner speech in motor cortex and implications for speech neuroprostheses

https://www.cell.com/cell/fulltext/S0092-8674(25)00681-6
2•rntn•28m ago•0 comments