Show HN: Evaluating LLMs on creative writing via reader usage, not benchmarks

https://www.narrator.sh/
36•jauws•5mo ago
Hey HN! I'd love to get some people to mess around with a little side project I built to teach myself DSPy! I've been a big fan of reading fiction + webnovels for a while now, and have always been curious about two things: how can LLMs iteratively learn to write better based on reader feedback, and which LLMs are actually best at creative writing (research benchmarks are cool, but don't necessarily translate to real-world usage).

That's exactly why I built narrator.sh! The platform takes in a user input for a novel idea, then generates serialized fiction chapter-by-chapter by using DSPy to optimize the writing based on real reader feedback. I'm using CoT and parallel modules to break down the writing task, refine modules + LLM-as-a-judge for reward functions, and the SIMBA optimizer to recompile user ratings from previous chapters to improve subsequent ones.
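
A minimal sketch of the refine-plus-judge step described above, written in plain Python rather than DSPy's actual API. `draft_chapter`, `judge_score`, the attempt count, and the threshold are all hypothetical stand-ins; in a real pipeline both functions would be LLM calls.

```python
from typing import Callable, Tuple


def refine(
    generate: Callable[[str, int], str],
    reward: Callable[[str], float],
    prompt: str,
    attempts: int = 3,
    threshold: float = 0.8,
) -> Tuple[str, float]:
    """Draft up to `attempts` times, stop early once a draft clears
    `threshold`, and return the best draft seen with its score."""
    best_text, best_score = "", float("-inf")
    for attempt in range(attempts):
        text = generate(prompt, attempt)
        score = reward(text)
        if score > best_score:
            best_text, best_score = text, score
        if best_score >= threshold:
            break
    return best_text, best_score


# Hypothetical stand-in for the chapter-writing LLM call.
def draft_chapter(prompt: str, attempt: int) -> str:
    return f"{prompt} " + "and then something happened. " * (attempt + 1)


# Hypothetical stand-in for an LLM-as-a-judge reward in [0, 1].
# This toy version just rewards longer drafts.
def judge_score(text: str) -> float:
    return min(len(text.split()) / 16.0, 1.0)


chapter, score = refine(draft_chapter, judge_score, "A lighthouse keeper finds a door")
```

The judge is passed in as a plain function, so swapping the toy scorer for a model call (or for aggregated reader ratings) would not change the loop.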

Instead of synthetic benchmarks, I track real reader metrics: time spent reading, ratings, bookmarks, comments, and return visits. This creates a leaderboard of which models actually write engaging fiction that people want to finish.
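
One way such a leaderboard could be computed is sketched below. The post does not say how the metrics are combined, so the weights, the 10-minute cap on reading time, and the `ReadEvent` shape are assumptions made for illustration (comments are left out for brevity).

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Tuple


@dataclass
class ReadEvent:
    model: str           # which LLM wrote the chapter
    seconds_read: float  # time spent reading
    rating: int          # reader rating, 1..5
    bookmarked: bool
    returned: bool       # reader came back for another chapter


# Hypothetical weights; they sum to 1 so scores land in [0, 1].
WEIGHTS = {"time": 0.3, "rating": 0.4, "bookmark": 0.15, "return": 0.15}
MAX_SECONDS = 600.0  # assumed cap so one very long session cannot dominate


def engagement(event: ReadEvent) -> float:
    """Combine one reader's signals into a single score in [0, 1]."""
    time_part = min(event.seconds_read, MAX_SECONDS) / MAX_SECONDS
    rating_part = (event.rating - 1) / 4
    return (
        WEIGHTS["time"] * time_part
        + WEIGHTS["rating"] * rating_part
        + WEIGHTS["bookmark"] * float(event.bookmarked)
        + WEIGHTS["return"] * float(event.returned)
    )


def leaderboard(events: Iterable[ReadEvent]) -> List[Tuple[str, float]]:
    """Average engagement per model, best model first."""
    by_model: Dict[str, List[float]] = defaultdict(list)
    for event in events:
        by_model[event.model].append(engagement(event))
    return sorted(
        ((model, mean(scores)) for model, scores in by_model.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

With few users, a plain mean is noisy; showing the reader count next to each score would make that visible.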

Right now the closest evals for creative writing LLMs come from the author perspective (OpenRouter's usage data for tools like Novelcrafter). But ultimately readers decide what's good, not authors.

You can try it at https://narrator.sh. Here's the current leaderboard: https://narrator.sh/llm-leaderboard (it's a bit bare right now b/c there's not that many users haha)

(Fair warning: there's some adult content since I posted on Reddit for beta testers and people got creative with prompts. I'm working on diversifying the content!)

Comments

BoorishBears•5mo ago
I run a site that does something similar, but on a more granular level (prompts at the page level rather than the chapter)

I think right now we're at the point where Novelcrafter is an excellent proxy for the best models for readers, because LLMs are still mostly losing engagement due to technical errors rather than subjective ones: repetition problems, moralizing/soft-censorship, grammatical quirks, missed instructions, forgetting major plot points, etc.

Those kinds of errors are so obvious you can almost rank these models with an N=1 vibe test, and they limit how much people will consume unless you're scratching certain itches like NSFW

-

However I do think with enough post-training you can beat that level of problems and move to a stage where the writing is technically sound (and that's what I've spent most of the last year working on).

From there you get to more challenging problems that require much more feedback along some level of specialization per user (like what Midjourney does during onboarding to build up a style profile). Once you're not making technical mistakes, you now have to codify the ethereal concept of "user taste", and that will be a really interesting challenge for LLMs.

jauws•5mo ago
Thanks for the comment! Do you mind linking the site? I'd love to check it out! That's a very fair point about the technical error aspect. Though with all the confounding variables (author skill differences, model selection based on price/speed, etc.), I'd say Novelcrafter usage is probably the most mature signal we have right now, but still far from ideal.

Really interested in what you've been working on for the past year! Are you doing custom fine-tuning or more on the prompting/post-processing side? Also I definitely need to check out the Midjourney onboarding, it sounds super interesting for inspo regarding your point about personalization + taste!

BoorishBears•5mo ago
My 2nd most recent submission has a link to it

Most of it has been fine-tuning (SFT/DPO/GRPO), but also a lot of prompting and adding steps between the user's prompt and the output

johnnyfeng•5mo ago
Nice approach! Reader engagement beats synthetic benchmarks any day. Bookmarked to try later - curious which models actually hook readers vs just score well on tests.

jauws•5mo ago
Thanks Johnny! I totally agree with you, really appreciate you for checking out my project!

mwkaufma•5mo ago
The plagiarism tumbler turns and turns.

Der_Einzige•5mo ago
Btw, creative writing is something where good sampler settings uniquely improve your experience a lot. That's why the coomer/ERP crowd is usually the first to implement a new sampler technique.

You should explore high temperature (far above 2) sampling with good truncations like min_p, top n sigma, TFS, mirostat, typicality sampling, etc. Basically anything that isn't top_p/top_k. This is the path to highly diverse outputs.
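
A minimal sketch of high-temperature sampling with min_p truncation, in pure Python over a toy vocabulary. It assumes truncation is applied to the temperature-1 distribution first and temperature is applied to the survivors afterwards; some implementations use the opposite order. The token names and logits are made up.

```python
import math
import random
from typing import Dict, Optional


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a token -> logit mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def min_p_filter(logits: Dict[str, float], min_p: float) -> Dict[str, float]:
    """Keep tokens whose probability is at least `min_p` times the
    probability of the most likely token."""
    probs = softmax(logits)
    cutoff = min_p * max(probs.values())
    return {tok: logits[tok] for tok, p in probs.items() if p >= cutoff}


def sample(
    logits: Dict[str, float],
    temperature: float = 3.0,
    min_p: float = 0.1,
    rng: Optional[random.Random] = None,
) -> str:
    """Truncate with min_p, then flatten the survivors with temperature."""
    rng = rng or random.Random()
    kept = min_p_filter(logits, min_p)
    scaled = softmax({tok: val / temperature for tok, val in kept.items()})
    tokens = list(scaled)
    return rng.choices(tokens, weights=[scaled[t] for t in tokens], k=1)[0]


# Made-up logits: three plausible tokens and one junk token.
toy_logits = {"the": 5.0, "a": 4.0, "storm": 3.0, "qzx": -4.0}
next_token = sample(toy_logits, rng=random.Random(0))
```

Because the cutoff scales with the top token's probability, the junk token is removed before the high temperature can raise its odds, while the plausible alternatives end up close to equally likely.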

jauws•5mo ago
This is an amazing suggestion! Will definitely try to figure out a way to incorporate this into the leaderboard without making it a constant each time. I'm currently using OpenRouter's default parameters, which is totally a brainfart on my part.

skyzouwdev•5mo ago
Really like the shift from synthetic benchmarks to actual reader engagement, feels way more aligned with what “good writing” actually means. Curious if you’ve noticed certain models consistently improving more with feedback than others.

jauws•5mo ago
Thanks! Anecdotally, I'd say Claude 3.7 tends to improve the most, but judging by the leaderboard, some people really prefer Grok-3 lol.

layer8•5mo ago
> currently in early access. just a fun side project :)

This causes me cognitive dissonance.

sinharishabh•5mo ago
Really cool project, and the site looks clean too! Have you thought about tracking where readers drop off in a chapter? Could be a great signal for narrative flow.
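
A rough sketch of the drop-off idea, assuming the site can record the furthest paragraph each reader reached in a chapter. That logging, and the function name, are hypothetical and not something the post describes.

```python
from collections import Counter
from typing import Dict, List


def drop_off_by_paragraph(
    last_paragraph_seen: List[int], total_paragraphs: int
) -> Dict[int, float]:
    """For each paragraph where at least one reader stopped before the end,
    return the fraction of all readers who stopped there."""
    if not last_paragraph_seen:
        return {}
    # Readers who reached the final paragraph count as finishers, not drop-offs.
    stops = Counter(p for p in last_paragraph_seen if p < total_paragraphs)
    readers = len(last_paragraph_seen)
    return {p: stops[p] / readers for p in sorted(stops)}


# Made-up data: eight readers of a ten-paragraph chapter.
furthest = [3, 3, 10, 7, 3, 10, 10, 10]
drop_offs = drop_off_by_paragraph(furthest, total_paragraphs=10)
```

A spike at one paragraph would point at a specific passage to inspect, where a flat distribution would suggest readers are leaving for reasons unrelated to the text.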