Show HN: Create LLM graders and run evals in JavaScript with one file

https://github.com/bolt-foundry/bolt-foundry/tree/main/packages/bff-eval
28•randall•8mo ago
Hi HN!

Run it: OPENROUTER_API_KEY="sk" npx bff-eval --demo

We built a tool that helps people take LLM outputs and easily grade (eval) them, so you know how good an assistant response actually is.

We've built a number of LLM apps, and while we could ship decent tech demos, we were disappointed with how they'd perform over time. We worked with a few companies who had the same problem, and found that building prompts and evals scientifically is far from a solved problem... writing these things feels more like directing a play than coding.

Inspired by Anthropic's Constitutional AI concepts and by amazing software like DSPy, we're setting out to make fine-tuning prompts, not models, the default approach to improving quality, using actual metrics and structured debugging techniques.

Our approach is pretty simple: you feed it a JSONL file with inputs and outputs, pick the models you want to test against (via OpenRouter), and then use an LLM-as-grader file in JS that figures out how well your outputs match the original queries.
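As a rough illustration of the flow described above, here is a minimal sketch of reading JSONL samples, building a prompt for a grader model, and validating its reply. Everything in it is an assumption made for illustration and not bff-eval's actual API: the `input`/`output` field names, the -3 to 3 scale, and the function names `parseJsonl`, `buildGraderPrompt`, and `parseGrade` are all hypothetical. The call to a model via OpenRouter is left out, so the grader reply is hard-coded.

```javascript
// Hypothetical sketch: none of these names come from bff-eval itself.

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Build the prompt a grader model would receive for one sample.
// The -3..3 scale and the field names are assumptions for illustration.
function buildGraderPrompt(sample) {
  return [
    "You are grading an assistant response.",
    "Score from -3 (fails the query) to 3 (fully answers the query).",
    'Reply with JSON only, for example: {"score": 2, "reason": "..."}',
    "",
    `Query: ${sample.input}`,
    `Response: ${sample.output}`,
  ].join("\n");
}

// Parse the grader's reply, rejecting anything outside the assumed scale.
function parseGrade(reply) {
  const grade = JSON.parse(reply);
  if (!Number.isInteger(grade.score) || grade.score < -3 || grade.score > 3) {
    throw new Error(`score out of range: ${grade.score}`);
  }
  return grade;
}

const samples = parseJsonl(
  '{"input": "What is 2 + 2?", "output": "4"}\n' +
    '{"input": "Capital of France?", "output": "Berlin"}\n'
);
const prompts = samples.map(buildGraderPrompt);

// Stand-in for a real grader model reply.
const grade = parseGrade('{"score": -3, "reason": "Wrong city."}');
console.log(samples.length, grade.score);
```

Validating the reply before trusting it matters because a grader model can return malformed JSON or a score outside the scale, and silently accepting that would skew the results.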

If you're starting from scratch, we've found TDD is a great approach to prompt creation... start by asking an LLM to generate synthetic data, then act as the first judge and score it yourself, then create a grader and keep refining it until its scores match your ground-truth scores.
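One way to make "refine until the scores match" concrete is to compare the grader's scores with your own after each change to the grader. The sketch below is an assumption about how that check could look, not something bff-eval is known to ship: `compareScores` and the shape of its report are hypothetical.

```javascript
// Hypothetical sketch: compare grader scores to human ground-truth scores.
// Both arrays hold one numeric score per sample, in the same order.
function compareScores(humanScores, graderScores) {
  if (humanScores.length !== graderScores.length || humanScores.length === 0) {
    throw new Error("score lists must be non-empty and the same length");
  }
  let exact = 0;
  let totalError = 0;
  const disagreements = [];
  humanScores.forEach((human, i) => {
    const diff = Math.abs(human - graderScores[i]);
    totalError += diff;
    if (diff === 0) {
      exact += 1;
    } else {
      // Keep the mismatches so they can be inspected by hand.
      disagreements.push({ index: i, human, grader: graderScores[i] });
    }
  });
  return {
    agreement: exact / humanScores.length, // fraction of exact matches
    meanAbsoluteError: totalError / humanScores.length,
    disagreements,
  };
}

const report = compareScores([3, -2, 1, 0], [3, -1, 1, 2]);
console.log(report);
```

The `disagreements` list is the useful part in practice: each entry points at a sample where either the grader prompt or your own score deserves a second look.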

If you’re building LLM apps and care about reliability, I hope this will be useful! Would love any feedback. The team and I are lurking here all day and happy to chat. Or hit me up directly on WhatsApp: +1 (646) 670-1291

We have much bigger plans long-term, but we wanted to start with this simple (and hopefully useful!) tool.

Comments

rbalicki•8mo ago
Very cool! This lets you grade output across different base models. Does it also allow you to grade output across different prompts?
randall•8mo ago
that’s the next step… we also have a structured approach to prompting that we think will help people build better prompts.
