frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

FSD helped save my father's life during a heart attack

https://twitter.com/JJackBrandt/status/2019852423980875794
1•blacktulip•2m ago•0 comments

Show HN: Writtte – Draft and publish articles without reformatting, anywhere

https://writtte.xyz
1•lasgawe•4m ago•0 comments

Portuguese icon (FROM A CAN) makes a simple meal (Canned Fish Files) [video]

https://www.youtube.com/watch?v=e9FUdOfp8ME
1•zeristor•6m ago•0 comments

Brookhaven Lab's RHIC Concludes 25-Year Run with Final Collisions

https://www.hpcwire.com/off-the-wire/brookhaven-labs-rhic-concludes-25-year-run-with-final-collis...
1•gnufx•8m ago•0 comments

Transcribe your aunts post cards with Gemini 3 Pro

https://leserli.ch/ocr/
1•nielstron•12m ago•0 comments

.72% Variance Lance

1•mav5431•13m ago•0 comments

ReKindle – web-based operating system designed specifically for E-ink devices

https://rekindle.ink
1•JSLegendDev•15m ago•0 comments

Encrypt It

https://encryptitalready.org/
1•u1hcw9nx•15m ago•1 comments

NextMatch – 5-minute video speed dating to reduce ghosting

https://nextmatchdating.netlify.app/
1•Halinani8•16m ago•1 comments

Personalizing esketamine treatment in TRD and TRBD

https://www.frontiersin.org/articles/10.3389/fpsyt.2025.1736114
1•PaulHoule•17m ago•0 comments

SpaceKit.xyz – a browser‑native VM for decentralized compute

https://spacekit.xyz
1•astorrivera•18m ago•1 comments

NotebookLM: The AI that only learns from you

https://byandrev.dev/en/blog/what-is-notebooklm
1•byandrev•18m ago•1 comments

Show HN: An open-source starter kit for developing with Postgres and ClickHouse

https://github.com/ClickHouse/postgres-clickhouse-stack
1•saisrirampur•18m ago•0 comments

Game Boy Advance d-pad capacitor measurements

https://gekkio.fi/blog/2026/game-boy-advance-d-pad-capacitor-measurements/
1•todsacerdoti•19m ago•0 comments

South Korean crypto firm accidentally sends $44B in bitcoins to users

https://www.reuters.com/world/asia-pacific/crypto-firm-accidentally-sends-44-billion-bitcoins-use...
2•layer8•20m ago•0 comments

Apache Poison Fountain

https://gist.github.com/jwakely/a511a5cab5eb36d088ecd1659fcee1d5
1•atomic128•21m ago•2 comments

Web.whatsapp.com appears to be having issues syncing and sending messages

http://web.whatsapp.com
1•sabujp•22m ago•2 comments

Google in Your Terminal

https://gogcli.sh/
1•johlo•23m ago•0 comments

Shannon: Claude Code for Pen Testing: #1 on Github today

https://github.com/KeygraphHQ/shannon
1•hendler•23m ago•0 comments

Anthropic: Latest Claude model finds more than 500 vulnerabilities

https://www.scworld.com/news/anthropic-latest-claude-model-finds-more-than-500-vulnerabilities
2•Bender•28m ago•0 comments

Brooklyn cemetery plans human composting option, stirring interest and debate

https://www.cbsnews.com/newyork/news/brooklyn-green-wood-cemetery-human-composting/
1•geox•28m ago•0 comments

Why the 'Strivers' Are Right

https://greyenlightenment.com/2026/02/03/the-strivers-were-right-all-along/
1•paulpauper•30m ago•0 comments

Brain Dumps as a Literary Form

https://davegriffith.substack.com/p/brain-dumps-as-a-literary-form
1•gmays•30m ago•0 comments

Agentic Coding and the Problem of Oracles

https://epkconsulting.substack.com/p/agentic-coding-and-the-problem-of
1•qingsworkshop•30m ago•0 comments

Malicious packages for dYdX cryptocurrency exchange empties user wallets

https://arstechnica.com/security/2026/02/malicious-packages-for-dydx-cryptocurrency-exchange-empt...
1•Bender•31m ago•0 comments

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

https://github.com/pheonix-delta/axiom-voice-agent
1•shubham-coder•31m ago•0 comments

Penisgate erupts at Olympics; scandal exposes risks of bulking your bulge

https://arstechnica.com/health/2026/02/penisgate-erupts-at-olympics-scandal-exposes-risks-of-bulk...
4•Bender•32m ago•0 comments

Arcan Explained: A browser for different webs

https://arcan-fe.com/2026/01/26/arcan-explained-a-browser-for-different-webs/
1•fanf2•33m ago•0 comments

What did we learn from the AI Village in 2025?

https://theaidigest.org/village/blog/what-we-learned-2025
1•mrkO99•34m ago•0 comments

An open replacement for the IBM 3174 Establishment Controller

https://github.com/lowobservable/oec
2•bri3d•36m ago•0 comments
Open in hackernews

Show HN: Create LLM graders and run evals in JavaScript with one file

https://github.com/bolt-foundry/bolt-foundry/tree/main/packages/bff-eval
28•randall•8mo ago
Hi HN!

Run it: OPENROUTER_API_KEY="sk" npx bff-eval --demo

We built a tool to help people take LLM outputs and easily grade them / eval them to know how good an assistant response is.

We've built a number of LLM apps, and while we could ship decent tech demos, we were disappointed with how they'd perform over time. We worked with a few companies who had the same problem, and found out scientifically building prompts and evals is far from a solved problem... writing these things feels more like directing a play than coding.

Inspired by Anthropic's constitutional ai concepts, and amazing software like DSPy, we're setting out to make fine tuning prompts, not models, the default approach to improving quality using actual metrics and structured debugging techniques.

Our approach is pretty simple: you feed it a JSONL file with inputs and outputs, pick the models you want to test against (via OpenRouter), and then use an LLM-as-grader file in JS that figures out how well your outputs match the original queries.

If you're starting from scratch, we've found TDD is a great approach to prompt creation... start by asking an LLM to generate synthetic data, then you be the first judge creating scores, then create a grader and continue to refine it till its scores match your ground truth scores.

If you’re building LLM apps and care about reliability, I hope this will be useful! Would love any feedback. The team and I are lurking here all day and happy to chat. Or hit me up directly on Whatsapp: +1 (646) 670-1291

We have a lot bigger plans long-term, but we wanted to start with this simple (and hopefully useful!) tool.

Run it: OPENROUTER_API_KEY="sk" npx bff-eval --demo

Comments

rbalicki•8mo ago
Very cool! This lets you grade output across different base models. Does it also allow you grade output across different prompts?
randall•8mo ago
that’s the next step… we have a structured approach to prompting too that we think will help people build better prompts too.