Show HN: Pi Co-pilot – Evaluation of AI apps made easy

https://withpi.ai/
6•achintms•3h ago
Hey HN — tl;dr, 2 months ago we shared our first product with the HN community. Despite receiving lots of traffic from HN, we didn’t see any traction or retention. One of our major takeaways was that our product was too complicated. So we’ve spent the last 2 months iterating towards a much more focused product that tries to do just one thing really well. Today, we’d like to share our second launch with HN.

Our original idea [1] was to help software engineers build high-quality LLM applications by integrating their domain knowledge into a scoring system, which could then drive everything from prompt tuning to fine-tuning, RL, and data filtering. But what we quickly learned (with the help of HN – thank you!) is that most people aren’t optimizing as their first, second, or even third step — they’re just trying to ship something reasonable using system prompts and off-the-shelf models.

In looking to build a product that’s useful to a wider audience, we found one piece of the original product that most people _did_ notice and want: the ability to check that the outputs of their AI apps look good. Whether you’re tweaking a prompt, switching models, or just testing a feature, you still need a way to catch regressions and evaluate your changes. Beyond basic correctness, developers also wanted to measure more subtle qualities — like whether a response feels friendly.

So we rebuilt the product around this single use case: helping developers define and apply subjective, nuanced evals to their LLM outputs. We call it Pi Co-pilot.

You can start with any or all of the following:

- a few good/bad examples

- a system prompt, or app description

- an old eval prompt you wrote

The co-pilot helps you turn that into a scoring spec: a set of ~10–20 concrete questions that probe the output against the dimensions of quality you care about (e.g. “is it verbose?”, “does it have a professional tone?”, etc.). For each question, it selects either:

- a fast encoder-based model trained for scoring (the Pi scorer). See our original post [1] for more details on why this is a good fit for scoring compared to the “LLM as a judge” pattern.

- or a generated Python function, when that makes more sense (word count, regex, etc.)
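To make the Python-function side concrete, here is a minimal sketch of what a couple of code-based questions and their aggregation could look like. This is not Pi's actual spec format or API: the `Question` class, the two scorer functions, the weights, and the weighted-average aggregation are all hypothetical, chosen only to illustrate the idea.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    """One scoring question (hypothetical shape): label, scorer in [0, 1], weight."""
    label: str
    scorer: Callable[[str], float]
    weight: float = 1.0


def is_concise(response: str, max_words: int = 50) -> float:
    """Word-count check: 1.0 up to max_words, decaying linearly to 0.0 at 2x max_words."""
    n = len(response.split())
    if n <= max_words:
        return 1.0
    return max(0.0, 1.0 - (n - max_words) / max_words)


def avoids_apology_boilerplate(response: str) -> float:
    """Regex check: 0.0 if a stock apology phrase appears, else 1.0."""
    pattern = re.compile(
        r"\b(i apologi[sz]e|sorry for the inconvenience)\b", re.IGNORECASE
    )
    return 0.0 if pattern.search(response) else 1.0


def score(response: str, spec: List[Question]) -> float:
    """Weighted average of the per-question scores (an assumed aggregation rule)."""
    total_weight = sum(q.weight for q in spec)
    return sum(q.weight * q.scorer(response) for q in spec) / total_weight


# A two-question spec; a real one would have ~10-20 questions.
SPEC = [
    Question("Is it concise?", is_concise, weight=2.0),
    Question("Does it avoid apology boilerplate?", avoids_apology_boilerplate),
]
```

Calling `score("Your order ships tomorrow.", SPEC)` passes both questions, while a response containing "Sorry for the inconvenience" loses the weight of the second question.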

You iterate over examples, tweak questions, adjust scoring behavior, and quickly reach a spec that reflects your actual taste rather than a generic benchmark or off-the-shelf metric. Then you can plug the scoring system into your own workflow: Python, TypeScript, Promptfoo, Langfuse, spreadsheets, whatever. We provide easy integrations with these systems.
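As one illustration of plugging a scorer into a workflow, the sketch below gates a prompt or model change on a regression check. It does not use Pi's integrations: `word_budget_score` is a hypothetical stand-in for whatever scoring function you end up with, and the mean-with-tolerance rule is just one assumed way to define "regressed".

```python
from statistics import mean
from typing import Callable, Sequence


def word_budget_score(response: str, max_words: int = 40) -> float:
    """Hypothetical stand-in scorer: 1.0 if within the word budget, else 0.0."""
    return 1.0 if len(response.split()) <= max_words else 0.0


def has_regressed(
    baseline: Sequence[str],
    candidate: Sequence[str],
    scorer: Callable[[str], float],
    tolerance: float = 0.05,
) -> bool:
    """True if the candidate's mean score falls more than `tolerance` below the baseline's."""
    baseline_mean = mean([scorer(r) for r in baseline])
    candidate_mean = mean([scorer(r) for r in candidate])
    return candidate_mean < baseline_mean - tolerance


# Outputs from the current prompt vs. outputs after a prompt tweak.
baseline_outputs = ["Short answer.", "Another short answer."]
candidate_outputs = ["Short answer.", "word " * 60]

regressed = has_regressed(baseline_outputs, candidate_outputs, word_budget_score)
```

In a test suite or CI job, a check like this would fail the build when `regressed` is true, which is the "catch regressions" use case described earlier in the post.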

We took inspiration from tools like v0 and Bolt: natural language on the left, structured artifacts on the right. That pattern felt intuitive: explore conversationally, and let the underlying system crystallize the conversation into things you can inspect and use (scoring spec, examples, and code). Here is a Loom demo [2].

We’d appreciate feedback from the community on whether this second iteration of our product feels more useful. We are offering $10 of free credits (about 25M input tokens), so you can try out Pi Co-pilot for your own use cases. No sign-in is required to start exploring: https://withpi.ai

Overall stack: Co-pilot: Next.js and Vercel on GCP. Models: GPT-4o on Azure, fine-tuned Llama and ModernBERT on GCP. Training: Runpod and SFCompute.

– Achint (co-founder, Pi Labs)

[1] https://news.ycombinator.com/item?id=43362535

[2] https://www.loom.com/share/82c2e7b511854a818e8a1f4eabb1a8c2
