Show HN: CRTX – AI code gen that tests and fixes its own output (OSS)

https://github.com/CRTXAI/CRTX
1•johnnycash926•2h ago
We built an open-source CLI that generates code, runs tests, fixes failures, and gets an independent AI review, all before you see the output.

We started with a multi-model pipeline where different AI models handled different stages (architect, implement, refactor, verify). We assumed more models meant better code. Then we benchmarked it: 39% average quality score at $4.85 per run. A single model scored 94% at $0.36. Our pipeline was actively making things worse.

So we killed it and rebuilt around what developers actually do when they get AI-generated code: run it, test it, fix what breaks.

The Loop generates code, runs pytest automatically, feeds failures back for targeted fixes, and repeats until all tests pass. Then an independent Arbiter (always a different model than the generator) reviews the final output.

Latest benchmark across three tasks (simple CLI, REST API, async multi-agent system):

Single Sonnet: 94% avg, 10 min dev time, $0.36
Single o3: 81% avg, 4 min dev time, $0.44
Multi-model: 88% avg, 9 min dev time, $5.59
CRTX Loop: 99% avg, 2 min dev time, $1.80

"Dev time" estimates how long a developer would spend debugging the output before it's production-ready. The Loop's hardest prompt produced 127 passing tests with zero failures.

When the Loop hits a test it can't fix, it has a three-tier escalation: diagnose the root cause before patching, strip context to just the failing test and source file, then bring in a different model for a second opinion. The goal is zero dev time on every run.

Model-agnostic: works with Claude, GPT, o3, Gemini, Grok, DeepSeek. Bring your own API keys. Apache 2.0.

pip install crtx

https://github.com/CRTXAI/crtx

We published the benchmark tool too. Run crtx benchmark --quick to reproduce our results with your own keys. Curious what scores people get on different providers and tasks.
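For readers who want the shape of the generate, test, fix cycle described above, here is a minimal sketch. It is not CRTX's code: `run_loop`, `LoopResult`, and the three callables are hypothetical names, the model calls and the pytest run are injected as plain functions so the example runs without API keys, and the three-tier escalation is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical signatures for illustration only; none are taken from CRTX.
Generate = Callable[[str, Optional[str]], str]  # (prompt, failure feedback) -> code
RunTests = Callable[[str], List[str]]           # code -> failure messages, [] if green
Review = Callable[[str], str]                   # code -> review text from another model


@dataclass
class LoopResult:
    code: str
    passed: bool
    iterations: int
    review: Optional[str] = None


def run_loop(prompt: str, generate: Generate, run_tests: RunTests,
             review: Review, max_iters: int = 5) -> LoopResult:
    """Generate code, test it, feed failures back, and stop when tests pass."""
    code = generate(prompt, None)
    for attempt in range(1, max_iters + 1):
        failures = run_tests(code)
        if not failures:
            # Only passing code is sent to the independent reviewer.
            return LoopResult(code, True, attempt, review(code))
        if attempt < max_iters:
            # Ask for a targeted fix, passing the failure text back in.
            code = generate(prompt, "\n".join(failures))
    return LoopResult(code, False, max_iters)


# Stand-ins for a model and a test runner, so the sketch is self-contained.
def fake_generate(prompt: str, feedback: Optional[str]) -> str:
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"  # deliberately buggy first draft


def fake_run_tests(code: str) -> List[str]:
    namespace: dict = {}
    exec(code, namespace)  # a real runner would invoke pytest in a sandbox instead
    return [] if namespace["add"](2, 3) == 5 else ["test_add: expected 5"]


demo = run_loop("write add(a, b)", fake_generate, fake_run_tests,
                lambda code: "reviewed by a second model")
print(demo.passed, demo.iterations)
```

Capping the number of iterations is an assumption of this sketch; the post only says the Loop repeats until all tests pass and escalates when it gets stuck.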

Comments

johnnycash926•54m ago
Creator here. Solo developer, building in public.

Some context that didn’t fit above: we published the full benchmark data and the tool to reproduce it because we think the AI code gen space has a transparency problem. Everyone claims their tool is better, nobody shows data.

The “dev time” metric is the one I’m most proud of. It estimates how long you’d spend debugging the output before it’s production-ready. A model can score 95% but still hand you code with broken imports and failing tests, and that’s 15 minutes of your time. The Loop’s goal is zero.

Website with more details: https://crtx-ai.com

Happy to answer questions about the benchmark methodology, the gap closing system, or the architecture. And if anyone runs crtx benchmark --quick with their own keys, I’d genuinely love to see the results.

Meta Deployed AI and It Is Killing Our Agency

https://mojodojo.io/blog/meta-is-systematically-killing-our-agency/
1•zenincognito•1m ago•0 comments

dwata: Local Financial Data Extraction from Emails with Ministral 3 3B, Ollama

https://www.youtube.com/watch?v=LVT-jYlvM18
1•brainless•2m ago•0 comments

Show HN: Claude Chrome Parallel – Ultrafast Parallel Browser MCP for Chrome

https://github.com/shaun0927/claude-chrome-parallel
1•shaun0927•6m ago•0 comments

OpenAI considered alerting Canadian police about school shooting suspect

https://www.theguardian.com/world/2026/feb/21/tumbler-ridge-shooter-chatgpt-openai
1•n1b0m•8m ago•0 comments

Topological Naming Problem

https://wiki.freecad.org/Topological_naming_problem
1•tripdout•13m ago•0 comments

Can we debug a living cell like a running binary?

https://cellhacker.substack.com/p/dna-is-a-self-executing-binary-a
1•efim_bushmanov•21m ago•3 comments

Tiny QR code achieved using electron microscope technology

https://newatlas.com/technology/smallest-qr-code-bacteria-tu-wien/
1•jonbaer•22m ago•0 comments

The Fundamental Limits of LLMs at Scale

https://arxiv.org/abs/2511.12869
2•o4c•23m ago•0 comments

A perceptual-first mobile audio DSP experiment

1•adriel_d•30m ago•0 comments

Saturn's Rings Came from a Two-Moon Collision About 100M Years Ago

https://gizmodo.com/saturns-rings-came-from-a-two-moon-collision-about-100-million-years-ago-stud...
4•mooreds•43m ago•0 comments

A man who triggered the AI explosion (2020) – Alex Krizhevsky [video]

https://www.youtube.com/watch?v=gwzwkv2hO5k
1•o4c•44m ago•0 comments

How to Use Goosetown for Parallel Agentic Engineering

https://block.github.io/goose/blog/2026/02/19/gastown-explained-goosetown/
2•mooreds•44m ago•0 comments

Checkset – a Ruby gem for repeatable verifications using Playwright

https://afomera.dev/posts/2026-02-20-checkset-introduction
1•mooreds•45m ago•0 comments

Understanding LLMs from Scratch Using Middle School Math

https://medium.com/data-science/understanding-llms-from-scratch-using-middle-school-math-e602d27e...
2•ilokeshpawar•46m ago•0 comments

Process Isolation on NetBSD with Chroot(2)

https://overeducated-redneck.net/blurgh/netbsd-chroot-isolation.html
1•jaypatelani•47m ago•0 comments

Hardware LLM at 16K Tokens/s

https://taalas.com/products/
1•gcollard-•53m ago•1 comment

With Nvidia's GB10 Superchip, I'm Running Serious AI Models in My Living Room

https://www.pcmag.com/news/nvidia-gb10-superchip-running-ai-models-in-my-living-room
6•the_arun•1h ago•3 comments

Former Debian Project Leader Cautions Against Cover-Up and Censorship in Debian

https://techrights.org/n/2026/02/20/Former_Debian_Project_Leader_Branden_Robinson_Cautions_Agains...
1•amcclure•1h ago•0 comments

TabType – Universal text expansion for macOS for your context

https://tabtype.app
1•enixam•1h ago•1 comment

Show HN: Git uncommit – reset unpushed, committed changes

https://github.com/below43/git-uncommit
1•below43•1h ago•3 comments

The New Digg.com Is Slop

https://techrights.org/n/2026/01/24/Digg_com_Digg_is_a_Censorship_Platform_Just_Another_Social_Co...
5•amcclure•1h ago•1 comment

Show HN: JVBar CIS Benchmark scanner and remediation script generator

https://www.jvbar.com
1•sandadze•1h ago•0 comments

Designing a Document Management System from Scraps

https://www.theolouvel.com/fieldnotes/Small+Stabs/2026-02-20+-+Designing+a+Document+Management+Sy...
1•theolouvel•1h ago•0 comments

OpenAI Employees Raised Alarms About Canada Shooting Suspect Months Ago

https://www.wsj.com/us-news/law/openai-employees-raised-alarms-about-canada-shooting-suspect-mont...
7•caminante•1h ago•1 comment

Show HN: Polya's urn – essays on complexity and emergence

https://www.polyasurn.com/
1•pcarolan•1h ago•0 comments

Open Letter to Tech Companies: Protect Your Users from Lawless DHS Subpoenas

https://www.techdirt.com/2026/02/20/open-letter-to-tech-companies-protect-your-users-from-lawless...
2•cdrnsf•1h ago•0 comments

MCP Servers Reaches 79K GitHub Stars

https://theagenttimes.com/articles/mcp-servers-79017-stars
2•Ross00781•1h ago•1 comment

I made a local AI creature that runs on integers

https://double-star-games.itch.io/feryl/devlog/1393626/introducing-feryl-a-local-desktop-ai-creature
1•pmeade-ds•1h ago•1 comment

Phil Spencer Retires from Microsoft and Xbox

https://twitter.com/i/status/2024951211129254314
2•stevefan1999•1h ago•0 comments

Show HN: Assay – Found 250 bugs in LiteLLM, LobeChat via AI code verification

https://github.com/gtsbahamas/hallucination-reversing-system
2•tywellshn•1h ago•1 comment