Show HN: An Open-Source Eval Suite That Helps You Fix Postgres-Based Text-to-SQL

https://www.tigerdata.com/blog/text-to-sql-eval-open-source
1•cevian•6h ago
We've been building text-to-SQL at TigerData and kept hitting the same problem: evaluation tools that tell you your accuracy score but nothing about how to improve it.

A 60% pass rate tells you nothing if you don't know whether the failures come from bad schema retrieval or poor SQL generation. That's the difference between actionable insight and benchmarketing.

So we built, and are now open-sourcing, text-to-sql-eval with a simple insight: run every query three different ways:

- Normal mode - let the system retrieve schema and generate SQL
- Full schema mode - provide all tables to test upper bound accuracy
- Golden tables mode - give it the right tables to isolate reasoning issues

The performance delta between modes tells you exactly what's broken.
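As a rough illustration of reading those deltas, here is a minimal sketch. It is not code from text-to-sql-eval: the `ModeScores` class, the `diagnose` function, and the 0.05 threshold are all hypothetical names and values chosen for this example, and the interpretation (a large golden-vs-normal gap points at retrieval, a low golden score points at generation) is one plausible reading of the three modes.

```python
from dataclasses import dataclass


@dataclass
class ModeScores:
    """Pass rates in [0, 1] for each run mode (hypothetical container)."""
    normal: float         # system retrieves schema itself
    full_schema: float    # every table is provided
    golden_tables: float  # only the correct tables are provided


def diagnose(scores: ModeScores, threshold: float = 0.05) -> list[str]:
    """Return likely problem areas based on gaps between modes.

    The threshold is an arbitrary assumption; tune it to your noise level.
    """
    findings = []
    # If handing over the right tables helps a lot, retrieval is losing them.
    if scores.golden_tables - scores.normal > threshold:
        findings.append("schema retrieval")
    # If accuracy is still low with the right tables, generation is at fault.
    if 1.0 - scores.golden_tables > threshold:
        findings.append("SQL generation")
    return findings


if __name__ == "__main__":
    print(diagnose(ModeScores(normal=0.60, full_schema=0.80, golden_tables=0.85)))
```

With these made-up numbers, both areas are flagged: retrieval costs 25 points relative to golden tables, and 15 points are still lost even when the right tables are supplied.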

PostgreSQL-specific because database quirks matter for correctness. Works with any LLM or text-to-SQL system. Includes an LLM-as-judge option because deterministic matching produces too many false negatives on complex queries.
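To show one way deterministic matching can produce a false negative, here is a small sketch comparing result sets. Both functions are hypothetical and are not the suite's matcher; the assumption is that results arrive as lists of row tuples with hashable values. A query without `ORDER BY` can legitimately return rows in any order, so an exact list comparison may reject a correct answer.

```python
from collections import Counter

Row = tuple
Rows = list[Row]


def strict_match(expected: Rows, actual: Rows) -> bool:
    """Exact comparison: row order and content must both match."""
    return expected == actual


def order_insensitive_match(expected: Rows, actual: Rows) -> bool:
    """Compare rows as multisets, so ordering differences are ignored.

    Duplicate rows still count: two copies must match two copies.
    """
    return Counter(expected) == Counter(actual)


if __name__ == "__main__":
    expected = [(1, "alice"), (2, "bob")]
    actual = [(2, "bob"), (1, "alice")]  # same rows, different order
    print(strict_match(expected, actual))
    print(order_insensitive_match(expected, actual))
```

Even a relaxed comparison like this still fails on harmless differences such as extra columns or rounding, which is the kind of case where a judge model may be more forgiving.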

We've been using this internally to improve our (also open-sourced) text-to-sql system.

We're open-sourcing both the eval suite and a companion tool for generating test datasets from your production schema.

It's built with uv for easy setup, uses TimescaleDB to track results over time, and ships a simple Flask UI for exploring failures.

Try it, break it, tell us what's missing.

The Evolution: From Atomic Options to Lygos Credit

https://blog.lygos.finance/the-evolution-from-atomic-options-to-lygos-credit/
1•janandonly•1m ago•0 comments

Diablo Game Developers Join Communications Workers of America

https://cwa-union.org/news/releases/hundreds-diablo-game-developers-join-communications-workers-a...
1•ughitsaaron•3m ago•0 comments

Death by PowerPoint: the slide that killed seven people

https://mcdreeamiemusings.com/blog/2019/4/13/gsux1h6bnt8lqjd7w2t2mtvfg81uhx
2•scapecast•4m ago•0 comments

Apple Releases Xcode 26 Beta 7 with GPT-5 Support and Claude Integration

https://www.macrumors.com/2025/08/28/xcode-gpt-5-claude-integration/
2•tosh•5m ago•0 comments

The Dumbest Phone Is Parenting Genius

https://www.theatlantic.com/family/archive/2025/06/landline-kids-smartphone-alternative/683203/
1•SLHamlet•6m ago•0 comments

Why Particle Size Distribution Matters for Optical PM Sensors

https://www.airgradient.com/blog/when-all-pm25-isnt-the-same/
1•ahaucnx•7m ago•0 comments

Eco-driving measures could significantly reduce vehicle emissions

https://techxplore.com/news/2025-08-eco-significantly-vehicle-emissions.html
1•PaulHoule•7m ago•0 comments

In Boston, Trucks Keep Crashing into Low Bridges

https://www.wsj.com/us-news/in-boston-trucks-keep-crashing-into-low-bridges-a18c5c5c
1•bookofjoe•7m ago•1 comments

We Did Margin-Negative Computing Before It Was Cool, and It Was Silly

https://blog.railway.com/p/free-plan
2•dban•8m ago•0 comments

NXSweep: Using the NX AI Exploit Logic for Blue Teaming

https://www.yashthapliyal.com/blog/nxsweep
1•yash1hi•10m ago•0 comments

Linux Foundation Opens the Door to DocumentDB

https://thenewstack.io/linux-foundation-opens-the-door-to-documentdb/
2•CrankyBear•10m ago•0 comments

Sometimes CPU cores are odd – Anubis

https://anubis.techaro.lol/blog/2025/cpu-core-odd/
3•rbanffy•10m ago•0 comments

Starship Will Reduce Bandwidth Launch Cost by Up to 50x

https://research.33fg.com/analysis/starship-will-reduce-bandwidth-launch-cost-by-up-to-50x
2•bilsbie•11m ago•0 comments

Kick Ass – Destroy the Web

https://kickassapp.com
2•nvahalik•11m ago•0 comments

The Authoritarian Checklist

https://donmoynihan.substack.com/p/the-authoritarian-checklist
1•tastyface•11m ago•0 comments

Expert LSP the official language server implementation for Elixir

https://github.com/elixir-lang/expert
3•pimienta•13m ago•0 comments

Top Down versus Bottom Up AI Adoption

https://substack.com/inbox/post/172208814
1•mathattack•14m ago•0 comments

Show HN: Smart Buildings Powered by SparkplugB, Aklivity Zilla, and Kafka

https://github.com/aklivity/zilla-demos/tree/main/smart-buildings
1•luk212•16m ago•0 comments

The Bitter Lesson Is Misunderstood – By Kushal Chakrabarti

https://obviouslywrong.substack.com/p/the-bitter-lesson-is-misunderstood
1•JnBrymn•17m ago•0 comments

Stop Using Vulnerability Counts to Measure Software Security

https://cacm.acm.org/opinion/stop-using-vulnerability-counts-to-measure-software-security/
1•zdw•19m ago•1 comments

Pompelmi – RFI-safe uploads for Node.js with ZIP inspection, MIME/size checks

https://github.com/pompelmi/pompelmi
2•zdw•23m ago•0 comments

Speed-coding for the 6502 – a simple example

https://www.colino.net/wordpress/en/archives/2025/08/28/speed-coding-for-the-6502-a-simple-example/
2•mmphosis•25m ago•1 comments

Professor debuts AI sidekick in trailblazing course

https://www.sfu.ca/sfunews/stories/2025/08/new-class-of-teacher--sfu-professor-debuts-ai-sidekick...
1•geox•25m ago•0 comments

Capabilities of GPT-5 on Multimodal Medical Reasoning

https://www.alphaxiv.org/abs/2508.08224v2
1•wrayjustin•27m ago•0 comments

Anthropic's Claude Chrome browser extension rolls out – how to get early access

https://www.zdnet.com/article/anthropics-claude-chrome-browser-extension-rolls-out-how-to-get-ear...
1•CrankyBear•29m ago•0 comments

Google Advances Its Layer-1 Blockchain

https://www.coindesk.com/business/2025/08/27/google-advances-its-layer-1-blockchain-here-s-what-w...
1•jsphweid•30m ago•0 comments

Lenin Was a Mushroom

https://en.wikipedia.org/wiki/Lenin_was_a_mushroom
5•lambdaba•34m ago•0 comments

Nvidia's top two mystery customers made up 39% of the chipmaker's Q2 revenue

https://www.cnbc.com/2025/08/28/nvidias-top-two-mystery-customers-made-up-39percent-of-its-q2-rev...
3•ivape•34m ago•3 comments

Hacks at MIT

https://en.wikipedia.org/wiki/Hacks_at_the_Massachusetts_Institute_of_Technology
2•just_human•38m ago•1 comments

ActivityPub Rocks

https://activitypub.rocks/
1•marvinborner•40m ago•0 comments