Show HN: I built a database for AI agents

https://github.com/DinobaseHQ/dinobase
7•Kappa90•1h ago
Hey HN,

I just spent the last few weeks building a database for agents.

Over the last year I built PostHog AI, the company's business analyst agent, where we experimented with giving raw SQL access to PostHog databases vs. exposing tools/MCPs. Needless to say, SQL wins.

I left PostHog 3 weeks ago to work on side-projects. I wanted to experiment more with SQL+agents.

I built an MVP exposing business data through DuckDB + annotated schemas, and ran a benchmark with 11 LLMs (from Kimi 2.5 to Claude Opus 4.6) answering business questions with either 1) per-source MCP access (e.g. one Stripe MCP, one Hubspot MCP) or 2) my annotated SQL layer.

My solution consistently reached 2-3x the accuracy (correct vs. incorrect answers), used 16-22x fewer tokens per correct answer, and was 2-3x faster. Benchmark in the repo!

The insight is that tool calls/MCPs/raw APIs force the agent to join information in-context. SQL does that natively.
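
To make the point above concrete, here is a minimal sketch of a cross-source join done by the query engine instead of in the agent's context. It rests on assumptions: the standard-library `sqlite3` module stands in for DuckDB so it runs without extra installs, and the table names, columns and rows are made up for illustration, not Dinobase's actual schema.

```python
import sqlite3

# Assumption: sqlite3 stands in for DuckDB; tables and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stripe_charges (customer_email TEXT, amount_usd REAL);
    CREATE TABLE hubspot_contacts (email TEXT, company TEXT);
    INSERT INTO stripe_charges VALUES
        ('ann@acme.test', 120.0),
        ('ann@acme.test', 80.0),
        ('bob@globex.test', 50.0);
    INSERT INTO hubspot_contacts VALUES
        ('ann@acme.test', 'Acme'),
        ('bob@globex.test', 'Globex');
""")


def revenue_by_company(conn):
    """Join billing rows to CRM rows in one query; only the result reaches the caller."""
    query = """
        SELECT c.company, SUM(s.amount_usd) AS revenue
        FROM stripe_charges AS s
        JOIN hubspot_contacts AS c ON c.email = s.customer_email
        GROUP BY c.company
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


rows = revenue_by_company(conn)
print(rows)
```

With per-source tools, the agent would have to fetch both lists and match emails itself; here it only sees two result rows.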

What I have today:

- 101 connectors (SaaS APIs, databases, file storage) sync to Parquet via dlt, locally or in your S3/GCS/Azure bucket
- DuckDB is the query engine: cross-source JOINs work natively, plus guardrails for safe mutations / reverse ETL
- After each sync, a Claude agent annotates the schema: table descriptions, column docs, PII flags, relationship maps
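
As a rough picture of what an annotated schema could look like to an agent, here is a hypothetical sketch. The class names, fields and text layout are my assumptions for illustration and are not taken from the Dinobase repo; it only covers the four kinds of annotation listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnAnnotation:
    name: str
    description: str
    is_pii: bool = False


@dataclass
class TableAnnotation:
    name: str
    description: str
    columns: list[ColumnAnnotation] = field(default_factory=list)
    # Hypothetical relationship map: local column -> "other_table.column"
    relationships: dict[str, str] = field(default_factory=dict)


def render_for_prompt(table: TableAnnotation) -> str:
    """Flatten one table's annotations into text an agent can read before writing SQL."""
    lines = [f"TABLE {table.name}: {table.description}"]
    for col in table.columns:
        pii = " [PII]" if col.is_pii else ""
        lines.append(f"  - {col.name}{pii}: {col.description}")
    for local, target in table.relationships.items():
        lines.append(f"  * {local} -> {target}")
    return "\n".join(lines)


charges = TableAnnotation(
    name="stripe_charges",
    description="One row per charge attempt.",
    columns=[
        ColumnAnnotation("customer_email", "Email of the paying customer.", is_pii=True),
        ColumnAnnotation("amount_usd", "Charge amount in US dollars."),
    ],
    relationships={"customer_email": "hubspot_contacts.email"},
)
text = render_for_prompt(charges)
print(text)
```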

It works with all major agent frameworks (LangChain, CrewAI, LlamaIndex, Pydantic AI, Mastra), and local agents like Claude Code, Cursor, Codex and OpenClaw.

I love dinosaurs and the domain was available, so it's called Dinobase.

It's not bug-free, and I'm here to ask for feedback and for major holes in the project that I can't see, because the results seem almost too good. Thanks!

Comments

federiconitidi•1h ago
Could you give more context on the benchmarks included in the repo?
Kappa90•1h ago
It's an experimental benchmark; I couldn't find any off-the-shelf benchmark that fits this setup. There's Spider 2.0, but it's for text-to-SQL. I'm planning to run this [1] next, but it's quite expensive.

There are 75 questions, divided into 5 use-case groups: revenue ops, e-commerce, knowledge bases, devops, support.

I then generated a synthetic dataset with data mimicking APIs ranging from Stripe to Hubspot to Shopify to Zendesk, etc.

I then compare exposing all the data through Dinobase against having one MCP per source, e.g. one MCP for Stripe data, one MCP for Hubspot data, etc.

I tested this with 11 models, ranging from Kimi 2.5 to Claude Opus 4.6.

Finally there's an LLM-as-a-judge that decides if the answer is correct, and I log latency and tokens.
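
For readers wondering how the headline numbers could be derived from those logs, here is a small sketch. It assumes "tokens per correct answer" means total tokens spent divided by the number of answers judged correct, which may not match the repo's exact definition. The `RunRecord` type and the sample values are invented to show the arithmetic, not real benchmark data.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    correct: bool     # verdict from the LLM judge
    tokens: int       # tokens consumed while answering this question
    latency_s: float  # wall-clock seconds for this question


def summarize(records):
    """Aggregate per-question logs into accuracy, tokens per correct answer and mean latency."""
    n_correct = sum(r.correct for r in records)
    total_tokens = sum(r.tokens for r in records)
    return {
        "accuracy": n_correct / len(records),
        # Assumed definition: all tokens spent, divided by answers judged correct.
        "tokens_per_correct": total_tokens / n_correct if n_correct else float("inf"),
        "mean_latency_s": sum(r.latency_s for r in records) / len(records),
    }


# Toy values, chosen only to make the arithmetic easy to follow.
records = [
    RunRecord(True, 1000, 2.0),
    RunRecord(False, 4000, 4.0),
    RunRecord(True, 2000, 3.0),
    RunRecord(False, 1000, 3.0),
]
summary = summarize(records)
print(summary)
```

Under this definition, wrong answers still cost tokens, so a setup that fails often is penalized twice: lower accuracy and a higher cost per correct answer.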

[1] https://arxiv.org/abs/2510.02938

peterbuch•1h ago
Nice work. One thing I'd love to see in the benchmark: a breakdown by question type (aggregations vs. multi-hop joins vs. lookups). My guess is the SQL approach pulls ahead hardest on the join-heavy ones, and showing that explicitly would make the "too good to be true" results feel more grounded. Either way, the token efficiency numbers sound intriguing.
Kappa90•1h ago
It's not explicitly stated in the benchmarks README, good catch.

80% of the benchmark questions are aggregations, 16% are multi-hop, 4% are lookups/subqueries.

Multi-hop is where LLMs struggle the most (hallucinations, partial answers), and aggregations are where you get the most token efficiency, since you skip the pagination that you need with APIs/MCPs that don't provide filters.
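
The pagination point can be sketched like this. It is an illustration under assumptions: `sqlite3` again stands in for DuckDB, the `invoices` table and page size are hypothetical, and "rows seen by the caller" is only a rough proxy for tokens.

```python
import sqlite3

# Assumption: sqlite3 stands in for DuckDB; the table and page size are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount_usd INTEGER)")
conn.executemany("INSERT INTO invoices (amount_usd) VALUES (?)", [(10,)] * 250)

PAGE_SIZE = 100  # hypothetical page size of an unfiltered list endpoint


def total_via_pages(conn):
    """Mimic a paginated API: every row passes through the caller before it is summed."""
    total, rows_seen, offset = 0, 0, 0
    while True:
        page = conn.execute(
            "SELECT amount_usd FROM invoices ORDER BY id LIMIT ? OFFSET ?",
            (PAGE_SIZE, offset),
        ).fetchall()
        if not page:
            break
        total += sum(amount for (amount,) in page)
        rows_seen += len(page)
        offset += PAGE_SIZE
    return total, rows_seen


def total_via_sql(conn):
    """Let the engine aggregate: the caller sees a single result row."""
    (total,) = conn.execute("SELECT SUM(amount_usd) FROM invoices").fetchone()
    return total, 1


paged = total_via_pages(conn)
aggregated = total_via_sql(conn)
print(paged, aggregated)
```

Both paths return the same total, but the paginated one moves all 250 rows through the caller, while the aggregate returns one.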

tosh•1h ago
Which LLM is best at driving DuckDB currently?
Kappa90•1h ago
DuckDB's SQL dialect closely follows PostgreSQL's, and most coding LLMs have been trained on that.

Of the small models I tested, Qwen 3.5 is the clear winner. Going to larger LLMs, Sonnet and Opus lead the charts.

c6d6•26m ago
How does it handle schema drift (e.g. a SaaS vendor changes a column)? Does the annotation agent mark breaking changes in some way, or just describe the current state of the world? With that many connections, you'll hit a bunch of weird edge cases, especially with things like Salesforce custom objects.
Kappa90•16m ago
Right now, schema changes create new columns, and old columns are not dropped. I’m working on reconciling the old ones.

The annotation/semantic layer agent creates a new description of the schema on each sync. That description represents the current state, but as of today it still includes stale columns, since data is not dropped.

I’ll implement automated schema migrations in the next week or so!
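
One way to surface drift in the meantime is to diff the stored table's columns against what the source returned on the latest sync. This is a hypothetical sketch, not how Dinobase does it: `diff_schema`, its inputs (column name mapped to type name) and the sample columns are all assumptions.

```python
def diff_schema(stored: dict[str, str], source: dict[str, str]) -> dict[str, list[str]]:
    """Compare columns kept in the synced table with columns seen in the latest sync.

    Both arguments map column name -> type name (a hypothetical representation).
    """
    stored_cols, source_cols = set(stored), set(source)
    return {
        # Present in the latest sync but not yet in the stored table.
        "added": sorted(source_cols - stored_cols),
        # Still in the stored table but no longer sent by the source.
        "stale": sorted(stored_cols - source_cols),
        # Present on both sides with a different type.
        "retyped": sorted(c for c in stored_cols & source_cols if stored[c] != source[c]),
    }


stored = {"id": "BIGINT", "plan": "VARCHAR", "amount": "INTEGER"}
source = {"id": "BIGINT", "plan_name": "VARCHAR", "amount": "DOUBLE"}
drift = diff_schema(stored, source)
print(drift)
```

An annotation step could then flag the `stale` columns in its descriptions instead of presenting them as current.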
