OpenAI's In-House Data Agent

https://openai.com/index/inside-our-in-house-data-agent
37•meetpateltech•2h ago

Comments

0xferruccio•1h ago
At Amplitude we built Moda, which is super similar to this.

Our chief engineer Wade gave an awesome demo to Claire Vo some months back here: https://www.youtube.com/watch?v=9Q9Yrj2RTkg

I use it basically every day to ask all sorts of questions.

sjsishah•1h ago
Given my personal experience with various BI systems, I think an AI agent like this is the perfect use case. These systems already operate on multiple layers of being wrong: layer 1 is that your query is likely wrong, and layer 2 is that your interpretation of the data is likely wrong.

Mix them together and you’re already deep in make-believe land, so letting AI take over step 1 seems like a perfect fit.

I was hoping to read this article and be surprised by how OpenAI was able to solve the reliability problem, but alas.

hobs•20m ago
Don't forget:

layer 0 - how you stored the data was wrong.

layer -1 - your understanding of modeling the behavior was wrong before you ever created a table.

layer -2 - your fundamental business process was wrong and all your information is lies.

This is why, instead of a central source of truth, I call it the central source of lies.

htrp•55m ago
Data problems are not tech problems but rather org problems.

exogenousdata•39m ago
So true. In my career (anecdotally), I’ve never encountered a data problem where the answer was ‘you didn’t choose this tech/language/product over another.’ It always comes down to decisions of governance and ownership. It’s Conway’s Law all the way down.

maxchehab•53m ago
Trust is the hardest part to scale here.

We're building something similar and found that no matter how good the agent loop is, you still need "canonical metrics" that are human-curated. Otherwise non-technical users (marketing, product managers) are playing a guessing game with high-stakes decisions, and they can't verify the SQL themselves.

Our approach:
1. We control the data pipeline and work with a discrete set of data sources where schemas are consistent across customers.
2. We benchmark extensively so the agent uses a verified metric when one exists, falls back to raw SQL when it doesn't, and captures those gaps as "opportunities" for human review.

Over time, most queries hit canonical metrics. The agent becomes less of a SQL generator and more of a smart router from user intent -> verified metric.
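A minimal sketch of that routing idea, assuming a hypothetical in-memory registry of curated metrics matched by naive keyword lookup (a production router would more likely use embeddings or an LLM classifier). `CANONICAL_METRICS`, `Router`, and `generate_sql` are illustrative names, not graphed.com's implementation:

```python
from dataclasses import dataclass, field

# Hypothetical registry of human-curated ("canonical") metrics.
# Each entry maps a metric name to trigger keywords and its verified SQL.
CANONICAL_METRICS = {
    "weekly_active_users": {
        "keywords": {"wau", "weekly active"},
        "sql": "SELECT COUNT(DISTINCT user_id) FROM events "
               "WHERE ts >= DATE('now', '-7 days')",
    },
    "revenue": {
        "keywords": {"revenue", "sales"},
        "sql": "SELECT SUM(amount) FROM orders WHERE status = 'paid'",
    },
}


@dataclass
class Router:
    # Questions that matched no canonical metric, queued for human review.
    gaps: list = field(default_factory=list)

    def route(self, question: str) -> tuple[str, str]:
        """Return (source, sql): a verified metric if one matches, else generated SQL."""
        q = question.lower()
        for name, metric in CANONICAL_METRICS.items():
            # Naive substring match; stands in for a real intent classifier.
            if any(kw in q for kw in metric["keywords"]):
                return ("canonical:" + name, metric["sql"])
        # No verified metric: record the gap and fall back to free-form SQL.
        self.gaps.append(question)
        return ("generated", self.generate_sql(question))

    def generate_sql(self, question: str) -> str:
        # Placeholder for an LLM text-to-SQL call (assumption, not implemented).
        return "-- TODO: generated SQL for: " + question


router = Router()
print(router.route("What was revenue last quarter?")[0])
print(router.route("How many support tickets were reopened?")[0])
```

Every entry left in `router.gaps` is a question that no curated metric covers yet, i.e. the review queue the comment describes.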

The "Moving fast without breaking trust" section resonates; their eval system with golden SQL is essentially the same insight: you need ground truth to catch drift.
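A minimal sketch of a golden-SQL check, assuming the eval compares result sets rather than SQL text, so harmless differences such as row order don't count as failures. The fixture table and queries are hypothetical, and this is not a description of OpenAI's actual eval system:

```python
import sqlite3
from collections import Counter


def result_multiset(conn: sqlite3.Connection, sql: str) -> Counter:
    """Run a query and return its rows as an order-insensitive multiset."""
    return Counter(conn.execute(sql).fetchall())


def matches_golden(conn: sqlite3.Connection, candidate_sql: str, golden_sql: str) -> bool:
    """True if the candidate query returns exactly the golden query's rows."""
    try:
        return result_multiset(conn, candidate_sql) == result_multiset(conn, golden_sql)
    except sqlite3.Error:
        # A query that fails to execute counts as a mismatch.
        return False


# Tiny fixture database standing in for a warehouse snapshot (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 'EU', 100.0, 'paid'),
        (2, 'US', 250.0, 'paid'),
        (3, 'US', 75.0, 'refunded');
""")

golden = "SELECT region, SUM(amount) FROM orders WHERE status = 'paid' GROUP BY region"
good = golden + " ORDER BY region DESC"  # same rows, different order
drifted = "SELECT region, SUM(amount) FROM orders GROUP BY region"  # lost the refund filter

print(matches_golden(conn, good, golden))     # True
print(matches_golden(conn, drifted, golden))  # False
```

Comparing results instead of query strings tolerates legitimate rewrites; a real harness would probably also need a tolerance for floating-point aggregates.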

Wrote about the tradeoffs here: https://www.graphed.com/blog/update-2

data-ottawa•7m ago
Yes, I’ve been working on this, and you need a clear semantic layer.

If there are multiple paths or perceived paths to an answer, you’ll get two answers. Plus, LLMs like to create pointless “xyz_index” metrics that are not standard, clear, or useful. Yet I see users just go “that sounds right” and run with it.
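One hedged reading of "a clear semantic layer" in code: a hypothetical registry that allows exactly one definition per metric, rejects a second name for the same formula, and refuses to resolve metrics it does not know (such as an invented `xyz_index`). The duplicate check is an assumption-level simplification: it only catches SQL that is identical after whitespace and case normalization, not semantically equivalent SQL.

```python
class SemanticLayer:
    """Hypothetical metric registry: one name per metric, one definition per name."""

    def __init__(self) -> None:
        self._defs: dict[str, str] = {}

    @staticmethod
    def _normalize(sql_expr: str) -> str:
        # Collapse whitespace and case so trivially reformatted copies compare equal.
        return " ".join(sql_expr.lower().split())

    def define(self, name: str, sql_expr: str) -> None:
        if name in self._defs:
            raise ValueError(f"metric {name!r} already exists; change it, don't fork it")
        for other, expr in self._defs.items():
            if self._normalize(expr) == self._normalize(sql_expr):
                # A second name for the same formula is a second "path" to the answer.
                raise ValueError(f"{name!r} duplicates existing metric {other!r}")
        self._defs[name] = sql_expr

    def resolve(self, name: str) -> str:
        # Unknown metrics are an error, not an invitation to improvise one.
        if name not in self._defs:
            raise KeyError(f"unknown metric {name!r}; add it to the semantic layer first")
        return self._defs[name]
```

An agent restricted to `resolve` can still pick the wrong metric, but it cannot silently mint a new one.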

spiderfarmer•49m ago
I'm more interested in Kimi's In-House Data Agent.

qsort•41m ago
Very, very good stuff here. I think a possible missing piece is how to explain how the results were computed. Here it seems they're relying on the fact that users are somewhat technical (that's great for OpenAI; it's an internal agent, after all) and can at least read SQL, but how you would structure the interaction with nontechnical users is an interesting design problem.

When working on data systems you quickly realize that often how the question was answered (how the metric is defined, what data was taken into account and so on) is just as important as the answer.

tillvz•40m ago
Trust and explainability are the biggest issues here.

We've been building natural language analytics at Veezoo (https://www.veezoo.com/) for 10 years, and what we've found is that straight Text-to-SQL doesn't scale. If AI writes SQL directly, you're building on a probabilistic foundation. When a CFO asks for revenue, the number can't just be correct 99% of the time. And you can't get the CFO to read SQL to verify it.

We're solving that with an abstraction layer (Knowledge Graph) in between. AI translates natural language to a semantic query language, which then compiles to SQL deterministically.

At the same time you can translate the semantic query deterministically back into an explanation for the business user, so they can easily verify if the result matches their intent.

Business logic lives in the Knowledge Graph and the compiler ensures every query adheres to it 100%, every time. No AI is involved in that step.
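A toy sketch of the two-step design described above: the model's only job would be to emit a small structured query, and plain deterministic code (no AI) compiles it to SQL and renders a plain-language explanation from the same object. The model, the `SemanticQuery` shape, and the table and column names are hypothetical; this is not Veezoo's VQL or compiler.

```python
from dataclasses import dataclass

# Hypothetical business model: each measure and dimension has one agreed definition.
MEASURES = {
    "Revenue": ("SUM(o.amount)", "sum of order amounts, excluding refunded orders"),
    "Order Count": ("COUNT(*)", "number of orders, excluding refunded orders"),
}
DIMENSIONS = {"Region": "o.region", "Order Month": "strftime('%Y-%m', o.order_date)"}
BASE = "FROM orders o WHERE o.status <> 'refunded'"


@dataclass(frozen=True)
class SemanticQuery:
    measure: str                    # must be a key of MEASURES
    group_by: tuple[str, ...] = ()  # each must be a key of DIMENSIONS


def compile_to_sql(q: SemanticQuery) -> str:
    """Deterministically translate a semantic query to SQL; unknown names raise KeyError."""
    measure_sql, _ = MEASURES[q.measure]
    dims = [DIMENSIONS[d] for d in q.group_by]
    sql = f"SELECT {', '.join(dims + [measure_sql])} {BASE}"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql


def explain(q: SemanticQuery) -> str:
    """Render the same query as a description a business user can check against intent."""
    _, description = MEASURES[q.measure]
    text = f"{q.measure}: {description}"
    if q.group_by:
        text += ", broken down by " + " and ".join(q.group_by)
    return text + "."


q = SemanticQuery("Revenue", ("Region",))
print(compile_to_sql(q))
print(explain(q))
```

Because both outputs are pure functions of `q`, the explanation cannot drift from the SQL that actually runs; only the natural-language-to-`SemanticQuery` step would be probabilistic.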

Veezoo Architecture: https://docs.veezoo.com/veezoo/architecture-overview

Leynos•33m ago
Don't you still need to unit test and version control the SQL artefact that is produced? You need to be able to see which query was used on which date and how it was validated.

(Prompts need to be version controlled too, of course)
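A minimal sketch of the audit record this implies, assuming each executed query is stored together with the prompt version that produced it and the checks it passed. The field names, the `prompts@...` version label, and the JSON-lines log format are illustrative choices, not any particular tool's schema:

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class QueryRecord:
    sql: str
    sql_sha256: str              # stable ID answering "which query was this?"
    prompt_version: str          # e.g. a git commit of the prompt templates (assumption)
    validated_by: tuple[str, ...]  # names of checks the query passed before running
    executed_at: str             # ISO-8601 UTC timestamp


def record_query(sql: str, prompt_version: str, checks: list[str]) -> QueryRecord:
    """Build an immutable log entry for one executed SQL artefact."""
    return QueryRecord(
        sql=sql,
        sql_sha256=hashlib.sha256(sql.encode("utf-8")).hexdigest(),
        prompt_version=prompt_version,
        validated_by=tuple(checks),
        executed_at=datetime.now(timezone.utc).isoformat(),
    )


def append_to_log(path: str, rec: QueryRecord) -> None:
    """Append the record as one JSON line to an append-only log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(rec)) + "\n")
```

Hashing the SQL text makes it cheap to find every date on which an identical query ran, even if the prompt that generated it changed in between.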

tillvz•16m ago
Yes, every SQL query Veezoo runs is logged and visible to admins.

The fundamental artifact is VQL (Veezoo Query Language), which queries against a Knowledge Graph containing your business data model, things like your "Revenue" measure.

A query might look like this:

  var order from kb.Order
  date_in(order.Order_Date, date("#today"))
  var retRevenue = kb.Order.Revenue(order)
  select(retRevenue)

If the business decides to change how revenue is computed, the VQL stays valid but compiles to different SQL. At the same time, Veezoo can test that your Knowledge Graph change doesn't break anyone's dashboard, and can even apply evolutions if needed.

VQL: https://docs.veezoo.com/vkl/kb-layer/vql/

Evolutions: https://docs.veezoo.com/vkl/evolutions/

The Knowledge Graph itself is version controlled, so the data team can trace every change.
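A sketch of what such a pre-merge check could look like, assuming saved dashboards reference measures by name and compilation is a pure function of the model: compile every dashboard tile against the current and the proposed model, then diff. The models, dashboards, and `impact_report` are hypothetical, and this is not a description of how Veezoo's evolutions work.

```python
# Hypothetical models: measure name -> SQL expression. V2 redefines Revenue
# and drops a measure that one dashboard still uses.
MODEL_V1 = {"Revenue": "SUM(amount)", "Orders": "COUNT(*)", "AOV": "AVG(amount)"}
MODEL_V2 = {"Revenue": "SUM(amount - discount)", "Orders": "COUNT(*)"}

# Saved dashboard tiles refer to measures by name, never by raw SQL.
DASHBOARDS = {"exec_kpis": ["Revenue", "Orders"], "ecommerce": ["AOV"]}


def compile_measure(model: dict, measure: str) -> str:
    """Deterministically compile one measure; raises KeyError if it is undefined."""
    return f"SELECT {model[measure]} FROM orders"


def impact_report(old: dict, new: dict, dashboards: dict) -> dict:
    """Classify each (dashboard, measure) tile as 'unchanged', 'changed', or 'broken'."""
    report = {}
    for dash, measures in dashboards.items():
        for m in measures:
            before = compile_measure(old, m)  # tiles are assumed valid under the old model
            try:
                after = compile_measure(new, m)
            except KeyError:
                report[(dash, m)] = "broken"
                continue
            report[(dash, m)] = "unchanged" if after == before else "changed"
    return report


for tile, status in impact_report(MODEL_V1, MODEL_V2, DASHBOARDS).items():
    print(tile, status)
```

A CI job could, for example, block a model change on any "broken" tile and request sign-off on "changed" ones, while "unchanged" tiles need no review.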

mritchie712•24m ago
Piling on to the vendor pitches here:

We give you all of this in 5 minutes at https://www.definite.app/.

And I mean all of it. You don't need Spark or Snowflake. We give you a data lake, pipelines to get data in, a semantic layer, and a data agent in one app.

The agent is kind of the easy / fun part. Getting the data infrastructure right so the agent is useful is the hard part.

That is, if the agent has low agency (e.g. it can only write SQL in Snowflake) and can't add a new data source or update transformation logic, it's not going to be terribly effective. Our agent can obviously write SQL, but it can also manage the underlying infra, which has been a huge unlock for us.

laser•15m ago
Their first example [1] is a complete non sequitur. I’m trying to comprehend how it passed human review, and I can only assume it’s AI, which doesn’t bode well for the supposed usefulness of their system.

[1] https://images.ctfassets.net/kftzwdyauwt9/2tMhL5Www2vA6I62DV...

“What was ChatGPT Image Gen logged-in DAU for the last 30 days?
Worked for 1m 22s >
ChatGPT WAU on October 6, 2025 (rounded to nearest 100M): = 800M
ChatGPT WAU on the last DevDay 2023 (Nov 6, 2023; rounded to nearest 100M): = 100M
Mini comparison (using the rounded figures only):
• Change: = +700M WAU
• Multiple: = 8x higher on 2025-10-06 vs 2023-11-06
(WAU here is the standard ChatGPT WAU as-of the reporting date; I'm only sharing the values rounded to the nearest 100M, per your request.)”

onion2k•14m ago
In my opinion, data and documents are the real AI benefit, or threat, to developer jobs.

Specifically, how good a company's data is will determine how effectively it can leverage AI in the future. The public data is pretty much mined to exhaustion, and the next big data source will be in-house documentation, code repos, data lakes, etc. If you work for a company where that's been built, maintained, and organised, then the effectiveness of AI is going to be mind-blowing. Companies that have maintained good docs will be able to build new things, maintain old things, and migrate things to cheaper modern stacks easily. That will let them move fast and deploy new AI-driven services easily and cheaply. Revenue will follow.

Conversely, at companies where documentation and code organisation have historically been poor, AI will struggle. Leaders will see it as a benefit and be baffled as to why their company can't realise its value. They'll quickly blame developers for not being able to use it, and that'll lead to people's growth stagnating or possibly layoffs. Eventually competitors will eat the company's lunch because they'll simply be able to move on opportunities much faster.

I've resolved that in any future job hunt I'm going to make asking about docs, data, and repos a priority...

Project Genie: Experimenting with infinite, interactive worlds

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/project-genie/
238•meetpateltech•3h ago•124 comments

My Mom and Dr. DeepSeek (2025)

https://restofworld.org/2025/ai-chatbot-china-sick/
74•kieto•1h ago•35 comments

Claude Code Daily Benchmarks for Degradation Tracking

https://marginlab.ai/trackers/claude-code/
407•qwesr123•6h ago•218 comments

Taco writer detained–briefly–by feds

https://bigbendsentinel.com/2026/01/28/taco-writer-detained-briefly-by-feds/
12•reaperducer•22m ago•1 comment

Drug trio found to block tumour resistance in pancreatic cancer

https://www.drugtargetreview.com/news/192714/drug-trio-found-to-block-tumour-resistance-in-pancre...
113•axiomdata316•4h ago•52 comments

Launch HN: AgentMail (YC S25) – An API that gives agents their own email inboxes

75•Haakam21•4h ago•85 comments

OTelBench: AI struggles with simple SRE tasks (Opus 4.5 scores only 29%)

https://quesma.com/blog/introducing-otel-bench/
119•stared•5h ago•65 comments

Europe’s next-generation weather satellite sends back first images

https://www.esa.int/Applications/Observing_the_Earth/Meteorological_missions/meteosat_third_gener...
604•saubeidl•13h ago•84 comments

C++ Modules Are Here to Stay

https://faresbakhit.github.io/e/cpp-modules/
52•faresahmed•5d ago•41 comments

EmulatorJS

https://github.com/EmulatorJS/EmulatorJS
67•avaer•6d ago•10 comments

County pays $600k to pentesters it arrested for assessing courthouse security

https://arstechnica.com/security/2026/01/county-pays-600000-to-pentesters-it-arrested-for-assessi...
95•MBCook•1h ago•31 comments

US cybersecurity chief leaked sensitive government files to ChatGPT: Report

https://www.dexerto.com/entertainment/us-cybersecurity-chief-leaked-sensitive-government-files-to...
319•randycupertino•4h ago•167 comments

Reflex (YC W23) Senior Software Engineer Infra

https://www.ycombinator.com/companies/reflex/jobs/Jcwrz7A-lead-software-engineer-infra
1•apetuskey•3h ago

Flameshot

https://github.com/flameshot-org/flameshot
8•OsrsNeedsf2P•1h ago•2 comments

Apple to soon take up to 30% cut from all Patreon creators in iOS app

https://www.macrumors.com/2026/01/28/patreon-apple-tax/
970•pier25•23h ago•797 comments

Usenet personality

https://en.wikipedia.org/wiki/Usenet_personality
31•mellosouls•3d ago•14 comments

Why "The AI Hallucinated" is the perfect legal defense

https://niyikiza.com/posts/hallucination-defense/
5•niyikiza•57m ago•5 comments

Run Clawdbot/Moltbot on Cloudflare with Moltworker

https://blog.cloudflare.com/moltworker-self-hosted-ai-agent/
91•ghostwriternr•5h ago•37 comments

Networks Hold the Key to a Decades-Old Problem About Waves

https://www.quantamagazine.org/networks-hold-the-key-to-a-decades-old-problem-about-waves-20260128/
6•makira•1h ago•0 comments

Box64 Expands into RISC-V and LoongArch territory

https://boilingsteam.com/box64-expands-into-risc-v-and-loong-arch-territory/
9•ekianjo•4d ago•1 comment

Heating homes with the largest particle accelerator

https://home.cern/news/news/cern/heating-homes-worlds-largest-particle-accelerator
47•elashri•4h ago•15 comments

MakuluLinux (6.4M Downloads) Ships Persistent Backdoor from Developer's Own C2

https://werai.ca/security-disclosure.html
27•werai•2h ago•11 comments

Making niche solutions is the point

https://ntietz.com/blog/making-niche-solutions-is-the-point/
68•evakhoury•2d ago•24 comments

How to Choose Colors for Your CLI Applications (2023)

https://blog.xoria.org/terminal-colors/
122•kruuuder•5h ago•74 comments

Computing Sharding with Einsum

https://blog.ezyang.com/2026/01/computing-sharding-with-einsum/
19•matt_d•4d ago•0 comments

Playing Board Games with Deep Convolutional Neural Network on 8bit Motorola 6809

https://ipsj.ixsq.nii.ac.jp/records/229345
32•mci•6h ago•8 comments

Waymo robotaxi hits a child near an elementary school in Santa Monica

https://techcrunch.com/2026/01/29/waymo-robotaxi-hits-a-child-near-an-elementary-school-in-santa-...
203•voxadam•6h ago•351 comments

We can’t send mail farther than 500 miles (2002)

https://web.mit.edu/jemorris/humor/500-miles
618•giancarlostoro•16h ago•104 comments

Break Me If You Can: Exploiting PKO and Relay Attacks in 3DES/AES NFC

https://www.breakmeifyoucan.com/
37•noproto•6h ago•29 comments

The Sovereign Tech Fund Invests in Scala

https://www.scala-lang.org/blog/2026/01/27/sta-invests-in-scala.html
85•bishabosha•7h ago•61 comments