On the one hand, very encouraging to see plain old deterministic infra w/o using slop machines.
On the other hand, this is a recognition that LLMs are just additional friction in the system that we would be better off without in the first place!
There are two failure points in any decision-making done by LLM models: spatial and temporal reasoning -- if you can even call it reasoning. They can't predict the consequence, or rather the next token, for any spatial or temporal problem.
LLM models will lie and cheat. They can't be trusted!
The article didn't give much information about how Kepler achieved this deterministic, cheap (as in orders of magnitude less expensive than trying to get an LLM to do the verification, if it even could) verification system. I built a very good solution that attempts to verify deterministically on unknown systems. [1] If you are working on a problem similar to what Kepler did, you can likely gain a lot from the learnings. By construction, it never allows future data into a system run by a wall clock. One step is to force an adversary agent to step through the code line by line and produce a rigorous proof that there are no temporal bugs.
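
To make that concrete, here is a minimal sketch of the "no future data by construction" idea -- not Kepler's system and not the actual design behind [1], just an illustration with made-up names: every read has to pass an explicit as-of clock, so a backtest can only ever see rows stamped at or before its own clock.

    from datetime import datetime, timezone

    class ClockBoundedStore:
        """Every read is bounded by an explicit as-of time, so rows stamped
        in the future can never leak into a backtest by construction."""

        def __init__(self, rows):
            # rows: iterable of (timestamp, payload) with tz-aware UTC timestamps
            self._rows = sorted(rows, key=lambda r: r[0])

        def read(self, as_of: datetime):
            if as_of.tzinfo is None:
                raise ValueError("as_of must be timezone-aware")
            return [(ts, p) for ts, p in self._rows if ts <= as_of]

    # Usage: the backtest advances its own clock and can only ever see the past.
    store = ClockBoundedStore([
        (datetime(2024, 11, 3, 5, 30, tzinfo=timezone.utc), "report A"),
        (datetime(2024, 11, 3, 6, 30, tzinfo=timezone.utc), "report B"),
    ])
    print(store.read(datetime(2024, 11, 3, 6, 0, tzinfo=timezone.utc)))  # only "report A"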
Nevertheless, I can 100% assure you that any LLM will find a way to cheat. It will lie (strong words are needed to describe this class of bug) about a timezone conversion, like New York daylight saving time, and a massive amount of data will be looking into the future, off by one hour.
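
Here is what that bug class looks like in plain Python (stdlib zoneinfo), as a hedged example rather than anything from the article: hardcoding a fixed UTC offset for New York instead of using the real timezone stamps every standard-time row an hour early, so a backtest sees the report before it actually existed.

    from datetime import datetime, timezone, timedelta
    from zoneinfo import ZoneInfo  # stdlib since Python 3.9

    ny = ZoneInfo("America/New_York")

    # A report truly published at 9:00 AM New York time on 2024-01-15 (EST, UTC-5).
    true_utc = datetime(2024, 1, 15, 9, 0, tzinfo=ny).astimezone(timezone.utc)

    # Hardcoding "New York = UTC-4" (only true during daylight saving)
    # stamps the same report an hour too early in UTC.
    buggy_utc = datetime(2024, 1, 15, 9, 0,
                         tzinfo=timezone(timedelta(hours=-4))).astimezone(timezone.utc)

    print(true_utc)              # 2024-01-15 14:00:00+00:00
    print(buggy_utc)             # 2024-01-15 13:00:00+00:00
    print(true_utc - buggy_utc)  # 1:00:00 -- the row is visible an hour before it existed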
(How much would you be worth if you were paid a nickel every time you had to fix a timezone bug?)
I'm going to hold Kepler's feet to the fire here.
The only question I have for the folks at Kepler is: did you account for that bug? If you can't answer that question, I guarantee you 100% that the bug exists in your data, that some row of temporal data -- a report's published time and date -- will be off by one hour, and that anyone using your data will have failed backtests and never know it.