On the one hand, very encouraging to see plain old deterministic infra w/o using slop machines.
On the other hand, this is a recognition that LLMs are just additional friction in the system that we would be better off without in the first place!
There are two failure points in any decision-making done by LLM models: spatial and temporal reasoning -- if you can even call it reasoning. They can't predict the consequence, or rather the next token, for any spatial or temporal problem.
LLM models will lie and cheat. They can't be trusted!
The article didn't give much information about how Kepler achieved this deterministic, cheap (as in orders of magnitude less expensive than trying to get an LLM to do the verification, if it even could) verification system. I built a very good solution that attempts to verify deterministically on unknown systems. [1] If you are working on a problem similar to what Kepler did, you can likely gain a lot from the learnings. By construction, it never allows future data into a system run by a wall clock. One step is to force an adversary agent to step through the code line by line and produce a rigorous proof that there are no temporal bugs.
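
To make that concrete, here is a minimal sketch of the "no future data by construction" idea -- not Kepler's system and not the actual design behind [1], just an illustration with made-up names: every read has to pass an explicit as-of clock, so a backtest can only ever see rows stamped at or before its own clock.

    from datetime import datetime, timezone

    class ClockBoundedStore:
        """Every read is bounded by an explicit as-of time, so rows stamped
        in the future can never leak into a backtest by construction."""

        def __init__(self, rows):
            # rows: iterable of (timestamp, payload) with tz-aware UTC timestamps
            self._rows = sorted(rows, key=lambda r: r[0])

        def read(self, as_of: datetime):
            if as_of.tzinfo is None:
                raise ValueError("as_of must be timezone-aware")
            return [(ts, p) for ts, p in self._rows if ts <= as_of]

    # Usage: the backtest advances its own clock and can only ever see the past.
    store = ClockBoundedStore([
        (datetime(2024, 11, 3, 5, 30, tzinfo=timezone.utc), "report A"),
        (datetime(2024, 11, 3, 6, 30, tzinfo=timezone.utc), "report B"),
    ])
    print(store.read(datetime(2024, 11, 3, 6, 0, tzinfo=timezone.utc)))  # only "report A"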
Nevertheless, I can 100% assure you that any LLM will find a way to cheat. It will lie (strong words are needed to describe this class of bug) about a timezone conversion, like New York daylight saving time, and a massive amount of data will be looking into the future, off by one hour.
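
Here is what that bug class looks like in plain Python (stdlib zoneinfo), as a hedged example rather than anything from the article: hardcoding a fixed UTC offset for New York instead of using the real timezone stamps every standard-time row an hour early, so a backtest sees the report before it actually existed.

    from datetime import datetime, timezone, timedelta
    from zoneinfo import ZoneInfo  # stdlib since Python 3.9

    ny = ZoneInfo("America/New_York")

    # A report truly published at 9:00 AM New York time on 2024-01-15 (EST, UTC-5).
    true_utc = datetime(2024, 1, 15, 9, 0, tzinfo=ny).astimezone(timezone.utc)

    # Hardcoding "New York = UTC-4" (only true during daylight saving)
    # stamps the same report an hour too early in UTC.
    buggy_utc = datetime(2024, 1, 15, 9, 0,
                         tzinfo=timezone(timedelta(hours=-4))).astimezone(timezone.utc)

    print(true_utc)              # 2024-01-15 14:00:00+00:00
    print(buggy_utc)             # 2024-01-15 13:00:00+00:00
    print(true_utc - buggy_utc)  # 1:00:00 -- the row is visible an hour before it existed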
(How much would you be worth if you were paid a nickel every time you had to fix a timezone bug?)
I'm going to hold Kepler's feet to the fire here.
The only question I have for the folks at Kepler is: did you account for that bug? If you can't answer that question, I guarantee you 100% that the bug exists in your data, that some row of temporal data -- a report's published time and date -- will be off by one hour, and that anyone using your data will have failed backtests and never know it.