I’m Nazim, founder of Koinju.io, and I wanted to share an exploratory option we opened very recently: providing access to our database, which contains all our cryptocurrency market data, via SQL. Our REST API covers direct retrieval, but we're thinking more and more that SQL access for analytical work over a unified crypto market data layer could be worth something precisely because of LLMs.
This was partly triggered by a recent essay by Didier Lopes, CEO of OpenBB, on financial firms owning the infrastructure where financial work happens (https://www.linkedin.com/pulse/how-did-we-end-up-here-didier-rodrigues-lopes-hgeqe/ ), especially the runtime where workflows execute and AI inference happens.
Most data APIs were designed for software that already knows what it wants: call an endpoint, get JSON, parse it, compute somewhere else. That model worked great and still works great. But I'm not sure it maps well to LLM-driven workflows, especially with big data.
A language model can call APIs and read JSON, or write Python to do so (Claude Code can force JSON output). But that does not mean the model is efficient at ingesting, reshaping, joining, aggregating, validating, or reasoning over large structured datasets through tokenized rows. At small scale, everything fits within the context limit. At large scale, it becomes unwieldy, and small details can disappear silently, as if they were outliers...
So the thesis we are testing is: for big datasets, the AI-facing primitive should switch from "return JSON" to "execute a bounded, inspectable operation over the dataset", something you can plan, replay, and trace precisely. In that setup, the LLM takes on the role of a planner/controller. It should be able to inspect schemas, understand constraints, express an operation, check limits or even ASTs, run the computation through an execution layer, and then reason over a compact, typed result.
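To make that concrete, here is a rough sketch of what that loop could look like in plain SQL, assuming a Postgres-style engine; the trades table and its columns are made up for illustration, not our actual schema:

    -- 1. inspect the schema before planning anything
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'trades';

    -- 2. dry run / cost preview: estimate the work without executing it
    EXPLAIN
    SELECT symbol, count(*) AS trade_count
    FROM trades
    WHERE traded_at >= now() - interval '7 days'
    GROUP BY symbol;

    -- 3. execute the bounded operation, get back a compact, typed result
    SELECT symbol, count(*) AS trade_count
    FROM trades
    WHERE traded_at >= now() - interval '7 days'
    GROUP BY symbol
    ORDER BY trade_count DESC
    LIMIT 20;

The point is that each step is explicit, cheap to log, and checkable before anything heavy runs.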
So SQL is our current attempt at that layer.
This is really not new :-) and not magically "AI-native" either. But it is explicit, inspectable, composable, and executable close to the data. REST still makes sense for simple retrieval. But for analytical questions over large market datasets, JSON pagination feels like the wrong unit of work.
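A small example of what I mean by unit of work, using the same illustrative trades table as above: answering "what was the daily VWAP of BTC-USDT per exchange over the last year" via paginated JSON means shipping millions of raw trades to the model's side, while the equivalent query returns a few hundred aggregated rows:

    -- daily volume-weighted average price per exchange, computed where the data lives
    SELECT
        exchange,
        date_trunc('day', traded_at) AS day,
        sum(price * base_volume) / nullif(sum(base_volume), 0) AS vwap
    FROM trades
    WHERE symbol = 'BTC-USDT'
      AND traded_at >= now() - interval '1 year'
    GROUP BY exchange, day
    ORDER BY exchange, day;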
And there is also a governance question here: in the financial sector, many firms do not want their entire workflow to move into a vendor's black-box interface. That seems right. Internal context, permissions, model policy, audit logs, and decision workflows should probably live in the firm's environment, of course. But that does not necessarily mean every external dataset should be copied locally before any question can be asked.
Maybe the better boundary is:
- the firm owns the workflow and the inference runtime,
- the data provider exposes a controlled execution surface,
- the LLM issues bounded operations,
- the query engine performs the actual computation,
- the result comes back to the model.
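On the provider side, a minimal version of that "controlled execution surface", still assuming a Postgres-style engine with an illustrative market_data schema, could be little more than a read-only role with hard resource bounds (real deployments obviously need per-customer quotas and audit logging on top):

    -- read-only role scoped to the market data schema
    CREATE ROLE agent_readonly NOLOGIN;
    GRANT USAGE ON SCHEMA market_data TO agent_readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA market_data TO agent_readonly;

    -- bound what any single operation is allowed to cost
    ALTER ROLE agent_readonly SET statement_timeout = '30s';
    ALTER ROLE agent_readonly SET work_mem = '64MB';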
I'm interested in any feedback from people working on things like this: market data, quant research, analytics... The questions I'm trying to answer:
- What is the right interface today for an LLM working with big data?
- Should the model operate on raw data, JSON, schemas, SQL, typed tools, semantic layers, or something else?
- Where should the boundary be between the customer-owned runtime and provider-side data execution?
- How should query limits, cost previews, dry runs, permissions, and audit logs work when the caller might be an agent?
I’m not looking only for validation. If the answer is “don’t invent a new AI category; just provide clean data, stable schemas, SQL, docs, and predictable limits”, that would also be useful.