Building an AI agent inside a 7-year-old Rails monolith

https://catalinionescu.dev/ai-agent/building-ai-agent-part-1/

109•cionescu1•1mo ago

Comments

pell•1mo ago

Was there any concern about giving the LLM access to this return data? Reading your article I wondered if there could be an approach that limits the LLM to running the function calls without ever seeing the output itself fully, e.g., only seeing the start of a JSON string with a status like “success” or “not found”. But I guess it would be complicated to have a continuous conversation that way.

aidos•1mo ago

> No model should ever know Jon Snow’s phone number from a SaaS service, but this approach allows this sort of retrieval.

This reads to me like they think that the response from the tool doesn’t go back to the LLM.

I’ve not worked with tools but my understanding is that they’re a way to allow the LLM to request additional data from the client. Once the client executes the requested function, that response data then goes to the LLM to be further processed into a final response.
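To make that round trip concrete, here is a minimal sketch in plain Ruby. Everything in it is an assumption for illustration: `fake_model` stands in for a hosted LLM, and `CONTACTS`, `TOOLS`, and `find_contact` are hypothetical names, not anything from the article or from RubyLLM. The point is only that the tool's output is appended to the conversation and sent back to the model.

```ruby
require "json"

# Hypothetical in-memory stand-in for the SaaS contact data.
CONTACTS = { "jon snow" => { "name" => "Jon Snow", "email" => "jon@example.com" } }.freeze

# Hypothetical tool registry: tool name => callable taking the parsed arguments.
TOOLS = {
  "find_contact" => ->(args) { CONTACTS.fetch(args["name"].downcase, { "error" => "not found" }) }
}.freeze

# Stand-in for a hosted model; a real client would make an API call here.
# With no tool result in the history it requests a tool, otherwise it answers.
def fake_model(messages)
  tool_msg = messages.reverse.find { |m| m[:role] == "tool" }
  if tool_msg
    data = JSON.parse(tool_msg[:content])
    { role: "assistant", content: "#{data["name"]}'s email is #{data["email"]}." }
  else
    { role: "assistant", tool_call: { name: "find_contact", arguments: { "name" => "Jon Snow" } } }
  end
end

def run_conversation(question)
  messages = [{ role: "user", content: question }]
  loop do
    reply = fake_model(messages)
    messages << reply
    return messages unless reply[:tool_call]

    call = reply[:tool_call]
    result = TOOLS.fetch(call[:name]).call(call[:arguments])
    # The full tool output is serialized and sent back to the model.
    messages << { role: "tool", content: JSON.generate(result) }
  end
end

history = run_conversation("What is Jon Snow's email?")
puts history.last[:content]
```

Note that the `tool` message in `history` contains the raw email address, which is exactly the data a real provider would receive.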

timrogers•1mo ago

That would be the normal pattern. But you could certainly stop after the LLM picks the tool and provides the arguments, and not present the result back to the model.
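A rough sketch of that variation, again in plain Ruby with hypothetical names (`CONTACT_DB`, `find_contact`, `execute_tool_call`, and the tool-call hash shape are all assumptions): the application runs the tool, renders the full record itself, and hands the model only a status stub.

```ruby
require "json"

# Hypothetical contact store and lookup, for illustration only.
CONTACT_DB = { "jon snow" => { "name" => "Jon Snow", "phone" => "555-0100" } }.freeze

def find_contact(name)
  CONTACT_DB[name.downcase]
end

# Runs the tool the model asked for and splits the result in two:
# the full record goes to the application UI, a status-only stub goes to the model.
def execute_tool_call(tool_call)
  record = find_contact(tool_call.fetch("arguments").fetch("name"))
  status = record ? "success" : "not_found"
  {
    for_ui: record,
    for_model: JSON.generate({ "status" => status })
  }
end

# A tool call as the model might emit it (the shape is an assumption).
tool_call = { "name" => "find_contact", "arguments" => { "name" => "Jon Snow" } }
outcome = execute_tool_call(tool_call)

puts outcome[:for_model]      # what would be appended to the conversation
puts outcome[:for_ui].inspect # what the app renders directly
```

The trade-off is that the model can no longer answer follow-up questions about the record, since it never saw it.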

simonw•1mo ago

I was confused by that too. I think I've figured it out.

They're saying that a public LLM won't know the email address of Jon Snow, but they still want to be able to answer questions about their private SaaS data which DOES know that.

Then they describe building a typical tool-based LLM system where the model can run searches against private data and round-trip the results through the model to generate chat responses.

They're relying on the AI labs to keep their promises about not training on data from paying API customers. I think that's a safe bet, personally.

aidos•1mo ago

Makes sense. I agree that it’s probably a safe bet too. Not sure how customers would feel about it though.

It’s also funny how these tools push people into patterns by accident. You’d never consider sending a customer’s details to a 3rd party for them just to send them back, right? And there’s nothing stopping someone from just working more directly with the tool call response themselves, but the libraries are set up so you lean into the LLM more than is required (I know you more than anyone appreciate that the value they add here is parsing the fuzzy instruction into a tool call - not the call itself).

simonw•1mo ago

> You’d never consider sending a customer’s details to a 3rd party for them just to send them back, right?

I use hosted database providers and APIs like S3 all the time.

Sending customer details to a third party is fine if you trust them and have a financial relationship with them backed by legal agreements.

sidd22•1mo ago

Hey, interesting read. I am working on a product in the Agent <> Tool layer. Would you be open to a quick chat?

tovej•1mo ago

If all this does is give you the data from a contact API, why not just let the users directly interact with the API? The LLM is just extra bloat in this case.

Surely a fuzzy search by name or some other field is a much better UI for this.
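For a sense of scale, a non-LLM fuzzy name search can be quite small. This is a minimal sketch, assuming trigram overlap (Jaccard similarity) as the matching measure and an arbitrary threshold of 0.3; neither choice comes from the article, and a production system would more likely lean on the database's own fuzzy-matching support.

```ruby
require "set"

# Breaks a string into a set of 3-character fragments, padded so that
# short names and word boundaries still produce trigrams.
def trigrams(text)
  padded = "  #{text.downcase.strip} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
end

# Jaccard similarity of the two trigram sets: 0.0 (nothing shared) to 1.0 (identical).
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  union.zero? ? 0.0 : (ta & tb).size.to_f / union
end

# Returns names scoring at or above the threshold, best match first.
def fuzzy_search(query, names, threshold: 0.3)
  names
    .map { |name| [name, similarity(query, name)] }
    .select { |_, score| score >= threshold }
    .sort_by { |_, score| -score }
    .map(&:first)
end

names = ["Jon Snow", "John Snowden", "Arya Stark", "Sansa Stark"]
puts fuzzy_search("jon sno", names).inspect
```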

bitmasher9•1mo ago

By "interact directly with the API", do you mean having the user make curl calls to your backend?

We build front ends for the API to make our applications easier to use. This is just another type of front end.

tovej•1mo ago

No, obviously not. I mean having a regular web frontend with a fuzzy search form.

simonw•1mo ago

That's effectively what they built. The LLM is an implementation detail for how they got a version of fuzzy search to work.

tovej•1mo ago

Did you read my post? The AI is just an expensive extra component that complicates the flow. Why would I want a chat interface for something that should give me a structured response in a clean table UI with customizable columns?

midnightclubbed•1mo ago

What does the end user do with the AI chat? It sounds like they can just use it to do searches of client information… which the existing site would already do.

venturecruelty•1mo ago

But not without using a thousand gallons of water and propping up Nvidia shares.

sgt•1mo ago

And there's still a lot of water left. We're just getting started, boys!

Tiberium•1mo ago

I get that the water stereotype is funny, but it gets tiring after a while (because it's not actually true).

only-one1701•1mo ago

For what it’s worth: yes, it’s not technically true, but the reason it’s sticking around is because it conveys a deeply felt (and actually true) sentiment that many many people have: the output of generative AI isn’t worth the input.

hombre_fatal•1mo ago

Well, it more demonstrates that people will quickly latch on to convenient lies that support what they want to be true, yet impede real discussion of the trade offs if they can’t even get the basic facts right.

It’s not a good thing.

only-one1701•1mo ago

I'm not saying it's "good", I'm just saying that it's worth a qualitative consideration of what it _means_ that this incorrect statement is so persistent beyond "not true, STFU"

simonw•1mo ago

Urgh, I know that it's a solid explanation but I hate the "it may not be true but it captures a truth that people feel" argument so much!

See also "instagram is spying on you through your microphone". It's not, but I've seen people argue that it's OK for people to believe that because it supports their general (accurate) sentiment that targeted ads are creepy.

cyphar•1mo ago

> See also "instagram is spying on you through your microphone". It's not, but I've seen people argue that it's OK for people to believe that because it supports their general (accurate) sentiment that targeted ads are creepy.

I used to be sceptical of this claim but I have found it increasingly difficult to be sceptical after we found out last year that Facebook was exploiting flaws in Android in order to track your browsing history (bypassing the permissions and privilege separation model of Android)[1].

Given they have shown a proclivity to use device exploits to improve their tracking of users, is it really that unbelievable that they would try to figure out a way to use audio data? Does stock Android even show you when an app is using its microphone permission? (GrapheneOS does.) Is it really that unbelievable that they would try to do this if they could?

[1]: https://localmess.github.io/

simonw•1mo ago

If they are using the microphone to target ads, show me the sales pitch that their ad sales people use to get customers to pay more for the benefits of that targeting.

(I have a ton more arguments if that's not convincing enough for you, I collect them here: https://simonwillison.net/tags/microphone-ads-conspiracy/ )

caminanteblanco•1mo ago

I have already experienced the benefits of sending this to several family members, and I'm thankful for the hard work you put into laying everything out so clearly

cyphar•1mo ago

I get your point, but can you point to a sales pitch which included "exploit security flaws in Android to improve tracking"? Probably not, but we know for a fact they did that.

Also, your own blog lists a leak from 2024 about a Facebook partner bragging about this ability[1]. You don't find the claim credible (and you might be right about that, I haven't looked into it), but I find it strange that you are asking for an example that your own website provides?

[1]: https://futurism.com/the-byte/facebook-partner-phones-listen...

simonw•1mo ago

That claim is SO not credible that I think serious outlets that report on it non-critically lose credibility by doing so.

Seriously: the entire idea there is that there was a vast global conspiracy to secretly spy on people to target ads which was blown wide open by THIS deck: https://www.documentcloud.org/documents/25051283-cmg-pitch-d...

aosisbsn•1mo ago

Any amount of water is not really worth it. Building some roads or trains would be 1000x more valuable to 99% of Americans.

array_key_first•1mo ago

AI most definitely uses more water than a traditional full text search because it is much more computationally expensive.

The water figures are very overestimated, but the principle is true: using a super computer to do simple things uses more electricity, compute and therefore water than doing it in a traditional way.

simsla•1mo ago

Yes, and a YouTube video more than a text article. etc. etc.

It's a tool. The main question should be: is it useful? In the case of AI, sometimes yes, sometimes no.

array_key_first•1mo ago

I mean, think of it this way. If I built a web app that took HTTP requests and converted them into a YouTube video, then downloaded and decoded that video in software, and then served the request, you'd say "that's stupid - you're using 10,000x more compute than you need to".

It's a tool, and using the wrong tool for the wrong job is just wasteful. And, usually, overly complicated and frail. So it's only losses.

galdauts•1mo ago

You see, now they can market the tool as AI-powered! I‘m sure the sales department is overjoyed.

Lio•1mo ago

It's interesting the use of RubyLLM here. I'm trying to contrast that with my own use of DSPy.rb, which so far I've been quite happy with for small experiments.

Does anyone have a comparison of the two, or any other libraries?

vicentereig•1mo ago

Maintainer of DSPy.rb here. The key difference is the level of abstraction:

RubyLLM gives you a clean API for LLM calls and tool definitions. You're still writing prompts and managing conversations directly.

DSPy.rb treats prompts as functions with typed signatures. You define inputs/outputs and the framework handles prompt construction, JSON parsing, and structured extraction. Two articles that might help:

1. "Building Your First ReAct Agent" shows how to build tool-using agents with type-safe tool definitions [0].

2. "Building Chat Agents with Ephemeral Memory" demonstrates context engineering patterns (what the LLM sees vs. what you store), cost-based routing between models, and memory management [1].

The article's approach (RubyLLM + single tool) works great for simple cases. DSPy.rb shines when you need to decompose into multiple specialized modules with different concerns. Some examples: separate signatures for classification vs. response generation, each optimized independently with separate context windows and memory to maintain.

Would love to learn how dspy.rb is working for you!

Note that RubyLLM and DSPy.rb aren't mutually exclusive: the `dspy-ruby_llm` adapter (`gem 'dspy-ruby_llm'`) gives us access to a TON of providers.

[0] https://oss.vicente.services/dspy.rb/blog/articles/react-age...

[1] https://oss.vicente.services/dspy.rb/blog/articles/ephemeral...

mark_l_watson•1mo ago

A lot of good info, thanks. I have just lightly experimented with Python DSPy and I will probably give your DSPy.rb gem a try, or at least read your code.

vicentereig•1mo ago

I appreciate your time checking it out! I've used and keep using DSPy a lot for work, and I felt I was missing a limb in my Rails-related projects. Let me know if you have any thoughts or feedback, every person has a different perspective and I always learn something new.

rahimnathwani•1mo ago

The article is dated December 2025, but:

  I checked a few OpenAI models for this implementation: gpt-5, gpt-4o, gpt4.

Seems like a weird list. None of these are current generation models and none are on the Pareto frontier.

kubb•1mo ago

But they precede the knowledge cutoff.

rahimnathwani•1mo ago

You mean you suspect the article itself was written by AI?

kubb•1mo ago

I mean we have evidence for that.

simonw•1mo ago

I was surprised they settled on GPT-4o for performance reasons. I'd expect GPT-5-mini to be as fast and better.

vicentereig•1mo ago

Thanks for sharing your experience! I know there are many of us out there dabbling with LLMs and some solid businesses built on Ruby, lurking in the background without publishing much.

Your single-tool approach is a solid starting point. As it grows, you might hit context window limits and find the prompt getting unwieldy. Things like "why is this prompt choking on 1.5MB of JSON coming from this other API/tool?"

When you look at systems like Codex CLI, they run at least four separate LLM subsystems: (1) the main agent prompt, (2) a summarizer model that watches the reasoning trace and produces user-facing updates like "Searching for test files...", (3) compaction and (4) a reviewer agent. Each one only sees the context it needs, like a function with its own inputs and outputs. Total tokens stay similar, but signal density per prompt goes up.

DSPy.rb[0] enables this pattern in Ruby: define typed Signatures for each concern, compose them as Modules/Prompting Techniques (simple predictor, CoT, ReAct, CodeAct, your own, ...), and let each maintain its own memory scope. Three articles that show this:

- "Ephemeral Memory Chat"[1] — the Two-Struct pattern (rich storage vs. lean prompt context) plus cost-based routing between cheap and expensive models.

- "Evaluator Loops"[2] — decompose generation from evaluation: a cheap model drafts, a smarter model critiques, each with its own focused signature.

- "Workflow Router"[3] — route requests to the right model based on complexity, only escalate to expensive LLMs when needed.
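The routing idea in the last item can be sketched in a few lines of plain Ruby. This is not DSPy.rb's API: the model names, the `COMPLEX_HINTS` keyword heuristic, and `call_model` are all placeholders I made up to show the shape of the decision.

```ruby
# Hypothetical model tiers; the names are placeholders, not real model IDs.
CHEAP_MODEL = "small-fast-model"
EXPENSIVE_MODEL = "large-reasoning-model"

# Crude complexity heuristic (an assumption for illustration):
# long requests or ones with analytical keywords count as complex.
COMPLEX_HINTS = /\b(compare|why|explain|summari[sz]e|analy[sz]e)\b/i

def complex?(request)
  request.split.size > 40 || request.match?(COMPLEX_HINTS)
end

# Picks the cheapest tier expected to handle the request.
def route(request)
  complex?(request) ? EXPENSIVE_MODEL : CHEAP_MODEL
end

# Stand-in for an LLM client; a real implementation would call a provider here.
def call_model(model, request)
  "[#{model}] handled: #{request}"
end

def answer(request)
  call_model(route(request), request)
end

puts answer("Find Jon Snow's email")
puts answer("Compare churn between our two largest accounts and explain why it differs")
```

In practice the classifier could itself be a cheap model call rather than a regex; the keyword check just keeps the sketch runnable without a provider.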

And since you're already using RubyLLM, the dspy-ruby_llm adapter lets you keep your provider setup while gaining the decomposition benefits.

Thanks for coming to my TED talk. Let me know if you need someone to bounce ideas off.

[0] https://github.com/vicentereig/dspy.rb

[1] https://oss.vicente.services/dspy.rb/blog/articles/ephemeral...

[2] https://oss.vicente.services/dspy.rb/blog/articles/evaluator...

[3] https://oss.vicente.services/dspy.rb/blog/articles/workflow-...

(edit: minor formatting)

shevy-java•1mo ago

"I was at SF Ruby, in San Francisco, a few weeks ago. Most of the tracks were, of course, heavily focused on AI"

It may be the current "Zeitgeist", but I find the addiction to AI annoying. I am not denying that there are use cases to be had that can be net-positive, but there are also numerous bad examples of AI use. And these, IMO, are more prevalent than the positive ones overall.

nbaugh1•1mo ago

And yet, you clicked the link

nateb2022•1mo ago

> And these, IMO, are more prevalent than the positive ones overall.

If a problem is this widespread, a conference is arguably the best place to address it.

> but there are also numerous bad examples of AI use

which should be discussed publicly. I think we all have a lot to learn from each others' successes and failures, which is where coming together at a conference can really help.

mark_l_watson•1mo ago

I really enjoyed reading the code listings in the article. Many years ago I was a Ruby fanatic, even wrote a book on Ruby, but for work requirements I was pulled to Java and Python (and occasionally Clojure and Common Lisp).

I liked how well designed the monolith application seems to be from the brief description in the article.

Coincidentally I installed Ruby, first time in years, last week and spent a half hour experimenting with the same nicely designed RubyLLM gem used in the article. While slop code can be written in any language, it seems like in general many Ruby devs have excellent style. Clojure is another language where I have noticed a preponderance of great style.

As long as I am rambling, one more thing, a plug for monolith applications: I used to get a lot of pleasure from working as a single dev on monoliths in Java and Ruby, eschewing micro-services, really great to share data and code in one huge usually multithreaded process.

zhisme•1mo ago

Could you share the book name? Sounds interesting!

Kerrick•1mo ago

Based on their username, I would guess it's Ruby Quickly: Ruby and Rails for the Real World (ISBN 978-1932394740, Manning, 2006).

Herring•1mo ago

This resembles the "Natural Language to SQL" trend of the early 2010s, which largely failed because business users required 100% accuracy, and the "translation" layer was too brittle.
