Because you’re right – they are superb manipulators. They are helpful, they gain your trust, and they have infinite patience. They can easily be tuned to manipulate your opinions about commercial products or political topics. That has already happened with much more rudimentary tech, and it worked so well that the companies doing it grew into the richest in the world. With AI and LLMs specifically, that capability is dialed up by orders of magnitude compared to the previous generation of recommendation systems and engagement algorithms.
That gives the would-be AI overlords very strong means, motive, and opportunity.
It is sort of trivial to build. It's just User + System Prompt + Assistant + Tools in a loop, with some memory management. The loop code can be as complex as I want it to be, e.g. I could snapshot the state and restart later.
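A minimal sketch of that loop (Python here just for illustration; `call_llm` and `run_tool` are hypothetical stand-ins for whatever model API and tool dispatch you actually use):

```python
# Minimal agent loop sketch. call_llm and run_tool are hypothetical stand-ins.
def agent_loop(call_llm, run_tool, system_prompt, user_goal, max_steps=20):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_goal},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)  # -> {"content": str, "tool_calls": list|None}
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_calls"):
            return reply["content"], messages  # done: answer plus full transcript
        for call in reply["tool_calls"]:
            result = run_tool(call)  # execute the tool the model asked for
            messages.append({"role": "tool", "content": result})
    # messages doubles as a snapshot: persist it and resume the loop later
    return None, messages
```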
I used this approach to build a coding system (what else?) and it works just as well as Cursor or Claude Code for me. The advantage is that I can switch between DeepSeek or Flash depending on the complexity of the code, and it's not a black box.
I developed the whole system in Clojure, and dogfooded it as well.
Not necessarily. You can have non-reasoning agents (pretty common actually) too.
nilirl•5h ago
I'm confused.
A workflow has hardcoded branching paths: explicit if conditions, and instructions on how to behave when each one is true.
So for an agent, instead of specifying explicit if conditions, you specify outcomes and you leave the LLM to figure out what if conditions apply and how to deal with them?
In the case of this resume screening application, would I just provide the ability to make API calls and then add this to the prompt: "Decide what a good fit would be."?
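Roughly the contrast, with made-up helper names (`call_llm`, `fetch_resume`, `fetch_job_requirement` are hypothetical):

```python
# Workflow: the branching logic is hardcoded by the developer.
def screen_workflow(resume_text):
    if "python" in resume_text.lower() and "5 years" in resume_text:
        return "interview"
    return "reject"

# Agent: the model gets a goal plus tools, and decides the steps itself.
def screen_agent(call_llm, candidate_id):
    prompt = ("You can call fetch_resume and fetch_job_requirement. "
              "Decide whether this candidate is a good fit and explain why.")
    return call_llm(prompt,
                    tools=["fetch_resume", "fetch_job_requirement"],
                    context={"candidate_id": candidate_id})
```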
Are there any serious applications built this way? Or am I missing something?
manojlds•4h ago
Recent article from Anthropic - https://www.anthropic.com/engineering/built-multi-agent-rese...
alganet•4h ago
Can you give us an example of a company not involved in AI research that does it?
nilirl•2h ago
From what I gather, you can build an agent for a task as long as:
- you trust the decision-making of an LLM for the type of decision required; decisions framed as some kind of evaluation of text feel right.
- and if the penalty for being wrong is acceptable.
Just to go back to the resume screening application, you'd build an agent if:
- you asked the LLM to make an evaluation based on the text content of the resume, any conversation with the applicant, and the declared job requirement.
- you had a high enough volume of resumes that false negatives wouldn't be too painful.
It seems like framing problems as search problems helps model these systems effectively. They're not yet capable of design, i.e., being responsible for coming up with the job requirement itself.
mickeyp•4h ago
That is very much true of the systems most of us have built.
But you do not have to do this with an LLM; in fact, the LLM may decide not to follow your explicit conditions and instructions regardless of how hard you try.
That is why LLMs are used to review the output of LLMs to ensure they follow the core goals you originally gave them.
For example, you might ask an LLM to lay out how to cook a dish. Then use a second LLM to review if the first LLM followed the goals.
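A minimal sketch of that generate-then-review pattern; `call_llm` is a hypothetical helper wrapping whichever model you use:

```python
def generate_with_review(call_llm, task, goals):
    # First model produces the work (e.g. the recipe).
    draft = call_llm(f"{task}\n\nGoals:\n{goals}")
    # Second call reviews the draft against the original goals.
    verdict = call_llm(
        "You are a reviewer. Check the draft against the goals.\n"
        f"Goals:\n{goals}\n\nDraft:\n{draft}\n\n"
        "Reply PASS, or FAIL listing the goals that were violated."
    )
    return draft, verdict  # a caller could regenerate on FAIL
```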
This is one of the things tools like DSPy try to do: you remove the hand-written prompt and instead describe things with high-level concepts like "input" and "output", plus reward/scoring functions (which might be a mix of LLM-based and human-coded functions) that assess whether the output is correct given that input.
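A rough sketch of what that looks like in DSPy, using the resume example from upthread (field names and the metric are illustrative, and a language model still has to be configured separately):

```python
import dspy

# Signature: declare inputs and outputs instead of hand-writing a prompt.
class ScreenResume(dspy.Signature):
    """Decide whether a resume fits a stated job requirement."""
    resume = dspy.InputField()
    requirement = dspy.InputField()
    fits = dspy.OutputField(desc="'yes' or 'no', plus a one-line reason")

screen = dspy.Predict(ScreenResume)  # assumes dspy has an LM configured

# Scoring function (human-coded here, could itself be LLM-based);
# DSPy optimizers tune the underlying prompt against metrics like this.
def fit_metric(example, prediction, trace=None):
    return prediction.fits.strip().lower().startswith(example.expected_fit)
```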
rybosome•2h ago
Let me reword your phrasing slightly to make an illustrative point:
> so for an employee, instead of specifying explicit if conditions, you specify outcomes and you leave the human to figure out what if conditions apply and how to deal with them?
> Are there any serious applications built this way?
We have managed to build robust, reliable systems on top of fallible, mistake-riddled, hallucinating, fabricating, egotistical, hormonal humans. Surely we can handle a little non-determinism in our computer programs? :)
In all seriousness, having spent the last few years employed in this world, I feel that LLM non-determinism is an engineering problem just like the non-determinism of making an HTTP request. It’s not one we have prior art on dealing with in this field admittedly, but that’s what is so exciting about it.
nilirl•1h ago
It's not the non-determinism that was bothering me, it was the decision making capability. I didn't understand what kinds of decisions I can rely on an LLM to make.
For example, with the resume screening application from the post, where would I draw the line between the agent and the human?
- If I gave the AI agent access to HR data and employee communications, would it be able to decide when to create a job description?
- And design the job description itself?
- And email an opening round of questions for the candidate to get a better sense of the candidates who apply?
Do I treat an AI agent just like I would a human new to the job? Keep working on it until I can trust it to make domain-specific decisions?
diggan•1h ago
If you can encode in text how you/your company makes that decision as a human, I don't see why not. But personally, there is a lot of subjectivity (for better or worse) in hiring processes, and I'm not sure I'd want a probabilistic rule engine to make those sorts of calls.
My current system prompt for coding with LLMs basically looks like I've written down what my own personal rules for programming are. Any time I got a result I didn't like, I wrote down why I didn't like it and codified it in my reusable system prompt; after that, it doesn't make those (imo) mistakes anymore.
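For illustration only (these rules are made up, not the actual prompt), the shape is roughly a plain list of accumulated rules sent as the system message:

```python
# Illustrative only: personal rules accumulated from past bad results,
# prepended as the system prompt on every coding request.
SYSTEM_PROMPT = """\
You are my coding assistant. Follow these rules:
- Prefer small, pure functions; keep side effects at the edges.
- Do not add new dependencies without asking first.
- When I report a bug, write a failing test before changing code.
"""
```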
I don't think I could realistically get an LLM to do something I don't understand the process of myself, and once you grok the process, you can understand if using an LLM here makes sense or not.
> Do I treat an AI agent just like I would a human new to the job?
No, you treat it as something much dumber. You can generally rely on some sort of "common sense" in a human that they built up during their time on this planet. But you cannot do that with LLMs: while they're super-human in some ways, they are still way "dumber" in others.
For example, a human new to a job will pick things up autonomously, while an LLM does not. You need to pay attention to what you have to "teach" the LLM by changing what Karpathy calls the "programming" of the LLM, which is the prompts. Anything you forget to tell it, the LLM will handle however it likes; it only follows exactly what you say. A human you can usually tell "don't do that in the future" and they'll avoid it in the right context. An LLM you can scream at for 10 hours about how it's doing something wrong, but unless you update the programming, it'll keep making that mistake forever; and if you add the correction but reuse the prompt in other contexts, the LLM won't suddenly understand that the rule doesn't make sense there.
Just an example: I wanted some quick and dirty throwaway code for generating a graph, and in my prompt I mixed up the X and Y axes, so of course I got a function that didn't work as expected. If a human had been doing this, it would have been quite obvious I didn't want time on the Y axis and value on the X axis, because the graph wouldn't make any sense, but the LLM happily complied.
nilirl•39m ago
Is the main benefit that we can do all of this in natural language?
Kapura•1h ago
nilirl•28m ago
I think the appeal is code that handles changes in the world without having to change itself.
spacecadet•1h ago
I have several agent side projects going; the most complex and open-ended is an agent that performs periodic network traffic analysis. I use an orchestration library with a "group chat" style of orchestration, and I declare several agents that have instructions and access to tools.
These range from termshark scripts for collecting packets to analysis functions I already had from doing the traffic analysis myself.
I can then say something like, "Is there any suspicious activity?" and the agents collaboratively choose which agent performs which role, and therefore which tasks (i.e. tools), and work together to collect data, analyze it, and return a response.
I also run this on a schedule where the agents know about the schedule and choose to send me an email summary at specific times.
I have noticed that the models/agents are very good at picking the "correct" network interface without much input, and that they understand their roles and objectives and execute accordingly, again without much direction from me.
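Roughly the shape of that setup; the routing sketch below is illustrative, not the actual orchestration library or code from the project (`call_llm` is again a hypothetical model wrapper):

```python
# Illustrative "group chat" orchestration: a router prompt picks which
# declared agent acts next; each agent has instructions and tool names.
AGENTS = {
    "capture": {"instructions": "Pick the right interface and collect packets.",
                "tools": ["termshark_capture"]},
    "analyst": {"instructions": "Analyze traffic, flag suspicious activity.",
                "tools": ["analyze_traffic", "email_summary"]},
}

def group_chat(call_llm, question, max_turns=6):
    transcript = [f"user: {question}"]
    for _ in range(max_turns):
        pick = call_llm("Agents: " + ", ".join(AGENTS) + "\n"
                        + "\n".join(transcript)
                        + "\nWhich agent should act next? One name, or DONE.").strip()
        if pick == "DONE" or pick not in AGENTS:
            break
        spec = AGENTS[pick]
        reply = call_llm(spec["instructions"] + "\nTools available: "
                         + ", ".join(spec["tools"]) + "\n" + "\n".join(transcript))
        transcript.append(f"{pick}: {reply}")
    return transcript
```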
Now the big/serious question: is the output even good or useful? Right now, with my toy project, it's OK. Sometimes it's great and sometimes it's not; sometimes they spam my inbox with micro-updates.
I'm bad at sharing projects, but if you are curious: https://github.com/derekburgess/jaws
dist-epoch•1h ago
An agent is like Claude Code, where you say to it "fix this bug", and it chooses a sequence of actions: change code, run tests, run the linter, change code again, do a git commit, ask the user for clarification, change code again.