Because you’re right – they are superb manipulators. They are helpful, they gain your trust, and they have infinite patience. They can easily be tuned to manipulate your opinions about commercial products or political topics. That has already happened with much more rudimentary tech, and it worked so well that the companies doing it grew into the richest in the world. With AI and LLMs specifically, that capability is dialed up by orders of magnitude compared to the previous generation of recommendation systems and engagement algorithms.
That gives the would-be AI overlords very strong means, motive, and opportunity.
It is sort of trivial to build. It's just User + System Prompt + Assistant + Tools in a loop, with some memory management. The loop code can be as complex as I want it to be, e.g. I could snapshot the state and restart later.
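A minimal sketch of that loop (Python here just for illustration; `call_llm` and `run_tool` are hypothetical stand-ins for whatever model API and tool dispatch you actually use):

```python
# Minimal agent loop sketch. call_llm and run_tool are hypothetical stand-ins.
def agent_loop(call_llm, run_tool, system_prompt, user_goal, max_steps=20):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_goal},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)  # -> {"content": str, "tool_calls": list|None}
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_calls"):
            return reply["content"], messages  # done: answer plus full transcript
        for call in reply["tool_calls"]:
            result = run_tool(call)  # execute the tool the model asked for
            messages.append({"role": "tool", "content": result})
    # messages doubles as a snapshot: persist it and resume the loop later
    return None, messages
```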
I used this approach to build a coding system (what else?) and it works just as well as Cursor or Claude Code for me. The advantage is that I can switch between DeepSeek or Flash depending on the complexity of the code, and it's not a black box.
I developed the whole system in Clojure, and dogfooded it as well.
Not necessarily. You can have non-reasoning agents (pretty common actually) too.
nilirl•5h ago
I'm confused.
A workflow has hardcoded branching paths: explicit if conditions, and instructions on how to behave when each one is true.
So for an agent, instead of specifying explicit if conditions, you specify outcomes and you leave the LLM to figure out what if conditions apply and how to deal with them?
In the case of this resume screening application, would I just provide the ability to make API calls and then add this to the prompt: "Decide what a good fit would be."?
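Roughly the contrast, with made-up helper names (`call_llm`, `fetch_resume`, `fetch_job_requirement` are hypothetical):

```python
# Workflow: the branching logic is hardcoded by the developer.
def screen_workflow(resume_text):
    if "python" in resume_text.lower() and "5 years" in resume_text:
        return "interview"
    return "reject"

# Agent: the model gets a goal plus tools, and decides the steps itself.
def screen_agent(call_llm, candidate_id):
    prompt = ("You can call fetch_resume and fetch_job_requirement. "
              "Decide whether this candidate is a good fit and explain why.")
    return call_llm(prompt,
                    tools=["fetch_resume", "fetch_job_requirement"],
                    context={"candidate_id": candidate_id})
```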
Are there any serious applications built this way? Or am I missing something?
manojlds•4h ago
Recent article from Anthropic - https://www.anthropic.com/engineering/built-multi-agent-rese...
alganet•4h ago
Can you give us an example of a company not involved in AI research that does it?
nilirl•2h ago
From what I gather, you can build an agent for a task as long as:
- you trust the decision-making of an LLM for the type of decision required; decisions framed as some kind of evaluation of text feel right.
- and if the penalty for being wrong is acceptable.
Just to go back to the resume screening application, you'd build an agent if:
- you asked the LLM to make an evaluation based on the text content of the resume, any conversation with the applicant, and the declared job requirement.
- you had a high enough volume of resumes that false negatives wouldn't be too painful.
It seems like framing problems as search problems helps model these systems effectively. They're not yet capable of design, i.e., being responsible for coming up with the job requirement itself.
mickeyp•4h ago
That is very much true of the systems most of us have built.
But you do not have to do this with an LLM; in fact, the LLM may decide not to follow your explicit conditions and instructions regardless of how hard you try.
That is why LLMs are used to review the output of LLMs to ensure they follow the core goals you originally gave them.
For example, you might ask an LLM to lay out how to cook a dish. Then use a second LLM to review if the first LLM followed the goals.
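A minimal sketch of that generate-then-review pattern; `call_llm` is a hypothetical helper wrapping whichever model you use:

```python
def generate_with_review(call_llm, task, goals):
    # First model produces the work (e.g. the recipe).
    draft = call_llm(f"{task}\n\nGoals:\n{goals}")
    # Second call reviews the draft against the original goals.
    verdict = call_llm(
        "You are a reviewer. Check the draft against the goals.\n"
        f"Goals:\n{goals}\n\nDraft:\n{draft}\n\n"
        "Reply PASS, or FAIL listing the goals that were violated."
    )
    return draft, verdict  # a caller could regenerate on FAIL
```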
This is one of the things tools like DSPy try to do: you remove the hand-written prompt and instead describe things with high-level concepts like "input" and "output", plus reward/scoring functions (which might be a mix of LLM-based and human-coded functions) that assess whether the output is correct given that input.
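A rough sketch of what that looks like in DSPy, using the resume example from upthread (field names and the metric are illustrative, and a language model still has to be configured separately):

```python
import dspy

# Signature: declare inputs and outputs instead of hand-writing a prompt.
class ScreenResume(dspy.Signature):
    """Decide whether a resume fits a stated job requirement."""
    resume = dspy.InputField()
    requirement = dspy.InputField()
    fits = dspy.OutputField(desc="'yes' or 'no', plus a one-line reason")

screen = dspy.Predict(ScreenResume)  # assumes dspy has an LM configured

# Scoring function (human-coded here, could itself be LLM-based);
# DSPy optimizers tune the underlying prompt against metrics like this.
def fit_metric(example, prediction, trace=None):
    return prediction.fits.strip().lower().startswith(example.expected_fit)
```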
rybosome•2h ago
Let me reword your phrasing slightly to make an illustrative point:
> so for an employee, instead of specifying explicit if conditions, you specify outcomes and you leave the human to figure out what if conditions apply and how to deal with them?
> Are there any serious applications built this way?
We have managed to build robust, reliable systems on top of fallible, mistake-riddled, hallucinating, fabricating, egotistical, hormonal humans. Surely we can handle a little non-determinism in our computer programs? :)
In all seriousness, having spent the last few years employed in this world, I feel that LLM non-determinism is an engineering problem just like the non-determinism of making an HTTP request. It’s not one we have prior art on dealing with in this field admittedly, but that’s what is so exciting about it.
nilirl•1h ago
It's not the non-determinism that was bothering me, it was the decision making capability. I didn't understand what kinds of decisions I can rely on an LLM to make.
For example, with the resume screening application from the post, where would I draw the line between the agent and the human?
- If I gave the AI agent access to HR data and employee communications, would it be able to decide when to create a job description?
- And design the job description itself?
- And email an opening round of questions for the candidate to get a better sense of the candidates who apply?
Do I treat an AI agent just like I would a human new to the job? Keep working on it until I can trust it to make domain-specific decisions?
diggan•1h ago
If you can encode in text how you/your company makes that decision as a human, I don't see why not. But personally, there is a lot of subjectivity (for better or worse) in hiring processes, and I'm not sure I'd want a probabilistic rule engine to make those sorts of calls.
My current system prompt for coding with LLMs basically looks like I've written down what my own personal rules for programming are. Any time I got a result I didn't like, I wrote down why I didn't like it and codified it in my reusable system prompt; after that, it doesn't make those (imo) mistakes anymore.
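For illustration only (these rules are made up, not the actual prompt), the shape is roughly a plain list of accumulated rules sent as the system message:

```python
# Illustrative only: personal rules accumulated from past bad results,
# prepended as the system prompt on every coding request.
SYSTEM_PROMPT = """\
You are my coding assistant. Follow these rules:
- Prefer small, pure functions; keep side effects at the edges.
- Do not add new dependencies without asking first.
- When I report a bug, write a failing test before changing code.
"""
```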
I don't think I could realistically get an LLM to do something I don't understand the process of myself, and once you grok the process, you can understand if using an LLM here makes sense or not.
> Do I treat an AI agent just like I would a human new to the job?
No, you treat it as something much dumber. You can generally rely on some sort of "common sense" in a human that they built up during their time on this planet. But you cannot do that with LLMs: while they're super-human in some ways, they are still way "dumber" in others.
For example, a human new to a job will pick things up autonomously, while an LLM does not. You need to pay attention to what you have to "teach" the LLM by changing what Karpathy calls the "programming" of the LLM, which is the prompts. Anything you forget to tell it, the LLM will handle however it likes; it only follows exactly what you say. A human you can usually tell "don't do that in the future" and they'll avoid it in the right context. An LLM you can scream at for 10 hours about how it's doing something wrong, but unless you update the programming, it'll keep making that mistake forever; and if you add the correction but reuse the prompt in other contexts, the LLM won't suddenly understand that the rule doesn't make sense there.
Just an example: I wanted some quick and dirty throwaway code for generating a graph, and in my prompt I mixed up the X and Y axes, so of course I got a function that didn't work as expected. If a human had been doing this, it would have been quite obvious I didn't want time on the Y axis and value on the X axis, because the graph wouldn't make any sense, but the LLM happily complied.
nilirl•39m ago
Is the main benefit that we can do all of this in natural language?
Kapura•1h ago
nilirl•28m ago
I think the appeal is code that handles changes in the world without having to change itself.
spacecadet•1h ago
I have several agent side projects going; the most complex and open-ended is an agent that performs periodic network traffic analysis. I use an orchestration library with a "group chat" style of orchestration, and I declare several agents that have instructions and access to tools.
These range from termshark scripts for collecting packets to analysis functions I already had from doing the traffic analysis myself.
I can then say something like, "Is there any suspicious activity?" and the agents collaboratively choose which agent performs which role, and therefore which tasks (i.e. tools), and work together to collect data, analyze it, and return a response.
I also run this on a schedule where the agents know about the schedule and choose to send me an email summary at specific times.
I have noticed that the models/agents are very good at picking the "correct" network interface without much input, and that they understand their roles and objectives and execute accordingly, again without much direction from me.
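Roughly the shape of that setup; the routing sketch below is illustrative, not the actual orchestration library or code from the project (`call_llm` is again a hypothetical model wrapper):

```python
# Illustrative "group chat" orchestration: a router prompt picks which
# declared agent acts next; each agent has instructions and tool names.
AGENTS = {
    "capture": {"instructions": "Pick the right interface and collect packets.",
                "tools": ["termshark_capture"]},
    "analyst": {"instructions": "Analyze traffic, flag suspicious activity.",
                "tools": ["analyze_traffic", "email_summary"]},
}

def group_chat(call_llm, question, max_turns=6):
    transcript = [f"user: {question}"]
    for _ in range(max_turns):
        pick = call_llm("Agents: " + ", ".join(AGENTS) + "\n"
                        + "\n".join(transcript)
                        + "\nWhich agent should act next? One name, or DONE.").strip()
        if pick == "DONE" or pick not in AGENTS:
            break
        spec = AGENTS[pick]
        reply = call_llm(spec["instructions"] + "\nTools available: "
                         + ", ".join(spec["tools"]) + "\n" + "\n".join(transcript))
        transcript.append(f"{pick}: {reply}")
    return transcript
```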
Now the big/serious question: is the output even good or useful? Right now, with my toy project, it's OK. Sometimes it's great and sometimes it's not; sometimes they spam my inbox with micro-updates.
I'm bad at sharing projects, but if you are curious: https://github.com/derekburgess/jaws
dist-epoch•1h ago
An agent is like Claude Code, where you say to it "fix this bug", and it chooses a sequence of actions: change code, run tests, run the linter, change code again, do a git commit, ask the user for clarification, change code again.