Building agents without harness engineering

21•rajit•4h ago

Comments

stopachka•58m ago

Interesting idea! Question:

> It is highly unlikely that an AI agent startup becomes wealthy by creating the best harness for a particular use case.

If it's not the harness, what do you think is the thing that will differentiate AI agent startups? Is it mainly data, or something else?

rajit•34m ago

The most valuable pieces of information an AI agent startup can gather are access to their customers' proprietary data and knowledge of their customers' preferences (memory + self-learning).

Even as the cost of writing code goes to zero, those two pieces of information are non-commodities.

adamtaylor_13•48m ago

I thought the entire industry is moving toward harness engineering? I read this twice and didn't fully understand what it was telling me.

rajit•37m ago

Thanks for the feedback. The main idea is that today, to build a best-in-class agent, developers build the agent loop, session management, tools, memory, skills, automations (cron + trigger-based), sandboxed deployment, and self-learning.

By providing Hermes with a system prompt, custom tools, and skills, developers get the agent loop, session management, automations, sandboxed deployment, and self-learning for free.
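To make that split concrete, here is a minimal sketch of what the developer-supplied part might look like. This is not Hermes's actual API: `Tool`, `AgentDefinition`, `lookup_person`, and the field names are all hypothetical, chosen only to mirror the three inputs named above (system prompt, custom tools, skills).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    """Hypothetical custom tool: a name, a description, and a handler."""
    name: str
    description: str
    handler: Callable[[str], str]


@dataclass
class AgentDefinition:
    """Hypothetical container for the three things the developer supplies."""
    system_prompt: str
    tools: list[Tool] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


# Per the comment above, everything else comes from the reused agent.
PROVIDED_BY_HARNESS = (
    "agent loop",
    "session management",
    "automations",
    "sandboxed deployment",
    "self-learning",
)


def lookup_person(query: str) -> str:
    # Placeholder handler; a real tool would call a search backend.
    return f"no results for {query!r}"


people_search = AgentDefinition(
    system_prompt="You help users find public professional profiles.",
    tools=[Tool("lookup_person", "Search for a person by name.", lookup_person)],
    skills=["summarize-profile"],
)
```

Swapping the prompt, tools, and skills is what would turn this into a different product; nothing in the definition touches the loop or the deployment.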

usernametaken29•11m ago

But effectively they’re deferring harness engineering onto another developer?? I don’t understand how this is different than any other library, ever

jadar•31m ago

If you re-use the Hermes agent, what are the cost and security implications? One Docker container per-customer sounds like it would be really expensive. Are they started on-demand, or run 24/7? What keeps users from using the agents for general purpose tasks, protects against prompt-injection, etc?

rajit•19m ago

> what are the cost and security implications?

Cost is the token usage and container uptime.

> One Docker container per-customer sounds like it would be really expensive.

The advantage is per-user memory and self-learning. For context, Claude Managed Agents uses one sandbox per session: https://platform.claude.com/docs/en/managed-agents/environme....

> Are they started on-demand, or run 24/7?

24/7 (best for customer-facing chat products).
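A rough way to reason about the two cost components mentioned above (token usage and container uptime) is sketched below. Every number is a made-up placeholder, not real pricing from any provider, and `monthly_cost` is a hypothetical helper; the point is only that an always-on container pays for idle hours while token spend stays the same.

```python
HOURS_PER_MONTH = 24 * 30  # simplifying assumption: a 30-day month


def monthly_cost(container_hours: float,
                 container_rate_per_hour: float,
                 tokens: int,
                 price_per_million_tokens: float) -> float:
    """Container uptime cost plus token cost for one customer, per month."""
    uptime_cost = container_hours * container_rate_per_hour
    token_cost = tokens / 1_000_000 * price_per_million_tokens
    return uptime_cost + token_cost


# Placeholder inputs: same token usage, different uptime.
always_on = monthly_cost(HOURS_PER_MONTH, 0.01, 2_000_000, 3.0)
on_demand = monthly_cost(20, 0.01, 2_000_000, 3.0)

print(f"always-on: {always_on:.2f}  on-demand: {on_demand:.2f}")
```

Plugging in real container rates and observed token volumes would show whether the idle hours matter for a given product.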

> What keeps users from using the agents for general purpose tasks, protects against prompt-injection, etc?

Users define their agent with a system prompt, tool definitions, and skills (which separate a media generation agent from a people search agent). We use OpenRouter, which has a prompt injection detection feature: https://openrouter.ai/docs/guides/features/guardrails/prompt....

HPMOR•23m ago

I'm curious who the ideal customer of this should be. If we're a startup with our own harness, are we a good fit? What would qualify us or disqualify us from being a good user?

rajit•13m ago

Developers with customer-facing chat products are the ideal customer.

If a startup has a specific flow they want the agent to take and their traffic is bursty, then I'd recommend using a framework like Mastra and deploying onto a sandbox.

For long-running, always-on agents where it's important to learn the users' preferences over time, our approach is the highest ROI.

ayxliu•12m ago

I think startups are a great fit. Getting a really good agent out of the box lets you scale and give your customers value fast. All you need to think about is the business logic: system prompts, tools to give the agent, skills, etc. You won't need to spend time on building the infra layer, orchestration loops, memory, implementing automations, etc.

sidhusmart•21m ago

But isn’t that the same as using the Claude Agent SDK, minus maybe the memory features? What I mean to say is that you could pick the latest one and switch when another better one rolls out?

We’re using the Claude Agent SDK right now to roll out an internal agent factory. We haven’t hit the memory issue yet, but I do use Hermes as a personal agent and can see where it fits you.
