Launch HN: Terminal Use (YC W26) – Vercel for filesystem-based agents

42•filipbalucha•2h ago

Hello Hacker News! We're Filip, Stavros, and Vivek from Terminal Use (https://www.terminaluse.com/). We built Terminal Use to make it easier to deploy agents that work in a sandboxed environment and need filesystems to do work. This includes coding agents, research agents, document processing agents, and internal tools that read and write files.

Here's a demo: https://www.youtube.com/watch?v=ttMl96l9xPA.

Our biggest pain point with hosting agents was that you'd need to stitch together multiple pieces: packaging your agent, running it in a sandbox, streaming messages back to users, persisting state across turns, and managing getting files to and from the agent workspace.

We wanted something like Cog from Replicate, but for agents: a simple way to package agent code from a repo and serve it behind a clean API/SDK. We wanted to provide a protocol for communicating with your agent without constraining the agent logic or harness itself.

On Terminal Use, you package your agent from a repo with a config.yaml and Dockerfile, then deploy it with our CLI. You define the logic for three endpoints (on_create, on_event, and on_cancel), which track the lifecycle of a task (a conversation). The config.yaml contains details about resources, build context, etc.
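To make the three-hook lifecycle concrete, here is a minimal sketch in plain Python. It is not the Terminal Use SDK: the Task and EchoAgent classes, the dispatch helper, and the payload fields are all hypothetical, chosen only to show how create, event, and cancel messages might map onto a task's state.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: NOT the Terminal Use SDK, just an
# illustration of three lifecycle hooks driven by a tiny dispatcher.


@dataclass
class Task:
    task_id: str
    status: str = "created"
    history: list = field(default_factory=list)


class EchoAgent:
    """Toy agent: each hook receives the task (conversation) it belongs to."""

    def on_create(self, task: Task, params: dict) -> None:
        # Runs once when the task starts.
        task.history.append(("system", params.get("instructions", "")))

    def on_event(self, task: Task, event: dict) -> str:
        # Runs for every incoming message; returns the agent's reply.
        task.history.append(("user", event["text"]))
        reply = f"echo: {event['text']}"
        task.history.append(("assistant", reply))
        return reply

    def on_cancel(self, task: Task) -> None:
        # Runs when the user stops the task; a real agent would interrupt work here.
        task.status = "cancelled"


def dispatch(agent, task: Task, kind: str, payload=None):
    """Route a lifecycle message to the matching hook; cancelled tasks accept nothing."""
    if task.status == "cancelled":
        raise RuntimeError(f"task {task.task_id} is cancelled")
    if kind == "create":
        agent.on_create(task, payload or {})
        task.status = "running"
        return None
    if kind == "event":
        return agent.on_event(task, payload)
    if kind == "cancel":
        agent.on_cancel(task)
        return None
    raise ValueError(f"unknown lifecycle message: {kind}")
```

In this sketch, dispatch stands in for whatever runtime actually delivers lifecycle messages to the agent.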

Out of the box, we support Claude Agent SDK and Codex SDK agents. By support, we mean that we have an adapter that converts from the SDK message types to ours. If you'd like to use your own custom harness, you can convert and send messages with our types (Vercel AI SDK v6 compatible). For the frontend, we have a Vercel AI SDK provider that lets you use your agent with Vercel's AI SDK, plus a messages module so that you don't have to manage streaming and persistence yourself.
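A rough illustration of what such an adapter does, assuming Anthropic-style content blocks on the input side (text and tool_use dicts) and a generic role-plus-parts message on the output side. The output field names are assumptions and are not Terminal Use's or the Vercel AI SDK's exact types.

```python
# Hypothetical adapter: input dicts are loosely modeled on Anthropic-style
# content blocks; the output "role + parts" shape and its field names are
# assumptions, not Terminal Use's or the Vercel AI SDK's exact schema.


def adapt_message(role: str, blocks: list) -> dict:
    """Convert one SDK message (a list of content blocks) into a parts-based message."""
    parts = []
    for block in blocks:
        kind = block.get("type")
        if kind == "text":
            parts.append({"type": "text", "text": block["text"]})
        elif kind == "tool_use":
            parts.append({
                "type": "tool-call",
                "toolCallId": block["id"],
                "toolName": block["name"],
                "input": block.get("input", {}),
            })
        else:
            # Keep unknown blocks visible instead of silently dropping them.
            parts.append({"type": "unknown", "raw": block})
    return {"role": role, "parts": parts}
```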

The part we think is most different is storage.

We treat filesystems as first-class primitives, separate from the lifecycle of a task. That means you can persist a workspace across turns, share it between different agents, or upload and download files whether or not the sandbox is active. Further, our filesystem SDK provides presigned URLs, so your users can upload and download files directly and you don't need to proxy file transfers through your backend.
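Presigned URLs are commonly built by signing the method, path, and an expiry time with a server-side secret, so storage can verify a request without the backend relaying the bytes. The sketch below shows that general HMAC technique; it is not S3's SigV4 or Terminal Use's actual scheme, and SECRET, the host, and the query parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Generic HMAC-signed URL sketch; NOT S3 SigV4 and NOT Terminal Use's scheme.
# SECRET, the host, and the query parameter names are all hypothetical.
SECRET = b"server-side-secret"


def _signature(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, ttl_seconds: int = 900, now=None) -> str:
    """Return a URL the client can use directly, without proxying through the backend."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(method, path, expires)})
    return f"https://files.example.com{path}?{query}"


def verify(method: str, url: str, now=None) -> bool:
    """Storage-side check: the signature must match and the link must not be expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    expected = _signature(method, parsed.path, expires)
    current = now if now is not None else time.time()
    return hmac.compare_digest(expected, params["sig"][0]) and current < expires
```

Because the method is part of the signed message, a URL minted for an upload (PUT) cannot be reused for a download (GET).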

Because your agent logic and filesystem storage are decoupled, it's easy to iterate on your agents without worrying about the files in the sandbox: if you ship a bug, you can deploy a fix and auto-migrate all your tasks to the new deployment. If you make a breaking change, you can specify that existing tasks stay on the existing version and only new tasks use the new version.
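One way to model the two upgrade paths described above, as a hypothetical sketch: a non-breaking release moves tasks on the version being fixed to the new version, while a breaking release pins existing tasks and only routes new tasks to it. TaskRecord and release are illustrative names, not platform APIs.

```python
from dataclasses import dataclass

# Hypothetical model of "auto-migrate on a bugfix, pin on a breaking change".
# Names and fields are illustrative assumptions, not a platform API.


@dataclass
class TaskRecord:
    task_id: str
    version: str  # deployment this task currently runs on


def release(tasks: list, current_version: str, new_version: str, breaking: bool) -> tuple:
    """Apply a release; return (task_id -> version routing, version for new tasks).

    Non-breaking: tasks on current_version are auto-migrated to new_version,
    which is safe here because workspace files live outside the sandbox.
    Breaking: existing tasks stay pinned; only new tasks use new_version.
    """
    if not breaking:
        for task in tasks:
            if task.version == current_version:
                task.version = new_version
    routing = {task.task_id: task.version for task in tasks}
    return routing, new_version
```

Scoping auto-migration to current_version keeps tasks that were deliberately pinned to an older release from being pulled across a breaking change.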

We're also adding support for multi-filesystem mounts with configurable mount paths and read/write modes, so storage stays durable and reusable while mount layout stays task-specific.
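Since multi-filesystem mounts are still being added, the following is only a guess at the shape of the problem: a hypothetical Mount record with a path and an ro/rw mode, a validator, and a write check. This sketch forbids nested mount paths so each file resolves to exactly one mount; a real implementation might allow nesting with longest-prefix matching.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical mount spec: the filesystem IDs, field names, and "ro"/"rw"
# modes are illustrative assumptions, not Terminal Use's config format.


@dataclass(frozen=True)
class Mount:
    filesystem_id: str
    path: str
    mode: str = "rw"  # "ro" or "rw"


def validate_mounts(mounts: list) -> None:
    """Reject bad modes, relative paths, and mount paths that overlap each other."""
    seen = []
    for m in mounts:
        if m.mode not in ("ro", "rw"):
            raise ValueError(f"{m.path}: mode must be 'ro' or 'rw'")
        p = PurePosixPath(m.path)
        if not p.is_absolute():
            raise ValueError(f"{m.path}: mount path must be absolute")
        for other in seen:
            # Same path, or one nested inside the other, counts as an overlap.
            if p == other or other in p.parents or p in other.parents:
                raise ValueError(f"{m.path} overlaps {other}")
        seen.append(p)


def can_write(mounts: list, file_path: str) -> bool:
    """A write is allowed only if the file sits under an 'rw' mount."""
    target = PurePosixPath(file_path)
    for m in mounts:
        root = PurePosixPath(m.path)
        if target == root or root in target.parents:
            return m.mode == "rw"
    return False
```

The same filesystem could be mounted read-write for one task and read-only for another, which is the split between durable storage and task-specific layout described above.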

On the deployment side, we've been influenced by modern developer platforms: simple CLI deployments, preview/production environments, git-based environment targeting, logs, and rollback. All the configuration you need to build, deploy & manage resources for your agent is stored in the config.yaml file, which makes it easy to build & deploy your agent in CI/CD pipelines.
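Git-based environment targeting usually means the branch you deploy from decides the environment. Below is a hypothetical rule, assuming main is the production branch and every other branch gets its own preview environment; the naming scheme is an assumption, not Terminal Use's behavior. GITHUB_REF_NAME is the branch variable GitHub Actions sets, used here only as one example CI source.

```python
import os

# Hypothetical branch-to-environment rule; the "preview-<slug>" naming is an
# assumption, and real platforms may let you configure the mapping.


def target_environment(branch: str, production_branch: str = "main") -> str:
    """The production branch deploys to production; any other branch gets a preview."""
    if branch == production_branch:
        return "production"
    # Normalise the branch name into a slug usable as an environment name.
    slug = "".join(c if c.isalnum() else "-" for c in branch.lower()).strip("-")
    return f"preview-{slug}"


if __name__ == "__main__":
    # In GitHub Actions, GITHUB_REF_NAME holds the branch name for push events.
    print(target_environment(os.environ.get("GITHUB_REF_NAME", "main")))
```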

Finally, we've explicitly designed our platform to work with your CLI coding agents, so they can help you build, test, & iterate on your agents. With our CLI, your coding agents can send messages to your deployed agents and download filesystem contents to help you understand your agent's output. A common way we test our agents: we write markdown files with user scenarios we'd like to test, then ask Claude Code to impersonate our users and chat with our deployed agent.
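A small helper for the scenario-file workflow, assuming a hypothetical convention of one ## heading per scenario in a markdown file. It only parses the scenarios and builds a role-play prompt; the prompt wording, and how it gets handed to Claude Code or another coding agent, are assumptions.

```python
import re

# Hypothetical convention: one markdown file, one "## " heading per scenario.
# The prompt wording below is illustrative only.


def parse_scenarios(markdown: str) -> dict:
    """Split a markdown document into {scenario title: scenario body}."""
    scenarios = {}
    title = None
    lines = []
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.*\S)\s*$", line)
        if heading:
            if title is not None:
                scenarios[title] = "\n".join(lines).strip()
            title, lines = heading.group(1), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        scenarios[title] = "\n".join(lines).strip()
    return scenarios


def impersonation_prompt(title: str, body: str) -> str:
    """Build a role-play instruction for a coding agent acting as the user."""
    return (
        f"Play the user described in scenario '{title}'. Stay in character, "
        f"chat with the deployed agent, then report whether it met the goal.\n\n{body}"
    )
```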

What we do not have yet: full parity with general-purpose sandbox providers. For example, preview URLs and lower-level sandbox.exec(...) style APIs are still on the roadmap.

We're excited to hear any thoughts, insights, questions, and concerns in the comments below!

Comments

verdverm•2h ago

Can you explain why everyone thinks we should use new tools to deploy agents instead of our existing infra?

eg. I already run Kubernetes

jwoq9118•2h ago

Unrelated but your comments on https://news.ycombinator.com/item?id=44736176 related to the Terminal agents coding craze have helped me feel less crazy. People using GitHub Copilot CLI and Claude Code, they either never review the code or end up opening up an IDE to review the code, and I'm sitting here like, why don't you use the terminal in your favorite IDE? You're using a Terminal as a chat interface, so why not just use a chat interface? Or use the terminal in VS Code which actually now integrates very well with Claude Code and GitHub Copilot CLI so you can see what's going on across the many files this thing is editing?

The hype is so large with the CLI coding tools I got FOMO, but as you were saying in that thread, I see no tangible improvement to the value I get out of AI coding tools by using the CLI alone. I use the CLI in VS Code, and I use the chat panel, and the only thing that seems to actually make a difference is the "context engineering" stuff of custom instructions, agent skills, prompt files, hooks, custom agents, all that stuff, which works no matter which interface you use to kick off your AI coding instructions.

Would be curious to hear your thoughts on the topic all these months later.

verdverm•2h ago

Glad to find camaraderie! I've since started building a CLI interface for my custom agent lol

The reasons are (1) it's faster to do admin work like naming or deleting old sessions (2) I have not gotten the remote setup to work yet (haven't tried) but I do want to use it somewhere

But yeah, it's gotten worse, the latest I recall is a new diff viewer for AI in the terminal (I already have git and lazygit)

instalabsai•2h ago

We have also built something custom ourselves (with modal.com serverless containers), running thousands of on-demand coding agents each day and already the assumptions that Terminal Use is making (about using the file system and coding agent support) would not work for our use case.

verdverm•2h ago

It seems like so many of the AI "solutions" are hallucinating the problems. I either don't have them, because I use better AI frameworks, or I have tools at hand that solve them nicely.

We don't need to rebuild everything just for agents, except that people think they can make money by doing so. YC has disappointed me of late with the lack of diversity in their companies. I suspect the change in leadership is central to this.

goosejuice•2h ago

At least on K8s you can control the network policy. That's the harder problem to solve. I suspect we'll see a lot of exfiltration via prompt injection in the next few years.
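For reference, the egress control mentioned here is expressed in Kubernetes as a NetworkPolicy. The sketch builds one as a Python dict and prints JSON, which kubectl apply -f accepts. The app: agent label, namespace, and CIDR (a documentation range) are placeholders, and the policy is only enforced if the cluster's CNI plugin supports NetworkPolicy.

```python
import json

# Default-deny egress for pods labelled app=agent, except DNS and HTTPS to one
# approved range. Label, namespace, and CIDR are placeholders.


def agent_egress_policy(namespace: str, allowed_cidr: str) -> dict:
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": "agent"}},
            # Listing Egress means any egress not matched by a rule below is denied.
            "policyTypes": ["Egress"],
            "egress": [
                # Allow DNS lookups (to any destination, see note below).
                {"ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]},
                # Allow HTTPS only to one approved range, e.g. a model API gateway.
                {"to": [{"ipBlock": {"cidr": allowed_cidr}}],
                 "ports": [{"protocol": "TCP", "port": 443}]},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_egress_policy("agents", "203.0.113.0/24"), indent=2))
```

Allowing port 53 to any destination still leaves DNS as a possible exfiltration channel; a stricter policy would restrict it to the cluster's resolver.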

alexchantavy•1h ago

I think there are some primitives for agents that need to be built out for better security and being able to reason about them.

Agents run on infra, they have network connectivity, they have ACLs and permissions that let them read+write+execute on resources, they can interact with other agents.

To manage them from both an infra and security perspective, we can use the existing underlying primitives, but it's also useful to build abstractions around them for management, kind of like how microservices encapsulate compute+storage+network together.

I think of agents as basically microservices that can act in non-deterministic ways, and the potential "blast radius" of their actions is very wide. So you need to be able to map what an agent can do, and it's much easier to do that if there are abstractions or automatic groupings instead of doing this all ourselves.

verdverm•1h ago

Right, those abstractions and controls already exist in the Kubernetes ecosystem. I can use one set of abstractions for everything, as opposed to having something separate for agents. They are not that different, the tooling I have covers it. There are also CRDs and operators to extend for a more DSL like experience.

tl;dr, I don't think the shovel analogy holds up for most of the AI submissions and products we see here.

webpolis•33m ago

The blast radius point is right, and I think it points at a design split that's underappreciated.

Most sandboxing approaches — including this one — optimize primarily for isolation from the host: prevent the agent from escaping, limit what it can touch. That solves the runaway agent problem.

But there's a second axis: observable execution for human collaborators. When an agent modifies a codebase or runs a research task, a teammate often needs to watch it happen in real time, intervene before it commits a wrong turn, or audit what actually ran. Async logs and artifact outputs don't cover this well.

We've been building Cyqle - https://cyqle.in -- (disclosure: I work on it) from that angle — cloud desktop sessions where agent runs are shared live with whoever needs visibility. Isolation is at the VM/session level rather than syscall granularity. Different tradeoff: you give up process-level permission mapping, you gain real-time collaborative access to the running environment — watch, intervene, hand off.

The use cases probably don't overlap much with Terminal Use (async batch filesystem agents clearly want deep process isolation + lifecycle APIs). But this thread made me think "agent environments" is actually several distinct problem spaces: async autonomous execution, interactive human-supervised sessions, team-observable debugging runs. The right primitives look very different depending on which you're solving.

debarshri•1h ago

I think Kubernetes is a good candidate to run these sandboxes. It is just that you have to do a lot of annotations, node group management, pod security policies, etc., to name a few. Apply the principle of least privilege for access to mitigate risk.

I think Kata containers with Kubernetes is an even better sandboxing option for these agents to run remotely.

Shameless plug here, but we at Adaptive [1] do something similar.

[1] https://adaptive.live
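A sketch of the least-privilege settings described in this comment, combined with Kata, as a Pod manifest built in Python. It assumes a RuntimeClass named kata exists in the cluster (the actual name depends on how Kata Containers was installed); the image, user ID, and resource limits are placeholders.

```python
import json

# Least-privilege pod for an agent sandbox. Assumes a RuntimeClass named
# "kata" is installed; image, UID, and limits are placeholders.


def sandbox_pod(name: str, image: str, runtime_class: str = "kata") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "agent"}},
        "spec": {
            "runtimeClassName": runtime_class,      # VM-backed isolation via Kata
            "automountServiceAccountToken": False,  # no Kubernetes API token inside
            "securityContext": {"runAsNonRoot": True, "runAsUser": 10001},
            "containers": [{
                "name": "agent",
                "image": image,
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
                "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(sandbox_pod("agent-t1", "registry.example.com/agent:1.0"), indent=2))
```

With a read-only root filesystem, the agent needs an explicitly mounted volume for any files it writes.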

verdverm•1h ago

We already do those things with k8s, so it's not an issue

The permissions issues you mention are handled by SA/WIF and the ADK framework.

Same question to OP, why do you think I need a special tool for this?

hrmtst93837•22m ago

I think people pick new tooling not because k8s lacks horsepower, but because running per-user, filesystem-backed agents on k8s forces you to build and maintain a surprising amount of glue code. Newer platforms bundle versioned mounts, local-first dev cycles, secure ephemeral runtimes, and opinionated deployment, so teams can focus on agent logic instead of writing Helm charts and doing CSI gymnastics.

If you repurpose k8s with ephemeral volumes or emptyDir plus a sidecar, you'll likely get predictable ops and avoid vendor lock-in. But expect more operator work, fragile debugging across PVCs and sidecars, and the need to invest in local emulation or a Firecracker or gVisor sandbox if you want anything like laptop parity.
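A minimal sketch of the emptyDir-plus-sidecar layout, with the agent and a hypothetical sync sidecar sharing one scratch volume. Image names, the /workspace path, and the size limit are placeholders; the sidecar mounts the volume read-only on the assumption that it only uploads results to durable storage.

```python
import json

# Agent + sync sidecar sharing an emptyDir scratch volume. Image names, the
# /workspace path, and the size limit are placeholders.


def agent_pod_with_sidecar(task_id: str) -> dict:
    workspace = {"name": "workspace", "mountPath": "/workspace"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"agent-{task_id}", "labels": {"app": "agent", "task": task_id}},
        "spec": {
            "restartPolicy": "Never",
            # emptyDir lives exactly as long as the pod: fast, but not durable.
            "volumes": [{"name": "workspace", "emptyDir": {"sizeLimit": "5Gi"}}],
            "containers": [
                {"name": "agent", "image": "registry.example.com/agent:1.0",
                 "volumeMounts": [workspace]},
                # dict(...) copies the mount so only the sidecar's copy is read-only.
                {"name": "sync", "image": "registry.example.com/workspace-sync:1.0",
                 "volumeMounts": [dict(workspace, readOnly=True)]},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_pod_with_sidecar("t1"), indent=2))
```

Everything the sidecar has not copied out is lost when the pod is deleted, which is the durability gap that dedicated workspace storage is meant to close.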

thesiti92•2h ago

have you guys found any of the existing nfs tools helpful (archil, daytona volumes, ...) or did you have to roll your own? i guess i have the same question for checkpointing/retrying too. it feels like the market of tools is very up in the air right now.

verdverm•2h ago

I'm using Dagger to checkpoint and all the fun stuff that can come after

huntaub•1h ago

howdy! two things on the archil front:

1. we're not NFS, we wrote our own protocol to get much better performance

2. we're planning on coming out with native branching this month, which should make these kinds of workloads much easier to build!

stavrosfil•5m ago

Yep, this whole area still feels pretty unsettled. The thing we've become convinced of is that workspace state needs to be a first-class product primitive instead of something tied to one sandbox. That's why we model filesystems separately from tasks and focus on durable mount/sync semantics.

We're currently rolling our own but we've been meaning to experiment with other tools.
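To make "durable mount/sync semantics" concrete, one common approach is to diff content hashes between the sandbox and the stored copy and transfer only what changed. This is a hypothetical sketch of that technique, not Terminal Use's sync protocol; manifest and plan_sync are illustrative names.

```python
import hashlib
from pathlib import Path

# Hypothetical sync step: compare content hashes of the sandbox's working copy
# against the last manifest stored with the durable filesystem, and transfer
# only what changed. Not Terminal Use's actual sync protocol.


def manifest(root: Path) -> dict:
    """Map each file's relative POSIX path to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_sync(local: dict, remote: dict) -> dict:
    """Decide which paths to upload and which to delete so remote matches local."""
    return {
        "upload": sorted(k for k, h in local.items() if remote.get(k) != h),
        "delete": sorted(k for k in remote if k not in local),
    }
```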

CharlesW•2h ago

> We built Terminal Use to make it easier to deploy agents that work in a sandboxed environment and need filesystems to do work.

When I read this, I think of Fly.io's sprites.dev. Is that reasonable, or do you consider this product to be in a different space? If the latter, can you ELI5?

filipbalucha•19m ago

We overlap at the sandbox layer, but we're focused more on the layer above that: packaging agent code + deploying/versioning it, managing tasks over time, handling message persistence, and attaching durable workspaces to those tasks.

adi4213•1h ago

This is really interesting, congrats on the launch. The use case I’m trying to solve for is building a coding agent platform that reliably sets up our development stack. A few questions! In my case, I’m trying to build a one-shot coding agent platform that nicely spins up a docker-in-docker Supabase environment, runs a Next.js app, and durably listens to CI and iterates.

1) Can I use this with my ChatGPT Pro or Claude Max subscription? 2)

oliver236•1h ago

is this a replacement for LangGraph?

rodchalski•34m ago

The K8s-vs-agent-infra debate here is interesting. K8s gives you process and network isolation. What it doesn't give you: per-task authorization scope.

An agent container has a credential surface defined at deploy time. That surface doesn't change between task 1 ("read this repo") and task 2 ("process this user upload"). If the agent is prompt-injected during task 1, it carries the same permissions into task 2.

The missing primitives aren't infra — they're policy: what is this agent authorized to do with the data it can reach, on a per-task basis? Can it write, or only read? Can it exfil to an external URL, or only to /output? And crucially: is there an append-only record of what it actually did, so you can audit post-incident?

K8s handles the container boundary. The authorization layer above that — task-scoped grants, observable action ledger, revocation mid-task — isn't solved by existing infra abstractions. That gap is real regardless of whether you use K8s, Modal, or something like this.
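Here is a hypothetical sketch of the per-task primitives listed above. It gives each task its own grant set, checks every action against it, and supports mid-task revocation. It also keeps an append-only ledger in which each entry hashes the previous one, so editing an earlier record invalidates the chain. TaskGrant and verify_ledger are illustrative names. A hash chain only detects tampering; it does not prevent it.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical per-task authorization with a hash-chained action ledger.
# Names are illustrative; this is not an existing library's API.


@dataclass
class TaskGrant:
    task_id: str
    allowed: set  # e.g. {("read", "/repo"), ("write", "/output")}
    revoked: bool = False
    ledger: list = field(default_factory=list)

    def _append(self, action: str, target: str, decision: str) -> None:
        prev = self.ledger[-1]["hash"] if self.ledger else "0" * 64
        entry = {"task": self.task_id, "action": action, "target": target,
                 "decision": decision, "prev": prev}
        entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.ledger.append(entry)

    def check(self, action: str, target: str) -> bool:
        """Allow only granted (action, path prefix) pairs; log every decision."""
        ok = not self.revoked and any(
            action == a and (target == root or target.startswith(root.rstrip("/") + "/"))
            for a, root in self.allowed
        )
        self._append(action, target, "allow" if ok else "deny")
        return ok

    def revoke(self) -> None:
        # Mid-task revocation: every later check is denied (and still logged).
        self.revoked = True
        self._append("revoke", "*", "allow")


def verify_ledger(ledger: list) -> bool:
    """Recompute each hash and confirm every entry points at its predecessor."""
    prev = "0" * 64
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because grants live on the task rather than the container, a second task run by the same agent image starts with its own, possibly narrower, set.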

messh•14m ago

how does it compare to https://shellbox.dev? (and others like exe.dev, sprites.dev, and blaxel.ai)
