
Show HN: Managed MCP Sandbox Environments for RL Training on Tool Use

3•wirehack•1mo ago

Hi HN! We are Klavis AI (https://www.klavis.ai/) and we are launching a managed MCP Sandbox-as-a-Service for RL training on tool use.

If you want a model to learn tool use through RL, you need realistic environments where the model can take actions and you can observe the resulting state and compute a reward. For SaaS tools, this means managing dozens of test accounts, handling OAuth and token refresh, seeding realistic data for each episode, resetting state between runs, and ensuring isolation when you're running concurrent training sessions. In our experience, research teams can spend months building this plumbing for each integration.

Klavis is a managed sandbox service that handles all of that. You call our API to get an isolated sandbox backed by a real service instance (not a mock), initialize it with whatever data state you need, let your model interact via MCP, then dump the final state to compute your reward. One more API call resets everything for the next episode.
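To make the episode loop concrete, here is a minimal Python sketch. It uses an in-memory stand-in, not the Klavis client: `FakeSandbox`, `run_episode`, and the `create_event` tool are hypothetical names chosen for illustration, and the real API calls and payloads may look different.

```python
import copy
from typing import Any, Callable, Iterable

Action = tuple[str, dict[str, Any]]


class FakeSandbox:
    """In-memory stand-in for a managed sandbox (hypothetical, not the real client)."""

    def __init__(self) -> None:
        self._state: dict[str, Any] = {}

    def initialize(self, seed: dict[str, Any]) -> None:
        # Seed the starting state for this episode.
        self._state = copy.deepcopy(seed)

    def call_tool(self, name: str, args: dict[str, Any]) -> None:
        # A single made-up tool; a real sandbox would expose MCP tools instead.
        if name != "create_event":
            raise ValueError(f"unknown tool: {name}")
        self._state.setdefault("events", []).append(dict(args))

    def dump(self) -> dict[str, Any]:
        # Return a snapshot so callers cannot mutate sandbox state.
        return copy.deepcopy(self._state)

    def reset(self) -> None:
        self._state = {}


def run_episode(
    sandbox: FakeSandbox,
    seed: dict[str, Any],
    policy: Callable[[dict[str, Any]], Iterable[Action]],
    reward_fn: Callable[[dict[str, Any]], float],
) -> float:
    """Initialize, let the policy act, score the final state, then reset."""
    sandbox.initialize(seed)
    try:
        for name, args in policy(seed):
            sandbox.call_tool(name, args)
        return reward_fn(sandbox.dump())
    finally:
        # Reset even if a tool call fails, so the next episode starts clean.
        sandbox.reset()
```

The `try`/`finally` is the one design choice worth noting: resetting unconditionally keeps a failed episode from leaking state into the next one.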

The key point is that these are real services, not static mocks. When your model creates a calendar event or updates a Salesforce record, that action actually executes against real infrastructure, and the state changes are real. This matters because you want training to reflect production behavior as closely as possible.

We currently support 50+ integrations across productivity tools (Google Calendar, Outlook, Slack), CRM (Salesforce, HubSpot), dev tools (GitHub, Jira, Linear), databases (Postgres, Snowflake), and others. We handle the account pooling, auth management, and lifecycle orchestration so researchers can focus on the actual training.

Technically, the workflow is: create a sandbox, call the initialize API with a JSON payload defining your starting state, let the model interact via standard MCP tools, call the dump API to get a typed snapshot of the final state, compare that snapshot against your target to calculate the reward, then call reset or delete. We use strict Pydantic schemas for all inputs and outputs, so malformed data gets rejected immediately rather than causing silent failures mid-training.
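As an illustration of the last two ideas (strict validation and comparing a dumped state against a target), here is a small standard-library sketch. It is not our schema code: the `Event` type, its fields, and the `reward` function are assumptions made up for this example, and a dataclass with manual checks stands in for a Pydantic model.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    """Hypothetical typed record for one calendar event in a state dump."""

    title: str
    start: str  # kept as text here; a real schema would likely parse a datetime

    def __post_init__(self) -> None:
        # Reject wrong types and empty values up front instead of failing later.
        for field_name in ("title", "start"):
            value = getattr(self, field_name)
            if not isinstance(value, str) or not value:
                raise TypeError(f"{field_name} must be a non-empty string")


def parse_events(raw: list[dict[str, Any]]) -> set[Event]:
    # Event(**item) also raises TypeError on missing or unexpected keys.
    return {Event(**item) for item in raw}


def reward(dumped: list[dict[str, Any]], target: list[dict[str, Any]]) -> float:
    """Fraction of target events found in the dumped state."""
    want = parse_events(target)
    got = parse_events(dumped)
    if not want:
        # With an empty target, only an empty final state scores.
        return 1.0 if not got else 0.0
    return len(want & got) / len(want)
```

This reward only measures how much of the target was reached; it does not penalize extra events, which you would probably want to add for real training.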

Here is a quick demo: https://youtu.be/10C18rpCYcA.

We look forward to your comments. Thanks for reading!
