Hey HN — Akshay & Ashwin here, co-founders of Spine AI (YC S23).
We've been rethinking how AI agents work together. Instead of a single model in a chat loop or agents reading/writing to a file system, we built a visual canvas where multiple agents collaborate across connected blocks — and it turns out this architecture significantly outperforms both single and multi-agent systems on hard tasks.
The approach has three parts:
1. Canvas-based workspace — Agents operate on an infinite canvas of intelligent blocks (web browsing, prompts, tables, memos) that connect and pass context to each other. Instead of a flat file system, agents get a structured, non-linear environment that mirrors how complex problems actually decompose.
2. Tiered multi-agent orchestration — An orchestrating agent decomposes tasks, delegates to specialized persona agents (researcher, analyst, reviewer), and manages dependencies. Agents validate each other's work before passing it downstream, catching errors before they compound across long chains.
3. Dynamic multi-model ensembling — Rather than one model for everything, we select from 300+ models per subtask. When confidence is low, we pull in additional models and treat disagreement as a signal for deeper scrutiny — like classical ML ensembling, but at the agent level.
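To make (1) concrete, the canvas can be thought of as a small DAG where each block runs only after the blocks feeding it, and their outputs become its context. This is just an illustrative sketch (block types and wiring are ours, not Spine's actual API):

```python
# Toy canvas: blocks form a DAG and pass context along edges.
# Block names and the run/connect interface are illustrative only.

class Block:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
        self.inputs = []

    def connect(self, upstream):
        """Wire an upstream block's output into this block's context."""
        self.inputs.append(upstream)
        return self

    def run(self, cache=None):
        """Run upstream blocks first, then this block, memoizing results."""
        cache = {} if cache is None else cache
        if self.name not in cache:
            context = [b.run(cache) for b in self.inputs]
            cache[self.name] = self.fn(context)
        return cache[self.name]

# A tiny canvas: a web block and a table block feed a memo block.
web = Block("web", lambda ctx: "search results")
table = Block("table", lambda ctx: "structured data")
memo = Block("memo", lambda ctx: " + ".join(ctx)).connect(web).connect(table)
```

Here `memo.run()` pulls context from both upstream blocks, which is the non-linear decomposition the flat-file approach lacks.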
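The validate-before-handoff idea in (2) looks roughly like this in miniature (the persona functions and the validation rule are made up for illustration):

```python
# Sketch of tiered orchestration: an orchestrator chains persona agents,
# and a reviewer gate validates each output before it moves downstream.
# Personas and the validation check are illustrative, not Spine's API.

def researcher(task: str) -> str:
    return f"findings for: {task}"

def analyst(findings: str) -> str:
    return f"analysis of ({findings})"

def reviewer(output: str) -> str:
    # Validation gate: reject empty or suspiciously short results
    # so an upstream failure can't compound across the chain.
    if not output or len(output) < 10:
        raise ValueError(f"review failed: {output!r}")
    return output

def orchestrate(task: str) -> str:
    findings = reviewer(researcher(task))  # validated before handoff
    return reviewer(analyst(findings))
```

The point is that every edge in the chain has a checkpoint, so an error surfaces where it happens instead of three steps later.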
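And the disagreement-as-signal loop in (3) reduces to something like majority voting with an escalation step. The model stubs and threshold below are hypothetical placeholders for real model calls:

```python
from collections import Counter

# Stand-ins for calls to different frontier models (hypothetical).
def model_a(task): return "paris"
def model_b(task): return "paris"
def model_c(task): return "lyon"

def ensemble_answer(task, models, threshold=0.7, escalation_pool=()):
    """Majority-vote ensembling: agreement below `threshold` is treated
    as a signal to pull in additional models and re-vote."""
    votes = Counter(m(task) for m in models)
    answer, count = votes.most_common(1)[0]
    confidence = count / sum(votes.values())
    if confidence < threshold and escalation_pool:
        # Low agreement triggers deeper scrutiny: widen the ensemble
        # and accept the widened vote (threshold=0.0 stops recursion).
        return ensemble_answer(task, list(models) + list(escalation_pool),
                               threshold=0.0)
    return answer, confidence

answer, conf = ensemble_answer("capital of France?",
                               [model_a, model_b, model_c],
                               escalation_pool=[model_a])
```

With 2/3 initial agreement (below the 0.7 threshold), the pool is widened and the final vote settles the answer, which is the agent-level analogue of classical ensembling.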
The results: 61.5% on GAIA Level 3 (vs Manus 57.7%, OpenAI Deep Research 47.6%) and 87.6% on DeepSearchQA (vs Perplexity 79.5%, Gemini Deep Research 66.1%). Same frontier models available to everyone — the difference is architecture.
Because everything runs on the canvas, we could audit our agents' work step by step. That's how we caught what appear to be mislabeled questions in the GAIA dataset itself — we link to sample canvases in the post so you can see the reasoning traces.
Spine Swarms is open to try at www.getspine.ai. Happy to go deep on any part of the architecture.