But the idea of spinning up a whole VM just to use Unix IO primitives is overkill. It makes more sense to let the agent spit out unix-like tool calls and then use whatever your prod stack already uses to do the IO.
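A minimal sketch of that idea: the agent emits unix-like tool calls as structured data, and a thin dispatcher maps them onto the prod stack's storage. The `ToolCall` shape and the in-memory `Map` standing in for a real doc store are assumptions for illustration, not anyone's actual implementation.

```typescript
// Hypothetical: the agent emits a unix-like tool call, and we route it
// to whatever backend prod already has (here a Map standing in for it).
type ToolCall = { tool: "cat" | "ls" | "grep"; args: string[] };

const docs = new Map<string, string>([
  ["docs/intro.md", "ChromaFs overview\nagents read docs"],
  ["docs/api.md", "IFileSystem interface"],
]);

function dispatch(call: ToolCall): string {
  switch (call.tool) {
    case "cat":
      // read one document by path
      return docs.get(call.args[0]) ?? "";
    case "ls":
      // list paths under the given prefix
      return [...docs.keys()]
        .filter((p) => p.startsWith(call.args[0]))
        .join("\n");
    case "grep": {
      // naive grep: substring pattern, then path
      const [pattern, path] = call.args;
      return (docs.get(path) ?? "")
        .split("\n")
        .filter((line) => line.includes(pattern))
        .join("\n");
    }
  }
}
```

The point is that the "filesystem" the agent sees never has to exist on disk at all.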
$70k?
how about if we round off one zero? Give us $7000.
That number still seems to be very high.
Since it's dedicated, there are no limits on session lifetime, and it would run 16 of those sessions no problem, so the real price should be around ~$70/year for that load.
Putting Chroma behind a FUSE adapter was my initial thought when I was implementing this but it was way too slow.
I think we would also need to optimize grep even if we had a FUSE mount.
This was easier in our case because we didn't need 100% POSIX compatibility for our read-only docs use case; the agent only used a subset of bash commands to traverse the docs anyway. This also avoids any extra infra overhead or maintenance of EC2 nodes/sandboxes that the agent would have to use.
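The "subset of bash commands" point can be made concrete with a small allowlist check before execution. The command set and helper below are hypothetical, not the author's code:

```typescript
// Hypothetical allowlist: the read-only docs use case only needs a
// handful of commands, so reject anything outside that set.
const ALLOWED = new Set(["grep", "cat", "ls", "find", "cd"]);

function isAllowed(commandLine: string): boolean {
  // Check the first token of every pipeline stage.
  return commandLine
    .split("|")
    .every((stage) => ALLOWED.has(stage.trim().split(/\s+/)[0]));
}
```

Anything outside the allowlist (writes, deletes, network calls) simply never reaches a real shell, which is where most of the sandboxing burden disappears.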
https://huggingface.co/docs/smolagents/en/examples/rag
> Agentic RAG: A More Powerful Approach
> We can overcome these limitations by implementing an Agentic RAG system - essentially an agent equipped with retrieval capabilities. This approach transforms RAG from a rigid pipeline into an interactive, reasoning-driven process.
The innovation of the blogpost is in the retrieval step.
We were bitten by our own nomenclature.
Just a small variation in chosen acronym may have wrought a different outcome.
Different ways to find context are welcome, we have a long way to go!
Not to be "that guy" [0], but (especially for users who aren't already in ChromaDB) -- how would this be different for us from using a RAM disk?
> "ChromaFs is built on just-bash ... a TypeScript reimplementation of bash that supports grep, cat, ls, find, and cd. just-bash exposes a pluggable IFileSystem interface, so it handles all the parsing, piping, and flag logic while ChromaFs translates every underlying filesystem call into a Chroma query."
It sounds like the expected use-case is that agents interact with the data via standard CLI tools (grep, cat, ls, find, etc.), and there is nothing Chroma-specific in the final implementation (do I have that right?).
The author compares the speed of the Chroma implementation against a physical HDD, but I wonder how the benchmark would compare against a ramdisk with the same information / queries?
I'm very willing to believe that Chroma would still be faster / better for X/Y/Z reason, but I would be interested in seeing it compared, since for many people who already have their data in a hierarchical tree view, I bet there could be some massive speedups just from mounting the directories in RAM instead of on an HDD.
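For readers curious what the pluggable backend mentioned in the quoted description might look like, here is a minimal sketch of an `IFileSystem`-style interface with an in-memory implementation (which is essentially what a ramdisk-backed variant would be). The method names and the `InMemoryFs` class are assumptions for illustration, not just-bash's actual API:

```typescript
// Hypothetical pluggable-backend sketch: the shell tooling talks to an
// interface, so the same commands could sit on Chroma, a ramdisk, or a Map.
interface IFileSystem {
  readFile(path: string): string;
  readdir(dir: string): string[];
}

class InMemoryFs implements IFileSystem {
  constructor(private files: Map<string, string>) {}

  readFile(path: string): string {
    const body = this.files.get(path);
    if (body === undefined) throw new Error(`ENOENT: ${path}`);
    return body;
  }

  readdir(dir: string): string[] {
    // Return the immediate children of dir, derived from flat paths.
    const prefix = dir.endsWith("/") ? dir : dir + "/";
    const children = new Set<string>();
    for (const p of this.files.keys()) {
      if (p.startsWith(prefix)) children.add(p.slice(prefix.length).split("/")[0]);
    }
    return [...children];
  }
}
```

Swapping `InMemoryFs` for a Chroma-backed implementation is then a change to one class, not to the shell tooling above it.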
softwaredoug•1h ago
We’re rediscovering forms of search we’ve known about for decades. And it turns out they’re more interpretable to agents.
https://softwaredoug.com/blog/2026/01/08/semantic-search-wit...
softwaredoug•51m ago
We started with LLMs when everyone in search was building question answering systems. Those architectures look like the vector DB + chunking we associate with RAG.
Agents’ ability to call tools, using any retrieval backend, calls that into question.
We really shouldn’t start RAG with the assumption we need that. I’ll be speaking about the subject in a few weeks.
https://maven.com/p/7105dc/rag-is-the-what-agentic-search-is...
czhu12•2m ago
1: https://github.com/VectifyAI/PageIndex