Ask HN: How do you build per-user RAG/GraphRAG

2•david1542•9mo ago

Hey all,

I’ve been working on an AI agent system over the past year that connects to internal company tools like Slack, GitHub, Notion, etc, to help investigate production incidents. The agent needs context, so we built a system that ingests this data, processes it, and builds a structured knowledge graph (kind of a mix of RAG and GraphRAG).

What we didn’t expect was just how much infra work that would require.

We ended up:

- Using LlamaIndex's open-source abstractions for chunking, embedding and retrieval.

- Adopting Chroma as the vector store.

- Writing custom integrations for Slack/GitHub/Notion. We used LlamaHub here for the actual querying, although some parts were a bit unmaintained and we had to fork + fix. Tbh we could’ve used Nango or Airbyte, but in the end we didn't.

- Building an auto-refresh pipeline to sync data every few hours and do diffs based on timestamps. This was pretty hard as well.

- Handling security and privacy (most customers needed to keep data in their own environments).

- Handling scale - some orgs had hundreds of thousands of documents across different tools.
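For anyone wondering what the timestamp-based diff in the auto-refresh step might look like, here is a minimal sketch. It is not our actual pipeline: the `Doc` type, `diff_by_timestamp`, and `apply_sync` are hypothetical names, and it assumes each source exposes a stable document ID plus a last-modified timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Doc:
    # Hypothetical record shape; real connectors return richer objects.
    doc_id: str
    updated_at: datetime
    text: str


def diff_by_timestamp(remote_docs, indexed):
    """Compare source documents against what was indexed at the last sync.

    `indexed` maps doc_id -> updated_at as recorded at the last sync.
    Returns (to_upsert, to_delete).
    """
    # New documents, or documents modified since they were last indexed.
    to_upsert = [
        d for d in remote_docs
        if d.doc_id not in indexed or d.updated_at > indexed[d.doc_id]
    ]
    # Documents that disappeared from the source should leave the index too.
    remote_ids = {d.doc_id for d in remote_docs}
    to_delete = sorted(set(indexed) - remote_ids)
    return to_upsert, to_delete


def apply_sync(remote_docs, indexed):
    """Run one sync cycle and update the timestamp bookkeeping in place."""
    to_upsert, to_delete = diff_by_timestamp(remote_docs, indexed)
    for d in to_upsert:
        # Assumption: re-chunking, re-embedding and the vector store
        # upsert would happen here.
        indexed[d.doc_id] = d.updated_at
    for doc_id in to_delete:
        # Assumption: the vector store delete would happen here.
        del indexed[doc_id]
    return to_upsert, to_delete


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    t1 = datetime(2024, 1, 2, tzinfo=timezone.utc)
    indexed = {"a": t0, "b": t0, "c": t0}
    remote = [Doc("a", t0, "same"), Doc("b", t1, "edited"), Doc("d", t1, "new")]
    upserts, deletes = apply_sync(remote, indexed)
    print([d.doc_id for d in upserts], deletes)
```

The sketch needs the full remote listing to detect deletions. Sources that only return "changed since X" would need a separate pass for deletes, which is part of why this gets hard in practice.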

It became clear we were spending a lot more time on data infrastructure than on the actual agent logic. I think that might be ok for a company whose business is handling customers' data, but we definitely felt like we were dealing with a lot of non-core work.

So I’m curious: for folks building LLM apps that connect to company systems, how are you approaching this? Are you building it all from scratch too? Using open-source tools? Is there something obvious we’re missing?

Would really appreciate hearing how others are tackling this part of the stack.

Comments

barrenko•9mo ago

I'd say this is normal? There may be some solutions popping up, but I haven't been drinking straight from the X.com AI/ML firehose lately, so I don't know of one unisolution at the moment.

PaulHoule•9mo ago

If it's hard for you, then it's hard for your customers, and they have a reason to pay for your product.

After I left a job where I developed a neural search engine for patents (years before BERT), I talked with many of the enterprise search vendors. What I found was that few of them did systematic work to improve the relevance of their results [1], and few of them tried to sell their product based on the quality of the results.

What they all promoted was ease of integration with hundreds of data sources, security, privacy, scale, rapid sync, etc. Looking at the way these got sold, I'd say that all of that is the core work and the actual search engine is an afterthought.

[1] See https://trec.nist.gov/

david1542•9mo ago

Yeah, I tend to agree. Real value comes from carefully curating the data and applying smart optimizations, which is something few companies focus on. But I also get the sense that a lot of energy ends up being spent elsewhere - on integration, infrastructure, lots of fragmented open-source libraries, etc. - at the expense of iteration speed and relevance-focused experimentation.

PaulHoule•9mo ago

I was frustrated with enterprise search vendors and their customers because they didn't see it my way. Here are some ways of thinking about it.

Most cynically, enterprise software is bought by different people than those who use it. The buyers have a list of items to check, and the fastest way to get eliminated is to not have an integration for a data source they have, so vendors will put up a comprehensive list of them on their web site. The buyers will never test the relevance of the results against their data, though the users will feel it every day, unless the search engine is so bad that they just don't use it. (Common!)

On the other hand, if the integration doesn't work, you get recall of 0% no matter how smart and well tuned your search engine is.
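To make the recall point concrete, here is a small sketch using the standard definition of recall (relevant documents retrieved divided by all relevant documents). The document IDs and the `recall` helper are made up for illustration.

```python
def recall(retrieved_ids, relevant_ids):
    """Fraction of the relevant documents that were actually retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when nothing is relevant")
    return len(relevant & set(retrieved_ids)) / len(relevant)


if __name__ == "__main__":
    # Hypothetical case: the answers live in a source whose integration
    # is broken, so none of them ever reached the index.
    relevant = ["wiki-12", "wiki-40"]
    index = {"slack-1", "slack-2", "github-9"}

    # Even a perfect ranker can only return documents that are in the index.
    retrieved = [doc_id for doc_id in relevant if doc_id in index]
    print(recall(retrieved, relevant))
```

No amount of ranking work changes that number until the missing source is ingested.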

I think a lot of founders and data scientists believe in a variant of the Pareto principle which comes down to "I want to do the 20% of the work that gets me 80% of the way there". The trouble is that a minimum viable product has to be viable, and you have to get to 100% of that minimum or you are always going to be a bridesmaid and never a bride.

The awful truth about data science, relevance, ML and all that is that data is dirty and takes a huge amount of work to wrangle. If you want "iteration speed and relevance-focused experimentation" you have to make investments in product, people and process to run more cycles in less calendar time. Look up my profile and ping me if you want to hear war stories.
