Full-text search, or even grep/rg, is a lot faster and cheaper to work with - no need to maintain a vector database index - and turns out to work really well if you put it in some kind of agentic tool loop.
The big benefit of semantic search was that it could handle fuzzy searching - returning results that mention dogs if someone searches for canines, for example.
Give a good LLM a search tool and it can come up with searches like "dog OR canine" on its own - and refine those queries over multiple rounds of searches.
Plus it means you don't have to solve the chunking problem!
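The query-expansion idea above can be sketched as a plain tool loop: the agent gets a grep-style tool, notices a literal query misses synonyms, and broadens the pattern itself. A minimal illustration over a toy in-memory corpus (the `grep` tool, file names, and document text are all hypothetical, just to show the mechanism):

```python
import re

# Toy corpus standing in for a document store (hypothetical data).
DOCS = {
    "notes/pets.md": "Canines need daily walks and plenty of play.",
    "notes/cats.md": "Cats are mostly indoor animals.",
    "notes/wolves.md": "Wolves are wild canines related to dogs.",
}

def grep(pattern: str) -> list[str]:
    """A grep-style search tool an agent could call: returns the paths
    of documents whose text matches the case-insensitive regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [path for path, text in DOCS.items() if rx.search(text)]

# A literal query misses documents that only say "canine"...
print(grep("dog"))           # matches only the doc that mentions "dogs"

# ...so the model broadens the query on the next round, no embeddings needed.
print(grep("dog|canine"))    # now also finds the doc that says "Canines"
```

The refinement loop is just the LLM reading the (thin) result set and emitting a wider pattern - exactly the "dog OR canine" behavior described above, expressed as regex alternation.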
Back in 2023 when I compared semantic search to lexical search (tantivy; BM25), I found the search results to be marginally different.
Even if semantic search has slightly more recall, does the problem of context warrant this multi-component, homebrew search engine approach?
By what important measure does it outperform a lexical search engine? Is the engineering time worth it?
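For concreteness, the BM25 ranking that tantivy implements is only a few lines of math. Here is a pure-Python sketch of Okapi BM25 with the common default parameters (k1 = 1.5, b = 0.75) - a toy for illustration, not tantivy's actual code, and the whitespace tokenizer is a deliberate simplification:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]       # naive tokenizer
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            f = tf[term]                                 # term frequency
            score += idf * f * (k1 + 1) / (
                f + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores

docs = ["the dog barked", "cats sleep all day", "a dog and a cat"]
print(bm25_scores("dog", docs))  # docs mentioning "dog" score > 0, the other 0
```

Unlike the semantic-search pipeline, the whole "engine" here is term statistics over an inverted index - which is why pairing it with an LLM that rewrites queries covers most of the fuzziness embeddings were bought for.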
mips_avatar•18m ago
Unless I’ve misunderstood your post and you’re already doing some form of this in your pipeline, you should see a dramatic improvement in performance once you implement it.