Show HN: Self-hosted RAG with MCP support for OpenClaw

https://github.com/2dogsandanerd/ClawRag
2•2dogsanerd•1h ago
I've been using OpenClaw to control my home server via WhatsApp, but it couldn't access my documents. Instead of uploading my private contracts to OpenAI, I built ClawRAG – a self-hosted RAG engine that connects to OpenClaw via MCP (Model Context Protocol). Now I can ask "What did the contract say about liability?" and get cited answers, not hallucinations.

Most RAG systems are either too complex for a solo dev's home setup or they rely on cloud-hosted vector stores. I needed something that runs in a single Docker container, understands messy PDFs (tables!), and integrates natively as a "tool" for agents rather than just another REST endpoint.

## Technical Deep Dive

### Why MCP instead of REST?

I chose the Model Context Protocol (MCP) because it provides structured schemas that LLMs understand natively. The MCP server exposes `query_knowledge` as a tool, allowing the agent to decide exactly when to pull from the knowledge base vs. when to use its built-in memory. It prevents "tool-drift" and ensures type-safe responses.
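To make the "structured schema" point concrete, here is a minimal sketch of what a tool definition for `query_knowledge` might look like. MCP tools are described by a `name`, a `description`, and a JSON Schema under `inputSchema`; the argument names `query`, `collection_name`, and `top_k` and the helper `validate_arguments` are assumptions for illustration, not ClawRAG's actual schema.

```python
# Hypothetical MCP-style tool definition. Only the tool name `query_knowledge`
# comes from the post; the argument names below are assumptions.
QUERY_KNOWLEDGE_TOOL = {
    "name": "query_knowledge",
    "description": "Search the local knowledge base and return cited passages.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "collection_name": {"type": "string"},
            "top_k": {"type": "integer"},
        },
        "required": ["query"],
    },
}

# Maps the JSON Schema type names used above to Python types.
_JSON_TYPES = {"string": str, "integer": int}


def validate_arguments(tool, arguments):
    """Return a list of problems; an empty list means the call is well-formed."""
    schema = tool["inputSchema"]
    problems = []
    for name in schema["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems
```

Because the schema travels with the tool, a malformed call such as `{"top_k": "5"}` can be rejected before it ever reaches the retriever.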

### The Stack

- *Parsing*: Docling 2.13.0 (the first parser I've found that doesn't choke on nested tables in legacy PDFs).
- *Storage*: ChromaDB (lightweight, file-based, no Postgres/pgvector overhead needed for personal knowledge bases).
- *Search*: Hybrid (vector similarity + BM25 keyword search) fused using Reciprocal Rank Fusion (RRF) for better retrieval on specific legal jargon.
- *Footprint*: Optimized to run under 2GB RAM (excluding the local LLM).
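The fusion step in the *Search* bullet can be sketched in a few lines. This is a generic Reciprocal Rank Fusion, not ClawRAG's code: each document scores the sum of `1 / (k + rank)` over every ranking it appears in. The constant `k=60` is the conventional default, and the chunk IDs are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document's score is the sum of 1 / (k + rank) over every ranking
    that contains it, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative results from the two retrievers (IDs are invented).
vector_hits = ["chunk-7", "chunk-2", "chunk-9"]
bm25_hits = ["chunk-2", "chunk-4", "chunk-7"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['chunk-2', 'chunk-7', 'chunk-4', 'chunk-9']
```

Chunks that both retrievers agree on rise to the top, which is what helps with exact legal terms that embedding similarity alone can miss.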

### The tricky part: Citation Preservation

Getting citations to work reliably over a WhatsApp round-trip was the biggest challenge. I had to ensure chunk IDs and source metadata survive the transformation from ChromaDB → LlamaIndex → LLM → OpenClaw → WhatsApp without getting "summarized away" or sanitized by the LLM's output formatting.
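One common way to keep citations intact is to tag each chunk with a plain-text marker, ask the model to repeat the markers, and resolve them back to source metadata after generation. The sketch below illustrates that general idea only; it is not the citation logic in the repo, and the field names (`id`, `text`, `source`, `page`) and both helpers are hypothetical.

```python
import re

# Markers look like [c1]; plain brackets survive plain-text channels.
MARKER = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def build_context(chunks):
    """Prefix every chunk with its ID so the model can cite it verbatim."""
    return "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)


def resolve_citations(answer, chunks):
    """Map markers in the answer back to sources.

    Returns (cited_ids, unknown_ids, footer). Unknown IDs are markers the
    model produced that match no retrieved chunk, which is a sign of a
    hallucinated citation.
    """
    by_id = {c["id"]: c for c in chunks}
    cited_ids, unknown_ids = [], []
    for chunk_id in MARKER.findall(answer):
        target = cited_ids if chunk_id in by_id else unknown_ids
        if chunk_id not in target:
            target.append(chunk_id)
    footer = "\n".join(
        f"[{i}] {by_id[i]['source']}, p. {by_id[i]['page']}" for i in cited_ids
    )
    return cited_ids, unknown_ids, footer
```

Appending the footer outside the model's output means the page and file name never pass through the LLM, so they cannot be paraphrased or dropped.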

## Use Case

Last week my landlord claimed I signed a clause about garden/snow maintenance. I pulled up my phone, wrote to my OpenClaw bot: "Search my lease for gardening obligations". It found the relevant paragraph in 3 seconds, cited the page/section, and provided the exact quote. Argument closed.

## Quick Start

The repo includes a `docker-compose.yml` that spins up everything including the vector store:

```bash
# 1. Start ClawRAG
docker compose up -d

# 2. Add your documents
curl -X POST http://localhost:8080/api/v1/rag/documents/upload \
  -F "files=@my_lease.pdf" \
  -F "collection_name=personal"

# 3. Connect to your agent
openclaw mcp add --transport stdio clawrag npx -y @clawrag/mcp-server
```

## Community & Feedback

Code is MIT licensed. I'd love feedback on the MCP implementation – specifically if you see better ways to handle tool schemas for multi-collection search.

*Ask me anything about the architecture or how I handled the citation logic!*

---

### Hidden Technical Details

- *Privacy*: Zero external data leaks. Everything stays on your metal.
- *LLM Agnostic*: Tested with Ollama (Llama 3.2) and Claude 3.5 via API.
- *Context Management*: Explicit context window limiting to prevent GPU crashes on 8GB VRAM cards.
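As a rough illustration of the *Context Management* bullet, explicit limiting can be as simple as keeping the highest-ranked chunks until a token budget is used up. This is a sketch under stated assumptions, not the repo's implementation: `limit_context` is a hypothetical helper, and the default token counter is a crude whitespace split that a real setup would replace with the model's tokenizer.

```python
def limit_context(chunks, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep chunks in ranked order until the token budget would be exceeded.

    `chunks` is assumed to be sorted best-first. The default `count_tokens`
    only approximates tokens by counting whitespace-separated words.
    """
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        used += cost
    return kept
```

Stopping at the first chunk that does not fit keeps the prompt a strict prefix of the ranking, so a lower-ranked short chunk never displaces a higher-ranked long one.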
