Show HN: OpenFable – Open-source RAG engine using tree-structured indexes

https://github.com/alainbrown/openfable

1•alainbrown•1h ago

Hi HN, I built OpenFable, an open-source retrieval engine that implements the FABLE algorithm (https://arxiv.org/abs/2601.18116) for RAG pipelines. I'm using it in another project and thought that others might benefit.

  Most RAG systems chunk documents into flat segments and retrieve by vector similarity. This works  
  for simple lookups but breaks when answers span multiple sections, when relevant content is buried
  in a subsection, or when you need to control how many tokens you're sending to an LLM.             
                                                                                                   
  OpenFable takes a different approach: when you ingest a document, it uses an LLM to identify
  discourse boundaries (not fixed-size windows), then builds a hierarchical tree (root, sections,
  subsections, leaf chunks) with embeddings at every level. Retrieval combines two paths:
                                                                                                   
  1. LLM-guided path: the LLM reads node summaries and reasons about which documents and subtrees are relevant
  2. Vector path: similarity search with structure-aware score propagation through the tree
                                                                                                     
  Results from both paths are fused, deduplicated, and trimmed to fit a token budget you specify. You
  get the most relevant chunks, in document order, within budget.
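
  The fuse, deduplicate, and trim step can be sketched as follows. This is an illustration under stated assumptions, not OpenFable's actual code: `Chunk`, `count_tokens`, and `fuse_and_trim` are hypothetical names, duplicates are fused by keeping the higher score, and tokens are approximated by whitespace splitting rather than a real tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieved chunk; `position` is its order in the document."""
    chunk_id: str
    position: int
    text: str


def count_tokens(text: str) -> int:
    """Crude token estimate (assumption): whitespace-separated words."""
    return len(text.split())


def fuse_and_trim(
    llm_hits: list[tuple[Chunk, float]],
    vector_hits: list[tuple[Chunk, float]],
    budget: int,
) -> list[Chunk]:
    """Merge both result lists, drop duplicates, and fit a token budget."""
    # Fuse and deduplicate: keep the best score seen for each chunk id.
    best: dict[str, tuple[Chunk, float]] = {}
    for chunk, score in llm_hits + vector_hits:
        previous = best.get(chunk.chunk_id)
        if previous is None or score > previous[1]:
            best[chunk.chunk_id] = (chunk, score)

    # Greedily take the highest-scoring chunks that still fit the budget.
    ranked = sorted(best.values(), key=lambda pair: pair[1], reverse=True)
    selected: list[Chunk] = []
    used = 0
    for chunk, _score in ranked:
        cost = count_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost

    # Return the survivors in document order, not score order.
    return sorted(selected, key=lambda chunk: chunk.position)
```

  Selection happens by score, but the final sort restores document order so the downstream LLM reads the chunks in the sequence the author wrote them.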
                                                                                                     
  From the FABLE paper: the algorithm matches full-context inference (517K tokens) using only 31K
  tokens, a 94% reduction, while reaching 92% completeness vs. 91% for Gemini-2.5-Pro with the full
  document.
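
  As a quick check, the reduction figure follows from the two token counts quoted above. The inputs are the rounded numbers from this post, so the result is approximate.

```python
# Rounded token counts quoted in the post.
full_context_tokens = 517_000
retrieved_tokens = 31_000

# Fraction of tokens saved by retrieving instead of sending the full context.
reduction = 1 - retrieved_tokens / full_context_tokens
print(f"{reduction:.1%}")
```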
                                                                                                   
  OpenFable is retrieval only: it returns ranked chunks, not generated answers. Bring your own LLM for
  generation.
                                                                                                     
  It runs as a Docker stack (FastAPI + PostgreSQL/pgvector) and exposes both a REST API and an MCP   
  server, so LLM agents like Claude Desktop or Cursor can use it directly.
                                                                                                     
  Trade-offs I want to be upfront about:                                                           
  - Ingestion is expensive; every document requires multiple LLM calls for chunking and tree
  construction                                                                                       
  - Retrieval isn't sub-second; the LLM-guided path adds round-trips
  - No built-in auth; designed to sit behind a reverse proxy                                        
  - v0.1.0: works end to end, but async ingestion, document deletion, and metadata filtering are
  still on the roadmap
                                                                                                     
  Stack: Python 3.12, FastAPI, SQLAlchemy, pgvector, LiteLLM, FastMCP. License: Apache 2.0.
                                                                                                     
  Happy to answer questions about the algorithm, implementation choices, or benchmarks.
