Show HN: A Vectorless LLM-Native Document Index Method

https://github.com/VectifyAI/pageindex-mcp

14•mingtianzhang•4mo ago

The word "index" originally came from how humans retrieve info: book indexes and tables of contents that guide us to the right place in documents.

Computers later borrowed the term for data structures such as B-trees, hash tables and, more recently, vector indexes. These are highly efficient for machines, but abstract and unnatural: not something a human, or an LLM, can understand and use directly as a reasoning aid. This creates a gap between how indexes work for computers and how they should work for models that reason like humans.

PageIndex is a new step that "looks back to move forward". It revives the original, human-oriented idea of an index and adapts it for LLMs. Now the index itself (PageIndex) lives inside the LLM's context window: the model sees a hierarchical table-of-contents tree and reasons its way down to the right span, much like a person would retrieve information using a book's index.
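To make the "table-of-contents tree in the context window" idea concrete, here is a minimal sketch, assuming each node carries a title and a page range. The `TocNode` class, its fields and `render_toc` are hypothetical names for illustration, not PageIndex's actual data model. It flattens the tree into indented text that could be pasted into a prompt.

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    """One entry in a table-of-contents tree (hypothetical structure)."""
    title: str
    start_page: int  # assumption: nodes map to page spans
    end_page: int
    children: list["TocNode"] = field(default_factory=list)


def render_toc(node: TocNode, depth: int = 0) -> str:
    """Render the tree as indented text suitable for an LLM prompt."""
    lines = [f"{'  ' * depth}- {node.title} (pp. {node.start_page}-{node.end_page})"]
    for child in node.children:
        lines.append(render_toc(child, depth + 1))
    return "\n".join(lines)


report = TocNode("Annual Report", 1, 120, [
    TocNode("Business Overview", 1, 30),
    TocNode("Financial Statements", 31, 90, [
        TocNode("Balance Sheet", 31, 45),
        TocNode("Cash Flow Statement", 46, 60),
    ]),
])

print(render_toc(report))
```

The indentation carries the hierarchy, so a model reading the prompt can see which sections nest under which, much as a reader scans a printed table of contents.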

PageIndex MCP shows how this works in practice: it runs as an MCP server, exposing a document's structure directly to LLMs/Agents. This means platforms like Claude, Cursor, or any MCP-enabled agent or LLM can navigate the index themselves and reason their way through documents, not with vectors/chunking, but in a human-like, reasoning-based way.
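The post does not show the server's tools. As a rough sketch of the split it describes, here are two plain Python functions standing in for MCP tools: one returns only the outline, the other returns the text of a node the agent picks. The names `get_document_structure` and `get_node_text`, and the JSON shape, are hypothetical and are not PageIndex MCP's actual tool schema; registering them with a real MCP SDK is left out.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A TOC entry plus the text it covers (hypothetical structure)."""
    node_id: str
    title: str
    text: str
    children: list["Node"] = field(default_factory=list)


def outline(node: Node) -> dict:
    """Structure only: ids and titles, no body text."""
    return {
        "node_id": node.node_id,
        "title": node.title,
        "children": [outline(child) for child in node.children],
    }


def find_node(node: Node, node_id: str) -> Optional[Node]:
    """Depth-first lookup of a node by id."""
    if node.node_id == node_id:
        return node
    for child in node.children:
        hit = find_node(child, node_id)
        if hit is not None:
            return hit
    return None


# Hypothetical tool 1: the agent reads the outline and decides where to look.
def get_document_structure(root: Node) -> str:
    return json.dumps(outline(root), indent=2)


# Hypothetical tool 2: the agent fetches the text of the node it chose.
def get_node_text(root: Node, node_id: str) -> str:
    node = find_node(root, node_id)
    if node is None:
        raise KeyError(f"unknown node_id: {node_id}")
    return node.text


doc = Node("0", "Annual Report", "", [
    Node("1", "Business Overview", "We make widgets."),
    Node("2", "Financial Statements", "", [
        Node("2.1", "Balance Sheet", "Assets, liabilities and equity."),
    ]),
])

print(get_document_structure(doc))
print(get_node_text(doc, "2.1"))
```

Keeping the outline separate from the body text means the agent only pulls in full text for the nodes it reasons its way to, rather than receiving the whole document up front.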

Comments

avereveard•4mo ago

What happens when the TOC is too long? How does the index handle near misses? How do you disambiguate between close titles? What happens if the documents are not in a strict hierarchy?

Seems very situational.

mingtianzhang•4mo ago

Hi, thanks for your inspiring questions.

1. What happens when the TOC is too long? This is why we chose a tree structure. If the TOC is too long, it does a hierarchical search: it searches over the parent-level nodes first, selects one node, and then searches that node's children.

2. How does the index handle near misses, and how do you disambiguate between close titles? For each node, we generate a description or summary, so there is more information to go on than the title alone.

3. For documents that are not in a hierarchy, it will just become a list structure, which you can still look through.
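The three answers above can be sketched as a single descent loop, assuming the model is shown one level of candidates at a time. Here the LLM's choice is replaced by a hypothetical `keyword_chooser` (word overlap between the query and each node's title plus summary) purely so the example runs offline; `Node`, `Chooser` and `tree_search` are likewise illustrative names, not PageIndex's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    summary: str  # per-node description, used to tell close titles apart
    children: list["Node"] = field(default_factory=list)


# Given a query and one level of candidates, return the index to descend into.
# In a real system this would be an LLM call; this type is an assumption.
Chooser = Callable[[str, list[Node]], int]


def keyword_chooser(query: str, candidates: list[Node]) -> int:
    """Toy stand-in for the LLM: pick the candidate whose title + summary
    shares the most words with the query (first one wins on ties)."""
    q = set(query.lower().split())

    def score(n: Node) -> int:
        return len(q & set(f"{n.title} {n.summary}".lower().split()))

    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


def tree_search(root: Node, query: str, choose: Chooser) -> list[str]:
    """Descend level by level: only the current node's children are shown to
    the chooser, so the full TOC never has to fit in one prompt."""
    path = [root.title]
    node = root
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node.title)
    return path


doc = Node("10-K Filing", "annual report", [
    Node("Risk Factors", "market, credit and liquidity risk", [
        Node("Market Risk", "interest rate and currency exposure"),
        Node("Credit Risk", "counterparty default exposure"),
    ]),
    Node("Financial Statements", "balance sheet, income and cash flow", [
        Node("Balance Sheet", "assets liabilities and equity"),
        Node("Cash Flow", "operating investing and financing cash flows"),
    ]),
])

print(tree_search(doc, "what is the counterparty default risk", keyword_chooser))
```

With titles alone, "Market Risk" and "Credit Risk" would tie for this query under the toy scorer; the summaries break the tie. A non-hierarchical document is just a root whose children are all leaves, so the same loop makes a single pick from the flat list.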

We have also written up how it can be combined with a reasoning process, along with some comparisons to a vector DB; see https://vectifyai.notion.site/PageIndex-for-Reasoning-Based-....

We have found that our MCP service works well for general financial, legal, textbook and research-paper cases; see https://pageindex.ai/mcp for some examples.

We do agree that in some cases, like recommendation systems, you need semantic similarity and a vector DB, and there I wouldn't recommend this approach. Keen to learn about more cases we haven't thought through!

avereveard•4mo ago

thanks!
