The idea is: instead of scrolling archives, they just ask questions. Answers are pulled only from your original content, with citations.
It’s aimed at writers and researchers who want their work to be more discoverable — but without spinning up vector infra or fiddling with RAG pipelines.
For context: I’ve always gone back to Paul Graham’s essays for startup advice. But there’s no good way to search them semantically or contextually. So I tried indexing a few with Bookshelf.
I asked: “How does PG think about evaluating founders?” and got a clean answer sourced from “Do Things That Don’t Scale” and a couple of other essays, citations included. It was surprisingly useful.
So far, one early test case is AnthropoceneGPT (https://sammatey.substack.com/p/introducing-anthropocenegpt) for Sam Matey’s newsletter. It’s seen 100+ queries. Readers say it works like a smart librarian; Sam says it gives him ideas for what to write next.
Rough implementation:
- Input: HTML/PDF exports
- Chunked and embedded via OpenAI (or a local model)
- Embeddings stored in a vector DB
- Retrieval API called by the custom GPT
- GPT instructed to answer only from retrieved chunks and cite them
- Optional auth for query tracking, to give writers some telemetry
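To make the pipeline concrete, here’s a minimal in-memory sketch of the chunk → embed → retrieve loop. The function names (chunk_text, embed, retrieve) are illustrative, and the hashed bag-of-words embedding is a deterministic stand-in — a real deployment would call OpenAI’s embedding endpoint and persist vectors in a vector DB instead:

```python
import hashlib
import math

def chunk_text(text, max_words=200, overlap=40):
    """Split a document into overlapping word-window chunks."""
    words = text.split()
    chunks, step = [], max_words - overlap
    for i in range(0, len(words), step):
        chunk = " ".join(words[i:i + max_words])
        if chunk:
            chunks.append(chunk)
        if i + max_words >= len(words):
            break
    return chunks

def embed(text, dim=64):
    """Stand-in embedding: hashed bag-of-words, L2-normalized.
    Swap in a real embedding model (e.g. OpenAI) in practice."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query, index, top_k=3):
    """Return the top_k (score, chunk) pairs by cosine similarity."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), c) for v, c in index]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

# Build the index: (vector, chunk) pairs, normally held in a vector DB.
docs = ["Do things that don't scale. Recruit users manually at first.",
        "Founders should talk to users and iterate quickly."]
index = [(embed(c), c) for d in docs for c in chunk_text(d)]

results = retrieve("how should founders find early users", index)
```

The retrieved chunks (with their source metadata) are what gets handed to the GPT, which is instructed to answer only from them and cite each one.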
Here’s a demo GPT trained on Paul Graham’s archive: Paul Graham GPT (https://tinyurl.com/paul-graham-gpt)
Would love thoughts on:
- What would make this better for writers or readers?
- Any UX nits on the GPT side?
- Has anyone tried doing something similar in-house?