To succeed on Hacker News (HN), you have to completely drop the "marketing" and "YouTube hook" tone. The HN community flags or ignores clickbait, sensationalism, and marketing fluff (submissions cannot be downvoted, but flagged posts sink quickly).
They love "Show HN" posts, open-source projects, CLI tools, local LLMs, and clever technical solutions to messy data problems (like parsing poorly scanned government PDFs).
Here are three title options and a description (to use either as a text post or as your first comment) tailored specifically for the Hacker News audience.
The Hacker News Titles
Choose one of these. On HN, titles should be strictly factual and descriptive, contain no emojis, and stay within the 80-character title limit.
Option 1 (The Classic HN Format - Recommended):
Show HN: epstein-search – CLI to query the unsealed court files with local LLMs
Option 2 (Focus on the tech pipeline):
Show HN: I built a local RAG CLI to make the Epstein PDFs searchable
Option 3 (Straight to the point):
Show HN: epstein-search – Query the Epstein document dumps offline via CLI
The Hacker News Description (First Comment or Text Body)
If you submit the GitHub URL directly, immediately post this as the first comment. If you submit a text post, put this in the body. Keep the tone humble, technical, and open to feedback.
Hi HN,
When the Epstein court documents and flight logs were unsealed, they were released the way most legal drops are: thousands of pages of messy, poorly scanned PDFs. Ctrl+F is unreliable because of OCR errors, and the sheer volume makes manual review impractical.
To solve this, I built epstein-search, an open-source Python CLI tool that lets you search and synthesize the documents using a Retrieval-Augmented Generation (RAG) pipeline directly in your terminal.
How it works:
It parses and chunks the original unsealed PDF files.
You can run queries against the dataset using API-based models (OpenAI/Anthropic) if you want speed.
Privacy-first: If you don't want your queries logged by a third-party API, you can point it at a local model (via Ollama or llama.cpp) and run the entire search and retrieval process fully offline.
The goal was to make this data accessible to researchers and OSINT investigators without requiring them to manually read thousands of pages of court dockets or hand over their search queries to a third-party API.
Repo is here:
https://github.com/simulationship/epstein-search
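If someone in the thread asks how the pipeline fits together, the "How it works" steps in the description can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in the repo: the function names (chunk_text, retrieve, answer) and the generate callable are hypothetical, and a simple word-count cosine similarity stands in for whatever embedding model a real pipeline would use.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split extracted text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def _vector(text: str) -> Counter:
    # Bag-of-words term counts; an assumption standing in for real embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = _vector(query)
    scored = [(_cosine(query_vec, _vector(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def answer(
    query: str,
    chunks: List[str],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Build a prompt from the retrieved chunks and hand it to a model.

    `generate` is a hypothetical hook: it could wrap a local model or a
    hosted API, which is the only part that differs between the two modes.
    """
    context = "\n---\n".join(chunk for _, chunk in retrieve(query, chunks, k))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```

In this sketch, swapping the function passed as generate is what would switch between the offline and API-based modes; the chunking and retrieval steps stay the same either way.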