Epstein Files Smart Search – AI RAG pipeline, File explorer, Image gallery

https://search.epstein.ninja/
1•whatl3y•1h ago

Comments

whatl3y•1h ago
Smart search engine for the DOJ's Epstein files, built on a RAG pipeline that indexes them and makes them searchable in natural language (i.e. LLM style). Documents (PDFs, HTML, images) go through text extraction, with OCR as a fallback for scanned pages, then get sliced into ~500-token chunks with a 50-token overlap. Each chunk gets a metadata prefix baked in (document title, source section) before being embedded into a 1536-dimensional vector through OpenAI's text-embedding-3-small. Those vectors live in PostgreSQL with pgvector, sitting behind an HNSW index with ef_search cranked to 400 (the default is 40, which misses too many relevant chunks).
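For anyone unfamiliar with overlapping chunks, here is a minimal Python sketch of the slicing step described above. It is not the project's code: the function names are hypothetical, whitespace-separated words stand in for real tokenizer tokens, and the bracketed prefix format is an assumption made only for illustration.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # each new window starts this many tokens after the last one
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the current window already reaches the end of the document
    return chunks


def with_prefix(chunk: List[str], title: str, section: str) -> str:
    """Prepend document metadata so it is embedded together with the chunk text.

    The "[title | section]" layout is hypothetical, not taken from the post.
    """
    return f"[{title} | {section}]\n" + " ".join(chunk)


if __name__ == "__main__":
    words = [f"w{i}" for i in range(1200)]
    pieces = chunk_tokens(words)
    print(len(pieces), [len(p) for p in pieces])
    print(with_prefix(pieces[0][:5], "Example title", "Example section"))
```

With the defaults, each window starts 450 tokens after the previous one, so the last 50 tokens of a chunk reappear at the start of the next.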

Queries hit the same embedding model, and the system pulls the top-K most similar chunks by cosine distance. There's a hybrid search mode too: it over-fetches 5x the candidates from the vector index in parallel with a keyword search (full-text search via a GIN-indexed tsvector column, falling back to trigram ILIKE when FTS returns few results). Results are merged using a slot reservation system: 60% of the final top-K comes from vector results ranked by cosine similarity, with up to 40% reserved for keyword-only matches that the vector search missed. Retrieved chunks get stuffed into a prompt with source metadata and sent to Claude Sonnet or GPT-4o with instructions to cite sources in bracket notation.
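The slot reservation merge can be sketched roughly as follows. This is an illustration under stated assumptions rather than the author's implementation: `merge_results` and its parameters are hypothetical names, both input lists are assumed to be ranked best-first, and backfilling unused keyword slots with further vector results is a guess at what "up to 40%" means.

```python
from typing import List


def merge_results(vector_ids: List[str], keyword_ids: List[str], k: int,
                  vector_pct: int = 60) -> List[str]:
    """Merge two ranked ID lists, reserving a share of the k slots for each source."""
    vector_slots = (k * vector_pct) // 100
    vector_ranked = list(dict.fromkeys(vector_ids))  # dedupe, keep rank order
    merged = vector_ranked[:vector_slots]
    seen = set(merged)
    vector_set = set(vector_ranked)

    # Remaining slots go to keyword-only matches: hits the vector search did not return at all.
    for doc_id in keyword_ids:
        if len(merged) >= k:
            break
        if doc_id not in vector_set and doc_id not in seen:
            merged.append(doc_id)
            seen.add(doc_id)

    # Assumption: if keyword search cannot fill its share, backfill with more vector results.
    for doc_id in vector_ranked:
        if len(merged) >= k:
            break
        if doc_id not in seen:
            merged.append(doc_id)
            seen.add(doc_id)
    return merged


if __name__ == "__main__":
    vec = [f"v{i}" for i in range(1, 11)]
    kw = ["v2", "k1", "k2"]
    print(merge_results(vec, kw, k=10))
```

Over-fetching from the vector index is what makes the backfill step possible: there are more vector candidates on hand than the 60% share needs.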

On the backend, pub-sub workers handle the indexing pipeline: text extraction, chunking, batch embedding in groups of 100, and firing off face detection through AWS Rekognition on images pulled from PDFs (very good in some cases, not so much in others). The query endpoint is free with some rate limiting, but also sits behind x402 micropayments ($0.50) that bypass rate limits when valid (it's not cheap to run these queries as of now). There's also an MCP server so AI agents can query directly as a tool.
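The "batch embedding in groups of 100" step amounts to slicing the chunk list before each embedding request. A minimal sketch, assuming a hypothetical helper name and leaving the actual embedding call out entirely:

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    chunks = [f"chunk {i}" for i in range(250)]
    for group in batched(chunks):
        # A worker would send `group` to the embedding model here (not shown).
        print(len(group))
```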

Built with the help of Claude, so some of the tech (RAG via LLM, pgvector, etc.) is newish to me. Was a fun exercise!

Venezuelan oil exports to Israel resume after 6-year gap

https://www.bloomberg.com/news/articles/2026-02-10/venezuela-sending-first-crude-oil-cargo-to-isr...
1•OgsyedIE•37s ago•0 comments

Secretary of War Hegseth Announces End of Support for Harvard University [video]

https://www.youtube.com/watch?v=eh5duiL3MwQ
2•nomilk•2m ago•0 comments

Vibrant Frog Collab – AI-Powered Writing Assistant

https://frogteam.ai/VibrantFrog/default.html
1•am-piazza•3m ago•0 comments

Show HN: Track and analyze AI coding tool usage across your team

https://trackr-bay.vercel.app/welcome
1•usmansidd•6m ago•0 comments

(Rust) Tracking Issue for Generic Constant Arguments MVP

https://github.com/rust-lang/rust/issues/132980
1•anfilt•7m ago•1 comments

How many of the 3,191 billionaires can you name?

https://billionaires.linolevan.com/
1•linolevan•12m ago•1 comments

mrdoob Ported Quake to JavaScript/Three.js

https://mrdoob.github.io/three-quake/
1•davidbarker•12m ago•0 comments

Russians supplied with new satellite internet terminals after Starlink blackout

https://www.pravda.com.ua/eng/news/2026/02/09/8020199/
1•c420•15m ago•0 comments

Block a website in specific countries using Nginx

https://shashanksrivastava.medium.com/block-a-website-in-specific-countries-using-nginx-20a651288795
1•kamaraju•15m ago•0 comments

I am building virtual Bash

https://github.com/everruns/bashkit
1•chalyi•16m ago•1 comments

Show HN: Fyno – Automate repetitive bookkeeping tasks

https://www.meetfyno.com
1•alicele27•16m ago•0 comments

I'm building a clarity-first language (compiles to C++)

https://github.com/taman-islam/rox
1•hedayet•17m ago•0 comments

Spec-Driven Development with Claude Code

https://www.braingrid.ai/blog/building-braingrid-with-braingrid
2•acossta•17m ago•1 comments

Google Nest camera video raises privacy questions

https://www.mynbc5.com/article/nancy-guthrie-fbi-nest-camera-video-raises-privacy-questions/70306538
1•1vuio0pswjnm7•18m ago•0 comments

The First Person Project: How to prove you are a real person online

https://www.firstperson.network/white-paper
1•walterbell•23m ago•0 comments

AI-Driven Low-Fi Prototyping with Balsamiq Cloud

https://balsamiq.com/blog/low-fidelity-prototyping/
1•ilt•23m ago•0 comments

SAIR Foundation

https://sair.foundation/
1•nsoonhui•24m ago•0 comments

Linux 7.0 Review: Major Performance, GPU, CPU, and Networking Upgrades

https://www.youtube.com/watch?v=3s37rDlIemI
1•cable2600•24m ago•0 comments

Show HN: Yan – Glitch Art Photo/Live Editor

https://yan.yichenlab.com/
1•xcc3641•28m ago•0 comments

A simpler way to remove explicit images from Search

https://blog.google/products-and-platforms/products/search/remove-explicit-images/
1•gnabgib•32m ago•0 comments

We're all called Julia, or maybe ChatGPT calls itself Julia

https://solresol.substack.com/p/were-all-called-julia-or-maybe-chatgpt
2•solresol•35m ago•1 comments

5,300-year-old 'bow drill' rewrites story of ancient Egyptian tools

https://www.ncl.ac.uk/press/articles/latest/2026/02/ancientegyptiandrillbit/
4•geox•35m ago•0 comments

Search the public domain through image embeddings

https://faenum.com
1•jlauf•38m ago•0 comments

Beautiful iOS SSH Terminal with GPU Acceleration

https://github.com/eriklangille/clauntty
1•dnw•38m ago•0 comments

Wall Street's anything-but-tech trade shakes up US stock market

https://www.ft.com/content/577b97f6-2416-48b9-9bd3-717bb202ca71
1•petethomas•39m ago•0 comments

Show HN: Obsidian Visual Skills – Generate Canvas, Excalidraw, Mermaid from Text

https://github.com/axtonliu/axton-obsidian-visual-skills
1•axtonliu•41m ago•0 comments

It's Time to Rage Against the AI Music Machine

https://time.com/7338205/rage-against-ai-generated-music/
1•cdrnsf•42m ago•1 comments

Show HN: Askill – A package manager for AI agent skills with AI safety scoring

https://github.com/avibe-bot/askill
1•alex_metacraft•43m ago•1 comments

AI is now a magic decompiler

https://stephenjayakar.com/posts/magic-decomp/
1•stephenjayakar•44m ago•0 comments

Lolong: Largest crocodile ever held in captivity

https://en.wikipedia.org/wiki/Lolong
2•teleforce•48m ago•0 comments