frontpage.

Most APIs don’t return actual content. You get metadata, maybe an abstract, maybe a snippet, but never the thing itself. And if you want proper sources like arXiv, PubMed, or major publishers? Good luck. You’re stuck scraping tens of millions of PDFs or Semantic Scholar and building your own ingestion pipeline.

We hit this building agentic workflows and RAG backends. What we needed wasn’t “search”, it was a way to retrieve real, structured full text with enough metadata to plug straight into a reasoning system. So we built a system that could do that: multimodal inputs (text, math, figures), clean citations, reference chaining, and filters that work (by date, by source, etc.).
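To make "structured full text with enough metadata" concrete, here is a minimal Python sketch of what one retrieved passage and a date/source filter could look like. It is an illustration only, not the system's actual schema or API: the `Passage` type, its field names, and `filter_passages` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Passage:
    """Hypothetical shape of one retrieved passage (not a real schema)."""
    doc_id: str
    source: str          # e.g. "arXiv" or "PubMed"
    published: date
    text: str            # full text of the passage, not just a snippet
    references: List[str] = field(default_factory=list)  # cited doc ids


def filter_passages(
    passages: List[Passage],
    source: Optional[str] = None,
    published_after: Optional[date] = None,
) -> List[Passage]:
    """Keep passages matching the given source and/or minimum date."""
    return [
        p for p in passages
        if (source is None or p.source == source)
        and (published_after is None or p.published >= published_after)
    ]


# Made-up records, purely for demonstration.
passages = [
    Passage("a1", "arXiv", date(2023, 5, 1), "Full text A", ["b2"]),
    Passage("b2", "PubMed", date(2019, 3, 9), "Full text B"),
    Passage("c3", "arXiv", date(2018, 1, 2), "Full text C"),
]
recent_arxiv = filter_passages(
    passages, source="arXiv", published_after=date(2020, 1, 1)
)
```

Keeping `references` as a list of document ids is one simple way to support reference chaining: a caller can look up each id and follow citations from one passage to the next.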

The hard part wasn’t retrieval but preprocessing at scale: figuring out how to analyse, chunk, and structure tens of millions of docs without taking months or breaking the bank. Not to mention dealing with licensed content where formats vary wildly, or building retrieval systems at this scale.
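As a rough illustration of the chunking step, the Python sketch below packs paragraphs greedily into chunks under a character budget. This is a deliberately simple stand-in, not the pipeline described above; the function name `chunk_paragraphs` and the `max_chars` parameter are assumptions made for the example.

```python
from typing import List


def chunk_paragraphs(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    A paragraph is never split: one that is longer than max_chars
    becomes a chunk of its own.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs from repeated blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


example_chunks = chunk_paragraphs("aaa\n\nbbb\n\nccc", max_chars=8)
```

Splitting on paragraph boundaries keeps each chunk readable on its own. A real pipeline would also need to handle math, figures, and section structure, which this sketch ignores.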

It’s still a work in progress, with more updates on the way, but it’s miles better than duct-taping together PDFs, AI search engines, etc. and hoping to find the relevant context you need.
