
Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
233•theblazehen•2d ago•68 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
694•klaussilveira•15h ago•206 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
6•AlexeyBrin•1h ago•0 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
962•xnx•20h ago•555 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
130•matheusalmeida•2d ago•35 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
67•videotopia•4d ago•6 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
54•jesperordrup•5h ago•24 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
36•kaonwarb•3d ago•27 comments

ga68, the GNU Algol 68 Compiler – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
10•matt_d•3d ago•2 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
236•isitcontent•15h ago•26 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
233•dmpetrov•16h ago•124 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
32•speckx•3d ago•21 comments

UK infants ill after drinking contaminated baby formula of Nestle and Danone

https://www.bbc.com/news/articles/c931rxnwn3lo
10•__natty__•3h ago•0 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
335•vecti•17h ago•147 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
502•todsacerdoti•23h ago•244 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
386•ostacke•21h ago•97 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
300•eljojo•18h ago•186 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
361•aktau•22h ago•185 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
425•lstoll•21h ago•282 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
68•kmm•5d ago•10 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
96•quibono•4d ago•22 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
21•bikenaga•3d ago•11 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
19•1vuio0pswjnm7•1h ago•5 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
264•i5heu•18h ago•216 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
33•romes•4d ago•3 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
64•gfortaine•13h ago•28 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1076•cdrnsf•1d ago•460 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
39•gmays•10h ago•13 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
298•surprisetalk•3d ago•44 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
154•vmatsiiako•20h ago•72 comments

Show HN: I compressed 10k PDFs into a 1.4GB video for LLM memory

https://github.com/Olow304/memvid
61•saleban1031•8mo ago
While building a Retrieval-Augmented Generation (RAG) system, I was frustrated by my vector database consuming 8GB RAM just to search my own PDFs. After incurring $150 in cloud costs, I had an unconventional idea: what if I encoded my documents into video frames?

The concept sounded absurd—storing text in video? But modern video codecs have been optimized for compression over decades. So, I converted text into QR codes, then encoded those as video frames, letting H.264/H.265 handle the compression.

The results were surprising. 10,000 PDFs compressed down to a 1.4GB video file. Search latency was around 900ms compared to Pinecone’s 820ms—about 10% slower. However, RAM usage dropped from over 8GB to just 200MB, and it operates entirely offline without API keys or monthly fees.

Technically, each document chunk is encoded into QR codes, which become video frames. Video compression handles redundancy between similar documents effectively. Search works by decoding relevant frame ranges based on a lightweight index.
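The chunk-to-frame lookup described above can be sketched in a few lines. This is an illustrative sketch, not memvid's actual API: `build_index` and `fetch_chunk` are hypothetical names, and a plain dict stands in for the real QR/video round-trip.

```python
# Hypothetical sketch of the "lightweight index" idea: a chunk id maps
# to a frame number, and search decodes only the frame it needs.

def build_index(chunks):
    """Record which video frame holds each chunk's QR code."""
    frames = {}   # frame number -> chunk text (stand-in for QR frames)
    index = {}    # chunk id -> frame number (the lightweight index)
    for i, text in enumerate(chunks):
        frames[i] = text
        index[i] = i
    return frames, index

def fetch_chunk(frames, index, chunk_id):
    """Decode only the single frame the index points at."""
    return frames[index[chunk_id]]

frames, index = build_index(["alpha chunk", "beta chunk"])
assert fetch_chunk(frames, index, 1) == "beta chunk"
```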

You get a vector database that’s just a video file you can copy anywhere.

GitHub: https://github.com/Olow304/memvid

Comments

copperx•8mo ago
Why does this work so well?
tux3•8mo ago
It does not. It's an indictment of the vector database working so poorly that even deliberately making up something ridiculously inefficient (encoding PDFs as QR codes as H.264 video) is somehow comparable.

It's possible to be less efficient, but it takes real creativity. You could print out the QR codes and scan them again, or encode the QR codes in the waveform of an MP3 and take a video of that.

It's really, really bad.

jonplackett•8mo ago
I feel like this could be a new fun competition though. Like the Japanese art of un-useless inventions.

https://en.m.wikipedia.org/wiki/Chind%C5%8Dgu

jonplackett•8mo ago
How big were the original PDFs? Are they just text or images and other formatting too?
userbinator•8mo ago
If they were less than 140k on average, then this isn't "compression" but "lossy expansion".
Scaevolus•8mo ago
This is an extremely bad method of storing text data. Video codecs are not particularly efficient at compressing QR codes: the high contrast between blocks defeats the DCT psychovisual assumptions of smooth gradients that the codecs rely on. And there is little to no redundancy between QR-code encodings of similar text.

You'd probably have a smaller database and better results crunching text into a zip file, or compressed rows in a sqlite database, or any other simple random-access format.
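The commenter's alternative is easy to sketch with the standard library: zlib-compressed chunks in a SQLite table give random access without decompressing the whole corpus. The table and column names here are illustrative.

```python
# Compressed rows in SQLite: a simple random-access alternative to the
# QR-in-video scheme. Only the requested chunk is decompressed.
import sqlite3
import zlib

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, data BLOB)")

docs = ["repeated phrase repeated phrase", "another chunk of text"]
for i, text in enumerate(docs):
    con.execute("INSERT INTO chunks VALUES (?, ?)",
                (i, zlib.compress(text.encode())))

# Random access: fetch and decompress a single chunk by id.
row = con.execute("SELECT data FROM chunks WHERE id = ?", (1,)).fetchone()
assert zlib.decompress(row[0]).decode() == "another chunk of text"
```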

mdp2021•8mo ago
I'd say it would be bewildering if there were not a more efficient way to store text for this purpose than "QR codes in compressed video frames".

The vector database previously used must have been very inefficient.

duskwuff•8mo ago
> The vector database previously used must have been very inefficient.

Especially if it was taking ~800 ms to do a search. At that speed, you'd probably be better off storing the documents as plain text, without the whole inefficient QR/H264 round-trip.

WhyIsItAlwaysHN•8mo ago
Is this more efficient than putting all of that in say a 7z archive?

I'd expect video frames to be maximally efficient if you sorted the chunks by image similarity somehow.

Also isn't there a risk of losing data by doing this since for example h.265 is lossy?

chatmasta•8mo ago
h.265 is lossy but QR codes are redundant
WhyIsItAlwaysHN•8mo ago
Is the probability of lost data zero across, e.g., millions of documents?

I see there's 30% redundancy per document, but I'm not sure every frame in an H.265 file is guaranteed to have more than 70% of a QR code readable. And if it's not readable, that could mean losing an entire chunk of data.

I'd definitely calculate the probability of losing data if storing text with a lossy compression.
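That calculation is straightforward to set up: if each frame independently exceeds the QR error-correction budget with some small probability, the chance of losing at least one chunk compounds across the corpus. The failure rate below is an illustrative assumption, not a measured number.

```python
# Probability that at least one of n chunks is lost, assuming each
# frame independently becomes unreadable with probability p.

def p_any_loss(p, n):
    return 1 - (1 - p) ** n

# Even a one-in-a-million per-frame failure rate, over a million
# frames, loses data with probability ~0.63.
assert abs(p_any_loss(1e-6, 1_000_000) - 0.632) < 0.001
```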

captainregex•8mo ago
Why not just do it locally? Or were the RAM consumption and the cloud cost comments distinct?
rafram•8mo ago
> The results were surprising. 10,000 PDFs compressed down to a 1.4GB video file.

And how big was the total text in those PDFs?

duskwuff•8mo ago
> Video compression handles redundancy between similar documents effectively.

Definitely not. None of the "redundancy" between, or within, texts (e.g. repeated phrases) is apparent in a sequence of images of QR codes.

mrkeen•8mo ago
Cut the cloud vendors out of the picture and build and query your index on a spare linux box.

I've only played with TF-IDF/BM25 as opposed to vector searches, but there's no way your queries should be taking so long on such a small corpus. Querying 10k documents feels like 2-10ms territory, not 900ms.

xnx•8mo ago
April Fools?
outofpaper•8mo ago
June... close enough, just like this and vector DBs o/
kgeist•8mo ago
900 ms sounds like a lot for just 10,000 documents? How many chunks are there per document? Maybe Pinecone's 820 ms includes network latency plus they need to serve other users?

In Go, I once implemented a naive brute-force cosine search (linear scan in memory), and for 1 million 350-dimensional vectors, I got results in under 1 second too IIRC.

I ended up just setting up OpenSearch, which gives you hybrid semantic + full-text search out of the box (BM25 + kNN). In my tests, it gave better results than semantic search alone, something like +15% better retrieval.
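The brute-force linear scan the commenter describes fits in a few lines of pure Python; this is a minimal sketch (no SIMD, no index) just to show the shape of it.

```python
# Naive brute-force cosine search: score every vector, return top-k ids.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query, vectors, k=1):
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:k]

vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
assert search([1.0, 0.1], vecs) == [0]
```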

jeffcatz•8mo ago
I’m not sure why this is getting so much hate. This could be groundbreaking.
Ayushmishra23•8mo ago
can you provide that mp4
mpaepper•8mo ago
Nope... It IS using a vector database / FAISS embeddings for the semantic search.

Check the indexer: https://github.com/Olow304/memvid/blob/main/memvid/index.py

kgeist•8mo ago
Indeed, they use FAISS for vector search, and the actual innovation is storing the PDFs in an MP4 file. I think they should compare this to just compressing the PDFs with ZIP, 7z, or RAR to see if it actually makes things better.
maowtm•8mo ago
what