
Show HN: 22 GB of Hacker News in SQLite

https://hackerbook.dosaygo.com
219•keepamovin•5h ago•69 comments

Show HN: Claude Cognitive – Working memory for Claude Code

https://github.com/GMaN1911/claude-cognitive
4•MirrorEthic•17m ago•1 comment

Show HN: I remade my website in the Sith Lord Theme and I hope it's fun

https://cookie.engineer/index.html
25•cookiengineer•4h ago•12 comments

Show HN: Tidy Baby is a SET game but with words

https://tidy.baby
25•brgross•6h ago•6 comments

Show HN: One clean, developer-focused page for every Unicode symbol

https://fontgenerator.design/symbols
159•yarlinghe•5d ago•62 comments

Show HN: A Claude Code plugin that catches destructive Git and filesystem commands

https://github.com/kenryu42/claude-code-safety-net
51•kenryu•4d ago•55 comments

Show HN: Brainrot Translator – Convert corporate speak to Gen Alpha and back

https://brainrottranslator.com
12•todaycompanies•5h ago•2 comments

Show HN: Replacing my OS process scheduler with an LLM

https://github.com/mprajyothreddy/brainkernel
13•ImPrajyoth•5h ago•6 comments

Show HN: Stop Claude Code from forgetting everything

https://github.com/mutable-state-inc/ensue-skill
180•austinbaggio•1d ago•212 comments

Show HN: Superset – Terminal to run 10 parallel coding agents

https://superset.sh/
93•avipeltz•1w ago•85 comments

Show HN: A 45x45 Connections Puzzle To Commemorate 2025=45*45

https://thomaswc.com/2025.html
72•thomaswc•6d ago•22 comments

Show HN: Slide notes visible only to you during screen sharing

https://cuecard.dev
2•thisisnsh•4h ago•0 comments

Show HN: Aroma: Every TCP Proxy Is Detectable with RTT Fingerprinting

https://github.com/Sakura-sx/Aroma
80•Sakura-sx•5d ago•49 comments

Show HN: See what readers who loved your favorite book/author also loved to read

https://shepherd.com/bboy/2025
125•bwb•1d ago•37 comments

Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB

https://github.com/HarryR/z80ai
490•quesomaster9000•1d ago•114 comments

Show HN: Euclidle – Guess the Coordinates in N‑Dimensional Space

https://euclidle.com/
17•bills-appworks•4d ago•4 comments

Show HN: My not-for-profit search engine with no ads, no AI, & all DDG bangs

https://nilch.org
191•UnmappedStack•1d ago•73 comments

Show HN: Cck ClaudeCode file change tracking and auto Claude.md

4•takawasi•6h ago•0 comments

Show HN: MCP Mesh – one endpoint for all your MCP servers (OSS self-hosted)

https://github.com/decocms/mesh
6•gadr90•6h ago•0 comments

Show HN: Financial calculators with no tracking, no signup, no email gates

https://www.financialaha.com/financial-calculators/
3•stefanneculai•6h ago•0 comments

Show HN: Flipper Zero MCP – Control Your Flipper Using AI via USB or WiFi

https://github.com/busse/flipperzero-mcp
2•busseio•6h ago•0 comments

Show HN: Minimum Viable Parents (MVP)

https://yaz.zone/essays/mvp
3•plawlost•6h ago•0 comments

Show HN: Tetris Time

https://tetris-time.koenvangilst.nl/?mode=countdown&to=2026-01-01T00:00:00.000Z&speed=3
8•vnglst•12h ago•3 comments

Show HN: Lazy-image – Node.js image library with static binaries (Rust/NAPI)

https://github.com/albert-einshutoin/lazy-image
4•einshutoin•13h ago•1 comment

Show HN: Paper Tray – dramatically better file organization for Google Drive

https://www.papertray.ai/
2•affine_variety•9h ago•0 comments

Show HN: Spacelist, a TUI for Aerospace window manager

https://github.com/magicmark/spacelist
41•markl42•3d ago•6 comments

Show HN: Per-instance TSP Solver with No Pre-training (1.66% gap on d1291)

18•jivaprime•1d ago•3 comments

Show HN: Evidex – AI Clinical Search (RAG over PubMed/OpenAlex and SOAP Notes)

https://www.getevidex.com
36•amber_raza•1d ago•33 comments

Show HN: Vibe coding a bookshelf with Claude Code

https://balajmarius.com/writings/vibe-coding-a-bookshelf-with-claude-code/
276•balajmarius•1d ago•206 comments

Show HN: Make 67 – a tiny maths game

https://simondarcyonline.com/67/
4•sidarcy•12h ago•0 comments

Show HN: 22 GB of Hacker News in SQLite

https://hackerbook.dosaygo.com
214•keepamovin•5h ago
Community, All the HN belong to you. This is an archive of hacker news that fits in your browser. When I made HN Made of Primes I realized I could probably do this offline sqlite/wasm thing with the whole GBs of archive. The whole dataset. So I tried it, and this is it. Have Hacker News on your device.

Go to this repo (https://github.com/DOSAYGO-STUDIO/HackerBook): you can download it. Big Query -> ETL -> npx serve docs - that's it. 20 years of HN arguments and beauty, can be yours forever. So they'll never die. Ever. It's the unkillable static archive of HN and it's your hands. That's my Year End gift to you all. Thank you for a wonderful year, have happy and wonderful 2026. make something of it.

Comments

asdefghyk•5h ago
How much space is needed? ...for the data .... I'm wondering if it would work on a tablet? ....
keepamovin•4h ago
~9GB gzipped.
carbocation•4h ago
That repo is throwing up a 404 for me.

Question - did you consider tradeoffs between duckdb (or other columnar stores) and SQLite?

keepamovin•3h ago
No, I just went straight to sqlite. What is duckdb?
cess11•3h ago
It is very similar to SQLite in that it can run in-process and store its data as a file.

It's different in that it is tailored to analytics: among other things, storage is columnar, and it can run off some common data-analytics file formats.

fsiefken•3h ago
DuckDB is an open-source column-oriented relational database management system (RDBMS). It's designed to provide high performance on complex queries against large databases in an embedded configuration.

It has transparent compression built-in and has support for natural language queries. https://buckenhofer.com/2025/11/agentic-ai-with-duckdb-and-s...

"DICT FSST (Dictionary FSST) represents a hybrid compression technique that combines the benefits of Dictionary Encoding with the string-level compression capabilities of FSST. This approach was implemented and integrated into DuckDB as part of ongoing efforts to optimize string storage and processing performance." https://homepages.cwi.nl/~boncz/msc/2025-YanLannaAlexandre.p...

simonw•3h ago
One interesting feature of DuckDB is that it can run queries against HTTP ranges of a static file hosted via HTTPS, and there's an official WebAssembly build of it that can do that same trick.

So you can dump e.g. all of Hacker News in a single multi-GB Parquet file somewhere and build a client-side JavaScript application that can run queries against that without having to fetch the whole thing.

You can run searches on https://lil.law.harvard.edu/data-gov-archive/ and watch the network panel to see DuckDB in action.

linhns•3h ago
Not the author here. I’m not sure about DuckDB, but SQLite allows you to simply use a file as a database and for archiving, it’s really helpful. One file, that’s it.
cobolcomesback•3h ago
DuckDB does as well. A super simplified explanation of duckdb is that it’s sqlite but columnar, and so is better for analytics of large datasets.
formerly_proven•3h ago
The schema is this: items(id INTEGER PRIMARY KEY, type TEXT, time INTEGER, by TEXT, title TEXT, text TEXT, url TEXT)

Doesn't scream columnar database to me.

embedding-shape•3h ago
At a glance, that is missing (at least) a `parent` or `parent_id` attribute which items in HN can have (and you kind of need if you want to render comments), see http://hn.algolia.com/api/v1/items/46436741
agolliver•3h ago
Edges are a separate table
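For illustration, a minimal stdlib-sqlite3 sketch of that split; the `edges` table shape and column names here are assumptions for the example, not the archive's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items(
  id INTEGER PRIMARY KEY, type TEXT, time INTEGER,
  by TEXT, title TEXT, text TEXT, url TEXT
);
-- hypothetical shape: parent/child links kept out of the items table
CREATE TABLE edges(parent INTEGER, child INTEGER);
""")
con.execute("INSERT INTO items(id, type, by, title) VALUES (1, 'story', 'alice', 'Show HN')")
con.execute("INSERT INTO items(id, type, by, text) VALUES (2, 'comment', 'bob', 'Nice!')")
con.execute("INSERT INTO edges VALUES (1, 2)")

# fetch the direct replies to item 1 by joining through edges
rows = con.execute("""
    SELECT i.id, i.by, i.text
    FROM edges e JOIN items i ON i.id = e.child
    WHERE e.parent = 1
""").fetchall()
print(rows)  # [(2, 'bob', 'Nice!')]
```
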
3eb7988a1663•3h ago
While I suspect DuckDB would compress better, given the ubiquity of SQLite, it seems a fine standard choice.
jacquesm•15m ago
Maybe it got nuked by MS? The rest of their repo's are up.
wslh•4h ago
Is this updated regularly? 404 on GitHub as the other comment.

With all due respect, it would be great if there were an official HN public dump available (and not requiring stuff such as BigQuery, which is expensive).

zX41ZdbW•3h ago
The query tab looks quite complex with all these content shards: https://hackerbook.dosaygo.com/?view=query

I have a much simpler database: https://play.clickhouse.com/play?user=play#U0VMRUNUIHRpbWUsI...

embedding-shape•3h ago
Does your database also run offline/locally in the browser? That seems to be the reason for the large number of shards.
yupyupyups•3h ago
1 hour passed and it's already nuked?

Thank you btw

abixb•3h ago
Wonder if you could turn this into a .zim file for offline browsing with an offline browser like Kiwix, etc. [0]

I've been taking frequent "offline-only-day" breaks to consolidate whatever I've been learning, and Kiwix has been a great tool for reference (offline Wikipedia, StackOverflow and whatnot).

[0] https://kiwix.org/en/the-new-kiwix-library-is-available/

Barbing•2h ago
Oh this should TOTALLY be available to those who are scrolling through sources on the Kiwix app!
Paul-E•3h ago
That's pretty neat!

I did something similar. I built a tool[1] to import the Project Arctic Shift dumps[2] of reddit into SQLite. It was mostly an exercise to experiment with Rust and SQLite (HN's two favorite topics). If you don't build an FTS5 index and import without WAL (--unsafe-mode), importing every reddit comment and submission takes a bit over 24 hours and produces a ~10TB DB.

SQLite offers a lot of cool JSON features that would let you store the raw JSON and operate on that, but I eschewed them in favor of parsing only once at load time. That also lets me normalize the data a bit.

I find that building the DB is pretty "fast", but queries run much faster if I immediately vacuum the DB after building it. The vacuum operation is actually slower than the original import, taking a few days to finish.

[1] https://github.com/Paul-E/Pushshift-Importer

[2] https://github.com/ArthurHeitmann/arctic_shift/blob/master/d...
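A small stdlib demo of why the post-build vacuum helps: pages freed by deletes sit on the freelist (the file stays big) until VACUUM rebuilds the database. This is an illustrative sketch, not the importer's actual workflow:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # a file path would also show the size change on disk
con.execute("CREATE TABLE t(x TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("a" * 1000,)] * 1000)
con.execute("DELETE FROM t")  # freed pages stay in the file, on the freelist
con.commit()

before = con.execute("PRAGMA freelist_count").fetchone()[0]
con.execute("VACUUM")  # rebuilds the database and drops the free pages
after = con.execute("PRAGMA freelist_count").fetchone()[0]
print(before, after)  # 'before' is large (depends on page size), 'after' is 0
```
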

s_ting765•2h ago
You could check out SQLite's auto_vacuum which reclaims space without rebuilding the entire db https://sqlite.org/pragma.html#pragma_auto_vacuum
fao_•3h ago
> Community, All the HN belong to you. This is an archive of hacker news that fits in your browser.

> 20 years of HN arguments and beauty, can be yours forever. So they'll never die. Ever. It's the unkillable static archive of HN and it's your hands

I'm really sorry to have to ask this, but this really feels like you had an LLM write it?

rantingdemon•3h ago
Why do you say that?
sundarurfriend•3h ago
Because anything that even slightly differs from the standard American phrasing of something must be "LLM generated" these days.
JavGull•3h ago
With the em dashes I see you. But at this point idrc so long as it reads well. Everyone uses spell check…
naikrovek•2h ago
I add em dashes to everything I write now, solely to throw off people who look for them. Lots of editors add them automatically when you have two sequential dashes between words — a common occurrence, like that one. And this is Chrome on iOS doing it automatically.

Ooh, I used “sequential”, ooh, I used an em dash. ZOMG AI IS COMING FOR US ALL

Barbing•2h ago
Ya—in fact, globally replaced on iOS (sent from Safari)

Also for reference: “this shortcut can be toggled using the switch labeled 'Smart Punctuation' in General > Keyboard settings.”

deadbabe•2h ago
Sometimes I want to write more creatively, but then worry I’ll be accused of being an LLM. So I dumb it down. Remove the colorful language. Conform.
walthamstow•3h ago
There's a thing in soccer at the moment where a tackle looks fine in realtime, but when the video referee shows it to the on-pitch referee, they show the impact in slo-mo over and over again and it always looks way worse.

I wonder if there's something like this going on here. I never thought it was LLM on first read, and I still don't, but when you take snippets and point at them it makes me think maybe they are

naikrovek•2h ago
> I'm really sorry to have to ask this, but this really feels like you had an LLM write it?

Ending a sentence with a question mark doesn’t automatically make your sentence a question. You didn’t ask anything. You stated an opinion and followed it with a question mark.

If you intended to ask if the text was written by AI, no, you don’t have to ask that.

I am so damn tired of the “that didn’t happen” and the “AI did that” people when there is zero evidence of either being true.

These people are the most exhausting people I have ever encountered in my entire life.

jacquesm•18m ago
You're right. Unfortunately they are also more and more often right.
jesprenj•2h ago
I doubt it. "hacker news" spelled lowercase? comma after "beauty"? missing "in" after "it's"? i doubt an LLM would make such syntax mistakes. it's just good writing, that's also possible these days.
Insanity•1h ago
Even if so, would it have mattered? The point is showing off the SQLite DB.

But it didn’t read LLM generated IMO.

simonw•3h ago
Don't miss how this works. It's not a server-side application - this code runs entirely in your browser using SQLite compiled to WASM, but rather than fetching a full 22GB database it instead uses a clever hack that retrieves just "shards" of the SQLite database needed for the page you are viewing.

I watched it in the browser network panel and saw it fetch:

  https://hackerbook.dosaygo.com/static-shards/shard_1636.sqlite.gz
  https://hackerbook.dosaygo.com/static-shards/shard_1635.sqlite.gz
  https://hackerbook.dosaygo.com/static-shards/shard_1634.sqlite.gz
As I paginated to previous days.

It's reminiscent of that brilliant SQLite.js VFS trick from a few years ago: https://github.com/phiresky/sql.js-httpvfs - only that one used HTTP range headers, this one uses sharded files instead.

The interactive SQL query interface at https://hackerbook.dosaygo.com/?view=query asks you to select which shards to run the query against; there are 1636 total.
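Both tricks work because SQLite reads its file in fixed-size pages, so "read page N" maps cleanly to a byte range. A stdlib sketch of the header fields a range-fetching VFS relies on; the `page_range` helper is hypothetical, standing in for what an HTTP Range request would ask for:

```python
import os
import sqlite3
import struct
import tempfile

# build a tiny db on disk so we have a real file to inspect
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (42)")
con.commit()
con.close()

with open(path, "rb") as f:
    header = f.read(100)  # the SQLite file header is the first 100 bytes

magic = header[:16]  # always b"SQLite format 3\x00"
(page_size,) = struct.unpack(">H", header[16:18])  # big-endian u16 at offset 16
if page_size == 1:  # the value 1 encodes a 65536-byte page
    page_size = 65536

# a range-request VFS maps "read page N" to this inclusive byte range
def page_range(n):
    return n * page_size, (n + 1) * page_size - 1

print(magic, page_size, page_range(1))
```
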

tehlike•2h ago
Vfs support is amazing.
nextaccountic•1h ago
Is there anything more production grade built around the same idea of HTTP range requests like that sqlite thing? This has so much potential
simonw•1h ago
There was a UK government GitHub repo that did something interesting with this kind of trick against S3 but I checked just now and the repo is a 404. Here are my notes about what it did: https://simonwillison.net/2025/Feb/7/sqlite-s3vfs/

Looks like it's still on PyPI though: https://pypi.org/project/sqlite-s3vfs/

You can see inside it with my PyPI package explorer: https://tools.simonwillison.net/zip-wheel-explorer?package=s...

simonw•14m ago
I recovered it from https://archive.softwareheritage.org/browse/origin/directory... and pushed a fresh copy to GitHub here:

https://github.com/simonw/sqlite-s3vfs

This comment was helpful in figuring out how to get a full Git clone out of the heritage archive: https://news.ycombinator.com/item?id=37516523#37517378

ericd•1h ago
This is somewhat related to a large dataset browsing service a friend and I worked on a while back - we made index files, and the browser ran a lightweight query planner to fetch static chunks which could be served from S3/torrents/whatever. It worked pretty well, and I think there’s a lot of potential for this style of data serving infra.
Humphrey•20m ago
Yes — PMTiles is exactly that: a production-ready, single-file, static container for vector tiles built around HTTP range requests.

I’ve used it in production to self-host Australia-only maps on S3. We generated a single ~900 MB PMTiles file from OpenStreetMap (Australia only, up to Z14) and uploaded it to S3. Clients then fetch just the required byte ranges for each vector tile via HTTP range requests.

It’s fast, scales well, and bandwidth costs are negligible because clients only download the exact data they need.

https://docs.protomaps.com/pmtiles/

simonw•12m ago
PMTiles is absurdly great software.
Humphrey•8m ago
I know right! I'd never heard of HTTP Range requests until PMTiles - but gee it's an elegant solution.
sieep•3h ago
What a reminder of how much more efficient text is than video, it's crazy! Could you imagine the same amount of knowledge (or drivel) in video form? I wonder how large that would be.
ivanjermakov•2h ago
Average high-quality 1080p60 video has a bitrate of 5Mbps, which is equivalent to 120k English words per second. With average English speech being 150wpm, we end up with text being 50 thousand times more space-efficient.

Converting 22GB of uncompressed text into a video essay lands us at ~1PB or 1000TB.
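A quick back-of-envelope check of those figures; the constants are the rough assumptions from the comment above, so the results land in the same ballpark rather than matching exactly:

```python
# rough constants from the estimate above
video_bps = 5_000_000   # 5 Mbps video bitrate
bytes_per_word = 6      # ~5 letters plus a space
speech_wps = 150 / 60   # 150 words per minute of speech

text_wps = video_bps / (bytes_per_word * 8)  # words/s that bitrate carries as plain text
ratio = text_wps / speech_wps                # how many times denser text is than speech video
petabytes = 22e9 * ratio / 1e15              # 22 GB of text, re-encoded as spoken video

print(f"{text_wps:.0f} words/s as text, ~{ratio:.0f}x denser, ~{petabytes:.2f} PB")
```
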

fsiefken•2h ago
One could use a video LLM to generate the video, diagrams, or stills automatically based on the text. Except when it's board-game playthroughs or programming, I just transcribe YouTube videos to text, summarise, and read them.
Barbing•1h ago
Can be nice to pull a raw transcript and have it formatted as HTML (formatting/punctuation fixes applied).

Best locally of course to avoid “I burned a lake for this?” guilt.

deskamess•1h ago
How do you read YouTube videos? Very curious, as I have been wanting to watch PDFs scroll by slowly on a large TV. I am interested in the workflow of getting a PDF/document into a scrolling video format. These days NotebookLM may be an option, but I am curious if there is something custom. If I can get it into video form (mp4) then I can even deliver it via Plex.
jacquesm•20m ago
That's what's so sad about youtube. 20 minute videos to encode a hundred words of usable content to get you to click on a link. The inefficiency is just staggering.
sirjaz•3h ago
This would be awesome as a cross platform app.
tevon•3h ago
The link seems to be down, was it taken down?
scsh•2h ago
Probably just forgot to make it public.
zkmon•3h ago
Similar to single-page applications (SPA), single-table applications (STA) might become a thing. Just shard a table on multiple keys and serve the shards as static files, provided the data is OK to share, similar to serving static HTML content.
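A toy sketch of that idea in stdlib sqlite3; the shard size, naming, and key scheme are all arbitrary assumptions, and real shards would be files served statically rather than in-memory databases:

```python
import sqlite3

SHARD_SIZE = 100  # ids per shard, an arbitrary choice

def shard_for(item_id):
    return item_id // SHARD_SIZE

# split one logical "items" table across per-key-range shard databases
shards = {}
for item_id in (5, 42, 150, 250):
    n = shard_for(item_id)
    if n not in shards:
        con = sqlite3.connect(":memory:")  # would be f"shard_{n}.sqlite" on disk
        con.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, title TEXT)")
        shards[n] = con
    shards[n].execute("INSERT INTO items VALUES (?, ?)", (item_id, f"item {item_id}"))

# a reader only opens the one shard that covers the key it needs
row = shards[shard_for(150)].execute("SELECT title FROM items WHERE id = 150").fetchone()
print(sorted(shards), row)  # [0, 1, 2] ('item 150',)
```
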
jesprenj•2h ago
do you mean single database? it'd be quite hard if not impossible to make applications using a single table (no relations). reddit did it though, they have a huge table of "things" iirc.
mburns•1h ago
That is a common misconception.

> Next, we've got more than just two tables. The quote/paraphrase doesn't make it clear, but we've got two tables per thing. That means Accounts have an "account_thing" and an "account_data" table, Subreddits have a "subreddit_thing" and "subreddit_data" table, etc.

https://www.reddit.com/r/programming/comments/z9sm8/comment/...

rplnt•25m ago
And the important lesson from that is the k/v-like aspect of it. That the "schema" is horizontal (is that a thing?) and not column-based. But I actually only read about it on their blog IIRC and never even got the full details - that there's still a third ID column. Thanks for the link.
jhd3•1h ago
[The Baked Data architectural pattern](https://simonwillison.net/2021/Jul/28/baked-data/)
yread•1h ago
I wonder how much smaller it could get with some compression. You could probably encode "This website hijacks the scrollbar and I don't like it" comments into just a few bits.
jacquesm•22m ago
That's at least 45%, then you can leave out all of my comments and you're left with only 5!
Rendello•17m ago
The hard-coded dictionary wouldn't be much stranger than Brotli's:

https://news.ycombinator.com/item?id=27160590

dmarwicke•1h ago
22gb for mostly text? tried loading the site, it's pretty slow. curious how the query performance is with this much data in sqlite
spit2wind•1h ago
This is pretty neat! The calendar didn't work well for me. I could only seem to navigate by month. And when I selected the earliest day (after much tapping), nothing seemed to be updated.

Nonetheless, random access history is cool.

Sn0wCoder•47m ago
Site does not load on Firefox; the console error says 'Uncaught (in promise) TypeError: can't access property "wasm", sqlite3 is null'

Guess it's common knowledge that SharedArrayBuffer (SQLite wasm) does not work with FF due to Cross-Origin Attacks (I just found out ;).

Once the initial chunk of data loads, the rest load almost instantly on Chrome. Can you please fix the GitHub link (currently a 404)? I'd like to peek at the code. Thank you!
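For anyone hitting the same wall: SharedArrayBuffer isn't Chrome-only, but browsers (Firefox included) only expose it on cross-origin-isolated pages, which requires the server to send both of these response headers:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```

Without them, `sqlite3` WASM builds that rely on SharedArrayBuffer fail to initialize, which would explain the null error above.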

coder543•22m ago
Works fine for me on Firefox on macOS
layer8•32m ago
Apparently the comment counts are only the top-level comments?

It would be nice for the thread pages to show a comment count.

dspillett•29m ago
Is there a public dump of the data anywhere that this is based upon, or have they scraped it themselves?

Such a DB might be entertaining to play with, and the threadedness of comments would be useful for beginners to practise efficient recursive queries (more so than the StackExchange dumps, for instance).
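For what it's worth, walking a comment tree is exactly the recursive-CTE exercise described; a minimal stdlib sketch, assuming a hypothetical `items` table with a `parent` column (as in the HN API's item shape):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, parent INTEGER, by TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, None, "op"),   # the story
    (2, 1, "alice"),   # top-level comment
    (3, 2, "bob"),     # reply to alice
    (4, 1, "carol"),   # another top-level comment
])

# recursive CTE: collect everything under story 1, tracking nesting depth
thread = con.execute("""
    WITH RECURSIVE tree(id, by, depth) AS (
        SELECT id, by, 0 FROM items WHERE id = 1
        UNION ALL
        SELECT i.id, i.by, t.depth + 1
        FROM items i JOIN tree t ON i.parent = t.id
    )
    SELECT id, by, depth FROM tree ORDER BY id
""").fetchall()
print(thread)  # [(1, 'op', 0), (2, 'alice', 1), (3, 'bob', 2), (4, 'carol', 1)]
```
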

thomasmarton•18m ago
While not a dump per se, there is an API where you can get HN data programmatically, no scraping needed.

https://github.com/HackerNews/API

abetusk•11m ago
Alas, HN does not belong to us, and the existence of projects like this is subject to the whims of the legal owners of HN.

From the terms of use [0]:

"""

Commercial Use: Unless otherwise expressly authorized herein or in the Site, you agree not to display, distribute, license, perform, publish, reproduce, duplicate, copy, create derivative works from, modify, sell, resell, exploit, transfer or upload for any commercial purposes, any portion of the Site, use of the Site, or access to the Site. The buying, exchanging, selling and/or promotion (commercial or otherwise) of upvotes, comments, submissions, accounts (or any aspect of your account or any other account), karma, and/or content is strictly prohibited, constitutes a material breach of these Terms of Use, and could result in legal liability.

"""

[0] https://www.ycombinator.com/legal/#tou

m-p-3•4m ago
Looks like the repo was taken down (404).