Show HN: 22 GB of Hacker News in SQLite

https://hackerbook.dosaygo.com
42•keepamovin•2h ago

Comments

keepamovin•2h ago
Community, All the HN belong to you. This is an archive of Hacker News that fits in your browser. When I made HN Made of Primes I realized I could probably do this offline SQLite/WASM thing with the whole archive, all the GBs of it. The whole dataset. So I tried it, and this is it. Have Hacker News on your device.

Go to this repo (https://github.com/DOSAYGO-STUDIO/HackerBook): you can download it. BigQuery -> ETL -> npx serve docs - that's it. 20 years of HN arguments and beauty can be yours forever. So they'll never die. Ever. It's the unkillable static archive of HN and it's in your hands. That's my year-end gift to you all. Thank you for a wonderful year, and have a happy and wonderful 2026. Make something of it.

carbocation•30m ago
That repo is throwing up a 404 for me.

Question - did you consider tradeoffs between duckdb (or other columnar stores) and SQLite?

keepamovin•21m ago
No, I just went straight to sqlite. What is duckdb?
cess11•7m ago
It is very similar to SQLite in that it can run in-process and store its data as a file.

It's different in that it is tailored to analytics: among other things, storage is columnar, and it can query some common data analytics file formats directly.
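
To make the "in-process, data stored as a file" point concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The file name and the two-column table are made up for illustration and are not taken from the HackerBook repo.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name, used only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), "hn_demo.sqlite")

# Opening a connection creates the file; no server process is involved.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO items (id, title) VALUES (?, ?)", (1, "Show HN: demo"))
conn.commit()
conn.close()

# The whole database is that one file, so it can be copied or reopened later.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT id, title FROM items").fetchall()
conn.close()
file_exists = os.path.isfile(db_path)
```

DuckDB follows the same embedded model through its own client library, which is not shown here.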

fsiefken•2m ago
DuckDB is an open-source column-oriented Relational Database Management System (RDBMS). It's designed to provide high performance on complex queries against large databases in embedded configuration.

It has transparent compression built-in and has support for natural language queries. https://buckenhofer.com/2025/11/agentic-ai-with-duckdb-and-s...

"DICT FSST (Dictionary FSST) represents a hybrid compression technique that combines the benefits of Dictionary Encoding with the string-level compression capabilities of FSST. This approach was implemented and integrated into DuckDB as part of ongoing efforts to optimize string storage and processing performance." https://homepages.cwi.nl/~boncz/msc/2025-YanLannaAlexandre.p...
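
For readers unfamiliar with the first half of that name, here is a rough sketch of plain dictionary encoding in Python. It only illustrates the general idea of replacing repeated strings with small integer codes; it does not implement FSST and is not DuckDB's code.

```python
def dict_encode(values):
    """Replace each string with an integer code into a table of distinct strings."""
    dictionary = []      # code -> string
    codes_by_value = {}  # string -> code
    codes = []
    for value in values:
        if value not in codes_by_value:
            codes_by_value[value] = len(dictionary)
            dictionary.append(value)
        codes.append(codes_by_value[value])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# A low-cardinality column, such as an item type, compresses well this way.
column = ["story", "comment", "comment", "story", "comment"]
dictionary, codes = dict_encode(column)
```

Each distinct string is stored once, and the column itself becomes a list of small integers.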

linhns•20m ago
Not the author here. I'm not sure about DuckDB, but SQLite allows you to simply use a file as a database, and for archiving that's really helpful. One file, that's it.
cobolcomesback•14m ago
DuckDB does as well. A super simplified explanation of duckdb is that it’s sqlite but columnar, and so is better for analytics of large datasets.
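
A toy illustration of the row versus columnar distinction, in plain Python. The records are invented, and real engines add compression and vectorized execution on top of this layout difference.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "type": "story", "time": 100},
    {"id": 2, "type": "comment", "time": 105},
    {"id": 3, "type": "comment", "time": 110},
]

# Column-oriented: one list per field, holding that field for every record.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one field has to visit every full record in the row layout...
max_time_rows = max(row["time"] for row in rows)

# ...but only scans a single list in the column layout.
max_time_cols = max(columns["time"])
comment_count = columns["type"].count("comment")
```

Analytics queries tend to read a few fields across many records, which is the access pattern the column layout favors.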
formerly_proven•5m ago
The schema is this: items(id INTEGER PRIMARY KEY, type TEXT, time INTEGER, by TEXT, title TEXT, text TEXT, url TEXT

Doesn't scream columnar database to me.

embedding-shape•2m ago
At a glance, that is missing (at least) a `parent` or `parent_id` attribute which items in HN can have (and you kind of need if you want to render comments), see http://hn.algolia.com/api/v1/items/46436741
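
A sketch of why a parent reference matters for rendering comments, using Python's sqlite3 with the columns quoted above plus an assumed `parent` column. The column name, the sample rows, and the query are illustrative only and may not match the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        type TEXT,
        time INTEGER,
        "by" TEXT,
        title TEXT,
        text TEXT,
        url TEXT,
        parent INTEGER  -- assumed column, not part of the schema quoted above
    )
""")
conn.executemany(
    'INSERT INTO items (id, type, "by", title, text, parent) VALUES (?, ?, ?, ?, ?, ?)',
    [
        (1, "story", "alice", "Show HN: demo", None, None),
        (2, "comment", "bob", None, "Nice work", 1),
        (3, "comment", "carol", None, "Agreed", 2),
        (4, "comment", "dave", None, "Which schema?", 1),
    ],
)

# Walk the reply tree under story 1 with a recursive CTE, tracking depth.
thread = conn.execute("""
    WITH RECURSIVE thread(id, author, body, depth) AS (
        SELECT id, "by", text, 0 FROM items WHERE parent = 1
        UNION ALL
        SELECT i.id, i."by", i.text, t.depth + 1
        FROM items AS i JOIN thread AS t ON i.parent = t.id
    )
    SELECT id, author, body, depth FROM thread ORDER BY id
""").fetchall()
conn.close()

# Indent each comment by its depth to show the nesting.
for _id, author, body, depth in thread:
    print("  " * depth + f"{author}: {body}")
```

Without some parent link, the replies could be listed but not nested.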
3eb7988a1663•17m ago
While I suspect DuckDB would compress better, given the ubiquity of SQLite, it seems a fine standard choice.
wslh•23m ago
Is this updated regularly? 404 on GitHub, as the other comment noted.

With all due respect, it would be great if there were an official HN public dump available (and not requiring stuff such as BigQuery, which is expensive).

yupyupyups•16m ago
1 hour passed and it's already nuked?

Thank you btw

asdefghyk•1h ago
How much space is needed for the data? I'm wondering if it would work on a tablet.
keepamovin•1h ago
~9GB gzipped.
zX41ZdbW•18m ago
The query tab looks quite complex with all these content shards: https://hackerbook.dosaygo.com/?view=query

I have a much simpler database: https://play.clickhouse.com/play?user=play#U0VMRUNUIHRpbWUsI...

embedding-shape•3m ago
Does your database also run offline/locally in the browser? That seems to be the reason for the large number of shards.
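
One way sharding can keep a browser from fetching the whole archive is to split items by id range and look up only the file that holds a given id. The sketch below is purely hypothetical: the shard boundaries and file names are invented, and the thread does not describe how HackerBook actually lays out its shards.

```python
import bisect

# Hypothetical shard table: the first item id stored in each shard file.
SHARD_STARTS = [1, 1_000_000, 2_000_000, 3_000_000]
SHARD_FILES = [f"shard_{i}.sqlite" for i in range(len(SHARD_STARTS))]


def shard_for(item_id):
    """Return the (hypothetical) shard file whose id range contains item_id."""
    if item_id < SHARD_STARTS[0]:
        raise ValueError("item id precedes the first shard")
    # bisect_right finds the first start greater than item_id; step back one.
    index = bisect.bisect_right(SHARD_STARTS, item_id) - 1
    return SHARD_FILES[index]
```

A client would then download only `shard_for(some_id)` rather than the full dataset.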
