frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

Gmail to SQLite

https://github.com/marcboeker/gmail-to-sqlite
105•tehlike•5h ago

Comments

terhechte•3h ago
I build something to visualize huge amounts of email (such as from Gmail) some years ago:

https://github.com/terhechte/postsack

andai•3h ago
Hey this is really neat! It's like those disk usage visualizers, except that it seems to focus on the total volume of the mail rather than the disk usage.

Is there a size option too? To see which senders are using most of my storage.

(Also your website's SSL certificate has expired.)

terhechte•2h ago
No currently not. It would be easy to add though. I haven't updated the tool in a while (after using it to clean up my Gmail inbox). Thanks for pointing out the certificate!
TekMol•3h ago
Shouldn't this be "imap to sqlite" or something? Why tie it to one specific email provider?
isaachinman•2h ago
Because _it is_ specific to Gmail. It's using OAuth and presumable API access.

IMAP is much harder, and much slower, and is bound by Google's bandwidth limits.

yread•3h ago
Would be nice to enable fulltext search as well
padjo•2h ago
Yes! I find gmail’s full text search surprisingly bad given it’s run by a search company.
isaachinman•2h ago
Agreed! One of the reasons we started working on Marco.

https://marcoapp.io

porker•59m ago
But not as bad as Outlook 365's search...
jbverschoor•3m ago
Boot as bad as the Mail.app from iOS and macOS
unsnap_biceps•3h ago
I'm curious as to why you choose to break out specific headers in the schema.

For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.

If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.

For example

  CREATE TABLE IF NOT EXISTS "messages" (
    "id" INTEGER NOT NULL PRIMARY KEY, -- internal id
    "message_id" TEXT NOT NULL, -- Gmail message id
    "thread_id" TEXT NOT NULL, -- Gmail thread id
    "headers" JSON NOT NULL, -- JSON object of { "header": value },
    "subject" TEXT GENERATED ALWAYS AS (json_extract("headers", '$.Subject')) VIRTUAL NOT NULL)
    ...
  );
  CREATE INDEX subjectidx on messages(subject);
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as

  ALTER TABLE messages ADD dkim TEXT GENERATED ALWAYS AS (json_extract("headers", '$."Dkim-Signature"')) VIRTUAL NOT NULL);
  CREATE INDEX dkimidx on messages(dkim);
  SELECT dkim, COUNT(0) FROM messages GROUP BY dkim;
or whatever you want.
xearl•2h ago
TIL, thanks a lot!
dotancohen•2h ago
I see that you defined the `dkim` column as NOT NULL. So what happens when an email message does not contain the Dkim-Signature header?
hun3•1h ago
Probably something like

  Error: stepping, NOT NULL constraint failed: messages.dkim (19)
because, unlike MySQL, SQLite apparently returns SQL NULL for JSON null value.
jokoon•3h ago
I would have preferred a script that parses the mail backup Google sends you.

I think it's a big eml file.

ukuina•2h ago
Google Takeout regularly fails to complete for me. Syncing via the API seems like a reasonable alternative.
einpoklum•1h ago
Let us stop using GMail:

* Google collects vast amounts of personal data, specifically through receiving all of your email and analyizing it.

* It builds elaborate user profiles and uses them to target you with ads designed to better influence you.

* Its hold on information (from different sources) has made it excessively powerful economically, and thus also politically.

* Google/Alphabet has long started to affect legislation, including through direct registered lobbying: ~15 Million USD in 2024 (opensecrets.org).

* It has been known to pass, and likely still passes, the information it collects - including copies of your email correspondence - on to the US government (Edward Snowden leaks).

and finally:

* There are multiple email providers, many of them quite good - both for pay and gratis. Naturally most of the gratis ones have their own interests, but nothing like Google.

the_mitsuhiko•1h ago
I really lament that you cannot sign in even with an application specific password any more and you need to get an oauth client and go through an oauth flow. It’s my email, but Google takes away an open standard even for myself to access it.
pdyc•54m ago
this is great if only there was a tool for whatsapp to sqlite it would make my data so much more useful
philipwhiuk•15m ago
https://medium.com/@Med1um1/extracting-whatsapp-messages-fro...

Vision Now Available in Llama.cpp

https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md
236•redman25•6h ago•65 comments

Not a three-year-old chimney sweep (2022)

https://fakehistoryhunter.net/2022/07/26/not-a-3-year-old-chimney-sweep/
62•nixass•3h ago•38 comments

Private Japanese lunar lander enters orbit around moon ahead of a June touchdown

https://phys.org/news/2025-05-private-japanese-lunar-lander-orbit.html
64•pseudolus•3d ago•0 comments

Slow Software for a Burning World

https://bonfirenetworks.org/posts/slow_software_for_a_burning_world/
61•todsacerdoti•3h ago•28 comments

Gmail to SQLite

https://github.com/marcboeker/gmail-to-sqlite
105•tehlike•5h ago•20 comments

Business books are entertainment, not strategic tools

https://theorthagonist.substack.com/p/why-reading-business-books-is-a-waste
328•ZeroTalent•13h ago•138 comments

CT scans don't lie–cigarettes are harder on the lungs than marijuana

https://healthimaging.com/topics/medical-imaging/computed-tomography-ct/ct-scans-dont-lie-cigarettes-are-harder-lungs-marijuana
6•XzetaU8•1h ago•0 comments

Why should I care? Or why punks are correct and old wise philosophers are wrong

https://abuseofnotation.github.io/moral-law/
23•boris_m•2h ago•10 comments

Internet Roadtrip: Vote to steer

https://neal.fun/internet-roadtrip/
136•memalign•3d ago•25 comments

Detect and crash Chromium bots with one weird trick (bots hate it)

https://blog.castle.io/detect-and-crash-chromium-bots-with-one-weird-trick-bots-hate-it/
19•avastel•2d ago•2 comments

Ash (Almquist Shell) Variants

https://www.in-ulm.de/~mascheck/various/ash/
10•thefilmore•2h ago•0 comments

ALICE detects the conversion of lead into gold at the LHC

https://www.home.cern/news/news/physics/alice-detects-conversion-lead-gold-lhc
552•miiiiiike•19h ago•276 comments

Brandon's Semiconductor Simulator

https://brandonli.net/semisim/
147•dominikh•9h ago•15 comments

A simple 16x16 dot animation from simple math rules

https://tixy.land
59•andrewrn•7h ago•10 comments

WebGL Water (2010)

https://madebyevan.com/webgl-water/
157•gaws•10h ago•47 comments

Fleurs du Mal

https://fleursdumal.org
121•Frummy•11h ago•43 comments

Charles Bukowski, William Burroughs, and the Computer (2009)

https://realitystudio.org/bibliographic-bunker/charles-bukowski-william-burroughs-and-the-computer/
65•zdw•9h ago•17 comments

Cosmos 482 Descent Craft tracker

http://astria.tacc.utexas.edu/AstriaGraph/
8•Kaibeezy•1h ago•4 comments

6502 Illegal Opcodes in the Siemens PC 100 Assembly Manual

https://www.pagetable.com/?p=1798
28•ingve•1d ago•1 comments

Sofie: open-source web based system for automating live TV news production

https://nrkno.github.io/sofie-core/
324•rjmunro•20h ago•41 comments

Rust’s dependencies are starting to worry me

https://vincents.dev/blog/rust-dependencies-scare-me/?
309•chaosprint•1d ago•372 comments

How much information is in DNA?

https://dynomight.substack.com/p/dna
3•crescit_eundo•1d ago•3 comments

Stratolaunch Successfully Completes Reusable Hypersonic Flight and Recovery

https://www.stratolaunch.com/news/stratolaunch-successfully-completes-reusable-hypersonic-flight-and-recovery-with-talon-a2-vehicle/
46•speckx•2d ago•4 comments

21 GB/s CSV Parsing Using SIMD on AMD 9950X

https://nietras.com/2025/05/09/sep-0-10-0/
288•zigzag312•20h ago•146 comments

Launch HN: Nao Labs (YC X25) – Cursor for Data

143•ClaireGz•17h ago•54 comments

After 16 years, we're renewing the StackOverflow Brand

https://meta.stackexchange.com/questions/408823/after-16-years-we-re-renewing-the-brand
27•walterbell•2h ago•24 comments

What’s new in Swift 6.2

https://www.hackingwithswift.com/articles/277/whats-new-in-swift-6-2
155•ingve•13h ago•157 comments

PlainBudget – Minimalist Plain Text Budgeting

https://plainbudget.com/
67•jgalvez•10h ago•16 comments

Itter.sh – Micro-Blogging via Terminal

https://www.itter.sh/
243•rrr_oh_man•20h ago•67 comments

Show HN: Hyvector – A fast and modern SVG editor

https://www.hyvector.com
290•jansan•23h ago•81 comments