frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

Gmail to SQLite

https://github.com/marcboeker/gmail-to-sqlite
87•tehlike•5h ago

Comments

terhechte•3h ago
I build something to visualize huge amounts of email (such as from Gmail) some years ago:

https://github.com/terhechte/postsack

andai•3h ago
Hey this is really neat! It's like those disk usage visualizers, except that it seems to focus on the total volume of the mail rather than the disk usage.

Is there a size option too? To see which senders are using most of my storage.

(Also your website's SSL certificate has expired.)

terhechte•1h ago
No currently not. It would be easy to add though. I haven't updated the tool in a while (after using it to clean up my Gmail inbox). Thanks for pointing out the certificate!
TekMol•2h ago
Shouldn't this be "imap to sqlite" or something? Why tie it to one specific email provider?
isaachinman•1h ago
Because _it is_ specific to Gmail. It's using OAuth and presumable API access.

IMAP is much harder, and much slower, and is bound by Google's bandwidth limits.

yread•2h ago
Would be nice to enable fulltext search as well
padjo•2h ago
Yes! I find gmail’s full text search surprisingly bad given it’s run by a search company.
isaachinman•1h ago
Agreed! One of the reasons we started working on Marco.

https://marcoapp.io

porker•15m ago
But not as bad as Outlook 365's search...
unsnap_biceps•2h ago
I'm curious as to why you choose to break out specific headers in the schema.

For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.

If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.

For example

  CREATE TABLE IF NOT EXISTS "messages" (
    "id" INTEGER NOT NULL PRIMARY KEY, -- internal id
    "message_id" TEXT NOT NULL, -- Gmail message id
    "thread_id" TEXT NOT NULL, -- Gmail thread id
    "headers" JSON NOT NULL, -- JSON object of { "header": value },
    "subject" TEXT GENERATED ALWAYS AS (json_extract("headers", '$.Subject')) VIRTUAL NOT NULL)
    ...
  );
  CREATE INDEX subjectidx on messages(subject);
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as

  ALTER TABLE messages ADD dkim TEXT GENERATED ALWAYS AS (json_extract("headers", '$."Dkim-Signature"')) VIRTUAL NOT NULL);
  CREATE INDEX dkimidx on messages(dkim);
  SELECT dkim, COUNT(0) FROM messages GROUP BY dkim;
or whatever you want.
xearl•2h ago
TIL, thanks a lot!
dotancohen•1h ago
I see that you defined the `dkim` column as NOT NULL. So what happens when an email message does not contain the Dkim-Signature header?
hun3•1h ago
Probably something like

  Error: stepping, NOT NULL constraint failed: messages.dkim (19)
because, unlike MySQL, SQLite apparently returns SQL NULL for JSON null value.
jokoon•2h ago
I would have preferred a script that parses the mail backup Google sends you.

I think it's a big eml file.

ukuina•1h ago
Google Takeout regularly fails to complete for me. Syncing via the API seems like a reasonable alternative.
einpoklum•1h ago
Let us stop using GMail:

* Google collects vast amounts of personal data, specifically through receiving all of your email and analyizing it.

* It builds elaborate user profiles and uses them to target you with ads designed to better influence you.

* Its hold on information (from different sources) has made it excessively powerful economically, and thus also politically.

* Google/Alphabet has long started to affect legislation, including through direct registered lobbying: ~15 Million USD in 2024 (opensecrets.org).

* It has been known to pass, and likely still passes, the information it collects - including copies of your email correspondence - on to the US government (Edward Snowden leaks).

and finally:

* There are multiple email providers, many of them quite good - both for pay and gratis. Naturally most of the gratis ones have their own interests, but nothing like Google.

the_mitsuhiko•1h ago
I really lament that you cannot sign in even with an application specific password any more and you need to get an oauth client and go through an oauth flow. It’s my email, but Google takes away an open standard even for myself to access it.
pdyc•9m ago
this is great if only there was a tool for whatsapp to sqlite it would make my data so much more useful

GRPO experiment: I trained a Language Model to schedule events

https://github.com/anakin87/qwen-scheduler-grpo
1•anakin87•4m ago•1 comments

Chinese jacket maker is now the biggest company, according to Bloomberg

https://www.ft.com/content/bf5fd95f-115a-4770-820e-4ead24eb05ab
2•KnuthIsGod•24m ago•0 comments

Show HN: The Internet Rich List

https://www.theinternetrichlist.com/
1•sixpoundham•25m ago•0 comments

Bold linker v0.2.0 release – bold just got faster

https://github.com/kubkon/bold/releases/tag/v0.2.0
1•todsacerdoti•27m ago•0 comments

Rules-based world order in retreat and violence on the rise. Has WW3 begun?

https://www.theguardian.com/news/ng-interactive/2025/may/10/are-we-heading-for-another-world-war-or-has-it-already-started
4•prmph•30m ago•1 comments

Bret Victor – The Future of Programming (2013) [video]

https://vimeo.com/71278954
2•rapnie•32m ago•1 comments

RPG in a Box

https://rpginabox.com/
2•skibz•38m ago•0 comments

Show HN: Voice Cloning from American English to Indian Langs

https://sarypslabs.com
1•siddhartha_soma•39m ago•0 comments

Countries could use forests to 'mask' needed emission cuts: Report

https://phys.org/news/2025-04-countries-forests-mask-emission.html
1•PaulHoule•39m ago•0 comments

Pakistan/India accuse each other of striking military bases in major escalation

https://www.theguardian.com/world/live/2025/may/10/pakistan-says-three-air-bases-attacked-by-indian-missiles-live-updates
2•spzx•39m ago•0 comments

Factorio lets fix video #1 – kovarex

https://www.youtube.com/watch?v=AmliviVGX8Q
1•sensanaty•41m ago•0 comments

Dia – Executive Assistant Waitlist is now live

1•AnshulT•41m ago•0 comments

Cosmos 482 Descent Craft tracker

http://astria.tacc.utexas.edu/AstriaGraph/
4•Kaibeezy•45m ago•2 comments

CT scans don't lie–cigarettes are harder on the lungs than marijuana

https://healthimaging.com/topics/medical-imaging/computed-tomography-ct/ct-scans-dont-lie-cigarettes-are-harder-lungs-marijuana
2•XzetaU8•48m ago•0 comments

The US has 1,001 measles cases and 11 states with active outbreaks

https://www.independent.co.uk/news/montana-new-mexico-ohio-oklahoma-pennsylvania-b2748266.html
3•amarcheschi•48m ago•0 comments

Google is bringing automatic passkey upgrades to Android (APK teardown)

https://www.androidauthority.com/google-automatic-passkey-upgrades-android-apk-teardown-3555935/
1•unigiri•50m ago•0 comments

Quiz Machine – Simple and quick indie AI live quiz generator

https://www.quiz-machine.com/en
1•kurgandev•51m ago•1 comments

Unveiling the Future of Graphics: 'Zorah' UE5 Path-Tracing Demo with RTX 5090

https://www.mmfooty.com/nvidia-108gb-ue5-path-tracing-demo-zorah/
1•lamariexa•52m ago•0 comments

A conversation with Jony Ive [video]

https://www.youtube.com/watch?v=wLb9g_8r-mE
1•tosh•53m ago•0 comments

Show HN: ToolRegistry – A Python Library for Structured Tool Integration

https://github.com/Oaklight/ToolRegistry
2•Oaklight•1h ago•1 comments

After 16 years, we're renewing the StackOverflow Brand

https://meta.stackexchange.com/questions/408823/after-16-years-we-re-renewing-the-brand
18•walterbell•1h ago•9 comments

Everything You Need to Know About Sparkle's Upcoming 24GB Arc Battlemage GPU

https://old.reddit.com/r/intel_arc_graphics/comments/1kj5906/sparkle_teases_arc_battlemage_gpu_with_24gb/
1•lamariexa•1h ago•0 comments

End of History – Francis Fukuyama (1992)

https://www.jstor.org/stable/24027184
2•thunderbong•1h ago•1 comments

5 Steps to N-Body Simulation

https://alvinng4.github.io/grav_sim/5_steps_to_n_body_simulation/
1•dargscisyhp•1h ago•0 comments

Ash (Almquist Shell) Variants

https://www.in-ulm.de/~mascheck/various/ash/
5•thefilmore•1h ago•0 comments

'Just Write Great Content' Is the Worst SEO Advice You'll Hear

https://news.seoforfounders.com/p/seo-myth-busting-1-great-content-ranks-itself
4•contctlink•1h ago•2 comments

So Yeah – I hate useEffect

https://substack.com/inbox/post/163261746
1•hmontazeri•1h ago•0 comments

15 Years of Shader Minification

https://www.ctrl-alt-test.fr/2025/15-years-of-shader-minification/
1•laurentlb•1h ago•0 comments

Apple hits back at US judge's 'extraordinary' contempt order

https://www.bbc.com/news/articles/c3r8rg4w2v0o
4•chrisjj•1h ago•0 comments

EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs

https://github.com/em-llm/EM-LLM-model
2•jbotz•1h ago•0 comments