New budget financial API, based on EDGAR data

7•jgfriedman1999•20h ago
Hey everyone,

I'm the developer of an open-source (MIT License) Python package to convert SEC submissions into useful data. I've recently put a bunch of stuff in the cloud for a nominal convenience fee.

Cloud:

1. SEC Websocket - notifies you of new submissions as they come out. (Free)

2. SEC Archive - download SEC submissions without rate limits. ($1/100,000 downloads)

3. MySQL RDS ($1/million rows returned)

- XBRL

- Fundamentals

- Institutional Holdings

- Insider Transactions

- Proxy Voting Records
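For a rough sense of cost, the listed prices can be turned into a quick estimate. This is a minimal sketch: the function name is hypothetical, and it assumes billing is strictly linear with no minimums or rounding, which the post does not state.

```python
# Prices quoted in the post; strictly linear billing is an assumption.
DOWNLOADS_PER_USD = 100_000      # SEC Archive: $1 per 100,000 downloads
ROWS_PER_USD = 1_000_000         # MySQL RDS: $1 per million rows returned


def estimate_cost_usd(downloads: int = 0, rows_returned: int = 0) -> float:
    """Estimate spend in USD for archive downloads plus RDS rows returned."""
    if downloads < 0 or rows_returned < 0:
        raise ValueError("counts must be non-negative")
    return downloads / DOWNLOADS_PER_USD + rows_returned / ROWS_PER_USD


if __name__ == "__main__":
    # 250,000 downloads plus 5 million rows returned
    print(f"${estimate_cost_usd(250_000, 5_000_000):.2f}")  # prints $7.50
```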

Posting here, in case someone finds it useful.

Links:

Datamule (Package) GitHub: https://github.com/john-friedman/datamule-python

Documentation: https://john-friedman.github.io/datamule-python/datamule-python/sheet/sheet/

Get an API Key: https://datamule.xyz/dashboard2.html

Comments

jgfriedman1999•20h ago
How it works:

Websocket:

1. Two AWS EC2 t4g.nano instances polling the SEC's RSS and EFTS endpoints. (RSS is faster, EFTS is complete.)

2. When new submissions are detected, they are sent to the Websocket (t4g.micro websocket, using Go for greater concurrency).

3. Websocket sends signal to consumers.
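Since the same submission can appear in both the RSS and EFTS feeds, the pollers need some way to forward each one only once. A minimal sketch of that idea, assuming submissions are identified by accession number. The class and method names are hypothetical and this is not the service's actual code.

```python
from typing import Iterable, List, Set


class SubmissionDeduper:
    """Tracks accession numbers that have already been forwarded."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def new_submissions(self, polled: Iterable[str]) -> List[str]:
        """Return accession numbers not seen before, preserving polled order."""
        fresh = []
        for accession in polled:
            if accession not in self._seen:
                self._seen.add(accession)
                fresh.append(accession)
        return fresh


if __name__ == "__main__":
    deduper = SubmissionDeduper()
    # Placeholder accession numbers, not real filings.
    rss_batch = ["0000000000-25-000001", "0000000000-25-000002"]
    efts_batch = ["0000000000-25-000002", "0000000000-25-000003"]
    print(deduper.new_submissions(rss_batch))   # both entries are new
    print(deduper.new_submissions(efts_batch))  # only the last entry is new
```

An in-memory set is the simplest choice here; a long-running poller would presumably need to bound or persist it.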

Archive:

1. One t4g.micro instance. Receives notifications from websocket, then gets submissions SGML from the SEC.

2. If submission is over size threshold, compresses with zstandard.

3. Uploads submissions to Cloudflare R2 bucket. (Zero egress fee, just class A / B operations.)

4. Cloudflare R2 bucket is proxied behind my domain, with caching.
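The "compress only above a size threshold" step could look roughly like this. It is a sketch under stated assumptions: the threshold value is made up (the real one is not given), the function name is hypothetical, and the standard library's zlib is swapped in for zstandard so the example runs without extra packages.

```python
import zlib
from typing import Tuple

# Hypothetical threshold; the actual value used by the archive is not stated.
SIZE_THRESHOLD_BYTES = 1024 * 1024


def prepare_for_upload(
    sgml: bytes, threshold: int = SIZE_THRESHOLD_BYTES
) -> Tuple[bytes, bool]:
    """Return (payload, was_compressed); compress only payloads over the threshold."""
    if len(sgml) > threshold:
        # zlib stands in for zstandard here.
        return zlib.compress(sgml), True
    return sgml, False


if __name__ == "__main__":
    small = b"<SEC-DOCUMENT>tiny</SEC-DOCUMENT>"
    large = b"<SEC-DOCUMENT>" + b"x" * (2 * 1024 * 1024) + b"</SEC-DOCUMENT>"
    for name, blob in (("small", small), ("large", large)):
        payload, compressed = prepare_for_upload(blob)
        print(name, len(blob), len(payload), compressed)
```

Returning the flag alongside the payload lets the uploader record whether a consumer needs to decompress the object.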

RDS:

1. ECS Fargate instances set to run daily at 9 AM UTC.

2. Downloads data from the archive, parses it, and uploads it into an AWS dbt.medium MySQL RDS instance.

3. Also handles reconciliation for the archive in case any filings were missed.
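The reconciliation step amounts to comparing what should exist against what the archive holds. A minimal sketch, assuming both sides can be listed as accession numbers. The function name and inputs are hypothetical.

```python
from typing import Iterable, List


def find_missing_filings(expected: Iterable[str], archived: Iterable[str]) -> List[str]:
    """Return accession numbers present upstream but absent from the archive, sorted."""
    return sorted(set(expected) - set(archived))


if __name__ == "__main__":
    # Placeholder accession numbers, not real filings.
    expected = ["0000000000-25-000001", "0000000000-25-000002", "0000000000-25-000003"]
    archived = ["0000000000-25-000001", "0000000000-25-000003"]
    print(find_missing_filings(expected, archived))  # prints ['0000000000-25-000002']
```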

conditionnumber•14h ago
Cool, EDGAR is an amazing public service. I think they use Akamai as their CDN so the downloads are remarkably fast.

A few years ago I wrote an SGML parser for the full SEC PDS specification (super tedious). But I have trouble leveraging my own efforts for independent research because I don't have a reliable securities master to link against. I can't take a historical CUSIP from 13F filings and associate it to a historical ticker/return. Or my returns are wrong because of data errors so I can't fit a factor model to run an event study using Form 4 data.

I think what's missing is a serious open source effort to integrate/cleanse the various cheapo data vendors into something reasonably approximating the quality you get out of a CRSP/Compustat.

jgfriedman1999•15m ago
Yep! Pretty sure it is still Akamai. Via testing I've noticed they cap downloads at ~6mbps from e.g. home internet, but not GitHub or AWS.

SGML parsing is fun! I've open-sourced an SGML parser here: https://github.com/john-friedman/secsgml

Securities master to link against - Interesting. Here's a pipeline off the top of my head:

1. Get CUSIP, nameOfIssuer, titleOfClass using the Institutional Holdings database.

2. Use the company metadata crosswalk to link CUSIP + titleOfClass to nameOfIssuer to get cik: https://github.com/john-friedman/datamule-data/blob/master/d... (recompiled daily using GH actions).

3. Get e.g. us-gaap:EarningsPerShareBasic from the XBRL database. Link using cik. Types of stock might be a member - so e.g. Class A, Class B? Not sure there.
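The three steps can be sketched with in-memory lookups. Everything here is illustrative: the rows are placeholder values rather than real securities, the crosswalk key layout (CUSIP plus titleOfClass) is an assumption, and `link_holdings` is a hypothetical helper, not part of datamule.

```python
from typing import Dict, List, Optional, Tuple

EPS_TAG = "us-gaap:EarningsPerShareBasic"

# Step 1: rows as they might come from the Institutional Holdings database.
holdings: List[Dict[str, str]] = [
    {"cusip": "000000AA0", "nameOfIssuer": "EXAMPLE CORP", "titleOfClass": "COM"},
    {"cusip": "000000BB0", "nameOfIssuer": "UNMATCHED CO", "titleOfClass": "COM"},
]

# Step 2: crosswalk from (cusip, titleOfClass) to cik; key layout is assumed.
crosswalk: Dict[Tuple[str, str], str] = {("000000AA0", "COM"): "0000000001"}

# Step 3: XBRL facts keyed by (cik, tag).
xbrl_facts: Dict[Tuple[str, str], float] = {("0000000001", EPS_TAG): 1.23}


def link_holdings(
    holdings: List[Dict[str, str]],
    crosswalk: Dict[Tuple[str, str], str],
    xbrl_facts: Dict[Tuple[str, str], float],
    tag: str = EPS_TAG,
) -> List[dict]:
    """Attach cik and the requested XBRL fact to each holding; None where unmatched."""
    linked = []
    for row in holdings:
        cik: Optional[str] = crosswalk.get((row["cusip"], row["titleOfClass"]))
        value = xbrl_facts.get((cik, tag)) if cik is not None else None
        linked.append({**row, "cik": cik, tag: value})
    return linked


if __name__ == "__main__":
    for row in link_holdings(holdings, crosswalk, xbrl_facts):
        print(row)
```

Unmatched rows are kept with `None` values rather than dropped, so gaps in the crosswalk stay visible.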

For form 4, not sure what you mean by event study. Would love to know!
