
New budget financial API, based on EDGAR data

7•jgfriedman1999•6mo ago
Hey everyone,

I'm the developer of an open-source (MIT License) Python package that converts SEC submissions into useful data. I've recently put a bunch of it in the cloud for a nominal convenience fee.

Cloud:

1. SEC Websocket - notifies you of new submissions as they come out. (Free)

2. SEC Archive - download SEC submissions without rate limits. ($1/100,000 downloads)

3. MySQL RDS ($1/million rows returned)

- XBRL

- Fundamentals

- Institutional Holdings

- Insider Transactions

- Proxy Voting Records

Posting here, in case someone finds it useful.

Links:

Datamule (Package) GitHub: https://github.com/john-friedman/datamule-python

Documentation: https://john-friedman.github.io/datamule-python/datamule-python/sheet/sheet/

Get an API Key: https://datamule.xyz/dashboard2.html

Comments

jgfriedman1999•6mo ago
How it works:

Websocket:

1. Two AWS EC2 t4g.nano instances poll the SEC's RSS and EFTS endpoints. (RSS is faster; EFTS is complete.)

2. When new submissions are detected, they are sent to the websocket server (a t4g.micro instance, written in Go for greater concurrency).

3. The websocket broadcasts the notification to consumers.
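
The two-feed polling step is essentially a dedup problem: keep a set of accession numbers already announced and only forward new ones. A minimal sketch (the feed contents here are made up for illustration):

```python
# Deduplicate accession numbers arriving from two overlapping feeds:
# RSS (faster but partial) and EFTS (slower but complete).
def merge_feeds(seen, *feeds):
    """Yield accession numbers not announced before, from any feed."""
    for feed in feeds:
        for accession in feed:
            if accession not in seen:
                seen.add(accession)
                yield accession

seen = set()
rss = ["0000320193-26-000001", "0000320193-26-000002"]
efts = ["0000320193-26-000002", "0000320193-26-000003"]  # overlaps RSS
new = list(merge_feeds(seen, rss, efts))
assert new == ["0000320193-26-000001",
               "0000320193-26-000002",
               "0000320193-26-000003"]
```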

Archive:

1. One t4g.micro instance receives notifications from the websocket, then fetches the submission SGML from the SEC.

2. If a submission is over the size threshold, it is compressed with zstandard.

3. Submissions are uploaded to a Cloudflare R2 bucket. (Zero egress fees, just Class A/B operations.)

4. The R2 bucket is proxied behind my domain, with caching.
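
The conditional-compression step can be sketched as follows; the 1 MB threshold is an assumed value, and stdlib zlib stands in for zstandard to keep the example dependency-free:

```python
import zlib

SIZE_THRESHOLD = 1_000_000  # bytes; illustrative, not the real cutoff

def prepare_for_upload(sgml: bytes):
    """Compress a submission only if it exceeds the size threshold.

    Returns (payload, compressed_flag) for the upload step.
    """
    if len(sgml) > SIZE_THRESHOLD:
        return zlib.compress(sgml), True
    return sgml, False

small = b"<SEC-DOCUMENT>tiny</SEC-DOCUMENT>"
large = b"x" * 2_000_000
assert prepare_for_upload(small) == (small, False)
payload, compressed = prepare_for_upload(large)
assert compressed and zlib.decompress(payload) == large
```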

RDS

1. ECS Fargate tasks run daily at 9 AM UTC.

2. They download data from the archive, parse it, and upload it into a MySQL RDS instance (db.t4g.medium).

3. They also handle reconciliation for the archive in case any filings were missed.

conditionnumber•6mo ago
Cool, EDGAR is an amazing public service. I think they use Akamai as their CDN so the downloads are remarkably fast.

A few years ago I wrote an SGML parser for the full SEC PDS specification (super tedious). But I have trouble leveraging my own efforts for independent research because I don't have a reliable securities master to link against: I can't take a historical CUSIP from 13F filings and associate it with a historical ticker/return. Or my returns are wrong because of data errors, so I can't fit a factor model to run an event study using Form 4 data.

I think what's missing is a serious open source effort to integrate/cleanse the various cheapo data vendors into something reasonably approximating the quality you get out of a CRSP/Compustat.

jgfriedman1999•6mo ago
Yep! Pretty sure it is still Akamai. Via testing I've noticed they cap downloads at ~6 Mbps from e.g. home internet, but not from GitHub or AWS.

SGML parsing is fun! I've open-sourced an SGML parser here: https://github.com/john-friedman/secsgml
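
SEC submission headers are largely a series of `<TAG>value` lines, so the core idea fits in a few lines. This is a toy parser for flat tags only; the real secsgml package handles nesting, documents, and many edge cases:

```python
# SEC submission headers are SGML-ish "<TAG>value" lines.
# Toy parser for flat tags only; real headers also nest.
def parse_header(lines):
    out = {}
    for line in lines:
        if line.startswith("<") and ">" in line:
            tag, _, value = line.partition(">")
            out[tag[1:]] = value.strip()
    return out

hdr = parse_header([
    "<ACCESSION-NUMBER>0000320193-26-000001",
    "<TYPE>10-K",
])
assert hdr == {"ACCESSION-NUMBER": "0000320193-26-000001", "TYPE": "10-K"}
```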

Securities master to link against: interesting. Here's a pipeline off the top of my head:

1. Get CUSIP, nameOfIssuer, and titleOfClass from the Institutional Holdings database.

2. Use the company metadata crosswalk to link CUSIP + titleOfClass + nameOfIssuer to a CIK: https://github.com/john-friedman/datamule-data/blob/master/d... (recompiled daily using GH Actions).

3. Get e.g. us-gaap:EarningsPerShareBasic from the XBRL database, linking on CIK. Types of stock might be a member (e.g. Class A, Class B?); not sure there.
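
Assuming the crosswalk maps CUSIP to CIK and ticker (the real table's columns may differ), the first two steps of the pipeline reduce to a join:

```python
# Step 1: holdings rows (13F-derived); field names follow the 13F XML tags.
holdings = [
    {"cusip": "037833100", "nameOfIssuer": "APPLE INC", "titleOfClass": "COM"},
    {"cusip": "594918104", "nameOfIssuer": "MICROSOFT CORP", "titleOfClass": "COM"},
]

# Step 2: metadata crosswalk keyed by CUSIP (shape assumed for illustration).
crosswalk = {
    "037833100": {"cik": "0000320193", "ticker": "AAPL"},
    "594918104": {"cik": "0000789019", "ticker": "MSFT"},
}

# Step 3: attach cik/ticker so XBRL facts (keyed by CIK) can be joined in.
linked = [
    {**row, **crosswalk[row["cusip"]]}
    for row in holdings
    if row["cusip"] in crosswalk
]
assert [r["ticker"] for r in linked] == ["AAPL", "MSFT"]
```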

For form 4, not sure what you mean by event study. Would love to know!

conditionnumber•6mo ago
Event study: A way to measure how returns respond to events. Popularized by Fama in "The Adjustment of Stock Prices to New Information" but ubiquitous in securities litigation, academic financial economics, and equity L/S research. The canonical recipe is MacKinlay's "Event Studies in Economics and Finance". Industry people tend to just use residual returns from Axioma / Barra / in house risk model.

So let's say your hypothesis is "stock go up on insider buy". Event studies help you test that hypothesis and quantify how much up / when.
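
The recipe can be sketched with toy numbers: estimate a market model (alpha/beta) over a pre-event window, subtract the model's predicted return to get abnormal returns, and cumulate them over the event window. Here alpha and beta are simply given rather than estimated:

```python
# Market-model event study: abnormal return = actual minus predicted,
# cumulated over the event window (CAR). Alpha/beta would normally be
# estimated over a pre-event window; here they are toy values.
def abnormal_returns(stock, market, alpha, beta):
    return [r - (alpha + beta * m) for r, m in zip(stock, market)]

def car(stock, market, alpha, beta):
    """Cumulative abnormal return over the event window."""
    return sum(abnormal_returns(stock, market, alpha, beta))

# Event window day -1..+1 around an insider buy (toy returns).
stock = [0.010, 0.030, 0.005]
market = [0.004, 0.002, 0.001]
assert abs(car(stock, market, alpha=0.0, beta=1.0) - 0.038) < 1e-9
```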

Cool metadata table, I'm curious about the ticker source (Form4, 10K, some SEC metadata publications?).

My comment about CUSIP linking was trying to illustrate a more general issue: it's difficult to use SEC data extractions to answer empirical questions if you don't have a good securities master to link against (reference data + market data).

Broadly speaking, a securities master has two kinds of data: reference data (identifiers and the dates when they're valid) and market data (price / volume / corporate actions... all the stuff you need to accurately compute total returns). CRSP/Compustat (~$40k/year?) is the gold standard for daily-frequency US equities. With a decent securities master you can do many interesting things: realistic backtests for the kinds of "use an LLM to code a strategy" projects you see all over the place these days, or (my interest) a "papers with code"-style repository that helps people learn the field.

What you worry about with bad data is getting a high t-stat on a plausible-sounding result that later fails to replicate when you use clean data (or worse, when you try to trade it). Let's say your securities master drops companies two weeks before they're delisted... just holding the market is going to show serious alpha. Ditto if your fundamental data reflects restatements.

On the reference data front, the Compustat security table has (from_date, thru_date, cusip, ticker, cik, name, gics sector/industry, gvkey, iid) etc. all lined up and ready to go. I don't think it's possible to generate this kind of time series from cheap data vendors, but it might be possible using some of the techniques you described, and maybe others. E.g. get a (company-name, cik, ticker) time series from Form 4 or 10-K filings. Then get a (security-name, cusip) time series from the 13F security lists the SEC publishes quarterly (PDFs). Then merge on date / fuzzy name. Then validate. To get GICS you'd need to do something like extract industry/sector names from a broad index ETF's quarterly holdings reports, whose format will change a lot over the years. Lots of tedious but valuable work, and a lot of surface area to leverage LLMs. I dunno, at this point it may be feasible to use LLMs to extract all this info (annually) from 10-Ks.
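
The merge-on-date/fuzzy-name step could be prototyped with stdlib difflib (the 0.8 cutoff is a guess; real matching would need much more care):

```python
import difflib

def fuzzy_link(name, candidates, cutoff=0.8):
    """Best fuzzy match of a 13F security name against crosswalk names."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

names = ["APPLE INC", "MICROSOFT CORP"]
assert fuzzy_link("APPLE INC.", names) == "APPLE INC"   # punctuation noise
assert fuzzy_link("UNRELATED CO", names) is None        # below cutoff
```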

On the market data front, the vendors I've seen have random errors. They tend to be worst for dividends/corporate-actions. But I've seen BRK.A trade $300 trillion on a random Wednesday. Haven't noticed correlation across vendors, so I think this one might be easy to solve. Cheap fundamental data tends to have similar defects to cheap market data.

Sorry for the long rant, I've thought about this problem for a while but never seriously worked on it. One reason I haven't undertaken the effort: validation is difficult so it's hard to tell if you're actually making progress. You can do things like make sure S&P500 member returns aggregate to SPY returns to see if you're waaay off. But detailed validation is difficult without a source of ground truth.
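
The S&P 500 aggregation check described above is just a weighted-sum comparison; a sketch with toy weights and an assumed tolerance:

```python
def index_check(weights, member_returns, index_return, tol=0.005):
    """Weighted member returns should roughly reproduce the index return."""
    implied = sum(w * r for w, r in zip(weights, member_returns))
    return abs(implied - index_return) <= tol

weights = [0.5, 0.3, 0.2]             # toy index weights, summing to 1
member_returns = [0.01, 0.02, -0.01]  # daily returns of the members
assert index_check(weights, member_returns, index_return=0.009)
assert not index_check(weights, member_returns, index_return=0.050)
```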

jgfriedman1999•6mo ago
Love the long rant.

re: metadata table - it's constructed from the SEC's submissions.zip, which they update daily. My script downloads the zip, decompresses just the bytes where the information (ticker, SIC code, etc.) is stored, then converts them into a CSV.
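
The selective-decompression trick works because zip members are compressed individually; here is a sketch with an in-memory stand-in for submissions.zip (the member name and JSON fields roughly mirror the SEC's format, but treat them as assumptions):

```python
import io
import json
import zipfile

# Build a toy submissions.zip in memory; the real one comes from the SEC.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr(
        "CIK0000320193.json",
        json.dumps({"cik": "0000320193", "tickers": ["AAPL"], "sic": "3571"}),
    )

# Zip members are compressed individually, so we can decompress only
# the one we need instead of extracting the whole archive.
with zipfile.ZipFile(buf) as z:
    meta = json.loads(z.read("CIK0000320193.json"))

row = {"cik": meta["cik"], "ticker": meta["tickers"][0], "sic": meta["sic"]}
assert row == {"cik": "0000320193", "ticker": "AAPL", "sic": "3571"}
```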

And yep! I agree with most of this. Currently I'd say my data is at the stage where it's useful for startups / PhD research and some hedge fund / quant stuff (at least, that's who is using it so far!).

I've seen the trillion-dollar trades, and they're hilarious! You see them every so often in Form 3, 4, and 5 disclosures.

re: LLMs - this is something I'm planning to move into in a month or two. I'm mostly planning to use older NLP methods, which are cheaper and faster, and use LLMs for specific tasks like structured output. E.g. WRDS BoardEx data can be constructed from 8-K Item 5.02 filings.

I think the biggest difficulty with the data is that the raw ingest is annoying AF. My approach has been to make each step easy, then use each step to build the next.