I'm the developer of an open-source (MIT License) Python package that converts SEC submissions into useful data. I've recently put a bunch of the supporting infrastructure in the cloud for a nominal convenience fee.
Cloud:
1. SEC Websocket - notifies you of new submissions as they come out. (Free; a consumer sketch follows this list.)
2. SEC Archive - download SEC submissions without rate limits. ($1/100,000 downloads)
3. MySQL RDS ($1/million rows returned)
- XBRL
- Fundamentals
- Institutional Holdings
- Insider Transactions
- Proxy Voting Records
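If you just want to try the websocket, a consumer is a few lines of Python. A minimal sketch assuming the `websockets` library; the endpoint URL and message fields below are placeholders, the real ones are in the docs:

```python
import asyncio
import json

import websockets  # pip install websockets


async def listen():
    uri = "wss://example.datamule.xyz/ws"  # placeholder, not the real endpoint
    async with websockets.connect(uri) as ws:
        async for raw in ws:
            submission = json.loads(raw)
            # Field names are illustrative only; see the documentation.
            print(submission.get("accession_number"), submission.get("form_type"))


asyncio.run(listen())
```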
Posting here, in case someone finds it useful.
Links:
Datamule (Package) GitHub: https://github.com/john-friedman/datamule-python
Documentation: https://john-friedman.github.io/datamule-python/datamule-python/sheet/sheet/
Get an API Key: https://datamule.xyz/dashboard2.html
Websocket:
1. Two AWS EC2 t4g.nano instances poll the SEC's RSS and EFTS endpoints. (RSS is faster; EFTS is complete. A sketch of this loop follows the list.)
2. When new submissions are detected, they are forwarded to the websocket server (a t4g.micro instance, written in Go for greater concurrency).
3. The websocket server broadcasts to consumers.
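The RSS side is basically a poll-dedupe-forward loop. A minimal sketch, assuming EDGAR's public "current filings" Atom feed, with a print standing in for the push to the websocket server (the real poller also covers EFTS):

```python
import time

import feedparser  # pip install feedparser

FEED = ("https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent"
        "&type=&company=&dateb=&owner=include&count=40&output=atom")

seen = set()  # unbounded here for brevity; a real poller needs eviction

while True:
    # SEC asks for a descriptive user agent on all requests.
    feed = feedparser.parse(FEED, agent="datamule admin@example.com")
    for entry in feed.entries:
        if entry.id not in seen:
            seen.add(entry.id)
            print(entry.id, entry.title)  # stand-in for the websocket push
    time.sleep(1)  # stay within SEC fair-access limits
```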
Archive:
1. One t4g.micro instance receives notifications from the websocket, then fetches the submission SGML from the SEC.
2. If a submission is over a size threshold, it is compressed with zstandard (see the sketch after this list).
3. Submissions are uploaded to a Cloudflare R2 bucket. (Zero egress fees; you only pay Class A/B operations.)
4. The R2 bucket is proxied behind my domain, with caching.
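The compress-and-upload step looks roughly like this. A sketch with placeholder credentials, bucket name, key scheme, and threshold; since R2 is S3-compatible, plain boto3 works against it:

```python
import boto3               # pip install boto3
import zstandard as zstd   # pip install zstandard

SIZE_THRESHOLD = 1 << 20   # assumed threshold, 1 MiB

s3 = boto3.client(
    "s3",
    endpoint_url="https://<account-id>.r2.cloudflarestorage.com",  # your R2 endpoint
    aws_access_key_id="<r2-access-key>",
    aws_secret_access_key="<r2-secret-key>",
)


def upload_submission(sgml: bytes, key: str) -> None:
    # Compress only large submissions; small ones aren't worth the CPU.
    if len(sgml) > SIZE_THRESHOLD:
        sgml = zstd.ZstdCompressor().compress(sgml)
        key += ".zst"
    s3.put_object(Bucket="sec-archive", Key=key, Body=sgml)
```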
RDS:
1. ECS Fargate tasks run daily at 9 AM UTC.
2. Each run downloads data from the archive, parses it, and loads it into a db.t4g.medium MySQL RDS instance. (A sketch of the load step follows this list.)
3. The run also handles reconciliation for the archive, in case any filings were missed.
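The load itself is a batched insert. A sketch assuming a hypothetical filings table and already-parsed rows; INSERT IGNORE is one way to keep the daily load idempotent, so reconciliation reruns are safe:

```python
import pymysql  # pip install pymysql

rows = [  # hypothetical output of the parsing step
    ("0001234567-25-000001", "AAPL", "10-K"),
    ("0001234567-25-000002", "MSFT", "10-Q"),
]

conn = pymysql.connect(host="<rds-endpoint>", user="loader",
                       password="<password>", database="sec")
with conn:
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT IGNORE INTO filings (accession_number, ticker, form_type) "
            "VALUES (%s, %s, %s)",
            rows,
        )
    conn.commit()
```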