How it works:
1. The SEC accepts a filing. The acceptance time is recorded as e.g. <ACCEPTANCE-DATETIME>20220204201127
2. The SEC then generates an index page for the filing, containing the filing metadata. This page is publicly accessible. Typically its Last Modified tag is the same as the acceptance datetime.
3. The SEC then releases the filing's original SGML upload and the extracted documents, e.g. the 10-K. These are publicly accessible.
4. The SEC then updates RSS and PDS.
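As a small illustration of step 1, the acceptance value reads as a 14-digit YYYYMMDDHHMMSS timestamp. Here is a minimal sketch that parses it. The function name is my own, and the value carries no timezone, so the result is left naive.

```python
from datetime import datetime


def parse_acceptance_datetime(raw: str) -> datetime:
    """Parse a 14-digit YYYYMMDDHHMMSS acceptance timestamp (no timezone attached)."""
    return datetime.strptime(raw, "%Y%m%d%H%M%S")


accepted = parse_acceptance_datetime("20220204201127")
print(accepted.isoformat())  # 2022-02-04T20:11:27
```

Comparing this value against the index page's Last Modified value is one way to check the "typically the same" observation for a given filing.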
URL format
A typical index page is expressed publicly as:
https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/0000950170-22-000796-index.html
It turns out that you don't need the CIK (1318605) in the URL. The accession number with its dashes and leading zeros removed works as the directory:
https://www.sec.gov/Archives/edgar/data/95017022000796/0000950170-22-000796-index.html
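To make the URL trick concrete, here is a minimal sketch that builds the CIK-free index URL from an accession number alone. The function and constant names are illustrative, and the rule (strip dashes, drop leading zeros) is inferred from the example URL above.

```python
BASE = "https://www.sec.gov/Archives/edgar/data"


def predicted_index_url(accession: str) -> str:
    """Build the index page URL using only the accession number."""
    # Remove the dashes, then drop leading zeros by round-tripping through int.
    folder = str(int(accession.replace("-", "")))
    return f"{BASE}/{folder}/{accession}-index.html"


print(predicted_index_url("0000950170-22-000796"))
```

For the accession above, this reproduces the second URL shown in this section.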
This means that you can predict the index page using just the accession number. An accession number has the format: {CIK of the entity submitting the filing, NOT necessarily the actual company}-{2-digit year}-{typically sequential count of submissions that year}
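A quick sketch of splitting an accession number into those three parts. The `Accession` type and its field names are hypothetical, and the 10-2-6 digit widths are assumed from the example accession rather than taken from a spec.

```python
from typing import NamedTuple


class Accession(NamedTuple):
    filer_id: str   # CIK of the submitting entity, zero-padded
    year: str       # 2-digit year
    sequence: int   # typically sequential count for that filer and year


def parse_accession(accession: str) -> Accession:
    """Split 'XXXXXXXXXX-YY-NNNNNN' into its three components."""
    filer_id, year, sequence = accession.split("-")
    # Widths assumed from the example: 10, 2 and 6 digits.
    if (len(filer_id), len(year), len(sequence)) != (10, 2, 6):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return Accession(filer_id, year, int(sequence))


print(parse_accession("0000950170-22-000796"))
```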
So all you have to do is take the latest accession number, increment the count, and poll the predicted index page!
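Here is a rough sketch of the increment-and-poll loop, not the repo's actual implementation. The function names, the HEAD request, the polling interval and the attempt cap are all my assumptions. The SEC's fair-access guidance asks automated clients to identify themselves and limit request rates, so check the current policy before running anything like this.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def next_accession(accession: str) -> str:
    """Increment the sequence part, keeping its zero padding."""
    filer_id, year, sequence = accession.split("-")
    return f"{filer_id}-{year}-{int(sequence) + 1:0{len(sequence)}d}"


def index_exists(url: str, user_agent: str) -> bool:
    """Return True if the index page answers 200. HEAD support is an assumption; use GET if needed."""
    request = urllib.request.Request(url, method="HEAD", headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A 404 (or similar) means the page has not been published yet.
        return False


def poll_next(last_accession: str, user_agent: str,
              interval_seconds: float = 1.0, max_attempts: int = 60) -> Optional[str]:
    """Poll the predicted index page of the next accession; return it once it appears."""
    candidate = next_accession(last_accession)
    folder = str(int(candidate.replace("-", "")))
    url = f"https://www.sec.gov/Archives/edgar/data/{folder}/{candidate}-index.html"
    for _ in range(max_attempts):
        if index_exists(url, user_agent):
            return candidate
        time.sleep(interval_seconds)
    return None
```

Calling `poll_next("0000950170-22-000796", "Your Name you@example.com")` would watch for `0000950170-22-000797`. Since the count is only "typically" sequential, a real poller would probably want to probe a few numbers ahead as well.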
Once you match an index page, you can extract the CIK from that page, construct the URL for the filing information, and poll that:
https://www.sec.gov/Archives/edgar/data/1318605/0000950170-22-000796.txt
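A minimal sketch of that last hop. The regex assumes the index page HTML contains a `CIK=` query parameter somewhere in its links, and the sample HTML below is a made-up fragment, not a real page. An index page may also mention more than one CIK, in which case this naive version just takes the first.

```python
import re
from typing import Optional


def extract_cik(index_html: str) -> Optional[int]:
    """Pull the first CIK=<digits> value out of the page, if any (assumed page layout)."""
    match = re.search(r"CIK=(\d{1,10})", index_html)
    return int(match.group(1)) if match else None


def filing_txt_url(cik: int, accession: str) -> str:
    """Build the URL of the filing's .txt file from the CIK and accession number."""
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{accession}.txt"


# Hypothetical fragment standing in for a fetched index page.
sample_html = '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001318605">Filer</a>'

cik = extract_cik(sample_html)
print(filing_txt_url(cik, "0000950170-22-000796"))
```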
What's great about this approach is that a few entities file on behalf of most companies and individuals. If you monitor the accession numbers of just the ten largest filing entities, you cover 42% of the corpus; monitor 100 and you cover 68%. Numbers are from 2024.
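If you want to run this kind of coverage calculation on your own list of accession numbers, a sketch could look like the following. The function name is mine and the toy list is invented purely for illustration; it is not the 2024 data behind the numbers above.

```python
from collections import Counter
from typing import Iterable


def coverage_of_top_filers(accessions: Iterable[str], top_n: int) -> float:
    """Share of filings whose accession prefix belongs to the top_n most active filers."""
    prefixes = Counter(accession.split("-")[0] for accession in accessions)
    total = sum(prefixes.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in prefixes.most_common(top_n))
    return covered / total


# Invented toy data: three distinct filer prefixes, four filings.
toy_accessions = [
    "0000000001-24-000001",
    "0000000001-24-000002",
    "0000000002-24-000001",
    "0000000003-24-000001",
]

print(coverage_of_top_filers(toy_accessions, 1))  # 0.5
print(coverage_of_top_filers(toy_accessions, 2))  # 0.75
```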
GitHub link: https://github.com/john-friedman/The-fastest-way-to-get-SEC-filings
This should be much faster than the papers which sparked government investigations! https://www.wsj.com/articles/sec-plans-to-fix-flaw-in-electronic-distribution-system-1419621428?gaa_at=eafs&gaa_n=AWEtsqd6-X8ylp_BlpWHYpFoJqrLMDwYUu3m1QBJhoRtCHDIHraLrD3tMHPXaw57JW4%3D&gaa_ts=693e2fd3&gaa_sig=noGkpoMh6OXa0MqFPgj5kFe9kx7vbkSpB1vFceqW8LtXzD2wWC2vkGLKJwnvkJO-sq7q93qKbX_rs7ULReZIwA%3D%3D