I got tired of missing material corporate events buried in SEC filings, so I built
SEC Whisperer - a system that monitors, downloads, and summarizes 8-K filings
using Gemini 2.5 Flash.
Technical Stack:
- Python pipeline polling SEC EDGAR API every 2 hours
- Cloud Run jobs for serverless processing (avoiding cold starts with batch processing)
- HTML cleanup that removes 98% of the noise in each filing before LLM analysis
- Firebase for real-time publishing to Next.js frontend
- Gemini with structured JSON output + post-processing to prevent hallucination
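
For readers curious what the HTML cleanup step could look like, here is a minimal sketch using only the Python standard library. It is not the actual SEC Whisperer code: the class name `FilingTextExtractor`, the function `html_to_text`, and the choice of skipped tags are all assumptions made for illustration. It drops script, style, and head content, keeps visible text, and collapses whitespace.

```python
import re
from html.parser import HTMLParser


class FilingTextExtractor(HTMLParser):
    """Collects visible text and skips non-content tags (hypothetical helper)."""

    # Assumption: these tags never carry filing text worth sending to the LLM.
    SKIPPED_TAGS = {"script", "style", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace (including newlines) into single spaces.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html):
    """Return the visible text of an HTML filing as one whitespace-normalized string."""
    parser = FilingTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Comparing `len(html)` with `len(html_to_text(html))` on a real filing is one simple way to measure how much of the document is markup rather than text.
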
The interesting technical challenges:
1. SEC filings are massive (40KB+ exhibits). I had to build a sectionizer that
identifies item boundaries and caps exhibit text at 5KB (770x speedup).
2. LLMs hallucinate quarters and M&A tags. Solution: deterministic post-processing
that strips anything not found in the source text.
3. Filing amendments create tricky supersedes/superseded_by relationships in Firestore
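
The first two challenges can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: the heading regex, the `TAG_KEYWORDS` mapping, the summary shape (`quarter` and `tags` keys), and every function name here are hypothetical. The sectionizer splits on 8-K item headings such as "Item 2.02", the cap truncates exhibit text to 5KB, and the post-processor drops a quarter or tag that has no literal support in the source text.

```python
import re

EXHIBIT_CAP_BYTES = 5 * 1024  # the 5KB cap from the post

# 8-K items are numbered like "Item 2.02" or "Item 9.01".
# Assumption: any such match is a section heading, which can misfire on
# in-body references to an item.
ITEM_HEADING = re.compile(r"Item\s+(\d\.\d{2})", re.IGNORECASE)

# Hypothetical mapping from an LLM tag to phrases that must appear in the source.
TAG_KEYWORDS = {
    "m&a": ("merger", "acquisition", "acquire"),
    "earnings": ("results of operations", "earnings"),
}


def split_items(text):
    """Return {item_number: section_text}, splitting at each item heading."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.start():end].strip()
    return sections


def cap_exhibit(text, limit=EXHIBIT_CAP_BYTES):
    """Truncate text to at most `limit` UTF-8 bytes, dropping any split character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def strip_unsupported(summary, source_text):
    """Remove a quarter or tag from the LLM summary if the source does not support it."""
    source = source_text.lower()
    cleaned = dict(summary)

    # Keep the quarter only if the same string literally occurs in the filing.
    quarter = summary.get("quarter")
    if quarter and quarter.lower() not in source:
        cleaned["quarter"] = None

    # Keep a tag only if at least one of its keywords occurs in the filing.
    cleaned["tags"] = [
        tag
        for tag in summary.get("tags", [])
        if any(kw in source for kw in TAG_KEYWORDS.get(tag.lower(), ()))
    ]
    return cleaned
```

The substring checks are deliberately crude. A stricter variant could match on word boundaries or require the LLM to return the exact supporting quote and verify that quote against the source.
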
Live site: https://secwhisperer.com
Code: Not open source yet, but happy to discuss architecture
Example output: The site caught Nvidia's $5B Intel deal within minutes of the
8-K filing and had AI analysis published before most financial news sites.
Would love feedback from the HN community - especially on the LLM hallucination
prevention patterns. What other techniques are you all using?