I built an AI pipeline to analyze every SEC 8-K filing in real-time

11•borxtrk•4mo ago

Comments

borxtrk•4mo ago

I got tired of missing material corporate events buried in SEC filings, so I built SEC Whisperer, a system that monitors, downloads, and summarizes 8-K filings using Gemini 2.5 Flash.

  Technical Stack:
  - Python pipeline polling SEC EDGAR API every 2 hours
  - Cloud Run jobs for serverless processing (avoiding cold starts with batch processing)
  - 98% noise reduction on HTML filings before LLM analysis
  - Firebase for real-time publishing to Next.js frontend
  - Gemini with structured JSON output + post-processing to prevent hallucination
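The post doesn't say how the 98% noise reduction is done. Here is a minimal sketch of one way to do it with Python's standard-library `html.parser`. It assumes most of the noise is markup, `<script>`/`<style>` blocks, the `<head>`, and the hidden inline-XBRL `<ix:header>` block that EDGAR HTML filings often carry. Block-level tags become line breaks so item headings stay on their own lines. The class, function, and tag sets are hypothetical, not SEC Whisperer's code.

```python
import re
from html.parser import HTMLParser


class FilingTextExtractor(HTMLParser):
    """Collects visible text from filing HTML, skipping non-content blocks."""

    # Assumption: these tags hold most of the non-content markup in 8-K HTML.
    SKIP_TAGS = {"script", "style", "head", "ix:header"}
    # Assumption: these tags end a visual line, so they become "\n".
    BLOCK_TAGS = {"p", "div", "br", "tr", "li", "table", "h1", "h2", "h3", "h4"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into characters
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped block
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse whitespace within each line (\s also matches the
        # non-breaking spaces from &nbsp;) and drop empty lines.
        lines = (re.sub(r"\s+", " ", line).strip() for line in "".join(self._chunks).split("\n"))
        return "\n".join(line for line in lines if line)


def strip_filing_html(html: str) -> str:
    """Return visible, whitespace-normalized text from an HTML filing."""
    parser = FilingTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Chunks are joined without extra spaces because EDGAR HTML often splits one word across adjacent inline tags, and browsers render those without a gap.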

  The interesting technical challenges:
  1. SEC filings are massive (40KB+ exhibits). Had to build a sectionizer that
     identifies item boundaries and caps exhibit text at 5KB (770x speedup)
  2. LLMs hallucinate quarters and M&A tags. Solution: deterministic post-processing
     that strips anything not in source text
  3. Filing amendments create tricky supersedes/superseded_by relationships in Firestore
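One way to sketch the sectionizer and exhibit cap from point 1. It assumes plain text in which each 8-K item heading (such as "Item 2.02" or "Item 9.01") starts its own line, so in-text cross-references like "see Item 9.01" are not treated as boundaries. The regex, the keep-the-longest rule for repeated item numbers (for example, a table of contents), and reading "5KB" as 5,000 characters are all assumptions, not the author's implementation.

```python
import re

# Assumed heading shape: "Item" + number like 2.02 at the start of a line.
ITEM_HEADING = re.compile(r"^\s*Item\s+(\d{1,2}\.\d{2})\b", re.IGNORECASE | re.MULTILINE)
EXHIBIT_CAP_CHARS = 5_000  # assumption: "5KB" read as 5,000 characters


def sectionize(text: str) -> dict[str, str]:
    """Split 8-K body text into {item_number: section_text}."""
    matches = list(ITEM_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        key = m.group(1)
        # If an item number repeats (e.g. a table of contents), keep the longest body
        if key not in sections or len(body) > len(sections[key]):
            sections[key] = body
    return sections


def cap_exhibit(text: str, limit: int = EXHIBIT_CAP_CHARS) -> str:
    """Truncate exhibit text, preferring to cut at a space before the limit."""
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit)
    return text[: cut if cut > 0 else limit].rstrip() + " [truncated]"
```

Capping only exhibits, and not the item sections, keeps the operative disclosure intact while bounding the press-release and agreement text that makes filings large.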

  Live site: https://secwhisperer.com
  Code: Not open source yet, but happy to discuss architecture

  Example output: The site caught Nvidia's $5B Intel deal within minutes of the 
  8-K filing and had AI analysis published before most financial news sites.

  Would love feedback from the HN community - especially on the LLM hallucination 
  prevention patterns. What other techniques are you all using?
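Here is a minimal sketch of the deterministic post-processing described in point 2: a quarter or M&A tag proposed by the model survives only if the source text supports it. The output field names (`fiscal_quarter`, `tags`), the "Q3 2025" label format, and the keyword list are hypothetical. The post does not say which rules SEC Whisperer actually applies.

```python
import re

# Hypothetical evidence pattern: an "M&A" tag needs at least one of these phrases.
MNA_EVIDENCE = re.compile(
    r"\b(merger|acqui(?:re|red|res|sition)|tender offer|business combination)\b",
    re.IGNORECASE,
)
ORDINALS = {"1": "first", "2": "second", "3": "third", "4": "fourth"}


def quarter_supported(quarter: str, source: str) -> bool:
    """True if a label like 'Q3 2025' has both its quarter and its year in the text."""
    m = re.fullmatch(r"Q([1-4])\s+(\d{4})", quarter.strip(), re.IGNORECASE)
    if not m:
        return False  # unparseable labels are dropped rather than repaired
    q, year = m.groups()
    text = source.lower()
    has_quarter = re.search(rf"\bq{q}\b|\b{ORDINALS[q]} quarter\b", text)
    has_year = re.search(rf"\b{year}\b", text)
    return bool(has_quarter and has_year)


def postprocess(llm_output: dict, source: str) -> dict:
    """Return a copy of the model output with unsupported fields removed."""
    cleaned = dict(llm_output)
    quarter = cleaned.get("fiscal_quarter")
    if quarter and not quarter_supported(quarter, source):
        cleaned["fiscal_quarter"] = None
    has_mna = MNA_EVIDENCE.search(source) is not None
    cleaned["tags"] = [t for t in cleaned.get("tags", []) if t != "M&A" or has_mna]
    return cleaned
```

The quarter check is deliberately loose: it only requires the quarter and year to appear somewhere in the text. A stricter version could require them to appear within a few words of each other.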

golden-face•4mo ago

Can you share any details or samples of the code/prompts especially with regards to "Gemini with structured JSON output" and "LLMs hallucinate quarters and M&A tags. Solution: deterministic post-processing"?

I recently started using Gemini to perform classification tasks, and I have been struggling with two things:

1) Documentation on the input/prompt schema when you want to require structured output
2) How to enforce outputs like "this key-value output must come from the supplied list of key-values"

It is really fascinating to work with this tool, if only because it works well on 90% of tasks and then decides to go full stream of consciousness "hello good day, I know this is a horse race on TV but I cannot find the horse race in the list of car manufacturers you supplied" with a random schema for the output.
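On the second question, one common pattern is to list the allowed labels in an `enum` in the response schema and still validate on the client. The original poster did not confirm using it. Even with a schema, off-list answers become less likely but may still happen. In the `google-genai` Python SDK, the schema goes in `types.GenerateContentConfig(response_mime_type="application/json", response_schema=...)`. Check the current Gemini docs for the exact schema subset it supports. The sketch below covers only the schema dict and the client-side check. The label list, the `manufacturer` field, and the explicit `"NONE"` escape value are hypothetical.

```python
import json

# Hypothetical label set; "NONE" gives the model a valid way to say "nothing fits".
ALLOWED_LABELS = ("Toyota", "Ford", "BMW", "NONE")

# Response schema in the OpenAPI-style subset used by Gemini structured output.
# The "enum" restricts "manufacturer" to the supplied labels.
RESPONSE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "manufacturer": {"type": "STRING", "enum": list(ALLOWED_LABELS)},
    },
    "required": ["manufacturer"],
}


class InvalidLabel(ValueError):
    """Raised when the model response does not contain an allowed label."""


def parse_label(raw: str, allowed: tuple[str, ...] = ALLOWED_LABELS) -> str:
    """Parse a JSON model response and return the canonical allowed label."""
    canonical = {label.casefold(): label for label in allowed}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidLabel(f"response is not JSON: {exc}") from exc
    label = data.get("manufacturer") if isinstance(data, dict) else None
    if not isinstance(label, str) or label.strip().casefold() not in canonical:
        raise InvalidLabel(f"label {label!r} is not in the allowed list")
    return canonical[label.strip().casefold()]
```

On `InvalidLabel`, a caller would typically retry once, optionally quoting the error back to the model. After that it can fall back to `"NONE"` or a manual-review queue instead of accepting free text. The explicit `"NONE"` option also gives the model somewhere to put the horse race.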
