
New budget financial API, based on EDGAR data

7•jgfriedman1999•6mo ago
Hey everyone,

I'm the developer of an open-source (MIT License) Python package that converts SEC submissions into useful data. I've recently put a bunch of it in the cloud for a nominal convenience fee.

Cloud:

1. SEC Websocket - notifies you of new submissions as they come out. (Free)

2. SEC Archive - download SEC submissions without rate limits. ($1/100,000 downloads)

3. MySQL RDS ($1/million rows returned)

- XBRL

- Fundamentals

- Institutional Holdings

- Insider Transactions

- Proxy Voting Records

Posting here, in case someone finds it useful.

Links:

Datamule (Package) GitHub: https://github.com/john-friedman/datamule-python

Documentation: https://john-friedman.github.io/datamule-python/datamule-python/sheet/sheet/

Get an API Key: https://datamule.xyz/dashboard2.html

Comments

jgfriedman1999•6mo ago
How it works:

Websocket:

1. Two AWS EC2 t4g.nano instances poll the SEC's RSS and EFTS endpoints. (RSS is faster; EFTS is complete.)

2. When new submissions are detected, they are sent to the websocket server (a t4g.micro instance, written in Go for greater concurrency).

3. The websocket broadcasts the notification to consumers.
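
The two-feed polling step is essentially a dedup problem: keep a set of accession numbers already announced and only forward new ones. A minimal sketch (the feed contents here are made up for illustration):

```python
# Deduplicate accession numbers arriving from two overlapping feeds:
# RSS (faster but partial) and EFTS (slower but complete).
def merge_feeds(seen, *feeds):
    """Yield accession numbers not announced before, from any feed."""
    for feed in feeds:
        for accession in feed:
            if accession not in seen:
                seen.add(accession)
                yield accession

seen = set()
rss = ["0000320193-26-000001", "0000320193-26-000002"]
efts = ["0000320193-26-000002", "0000320193-26-000003"]  # overlaps RSS
new = list(merge_feeds(seen, rss, efts))
assert new == ["0000320193-26-000001",
               "0000320193-26-000002",
               "0000320193-26-000003"]
```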

Archive:

1. One t4g.micro instance receives notifications from the websocket, then fetches the submission SGML from the SEC.

2. If a submission is over the size threshold, it is compressed with zstandard.

3. Submissions are uploaded to a Cloudflare R2 bucket. (Zero egress fees, just Class A/B operations.)

4. The R2 bucket is proxied behind my domain, with caching.
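
The conditional-compression step can be sketched as follows; the 1 MB threshold is an assumed value, and stdlib zlib stands in for zstandard to keep the example dependency-free:

```python
import zlib

SIZE_THRESHOLD = 1_000_000  # bytes; illustrative, not the real cutoff

def prepare_for_upload(sgml: bytes):
    """Compress a submission only if it exceeds the size threshold.

    Returns (payload, compressed_flag) for the upload step.
    """
    if len(sgml) > SIZE_THRESHOLD:
        return zlib.compress(sgml), True
    return sgml, False

small = b"<SEC-DOCUMENT>tiny</SEC-DOCUMENT>"
large = b"x" * 2_000_000
assert prepare_for_upload(small) == (small, False)
payload, compressed = prepare_for_upload(large)
assert compressed and zlib.decompress(payload) == large
```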

RDS

1. ECS Fargate tasks run daily at 9 AM UTC.

2. They download data from the archive, parse it, and upload it into a MySQL RDS instance (db.t4g.medium).

3. They also handle reconciliation for the archive in case any filings were missed.

conditionnumber•6mo ago
Cool, EDGAR is an amazing public service. I think they use Akamai as their CDN so the downloads are remarkably fast.

A few years ago I wrote an SGML parser for the full SEC PDS specification (super tedious). But I have trouble leveraging my own efforts for independent research because I don't have a reliable securities master to link against: I can't take a historical CUSIP from 13F filings and associate it with a historical ticker/return. Or my returns are wrong because of data errors, so I can't fit a factor model to run an event study using Form 4 data.

I think what's missing is a serious open source effort to integrate/cleanse the various cheapo data vendors into something reasonably approximating the quality you get out of a CRSP/Compustat.

jgfriedman1999•6mo ago
Yep! Pretty sure it is still Akamai. Via testing I've noticed they cap downloads at ~6 Mbps from e.g. home internet, but not from GitHub or AWS.

SGML parsing is fun! I've open-sourced an SGML parser here: https://github.com/john-friedman/secsgml
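
SEC submission headers are largely a series of `<TAG>value` lines, so the core idea fits in a few lines. This is a toy parser for flat tags only; the real secsgml package handles nesting, documents, and many edge cases:

```python
# SEC submission headers are SGML-ish "<TAG>value" lines.
# Toy parser for flat tags only; real headers also nest.
def parse_header(lines):
    out = {}
    for line in lines:
        if line.startswith("<") and ">" in line:
            tag, _, value = line.partition(">")
            out[tag[1:]] = value.strip()
    return out

hdr = parse_header([
    "<ACCESSION-NUMBER>0000320193-26-000001",
    "<TYPE>10-K",
])
assert hdr == {"ACCESSION-NUMBER": "0000320193-26-000001", "TYPE": "10-K"}
```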

Securities master to link against: interesting. Here's a pipeline off the top of my head:

1. Get CUSIP, nameOfIssuer, and titleOfClass from the Institutional Holdings database.

2. Use the company metadata crosswalk to link CUSIP + titleOfClass + nameOfIssuer to a CIK: https://github.com/john-friedman/datamule-data/blob/master/d... (recompiled daily using GH Actions).

3. Get e.g. us-gaap:EarningsPerShareBasic from the XBRL database, linking on CIK. Types of stock might be a member (e.g. Class A, Class B?); not sure there.
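
Assuming the crosswalk maps CUSIP to CIK and ticker (the real table's columns may differ), the first two steps of the pipeline reduce to a join:

```python
# Step 1: holdings rows (13F-derived); field names follow the 13F XML tags.
holdings = [
    {"cusip": "037833100", "nameOfIssuer": "APPLE INC", "titleOfClass": "COM"},
    {"cusip": "594918104", "nameOfIssuer": "MICROSOFT CORP", "titleOfClass": "COM"},
]

# Step 2: metadata crosswalk keyed by CUSIP (shape assumed for illustration).
crosswalk = {
    "037833100": {"cik": "0000320193", "ticker": "AAPL"},
    "594918104": {"cik": "0000789019", "ticker": "MSFT"},
}

# Step 3: attach cik/ticker so XBRL facts (keyed by CIK) can be joined in.
linked = [
    {**row, **crosswalk[row["cusip"]]}
    for row in holdings
    if row["cusip"] in crosswalk
]
assert [r["ticker"] for r in linked] == ["AAPL", "MSFT"]
```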

For form 4, not sure what you mean by event study. Would love to know!

conditionnumber•6mo ago
Event study: A way to measure how returns respond to events. Popularized by Fama in "The Adjustment of Stock Prices to New Information" but ubiquitous in securities litigation, academic financial economics, and equity L/S research. The canonical recipe is MacKinlay's "Event Studies in Economics and Finance". Industry people tend to just use residual returns from Axioma / Barra / in house risk model.

So let's say your hypothesis is "stock go up on insider buy". Event studies help you test that hypothesis and quantify how much up / when.
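
The recipe can be sketched with toy numbers: estimate a market model (alpha/beta) over a pre-event window, subtract the model's predicted return to get abnormal returns, and cumulate them over the event window. Here alpha and beta are simply given rather than estimated:

```python
# Market-model event study: abnormal return = actual minus predicted,
# cumulated over the event window (CAR). Alpha/beta would normally be
# estimated over a pre-event window; here they are toy values.
def abnormal_returns(stock, market, alpha, beta):
    return [r - (alpha + beta * m) for r, m in zip(stock, market)]

def car(stock, market, alpha, beta):
    """Cumulative abnormal return over the event window."""
    return sum(abnormal_returns(stock, market, alpha, beta))

# Event window day -1..+1 around an insider buy (toy returns).
stock = [0.010, 0.030, 0.005]
market = [0.004, 0.002, 0.001]
assert abs(car(stock, market, alpha=0.0, beta=1.0) - 0.038) < 1e-9
```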

Cool metadata table, I'm curious about the ticker source (Form4, 10K, some SEC metadata publications?).

My comment about CUSIP linking was trying to illustrate a more general issue: it's difficult to use SEC data extractions to answer empirical questions if you don't have a good securities master to link against (reference data + market data).

Broadly speaking, a securities master has two kinds of data: reference data (identifiers and the dates when they're valid) and market data (price / volume / corporate actions... all the stuff you need to accurately compute total returns). CRSP/Compustat (~$40k/year?) is the gold standard for daily-frequency US equities. With a decent securities master you can do many interesting things: realistic backtests for the kinds of "use an LLM to code a strategy" projects you see all over the place these days, or (my interest) a "papers with code"-style repository that helps people learn the field.

What you worry about with bad data is getting a high t-stat on a plausible-sounding result that later fails to replicate when you use clean data (or worse, when you try to trade it). Let's say your securities master drops companies two weeks before they're delisted... just holding the market is going to show serious alpha. Ditto if your fundamental data reflects restatements.

On the reference data front, the Compustat security table has (from_date, thru_date, cusip, ticker, cik, name, gics sector/industry, gvkey, iid) etc. all lined up and ready to go. I don't think it's possible to generate this kind of time series from cheap data vendors, but it might be possible using some of the techniques you described, and maybe others. E.g. get a (company-name, cik, ticker) time series from Form 4 or 10-K filings. Then get a (security-name, cusip) time series from the 13F security lists the SEC publishes quarterly (PDFs). Then merge on date / fuzzy name. Then validate. To get GICS you'd need to do something like extract industry/sector names from a broad index ETF's quarterly holdings reports, whose format will change a lot over the years. Lots of tedious but valuable work, and a lot of surface area to leverage LLMs. I dunno, at this point it may be feasible to use LLMs to extract all this info (annually) from 10-Ks.
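
The merge-on-date/fuzzy-name step could be prototyped with stdlib difflib (the 0.8 cutoff is a guess; real matching would need much more care):

```python
import difflib

def fuzzy_link(name, candidates, cutoff=0.8):
    """Best fuzzy match of a 13F security name against crosswalk names."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

names = ["APPLE INC", "MICROSOFT CORP"]
assert fuzzy_link("APPLE INC.", names) == "APPLE INC"   # punctuation noise
assert fuzzy_link("UNRELATED CO", names) is None        # below cutoff
```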

On the market data front, the vendors I've seen have random errors. They tend to be worst for dividends/corporate-actions. But I've seen BRK.A trade $300 trillion on a random Wednesday. Haven't noticed correlation across vendors, so I think this one might be easy to solve. Cheap fundamental data tends to have similar defects to cheap market data.

Sorry for the long rant, I've thought about this problem for a while but never seriously worked on it. One reason I haven't undertaken the effort: validation is difficult so it's hard to tell if you're actually making progress. You can do things like make sure S&P500 member returns aggregate to SPY returns to see if you're waaay off. But detailed validation is difficult without a source of ground truth.
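
The S&P 500 aggregation check described above is just a weighted-sum comparison; a sketch with toy weights and an assumed tolerance:

```python
def index_check(weights, member_returns, index_return, tol=0.005):
    """Weighted member returns should roughly reproduce the index return."""
    implied = sum(w * r for w, r in zip(weights, member_returns))
    return abs(implied - index_return) <= tol

weights = [0.5, 0.3, 0.2]             # toy index weights, summing to 1
member_returns = [0.01, 0.02, -0.01]  # daily returns of the members
assert index_check(weights, member_returns, index_return=0.009)
assert not index_check(weights, member_returns, index_return=0.050)
```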

jgfriedman1999•6mo ago
Love the long rant.

re: metadata table - it's constructed from the SEC's submissions.zip, which they update daily. My script downloads the zip, decompresses just the bytes where the information (ticker, SIC code, etc.) is stored, then converts them into a CSV.
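
The selective-decompression trick works because zip members are compressed individually; here is a sketch with an in-memory stand-in for submissions.zip (the member name and JSON fields roughly mirror the SEC's format, but treat them as assumptions):

```python
import io
import json
import zipfile

# Build a toy submissions.zip in memory; the real one comes from the SEC.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr(
        "CIK0000320193.json",
        json.dumps({"cik": "0000320193", "tickers": ["AAPL"], "sic": "3571"}),
    )

# Zip members are compressed individually, so we can decompress only
# the one we need instead of extracting the whole archive.
with zipfile.ZipFile(buf) as z:
    meta = json.loads(z.read("CIK0000320193.json"))

row = {"cik": meta["cik"], "ticker": meta["tickers"][0], "sic": meta["sic"]}
assert row == {"cik": "0000320193", "ticker": "AAPL", "sic": "3571"}
```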

And yep! I agree with most of this. Currently I'd say my data is at the stage where it's useful for startups / PhD research and some hedge fund / quant stuff (at least, that's who is using it so far!).

I've seen the trillion-dollar trades, and they're hilarious! You see them every so often in Form 3, 4, and 5 disclosures.

re: LLMs - this is something I'm planning to move into in a month or two. I'm mostly planning to use older NLP methods, which are cheaper and faster, and use LLMs for specific tasks like structured output. E.g. WRDS BoardEx data can be constructed from 8-K Item 5.02 filings.

I think the biggest difficulty with the data is that the raw ingest is annoying AF. My approach has been to make each step easy, then use each step to build the next.