I spent the last week analyzing a proprietary dataset of 49,315 delisted US stocks to understand "survivorship bias" from a microstructure perspective. Standard backtests usually ignore these companies, but I wanted to see what trading activity looks like right before a firm goes to zero.
I built a pipeline to index 84GB of minute-level OHLCV data and cluster the failures using K-Means.
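For anyone curious what "cluster the failures using K-Means" means mechanically, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not the repo's code. It assumes each delisted stock has already been reduced to a fixed-length feature vector (the feature names in the example are hypothetical), and it standardizes the features first, because K-Means uses Euclidean distance and is sensitive to scale.

```python
import random
from math import dist


def standardize(rows):
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's-algorithm K-Means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical per-stock features: (max drawdown, peak volume ratio, final-month return).
features = [
    [0.95, 48.0, -0.01],
    [0.97, 40.0, 0.00],
    [0.50, 2.0, -0.30],
    [0.55, 3.0, -0.25],
]
centroids, labels = kmeans(standardize(features), k=2)
print(labels)
```

In practice you would probably reach for `sklearn.cluster.KMeans`, which adds k-means++ initialization and multiple restarts; the hand-rolled version just makes the assign/update loop explicit.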
Key finding: "Type III: The Zombie Churn." These are stocks that have already lost 90%+ of their value, yet their volume explodes to 48x normal levels while the price stays flat. It looks like a distinct signature of retail bag-holding vs. institutional exit.
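To make the "Zombie Churn" pattern concrete, here is a rough bar-level flag under assumptions of my own, not the repo's definitions: "lost 90%+" is read as a drawdown of at least 90% from the running peak close, "normal volume" is the mean of a trailing `baseline_window` of bars, and "flat" means an absolute bar-to-bar return under `max_abs_return`. All three thresholds, plus `min_volume_ratio`, are hypothetical parameters.

```python
from statistics import mean


def zombie_churn_flags(closes, volumes, baseline_window=20,
                       min_drawdown=0.90, min_volume_ratio=10.0,
                       max_abs_return=0.02):
    """Flag bars matching a hypothetical 'zombie churn' definition:
    deep drawdown from peak + volume spike vs. trailing mean + flat price."""
    if not closes:
        return []
    flags = [False] * len(closes)
    peak = closes[0]
    for i in range(1, len(closes)):
        peak = max(peak, closes[i])            # running peak close
        drawdown = 1.0 - closes[i] / peak      # fraction of value lost from peak
        if i < baseline_window:
            continue                           # not enough history for a baseline
        baseline = mean(volumes[i - baseline_window:i])
        if baseline > 0:
            volume_ratio = volumes[i] / baseline
        else:
            volume_ratio = float("inf") if volumes[i] > 0 else 0.0
        ret = closes[i] / closes[i - 1] - 1.0  # simple bar-to-bar return
        flags[i] = (drawdown >= min_drawdown
                    and volume_ratio >= min_volume_ratio
                    and abs(ret) <= max_abs_return)
    return flags


# Toy series: price collapses from 100 to 5, then volume spikes 48x while price is flat.
closes = [100.0] + [5.0] * 25
volumes = [1000] * 25 + [48000]
print(zombie_churn_flags(closes, volumes))
```

One design caveat: a trailing baseline gets contaminated once churn starts, so a fixed pre-collapse baseline may match "normal levels" better if that is what the 48x figure refers to.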
The repo has the indexer script, the clustering logic, and the "Death Metrics" CSV for the top 1,000 failures (including Lehman Brothers and Enron).
Happy to answer questions about the parquet engineering or the metrics used!
New_Person•2h ago