Most systems treat a −5% move the same regardless of context. My hypothesis was that where a company sits in the market’s structure matters more than the price move itself.
The engineering idea
I built a knowledge graph of the U.S. public markets with ~207k edges across ~21 relationship types, organized into four layers (a rough construction sketch follows the list):
* Operational: supply-chain relationships (SUPPLIES_TO, PRODUCES)
* Flow: ETF and institutional ownership plumbing
* Social: board interlocks (SHARES_DIRECTOR_WITH)
* Environmental: geography / competition
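To make the schema concrete, here is a minimal sketch of one way such a layered, typed graph could be represented with networkx. The tickers, the HELD_BY_ETF and COMPETES_WITH type names, and the layer_subgraph helper are illustrative assumptions on my part, not the actual schema.

```python
# Minimal sketch (not the actual schema): a layered, typed market graph in networkx.
# The tickers and edges below are made up purely for illustration.
import networkx as nx

# Map each relationship type to its layer so layer-specific subgraphs are easy to extract.
LAYER_OF = {
    "SUPPLIES_TO": "operational",
    "PRODUCES": "operational",
    "HELD_BY_ETF": "flow",                 # hypothetical name for ETF-ownership edges
    "SHARES_DIRECTOR_WITH": "social",
    "COMPETES_WITH": "environmental",      # hypothetical name for competition edges
}

G = nx.MultiDiGraph()
G.add_edge("TSM", "AAPL", rel_type="SUPPLIES_TO", layer=LAYER_OF["SUPPLIES_TO"])
G.add_edge("AAPL", "SPY", rel_type="HELD_BY_ETF", layer=LAYER_OF["HELD_BY_ETF"])
G.add_edge("AAPL", "MSFT", rel_type="SHARES_DIRECTOR_WITH", layer=LAYER_OF["SHARES_DIRECTOR_WITH"])

def layer_subgraph(g: nx.MultiDiGraph, layer: str) -> nx.MultiDiGraph:
    """Return a copy of the subgraph containing only edges tagged with one layer."""
    keep = [(u, v, k) for u, v, k, d in g.edges(keys=True, data=True) if d["layer"] == layer]
    return g.edge_subgraph(keep).copy()

operational = layer_subgraph(G, "operational")
```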
For each layer, I compute centrality scores using PageRank-style methods (with inverse-degree weighting to avoid ETF super-nodes dominating).
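A rough sketch of what that can look like per layer, assuming "inverse-degree weighting" means each edge is down-weighted by the degree of the node it points into so that ETF super-nodes touching hundreds of tickers don't dominate; the exact weighting scheme may differ from what I actually run.

```python
# Sketch of a per-layer centrality pass: collapse the multigraph to a weighted DiGraph,
# down-weight edges into high-degree hubs, then run standard PageRank on the weights.
import networkx as nx

def inverse_degree_pagerank(g: nx.MultiDiGraph, alpha: float = 0.85) -> dict:
    simple = nx.DiGraph()
    for u, v in g.edges():
        w = 1.0 / max(g.degree(v), 1)   # an edge into a hub (e.g. a broad ETF) counts for less
        prev = simple.get_edge_data(u, v, default={"weight": 0.0})["weight"]
        simple.add_edge(u, v, weight=prev + w)
    return nx.pagerank(simple, alpha=alpha, weight="weight")

# One score per node per layer, e.g.:
# operational_scores = inverse_degree_pagerank(layer_subgraph(G, "operational"))
```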
These structural features are then combined with basic price/volume context and fed into a tree-based model (XGBoost) to rank stocks after sharp drawdowns.
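Roughly, the ranking step might be wired up like this. The feature and column names, the parquet file, and the choice of XGBRanker with a pairwise objective are assumptions for illustration; the only fixed points from above are XGBoost and ranking after drawdowns.

```python
# Sketch of the ranking step: centrality features plus simple price/volume context,
# with one ranking group per event date. Column names and the file are hypothetical.
import pandas as pd
import xgboost as xgb

features = [
    "operational_pagerank", "flow_pagerank", "social_pagerank", "environmental_pagerank",
    "return_5d", "volume_zscore",
]

# One row per (date, ticker) where the stock just had a sharp drawdown,
# labeled with a forward return to rank against.
events = pd.read_parquet("drawdown_events.parquet").sort_values("date")

ranker = xgb.XGBRanker(objective="rank:pairwise", n_estimators=300, max_depth=4)
ranker.fit(
    events[features],
    events["forward_return_20d"],
    group=events.groupby("date", sort=False).size().to_numpy(),  # rows must stay sorted by date
)
scores = ranker.predict(events[features])
```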
What surprised me
When I validated the rankings out-of-sample (2024–2025, using Alphalens to avoid look-ahead issues; a sketch of the setup is below):
* Operational and Flow edges provided most of the lift
* Social edges (board interlocks) added much less than I expected
* Graph features roughly doubled ranking quality versus price-only baselines

This wasn’t obvious to me going in: I expected “social” connections to matter more.
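For anyone curious what that check looks like mechanically, here is a rough sketch of the Alphalens wiring under my assumptions about the inputs; the score/price files and variable names are hypothetical, and the real setup may differ.

```python
# Sketch of the out-of-sample ranking check with Alphalens.
# Assumed inputs (hypothetical files): model scores as a Series with a (date, asset)
# MultiIndex, and a dates x tickers DataFrame of daily prices.
import alphalens as al
import pandas as pd

model_scores = pd.read_parquet("oos_scores_2024_2025.parquet").squeeze("columns")
prices = pd.read_parquet("prices_2024_2025.parquet")

factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor=model_scores,
    prices=prices,
    quantiles=5,
    periods=(1, 5, 20),
)

# Mean information coefficient per horizon; comparing this against a price-only
# baseline factor is one way to quantify the change in ranking quality.
ic = al.performance.factor_information_coefficient(factor_data)
print(ic.mean())
```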
Why I’m posting
I’m in the process of turning this from a research notebook into a production dashboard, and before I lock in the graph schema I’d love feedback from people who’ve built large graphs in other domains. In particular:
* Have you seen board-interlock / social edges be predictive elsewhere?
* Are there graph normalization tricks you’ve found essential at this scale?
* Any pitfalls you’ve hit when mixing heterogeneous edge types?
Happy to answer questions about the graph construction, centrality calculations, or validation setup.