Most systems treat a −5% move the same regardless of context. My hypothesis was that where a company sits in the market’s structure matters more than the price move itself.
The engineering idea
I built a knowledge graph of the U.S. public markets with ~207k edges across ~21 relationship types, organized into four layers:
Operational: supply-chain relationships (SUPPLIES_TO, PRODUCES)
Flow: ETF and institutional ownership plumbing
Social: board interlocks (SHARES_DIRECTOR_WITH)
Environmental: geography / competition
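To make the layering concrete, here is a minimal sketch of how typed edges could be mapped onto the four layers. Only SUPPLIES_TO, PRODUCES, and SHARES_DIRECTOR_WITH are named in this post; the other relationship types, the `Edge` class, the `split_by_layer` helper, and the tickers are hypothetical placeholders, not the actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relationship type -> layer. Types marked "hypothetical" are illustrative
# placeholders; only the unmarked ones are named in the post.
LAYER_OF = {
    "SUPPLIES_TO": "operational",
    "PRODUCES": "operational",
    "HELD_BY_ETF": "flow",                 # hypothetical
    "HELD_BY_INSTITUTION": "flow",         # hypothetical
    "SHARES_DIRECTOR_WITH": "social",
    "HEADQUARTERED_IN": "environmental",   # hypothetical
    "COMPETES_WITH": "environmental",      # hypothetical
}


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    rel: str
    weight: float = 1.0


def split_by_layer(edges):
    """Group edges into per-layer lists, rejecting unknown relationship types."""
    layers = defaultdict(list)
    for e in edges:
        if e.rel not in LAYER_OF:
            raise ValueError(f"unknown relationship type: {e.rel}")
        layers[LAYER_OF[e.rel]].append(e)
    return dict(layers)


# Toy edges with made-up tickers.
edges = [
    Edge("AAA", "BBB", "SUPPLIES_TO"),
    Edge("BBB", "ETF1", "HELD_BY_ETF"),
    Edge("BBB", "CCC", "COMPETES_WITH"),
]
layers = split_by_layer(edges)
```

Keeping the type-to-layer mapping in one table means a relationship type can be moved between layers, or dropped, without touching the code that computes per-layer features.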
For each layer, I compute centrality scores using PageRank-style methods (with inverse-degree weighting to avoid ETF super-nodes dominating).
These structural features are then combined with basic price/volume context and fed into a tree-based model (XGBoost) to rank stocks after sharp drawdowns.
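The per-layer centrality step could be sketched roughly as below. This is one possible reading of "inverse-degree weighting", not necessarily the exact method used: the layer is treated as undirected, and a step from u to v is weighted by 1/deg(v), so a random walker is less likely to move onto a heavily connected hub. The function name, the parameters, and the toy graph are assumptions for illustration.

```python
from collections import defaultdict


def pagerank(edges, inverse_degree=True, damping=0.85, iters=200, tol=1e-12):
    """Power-iteration PageRank on an undirected edge list.

    With inverse_degree=True, a step u -> v gets weight 1/deg(v) before the
    outgoing weights of u are normalised; otherwise every step has weight 1.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    nodes = sorted(nbrs)
    n = len(nodes)
    deg = {u: len(nbrs[u]) for u in nodes}

    def w(v):
        return 1.0 / deg[v] if inverse_degree else 1.0

    # Normalising constant for each node's outgoing transition weights.
    out_total = {u: sum(w(v) for v in nbrs[u]) for u in nodes}

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            for v in nbrs[u]:
                new[v] += damping * rank[u] * w(v) / out_total[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy layer: one hub ("ETF") linked to four stocks, plus one stock-stock edge.
toy_edges = [("ETF", "A"), ("ETF", "B"), ("ETF", "C"), ("ETF", "D"), ("A", "B")]
weighted = pagerank(toy_edges, inverse_degree=True)
plain = pagerank(toy_edges, inverse_degree=False)
```

On this toy graph the hub's score comes out lower under the inverse-degree variant than under the unweighted walk, which is the damping effect on super-nodes the weighting is meant to provide.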
What surprised me
When I validated the rankings out-of-sample (2024–2025, using Alphalens to avoid look-ahead issues):
* Operational and Flow edges provided most of the lift
* Social edges (board interlocks) added much less than I expected
* Graph features roughly doubled ranking quality versus price-only baselines
This wasn’t obvious to me going in; I expected “social” connections to matter more.
Why I’m posting
I’m in the process of turning this from a research notebook into a production dashboard, and before I lock in the graph schema I’d love feedback from people who’ve built large graphs in other domains. In particular:
* Have you seen board-interlock / social edges be predictive elsewhere?
* Are there graph normalization tricks you’ve found essential at this scale?
* Any pitfalls you’ve hit when mixing heterogeneous edge types?
Happy to answer questions about the graph construction, centrality calculations, or validation setup.
gano•1mo ago
I didn’t include visuals initially since this is still research code, but I added two high-level, conceptual artifacts to make the work more concrete (no implementation details):
Architecture overview: https://gist.githack.com/rahuludacity/d787343ca72be97ea1ae51...
Illustrative case study (signal vs price divergence): https://gist.githack.com/rahuludacity/be9fd41193b96c4061bf00...
The goal of both is just to show the shape of the system and the kind of signal it surfaces, not to make trading claims. I posted early because I’m still deciding which graph layers/edges are actually worth keeping before locking in the visualization layer. Very open to feedback on whether these visuals make the problem clearer or if there’s a better way to “show” this kind of system.