Most systems treat a −5% move the same regardless of context. My hypothesis was that where a company sits in the market’s structure matters more than the price move itself.
The engineering idea
I built a knowledge graph of the U.S. public markets with ~207k edges across ~21 relationship types, organized into four layers (a rough construction sketch follows the list):
* Operational: supply-chain relationships (SUPPLIES_TO, PRODUCES)
* Flow: ETF and institutional ownership plumbing
* Social: board interlocks (SHARES_DIRECTOR_WITH)
* Environmental: geography / competition
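To make the schema concrete, here is a minimal sketch of one way such a layered, typed graph could be represented with networkx. The tickers, the HELD_BY_ETF and COMPETES_WITH type names, and the layer_subgraph helper are illustrative assumptions on my part, not the actual schema.

```python
# Minimal sketch (not the actual schema): a layered, typed market graph in networkx.
# The tickers and edges below are made up purely for illustration.
import networkx as nx

# Map each relationship type to its layer so layer-specific subgraphs are easy to extract.
LAYER_OF = {
    "SUPPLIES_TO": "operational",
    "PRODUCES": "operational",
    "HELD_BY_ETF": "flow",                 # hypothetical name for ETF-ownership edges
    "SHARES_DIRECTOR_WITH": "social",
    "COMPETES_WITH": "environmental",      # hypothetical name for competition edges
}

G = nx.MultiDiGraph()
G.add_edge("TSM", "AAPL", rel_type="SUPPLIES_TO", layer=LAYER_OF["SUPPLIES_TO"])
G.add_edge("AAPL", "SPY", rel_type="HELD_BY_ETF", layer=LAYER_OF["HELD_BY_ETF"])
G.add_edge("AAPL", "MSFT", rel_type="SHARES_DIRECTOR_WITH", layer=LAYER_OF["SHARES_DIRECTOR_WITH"])

def layer_subgraph(g: nx.MultiDiGraph, layer: str) -> nx.MultiDiGraph:
    """Return a copy of the subgraph containing only edges tagged with one layer."""
    keep = [(u, v, k) for u, v, k, d in g.edges(keys=True, data=True) if d["layer"] == layer]
    return g.edge_subgraph(keep).copy()

operational = layer_subgraph(G, "operational")
```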
For each layer, I compute centrality scores using PageRank-style methods (with inverse-degree weighting to avoid ETF super-nodes dominating).
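A rough sketch of what that can look like per layer, assuming "inverse-degree weighting" means each edge is down-weighted by the degree of the node it points into so that ETF super-nodes touching hundreds of tickers don't dominate; the exact weighting scheme may differ from what I actually run.

```python
# Sketch of a per-layer centrality pass: collapse the multigraph to a weighted DiGraph,
# down-weight edges into high-degree hubs, then run standard PageRank on the weights.
import networkx as nx

def inverse_degree_pagerank(g: nx.MultiDiGraph, alpha: float = 0.85) -> dict:
    simple = nx.DiGraph()
    for u, v in g.edges():
        w = 1.0 / max(g.degree(v), 1)   # an edge into a hub (e.g. a broad ETF) counts for less
        prev = simple.get_edge_data(u, v, default={"weight": 0.0})["weight"]
        simple.add_edge(u, v, weight=prev + w)
    return nx.pagerank(simple, alpha=alpha, weight="weight")

# One score per node per layer, e.g.:
# operational_scores = inverse_degree_pagerank(layer_subgraph(G, "operational"))
```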
These structural features are then combined with basic price/volume context and fed into a tree-based model (XGBoost) to rank stocks after sharp drawdowns.
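Roughly, the ranking step might be wired up like this. The feature and column names, the parquet file, and the choice of XGBRanker with a pairwise objective are assumptions for illustration; the only fixed points from above are XGBoost and ranking after drawdowns.

```python
# Sketch of the ranking step: centrality features plus simple price/volume context,
# with one ranking group per event date. Column names and the file are hypothetical.
import pandas as pd
import xgboost as xgb

features = [
    "operational_pagerank", "flow_pagerank", "social_pagerank", "environmental_pagerank",
    "return_5d", "volume_zscore",
]

# One row per (date, ticker) where the stock just had a sharp drawdown,
# labeled with a forward return to rank against.
events = pd.read_parquet("drawdown_events.parquet").sort_values("date")

ranker = xgb.XGBRanker(objective="rank:pairwise", n_estimators=300, max_depth=4)
ranker.fit(
    events[features],
    events["forward_return_20d"],
    group=events.groupby("date", sort=False).size().to_numpy(),  # rows must stay sorted by date
)
scores = ranker.predict(events[features])
```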
What surprised me
When I validated the rankings out-of-sample (2024–2025, using Alphalens to avoid look-ahead issues; a sketch of the setup is below):
* Operational and Flow edges provided most of the lift
* Social edges (board interlocks) added much less than I expected
* Graph features roughly doubled ranking quality versus price-only baselines

This wasn’t obvious to me going in: I expected “social” connections to matter more.
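For anyone curious what that check looks like mechanically, here is a rough sketch of the Alphalens wiring under my assumptions about the inputs; the score/price files and variable names are hypothetical, and the real setup may differ.

```python
# Sketch of the out-of-sample ranking check with Alphalens.
# Assumed inputs (hypothetical files): model scores as a Series with a (date, asset)
# MultiIndex, and a dates x tickers DataFrame of daily prices.
import alphalens as al
import pandas as pd

model_scores = pd.read_parquet("oos_scores_2024_2025.parquet").squeeze("columns")
prices = pd.read_parquet("prices_2024_2025.parquet")

factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor=model_scores,
    prices=prices,
    quantiles=5,
    periods=(1, 5, 20),
)

# Mean information coefficient per horizon; comparing this against a price-only
# baseline factor is one way to quantify the change in ranking quality.
ic = al.performance.factor_information_coefficient(factor_data)
print(ic.mean())
```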
Why I’m posting
I’m in the process of turning this from a research notebook into a production dashboard, and before I lock in the graph schema I’d love feedback from people who’ve built large graphs in other domains. In particular:
* Have you seen board-interlock / social edges be predictive elsewhere?
* Are there graph normalization tricks you’ve found essential at this scale?
* Any pitfalls you’ve hit when mixing heterogeneous edge types?
Happy to answer questions about the graph construction, centrality calculations, or validation setup.