I built a system to analyze Formula 1 driver performance using race data from 1950-2024. I created “DNA profiles” by extracting behavioral patterns from lap times, qualifying results, and race positions.
The most surprising finding: drivers with the highest “consistency” scores rarely win championships. Ultra-consistent drivers (scores of 90+) have won only 12% of all titles; the rest went to drivers the metric rates as “inconsistent”.
Technical approach:
• Processed 70+ years of race results, qualifying data, and pit stops
• Built scoring algorithms that normalize for car performance and era differences
• Used teammate comparisons to isolate driver skill from equipment
• Created weighted metrics for traits like aggression (overtaking frequency), consistency (finishing reliability), and racecraft (position changes)
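To make the teammate-comparison and weighted-metric steps concrete, here is a minimal sketch. It is not my production code: the row layout, the driver and team names, the toy results, and the weights are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (season, race, team, driver, finish_position).
# Toy values for illustration only, not real results.
RESULTS = [
    (2000, 1, "TeamA", "driver_x", 1),
    (2000, 1, "TeamA", "driver_y", 3),
    (2000, 1, "TeamB", "driver_z", 2),
    (2000, 1, "TeamB", "driver_w", 6),
    (2000, 2, "TeamA", "driver_x", 2),
    (2000, 2, "TeamA", "driver_y", 5),
    (2000, 2, "TeamB", "driver_z", 4),
    (2000, 2, "TeamB", "driver_w", 3),
]

def teammate_deltas(results):
    """Mean finishing-position gap to the teammate (negative = finished ahead)."""
    by_car = defaultdict(list)
    for season, race, team, driver, pos in results:
        by_car[(season, race, team)].append((driver, pos))
    gaps = defaultdict(list)
    for entries in by_car.values():
        if len(entries) != 2:
            continue  # need exactly two classified teammates to compare
        (d1, p1), (d2, p2) = entries
        gaps[d1].append(p1 - p2)
        gaps[d2].append(p2 - p1)
    return {driver: mean(g) for driver, g in gaps.items()}

# Assumed trait weights; the real weighting is a tuning decision.
WEIGHTS = {"aggression": 0.3, "consistency": 0.3, "racecraft": 0.4}

def composite(traits, weights=WEIGHTS):
    """Weighted average of trait scores, normalized by the total weight."""
    total = sum(weights.values())
    return sum(traits[name] * w for name, w in weights.items()) / total

if __name__ == "__main__":
    print(teammate_deltas(RESULTS))
    print(composite({"aggression": 50, "consistency": 80, "racecraft": 70}))
```

Comparing only within the same team and race is what takes the car out of the picture: both drivers have the same equipment, so the gap is mostly down to the driver.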
The data reveals systematic biases in these metrics that are easy to overlook.
Successful drivers often score low on “aggression” because they start from pole or near the front and have few cars to overtake. Era effects are massive: 1980s drivers appear more aggressive purely due to different racing conditions.
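One common way to dampen the era effect is to score each driver relative to others from the same era instead of on raw counts. The sketch below uses per-decade z-scores; the decade bucketing, the `overtakes_per_race` field, and the sample rows are assumptions for illustration, not my exact normalization.

```python
from collections import defaultdict
from statistics import mean, pstdev

def era_of(season, width=10):
    """Bucket a season into its decade, e.g. 1985 -> 1980."""
    return season - season % width

def era_zscores(rows):
    """rows: (driver, season, overtakes_per_race) -> {(driver, season): z-score}."""
    by_era = defaultdict(list)
    for driver, season, value in rows:
        by_era[era_of(season)].append(value)
    stats = {era: (mean(v), pstdev(v)) for era, v in by_era.items()}
    scores = {}
    for driver, season, value in rows:
        mu, sigma = stats[era_of(season)]
        # If every value in the era is identical there is no spread to scale by.
        scores[(driver, season)] = 0.0 if sigma == 0 else (value - mu) / sigma
    return scores

# Toy rows, not real data: raw overtaking is much higher in the older era.
ROWS = [
    ("driver_a", 1985, 6.0),
    ("driver_b", 1986, 4.0),
    ("driver_c", 2015, 2.0),
    ("driver_d", 2016, 1.0),
]

if __name__ == "__main__":
    for key, z in era_zscores(ROWS).items():
        print(key, round(z, 2))
```

In the toy rows, driver_a overtakes three times as often as driver_c in raw terms, yet both end up with the same z-score because each is the same distance above their own era's average.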
Most interesting pattern: the inverse relationship between consistency and championships. My reading is that perfect consistency may mean you’re not taking the calculated risks needed to win races.
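For anyone who wants to check a figure like the title share themselves, the arithmetic is just a threshold count over the list of champions. A minimal sketch, with a made-up list of champions and scores (the 90 cutoff matches the "ultra-consistent" band; everything else is hypothetical):

```python
def title_share(champions, threshold=90):
    """Fraction of titles won by drivers at or above the consistency threshold.

    champions: list of (season_label, driver, consistency_score), one per title.
    """
    if not champions:
        return 0.0
    high = sum(1 for _, _, score in champions if score >= threshold)
    return high / len(champions)

# Toy champions list for illustration only, not real scores.
CHAMPIONS = [
    ("season_1", "driver_a", 72),
    ("season_2", "driver_b", 95),
    ("season_3", "driver_c", 81),
    ("season_4", "driver_d", 68),
]

if __name__ == "__main__":
    print(f"{title_share(CHAMPIONS):.0%} of titles went to 90+ drivers")
```

Note that a share like this only means something next to the base rate: if few drivers ever reach 90+, a low title share is expected even with no real effect.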
I also built interactive visualizations to explore these patterns across different eras and driver comparisons. The dataset is rich enough that new insights keep emerging.
Has anyone else worked with sports performance data? The challenges around normalizing across eras and equipment changes are fascinating from a data science perspective.
jack_lynch•3h ago