Building the F1 Driver Performance Index

Ask any Formula 1 fan who the greatest driver of all time is and you will get a different answer every time, usually with a heated argument to go along with it. Someone throws out Hamilton's record 105 wins. Someone counters with Schumacher's five consecutive titles. Someone else brings up Senna's raw qualifying speed or Fangio's five championships in an era when finishing a race alive was genuinely not guaranteed. Then there is Max Verstappen, who won four consecutive titles from 2021 to 2024.

The problem with all of these arguments is that they rely on cherry-picked statistics, nostalgia, recency bias, or pure vibes. I wanted to see if I could do better and build something rigorous enough to hold up to scrutiny but honest enough to admit what it cannot answer.

The result is the F1 Driver Performance Index (DPI): a composite statistical index that scores every driver across every season from 1950 to 2025 using 16 indicators, era adjusted z scores, Bayesian shrinkage, and bootstrap confidence intervals. Below is how it is built, what each indicator measures, and who lands on top.

The Core Problem: Comparing Across Eras

The biggest challenge in any cross era comparison is that the sport looks completely different depending on which decade you are looking at. In 1955, Fangio competed in roughly eight races. In 2024, Verstappen competed in twenty four. Point systems have changed multiple times. Car reliability has gone from "will it finish the race" to near certainty. Dominant constructor eras (Ferrari in the early 2000s, Red Bull from 2010 to 2013 and again from 2021 to 2023, and Mercedes through most of the hybrid era) mean that raw performance statistics tell you as much about the car as they do about the driver.

My solution was to never compare a 1955 driver directly to a 2024 driver on raw numbers. Instead, every indicator is calculated as a rate or a percentile, then z scored within season. A z score answers the question: how many standard deviations above or below their peers was this driver in this specific season? That way, Fangio's 1954 dominance and Verstappen's 2023 dominance are measured on the same scale relative to the field they were actually racing against.
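As a rough illustration, the within season z score can be sketched in a few lines of Python. The function name, the input layout, and the use of the population standard deviation are assumptions made for this example rather than details taken from the notebook, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_within_season(rows):
    """rows: iterable of (driver, season, value) -> {(driver, season): z}."""
    rows = list(rows)
    by_season = defaultdict(list)
    for _, season, value in rows:
        by_season[season].append(value)

    z = {}
    for driver, season, value in rows:
        values = by_season[season]
        mu, sigma = mean(values), pstdev(values)  # population std is an assumption
        # If every driver in a season has the same value there is no spread to measure.
        z[(driver, season)] = 0.0 if sigma == 0 else (value - mu) / sigma
    return z

# Made-up win rates for two seasons.
rows = [
    ("driver_a", 1954, 0.75), ("driver_b", 1954, 0.25), ("driver_c", 1954, 0.00),
    ("driver_d", 2023, 0.86), ("driver_e", 2023, 0.09), ("driver_f", 2023, 0.05),
]
z = zscore_within_season(rows)
```

Each season is centered on its own mean and scaled by its own spread, so a value only ever competes with values from the same year.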

Bayesian Shrinkage: Handling Small Samples

One issue with z scores is that they are unreliable for small samples. A driver who wins their only two races of a season has a 100% win rate, but that is almost certainly noise rather than signal. Bayesian shrinkage fixes this by pulling small sample estimates toward the mean. The weight applied to a driver's observed score scales with how many races they ran:

K_SMOOTHING = 10

w = races_entered / (races_entered + K_SMOOTHING)
z_shrunk = w * z_observed

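With K=10, a driver with 2 races gets a weight of 0.17, so their score is pulled 83% toward zero. A driver with 20 races gets 0.67, and a full 24 race season gets 0.71. Full time drivers in the same season all receive nearly the same weight, so the shrinkage barely changes their order relative to each other while still discounting one or two race cameos, which is exactly the behavior you want.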
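The same arithmetic as a small runnable sketch. The function name `shrink` is an assumption for illustration; the formula and the value of K come from the lines above.

```python
K_SMOOTHING = 10

def shrink(z_observed, races_entered, k=K_SMOOTHING):
    """Pull a z score toward zero in proportion to how few races back it up."""
    w = races_entered / (races_entered + k)
    return w * z_observed

# Weights for a two race cameo, a 20 race season, and a 24 race season.
weights = {n: round(n / (n + K_SMOOTHING), 2) for n in (2, 20, 24)}
# -> {2: 0.17, 20: 0.67, 24: 0.71}

# A z score of +3.0 earned over only two races keeps just 0.5 after shrinkage.
cameo = shrink(3.0, 2)
```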

PCA vs. Expert Weights

I combined the 16 indicators into a single season DPI using two methods: Principal Component Analysis (PCA), which lets the data decide the weights automatically, and expert weights I set myself based on what I believed should matter most in F1.

The correlation between the two methods came back at 0.937, meaning they broadly agree on the hierarchy. But the PCA tended to reward drivers on dominant cars more than expert weights did, because raw performance indicators like wins, podiums, and points are highly correlated and PCA amplifies that. Drivers like Bottas 2019 or Pérez 2021 ranked much higher under PCA than under expert weights because they accumulated strong numbers in top tier machinery while consistently being outperformed by their teammates. Expert weights, which emphasize car adjusted metrics like WAR and teammate H2H, are the primary method for exactly that reason.
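To see why PCA leans toward correlated indicators, here is a small sketch that finds the first principal component by power iteration on the covariance matrix. Power iteration is a stand in chosen to keep the example dependency free; the article does not say which PCA implementation was used, and the toy matrix below is invented.

```python
import math

def first_pc_loadings(matrix, iters=200):
    """Rows are driver seasons, columns are z scored indicators."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix of the indicators.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting guess for the loading vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Invented data: columns 0 and 1 move together (think wins and podiums),
# column 2 has similar variance but is unrelated to them.
toy = [[1, 1, 1.58], [-1, -1, 1.58], [2, 2, -1.58], [-2, -2, -1.58]]
loadings = first_pc_loadings(toy)
```

In this toy case nearly all of the first component's weight lands on the two correlated columns and almost none on the independent one, which is the same mechanism that lifts drivers with strong raw numbers in dominant cars.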

Bootstrap Confidence Intervals

I ran 1,000 bootstrap simulations per driver: randomly resampling their season DPIs with replacement and recalculating career scores each time to generate 95% confidence intervals. If two drivers' ranges overlap, we cannot confidently say one is better than the other.
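A minimal sketch of that resampling step follows. It assumes the career score is the plain mean of the resampled season DPIs, which is a simplification of the race weighted Bayesian estimate described later, and the function names and season values are hypothetical.

```python
import random

def bootstrap_ci(season_scores, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile interval for the mean of season scores."""
    rng = random.Random(seed)
    n = len(season_scores)
    means = []
    for _ in range(n_boot):
        # Resample the driver's seasons with replacement.
        sample = [rng.choice(season_scores) for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

lo, hi = bootstrap_ci([120.0, 150.0, 160.0, 140.0, 170.0])  # made-up season DPIs

# Senna and Verstappen intervals as listed in the final table.
same_tier = intervals_overlap((143.4, 183.7), (133.6, 172.3))  # True
```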

Quality-Weighted H2H: An Iterative Correction

The initial DPI treated all teammate H2H comparisons equally. Verstappen outperforming Pérez and Hamilton outperforming Alonso contributed the same amount to their scores, which is methodologically weak. Beating a historically great teammate should count for more than beating a backmarker.

The fix uses an iterative approach borrowed from Elo ratings in chess: your rating gain depends on who you beat, not just that you won. Each driver is assigned a quality multiplier based on their current career DPI relative to the field average. That multiplier scales the H2H indicator values for every driver who faced them as a teammate. The updated DPIs then become the new quality weights for the next iteration, and the process repeats until the scores stop moving. In this dataset it converged in four iterations. We also ran a sensitivity analysis on the minimum weight floor (0.25, 0.35, 0.45, and 0.50) — all four values produced the same final rankings, which is a good sign that the result is not sensitive to that design choice.
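The loop described above can be sketched roughly as follows. This is a toy model, not the notebook's code: it assumes a driver's score is a fixed non H2H component plus the average of their quality scaled H2H values, that the multiplier is simply score divided by field average with a floor, and that every driver has at least one teammate. All names and numbers are hypothetical.

```python
def quality_weighted_scores(base, h2h, floor=0.25, tol=1e-6, max_iter=100):
    """base: {driver: non-H2H score}; h2h: {driver: {teammate: h2h_value}}."""
    # Start from the unweighted version: every teammate counts the same.
    scores = {d: base[d] + sum(h2h[d].values()) / len(h2h[d]) for d in base}
    for iteration in range(1, max_iter + 1):
        field_avg = sum(scores.values()) / len(scores)
        # Quality multiplier: current score relative to the field, with a floor.
        mult = {d: max(floor, s / field_avg) for d, s in scores.items()}
        updated = {
            d: base[d] + sum(v * mult[t] for t, v in h2h[d].items()) / len(h2h[d])
            for d in base
        }
        shift = max(abs(updated[d] - scores[d]) for d in base)
        scores = updated
        if shift < tol:  # scores have stopped moving
            return scores, iteration
    return scores, max_iter

# "a" and "b" have identical base scores and identical raw H2H values,
# but "a" earned them against a strong teammate and "b" against a weak one.
base = {"a": 100.0, "b": 100.0, "strong": 110.0, "weak": 60.0}
h2h = {"a": {"strong": 10.0}, "strong": {"a": 6.0},
       "b": {"weak": 10.0}, "weak": {"b": 6.0}}
scores, iterations = quality_weighted_scores(base, h2h)
```

After convergence "a" ends up ahead of "b", which is the behavior the correction is meant to produce.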

The effect was meaningful. Verstappen dropped from first to sixth. His H2H record, while dominant, was built largely against teammates like Pérez, Gasly, and Lawson, none of whom score highly in the final DPI. Senna's H2H record, by contrast, includes sustained head to head competition against Prost at McLaren, arguably the strongest intra-team rivalry in F1 history. When you weight those wins by opponent quality, Senna moves to the top.

Fangio also moved significantly, from seventh to second. Across four different constructors and five championships, he consistently dominated teammates at competitive teams, several of whom were genuinely strong drivers. The quality weighting rewarded that history in a way the raw H2H scores did not.

The 16 Indicators

Each indicator below is a season level score: one value per driver per year. They are z scored within season and Bayesian shrunk before being combined into the final DPI. Top 10 lists show the best single season performances in the dataset (1950 to 2025) unless otherwise noted.

win_rate = wins / sum(race_weight)

The most intuitive measure of dominance. Rather than raw wins, we divide by a DNF adjusted race count. Mechanical retirements reduce the denominator while driver fault crashes do not. This prevents penalizing drivers for unreliable machinery while still holding them accountable for their own errors.
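A sketch of the DNF adjusted rate. The driver fault status list is the one given later in the article; the 0.5 weight for mechanical retirements is an assumption, since the exact value is not stated, and the function names and sample season are hypothetical.

```python
# Status strings the article lists as driver fault retirements.
DRIVER_FAULT = {"Accident", "Collision", "Spun off",
                "Collision damage", "Stalled", "Fatal accident"}
MECHANICAL_DNF_WEIGHT = 0.5  # assumed value, not stated in the article

def race_weight(finished, status):
    """Full weight for finishes and driver fault DNFs, reduced weight otherwise."""
    if finished or status in DRIVER_FAULT:
        return 1.0
    return MECHANICAL_DNF_WEIGHT

def top_n_rate(results, n):
    """results: list of (position, finished, status); n=1 for wins, n=3 for podiums."""
    hits = sum(1 for pos, finished, _ in results if finished and pos <= n)
    weight = sum(race_weight(finished, status) for _, finished, status in results)
    return hits / weight if weight else 0.0

# Hypothetical season: six wins and one engine failure.
season = [(1, True, "Finished")] * 6 + [(None, False, "Engine")]
win_rate = top_n_rate(season, 1)  # 6 / 6.5
```

The same helper with `n=3` gives the podium rate, since both indicators share the denominator.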

#	Driver	Season	Win Rate
1	Ascari	1952	92.3%
2	Clark	1965	88.9%
3	Verstappen	2023	86.4%
4	Fangio	1955	76.2%
5	Clark	1963	75.7%
6	Fangio	1950	75.0%
7	Fangio	1954	75.0%
8	Brabham	1960	74.1%
9	Rindt	1970	74.1%
10	Schumacher	2004	73.2%

podium_rate = top_3_finishes / sum(race_weight)

Same structure as win rate but counting top 3 finishes. Captures consistent front running performance. A driver who finishes P2 or P3 every weekend is still performing at an elite level even without winning. Uses the same DNF adjusted denominator.

#	Driver	Season	Podium Rate
1	Schumacher	2002	100.0%
2	Clark	1963	97.3%
3	Fangio	1957	96.0%
4	Verstappen	2023	95.5%
5	Fangio	1955	95.2%
6	González	1951	95.2%
7	Prost	1988	94.9%
8	Hamilton	2014	94.1%
9	Hamilton	2015	93.2%
10	Ascari	1952	92.3%

points_pct = points_scored / max_points_that_race

Raw points are meaningless across eras because the scoring system has changed multiple times. This indicator converts each race's points to a percentage of the maximum possible (the winner's score) to remove the scale difference between point systems, then z scores within season to adjust for field competitiveness. A driver who scored full points every race would have a score of 1.0.
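A short worked example of the normalization. Averaging the per race percentages into a season value is an assumption about the aggregation step, and the early era example ignores the fastest lap bonus point.

```python
def points_pct(points_scored, max_points_that_race):
    """Share of the winner's points that a driver scored in one race."""
    return points_scored / max_points_that_race

def season_points_pct(races):
    """races: list of (points_scored, max_points_that_race). Mean is an assumption."""
    return sum(points_pct(p, m) for p, m in races) / len(races)

# Second place under two different scoring systems lands on a similar scale.
p2_early = points_pct(6, 8)     # 8-6-4-3-2 system  -> 0.75
p2_modern = points_pct(18, 25)  # 25-18-15-... system -> 0.72
```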

#	Driver	Season	Points/Race
1	Verstappen	2023	94.9%
2	Ascari	1952	92.3%
3	Fangio	1955	91.3%
4	Clark	1965	88.9%
5	Clark	1963	87.7%
6	Vettel	2013	87.0%
7	Fangio	1957	86.7%
8	Fangio	1954	86.1%
9	Hamilton	2020	85.6%
10	Vettel	2011	84.8%

fastest_lap_pct = 1 − (rank − 1) / (field_size − 1)

How a driver's single lap race pace compares to the field. Rank 1 (fastest lap of the race) becomes a percentile of 1.0 and last place becomes 0.0. Weighted lower than other indicators because fastest laps are sometimes tactical — teams occasionally pit a driver for soft tyres at the end of a race specifically to steal the fastest lap bonus point. Data is available from 1994 onward only.
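The percentile formula as a small function. The guard for a one car field is an assumption added to avoid dividing by zero.

```python
def fastest_lap_pct(rank, field_size):
    """Rank 1 maps to 1.0 and the slowest fastest lap maps to 0.0."""
    if field_size < 2:
        return 1.0  # assumption: a single car field has nothing to compare against
    return 1 - (rank - 1) / (field_size - 1)

sixth_of_21 = fastest_lap_pct(6, 21)  # 1 - 5/20 = 0.75
```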

#	Driver	Season	FL Percentile
1	Schumacher	2004	97.1%
2	Räikkönen	2008	95.2%
3	Räikkönen	2005	94.0%
4	Schumacher	2006	93.1%
5	Verstappen	2023	92.3%
6	Alonso	2007	92.2%
7	Hamilton	2015	91.7%
8	Hamilton	2020	91.4%
9	Räikkönen	2007	90.2%
10	Verstappen	2021	90.0%

quali_h2h = qualifying_wins / pairwise_comparisons

What percentage of the time did a driver out qualify their teammate? This is one of the cleanest car controlled comparisons in the data: both drivers in a team run nominally identical machinery, so a consistent qualifying advantage over a teammate is mostly down to the driver. For teams with three or more drivers (common pre 1986), all pairwise comparisons are calculated so the metric is consistent across different team structures.
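The pairwise counting can be sketched like this. The input layout (one dict of qualifying positions per team per weekend) and the choice to skip ties are assumptions for the example.

```python
from collections import defaultdict
from itertools import combinations

def quali_h2h(sessions):
    """sessions: list of {driver: qualifying_position} for one team, one per weekend."""
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for session in sessions:
        # Every pair of teammates in the session counts as one comparison each.
        for a, b in combinations(session, 2):
            if session[a] == session[b]:
                continue  # assumption: ties are skipped
            winner = a if session[a] < session[b] else b
            wins[winner] += 1
            comparisons[a] += 1
            comparisons[b] += 1
    return {d: wins[d] / comparisons[d] for d in comparisons}

# A three car team over two weekends (made-up positions).
rates = quali_h2h([
    {"a": 1, "b": 3, "c": 5},
    {"a": 2, "b": 4, "c": 3},
])
# -> {"a": 1.0, "b": 0.25, "c": 0.25}
```

The same function works for the race H2H indicator if finishing positions are passed in instead of qualifying positions.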

Many drivers achieved a perfect 100% H2H rate in qualifying for a given season. The table below shows seasons with at least 10 pairwise comparisons. Notable perfect seasons include Häkkinen 1994 against Brundle, Schumacher 1994 against Herbert and Lehto, and Verstappen's dominant 2020 season against Albon.

#	Driver	Season	H2H Rate
1	Häkkinen	1994	100.0%
2	Schumacher	1994	100.0%
3	Verstappen	2020	100.0%
4	Verstappen	2025	100.0%
5	Alonso	2008	100.0%
6	Alonso	2018	100.0%
7	Alonso	2025	100.0%
8	Frentzen	1995	100.0%
9	Herbert	1997	100.0%
10	Albon	2023	100.0%

race_h2h = race_wins_vs_teammate / pairwise_comparisons

Same concept as qualifying H2H but for race finishing positions. Controls for car performance across full race distances rather than a single flying lap. Race H2H is weighted more heavily than qualifying H2H because race pace includes tyre management, racecraft, and decision making under pressure and not just outright speed.

#	Driver	Season	H2H Rate
1	Ascari	1952	100.0%
2	Fangio	1954	100.0%
3	Fangio	1958	100.0%
4	Button	2005	100.0%
5	Hamilton	2016	93.8%
6	Senna	1989	92.9%
7	Verstappen	2022	92.3%
8	Schumacher	2002	91.7%
9	Hamilton	2020	91.7%
10	Clark	1965	90.0%

The dataset includes Indianapolis 500 results from 1950 to 1960, which counted toward the World Championship. Some early era perfect records reflect very small teammate samples from Indy only competitors rather than full season dominance. The table above excludes those entries and shows the most meaningful results.

WAR = constructor_season_avg_finish − driver_avg_finish

Wins Above Replacement (WAR) is borrowed from baseball analytics. The core idea: what would a typical driver have scored in the same car? We calculate each constructor's average finishing position by season and then measure how much each driver exceeded or fell short of that baseline. Because a lower finishing position is better, the driver's average is subtracted from the constructor's, so positive WAR means the driver finished better than their car would predict on average. This is one of the most heavily weighted indicators because it most aggressively strips out car performance from the result.
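A sketch of the calculation for one season, with the sign chosen so that positive means better than the car's baseline, as the paragraph above describes. It assumes one constructor per driver per season and that the constructor baseline includes the driver's own results; the names and positions are made up.

```python
from collections import defaultdict
from statistics import mean

def war_by_driver(results):
    """results: list of (driver, constructor, finish_position) for one season."""
    by_constructor = defaultdict(list)
    by_driver = defaultdict(list)
    team_of = {}
    for driver, constructor, position in results:
        by_constructor[constructor].append(position)
        by_driver[driver].append(position)
        team_of[driver] = constructor  # assumption: no mid season team changes
    # Lower positions are better, so the baseline minus the driver average
    # is positive when the driver beat the car's typical result.
    return {d: mean(by_constructor[team_of[d]]) - mean(by_driver[d])
            for d in by_driver}

war = war_by_driver([
    ("a", "team_x", 1), ("a", "team_x", 2),
    ("b", "team_x", 5), ("b", "team_x", 6),
])
# -> {"a": 2.0, "b": -2.0}
```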

#	Driver	Season	WAR (positions)
1	Clark	1963	+8.94
2	McLaren	1961	+7.76
3	Hill	1962	+7.56
4	Salvadori	1961	+7.11
5	Gurney	1961	+6.80
6	Peterson	1971	+6.65
7	Fangio	1957	+6.52
8	Clark	1961	+6.17
9	Fittipaldi	1972	+6.17
10	Moss	1961	+6.15

The 1961 season appears five times because the Sharknose Ferrari was so dominant that any driver in non Ferrari machinery who finished well outperformed their constructor baseline by a wide margin. Clark's 1963 season in the Lotus 25 stands out as the single greatest car adjusted performance in the dataset.

dominance_score = avg_finish_position_percentile

Measures not just whether a driver won, but how thoroughly they dominated the field on average. A score of 1.0 means finishing first every race and 0.0 means finishing last every race. This differs from win rate because it captures the full distribution of results: a driver who finishes P2 every race scores very high even without winning. Weighted lower than other indicators because dominant margins reflect car advantage as much as driver skill.

#	Driver	Season	Dominance
1	Verstappen	2023	98.6%
2	Schumacher	2002	98.0%
3	Fangio	1954	97.7%
4	Clark	1963	95.7%
5	Hamilton	2020	95.4%
6	Hamilton	2019	92.7%
7	Vettel	2011	92.4%
8	Vettel	2013	92.2%
9	Hill	1961	92.2%
10	Hamilton	2015	91.7%

consistency = 1 / (CV + ε) where CV = std(finish_pct) / mean(finish_pct)

Measures how predictable a driver's performance is race to race. We use the Coefficient of Variation (CV) of finishing position percentiles: standard deviation divided by the mean. CV normalizes for performance level so a front runner who varies between P1 and P3 can score the same as a midfield driver who varies between P8 and P10. Only races where the driver finished in the top 70% of the field are counted, preventing backmarkers from gaming the metric by being consistently slow.

#	Driver	Season	Consistency
1	Ascari	1952	50.0
2	Schumacher	2002	32.2
3	González	1951	28.1
4	Hamilton	2014	27.7
5	Surtees	1964	27.4
6	Vettel	2011	26.2
7	McLaren	1960	24.0
8	Fangio	1954	23.2
9	Berger	1991	21.2

The score is capped at 50 to prevent infinity when variance is near zero. Ascari won all six of the 1952 races he finished, and his only other start, the Indianapolis 500, ended in a mechanical retirement, giving him a near zero coefficient of variation: a legitimately perfect season of consistency.
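A minimal sketch of the consistency calculation. The epsilon of 0.02 is an assumption picked so that zero variance lands on the cap of 50, the population standard deviation is assumed, and the top 70% filter is read as a finishing percentile of at least 0.30.

```python
from statistics import mean, pstdev

EPSILON = 0.02         # assumed; 1 / 0.02 matches the stated cap of 50
CAP = 50.0
MIN_FINISH_PCT = 0.30  # top 70% of the field, as a finishing percentile

def consistency(finish_pcts):
    """finish_pcts: per race finishing percentiles, 1.0 = first, 0.0 = last."""
    counted = [p for p in finish_pcts if p >= MIN_FINISH_PCT]
    if not counted:
        return 0.0  # assumption: no qualifying races means no score
    cv = pstdev(counted) / mean(counted)
    return min(1 / (cv + EPSILON), CAP)

perfect = consistency([1.0, 1.0, 1.0, 1.0])   # zero variance hits the cap
wobbly = consistency([1.0, 0.9, 1.0, 0.8])    # some spread scores lower
```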

clean_driving_score = 1 − (driver_fault_dnfs / races_entered)

Captures error proneness: how often does a driver crash out due to their own mistakes versus mechanical failures beyond their control? Status codes classified as driver fault include Accident, Collision, Spun off, Collision damage, Stalled, and Fatal accident. Collisions are weighted at 0.5 since fault is often shared. The score is inverted so higher is better. Because many drivers complete full seasons without a single driver fault DNF, the career level average is a more meaningful comparison than any single season.
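As a sketch, the score can be computed from a list of race status strings. Treating both "Collision" and "Collision damage" as half weight is an assumption about how the 0.5 rule is applied, and the sample season is hypothetical.

```python
FAULT_WEIGHTS = {
    "Accident": 1.0, "Spun off": 1.0, "Stalled": 1.0, "Fatal accident": 1.0,
    "Collision": 0.5, "Collision damage": 0.5,  # assumes both codes get 0.5
}

def clean_driving_score(statuses):
    """statuses: one status string per race entered (non-empty list)."""
    faults = sum(FAULT_WEIGHTS.get(status, 0.0) for status in statuses)
    return 1 - faults / len(statuses)

# 20 races: 17 finishes, one engine failure, one accident, one collision.
season = ["Finished"] * 17 + ["Engine", "Accident", "Collision"]
score = clean_driving_score(season)  # 1 - 1.5 / 20
```

Mechanical statuses such as "Engine" fall through with zero weight, so they never count against the driver.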

#	Driver	Career Avg
1	Fangio	98.3%
2	Jabouille	98.0%
3	Salvadori	98.0%
4	Gurney	97.7%
5	Norris	97.4%
6	Ricciardo	97.3%
7	Clark	97.3%
8	Moss	97.3%
9	Ocon	97.2%
10	Piastri	97.2%

Career averages shown here require a minimum of 50 races to filter out drivers with short careers. Fangio's 98.3% across 66 races is remarkably clean for an era when racing was genuinely dangerous and mechanical failures were common.

championship_score = championship_pct × competitiveness_weight

Championship position matters: it is what every driver races for. But not all championships are equal. A title won by 1 point in a season long fight is more impressive than a dominant wire to wire campaign. This indicator calculates where a driver finished in the standings as a percentile of the full field, then multiplies by a competitiveness weight based on how tight the P1 to P2 gap was at the end of the season.

#	Driver	Season	Champ Score
1	Norris	2025	0.995
2	Lauda	1984	0.993
3	Räikkönen	2007	0.991
4	Hamilton	2008	0.990
5	Vettel	2012	0.989
6	Schumacher	1994	0.989
7	Rosberg	2016	0.987
8	Vettel	2010	0.984
9	Piquet	1981	0.980
10	Verstappen	2021	0.980

The table is limited to season champions, so it effectively ranks titles by how narrow the margin over P2 was. Norris 2025 and Lauda's half point 1984 title sit at the top because the gap to P2 was razor thin relative to the total points available that season.

longevity_score = races_weighted_avg(WAR) across full career

Rewards sustained excellence over time. A driver who performs at an elite level for 15 seasons should score higher than an equally dominant driver who only competed for 5. Longevity is calculated as a races weighted average of WAR across all seasons: each season's contribution is proportional to how many races the driver ran that year. This separates great careers from great individual seasons.
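The races weighted average is simple enough to show directly. The function name and the two season career below are made up for illustration.

```python
def longevity_score(seasons):
    """seasons: list of (races_entered, season_war)."""
    total_races = sum(races for races, _ in seasons)
    return sum(races * war for races, war in seasons) / total_races

# An 8 race season at +2.0 and a 16 race season at +0.5:
# (8 * 2.0 + 16 * 0.5) / 24 = 1.0
career = longevity_score([(8, 2.0), (16, 0.5)])
```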

#	Driver	Career Avg WAR
1	McLaren	+1.57
2	Verstappen	+1.56
3	Gurney	+1.48
4	Hill	+1.45
5	Stewart	+1.33
6	Prost	+1.32
7	Piquet	+1.28
8	Peterson	+1.27
9	Fittipaldi	+1.11
10	Salo	+1.06

Minimum 80 career races. Bruce McLaren at the top is a genuine result: across his entire career he consistently outperformed the cars he drove, particularly in the early 1960s when Cooper was fading from competitiveness. Verstappen at 254 races maintaining a +1.56 average is the strongest modern result.

peak_performance = best_3yr_rolling_avg(WAR)

While longevity measures the full career arc, peak performance measures the absolute best stretch. We look at rolling 3 year windows of WAR scores and take the highest average. This captures drivers who may have had a shorter career but an absolutely dominant peak, which is central to the argument for Senna, Clark, or Rindt against drivers with longer but more uneven careers. The indicator rewards both height and consistency of a driver's best years rather than their career total.
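A sketch of the rolling window, assuming the input is a chronological list of consecutive season WAR values and that careers shorter than three seasons fall back to a plain average. Both choices are assumptions, and the values are made up.

```python
def peak_performance(season_wars, window=3):
    """Highest average WAR over any run of `window` consecutive seasons."""
    if len(season_wars) <= window:
        # Assumption: short careers use the average of what is available.
        return sum(season_wars) / len(season_wars)
    return max(
        sum(season_wars[i:i + window]) / window
        for i in range(len(season_wars) - window + 1)
    )

# Windows average 1.83, 2.0, and 1.0, so the peak is 2.0.
peak = peak_performance([0.5, 2.0, 3.0, 1.0, -1.0])
```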

Single season peak data is strongly influenced by Indianapolis 500 participants from the 1950s who only ran in Indianapolis (which counted toward the World Championship). Their results came from a very different race format with unusual field dynamics. The rolling 3 year window reduces but does not fully eliminate this effect for drivers who only competed in one or two seasons.

adaptability = (0.6 × early_h2h) + (0.4 × max(slope, 0))

Measures how quickly a driver adapts when placed in a new environment, either after a team change or in a major regulation change year. For adaptation seasons, the driver's races are split into 5 race chunks and a linear regression is fit to their teammate H2H rate over time. The slope measures improvement. The early H2H term (60%) rewards being competitive immediately and the slope term (40%) rewards improving through the season; negative slopes are floored at zero, so a driver who starts strong and stays there is not penalized. Non adaptation seasons receive a neutral score of 0.5.
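One way the formula could be computed is sketched below. It assumes the early H2H term is the H2H rate of the first 5 race chunk and that the slope is measured per chunk; neither detail is spelled out in the text, and the function names are hypothetical.

```python
def ols_slope(ys):
    """Least squares slope of ys against 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx

def adaptability(beat_teammate, chunk=5):
    """beat_teammate: chronological, non-empty list of 1 (beat teammate) or 0."""
    chunks = [beat_teammate[i:i + chunk] for i in range(0, len(beat_teammate), chunk)]
    rates = [sum(c) / len(c) for c in chunks]  # H2H rate per chunk
    early_h2h = rates[0]                       # assumption: first chunk only
    slope = ols_slope(rates)
    return 0.6 * early_h2h + 0.4 * max(slope, 0.0)

steady = adaptability([1] * 10)             # strong from the first race
improver = adaptability([0] * 5 + [1] * 5)  # slow start, then dominant
```

Flooring the slope at zero means a declining H2H rate costs nothing beyond its effect on the early term.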

#	Driver	Season	Adaptability
1	Fangio	1957	96.1%
2	Gurney	1961	95.0%
3	Modena	1991	93.8%
4	Hill	1994	93.3%
5	Häkkinen	1999	91.7%
6	Senna	1993	90.5%
7	Schumacher	2000	89.3%
8	Prost	1987	88.6%
9	Alonso	2014	87.5%
10	Hamilton	2013	85.7%

Filtered to team change and regulation change seasons with at least 8 pairwise teammate comparisons, and excluding early era Indianapolis only competitors. Fangio's 1957 season at Maserati after leaving Ferrari is the top result: he dominated his teammates immediately in a team change year and never looked back.

grid_improvement = (grid_position − finish_position) / field_size

How many positions does a driver gain from qualifying to the race finish, normalized by field size? A positive number means the driver finished ahead of where they started on average. This rewards racecraft: the ability to overtake, defend, and manage tyres better than qualifying pace would suggest. Most meaningful for drivers who consistently qualified outside their natural performance window.
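A short sketch of the per season average, assuming each race contributes equally. The input layout and the sample races are made up.

```python
def grid_improvement(races):
    """races: list of (grid_position, finish_position, field_size)."""
    gains = [(grid - finish) / field for grid, finish, field in races]
    return sum(gains) / len(gains)

# Gained 5 places from P10 in one race, lost 2 places from P4 in another,
# both in 20 car fields: (0.25 - 0.10) / 2 = 0.075
avg_gain = grid_improvement([(10, 5, 20), (4, 6, 20)])
```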

#	Driver	Season	Avg Gain
1	Ginther	1964	+20.3%
2	Pérez	2023	+17.0%
3	Beltoise	1969	+15.7%
4	Siffert	1965	+13.9%
5	Heidfeld	2008	+12.4%
6	Ocon	2022	+12.4%
7	Barrichello	2005	+11.6%
8	Schumacher	1999	+11.2%
9	Hulme	1970	+11.1%
10	Watson	1982	+11.0%

Filtered to drivers with an average qualifying position of 12th or better and at least 10 races, so the table reflects genuine race craft rather than backmarkers who had nowhere to go but forward. Pérez 2023 is a notable modern result: despite being Verstappen's teammate and often struggling in qualifying, he consistently moved through the field on race day.

pole_rate = poles / qualifying_sessions

How often does a driver put the car on pole? Pole position reflects raw one lap pace and shows how often the driver was the fastest person on track on any given race weekend. Available from 1994 onward using qualifying data; estimated from race results for earlier seasons where qualifying records are incomplete.

#	Driver	Season	Pole Rate
1	Vettel	2011	78.9%
2	Häkkinen	1998	71.4%
3	Hamilton	2020	62.5%
4	Villeneuve	1997	60.0%
5	Verstappen	2023	59.1%
6	Hamilton	2015	57.9%
7	Rosberg	2014	57.9%
8	Hill	1996	57.1%
9	Hamilton	2016	57.1%
10	Hamilton	2017	55.0%

Final DPI: Top 25 All Time

Each driver's season DPIs are combined into a career score using a Bayesian estimate: seasons are weighted by races entered and drivers with fewer seasons are pulled toward the population average. The confidence intervals come from 1,000 bootstrap simulations. Where intervals overlap, the ranking difference is not statistically meaningful.

Tier 1 — Statistically inseparable at the top (overlapping CIs)

#	Driver	Seasons	Races	Peak Season	Career DPI	95% CI
1	Senna	11	161	1991	162.6	143.4 to 183.7
2	Fangio	9	66	1954	159.3	151.3 to 166.4
3	Hamilton	19	380	2018	158.8	141.6 to 175.5
4	Prost	13	202	1993	158.3	144.8 to 169.6
5	Schumacher	20	314	2002	152.9	131.4 to 173.2
6	Verstappen	12	254	2023	152.2	133.6 to 172.3
7	Clark	11	92	1963	146.0	134.3 to 159.6
8	Stewart	9	100	1971	144.7	125.5 to 163.9
9	Alonso	22	428	2006	136.7	123.7 to 150.4
10	Hunt	7	93	1976	136.5	100.7 to 169.9
11	Vettel	17	308	2013	136.4	117.7 to 155.6
12	Norris	7	152	2025	135.0	107.1 to 163.3
13	Scheckter	9	112	1979	134.9	116.9 to 149.3
14	Hill	8	116	1994	133.4	104.8 to 162.7
15	Moss	18	123	1958	131.7	126.9 to 136.2
16	Häkkinen	11	163	1998	130.5	112.0 to 149.2
17	Fittipaldi	12	156	1972	128.4	112.6 to 144.9
18	Rosberg	11	206	2016	126.2	112.1 to 140.3
19	Leclerc	8	173	2022	126.1	107.5 to 143.6
20	Mansell	15	190	1992	125.0	104.7 to 147.2
21	Andretti	14	128	1978	122.9	97.9 to 148.3
22	Hulme	13	145	1968	122.1	119.1 to 124.6
23	Rindt	7	60	1970	120.8	102.9 to 140.3
24	Laffite	13	176	1979	120.3	107.9 to 133.5
25	Farina	6	37	1950	120.1	112.7 to 128.1

The Answer: We Don't Know

Ayrton Senna sits at number one by DPI. But the correct headline is not "Senna is the greatest F1 driver of all time." The top six — Senna, Fangio, Hamilton, Prost, Schumacher, and Verstappen — all have overlapping 95% confidence intervals. We cannot statistically separate them. They are operating in a tier of their own, and the data does not give us enough precision to order them definitively.

Verstappen's drop from first to sixth after quality-weighted H2H is the most notable single movement in the index. It is not a knock on his ability. It is a statement about his competition. He has been dominant in ways that are genuinely historic, but his H2H record has not yet included the kind of elite intra-team rivalry — a Prost, an Alonso in equal machinery — that would cement his advantage over drivers like Senna or Fangio. If he spends a few seasons against a teammate of that caliber, this ranking could shift again.

I find that more interesting than a clean answer would be. It is what the data actually says.

What I Would Do Differently

The biggest limitation is data availability before 1994. Qualifying and fastest lap data only start then, which means Indicators 4, 5, and 16 (fastest lap percentile, qualifying H2H, and pole rate) are missing or only estimated for the sport's first four decades. Drivers from the 1950s and 60s are being evaluated on fewer indicators than modern drivers, which creates a real gap in the comparison.

I would also love to add race pace telemetry if it ever becomes accessible. The gap between a driver's fastest lap and their average race lap is a much cleaner measure of pace than what is currently available. And eventually I want to run the PCA weighted DPI through the same bootstrap test to see if those confidence intervals are tighter or wider, which would tell us something interesting about which weighting method is more stable.

For now, the index lives in a local PostgreSQL database and updates at the end of each season. The full Jupyter notebook walks through every step of the methodology in detail. Thanks for reading, and if you have thoughts on the methodology or disagree with any of the indicator choices, reach out.

View the full analysis notebook on GitHub →
