I Don't Watch Football. I Built a Model Instead.

A public forecasting experiment — what building a probabilistic prediction system teaches you about forecasting, calibration, and uncertainty

I'll be direct about two things upfront.

First: I don't particularly follow football — which is precisely why I wanted a model rather than intuition. I know the broad shape of the sport, recognise a handful of names, and have no tribal allegiances pulling me in any direction. Enthusiasm can create bias. Indifference removes one source of it.

Second: this project started as a coding challenge — someone bet me I couldn't build a working prediction system before the first ball was kicked. The underlying codebase has been growing over two years; the World Cup-specific layer came together over the past few weeks. I made the architectural decisions and reviewed everything. The AI handled a large share of the implementation, for both code and text. That is worth saying plainly.

What I do care about is the forecasting problem itself. The World Cup happens to be a near-perfect laboratory for it: 104 matches over six weeks, outcomes that are unambiguous and public, results that arrive quickly and cannot be retroactively adjusted, and a global audience creating a rich external signal against which a model's predictions can be measured. It could have been tennis or election results. The sport is incidental. The public accountability is not.

The correct test of this exercise is whether a model built on free data produces predictions worth making publicly. Testing that — publishing every prediction before the match is played and never revising it — is what makes this interesting rather than arbitrary.

The Setup

Over the past few weeks, I built a public prediction system for the 2026 FIFA World Cup, on top of a codebase that has been developing for the past two years. It covers all 104 matches — from the 48-team group stage through the knockout rounds to the Final on July 19th.

You don't need to understand the mathematics to follow this series. The short version: a system that assigns probabilities to match outcomes, predicts scorelines, simulates the full bracket, and adjusts its assumptions nightly as results come in. In slightly more detail:

Predicts match outcomes — win/draw/loss probabilities for every fixture
Generates tips: the most likely scoreline for each match
Projects the bracket — who advances, who gets eliminated, who wins
Adapts its parameters nightly — six internal calibration values adjust as results come in; by the knockout rounds, predictions reflect what the model has actually learned from the group stage, not just its starting assumptions

All predictions are published before each match and never revised retroactively. New articles appear here every Sunday throughout the tournament.

One deliberate constraint shaped the entire design: zero data costs. The API I use (football-data.org) provides WC 2026 match data on its free tier. Historical World Cup data — 2014, 2018, 2022 — would have required a paid subscription. I chose not to pay, and that decision directly shaped the model architecture.

The model is Elo-based. Elo encodes the cumulative effect of historical match results into a single number per team. It needs no training phase and no historical scorelines, only the accumulated record condensed into one rating per team, pre-loaded from eloratings.net on June 10, 2026. Win probabilities are derived from the Elo gap between teams via a logistic formula, then adjusted for draws; tips are computed via a Poisson model for goal expectation. Both steps are explained with worked examples in Article 7.

The Data Foundation

The starting point is a set of 48 static Elo ratings seeded on June 10, 2026, the day before the tournament. The source is eloratings.net, a public site that computes Elo ratings from international football results going back to 1872.¹ The ratings encode historical information (decades of matches, including 2014, 2018, and 2022) but through a third-party calculation, not one this system performed.

There are no WC 2014 scorelines in here, no group stage results from 2018, no Qatar 2022 performance data. The model cannot learn from how Brazil performed across three tournaments, or measure Morocco's 2022 overperformance relative to their Elo at the time. This is the most important structural fact about the model.

The spread runs from 1,421 (Qatar) to 2,157 (Spain), a gap of 736 points. To illustrate what that gap means in probability terms:

Match	Ratings	Elo Δ	Home win	Draw	Away win
Spain vs New Zealand	2157 vs 1562	+595	90.8%	6.1%	3.1%
Colombia vs Ecuador	1982 vs 1938	+44	50.2%	20.9%	28.8%
Sweden vs Uruguay	1712 vs 1892	−180	24.9%	19.6%	55.5%

(Elo Δ = home team rating minus away team rating. Negative means the away team is stronger.)

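As a scale reference: in the standard Elo formula, a 400-point gap corresponds to 10:1 odds, so the stronger team's expected score (a win counts 1, a draw 0.5) is about 91%. Spain's 90.8% win probability against New Zealand in the table above is close to that mark, although that fixture has a wider 595-point gap and part of the probability goes to the draw.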
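
The 10:1 figure follows from the standard Elo expected-score formula. A minimal sketch, assuming the conventional 400-point scale and no home or host adjustment; the function name is illustrative, and the model's own draw-taper layer is not reproduced here:

```python
def elo_expected_score(delta: float, scale: float = 400.0) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for the team that
    leads by `delta` Elo points, using the standard logistic formula."""
    return 1.0 / (1.0 + 10.0 ** (-delta / scale))


# A 400-point lead means odds of 10:1, i.e. an expected score of 10/11.
print(round(elo_expected_score(400), 3))
# Two equally rated teams split the expectation evenly.
print(round(elo_expected_score(0), 3))
```

Note that this is an expected score, not a win probability: splitting it into win, draw, and loss needs an extra step such as the draw adjustment described above.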

To keep the model honest throughout the tournament, the system runs a daily data quality check: it re-fetches the current eloratings.net ratings, compares them against its own internally computed values, and flags any team where the gap exceeds 30 points. This check is visible on the prediction page.
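
The check described above can be sketched in a few lines. This is an illustration under assumptions, not the site's actual code: the function name, the dictionary layout, and the sample ratings are all hypothetical, and only the 30-point threshold comes from the text.

```python
def flag_rating_drift(internal: dict[str, float],
                      reference: dict[str, float],
                      threshold: float = 30.0) -> dict[str, float]:
    """Return {team: internal - reference} for every team whose internally
    computed rating differs from the reference by more than `threshold`."""
    flagged = {}
    for team, own_rating in internal.items():
        if team not in reference:
            continue  # no reference value to compare against
        gap = own_rating - reference[team]
        if abs(gap) > threshold:
            flagged[team] = gap
    return flagged


# Hypothetical mid-tournament values, for illustration only.
internal = {"Spain": 2157.0, "Qatar": 1421.0, "Mexico": 1910.0}
reference = {"Spain": 2150.0, "Qatar": 1425.0, "Mexico": 1875.0}
print(flag_rating_drift(internal, reference))  # only Mexico is flagged
```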

A Known Blind Spot: Africa and Asia

There is a structural bias in this model worth naming before the tournament starts — though it is smaller than it was four years ago.

Elo is only as good as the matches it can observe. African and Asian teams play most of their competitive football within their own confederation, against opponents who share the same underrepresentation. Without regular matches against top European or South American sides, inter-confederation strength comparisons rest on a thin base of data.

The 2022 World Cup delivered a partial correction: Morocco reached the semi-finals; Japan and South Korea knocked out European nations. Those results fed back into the ratings. Morocco, Japan, and South Korea enter this tournament with Elo values that already reflect 2022 — they are not underestimated in the way they were before Qatar.

The residual bias applies to teams that have grown since 2022 but haven't yet tested that growth against top opposition. Their improvement shows up in confederation results, not yet in cross-confederation data. For those teams, the ratings are probably conservative. Upsets from the African and Asian brackets remain more likely than the headline numbers imply.

I am not correcting for this. I am flagging it.

What the Model Says

The top-rated teams going into the tournament, and their projected champion probability:

Team	Elo Rating	Champion probability
Spain	2,157	24.5%
Argentina	2,115	16.6%
France	2,063	11.2%
England	2,021	7.2%
Brazil	1,991	5.5%
Portugal	1,986	4.9%
Colombia	1,982	4.7%
Netherlands	1,948	2.9%

Ratings as of June 10, 2026 (eloratings.net seed). Championship probabilities from 10,000 Monte Carlo simulations. Full table at the live prediction site.

A few honest limitations before the first ball is kicked:

Draw probability peaks at 28% for equal teams and tapers off as the Elo gap widens — falling to ~15% at 260-point gaps and ~4–5% at extreme mismatches. The 28% is a ceiling, not a constant.
The model has no injury awareness. If a key player is ruled out the morning of a match, the prediction doesn't change.
The Poisson tip is the single most likely scoreline, not a distribution. A 1:0 tip with 14% probability is still the most likely single result, even if 86% of outcomes are something else.
The model can tip 0:0 when the goal expectation for both teams is very low — technically optimal for the formula, but goalless draws are rarer in practice than the Poisson distribution implies. Treat scoreline tips as the model's best guess given its assumptions, not an unconditional forecast.
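
To make the "single most likely scoreline" idea concrete, here is a minimal sketch that assumes each team's goals follow an independent Poisson distribution. The expected-goals values (2.2, 0.6, 0.8, 0.7) are made up for illustration and are not taken from the model; the function names are hypothetical as well.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def most_likely_scoreline(lam_home: float, lam_away: float, max_goals: int = 10):
    """Scan every scoreline up to max_goals per side and return the most
    probable one together with its probability."""
    best, best_p = (0, 0), 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            p = poisson_pmf(home_goals, lam_home) * poisson_pmf(away_goals, lam_away)
            if p > best_p:
                best, best_p = (home_goals, away_goals), p
    return best, best_p


# Clear favourite: the mode is 2:0, yet it covers under 15% of outcomes.
score, prob = most_likely_scoreline(2.2, 0.6)
print(score, round(prob, 3))

# Two low expectations: the mode becomes 0:0.
low_score, low_prob = most_likely_scoreline(0.8, 0.7)
print(low_score, round(low_prob, 3))
```

Even the most probable scoreline carries a small share of the total probability, which is why a tip should be read as a best guess rather than an expectation.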

The First Match, By the Numbers

The opening fixture — Mexico vs South Africa on June 11 — illustrates how the chain works in practice.

Mexico enters with an Elo rating of 1,875. South Africa arrives at 1,518. That is a gap of 357 points — wide enough to make Mexico a clear favourite, but not so extreme that the outcome is a formality.

From that gap, the logistic formula produces raw win probabilities, which a draw-taper layer then adjusts (shown rounded to whole percentages, so the three values sum to 101%):

Outcome	Probability	Implied odds
Mexico win	78%	1.28
Draw	15%	6.67
South Africa win	8%	12.50

(In brief: the logistic formula converts the Elo gap into a home/away win split. The draw-taper layer then shifts probability mass into the draw bucket — more for closely-matched teams, less as the gap widens. The Poisson goal model translates the same gap into expected goals per team, from which the most likely scoreline follows. Both are worked through in detail in Article 7.)
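
The implied odds column appears to be the reciprocal of each probability, that is, fair decimal odds with no bookmaker margin. That reading is an assumption, but it reproduces the three values in the table:

```python
def implied_decimal_odds(probability: float) -> float:
    """Fair decimal odds for an outcome: the reciprocal of its probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability


# Probabilities from the Mexico vs South Africa table above.
for outcome, p in [("Mexico win", 0.78), ("Draw", 0.15), ("South Africa win", 0.08)]:
    print(outcome, round(implied_decimal_odds(p), 2))
```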

The Poisson goal model produces a mode of 2:0 to Mexico. That is the single most likely scoreline. It carries roughly 14% probability, meaning 86% of outcomes are something else.

That gap between "most likely" and "likely" is the central thing to hold on to throughout this series. A prediction that says 78% does not mean Mexico are certain to win. It means that, across a large number of matches with this Elo configuration, Mexico would win roughly 78 in every 100. The times they don't are not errors; they are the 22% expressing itself.

This same chain (Elo gap → probabilities → Poisson tip → public record) repeats for all 104 fixtures.

A Different Forecast, A Different Method

This is not the only model making public predictions before the tournament.

Joachim Klement, a German economist at Panmure Liberum in London, has correctly forecast the last three World Cup winners — Germany in 2014, France in 2018, Argentina in 2022 — using a model built on economic and demographic fundamentals: GDP per capita, population size, climate, and FIFA rankings. No Elo. No match data. His full report is available here.

For 2026, Klement's model picks the Netherlands.

The contrast with this model is almost maximally clean. Klement uses economics and demographics as proxies for long-term football strength; this model uses Elo, which encodes the same long-term information but derives it from match outcomes directly. Both approaches try to measure the same underlying thing — they just measure it differently. And they do not disagree wildly: both place the Netherlands and Spain among the serious contenders. The disagreement is on who wins. Klement says the Netherlands; this model gives Spain a 24.5% championship probability, with the Netherlands reaching the final in roughly 7% of simulations.

The honest answer is that we will not know which framework is better calibrated until the tournament is over. Klement has three correct picks in a row — an impressive record, though one he is the first to note could be mostly luck. "It's like tossing a coin. You might predict heads four times in a row, and that might well happen. But that doesn't guarantee it will happen again next time." The same caveat applies here.

This is, ultimately, what makes the exercise worth running.

Before the First Ball Is Kicked: Championship Probabilities

Based on 10,000 Monte Carlo simulations of the full bracket — run before the tournament starts — here is where the model thinks the title goes.

(Why Monte Carlo? A single deterministic bracket would pick one winner and ignore all the uncertainty that exists between now and the final. Running 10,000 random simulations — each match resolved by drawing from the predicted probabilities — gives a championship probability that reflects both team strength and bracket path. Early upsets change everything downstream. Full methodology in Article 7.)
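
The idea can be sketched with a toy bracket. This is a simplified illustration, not the model's simulation: it uses four of the ratings from the table, a hypothetical pairing, and the plain Elo expected score as the win probability of each knockout tie (no group stage, no host bonus, no separate handling of extra time or penalties). All function names are illustrative.

```python
import random
from collections import Counter


def win_probability(rating_a: float, rating_b: float) -> float:
    """Chance that team A beats team B in a tie that must produce a winner."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / 400.0))


def play_knockout(teams, ratings, rng):
    """Play one single-elimination bracket; `teams` is in bracket order and
    its length must be a power of two. Returns the champion."""
    survivors = list(teams)
    while len(survivors) > 1:
        next_round = []
        for i in range(0, len(survivors), 2):
            a, b = survivors[i], survivors[i + 1]
            a_wins = rng.random() < win_probability(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        survivors = next_round
    return survivors[0]


def champion_shares(teams, ratings, n_sims=10_000, seed=2026):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(play_knockout(teams, ratings, rng) for _ in range(n_sims))
    return {team: counts[team] / n_sims for team in teams}


ratings = {"Spain": 2157, "England": 2021, "Argentina": 2115, "France": 2063}
bracket = ["Spain", "England", "Argentina", "France"]  # hypothetical semi-finals
shares = champion_shares(bracket, ratings)
for team, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {share:.1%}")
```

Reordering `bracket` changes the shares even though the ratings stay the same, which is the bracket-path effect the paragraph above refers to.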

Team	Pre-tournament Elo	Champion %	Reach Final %	Reach SF %
Spain	2,157	24.5%	34.5%	47.4%
Argentina	2,115	16.6%	26.1%	39.8%
France	2,063	11.2%	19.7%	32.0%
England	2,021	7.2%	13.9%	25.1%
Brazil	1,991	5.5%	11.8%	22.1%
Portugal	1,986	4.9%	10.6%	20.5%
Colombia	1,982	4.7%	10.2%	19.8%
Netherlands	1,948	2.9%	6.9%	15.2%
Ecuador	1,938	2.7%	6.7%	14.9%
Germany	1,932	2.5%	6.2%	14.5%
Mexico	1,875	1.9%	5.4%	12.9%
Croatia	1,912	1.9%	5.3%	12.7%

Based on 10,000 Monte Carlo simulations run before the tournament. Champion % = share of simulations in which each team lifts the trophy. Reach Final / Reach SF = share reaching those stages. The full 48-team table is live at the prediction site.

A few observations on the shape of the table:

The top three teams hold a combined champion probability of just over 50% (24.5% + 16.6% + 11.2% = 52.3%); the top two alone account for 41.1%. That concentration is typical: even in a 48-team field, Elo tends to compress the realistic winner pool to a handful of elite nations.
The host nations (USA, Mexico, Canada) carry higher probabilities than their raw Elo would suggest, due to the explicit host bonus in the model. Whether that bonus is correctly calibrated is one of the empirical questions this series will answer.
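The long-tail probability, the chance that a team outside the top ten wins the tournament, is not zero. It is usually between 10 and 20 percent across the full field; in this run, the ten teams with the highest champion probabilities account for 82.7%, leaving about 17% for everyone else. Upsets happen.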
Path dependency matters more than it looks. A team ranked seventh or eighth by Elo can show a disproportionately high final probability if the bracket puts them in a softer half — meaning they face the top-rated teams only in the final, not in the quarter-finals. This is not a model artefact to be corrected; it is a real structural feature of seeded knockout tournaments. A team with a 20% chance of winning any given match against a top opponent has a very different championship probability depending on whether they meet that opponent in the last 16 or the final. The 48-team format with Elo-based seeding makes this effect especially pronounced.
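
A rough piece of arithmetic shows the size of the effect on the chance of reaching the final. The 20% figure comes from the paragraph above; the 60% chance against other opponents is a hypothetical value, and the sketch assumes independent matches and that the top opponent is certain to be the one waiting in the given round.

```python
from math import prod


def reach_final_probability(round_win_probs):
    """Chance of winning every round before the final, assuming the
    matches are independent."""
    return prod(round_win_probs)


P_VS_TOP = 0.2    # chance against the top-rated opponent (from the text)
P_VS_FIELD = 0.6  # hypothetical chance against any other opponent

# Rounds listed in order: last 16, quarter-final, semi-final.
top_in_last_16 = reach_final_probability([P_VS_TOP, P_VS_FIELD, P_VS_FIELD])
top_only_in_final = reach_final_probability([P_VS_FIELD, P_VS_FIELD, P_VS_FIELD])

print(round(top_in_last_16, 3), round(top_only_in_final, 3))
```

Under these assumptions the same team is three times as likely to reach the final when the draw keeps the top opponent on the other side of the bracket.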

These probabilities will shift after every match. The first update publishes here on June 14th.

A Framework for Decisions Under Uncertainty

The Mexico example above illustrates the mechanics. What it does not yet explain is why publishing probabilistic predictions in public, before each match, and leaving them permanent is the correct test. Three ideas from decision theory make the answer concrete.

First: commit to numbers and track the full record (Tetlock). Philip Tetlock's Superforecasting research identified the distinguishing characteristic of genuine forecasters: they attach explicit numerical probabilities to their predictions and track their full record against outcomes — wins and losses. Vague directional language — "I think Spain will probably win" — is unfalsifiable. A published 64% is not. The scoring metric Tetlock's research used to evaluate thousands of forecasters — the Brier Score — penalises both overconfidence and underconfidence in a mathematically precise way. It is applied to all 104 predictions in this series, reported in Articles 4 and 6, and tracked live on the prediction website throughout the tournament.
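
For readers who want to see the metric, here is a minimal sketch of the multi-category Brier score for a single three-way forecast. Which exact variant the series reports is not stated here, so the form below (sum of squared errors over win, draw, loss; 0 is perfect, 2 is the worst possible) is an assumption, as is the function name. The sample forecast is the Spain vs New Zealand row from the earlier table.

```python
def brier_score(probs, outcome_index):
    """Multi-category Brier score for one forecast.

    probs: probabilities for (home win, draw, away win), summing to 1.
    outcome_index: position of the outcome that actually happened.
    """
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(probs))


forecast = (0.908, 0.061, 0.031)  # Spain win, draw, New Zealand win

print(round(brier_score(forecast, 0), 4))  # favourite wins: small penalty
print(round(brier_score(forecast, 2), 4))  # upset: large penalty
```

Averaging this score over all published forecasts gives one number for the whole record, which is what makes a published probability falsifiable in aggregate.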

Second: judge the decision, not just the result (Duke). Annie Duke, in Thinking in Bets, names a specific cognitive trap: resulting, which means evaluating the quality of a decision by its outcome rather than by the process that produced it. A poker player who bets correctly on a statistically favourable hand and still loses made the right decision; the result was bad, the reasoning was not. The same applies here. If the model gives Spain a 24.5% championship probability and Spain loses in the Round of 32, that is not evidence the model was wrong; it is one of the 75.5% of scenarios playing out. A model that said 70% home win before a home defeat is failing only if its 70% events come true far more or far less often than 70% of the time. The correct test is calibration over many predictions, not accuracy on any single one.
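
Calibration can be checked by grouping forecasts into probability bins and comparing each bin's average forecast with how often the event actually happened. The sketch below is one simple way to do that; the function name, the ten-bin layout, and the sample data are hypothetical.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    forecasts: predicted probabilities that an event happens.
    outcomes: 1 if the event happened, 0 otherwise (same order).
    Returns rows of (bin lower edge, mean forecast, observed frequency, count).
    """
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the top bin
        bins[index].append((p, happened))
    table = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(h for _, h in pairs) / len(pairs)
        table.append((index / n_bins, mean_forecast, observed, len(pairs)))
    return table


# Hypothetical record: ten 70% forecasts, seven of which came true.
forecasts = [0.7] * 10
outcomes = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
for row in calibration_table(forecasts, outcomes):
    print(row)
```

In a well-calibrated record the mean forecast and the observed frequency stay close in every bin; with only 104 matches, each bin will hold few forecasts, so some scatter is expected.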

Third: treat the future as a set of scenarios, not a single line (Shell/Wack). Royal Dutch Shell pioneered scenario planning in the 1970s. The insight attributed to Pierre Wack is that the future is not one thing that will happen but a range of plausible futures, each with a different probability weight. Planning means preparing for the distribution of outcomes, not just the most likely one. The 10,000 Monte Carlo simulations in this model are exactly that: a structured exploration of the plausible futures this tournament could produce. Spain wins in roughly 2,450 of them, Brazil in roughly 550, and the Netherlands in roughly 290. Every one of those is a scenario that the model considers real and weighted: not a noise event to be discarded, but information about what could happen and how likely it is. The championship probability table above is not a prediction of one future; it is a probability-weighted map of many.

These three ideas — numerical commitment, process over outcome, and scenario thinking — are the intellectual framework behind this series. The tournament will not confirm or refute them in any single match. It will test them across 104.

That distinction — between getting it right once and being well-calibrated across many events — is what this experiment is actually testing.

What Comes Next

Every Sunday throughout the tournament I'll publish an update here: predictions versus results, where the model was right, where it was confidently wrong, and what the numbers say about the week ahead. Running alongside that, I'll check in on Klement's bracket prediction — whether the team he's backing is holding its projected path, and where the two forecasts diverge in the actual data.

The predictions are public, timestamped, and don't get revised retroactively. The model updates its ratings after each result — that's the point — but past predictions stand as originally made.

Article 1 — June 14: Four days of results. First accuracy scorecard, the biggest upset of opening week framed correctly as a probability event, and an early look at whether the host-nation bonus for USA, Mexico, and Canada is showing up in the data.

The tournament starts tomorrow. Let's see if the numbers hold.

Live predictions at https://christians-world-cup-predictions.replit.app — updated nightly.

¹ eloratings.net publishes Elo ratings computed from senior international football results going back to 1872. It is maintained independently and widely used as a reference for national team strength. The ratings it publishes are the starting point for this model's seed values.

The predictions on this website and in this article are probabilistic model outputs based on statistical methods. They do not constitute betting tips or investment recommendations. This website and article series are intended for entertainment and educational purposes.

This article and the prediction website were developed with AI assistance. Topics, editorial direction, factual framing, and code architecture were set by the author; the full text was reviewed and edited by the author.