World Cup Predictions

Man-with-a-hammer syndrome has struck again. To the man with a hammer, every problem looks like a nail. This bias leads people to reach for whatever tools they already have, however inappropriate those tools may be for the problem at hand.

In trying to predict the outcome of the World Cup, analysts at reputable firms have come out of the woodwork with their large data sets and computer models. They have been trained to analyse and manipulate data, so when given the opportunity to make predictions about uncertain outcomes, they reach into their toolbox for their hammers. However, if the only tool in your toolbox is a hammer, you will inevitably find nails everywhere.

One large bank, which shall remain nameless, wrote a 17-page report trying to forecast the World Cup winner. It carried out 10,000 simulations and predicted that the most likely winners were Germany, Brazil and Spain. Germany was further given a 68.6% chance of topping its group.
Neither prediction panned out: Germany, Brazil and Spain all failed to get past the quarter-finals, and Germany finished bottom of its group.

Analysts at another bank raised the stakes and used machine learning to run 200,000 models to forecast game outcomes based on team and player attributes. They then carried out 1 million different simulations and predicted a Brazil – Germany final. Neither of these countries made it to the final.

Why did these predictions fail? For starters, they rely on static, backward-looking data. Conditions change in real time in ways the models cannot capture: a keeper may suffer a concussion that affects his performance, team morale may be low on the day, and the referee may have argued with his wife the night before, making him more irritable and more likely to book players. There is a large element of randomness that cannot be modelled.

Also, because football is so low-scoring compared with other sports, a single goal can decide a game, which improves the underdog's odds.
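The low-scoring effect can be illustrated with a toy model in Python. It assumes, purely for illustration, that each team's goals follow independent Poisson distributions (a common textbook simplification, not the banks' models) and that the favourite always scores at twice the underdog's rate; the rates themselves are hypothetical, not fitted to any data. Only the overall scoring level changes between rows, so the teams' relative strength stays fixed.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def outcome_probs(lam_fav: float, lam_dog: float) -> tuple[float, float, float]:
    """Return (P(favourite wins), P(draw), P(underdog wins)).

    Toy assumption: both teams' goal counts are independent Poisson variables.
    """
    # Truncate the score grid far beyond both means; the ignored tail is negligible.
    max_goals = int(3 * max(lam_fav, lam_dog)) + 30
    fav = [poisson_pmf(k, lam_fav) for k in range(max_goals + 1)]
    dog = [poisson_pmf(k, lam_dog) for k in range(max_goals + 1)]
    p_fav = p_draw = p_dog = 0.0
    for i, pf in enumerate(fav):  # i = favourite's goals
        for j, pd in enumerate(dog):  # j = underdog's goals
            if i > j:
                p_fav += pf * pd
            elif i == j:
                p_draw += pf * pd
            else:
                p_dog += pf * pd
    return p_fav, p_draw, p_dog


if __name__ == "__main__":
    # Hypothetical scoring levels: football-like, higher, and basketball-like.
    for dog_rate in (0.7, 5.0, 50.0):
        p_fav, p_draw, p_dog = outcome_probs(2 * dog_rate, dog_rate)
        print(f"underdog rate {dog_rate:>5}: underdog wins {p_dog:.3f}, "
              f"underdog avoids defeat {p_dog + p_draw:.3f}")
```

In this toy model the underdog avoids defeat in a large share of football-like games but almost never at basketball-like scoring rates, even though the ratio of the two teams' strengths never changes.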

The problem with having only one tool in your toolbox is that you are limited in how you can approach the problem. Big data mining and machine learning only compound this, because now you can run a million simulations and make predictions to one decimal place. Running a million simulations does not increase the accuracy of the forecast. It certainly increases its precision, but this is false precision. It is better to be approximately right than precisely wrong.
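The difference between precision and accuracy can be sketched with a toy Monte Carlo experiment in Python. Assume, purely for illustration, that a team's true chance of winning is 30% (the hypothetical `TRUE_P`), but a misspecified model puts it at 45% (the hypothetical `MODEL_P`). Each simulation draws a win or loss from the model's probability, so more simulations only pin down the model's answer more tightly; they never move it towards the truth.

```python
import math
import random

TRUE_P = 0.30   # hypothetical "true" chance that the team wins
MODEL_P = 0.45  # hypothetical chance implied by a flawed model


def simulate(n_sims: int, p: float, rng: random.Random) -> float:
    """Fraction of n_sims simulated tournaments the team wins under the model."""
    wins = sum(1 for _ in range(n_sims) if rng.random() < p)
    return wins / n_sims


def run(seed: int = 2018) -> list[tuple[int, float, float, float]]:
    """Return (n_sims, estimate, standard error, error vs truth) per simulation count."""
    rng = random.Random(seed)
    results = []
    for n in (100, 10_000, 1_000_000):
        est = simulate(n, MODEL_P, rng)
        std_err = math.sqrt(est * (1 - est) / n)  # Monte Carlo error shrinks like 1/sqrt(n)
        results.append((n, est, std_err, est - TRUE_P))
    return results


if __name__ == "__main__":
    for n, est, std_err, error in run():
        print(f"{n:>9,} sims: estimate {est:.3f} +/- {std_err:.4f}, "
              f"error vs truth {error:+.3f}")
```

Going from 10,000 to 1,000,000 runs cuts the standard error roughly tenfold, because it shrinks with the square root of the number of simulations, yet the estimate stays roughly 15 percentage points away from the assumed truth: the extra simulations buy precision, not accuracy.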

Who will win the World Cup? Who knows. One thing we can forecast is that at the next World Cup, some analyst will use larger datasets and run more simulations to predict the outcome. They will most likely get it wrong again.
