• The South African data analytics and machine learning firm Principa is forecasting that the French will go home with the Fifa World Cup trophy on Sunday.
• Using extensive computer models, the firm has been predicting match scores throughout the tournament.
• Their models have out-predicted 99.96% of human-made predictions.

Data scientists at analytics and machine learning firm Principa have been hard at work predicting scores and winners of each 2018 Fifa World Cup match.

Their predictions are so near-perfect that they are currently outranking more than 99.96% of participants on SA's popular sports predictor site, Superbru.

"At the moment, we are at number 46 out of about 170,000 in South Africa," says lead data scientist at Principa, David Hatherell.

Their latest prediction is for the highly anticipated final game on Sunday between France and Croatia. Principa holds that France will most probably win the 2018 Fifa World Cup 2-0 against Croatia.

"Our office is full of sport-loving data scientists," says the consultancy's CEO Jaco Rossouw. Employees at Principa, which uses modelling and data analysis to predict human behaviour for retailers and financial services providers, first started predicting the results of the Rugby World Cup in 2015 to hone their data science skills and see how well predictive analytics perform against the best "human-made" predictions.

They decided to use the Fifa World Cup as another opportunity for their teams to sharpen their skills and compare human and machine predictions. 

Principa's top six 100%-accurate predictions since the start of the World Cup

(Table: Business Insider SA)/(Source: Principa)

Principa uses the databases of the teams to check variables such as which players are in each team, their recent performance, who they have played against and the general public sentiment towards the team.

The data scientists also comb through social media posts about the teams; put differently, they measure the market's confidence in the teams and the public's expectations.

The information, obtained using a Bayesian inference method, is then combined to form a model that is automated to adopt a machine learning approach: it reselects variables and parameters every time it is run, adapting to how the World Cup is unfolding.

Results of games from previous rounds inform predictions for the next round. The machine learning component can take it a step further, though, by also learning from match results within the current round and updating every day.
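Principa has not published its model, so the following is only a minimal sketch of the general idea of Bayesian updating from match results, not the firm's actual method. It assumes each team's goals per match follow a Poisson distribution with a Gamma prior on the scoring rate, that the two teams score independently, and that the posterior mean can stand in for the full predictive distribution. The class name, prior values, team labels and scorelines are all hypothetical.

```python
from math import exp, factorial


class ScoringRate:
    """Gamma(shape, rate) belief about a team's average goals per match.

    Hypothetical illustration only; the prior values are assumptions.
    """

    def __init__(self, shape=1.5, rate=1.0):
        self.shape = shape
        self.rate = rate

    def update(self, goals):
        # Conjugate Gamma-Poisson update: add the goals scored to the shape
        # and one observed match to the rate.
        self.shape += goals
        self.rate += 1

    def mean(self):
        # Posterior mean of the scoring rate.
        return self.shape / self.rate


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def most_likely_score(team_a, team_b, max_goals=6):
    """Return the scoreline with the highest probability.

    Simplification: plugs in each posterior mean and treats the two
    teams' goal counts as independent.
    """
    candidates = [
        (a, b) for a in range(max_goals + 1) for b in range(max_goals + 1)
    ]
    return max(
        candidates,
        key=lambda s: poisson_pmf(s[0], team_a.mean())
        * poisson_pmf(s[1], team_b.mean()),
    )


# Made-up results from three earlier matches for two unnamed teams.
team_a = ScoringRate()
team_b = ScoringRate()
for goals_a, goals_b in [(2, 0), (1, 1), (4, 1)]:
    team_a.update(goals_a)
    team_b.update(goals_b)

print(most_likely_score(team_a, team_b))
```

Each call to update folds one more result into the belief about a team, which mirrors the way results from earlier matches inform the next prediction. A model of the kind described above would draw on many more variables, such as squad selection and public sentiment.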

The models have out-predicted 99.96% of human-made predictions - and have fallen short only slightly