Nonlinear dynamic modelling and prediction with applications to football

UoM administered thesis: PhD

  • Authors: Jamie Halliday

Abstract

We present prediction methodology that is motivated by problems arising from a large dataset of football match information but can be applied to a wide range of areas. Much of the thesis is driven by the prediction of goals; however, we also attempt to predict observable features, such as the number of shots, and unobservable features, such as the newly popular expected goals (xG) metric. The thesis is organised in three main parts.

We begin by summarising the existing research on football modelling that motivates the concepts used in later chapters. We also describe the data available to test and evaluate our methodology, and outline plans to contribute papers to the academic community.

The middle part focuses on new time series methodology, with extensions to football match results in its later sections. We first introduce the reader to some basic time series concepts before proceeding to the novel material. We provide a new technique for the estimation of nonstationary parameters in SARUMA models using the nonlinear transformation obtained from the Levinson-Durbin algorithm. By defining a set of polynomial coefficients in terms of autocorrelations, we provide a stable and compact parameter space on which the estimation can occur. We then provide a new class of multivariate time series models that are used to predict the number of goals scored by each team in a football match. The model assumes that each margin follows a conditional Poisson distribution (a common assumption for the goal scoring process in a football match) and includes an intensity feedback mechanism that allows past form to influence expected future performance. We study asymptotic properties of the model and compare different model specifications to obtain the best predictive model.

Finally, we combine machine learning and statistical techniques to provide new methodologies for prediction problems. We again start with a foundation chapter containing basic machine learning concepts and algorithms. We then outline our own contributions as we investigate the covariates of a goal prediction model and their subsequent effect on match predictions. We start by creating a new class of gradient boosted models for predicting goals scored based on match statistics such as shots, passes, and an xG metric. We train several machine learning algorithms for the calculation of the xG metric and choose what we believe to be the most appropriate. We then gather the match statistics, which we call features, and use the boosting framework to build the models. We analyse the importance of the included features in the best model and acknowledge that it uses real-time match data to predict the result. Whilst this is suitable for studying feature importance, it would be inappropriate for predicting future results. We therefore study a new approach to feature projection, which uses a state space approach to model the features directly and produce projections for the next match ahead of time. These projections are used in a variety of machine learning algorithms to assess whether the more sophisticated technique produces a higher quality of match prediction.
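
As an illustration of the kind of nonlinear transformation the Levinson-Durbin algorithm provides, the sketch below maps partial autocorrelations in (-1, 1) to the coefficients of a stationary autoregressive polynomial. This is the standard stationary reparametrisation only; it is not the thesis's SARUMA estimator, and the function names `pacf_to_ar` and `unconstrained_to_ar` are hypothetical. The assumed model convention is x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t.

```python
import math


def pacf_to_ar(pacf):
    """Map partial autocorrelations in (-1, 1) to AR coefficients.

    Applies the Levinson-Durbin step once per lag k:
        phi_k[j] = phi_{k-1}[j] - r_k * phi_{k-1}[k - j],   phi_k[k] = r_k
    Any input in the open cube (-1, 1)^p gives a stationary AR(p) polynomial.
    """
    phi = []
    for r in pacf:
        if not -1.0 < r < 1.0:
            raise ValueError("partial autocorrelations must lie in (-1, 1)")
        k = len(phi)
        # Update the existing coefficients, then append the new one.
        phi = [phi[j] - r * phi[k - 1 - j] for j in range(k)] + [r]
    return phi


def unconstrained_to_ar(theta):
    """Map unconstrained real parameters to AR coefficients via tanh.

    Assumption for illustration: tanh is one common choice of squashing
    function. Very large |theta| saturates tanh to +/-1 in floating point
    and is rejected by pacf_to_ar.
    """
    return pacf_to_ar([math.tanh(t) for t in theta])
```

An optimiser can then search over the compact cube of partial autocorrelations (or over unconstrained `theta`) without ever leaving the admissible region, which is the general appeal of this family of parametrisations.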
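
The intensity feedback idea for conditionally Poisson counts can be sketched with a simple linear recursion, in which the current expected counts depend on the previous expected counts and the previous observed counts. The linear form lambda_t = omega + A lambda_{t-1} + B y_{t-1}, the two-dimensional setting, and all names and numbers below are assumptions made for illustration; the abstract does not state the model's exact specification.

```python
import math


def next_intensity(omega, A, B, lam_prev, goals_prev):
    """One step of an assumed linear feedback recursion for two intensities.

    lam_t[i] = omega[i] + sum_j A[i][j] * lam_prev[j] + sum_j B[i][j] * goals_prev[j]
    """
    return [
        omega[i]
        + sum(A[i][j] * lam_prev[j] for j in range(2))
        + sum(B[i][j] * goals_prev[j] for j in range(2))
        for i in range(2)
    ]


def filter_intensities(omega, A, B, lam0, goals):
    """Run the recursion over observed counts.

    Returns a list whose entry t holds the intensities for observation t;
    the final entry is the one-step-ahead forecast.
    """
    lam = [list(lam0)]
    for y in goals:
        lam.append(next_intensity(omega, A, B, lam[-1], y))
    return lam


def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical parameter values and scores, chosen only to show the mechanics.
omega = [0.3, 0.2]
A = [[0.4, 0.0], [0.0, 0.4]]
B = [[0.3, 0.0], [0.0, 0.3]]
path = filter_intensities(omega, A, B, lam0=[1.5, 1.0], goals=[[2, 0], [1, 1]])
forecast = path[-1]
```

Given `forecast`, `poisson_pmf(k, forecast[i])` gives the conditional probability that series `i` records `k` goals in the next observation, under the assumed specification.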

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 1 Aug 2020