Predictive Models in Sports 2.0

Will there be another COVID-19 wave in the fall? How will the population structure change in the long term, and what are the economic implications? How will the CO2 concentration in the atmosphere change in the coming decades? Whether in politics, economics, weather, climate, crime, energy supply, demography, or sports, the urge to accurately forecast future events appears to be deeply ingrained in human nature, and forecasting is simply essential. In sports, too, predictive models have become an integral tool for optimizing current decisions.

In 2019, Prof. Dr. Daniel Memmert and Dr. Fabian Wunderlich from the Institute of Exercise Training and Sport Informatics at the German Sport University Cologne implemented a simulation framework, funded by an initial research grant from the German Research Foundation (DFG; ME 2678/29-1). This framework generates artificial data and replicates the entire prediction process in sports, from network generation to the creation of predictive ratings and the percentage-based forecasts derived from them. The advantage of artificial data is that, unlike with real data, all underlying processes can be deliberately controlled and varied, which allows prediction models to be validated more rigorously.
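The pipeline described above can be illustrated with a minimal sketch. It is not the researchers' framework. The logistic win probability, the random pairing that stands in for network generation, the Elo-style rating update, and all names and parameter values (`simulate`, `true_win_prob`, `k`, and so on) are assumptions made purely for illustration. The sketch shows why artificial data helps with validation: because the true strengths are known, each forecast can be compared with the true probability, not just with the observed result.

```python
import math
import random


def true_win_prob(strength_a, strength_b):
    """Logistic probability that A beats B, given two strength values."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))


def simulate(n_teams=8, n_rounds=40, k=0.1, seed=0):
    """Generate artificial matches and rate the teams as results come in.

    Returns the hidden true strengths, the estimated ratings, and the
    squared error between forecast and true probability for each match.
    """
    rng = random.Random(seed)
    # Hidden ground truth: only available because the data is artificial.
    strengths = [rng.gauss(0.0, 1.0) for _ in range(n_teams)]
    ratings = [0.0] * n_teams  # estimated ratings, all teams start equal
    sq_errors = []
    for _ in range(n_rounds):
        # Network generation (assumed here): random pairing in every round.
        order = list(range(n_teams))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            p_true = true_win_prob(strengths[a], strengths[b])
            # Percentage-based forecast derived from the current ratings.
            p_hat = true_win_prob(ratings[a], ratings[b])
            sq_errors.append((p_hat - p_true) ** 2)
            # Draw the match result from the true probability.
            outcome = 1.0 if rng.random() < p_true else 0.0
            # Elo-style update: move ratings toward the observed result.
            ratings[a] += k * (outcome - p_hat)
            ratings[b] -= k * (outcome - p_hat)
    return strengths, ratings, sq_errors


if __name__ == "__main__":
    strengths, ratings, sq_errors = simulate()
    half = len(sq_errors) // 2
    print("mean squared forecast error, first half:", sum(sq_errors[:half]) / half)
    print("mean squared forecast error, second half:", sum(sq_errors[half:]) / half)
```

In such a setup, the pairing scheme, the spread of strengths, or the number of matches can each be varied while the rating method stays fixed, which is the kind of controlled variation that real data does not permit.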

The Institute of Exercise Training and Sport Informatics has now received further funding from the DFG. The scientists will receive more than 280,000 euros over a two-year project period for the sports informatics project (ME 2678/29-2). The research project builds directly on the theoretical simulation framework for networks developed in the first DFG project. "While classic statistical models have already been successfully validated in our completed DFG project, the follow-up proposal now focuses on the theoretical validation and further development of prediction models based on machine learning methods," explains Prof. Dr. Daniel Memmert. Sports data will once again serve as the application example, with soccer and tennis being considered because complex data sets are available for both.

The scientists are focusing on supervised learning methods, which will be integrated into the existing simulation framework and tested for functionality. "There are four model classes planned: two pure Machine Learning model classes based on Random Forest and Graph Neural Networks, as well as two hybrid model classes that combine Machine Learning methods with classical statistical components," explains Dr. Fabian Wunderlich. By analyzing model quality and identifying the strengths and weaknesses of each model, the researchers aim to draw conclusions about the potential for further development of machine learning-based approaches.

The scientists are particularly interested in the situations in which machine learning, hybrid, or traditional models are superior. This question is motivated, among other things, by the observation that, in contrast to areas such as image recognition or language models, machine learning models are not generally superior to traditional models in predictive processes (e.g. in economics).

Contact for further information

Institute of Exercise Training and Sport Informatics

Prof. Dr. Daniel Memmert

memmert@dshs-koeln.de

Phone: +49 221 4982-4330

Dr. Fabian Wunderlich

f.wunderlich@dshs-koeln.de

Phone: +49 221 4982-4845