Official match data in the journal "Scientific Data"

Heatmap of Jamal Musiala in the match against Cologne (matchday 34): after coming on as a substitute in the 85th minute, he plays mostly in the right half-space and only briefly in his typical position (left half-space), from where (left edge of the penalty area) he scores the 2:1 winning goal that secures the championship. Direction of play is from left to right.

Until now, many scientists and coaches in professional soccer have had to rely on video footage for their match analyses in order to evaluate performances and study opponents, because official match data was rarely publicly accessible. Researchers at the Institute of Exercise Training and Sport Informatics at the German Sport University Cologne, in collaboration with the DFL German Football League, have now partially closed this gap by providing data from selected matches.

The official match data will be published in the journal Scientific Data in mid-February under the title "An integrated dataset of spatiotemporal and event data in elite soccer". "Data-driven match analysis in soccer is a growing discipline in both research and practice. However, there is hardly any public data available, which increases the barrier to entry into this field and reduces the reproducibility of methods and results," says Prof. Dr. Daniel Memmert, Head of the Institute of Exercise Training and Sport Informatics at the German Sport University Cologne, describing the initial situation. Memmert and his colleagues Manuel Bassek, Robert Rein and Hendrik Weber are therefore all the more pleased to be able to publish a dataset with official match information, event data and position data in a renowned journal. The dataset covers two Bundesliga matches and five 2. Bundesliga matches from the 2022/23 season.

The match information contains metadata on the matches and their participants. The event data includes timestamps and descriptions of individual events such as passes, shots or fouls. The position data includes the x/y coordinates of each player and the ball, so that the players' routes and the movement of the ball can be traced exactly. By integrating multiple data modalities - i.e. event logs with timestamps and x/y coordinates of player and ball positions - the dataset provides a multi-dimensional view of the game dynamics.
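To illustrate how the event and position data can be combined, the following minimal Python sketch looks up the ball position at the moment of each event by finding the position frame whose timestamp is closest to the event's timestamp. It does not use the published file format: the class names Event and Frame, the field names, the identifiers "ball" and "player_1", the time unit (seconds) and the sample values are all assumptions made up for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float   # seconds since kick-off (assumed unit)
    description: str   # e.g. "pass", "shot", "foul"


@dataclass
class Frame:
    timestamp: float                           # seconds since kick-off (assumed unit)
    positions: dict[str, tuple[float, float]]  # hypothetical id -> (x, y)


def nearest_frame(frames: list[Frame], t: float) -> Frame:
    """Return the frame closest in time to t; frames must be sorted by timestamp."""
    times = [f.timestamp for f in frames]
    i = bisect_left(times, t)
    if i == 0:
        return frames[0]
    if i == len(frames):
        return frames[-1]
    before, after = frames[i - 1], frames[i]
    return before if t - before.timestamp <= after.timestamp - t else after


def locate_events(events: list[Event], frames: list[Frame], who: str = "ball"):
    """Pair each event description with the (x, y) position of `who` at that time."""
    return [
        (e.description, nearest_frame(frames, e.timestamp).positions[who])
        for e in events
    ]


# Made-up sample data, not taken from the published dataset.
frames = [
    Frame(0.0, {"ball": (0.0, 0.0), "player_1": (-5.0, 2.0)}),
    Frame(1.0, {"ball": (4.0, 1.0), "player_1": (-3.0, 2.5)}),
    Frame(2.0, {"ball": (9.0, 3.0), "player_1": (0.0, 3.0)}),
]
events = [Event(0.9, "pass"), Event(2.2, "shot")]

located = locate_events(events, frames)
print(located)
```

Nearest-timestamp matching is only one possible way to join the two modalities; with real data, the two streams may first need to be brought onto a common clock.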

This dataset is intended to help test existing analysis techniques and develop new methods for sports analysis. "We are making the data available under the Creative Commons license CC-BY 4.0. In doing so, we want to promote innovation, reproducibility and open science in match analysis research," says Dr. Hendrik Weber from the DFL's Sports Technology & Innovation Directorate. The release also ties in with an ongoing research project by Prof. Daniel Memmert's team, which has been and continues to be funded by the Federal Ministry of Education and Research (BMBF) and the German Research Foundation (DFG): the new sports data analysis platform "floodlight" includes data-processing routines to simplify and standardize performance indicators and was likewise designed as a freely accessible package.

Source:

Bassek, M., Rein, R., Weber, H., & Memmert, D. (2025). An integrated dataset of spatiotemporal and event data in elite soccer. Scientific Data, 12(1), 195.