Dr. Darina Goldin, Director of Data Science at Bayes Esports, writes for Esports Insider to discuss the true value of esports data.
Data is the oil of the 21st century. Betting needs historical data to build accurate prediction models and live data to offer odds while a match is in progress. However, not all data is created equal, and that is especially true for esports. For most intents and purposes, esports data has a very limited shelf life.
Value of historic data
First of all, it’s important to note that there is not one kind of value degradation at work, but two. The first concerns historical data, meaning data from any match that has already finished. Esports games are patched regularly, and the effects of these patches range from tweaking the value of some items to adding or removing buildings or heroes entirely.
Most of the time, patches significantly alter the metagame and require new models to be built overnight. Historical data therefore becomes obsolete shortly after it is obtained. For Counter-Strike, it is plausible to use the last year or so of matches in your dataset. For League of Legends, you can only use what has been played since the last patch or two.
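The two kinds of training window described above, a patch-based one and a time-based one, can be sketched as simple filters. This is a minimal illustration, assuming each match record carries a play date and a patch identifier; the `Match` type, the patch labels, and both helper functions are hypothetical and not part of any real data feed.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Match:
    # Hypothetical match record; real data feeds use their own schemas.
    played_on: date
    patch: str


def matches_on_recent_patches(matches, patch_order, keep_last=2):
    """Keep matches played on the `keep_last` newest patches.

    `patch_order` is assumed to list patch identifiers from oldest to newest.
    """
    allowed = set(patch_order[-keep_last:])
    return [m for m in matches if m.patch in allowed]


def matches_in_window(matches, today, days=365):
    """Keep matches played within the last `days` days (roughly a year by default)."""
    cutoff = today - timedelta(days=days)
    return [m for m in matches if m.played_on >= cutoff]


if __name__ == "__main__":
    history = [
        Match(date(2024, 1, 10), "14.1"),
        Match(date(2024, 1, 24), "14.2"),
        Match(date(2024, 2, 7), "14.3"),
    ]
    # League of Legends style: only the last two patches.
    print(len(matches_on_recent_patches(history, ["14.1", "14.2", "14.3"])))
    # Counter-Strike style: a rolling window of roughly a year.
    print(len(matches_in_window(history, today=date(2024, 2, 10))))
```

A patch-based filter fits titles where each patch reshapes the metagame; a rolling date window suits titles whose core mechanics change more slowly.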
Of course, it’s possible to create models that do not depend on commonly patched values, but these models will have very clear limitations — and might still be broken by patches.
One such example is a Counter-Strike map winner model, which can be built purely from the current round score. In 2019, Valve changed the CS:GO economy twice, which in turn changed how a current winning streak affects the chances of winning future rounds.
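To make the idea of a score-only model concrete, here is a minimal sketch, not Bayes Esports' model. It assumes each round is won independently with a fixed probability `p`, which is exactly the kind of assumption an economy change can break, because real round outcomes depend on streaks and team money. It also assumes the professional CS:GO format of that era: first to 16 rounds, with 15:15 going to six-round overtimes (first to 4) that repeat on 3:3. Because an overtime restarts identically, its win probability x satisfies x = w + t·x, where w is the chance of winning one overtime outright and t the chance of a 3:3 tie, so x = w / (1 − t). The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


def race(a, b, target, p, tie_value):
    """P(team A wins a first-to-`target` race from score a:b).

    Each round is assumed to be won by A independently with probability p.
    Reaching (target-1):(target-1) returns `tie_value` instead of continuing.
    """
    @lru_cache(maxsize=None)
    def go(x, y):
        if x == target:
            return 1.0
        if y == target:
            return 0.0
        if x == y == target - 1:
            return tie_value
        return p * go(x + 1, y) + (1 - p) * go(x, y + 1)

    return go(a, b)


def overtime_prob(p):
    """P(A wins overtime): first to 4 of 6 rounds, repeated while it ends 3:3."""
    win_block = race(0, 0, 4, p, tie_value=0.0)
    tie_block = comb(6, 3) * p**3 * (1 - p) ** 3
    return win_block / (1 - tie_block)


def map_win_prob(a, b, p=0.5):
    """P(A wins the map) from round score a:b, first to 16 with 15:15 to overtime."""
    return race(a, b, 16, p, tie_value=overtime_prob(p))


if __name__ == "__main__":
    # Win chance for a team leading 10:5 when rounds are otherwise coin flips.
    print(round(map_win_prob(10, 5), 3))
```

Changing how a streak affects the next round means `p` is no longer constant, so a model like this one must be re-fitted, or replaced by one that conditions on more than the score, after such a patch.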
If you do any kind of data-driven analysis, you cannot rely on a historic dataset that you obtained once. It needs to be constantly updated with the latest matches played by top-level teams.
For any proper research, the half-life of esports data is measured in months. For some titles, it is weeks. Companies like Bayes Esports invest a lot of resources in obtaining the most recent professional matches for training data and in tracking changes in the metagame.
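One way to put the half-life idea into practice, sketched here under stated assumptions rather than as a description of any provider's pipeline, is to weight each training match by exponential decay so that a match exactly one half-life old counts half as much as one played today. The function name and the choice of days as the unit are illustrative assumptions.

```python
from datetime import date


def recency_weight(played_on: date, today: date, half_life_days: float) -> float:
    """Exponential-decay sample weight for a match.

    A match `half_life_days` old gets weight 0.5, twice that age gets 0.25, and so on.
    """
    age_days = (today - played_on).days
    return 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    today = date(2024, 3, 1)
    # A title whose data has a half-life of about a month.
    for played in (date(2024, 3, 1), date(2024, 1, 31), date(2024, 1, 1)):
        print(played, recency_weight(played, today, half_life_days=30))
```

Weights like these can be passed to any training routine that accepts per-sample weights (many scikit-learn estimators take a `sample_weight` argument in `fit`). Shortening `half_life_days` for titles that are patched more often mirrors the months-versus-weeks distinction above.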
Value of live data
When it comes to live match data, a different issue surfaces. Esports matches are played on a server and broadcast over streaming platforms such as YouTube or Twitch. Additionally, some publishers make live match data available through web APIs. Partly to prevent cheating, the stream or API lags behind the game itself by anywhere from ten seconds to several minutes.
A delay of even ten seconds can cause huge losses for a bookmaker if somebody finds a way to exploit it, and that scenario is far from hypothetical.
Although most matches are played online, the most prestigious events take place in a stadium with a live audience. Anyone spectating at the venue sees the game without delay, whereas a bookmaker following the delayed video stream does not.
But what about sending a scout to the stadium to watch the match? Even where this is legal, it is not feasible because of the sheer amount of data generated, and required, every second. Esports are data-driven in a way that most professional sports never will be.
There is no need to translate player actions into data: the game runs on a computer, so the actions are data from the get-go. Data providers like Bayes Esports deliver full and accurate game information, including the positions, items, and actions of each player, several times per second. A scout could transcribe only a fraction of that information, and certainly not enough to develop competitive betting algorithms.
A bookmaker that wants to stay profitable in the long term needs to secure the fastest data available on the market. Ideally, it should receive undelayed data directly from the game servers through robust pipelines. This is often possible through direct contracts with publishers and tournament organisers, or by going through data providers such as Bayes Esports.
From our supporter: Bayes Esports