In the esports data industry, the term ‘scraping’ is used to describe the process of using tools that ‘watch’ live streams of esports matches to analyse the information shown on screen. This data is then used to create esports odds feeds and other products.
The biggest issue with this method, however, is that the information from these live streams is delayed rather than truly live, which presents a threat to integrity.
To further explore the differences between official and scraped data, Esports Insider talked with Martin Dachselt, CEO and Managing Director of Bayes Esports, and Dr. David Weller, Partner at Lubberger Lehment and legal advisor to Bayes Esports.
Point of view: Strategy
Dachselt noted that data scraping actually results in slightly higher maintenance, because scrapers need to react to anti-scraping measures or changes to the broadcast. Official data is provided right from the source, resulting in lower maintenance costs.
Scraping data, according to Dachselt, won’t just affect the performance of stakeholders using this data; it could also damage the esports ecosystem. “The esports industry can be estimated to lose revenue in the two-digit millions because of data scraping,” he said.
“Rights holders being unable to properly monetise their own data means they lose out on revenue that would otherwise be invested in a better fan experience, more tournaments, or more game titles…An industry based on unofficial data cannot be sustainable.”
Dachselt said that more than 90% of the cost of official data comes from the licence fees paid to the rights holder, which of course are not paid by scrapers.
From a business perspective, Bayes Esports aims to educate its clients on the value of official data. For Dachselt, this means overcoming two main challenges. Firstly, many sportsbooks simply do not know that there are two different types of data, scraped and official, because that distinction doesn’t really exist in traditional sports.
The second challenge is posed by the scrapers themselves, through marketing efforts that present scraped data as live. Companies often use the terms ‘official’ and ‘live’, along with game logos, even though they are not official data providers for those events or games. As a result, it’s crucial to check on a data company’s website whether it is an official data partner of tournaments or game publishers.
“All operators see is a data provider that offers content for much cheaper than an official data provider, but they fail to realise that that content comes with potential risks,” Dachselt highlighted. “Scraped data feeds are unreliable and can dry up at a moment’s notice and the inherent delay of the data raises integrity and match-fixing concerns. These risks make scraped data much more costly in the long run.”
Point of view: Technology
Scraping is, by definition, a process of extracting data from the output of a website or stream using a dedicated program. In the case of esports data scraping, that can mean using a program that relies on Optical Character Recognition and AI to scan the broadcast of an esports tournament. This extracts displayed information such as the game clock, match scores and player statistics, and outputs it as data points.
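
To make the last step concrete, here is a minimal Python sketch of how text already read off a broadcast overlay might be turned into data points. It is only an illustration under stated assumptions: the overlay layout, the names `ScoreboardReading` and `parse_overlay_text`, and the sample strings are all hypothetical, and the OCR step itself is left out. It does not describe how any particular company’s scraper works.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed (hypothetical) overlay layout: "<mm:ss> <team> <score> - <score> <team>".
# Real broadcast overlays differ by game and tournament.
OVERLAY_PATTERN = re.compile(
    r"(?P<minutes>\d{1,2}):(?P<seconds>\d{2})\s+"
    r"(?P<team_a>\w+)\s+(?P<score_a>\d+)\s*-\s*(?P<score_b>\d+)\s+(?P<team_b>\w+)"
)


@dataclass
class ScoreboardReading:
    """One set of data points recovered from a single frame of overlay text."""
    game_clock_seconds: int
    team_a: str
    score_a: int
    team_b: str
    score_b: int


def parse_overlay_text(text: str) -> Optional[ScoreboardReading]:
    """Turn OCR'd overlay text into data points, or None if no scoreboard is visible."""
    match = OVERLAY_PATTERN.search(text)
    if match is None:
        # Nothing resembling the assumed scoreboard layout was on screen.
        return None
    clock = int(match["minutes"]) * 60 + int(match["seconds"])
    return ScoreboardReading(
        game_clock_seconds=clock,
        team_a=match["team_a"],
        score_a=int(match["score_a"]),
        team_b=match["team_b"],
        score_b=int(match["score_b"]),
    )


if __name__ == "__main__":
    print(parse_overlay_text("12:34 Alpha 5 - 3 Bravo"))
    print(parse_overlay_text("Ad break - back soon"))
```

In this sketch, any frame that does not show the assumed scoreboard layout yields no data at all, and everything the parser can return is limited to what the overlay happens to display.
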
However, even the use of AI for scraping is risky. Bayes Esports noted that training an AI model to recognise what is happening on screen requires downloading other parties’ streams, which it said violates copyright from the very start. This causes a multitude of issues and concerns for the data customer as well. If legal action is taken against the data scrapers, the data feed can be cut off at a moment’s notice, leaving the customer without any content. Furthermore, associating themselves with someone who uses illegal market practices can also reflect negatively on data customers, leading to potentially irreparable damage to their image.
Scraped data is also lower in overall quality than official data, since it relies solely on the things shown on screen at any one time.
“While those do a mediocre job of conveying the state of the game to the fans, they are not a good, valid or accurate source of information,” stated Dachselt.
“Not everything that happens in the game can be shown on screen, and every time something other than live gameplay is shown, no data can be scraped. On top of that, broadcasts are delayed by an average of 40 seconds. Scraped data is delayed by the same amount of time. All this combined makes scraped data too slow and unreliable for modern and professional use cases.”
On the other hand, getting official data directly from the game servers requires some sort of partnership. Access to the game’s servers makes an accurate transcript of the entire game available to the end user.
Of course, partnerships such as these require resources such as the tech stack and support, as well as a relationship between the two parties. However, these deals benefit not only stakeholders who want the data, such as bookmakers, but the game publishers as well.
Dachselt explained: “Rights holders reinvest the revenue they generate from the official data partnerships back into their esports productions, leading to a better experience for fans and players alike. Data scraping is an illegal, unprofessional and archaic market practice that cannot be considered a valid option for data customers.”
“What we are seeing is that more and more players in the industry recognize the risks and damages caused by data scraping and take legal action. False advertisements from data scrapers can be brought to court under unfair competition laws. Furthermore, data scraping may infringe on the copyright of the rights holders if the official broadcasts are downloaded to train the scraping AI. With rights holders now realising the damages data scraping can cause, we will see a steady increase in legal actions being taken and data scrapers being driven out.”
Point of view: Legal
Bayes Esports has previously engaged in legal cases with companies that use data scraping. The company’s advisor Dr. David Weller in particular highlighted a court ruling in Germany that found that scraped data should never be called or advertised as ‘live’.
“It’s a reflection of the fact that these grey market practices are hurting the broader esports industry,” he said.
“Legal rights holders and official data distributors are now fighting to protect esports data for the overall development of the industry. Bayes Esports has made a large move in this field… Whilst it does use more resources, official live data will always create more value to the end customer.”
Weller stated that it’s crucial that rights holders also pay attention because it is their intellectual property that is being exploited. The company has worked out strategies that include civil law action and also criminal prosecution; however, no proceedings are pending at this time.
Despite the growing focus on esports data practices, the fact that a large number of betting and data companies are based outside of the EU, in countries with looser regulations, might prove to be a hurdle. Weller said that it is definitely possible to pursue legal action against those based abroad, but admitted that it is more difficult to enforce a judgement if the other party is not based in the EU.
Weller highlighted that the laws currently in force are sufficient; the challenge is only a matter of using them. Since the esports data industry is a relatively new space, data scraping is also a new phenomenon in this already specific niche.
“Our experience has been that, although the industry and its mechanisms are new to the courts to begin with, they quickly recognise that scraping is not acceptable,” Weller said.
“Many companies are reluctant to invest money, time, and energy in litigation, but it pays off in the long run. The bigger you allow the market to get, the harder it will be to limit the damage later.”
This article is supported by Bayes Esports.