
DSS Covid-19 Online Challenge


The Data Science Society is starting a community-directed cause: the Covid19 Online Challenge.

The challenge is to use data science for better decision making in this (or any future) pandemic. Naturally, the challenge involves developers, data scientists and domain experts. The Covid-19 pandemic can be examined from many angles and therefore offers enough challenges for all participants.

Problem statement: in handling the public health crisis, nearly every public policy maker made errors of various kinds, due to a lack of reliable data, seemingly random changes of public policy, a lack of real understanding among the general population and a badly organized communication process. All of these issues are subject to this challenge.

The challenge is common to everyone taking part: there is no internal competition, and everyone competes as one big team against the tasks. The challenge is to produce a certain number of tools and solutions, but it is open-ended by nature. Open-ended here means that any additional data source, public tool or published model can be added to the projects. The goal is to create something useful for various agents: common folk, public policy decision makers, analysts, and also data scientists. There will be no strict deadlines (unless we decide together to set some), but the idea is that we present monthly (or so) whatever results we have achieved, in short video presentations and perhaps articles or notebooks.

In order to solve the challenge, we have to self-organize our work. We should break down every concept into tasks and use some organizational technique to work through them. Everyone who finishes working on one project immediately finds the next one and joins it. All finished code and assets, accompanied by reasonably understandable documentation, are published in the DSS public repo under the GNU GPL 2.0 license.

All communication on the challenge will go through the channel #covid19_online_challenge in the (

A non-exhaustive list of possible directions:
1) Purified updated global dataset on Covid19
Having reliable data is the first step in any data science project. There are currently plenty of data sources on Covid-19 (see some below), but plenty of issues remain, ranging from simple ones (the comma in “Korea, South”), through human error (e.g. some departments in France actually reported negative new cases, not as a correction) and breaks in public data streams (there was no official data for Serbia for 10 days at the end of April), all the way to data definition (the Belgian authorities count any death with suspected covid-19 as a covid-19 case). The other challenge, apart from cleaning the data, is to provide automated daily updates.
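
As an illustration of the simpler checks mentioned here, the following is a minimal sketch using only the Python standard library. The column names, the function names and every number in the sample are made up for the example and do not come from any of the real data sources; it only shows that a quoted comma can be parsed safely, that negative daily counts can be flagged for review, and that breaks in a daily series can be detected.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical raw extract; the values are invented for illustration only.
RAW = """\
country,date,new_cases
"Korea, South",2020-04-01,101
France,2020-04-01,-12
France,2020-04-02,40
Serbia,2020-04-01,5
Serbia,2020-04-04,9
"""

def load_rows(text):
    """Parse CSV text; the csv module keeps quoted commas inside one field."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "country": row["country"],
            "date": date.fromisoformat(row["date"]),
            "new_cases": int(row["new_cases"]),
        }
        for row in reader
    ]

def flag_negative(rows):
    """Return rows with negative daily counts so a human can review them."""
    return [row for row in rows if row["new_cases"] < 0]

def find_gaps(rows):
    """Return (country, first_missing_day, last_missing_day) for each break."""
    dates_by_country = {}
    for row in rows:
        dates_by_country.setdefault(row["country"], []).append(row["date"])
    gaps = []
    for country, dates in dates_by_country.items():
        dates.sort()
        for previous, current in zip(dates, dates[1:]):
            if (current - previous).days > 1:
                gaps.append(
                    (country, previous + timedelta(days=1), current - timedelta(days=1))
                )
    return gaps

rows = load_rows(RAW)
suspicious = flag_negative(rows)
missing = find_gaps(rows)
```

Flagging rather than silently correcting is a deliberate choice in this sketch: whether a negative count is an error or a legitimate correction usually needs a human decision.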

2) Optimal public policy
The idea here is not to make just another forecast of normalization that would be invalid in two days, but to go a somewhat opposite way and try to come up with an optimal strategy for dealing with such a global pandemic. Since we have data on the public policies implemented at various times in various countries, as well as data on the people affected by Covid-19 (possibly weighted by the demographic and socio-economic data for each country), we can now analyse after the fact which public policies produced which results. A closer-to-optimal strategy discovered here would ultimately result in a closer-to-optimal strategy implemented by public decision makers.
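
One very naive way to start such an after-the-fact analysis is to compare how fast cases grew before and after a policy took effect. The sketch below assumes a cumulative case series and a known policy day; the function names, the 7-day window and the synthetic series are all assumptions made for the example. It is a simple before/after comparison, not a causal analysis, and it ignores reporting delays, incubation time and confounding factors.

```python
from math import log

def mean_log_growth(cases):
    """Average daily log-growth of a cumulative case series."""
    steps = [log(b / a) for a, b in zip(cases, cases[1:]) if a > 0 and b > 0]
    return sum(steps) / len(steps) if steps else 0.0

def growth_change(cumulative_cases, policy_day, window=7):
    """Growth after the policy day minus growth before it (negative = slowdown)."""
    before = cumulative_cases[max(0, policy_day - window): policy_day + 1]
    after = cumulative_cases[policy_day: policy_day + window + 1]
    return mean_log_growth(after) - mean_log_growth(before)

# Synthetic series (not real data): 20% daily growth, then 5% after day 7.
cases = [100.0]
for _ in range(7):
    cases.append(cases[-1] * 1.20)
for _ in range(7):
    cases.append(cases[-1] * 1.05)

change = growth_change(cases, policy_day=7)
```

A real analysis would need to control for the demographic and socio-economic differences between countries before drawing any conclusion from such a number.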

3) Understandable relevant user-friendly explorables
Explorable explanations are interactive visual simulations that help in understanding a fairly complex problem, typically targeted at non-specialists who would like to gain some informed intuition. Tools of this type could easily support part of the research on optimal public policy for decision makers (by simulating various measures), and they should also be very useful in convincing everyday people to make the necessary behavioral changes.
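
The kind of simulation that could sit behind such an explorable can be sketched with a basic SIR (susceptible, infected, recovered) model in daily time steps. The choice of SIR is an assumption made for this example, since the challenge text does not prescribe a model, and the parameter values below are illustrative rather than estimates for Covid-19. A measure such as distancing is represented simply as a lower contact rate `beta`.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model; returns a list of (S, I, R) tuples per day.

    beta  -- assumed infections caused per infected person per day
    gamma -- assumed fraction of infected people recovering per day
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    recovered = 0.0
    history = [(susceptible, infected, recovered)]
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
        history.append((susceptible, infected, recovered))
    return history

def peak_infected(history):
    """Largest number of simultaneously infected people in a run."""
    return max(infected for _, infected, _ in history)

# Illustrative scenario values, not fitted to any real data.
baseline = simulate_sir(1_000_000, 10, beta=0.30, gamma=0.10, days=300)
with_measures = simulate_sir(1_000_000, 10, beta=0.15, gamma=0.10, days=300)
```

In an interactive tool, `beta` would typically be tied to a slider so that a non-specialist can see how the peak changes as contacts are reduced.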

4) Infodemic
As a side product of mishandled public communications and highly connected societies, the phenomenon of false information has proliferated wildly during the Covid-19 situation. The task is to put the right tools in place to weed out plain lies and automate fact-checking.

Covid-19 datasets

Demographic/economic data

Public policies data

Visualisation/simulation of spreading patterns

Reports on current misinformation
