Datathon Sofia Air Solution – The Telelink Case handled by the Urban air quality Gurus!



1. Business Understanding

Particulate matter (PM) is considered the air pollutant of greatest concern to the health of the urban population. Research has shown that exposure to PM can lead to more days lost from work or school, emergency room visits, hospital stays, and deaths. Both short- and long-term exposure to PM can worsen heart and lung disease. It can also cause premature death, particularly among people at higher risk from particle pollution.

Particulate matter can be produced by burning materials, road dust, construction, and agriculture. One of the largest sources is residential wood burning. Wood smoke may come from residential sources such as a fireplace or wood stove in a home, from open burning of vegetative matter, or from backyard burning. Other sources of particulate matter include forest fires, certain industries, furnaces, tobacco smoke, and motor vehicles, especially those with diesel engines. The harmful effects of tobacco smoke are well known, and as a result many countries have restricted smoking in public places.

As real urban air quality gurus, we aim to predict the PM concentration in our capital city, Sofia. We trust the forecast can be used by the business for prevention. Local authorities can lower particulate matter pollution by reducing the amount of particulate matter produced by smoke and by reducing vehicle emissions.

With that in mind, this study focuses on refined modeling for predicting daily pollutant concentrations from historical air pollution data.
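To make the forecasting task concrete, here is a minimal baseline sketch. It is not the team's model: it assumes a plain moving-average forecast over the last few daily means and a walk-forward mean absolute error, and the function names and the sample values are made up for illustration.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next day's value as the mean of the last `window` days."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


def backtest_mae(series, window=3):
    """Walk forward through the series and return the mean absolute error."""
    errors = []
    for t in range(window, len(series)):
        # Only data strictly before day t is used to forecast day t.
        forecast = moving_average_forecast(series[:t], window)
        errors.append(abs(series[t] - forecast))
    return mean(errors)


# Illustrative daily PM10 means in µg/m³ (made-up numbers, not datathon data).
daily_pm10 = [40.0, 50.0, 60.0, 50.0, 40.0, 30.0]

next_day = moving_average_forecast(daily_pm10)
baseline_error = backtest_mae(daily_pm10)
print(next_day, baseline_error)
```

A baseline like this is mainly useful as a yardstick: any refined model should beat its error on the same walk-forward split.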

2. Data Understanding

As a starting point, we focus on understanding the provided datasets which are as follows:

Official air quality measurements
Citizen science air quality measurements
Meteorology data
Topography data

Our analytical approach will involve the following activities:

    1. Data extraction from the primary data source as well as the secondary data sources

    2. Data quality check, data cleaning, and data preparation

    3. Exploring the data to study some of the variables

    4. Assessing the variables for their relevance to the study
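The quality check and preparation steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than our actual pipeline: the record layout (`station`, `time`, `pm10`), the plausibility bounds, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical plausibility bounds for PM10 in µg/m³; tune them to the real data.
PM10_MIN, PM10_MAX = 0.0, 1000.0


def clean_records(records):
    """Drop rows with a missing or implausible PM10 reading."""
    cleaned = []
    for rec in records:
        value = rec.get("pm10")
        if value is None:
            continue  # missing measurement
        if not (PM10_MIN <= value <= PM10_MAX):
            continue  # outside the assumed plausible range
        cleaned.append(rec)
    return cleaned


def daily_means(records):
    """Aggregate readings to one mean per (station, calendar day)."""
    buckets = defaultdict(list)
    for rec in records:
        day = datetime.fromisoformat(rec["time"]).date().isoformat()
        buckets[(rec["station"], day)].append(rec["pm10"])
    return {key: mean(values) for key, values in buckets.items()}


# Made-up sample rows, only to show the expected input shape.
raw = [
    {"station": "S1", "time": "2018-01-01T00:00:00", "pm10": 30.0},
    {"station": "S1", "time": "2018-01-01T12:00:00", "pm10": 50.0},
    {"station": "S1", "time": "2018-01-01T13:00:00", "pm10": None},
    {"station": "S1", "time": "2018-01-02T00:00:00", "pm10": 5000.0},
    {"station": "S1", "time": "2018-01-02T01:00:00", "pm10": 20.0},
]

result = daily_means(clean_records(raw))
print(result)
```

Filtering before aggregating matters here: a single implausible spike would otherwise dominate the daily mean for that station.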

Given the specifics of the data and its topological, geometric, and geographic properties, we start our data understanding journey by conducting a spatial analysis.
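One common building block of such a spatial analysis is matching each citizen sensor to its closest official station. The sketch below assumes great-circle (haversine) distance on latitude/longitude pairs; the station names and coordinates are placeholders near Sofia, not the real station locations.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(sensor, stations):
    """Return the name of the station closest to the sensor's (lat, lon)."""
    return min(stations, key=lambda name: haversine_km(*sensor, *stations[name]))


# Placeholder coordinates (hypothetical, for illustration only).
official = {
    "station_A": (42.70, 23.30),
    "station_B": (42.65, 23.38),
}
citizen_sensor = (42.66, 23.37)

closest = nearest_station(citizen_sensor, official)
print(closest, round(haversine_km(*citizen_sensor, *official[closest]), 2))
```

Pairing sensors with stations this way gives a simple basis for comparing citizen readings against official measurements taken nearby.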

3. Data Preparation

Mapping the stations


4 thoughts on “Datathon Sofia Air Solution – The Telelink Case handled by the Urban air quality Gurus!”

  1.

    Good progress folks. Good code, good approach so far. Waiting to see how you approach the links between bias and meteorology :).

    If you want to limit your work to a test case, perhaps focus on just 1-2 main stations (e.g. IAOS/Pavlovo and Mladost or Druzhba).

  2.

    Business Understanding: the presented information is relevant and provides a concise description of the issue under study.
    Data Understanding: the exposition is well-structured and the sequence of data-prep steps used is well-grounded. Furthermore, the code and the presentation of output are well-organized.

    So far, I might say it’s a good job, guys! Looking forward to seeing your progress 🙂

  3.

    You have done an excellent job with the visualisation and I liked the regression approach for adjusting the civil sensor measurements, but I think it is rather too complex. I wonder if you have done a bit more on the forecasting aspect?

  4.

    I would like to see what you would do if you plan to continue working on this case… like using elevation data for the city topology, not just for the sensors, due to the fact that air rises or falls depending on temperature and we can get temperature inversion, especially in more polluted areas… but overall, excellent work and a nice visual story
