
Datathon – Sofia Air 2.0 – Solution – Wonder Gang: Articles on Particles


Technology and methods used:

R – tidyverse, lubridate, snakecase, sp, raster, spData, sf, leaflet, mapview, ggplot2, shiny, maps, devtools, geojsonio, rgdal, leaflet.esri, leaflet.extras

Python – s3fs, pandas, numpy, matplotlib, plotly

1. Business Understanding

Air pollution above the regulatory limits is a common problem in many locations. Examining its causes and being able to predict it would help control and reduce pollution, address environmental issues, and give citizens a better quality of life.

Air pollution is a recurring problem in our city, Sofia. It is affected by climate and topography as well as by anthropogenic factors. Our aim with this project is to explain how different human activities affect pollution and our health.

2. Data Understanding

The provided datasets consist of static and dynamic data for 20 days in November 2016. The data is structured as follows:

  • Weather stability data – data on the state of the upper atmosphere from radiosonde soundings over Sofia.
  • Topographic data – containing GPS coordinates for different points in Sofia.
  • Official meteorological data for 20 days for the city of Sofia.
  • Industrial pollutants – 71 industrial emission devices; their location, height and annual PM10 emission rate (debit).
  • Construction sites – 389 construction sites; location and type of construction.
  • Household pollutants – 39944 households and questionnaire data on the source of heating each household uses.
  • Data from official measurement stations – data from 4 stations located in Nadezhda, Hipodruma, Druzhba and Pavlovo for 20 days in November 2016.
  • Various miscellaneous tables (from the methodology of the case), which we use to calculate the pollution based on the dispersion model.

3. Data Preparation

First, we determined how the dispersion model should apply to the individual pollutant sources. This requires the weather stability data and the wind speed from the meteorological data, which together determine how the pollution from each emission device spreads in the atmosphere above our city. From the weather stability data we calculated the vertical temperature gradient in the upper atmosphere above Sofia for each of the 20 days. This showed that the Pasquill stability class of the atmosphere was E (slightly stable) for the whole period.
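The gradient-to-class step can be sketched roughly as follows. This is a minimal illustration, not our exact code: the function names are made up for this example, and the class limits are an assumption taken from the commonly cited temperature-gradient (delta-T) scheme, which may differ from the table in the case methodology.

```python
from bisect import bisect_left

# ASSUMPTION: upper limits (deg C per 100 m) for classes A..F, following the
# commonly cited delta-T scheme. Anything above the last limit is class G.
# The table in the case methodology may use different limits.
_UPPER_LIMITS = [-1.9, -1.7, -1.5, -0.5, 1.5, 4.0]
_CLASSES = "ABCDEFG"


def temperature_gradient(t_low_c, t_high_c, z_low_m, z_high_m):
    """Vertical temperature gradient in deg C per 100 m between two levels."""
    if z_high_m <= z_low_m:
        raise ValueError("z_high_m must be above z_low_m")
    return (t_high_c - t_low_c) / (z_high_m - z_low_m) * 100.0


def pasquill_class(gradient_c_per_100m):
    """Map a temperature gradient to a Pasquill stability class letter."""
    # bisect_left puts a value equal to a limit into the lower (less stable) class.
    return _CLASSES[bisect_left(_UPPER_LIMITS, gradient_c_per_100m)]


if __name__ == "__main__":
    # Hypothetical sounding values, for illustration only.
    gradient = temperature_gradient(2.0, 2.5, 600.0, 700.0)
    print(gradient, pasquill_class(gradient))
```

Under these assumed limits, any gradient between -0.5 and 1.5 °C per 100 m falls into class E, so a period with a weak inversion on every day would be classified as E throughout.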

The next step was to calculate the dispersion coefficients sigma y and sigma z, which represent the horizontal (crosswind) and vertical spread of the plume from each pollutant source.
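As an illustration of how sigma y and sigma z can be obtained for a given stability class and downwind distance, here is a minimal sketch. It assumes the Briggs open-country formulas; the tables in the case methodology may use different coefficients, and `briggs_sigmas` is a name chosen for this example only.

```python
import math

# ASSUMPTION: Briggs open-country (rural) coefficients, not the case's tables.
# class -> (a, b, k, p), used as
#   sigma_y = a * x / sqrt(1 + 0.0001 * x)
#   sigma_z = b * x / (1 + k * x) ** p
BRIGGS_RURAL = {
    "A": (0.22, 0.20, 0.0, 0.0),
    "B": (0.16, 0.12, 0.0, 0.0),
    "C": (0.11, 0.08, 0.0002, 0.5),
    "D": (0.08, 0.06, 0.0015, 0.5),
    "E": (0.06, 0.03, 0.0003, 1.0),
    "F": (0.04, 0.016, 0.0003, 1.0),
}


def briggs_sigmas(distance_m, stability_class="E"):
    """Return (sigma_y, sigma_z) in metres at a downwind distance in metres."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    a, b, k, p = BRIGGS_RURAL[stability_class]
    sigma_y = a * distance_m / math.sqrt(1.0 + 0.0001 * distance_m)
    sigma_z = b * distance_m / (1.0 + k * distance_m) ** p
    return sigma_y, sigma_z


if __name__ == "__main__":
    for x in (1000.0, 10000.0):
        print(x, briggs_sigmas(x, "E"))
```

For class E the vertical spread grows much more slowly than the horizontal one, which is why a stable atmosphere keeps the plume thin and concentrated.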

To visualize how the different emission devices spread pollution, we chose a radial approach: we plot separate circles with different particle concentrations around each industrial pollutant. These are plotted on an interactive map (.html), which you can find in our GitHub repo.
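The concentration assigned to each circle can be sketched with the standard Gaussian plume expression for the ground-level centerline, including ground reflection. This is a simplified illustration under stated assumptions: the function name and all input values are hypothetical, and sigma y and sigma z are passed in as already-computed numbers for the chosen radius.

```python
import math


def ground_level_concentration(q_g_per_s, wind_m_per_s, sigma_y_m, sigma_z_m,
                               stack_height_m):
    """Ground-level centerline concentration (g/m^3) of a Gaussian plume.

    C = Q / (pi * u * sigma_y * sigma_z) * exp(-H^2 / (2 * sigma_z^2))
    """
    if wind_m_per_s <= 0 or sigma_y_m <= 0 or sigma_z_m <= 0:
        raise ValueError("wind speed and sigmas must be positive")
    dilution = math.pi * wind_m_per_s * sigma_y_m * sigma_z_m
    height_term = math.exp(-stack_height_m ** 2 / (2.0 * sigma_z_m ** 2))
    return q_g_per_s / dilution * height_term


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only (not values from the dataset):
    # radius in metres -> (sigma_y, sigma_z) in metres at that radius.
    sigmas_by_radius = {1000.0: (57.2, 23.1), 10000.0: (424.3, 75.0)}
    for radius, (sy, sz) in sigmas_by_radius.items():
        c = ground_level_concentration(1.0, 2.0, sy, sz, stack_height_m=30.0)
        print(radius, c)
```

Drawing one circle per radius with the value from this function treats the plume as if the wind could blow in any direction, which is a simplification of the directional plume.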

A snapshot of initial pollution with all parameters:

A snapshot of initial pollution with pollution spread in a radius of 1000 meters:

A snapshot of initial pollution with pollution spread in a radius of 10000 meters:

A snapshot of predicted pollution for 1 day ahead:

4. Modeling

To start modelling, we measured the distances between the industrial pollutants and the official measuring stations. These distances were used to determine how the particle concentrations from the industrial sources affect the PM10 concentrations measured in Sofia.

We succeeded in determining the correlation between the industrial emission sources and the measured concentrations, and in projecting what the pollution will be in the following five days.
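One common way to get such distances from GPS coordinates is the haversine formula. The sketch below is an illustration rather than our exact code; the function name is chosen for this example and the Earth is assumed to be a sphere with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_M = 6371000.0  # ASSUMPTION: spherical Earth, mean radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Placeholder coordinates, for illustration only (not from the dataset).
    source = (42.70, 23.30)
    station = (42.73, 23.29)
    print(haversine_m(*source, *station))
```

Computing this for every source and station pair gives a distance matrix that can then be fed into the dispersion calculation.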

Our estimations for the industrial pollution:

Official measurement station pollution:

Due to some technical issues we were unable to post our code and visualizations in the article, so you can find our work in the repo below:

Here you can find a repo with our code in R and Python.


The work continues!!! Check our progress in the repo.








15 thoughts on “Datathon – Sofia Air 2.0 – Solution – Wonder Gang: Articles on Particles”

  1. 0

    Great job! Really good map (with the blur visualization it rocks). Your paper shows a great combo of R and Python in one task. Keep working and calculate the other factors.

  2. 0

    I do like the approach of “blurring” over the map for pollution spreading. It would be even better (I know you didn’t have time) to take into account the surrounding buildings and how they impact the pollution spread, to get a more custom “shape” of the spread instead of a combination of circles, but this visually tells an amazing story. Keep up the good work on the “story” presentation of the other factors as well.

    1. 0

      Thank you for your comment. Actually, we were thinking of using topographic data to improve the shape of the spread visualization. Looking at the surrounding objects is definitely a good idea!
