Datathons Solutions

Datathon – Sofia Air 2.0 – Solution – Wonder Gang: Articles on Particles


Technology and methods used:

R – tidyverse, lubridate, snakecase, sp, raster, spData, sf, leaflet, mapview, ggplot2, shiny, maps, devtools, geojsonio, rgdal, leaflet.esri, leaflet.extras

Python – s3fs, pandas, numpy, matplotlib, plotly

1. Business Understanding

Air pollution above permitted limits is a common problem in many locations. Understanding its causes and being able to predict it would help control and reduce pollution, address environmental issues and give citizens a better life.

Air pollution is a recurring problem in our city, Sofia. It is affected by climate, topography and anthropogenic factors. Our aim with this project is to explain how different human activities affect pollution and our health.

2. Data Understanding

The provided datasets consist of static data and dynamic data for 20 days in November 2016. The data is structured as follows:

  • Weather Stability data – Data on the state of the upper atmosphere from radiosonde probes of Sofia
  • Topographic data – containing GPS coordinates for different points in Sofia.
  • Official meteorological data for 20 days for Sofia city
  • Industrial pollutants – 71 industrial emission devices; their location, height and annual PM10 emissions.
  • Construction sites – 389 construction sites; location and types of construction
  • Household pollutants – 39944 households and questionnaire data determining what source of heating the households use.
  • Data from official measurement stations – data from 4 stations located in Nadezhda, Hipodruma, Druzhba and Pavlovo for 20 days in November 2016.
  • Several miscellaneous tables (from the methodology of the case), which we use to calculate pollution with the dispersion model.

3. Data Preparation

First, we determined how the dispersion model should be applied to the individual pollution sources. This requires the weather stability data and the wind speed from the meteorological data, which together determine how pollution from each emission device spreads in the atmosphere over our city. From the weather stability data we calculated the vertical temperature gradient in the upper atmosphere above Sofia for each of the 20 days. The gradient indicated that the Pasquill stability class of the atmosphere was E (slightly stable) for the whole period.
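As a rough illustration of this step, the sketch below maps a vertical temperature gradient to a Pasquill class. The thresholds (in °C per 100 m) are an assumption taken from the commonly cited temperature-gradient table (for example US NRC Regulatory Guide 1.23); the tables in the case methodology may differ. The function names and the sample sounding values are hypothetical.

```python
# Sketch: Pasquill stability class from a vertical temperature gradient.
# Thresholds (deg C per 100 m) are ASSUMED from the commonly cited
# delta-T table; the case methodology may use different limits.
PASQUILL_UPPER_BOUNDS = [
    (-1.9, "A"),  # extremely unstable
    (-1.7, "B"),  # moderately unstable
    (-1.5, "C"),  # slightly unstable
    (-0.5, "D"),  # neutral
    (1.5, "E"),   # slightly stable
    (4.0, "F"),   # moderately stable
]


def temperature_gradient(t_lower_c, t_upper_c, z_lower_m, z_upper_m):
    """Return the gradient in deg C per 100 m between two sounding levels."""
    return (t_upper_c - t_lower_c) / (z_upper_m - z_lower_m) * 100.0


def pasquill_class(gradient_c_per_100m):
    """Return the first class whose upper bound covers the gradient."""
    for upper_bound, label in PASQUILL_UPPER_BOUNDS:
        if gradient_c_per_100m <= upper_bound:
            return label
    return "G"  # extremely stable


# Hypothetical sounding levels, not values from the datathon data.
gradient = temperature_gradient(2.0, 3.0, 600.0, 800.0)
print(gradient, pasquill_class(gradient))
```

In practice the same function would be applied to each of the 20 daily soundings to check whether the class stays constant over the period.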

The next step was to calculate the sigmas, sigma y and sigma z, which represent the horizontal (crosswind) and vertical spread of the plume for each emission source.
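To make the sigma calculation concrete, here is a minimal sketch for stability class E. It assumes the Briggs open-country fitting formulas, which are one common parameterization; the case methodology may prescribe different coefficients, and the function name is hypothetical.

```python
import math


def briggs_sigmas_class_e(x_m):
    """Return (sigma_y, sigma_z) in metres at downwind distance x_m (metres).

    ASSUMPTION: Briggs open-country fits for Pasquill class E.
    """
    if x_m <= 0:
        raise ValueError("downwind distance must be positive")
    sigma_y = 0.06 * x_m / math.sqrt(1.0 + 0.0001 * x_m)  # horizontal spread
    sigma_z = 0.03 * x_m / (1.0 + 0.0003 * x_m)           # vertical spread
    return sigma_y, sigma_z


for x in (100, 1000, 10000):
    sy, sz = briggs_sigmas_class_e(x)
    print(f"x={x:>6} m  sigma_y={sy:7.1f} m  sigma_z={sz:6.1f} m")
```

Under stable conditions such as class E, the vertical spread grows much more slowly than the horizontal one, so the plume stays relatively flat as it travels.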

To visualize how the different emission devices spread pollution, we chose a radial approach: we plot circles with different particle concentrations around each industrial pollutant. These are plotted on an interactive map (.html), which you can find in our GitHub repo.
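One possible way to obtain a concentration for each circle is sketched below. It assumes a standard Gaussian plume with ground reflection, evaluated at ground level on the plume centreline, and reuses that value in every direction around the source, which ignores wind direction. The sigma formulas (Briggs open-country, class E), the function names and the source parameters are all assumptions for illustration, not values from the datathon data.

```python
import math


def sigmas_class_e(x_m):
    """ASSUMED Briggs open-country fits for class E; returns metres."""
    sigma_y = 0.06 * x_m / math.sqrt(1.0 + 0.0001 * x_m)
    sigma_z = 0.03 * x_m / (1.0 + 0.0003 * x_m)
    return sigma_y, sigma_z


def ground_level_concentration(q_g_per_s, wind_m_per_s, stack_height_m, x_m):
    """Centreline ground-level concentration in g/m^3 at distance x_m.

    Gaussian plume with ground reflection, evaluated at y = 0, z = 0:
    C = Q / (pi * u * sigma_y * sigma_z) * exp(-H^2 / (2 * sigma_z^2))
    """
    sigma_y, sigma_z = sigmas_class_e(x_m)
    dilution = q_g_per_s / (math.pi * wind_m_per_s * sigma_y * sigma_z)
    return dilution * math.exp(-stack_height_m ** 2 / (2.0 * sigma_z ** 2))


# Hypothetical source: 1 g/s of PM10, 30 m stack, 2 m/s wind.
radii_m = (500, 1000, 5000, 10000)
rings = {r: ground_level_concentration(1.0, 2.0, 30.0, r) for r in radii_m}
for radius, conc in rings.items():
    print(f"{radius:>6} m: {conc * 1e6:8.2f} ug/m^3")
```

Each radius then becomes one circle on the map, coloured by its concentration value.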

A snapshot of initial pollution with all parameters:

A snapshot of initial pollution with pollution spread in a radius of 1000 meters:

A snapshot of initial pollution with pollution spread in a radius of 10000 meters:

A snapshot of predicted pollution for 1 day ahead:

4. Modeling

To start modelling, we measured the distances between the industrial pollutants and the official measuring stations. These distances were used to determine how the particle concentrations from the industrial pollutants affect the PM10 concentrations measured in Sofia.
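The distance step can be sketched with the haversine formula, as below. This is only an illustration in Python and not necessarily how the distances were computed in the project; the source and station names and all coordinates are placeholders, not values from the datasets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Placeholder coordinates (lat, lon); hypothetical, for illustration only.
sources = {"source_1": (42.70, 23.30), "source_2": (42.65, 23.40)}
stations = {"station_a": (42.73, 23.31), "station_b": (42.68, 23.25)}

# Distance from every source to every station.
distances = {
    (src, stn): haversine_km(*src_pos, *stn_pos)
    for src, src_pos in sources.items()
    for stn, stn_pos in stations.items()
}
for (src, stn), km in distances.items():
    print(f"{src} -> {stn}: {km:.2f} km")
```

The resulting source-to-station distances can be fed into the dispersion calculation as the downwind distance for each pair.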

We were able to quantify the correlation between the emissions from the industrial sources and the measured PM10 concentrations, and to project the pollution for the following five days.

Our estimations for the industrial pollution:

Official measurement station pollution:

Because of technical issues we were unable to post our code and visualizations in the article, so you can find our work in the repo below:

Here you can find a repo with our code in R and Python.


The work continues!!! Check our progress in the repo.






