1. Business Understanding
Particulate matter (PM) is considered the air pollutant of greatest concern to the health of the urban population. Research has shown that exposure to PM can lead to more days lost from work or school, emergency room visits, hospital stays, and deaths. Both short-term and long-term exposure can worsen heart and lung disease. It can also cause premature death, particularly among people who are at higher risk from particle pollution.
Particulate matter can be produced by burning materials, road dust, construction, and agriculture. One of the largest sources is residential wood burning: wood smoke may come from a fireplace or wood stove in a home, from open burning of vegetation, or from backyard burning. Other sources include forest fires, certain industries, furnaces, tobacco smoke, and motor vehicles, especially those with diesel engines. The harmful effects of tobacco smoke are well known, and as a result many countries have placed restrictions on smoking in public places.
We, as real urban air quality gurus, aim to predict the PM concentration in our capital city, Sofia. We trust that the forecast can be used by the business for prevention. The local authorities can reduce particulate matter pollution by reducing the amount of smoke produced and by reducing vehicle emissions.
Inspired by that, in this study, we focus on refined modeling for predicting daily pollutant concentrations on the basis of historical air pollution data.
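To make the forecasting goal concrete, here is a minimal sketch of a persistence baseline: readings are averaged into daily means, and each day is predicted with the previous day's mean. The function names and the sample readings are illustrative assumptions, not part of the provided datasets, and the sketch assumes the days are consecutive with no gaps.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(readings):
    """Average (timestamp, value) readings into one mean per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    # Sort by date so later steps can rely on chronological order.
    return {day: mean(values) for day, values in sorted(by_day.items())}


def persistence_mae(daily):
    """Mean absolute error when each day is predicted by the previous day."""
    values = list(daily.values())
    errors = [abs(today - yesterday) for yesterday, today in zip(values, values[1:])]
    return mean(errors)


# Made-up PM10 readings in µg/m³, for illustration only.
readings = [
    (datetime(2018, 1, 1, 8), 40.0),
    (datetime(2018, 1, 1, 20), 60.0),
    (datetime(2018, 1, 2, 8), 70.0),
    (datetime(2018, 1, 2, 20), 90.0),
    (datetime(2018, 1, 3, 8), 65.0),
]
daily = daily_means(readings)
baseline_error = persistence_mae(daily)
```

Any refined model should beat this baseline error on held-out days; if it does not, the extra complexity is not paying off.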
2. Data Understanding
As a starting point, we focus on understanding the provided datasets, which are as follows:
Official air quality measurements – https://drive.google.com/open?id=1yXn0TOke-Npd7qBPRGbHekX0nzwbmMzJ
Citizen science air quality measurements – https://drive.google.com/open?id=1LYhb9W8QhTkp0246hn7zi4lZ8v-4BwED
Meteorology data – https://drive.google.com/open?id=1m96H6gukjk8wKVFgRvtepQm_-ZmqbAzw
Topography data – https://drive.google.com/open?id=1INZrRjqqvp5axtCSl4bpvZ-ocvCYuwYa
Our analytical approach will involve the following activities:
1. Data extraction from the primary data source as well as the secondary data sources
2. Data quality check, data cleaning, and data preparation
3. Study some of the variables by exploring the data
4. Assess the relevance of each variable to the study
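The data quality check in step 2 can be sketched as follows. This is only an illustration: the field name `pm10` and the plausibility bounds are assumptions chosen for the example, not values taken from the datasets.

```python
import math


def quality_report(rows, field, low, high):
    """Count missing and out-of-range values of `field` in a list of dicts."""
    missing = 0
    out_of_range = 0
    for row in rows:
        value = row.get(field)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing += 1
        elif not low <= value <= high:
            out_of_range += 1
    return {"rows": len(rows), "missing": missing, "out_of_range": out_of_range}


# Made-up records; "pm10" is a hypothetical column name.
rows = [
    {"pm10": 35.0},
    {"pm10": None},
    {"pm10": -4.0},
    {"pm10": 2500.0},
    {"pm10": float("nan")},
]
# The bounds 0 and 1000 µg/m³ are an assumed plausibility range.
report = quality_report(rows, "pm10", low=0.0, high=1000.0)
```

A report like this, produced per dataset and per variable, gives a quick view of how much cleaning is needed before exploration.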
Considering the specifics of the data and its topological, geometric, and geographic properties, we start our data understanding journey by conducting a spatial analysis.
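One simple building block for such a spatial analysis is matching each citizen sensor to its nearest official station by great-circle distance. The sketch below uses the haversine formula; the station names and all coordinates are hypothetical placeholders, not real station locations.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(sensor, stations):
    """Return the name of the station closest to the sensor's (lat, lon)."""
    return min(stations, key=lambda name: haversine_km(*sensor, *stations[name]))


# Hypothetical coordinates, for illustration only.
stations = {"station_a": (42.70, 23.30), "station_b": (42.65, 23.40)}
sensor = (42.69, 23.32)
closest = nearest_station(sensor, stations)
distance = haversine_km(*sensor, *stations[closest])
```

Pairing sensors with stations in this way makes it possible to compare citizen measurements against official ones taken nearby.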
3. Data Preparation
Good progress folks. Good code, good approach so far. Waiting to see how you approach the links between bias and meteorology :).
If you want to limit your work to a test case, perhaps focus on just 1-2 main stations (e.g. IAOS/Pavlovo and Mladost or Druzhba).
Business Understanding: the presented information is relevant and provides a concise description of the issue under study.
Data Understanding: the exposition is well structured and the sequence of data-prep steps is well grounded. Furthermore, the code and the presentation of output are well organized.
So far, I might say it’s a good job, guys! Looking forward to seeing your progress 🙂
You have done an excellent job with the visualisation and I liked the regression approach for adjusting the citizen sensor measurements, but I think it is rather too complex. I wonder if you have done a bit more on the forecasting aspect?
I would like to see what you would do if you plan to continue working on this case, for example using elevation data for the city topology and not just for the sensors, since air rises or falls depending on temperature and we can get temperature inversions, especially in more polluted areas. But overall, excellent work and a nice visual story.