1. Business Understanding
Particulate matter (PM) is considered the air pollutant of greatest concern to the health of the urban population. Research has shown that exposure to PM can lead to more days lost from work or school, emergency room visits, hospital stays, and deaths. Both short-term and long-term exposure to PM can worsen heart and lung disease. It can also cause premature death, particularly among people who have a higher risk of being affected by particle pollution.
Particulate matter can be produced from burning materials, road dust, construction, and agriculture. One of the largest sources of particulate matter is residential wood burning. Wood smoke may come from residential sources such as a fireplace or wood stove in a home, from open burning of vegetative matter, or from backyard burning. Other sources of particulate matter include forest fires, certain industries, furnaces, tobacco smoke, and motor vehicles, especially those with diesel engines. The harmful effects of tobacco smoke are well known. As a result, many countries have placed restrictions on smoking in public places.
We, as true urban air quality gurus, aim to predict the PM concentration in our capital city, Sofia. We trust that the forecast can be used by the business for prevention. The local authorities can reduce the levels of particulate matter pollution by limiting the smoke produced by burning and by reducing vehicle emissions.
Inspired by that, in this study, we focus on refined modeling for predicting daily pollutant concentrations on the basis of historical air pollution data.
2. Data Understanding
As a starting point, we focus on understanding the provided datasets which are as follows:
Official air quality measurements – https://drive.google.com/open?id=1yXn0TOke-Npd7qBPRGbHekX0nzwbmMzJ
Citizen science air quality measurements – https://drive.google.com/open?id=1LYhb9W8QhTkp0246hn7zi4lZ8v-4BwED
Meteorology data – https://drive.google.com/open?id=1m96H6gukjk8wKVFgRvtepQm_-ZmqbAzw
Topography data – https://drive.google.com/open?id=1INZrRjqqvp5axtCSl4bpvZ-ocvCYuwYa
Our analytical approach will involve the following activities:
1. Data extraction from the primary data source as well as secondary data sources
2. Data quality check, data cleaning, and data preparation
3. Exploring the data to study some of the variables
4. Assessing the variables for their relevance to the study
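The four activities above can be illustrated with a minimal sketch. It assumes a tiny in-memory CSV sample rather than the real files, and the column names (time, station, P1) and the helper functions are hypothetical, chosen only for illustration. The cleaning rule used here (dropping missing and negative readings) is likewise an assumption, not the procedure applied in this study.

```python
import csv
import io
from statistics import mean

# Hypothetical sample; the column names (time, station, P1) are
# assumptions for illustration, not the schema of the real datasets.
SAMPLE_CSV = """time,station,P1
2018-01-01,STA-1,42.0
2018-01-01,STA-2,
2018-01-02,STA-1,55.5
2018-01-02,STA-2,-3.0
2018-01-03,STA-1,38.5
"""


def load_rows(text):
    """Activity 1: extract rows from a CSV source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def quality_report(rows, column):
    """Activity 2: count missing and negative (implausible) readings."""
    missing = sum(1 for r in rows if r[column] == "")
    negative = sum(1 for r in rows if r[column] != "" and float(r[column]) < 0)
    return {"rows": len(rows), "missing": missing, "negative": negative}


def clean(rows, column):
    """Activity 2 (continued): drop rows with missing or negative readings."""
    return [r for r in rows if r[column] != "" and float(r[column]) >= 0]


def daily_mean(rows, column):
    """Activities 3 and 4: a first exploratory summary, the mean reading per day."""
    by_day = {}
    for r in rows:
        by_day.setdefault(r["time"], []).append(float(r[column]))
    return {day: mean(values) for day, values in by_day.items()}


rows = load_rows(SAMPLE_CSV)
report = quality_report(rows, "P1")
cleaned = clean(rows, "P1")
daily = daily_mean(cleaned, "P1")
print(report)
print(daily)
```

In practice the same steps would be run against each of the four datasets listed above, with the quality report guiding which cleaning rules are appropriate for each source.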
Considering the specifics of the data and its topological, geometric, and geographic properties, we start our data understanding journey by conducting a spatial analysis.
3. Data Preparation