Sofia Air Pollution Case
Team BG-USA:
- Kristiyan Vachev – Bulgaria
- Sergey Vichev – Bulgaria
- Stefan Panev – Bulgaria
- Georgi Kirilov – Bulgaria
- Mike Lane – USA
Data Preparation
Geocoding the construction data:
The original source file can be found here. The process is very similar to the geocoding proposed in the original documentation, except that it uses Google Sheets instead of Google Maps. The steps are listed below; the geocoding results are attached here:
https://docs.google.com/spreadsheets/d/1mtWnE1289kHqo4_Kk5jrrnJEsmrYTMAC3m1VBuIhNB8/edit#gid=1094078359
The Geocode loop:
- Have a properly formatted Address, City (optional), Country and Name. Properly formatted means written in the language of the place you are trying to geocode. For this hackathon, that means either the romanized (Latin-script) spelling or the original Bulgarian spelling.
- Install the Geocode by Awesome Table add-on for Google Sheets: https://chrome.google.com/webstore/detail/geocode-by-awesome-table/cnhboknahecjdnlkjnlodacdjelippfg?hl=en
- Run the new add-on by clicking its geocode button.
- Select “Address is in multiple columns”.
- Hit Geocode
- Some addresses will come back as not found; click “search wider results”, which performs fuzzy matching on the address.
- Create a validation step that checks the latitude and longitude of each geocoded address. (Google often confuses places in South America or France with places in Bulgaria.) In this case, the longitude must lie between 22 and 24 degrees east; any point outside that range fails validation. This is a simple IF statement in Google Sheets.
- Create a column that flags whether the address line contains the word “Street”.
- Use this column to split the rows into street addresses, non-street addresses, and rows with no address.
- For street addresses, replace the word “Street” with the Bulgarian abbreviation for street, “ul.”, then geocode.
- For non-street addresses, replace the word “block” with the Bulgarian word for block, then geocode.
- For the remaining rows, geocode using only the locality, district and country name.
- Finally, for rows that still fail validation, use bgmaps to find the correct addresses manually.
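The validation step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not what the add-on or the Sheets formula does internally: it applies the 22 to 24 degree band to longitude (the range that fits Sofia, at roughly 42.7° N, 23.3° E) and adds a hypothetical latitude band covering Bulgaria. `GeocodedRow`, `passes_validation` and `split_by_validation` are illustrative names.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Assumption: the 22-24 degree band is applied to longitude, which matches
# western Bulgaria (Sofia is at roughly 42.7 N, 23.3 E).
MIN_LON, MAX_LON = 22.0, 24.0
# Hypothetical extra check: Bulgaria spans roughly 41.2-44.2 degrees N.
MIN_LAT, MAX_LAT = 41.0, 44.5


class GeocodedRow(NamedTuple):
    name: str
    lat: float
    lon: float


def passes_validation(row: GeocodedRow) -> bool:
    """True if the geocoded point lies inside the expected bounding box."""
    return MIN_LAT <= row.lat <= MAX_LAT and MIN_LON <= row.lon <= MAX_LON


def split_by_validation(
    rows: Iterable[GeocodedRow],
) -> Tuple[List[GeocodedRow], List[GeocodedRow]]:
    """Split rows into (valid, needs_review); needs_review goes to manual lookup."""
    valid: List[GeocodedRow] = []
    needs_review: List[GeocodedRow] = []
    for row in rows:
        (valid if passes_validation(row) else needs_review).append(row)
    return valid, needs_review


if __name__ == "__main__":
    rows = [
        GeocodedRow("site in Sofia", 42.6977, 23.3219),
        GeocodedRow("mis-geocoded to Paris", 48.8566, 2.3522),
    ]
    valid, needs_review = split_by_validation(rows)
    print([r.name for r in valid], [r.name for r in needs_review])
```

A point in France or South America fails on longitude alone; the latitude band additionally catches points that land at the right longitude but in a neighbouring country.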
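The street / non-street / no-address fallback can also be sketched in Python. This is a hypothetical illustration of the query-building logic only; it does not call any geocoder. The function names and the column layout (address, locality, district, country) are assumptions, and “bl.” is assumed as the Bulgarian abbreviation for block.

```python
import re
from typing import Optional

# Whole-word, case-insensitive matches for the English keywords.
STREET_RE = re.compile(r"\bstreet\b", re.IGNORECASE)
BLOCK_RE = re.compile(r"\bblock\b", re.IGNORECASE)


def classify(address: Optional[str]) -> str:
    """Mirror the helper column: 'street', 'non_street' or 'no_address'."""
    address = (address or "").strip()
    if not address:
        return "no_address"
    return "street" if STREET_RE.search(address) else "non_street"


def build_query(address: Optional[str], locality: str, district: str, country: str) -> str:
    """Build the geocoder query text for one row, following the fallback order."""
    kind = classify(address)
    if kind == "no_address":
        # Fallback: geocode on locality, district and country only.
        return ", ".join(p for p in (locality, district, country) if p)
    address = (address or "").strip()
    if kind == "street":
        # Literal replacement of "Street" with the Bulgarian "ul.".
        query = STREET_RE.sub("ul.", address)
    else:
        # Assumption: "bl." stands in for the Bulgarian word for block.
        query = BLOCK_RE.sub("bl.", address)
    return ", ".join(p for p in (query, locality, country) if p)
```

For example, `build_query("Vitosha Street 15", "Sofia", "Sredets", "Bulgaria")` returns `"Vitosha ul. 15, Sofia, Bulgaria"`. The replacement is deliberately literal; it does not reorder the words into the usual Bulgarian “ul. Name” form.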
A geospatial join: finding the elevation and position of every latitude/longitude point in the data:
We use QGIS, the open-source GIS package suite, which can be scripted in Python (PyQGIS). This is not meant to be a guide or tutorial on GIS or geospatial joins; people dedicate their lives to GIS and to fixing map data. What we are doing, though, is producing better, more accurate map data for modeling and analysis, where the accuracy can be controlled. When manipulating latitude and longitude directly in Python code, it is very easy to make mistakes. These joins let us assign weights for use in data modeling, regression, or even simple aggregation.
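To illustrate what a point-to-point geospatial join computes, here is a brute-force nearest-neighbour join in plain Python rather than QGIS. Each geocoded point takes the value (for example, an elevation) of the closest sample in a reference layer, measured by the haversine great-circle distance. The reference layer, its values and the function names are hypothetical, and a real workflow would use a spatial index or QGIS's own join tools instead of this O(n·m) scan.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    # Convert to radians first; skipping this is a classic lat/lon bug.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_join(
    points: Sequence[Tuple[float, float]],
    reference: Sequence[Tuple[float, float, float]],
) -> List[Tuple[float, float, float, float]]:
    """Attach to each (lat, lon) the value of the nearest (lat, lon, value) sample.

    Returns (lat, lon, value, distance_km) for every input point.
    """
    if not reference:
        raise ValueError("reference layer is empty")
    joined = []
    for lat, lon in points:
        ref_lat, ref_lon, value = min(
            reference, key=lambda r: haversine_km(lat, lon, r[0], r[1])
        )
        joined.append((lat, lon, value, haversine_km(lat, lon, ref_lat, ref_lon)))
    return joined


if __name__ == "__main__":
    # Hypothetical elevation samples (lat, lon, metres).
    samples = [(42.70, 23.32, 550.0), (42.60, 23.30, 1000.0)]
    print(nearest_join([(42.69, 23.32)], samples))
```

Keeping every coordinate in (lat, lon) order and converting to radians in a single function avoids the most common latitude/longitude mistakes mentioned above; the returned distance can also serve as the basis for a weight.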
A purely mathematical representation of the GIS analysis we are performing:
*Deployment – optional