I have just started Andrew Ng's machine learning course on Coursera, so I thought this challenge would be a good test of what I have learned. I apologise for the delay in writing this article; I was not sure whether I should take on the challenge, since the dataset seemed difficult to understand. After reading a few of the other articles, I think I have an idea of what to do as a first step.
Week 1:
Here is what I did for week 1:
I imported the dataset into Jupyter Notebook, and the first thing I did was join the two datasets.
Then I grouped the rows by geohash and split them into a dictionary, with the geohash as the key and a dataframe of that geohash's rows as the value.
Then I removed the geohash column from the dataframe stored in each dictionary value.
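A minimal sketch of this grouping step, assuming pandas. The sample rows and the column names (`geohash`, `P1`, `temperature`) are made up for illustration and are not the real dataset:

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has more columns and many more rows.
df = pd.DataFrame({
    "geohash": ["sx8d5r", "sx8d5r", "sx8dfk"],
    "P1": [12.0, 15.0, 30.0],
    "temperature": [18, 0, 21],
})

# One dataframe per geohash, keyed by the geohash string.
# The geohash column is dropped from each value because the key already holds it.
by_geohash = {
    geohash: group.drop(columns="geohash").reset_index(drop=True)
    for geohash, group in df.groupby("geohash")
}

print(sorted(by_geohash))
print(by_geohash["sx8d5r"])
```

Keeping the geohash only as the dictionary key means each value holds just the measurements for one location.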
After that, I used ‘ffill’ (forward fill) to replace the 0 values in temperature, pressure and humidity with the previous reading. I don’t think I should have replaced the 0 values in the temperature column, but most temperature values seemed to be above 0 degrees at a glance. I will change that in the future (I’m travelling now, so I don’t have access to Jupyter Notebook).
I did this after grouping by geohash because I didn’t want values from different geohashes to get mixed up.
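Here is a rough sketch of forward filling zeros within each geohash, assuming pandas and NumPy. The data is invented, and treating every 0 as missing is the assumption described above (it is questionable for temperature):

```python
import numpy as np
import pandas as pd

# Hypothetical readings for two geohashes, "a" and "b".
df = pd.DataFrame({
    "geohash": ["a", "a", "a", "b", "b"],
    "temperature": [18, 0, 20, 0, 25],
    "pressure": [1010, 0, 0, 990, 0],
    "humidity": [60, 55, 0, 70, 72],
})

cols = ["temperature", "pressure", "humidity"]

# Mark zeros as missing so that forward fill can replace them.
df[cols] = df[cols].replace(0, np.nan)

# Forward fill inside each geohash group only, so a value from
# geohash "a" is never carried over into geohash "b".
df[cols] = df.groupby("geohash")[cols].ffill()

print(df)
```

Note that the first temperature reading of geohash "b" stays missing: there is no earlier value inside that group to copy, which is exactly the behaviour that keeps geohashes from mixing.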
Now comes the hardest part, which took me a lot of time and hair pulling: the visualisation.
I had been using Spyder all this time because it has a fantastic feature called ‘variable explorer’, which I’m a huge fan of. I wanted to visualise the data as a heat map, so I started by generating a KML file. It failed miserably, so I moved on to generating GeoJSON. That failed horribly too. Last night, while randomly searching around, I stumbled upon a library called ‘Folium’. Lo and behold, all my problems were solved!
But it required Jupyter Notebook, so I needed some time to make the switch :/
So then I took the average of all the values of P1, P2 etc. for each geohash and mapped those averages to the geohash locations.
That’s all for week 1. Next I’ll look into linear regression!
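The averaging step can be sketched in plain Python as below. The readings are invented, `decode_geohash` is a helper written for this example (a standard geohash decode to the cell centre), and the `[lat, lon, weight]` row shape is an assumption about what a heat map layer such as `folium.plugins.HeatMap` would be given:

```python
from collections import defaultdict
from statistics import mean

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def decode_geohash(geohash):
    """Return the (lat, lon) centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2

# Hypothetical (geohash, P1) readings.
readings = [("sx8d5r", 10.0), ("sx8d5r", 20.0), ("sx8dfk", 30.0)]

values_by_geohash = defaultdict(list)
for geohash, p1 in readings:
    values_by_geohash[geohash].append(p1)

# Average P1 per geohash.
mean_p1 = {g: mean(v) for g, v in values_by_geohash.items()}

# One [lat, lon, weight] row per geohash, ready for a heat map layer.
heat_rows = [[*decode_geohash(g), avg] for g, avg in mean_p1.items()]
print(heat_rows)
```

The same pattern would be repeated for P2 and the other columns, giving one heat map per measurement.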
8 thoughts on “Monthly Challenge – Sofia Air – Solution – [iseveryonehigh]”
Your assignments to peer review (and give feedback below the corresponding articles) for week 1 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-kiwi-team/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-newbees/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-kung-fu-panda/
I’d love to leave feedback if the article had any content...
I am extremely sorry for the delay!
This is my first time working with maps and ML 🙂
I think this looks like a good start!
I’m not sure (and admittedly my article doesn’t have much of this either), but I think that you may want to include more background about the project goals. The instructions (https://www.datasciencesociety.net/october-data-science-monthly-challenge/) also had some more suggestions about data cleaning that you may want to implement – things like checking for missing values and removing stations which weren’t measured in 2018.
Also, while taking care of the 0 values seems like a good idea, I’m curious what you replaced them with? I’m not familiar with ffill. Did you assign mean values for each of the “missing” data points?
I really like your heatmaps, and think that they are useful for visualizing the data before you really dive into using it.
Overall a good start!
Your assignments to peer review (and give feedback below the corresponding articles) for week 2 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/the-pumpkins/
https://www.datasciencesociety.net/data-exploration-observations-planning/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-jacob-avila/
As per the report, you have yet to include your week 2 findings.
Your assignments to peer review (and give feedback below the corresponding articles) for week 3 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-jeremy-desir-weber/
https://www.datasciencesociety.net/sofia-air-week-1/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-dirty-minds/
Your assignments to peer review (and give feedback below the corresponding articles) for week 4 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/sofia-air-quality-eda-exploratory-data-analysis/
https://www.datasciencesociety.net/air-sofia-pollution-case/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-lone-fighter/